This Week in AI: Musk Launches xAI, Anthropic Takes Claude 2 Public, Did OpenAI Nerf GPT-4?
The artificial intelligence space has been characteristically busy this week. Elon Musk is officially throwing his hat into the AI ring, Anthropic has released a more powerful version of its Claude chatbot, and OpenAI is under an FTC investigation for data privacy concerns while rumors abound that the company has redesigned GPT-4, lowering its performance.
Elon Musk Officially Launches an AI Startup
Tesla and SpaceX CEO Elon Musk debuted his new AI company dubbed xAI. The company’s goal is to “understand the true nature of the universe,” its website says. There will presumably be more information about the new AI firm during a live event on Twitter Spaces scheduled for Friday.
The xAI site also has team members listed and says they are alumni of DeepMind, OpenAI, Google Research, Microsoft Research, Twitter, and Tesla and have worked on projects including DeepMind’s AlphaCode and OpenAI’s GPT-3.5 and GPT-4 models.
A news report from April suggested Musk was launching an AI startup after he had incorporated xAI in Nevada on March 9 and secured “thousands” of Nvidia GPUs. Also in April, Musk shared plans for an AI tool called TruthGPT during a Fox News interview, in which he claimed companies like OpenAI are creating politically correct systems and positioned his AI as an anti-woke alternative.
Dan Hendrycks, executive director of the Center for AI Safety, has been tapped as an advisor for Musk’s new startup. In May, the Center for AI Safety published a letter urging prompt global action to mitigate the risk of extinction from AI. Many AI ethicists saw that letter as a distraction from the problems algorithmic bias is already causing for marginalized groups.
Anthropic Takes Claude 2 to the Masses
There’s a new chatbot available to test out: Anthropic’s Claude 2. The company released a blog post Tuesday announcing this new model, boasting its improved performance and longer responses, as well as API access and a new public-facing beta website, claude.ai.
Anthropic has also increased the length of Claude’s input and output. Users can now input up to 100,000 tokens in each prompt, which adds up to hundreds of pages, the company claims. Claude can also now write longer responses, up to a few thousand tokens.
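For a rough sense of how 100,000 tokens becomes "hundreds of pages," here is a back-of-the-envelope sketch. The conversion figures are assumptions, not numbers from Anthropic: roughly 0.75 English words per token is a commonly cited rule of thumb, and 300 words per page is an arbitrary choice. The function name `estimate_pages` is hypothetical.

```python
def estimate_pages(tokens, words_per_token=0.75, words_per_page=300):
    """Estimate a page count from a token count.

    Both ratios are illustrative assumptions; real values depend on the
    tokenizer, the language, and the page layout.
    """
    words = tokens * words_per_token
    return words / words_per_page


# 100,000 tokens -> about 75,000 words -> about 250 pages under these assumptions
print(estimate_pages(100_000))
```

Changing either ratio shifts the result, but most reasonable choices still land in the low hundreds of pages.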
The company said it has made improvements over previous models on coding, math, and reasoning. Claude 2 scored 71.2% on the Codex HumanEval Python coding test, compared to the first Claude’s score of 56%. On GSM8k, a large set of grade-school math problems, Claude 2 scored 88.0%, up from 85.2%.
Safety was also a consideration in training Claude 2: the company said it has been iterating on the model to make it more harmless and harder to prompt into producing offensive or dangerous output. “We have an internal red-teaming evaluation that scores our models on a large representative set of harmful prompts, using an automated test while we also regularly check the results manually. In this evaluation, Claude 2 was 2x better at giving harmless responses compared to Claude 1.3,” the company wrote.
Anthropic said it will be rolling out a roadmap of capability improvements for Claude 2 to be deployed over the coming months.
Did OpenAI Nerf GPT-4? Rumors Swirl Amid FTC Investigation
Rumors have been swirling around the internet that OpenAI has nerfed the performance of GPT-4, its largest and most capable model available to the public. Users on Twitter and the OpenAI developer forum were calling the model “lazier” and “dumber” after it appeared to be giving faster but less accurate answers compared to the slower but more precise responses it initially gave.
An Insider report says industry insiders are questioning whether OpenAI has redesigned its GPT-4 model. Some have said the company could be creating a group of smaller GPT-4 models that act as one model and are less expensive to run. This approach is called a Mixture of Experts, or MoE, in which smaller expert models are trained on specific tasks and subject areas. When asked a question, GPT-4 would know which expert model to query, and it might send the query to more than one of these experts and combine the results. OpenAI did not respond to Insider’s request for comment on this matter.
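To make the routing idea concrete, here is a minimal toy sketch of top-k Mixture of Experts routing. It is not a description of how GPT-4 works, which OpenAI has not confirmed. The experts, the gate, and every name below (`route_top_k`, `moe_forward`, `toy_gate`) are hypothetical stand-ins. In real MoE systems the experts and the gate are typically learned neural network components rather than hand-written functions.

```python
import math


def softmax(scores):
    """Turn raw gate scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(gate_scores, k):
    """Pick the k highest-scoring experts and renormalize their weights."""
    probs = softmax(gate_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def moe_forward(x, experts, gate, k=2):
    """Query only the selected experts and blend their outputs by weight."""
    routes = route_top_k(gate(x), k)
    return sum(weight * experts[i](x) for i, weight in routes)


# Hypothetical toy experts: each is just a simple function of a number.
experts = [lambda x: x + 1.0, lambda x: 2.0 * x, lambda x: x * x]


def toy_gate(x):
    """Hand-written gate (an assumption): score each expert for input x."""
    return [0.0, 1.0, 1.0] if x >= 0 else [2.0, 0.0, 0.0]


print(moe_forward(3.0, experts, toy_gate, k=2))
```

The cost saving the rumor points to comes from the fact that only the k selected experts run for a given input, so the rest of the model's capacity sits idle for that query.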
Whether or not GPT-4 is actually “dumber,” OpenAI is also in the news this week due to a new investigation opened by the Federal Trade Commission.
The FTC is looking into whether ChatGPT has harmed consumers through its collection of data and publication of false information on individuals. The agency sent a 20-page letter to OpenAI this week with dozens of questions about how the startup trains its models and how it governs personal data.
The letter detailed how the FTC is examining whether OpenAI “engaged in unfair or deceptive privacy or data security practices or engaged in unfair or deceptive practices relating to risks of harm to consumers.”
“It is very disappointing to see the FTC's request start with a leak and does not help build trust,” wrote OpenAI CEO Sam Altman in a tweet. “That said, it’s super important to us that our technology is safe and pro-consumer, and we are confident we follow the law. Of course we will work with the FTC.”
Altman went on to say the company built GPT-4 on top of years of safety research and spent six months aligning it before release. “We protect user privacy and design our systems to learn about the world, not private individuals,” he said.