OpenAI and other leading tech firms are exploring new approaches to advance AI as existing methods reach their limits.
Nov 11 (Reuters) – AI companies, including OpenAI, are working to tackle unforeseen delays and obstacles in building larger language models by creating training methods that enable algorithms to “think” in ways more similar to human reasoning.
A dozen AI scientists, researchers, and investors told Reuters they believe these techniques, which underpin OpenAI's recently released o1 model, could reshape the AI race and affect the types of resources AI companies need in ever-greater quantities, from energy to specialized chips.
OpenAI declined to comment for this story. Since the release of the viral ChatGPT chatbot two years ago, tech companies, whose valuations have soared on the AI boom, have publicly maintained that "scaling up" current models with more data and computing power will keep delivering better AI.
However, some leading AI scientists are now voicing concerns about the limitations of the “bigger is better” approach.
Ilya Sutskever, co-founder of AI labs Safe Superintelligence (SSI) and OpenAI, recently told Reuters that the outcomes from scaling up pre-training—a phase where an AI model uses massive amounts of unlabeled data to learn language patterns and structures—have reached a plateau.
Sutskever, known as an early proponent of advancing generative AI by leveraging more data and computing power in pre-training—a strategy that ultimately led to the development of ChatGPT—left OpenAI earlier this year to establish SSI.
“The 2010s were all about scaling, but now we’ve returned to an era of exploration and discovery,” Sutskever remarked. “Finding the right focus for scaling is more crucial than ever.”
Sutskever declined to elaborate further but noted that SSI is pursuing an alternative to scaling up pre-training. Meanwhile, researchers at major AI labs have faced setbacks and underwhelming results in their efforts to develop a large language model that surpasses OpenAI’s nearly two-year-old GPT-4, according to three sources familiar with private developments.
“Training runs” for large models can cost tens of millions of dollars, as they require hundreds of chips running simultaneously. These complex systems have a higher risk of hardware-related failures, and researchers often won’t know the model’s final performance until the run completes, which can take several months.
Another challenge is that large language models consume massive amounts of data, and AI models have already used up most of the readily available data. Power shortages have also disrupted training runs, as the process demands significant energy.
To address these challenges, researchers are investigating “test-time compute,” a technique that improves existing AI models during the “inference” phase, when the model is actively used. For instance, rather than selecting a single answer right away, a model could generate and assess multiple options in real time, ultimately choosing the best solution. This approach enables models to allocate more processing power to difficult tasks like math or coding problems, or complex operations that require human-like reasoning and decision-making.
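One common form of test-time compute is best-of-N sampling: generate several candidate answers at inference time, score each one, and return the highest-scoring candidate. The sketch below is a minimal, illustrative toy; `generate_candidates` and `score` are hypothetical stand-ins for a real language model's sampler and for a verifier or reward model, not part of any actual API.

```python
def generate_candidates(prompt: str, n: int) -> list[int]:
    # Stand-in for sampling n diverse answers from a model.
    # Here: toy guesses spread around the true value of 17 * 24.
    true_answer = 17 * 24
    return [true_answer + offset for offset in range(-(n // 2), n - n // 2)]

def score(prompt: str, answer: int) -> float:
    # Stand-in for a verifier/reward model that rates each candidate.
    # Here we can check exactness; real verifiers return graded scores.
    return 1.0 if answer == 17 * 24 else 0.0

def best_of_n(prompt: str, n: int = 16) -> int:
    # Spending more inference compute (a larger n) raises the chance
    # that at least one sampled candidate scores well -- the core
    # trade-off behind test-time compute.
    candidates = generate_candidates(prompt, n)
    return max(candidates, key=lambda a: score(prompt, a))

print(best_of_n("What is 17 * 24?", n=16))  # prints 408
```

The key design point is that quality improves by widening the search at inference time rather than by training a larger model, which is why the approach shifts demand toward inference hardware.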
“It turned out that having a bot think for just 20 seconds in a poker hand produced the same performance boost as scaling up the model by 100,000 times and training it for 100,000 times longer,” said Noam Brown, an OpenAI researcher who worked on GPT-4, at the TED AI conference in San Francisco last month.
OpenAI has adopted this technique in its newly released model, "o1," previously known as Q* and Strawberry, as first reported by Reuters in July. The o1 model can "think" through problems step-by-step, much like human reasoning. It also incorporates data and feedback curated from PhDs and industry experts. The key innovation of the o1 series is a layer of additional training applied on top of "base" models such as GPT-4, and the company plans to apply this technique to larger and more advanced base models.
Meanwhile, researchers at leading AI labs such as Anthropic, xAI, and Google DeepMind have also been working on developing their own versions of the technique, according to five sources familiar with the efforts.
“We see plenty of easy opportunities to quickly improve these models,” said Kevin Weil, chief product officer at OpenAI, during a tech conference in October. “By the time others catch up, we’ll aim to be three steps ahead.”
Google and xAI did not respond to requests for comment, and Anthropic had no immediate statement. The shift could alter the competitive landscape for AI hardware, which until now has been dominated by insatiable demand for Nvidia's AI chips. Prominent venture capital firms such as Sequoia and Andreessen Horowitz, which have poured billions into the costly development of AI models at labs including OpenAI and xAI, are watching the transition closely and weighing its impact on their sizable investments.
“This shift will transition us from large pre-training clusters to inference clouds, which are distributed, cloud-based servers used for inference,” Sonya Huang, a partner at Sequoia Capital, told Reuters.
The demand for Nvidia’s cutting-edge AI chips has played a major role in its rise to become the world’s most valuable company, surpassing Apple in October. While Nvidia dominates the training chip market, it may face increased competition in the inference market. When asked about the potential impact on demand for its products, Nvidia highlighted recent company presentations emphasizing the technique behind the o1 model. CEO Jensen Huang has discussed the growing demand for using Nvidia’s chips for inference.
“We’ve now identified a second scaling law, one that applies during inference… All these factors have caused the demand for Blackwell to be exceptionally high,” Huang said last month at a conference in India, referring to the company’s latest AI chip.