In this week’s newsletter: OpenAI claims its ‘o1’ model can reason, clearing a major hurdle on the road to truly groundbreaking AI. But is it really that straightforward?
We’re nearing two years since the generative AI revolution began with OpenAI’s release of ChatGPT in November 2022. The journey has been a blend of successes and challenges.
OpenAI recently revealed it has surpassed 200 million weekly active users. That is a significant milestone, but also a sign of slowing growth: the service hit 100 million users just two months after launch, then took the better part of two years to double that figure. Meanwhile, a YouGov study suggests that including AI in a product is about as likely to deter buyers as to attract them.
Despite concerns, investment in AI continues to surge, with OpenAI seeking new funding that could value the company at $150 billion—on par with giants like Cisco, Shell, and McDonald’s. Last week, it introduced its latest model, “o1,” which is being hailed as a major leap in generative AI development.
The o1 model, previously codenamed Strawberry, is built to reason through problems in a way that more closely resembles human decision-making. It is somewhat slower and smaller than its predecessors, and is regarded more as a GPT-4.5-style upgrade than as the long-anticipated GPT-5, which remains in development.
An Impossible Mission?
On the surface, o1 might seem underwhelming, but it addresses what Alex once called the “Tom Cruise problem” in this newsletter: earlier versions of ChatGPT could correctly answer “Who is Tom Cruise’s mother?” (Mary Lee Pfeiffer), yet stumble over the reverse question, “Who is Mary Lee Pfeiffer’s son?”
Ask o1 that same pair of questions, and it nails them. It even shows its reasoning process, which OpenAI cleverly—but misleadingly—calls “thoughts,” since AI doesn’t actually think. (For more on why anthropomorphizing AI is problematic, see my February article.) When asked the second question, o1 “thought” for four seconds, mapping out the family connections and verifying the details.
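For readers who want to reproduce the test themselves, here is a minimal sketch of posing that pair of questions through OpenAI’s Python SDK. The model identifier “o1-preview” and access to it are assumptions (the model initially rolled out to a limited set of accounts), and note that the API returns only the final answer, not the hidden chain of thought:

```python
# Minimal sketch, assuming the OpenAI Python SDK ("pip install openai")
# and that the preview model is available to your account as "o1-preview".
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

questions = [
    "Who is Tom Cruise's mother?",
    "Who is Mary Lee Pfeiffer's son?",  # the reverse question that tripped up earlier models
]

for q in questions:
    # o1 reasons internally before responding; the API surfaces only
    # the final answer, not the step-by-step "thoughts".
    response = client.chat.completions.create(
        model="o1-preview",
        messages=[{"role": "user", "content": q}],
    )
    print(q, "->", response.choices[0].message.content)
```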
So far, so good. OpenAI claims o1 can reason, though not everyone accepts such a bold statement. For the sake of argument, let’s grant it. If true, it marks a major shift in generative AI: a move from merely regurgitating facts, or producing whatever response a user is most likely to want, to actually working through information and delivering considered answers.
That “if” is doing a lot of work. We still don’t fully understand how these systems operate, and that includes their developers. OpenAI presents the ability to reason as a significant advance, going so far as to dub o1 its most dangerous model yet (a claim that may owe more to marketing than to reality). Many observers agree that o1 reasons better than its predecessors; far fewer are convinced it is especially dangerous.
Don’t focus on what’s happening behind the scenes!
We can probe, but only so far. Users keen to understand how o1 arrives at its answers (Simon Willison is a reliable source for a thorough introduction) have been pushing for more detail on its “thought” process, but OpenAI exposes only a brief summary of each step in the chain of thought.
As a result, some users have taken to asking the model directly how it generates its answers, and have received warning emails from OpenAI threatening account suspension if they continue.
This leaves us somewhat in the dark. o1 appears to be a significant advance, one that could transform ChatGPT from a tool you treat with healthy skepticism into a genuinely essential resource.
What’s particularly notable is how thoroughly OpenAI’s dominance has crowded out coverage of its competitors. Last week Mistral, the prominent French rival, launched its first multimodal model, Pixtral 12B, which combines image recognition with text generation. It deserved real attention, but OpenAI and o1 hogged the spotlight.
Nevertheless, progress in AI continues, and the technology is beginning to fulfill its potential. The real question is whether those who found ChatGPT lacking the first time around can be convinced to return and try these newer, more capable models.