Can we trust AI for coding assistance? 🖥️🤔
One of the major advantages of generative AI chatbots like Copilot, Gemini, and ChatGPT is that they can answer coding and programming questions far faster than a human can. However, a recent report indicates that ChatGPT frequently gives incorrect answers to programming queries.
In a study presented in May at the Computer-Human Interaction Conference (reported by Gizmodo), researchers from Purdue University examined 517 Stack Overflow questions answered by ChatGPT.
The research team found that ChatGPT’s responses were incorrect 52 percent of the time.
Although some errors from an AI chatbot might be expected, the Purdue team noted that human programmers still preferred ChatGPT's answers 35 percent of the time because of their thoroughness and articulate language. More alarmingly, those programmers failed to spot ChatGPT's mistakes 39 percent of the time.
This study highlights that generative AI bots are still prone to frequent errors, and humans may not always recognize these mistakes.
Meanwhile, Google’s AI Overviews, introduced in the US for Google Search in early May, have also been producing odd and error-laden summaries for some search queries. Google has attempted to justify these errors in statements to news outlets like Gizmodo, saying:
      The examples we’ve seen are generally very uncommon queries, and aren’t representative of most people’s experiences. The vast majority of AI Overviews provide high quality information, with links to dig deeper on the web.
The statement mentioned that Google would use these “isolated examples” to help “improve our systems.”