Google’s AI Overviews Give Strange and Incorrect Summaries
According to MIT Technology Review, Google promised that "Google will do the googling for you" with its new AI Overviews, which place brief, AI-generated summaries highlighting key information and links on top of search results. So far, the feature has proved unreliable.
Following the release of AI Overviews, users shared many examples of strange or inaccurate responses. The feature suggested adding glue to pizza and eating at least one small rock a day, and claimed that U.S. president Andrew Johnson earned his university degrees between 1947 and 2012, even though he died in 1875. That would be lifelong learning and then some.
Liz Reid, head of Google Search, announced that Google was making technical improvements intended to make AI Overviews less likely to generate incorrect answers. The company is also limiting the use of satirical, humorous, and user-generated content in responses, since such material can result in misleading advice.
What went wrong?
AI Overviews is powered by a new generative AI model in Gemini that has been customized for Google Search and integrated with Google's core web ranking systems. Large language models (LLMs) simply predict the next word (or token) in a sequence, which makes them prone to making things up, the so-called hallucinations. To counter this, AI Overviews uses an AI technique called retrieval-augmented generation (RAG), which allows an LLM to consult specific sources outside the data it was trained on. The system checks a user's query against the documents that make up its information sources, then generates a response. Because it can match the original query to specific parts of web pages, it can cite where it drew its answer from, which ordinary LLMs cannot do.
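In outline, a RAG pipeline looks something like the sketch below. This is a minimal illustration under generic assumptions, not Google's implementation: the Document class, the keyword-overlap retrieve function, the example corpus, and the toy generate_answer step are hypothetical stand-ins, whereas a production system would rank with its real search signals and prompt an LLM with the retrieved passages.

```python
# Minimal sketch of a retrieval-augmented generation (RAG) flow.
# Hypothetical illustration only; not Google's implementation.

from dataclasses import dataclass


@dataclass
class Document:
    url: str
    text: str


def retrieve(query: str, corpus: list[Document], top_k: int = 3) -> list[Document]:
    """Rank documents by naive keyword overlap with the query
    (a stand-in for real web ranking or embedding-based retrieval)."""
    terms = set(query.lower().split())
    scored = [(len(terms & set(doc.text.lower().split())), doc) for doc in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:top_k] if score > 0]


def generate_answer(query: str, sources: list[Document]) -> str:
    """Toy 'generation' step: answer from the top retrieved passage and cite it.
    A real RAG system would prompt an LLM with the query plus these passages."""
    if not sources:
        return "No reliable sources were found for this query."
    best = sources[0]
    return f"{best.text} (source: {best.url})"


corpus = [
    Document("https://example.org/pizza-science",
             "Cheese sticks to pizza because melted fat and proteins bind to the sauce."),
    Document("https://example.org/joke-thread",
             "Just add about 1/8 cup of non-toxic glue to the sauce."),
]
query = "why doesn't cheese stick to my pizza"
print(generate_answer(query, retrieve(query, corpus)))
```

Running the snippet returns the cheese-sticking explanation with its source URL attached, which is the kind of grounded, citable answer RAG is meant to provide.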
This yields more up-to-date and factually accurate responses, and the technique is often used to try to keep LLMs from hallucinating. The downside of RAG is that for the LLM to come up with a good answer, it must both retrieve the information correctly and generate the response correctly; a failure at either step means a bad answer.
When a RAG system encounters conflicting information, it has no way to work out which source to trust and may combine details from both, producing a misleading answer.
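One generic mitigation, sketched below, is to check whether the retrieved passages agree before generating an answer, and to surface conflicting sources separately rather than blending them. This is a hypothetical illustration using crude lexical overlap; it is not a description of how AI Overviews handles conflicts, and a real system would rely on semantic similarity or an entailment model instead.

```python
# Sketch of a pre-generation agreement check on retrieved passages.
# Hypothetical mitigation for illustration only; not how AI Overviews works.

def jaccard(a: str, b: str) -> float:
    """Crude lexical similarity between two passages."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if (wa | wb) else 0.0


def sources_agree(passages: list[str], threshold: float = 0.2) -> bool:
    """Treat the retrieved set as conflicting if any pair of passages has
    very low lexical overlap; a real system would use semantic similarity
    or an entailment model rather than word overlap."""
    pairs = [(a, b) for i, a in enumerate(passages) for b in passages[i + 1:]]
    return all(jaccard(a, b) >= threshold for a, b in pairs)


passages = [
    "Mozzarella browns and bubbles at around 550 degrees Fahrenheit.",
    "Add 1/8 cup of non-toxic glue to the sauce for extra tackiness.",
]
if not sources_agree(passages):
    print("Sources conflict: show them separately instead of blending them.")
```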
According to a Google spokesperson, in many cases when AI Overviews returns an incorrect answer, it is because there is not much high-quality information available on the web for the query, or because the query most closely matches satirical sites or joke posts.
Google has added a label to AI Overviews answers reading "Generative AI is experimental," but the company should consider making it far clearer that the feature is in beta and not yet ready to provide fully reliable answers.