Recent studies have revealed significant shortcomings in Google’s flagship generative artificial intelligence models, Gemini 1.5 Pro and Gemini 1.5 Flash, as reported by TechCrunch. Google has repeatedly highlighted Gemini’s ability to process massive amounts of data thanks to its large context window, claiming its models can analyze documents hundreds of pages long and find information in video recordings. However, two independent studies have shown that in practice the models handle such tasks far less capably.
Researchers from UMass Amherst, the Allen Institute for AI, and Princeton University tested Gemini’s ability to answer questions about the content of fiction books. The book used in the test contained approximately 260,000 words (about 520 pages). The results were disappointing: Gemini 1.5 Pro answered correctly only 46.7% of the time, and Gemini 1.5 Flash only 20%. Averaged across the benchmark, neither model achieved question-answering accuracy meaningfully above chance. Marzena Karpinska, a co-author of the study, noted: “Although models such as Gemini 1.5 Pro can technically handle long contexts, we have seen many cases indicating that the models do not actually understand the content.”
Challenges in Analyzing Multimedia Data
The second study, conducted by researchers at the University of California, Santa Barbara, focused on Gemini 1.5 Flash’s ability to analyze video content, specifically slideshows of images. The results were also unsatisfactory: given a slideshow of 25 images, the AI answered correctly only about half the time, and as the number of images increased, accuracy dropped to 30%, casting doubt on the model’s effectiveness with multimedia data.
It should be noted, however, that neither study has gone through peer review, and neither tested the latest versions of the models with their 2-million-token context window. Nonetheless, the findings raise serious questions about the real capabilities of generative AI models in general, and about the validity of the tech giants’ marketing claims.
This research comes amid growing skepticism about generative AI, notes NIXsolutions. Recent surveys by the international consulting firm Boston Consulting Group showed that about half of the senior executives surveyed do not expect generative AI to deliver significant productivity gains and are concerned about possible errors and data security problems. Experts are calling for more objective criteria for assessing AI capabilities and for greater independent scrutiny. Google has not yet commented on the results of these studies.