Google has surprised the tech world by launching its highly anticipated AI model, Gemini. The model is said to rival OpenAI’s advanced GPT-4, and it has been released alongside the December Feature Drop for Google Pixel phones. Gemini is a multimodal AI model, meaning it can work seamlessly with text, images, audio, video, and code. It comes in three sizes — Ultra, Pro, and Nano — allowing it to run on everything from phones to data centers. Google claims Gemini is its most capable and advanced model yet.
Comparing Gemini to GPT-4, Google has revealed that Gemini Ultra outperforms OpenAI’s model in seven out of eight text-focused tests, covering abilities such as reasoning, math, and coding. Google also says Gemini has surpassed human experts on MMLU (massive multitask language understanding), a benchmark that combines dozens of subjects to test world knowledge and problem-solving. In Google’s multimodal benchmarks, Gemini came out on top in all 10 image, video, and audio tests. However, these results will need to be independently verified by researchers.
Google believes that Gemini’s multimodal design is responsible for its impressive performance. By natively working with both images and text, Gemini doesn’t require the assistance of OCR systems to read text from images and documents. Google also trained Gemini on different modalities right from the beginning, unlike the conventional approach of stitching separate modes together after training.
Addressing concerns about safety and responsibility in AI, Google emphasizes that Gemini was built with these factors at its core. The company has implemented safety classifiers to identify and filter content involving violence or negative stereotypes, and says it has worked to improve factual accuracy. However, the real effectiveness of these measures will only become clear through real-world testing.
In terms of multilingual performance, it is unclear how well Gemini handles languages other than English, and some reports have raised concerns about weak multilingual capabilities. Currently, Gemini is only available in English, which is consistent with those reports, though Google may expand its language support in the future.
Surprisingly, Google has already started rolling out Gemini. The Pro version of the model is being integrated into Google’s ChatGPT competitor, Bard, where it is specifically tuned for advanced reasoning, planning, and understanding. Google also plans to introduce Bard Advanced, giving users access to its most advanced models and capabilities; the availability and pricing of this tier are yet to be confirmed. Gemini is likewise coming to the Google Pixel 8 Pro as part of the December Feature Drop. The Nano variant will power features like Summarize in the Pixel-exclusive Recorder app and a developer preview of Smart Reply in Gboard. Initially, Smart Reply will be available in WhatsApp before expanding to other communication apps.
Google has further announced plans to bring Gemini to more products, including Search, Ads, Chrome, and Duet AI. The company has already begun testing Gemini in its Search Generative Experience (SGE), where it reports a 40% reduction in latency for English queries in the US. Android developers can also sign up for a preview of Gemini Nano, allowing them to integrate the AI into their own applications. This preview runs through Google’s AICore system app, which will come to more devices in the coming months and will draw on the machine-learning hardware of processors from Qualcomm, Samsung S.LSI, MediaTek, and Google itself.
While Gemini Pro and Nano are already available, Google is still refining the Ultra version of the model, which requires further safety testing, including input from industry partners. Google plans to open up Gemini Ultra to select partners first and aims to roll it out to developers and enterprise customers early next year. With Gemini’s release, Google seems determined to compete head-on with OpenAI’s GPT-4, offering an advanced, multimodal AI model that has already shown promising results across a range of benchmarks.