NIX Solutions: Llama 4 Models Push AI Boundaries

Meta Platforms has introduced the Llama 4 family of open-source AI models, featuring Scout, Maverick, and Behemoth. These models support multimodal interaction, allowing them to process not only text but also images, videos, and other formats. They were trained on “large amounts of unlabeled text, image, and video data” to provide “broad visual understanding.”

The release comes amid growing competition, especially from the Chinese company DeepSeek, whose AI models reportedly perform on par with or better than previous generations of Llama. In response, Meta is accelerating its efforts. According to reports, company employees are working to understand how DeepSeek managed to lower the development and deployment costs of AI models like R1 and V3.


Meta’s Llama 4 Scout model features 17 billion active parameters, 16 “experts,” and 109 billion total parameters. It supports a 10 million-token context window and, according to Meta, outperforms Gemma 3, Gemini 2.0 Flash-Lite, and Mistral 3.1 in handling various tasks.

Powerful AI With Multimodal Capabilities

The Llama 4 Maverick model contains 17 billion active parameters and 128 “experts,” amounting to a total of 400 billion parameters. Meta claims it performs better than GPT-4o and Gemini 2.0 Flash in benchmarks and shows reasoning and coding abilities comparable to DeepSeek V3. Scout can run on a single Nvidia H100 GPU, while Maverick requires an Nvidia H100 DGX system or an equivalent setup.

Llama 4 Behemoth is the most advanced of the three, with 288 billion active parameters and 16 “experts,” totaling around 2 trillion parameters, notes NIX Solutions. This model reportedly outperforms GPT-4.5, Claude Sonnet 3.7, and Gemini 2.0 Pro in various tests. However, Behemoth is still undergoing training and is not yet publicly accessible. For now, Scout and Maverick are available on Llama.com and Hugging Face.
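For developers who want to try the released models, the snippet below is a minimal, hypothetical sketch of loading Scout through the Hugging Face transformers pipeline. The model ID, access steps, and hardware notes are our assumptions based on Meta’s naming conventions, not details confirmed in the announcement; check the meta-llama organization page for the exact repository name and license terms.

```python
# Illustrative sketch only: the model ID below is assumed, and access to the
# weights is gated (run `huggingface-cli login` after accepting Meta's license).
# Running the full Scout model requires an H100-class GPU, per Meta's guidance.
import torch
from transformers import pipeline

model_id = "meta-llama/Llama-4-Scout-17B-16E-Instruct"  # assumed repository name

generator = pipeline(
    "text-generation",
    model=model_id,
    device_map="auto",           # spread layers across available GPUs
    torch_dtype=torch.bfloat16,  # half precision to fit the weights in memory
)

messages = [
    {"role": "user", "content": "Explain mixture-of-experts models in two sentences."}
]
output = generator(messages, max_new_tokens=200)
print(output[0]["generated_text"])
```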

Meta’s AI assistant—used in WhatsApp, Messenger, and Instagram—has been upgraded to Llama 4 in 40 countries. The multimodal functionality currently supports English only and is limited to users in the US.

In its blog post, Meta highlighted the efficiency of Llama 4’s “mixture of experts” (MoE) architecture, which routes each input to a handful of smaller, specialized sub-networks (“experts”), so only a fraction of the model’s total parameters is active at any one time; a simplified sketch of this routing idea follows below. According to the company, “The Llama 4 models mark the beginning of a new era for the Llama ecosystem. This is just the beginning for the Llama 4 family.” We’ll keep you updated as more integrations become available.
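As a rough illustration of that routing idea, here is a toy PyTorch layer of our own (with made-up sizes, not Meta’s implementation) in which a router activates only a couple of “experts” per token:

```python
# Toy mixture-of-experts layer: a router scores each token and only the top-k
# experts (small feed-forward networks) run for that token, so most parameters
# stay inactive. Dimensions are illustrative, not Llama 4's configuration.
import torch
import torch.nn as nn

class ToyMoELayer(nn.Module):
    def __init__(self, dim=64, num_experts=16, top_k=2):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)  # one score per expert, per token
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        ])
        self.top_k = top_k

    def forward(self, x):                               # x: (num_tokens, dim)
        scores = self.router(x)                         # (num_tokens, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)  # pick k experts per token
        weights = weights.softmax(dim=-1)
        out = torch.zeros_like(x)
        for t in range(x.size(0)):                      # simple per-token loop
            for w, e in zip(weights[t], idx[t]):
                out[t] += w * self.experts[int(e)](x[t])
        return out

layer = ToyMoELayer()
tokens = torch.randn(4, 64)      # 4 tokens with 64-dimensional embeddings
print(layer(tokens).shape)       # torch.Size([4, 64])
```

Only 2 of the 16 toy experts run per token here, which is the same reason Llama 4 can keep hundreds of billions of total parameters while activating just 17 billion at a time.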

It’s important to note that none of the Llama 4 models qualify as “reasoning” models like OpenAI’s o1 or o3-mini. Reasoning models fact-check their own outputs, which makes their answers more reliable but slower to generate than those of traditional models.