Meta’s Llama 4 Large Language Models now available on Snowflake Cortex AI

At Snowflake, we are committed to providing our customers with industry-leading LLMs. We’re pleased to bring Meta’s latest Llama 4 models to Snowflake Cortex AI!

Llama 4 models deliver performant inference so customers can build enterprise-grade generative AI applications and deliver personalized experiences. The Llama 4 Maverick and Llama 4 Scout models can be accessed within the secure Snowflake perimeter on Cortex AI. According to Meta, Llama 4 Scout is the best multimodal model in the world in its class and supports an industry-leading context window of up to 10M tokens. According to Meta, these models are trained with large amounts of unlabeled text, image and video data for rich end-user experiences. These models are designed for native multimodality, incorporating early fusion to seamlessly integrate text and vision tokens into a unified model backbone. This design accommodates a range of use cases and developer needs. This allows developers to build enterprise-grade AI applications.

Faster and high-quality inference with a Mixture of Experts Architecture (MoE)

Llama 4 are the first models from Meta to use a MoE architecture — a single token activates only a fraction of the total parameters. As a result, MoE architectures are more compute efficient for both model training and inference and deliver higher quality inference compared to other architectures. Within Snowflake, Llama 4 Maverick and Llama 4 Scout can be integrated with gen AI applications.

Llama 4 Maverick offers industry-leading performance in image and text understanding with support for 12 languages to bridge language barriers. As a general-purpose LLM, Llama 4 Maverick contains 17 billion active parameters (400 billion total parameters), offering high-quality inference compared to Llama 3.3 70B. The model is well suited for precise image understanding and creative writing. It provides state-of-the-art intelligence with high speed, optimized for best response quality on tone, and refusals.
Llama 4 Scout is a smaller general-purpose model with 17 billion active parameters (109 billion total parameters) and supports an industry-leading context window size of 10 million tokens. This opens up a world of possibilities, including multi-document summarization, parsing extensive user activity for personalized tasks, and reasoning over vast codebases.

Snowflake’s commitment to open source

Meta’s open-source Llama models have empowered enterprises to create unique AI experiences. At Snowflake, we’re leveraging these models within Cortex AI to build tailored solutions that meet evolving business needs. Customers can use Llama models to power AI agents that handle complex tasks and integrate with tools like Cortex Analyst and Cortex Search – unlocking the full value of their data on a single platform.

Our AI Research team has been actively developing cutting-edge technologies on top of these Llama models. For example, Arctic Ulysses is a novel technology we developed that’s optimized for low-latency and high-throughput inference, and is beneficial for long sequence tasks. Furthermore, SwiftKV, another recent innovation built upon Meta’s Llama models and available in Snowflake-Llama-3.3-70B and Snowflake-Llama-3.1-405B, achieves a reduction in the inference costs of Llama LLMs by up to 75% on Cortex AI compared to the baseline Meta Llama models in Cortex AI that are not SwiftKV optimized. This directly translates to tangible cost savings and improved performance for our customers, driving scalable deployment of generative AI initiatives. By optimizing the prefill stage of inference, SwiftKV ensures the efficient processing of lengthy input prompts, a critical requirement for many enterprise applications.

Integrated access via SQL and Python

The Llama 4 series now available in preview on Cortex AI offer easy access through established SQL functions and standard REST API endpoints. Customers can use Llama 4’s advanced inference capabilities into existing applications and data pipelines without complex integration procedures. The new Llama 4 models can be called using a simple COMPLETE function within Cortex AI.

Integrated access via REST API

To enable services or applications running outside of Snowflake to make low-latency inference calls to Cortex AI, the REST API interface is the way to go. Here is an example of what that looks like:

The trusted path to advanced inference capabilities

Snowflake is the only cloud data platform with native integration to premier models from both OpenAI and Anthropic, as well as others. By integrating Llama 4 into Snowflake Cortex AI, we are providing our customers with access to leading-edge AI models so they can build intelligent applications and data agents, all within the security, governance and unified environment of Snowflake. This powerful combination will enable enterprises to automate repetitive tasks, gain deeper insights from their data, and deliver more value to their customers.

Stay tuned for more updates on how you can start building the next generation of AI applications with Llama 4 on Snowflake Cortex AI.

Learn more

Join us at Summit 2025 to learn more about our latest AI innovations.
Get the guide to industry-leading AI and data use cases — download now.
Read more about Meta’s latest announcements here.

Top 5 This Week

Data Analytics Is Revolutionizing Medical Credentialing

OpenAI pursued Cursor maker before entering into talks to buy Windsurf for $3B

Access PyPI Packages in Snowpark via UDFs and Stored Procedures

Achieve top 3 priorities faster

Foundation EGI Launches Engineering Platform

Related Posts

Access PyPI Packages in Snowpark via UDFs and Stored Procedures

How Public Sector & Healthcare Leaders Use Data and AI

Simplifying Multimodal Data Analysis with Snowflake Cortex AI

Early Gen AI Adopters See 41% ROI

The Future of Data Management Is Agentic AI

Snowflake Achieves DOD IL5 for Secure Government and Defense Data

Meta’s Llama 4 Large Language Models now available on Snowflake Cortex AI

Faster and high-quality inference with a Mixture of Experts Architecture (MoE)

Snowflake’s commitment to open source

Integrated access via SQL and Python

Integrated access via REST API

The trusted path to advanced inference capabilities

Learn more

LEAVE A REPLY Cancel reply

Popular Articles

Data Analytics Is Revolutionizing Medical Credentialing

OpenAI pursued Cursor maker before entering into talks to buy Windsurf for $3B

Access PyPI Packages in Snowpark via UDFs and Stored Procedures

Achieve top 3 priorities faster

Foundation EGI Launches Engineering Platform

Rzolve

About us

Latest Articles

Data Analytics Is Revolutionizing Medical Credentialing

OpenAI pursued Cursor maker before entering into talks to buy Windsurf for $3B

Access PyPI Packages in Snowpark via UDFs and Stored Procedures

Most Popular

Data Analytics Is Revolutionizing Medical Credentialing

OpenAI pursued Cursor maker before entering into talks to buy Windsurf for $3B

Access PyPI Packages in Snowpark via UDFs and Stored Procedures

Subscribe