Saturday, April 19, 2025
spot_imgspot_img

Top 5 This Week

spot_img

Related Posts

Meta’s Llama 4 Large Language Models now available on Snowflake Cortex AI


At Snowflake, we are committed to providing our customers with industry-leading LLMs. We’re pleased to bring Meta’s latest Llama 4 models to Snowflake Cortex AI! 

Llama 4 models deliver performant inference so customers can build enterprise-grade generative AI applications and deliver personalized experiences. The Llama 4 Maverick and Llama 4 Scout models can be accessed within the secure Snowflake perimeter on Cortex AI. According to Meta, Llama 4 Scout is the best multimodal model in the world in its class and supports an industry-leading context window of up to 10M tokens. According to Meta, these models are trained with large amounts of unlabeled text, image and video data for rich end-user experiences. These models are designed for native multimodality, incorporating early fusion to seamlessly integrate text and vision tokens into a unified model backbone. This design accommodates a range of use cases and developer needs. This allows developers to build enterprise-grade AI applications. 

Faster and high-quality inference with a Mixture of Experts Architecture (MoE)

Llama 4 are the first models from Meta to use a MoE architecture — a single token activates only a fraction of the total parameters. As a result, MoE architectures are more compute efficient for both model training and inference and deliver higher quality inference compared to other architectures. Within Snowflake, Llama 4 Maverick and Llama 4 Scout can be integrated with gen AI applications.

  • Llama 4 Maverick offers industry-leading performance in image and text understanding with support for 12 languages to bridge language barriers. As a general-purpose LLM, Llama 4 Maverick contains 17 billion active parameters (400 billion total parameters), offering high-quality inference compared to Llama 3.3 70B. The model is well suited for precise image understanding and creative writing. It provides state-of-the-art intelligence with high speed, optimized for best response quality on tone, and refusals.

  • Llama 4 Scout is a smaller general-purpose model with 17 billion active parameters (109 billion total parameters) and supports an industry-leading context window size of 10 million tokens. This opens up a world of possibilities, including multi-document summarization, parsing extensive user activity for personalized tasks, and reasoning over vast codebases. 

Snowflake’s commitment to open source

Meta’s open-source Llama models have empowered enterprises to create unique AI experiences. At Snowflake, we’re leveraging these models within Cortex AI to build tailored solutions that meet evolving business needs. Customers can use Llama models to power AI agents that handle complex tasks and integrate with tools like Cortex Analyst and Cortex Search – unlocking the full value of their data on a single platform.

Our AI Research team has been actively developing cutting-edge technologies on top of these Llama models. For example, Arctic Ulysses is a novel technology we developed that’s optimized for low-latency and high-throughput inference, and is beneficial for long sequence tasks. Furthermore, SwiftKV, another recent innovation built upon Meta’s Llama models and available in Snowflake-Llama-3.3-70B and Snowflake-Llama-3.1-405B, achieves a reduction in the inference costs of Llama LLMs by up to 75% on Cortex AI compared to the baseline Meta Llama models in Cortex AI that are not SwiftKV optimized. This directly translates to tangible cost savings and improved performance for our customers, driving scalable deployment of generative AI initiatives. By optimizing the prefill stage of inference, SwiftKV ensures the efficient processing of lengthy input prompts, a critical requirement for many enterprise applications.

Integrated access via SQL and Python

The Llama 4 series now available in preview on Cortex AI offer easy access through established SQL functions and standard REST API endpoints. Customers can use Llama 4’s advanced inference capabilities into existing applications and data pipelines without complex integration procedures. The new Llama 4 models can be called using a simple COMPLETE function within Cortex AI. 

Integrated access via REST API

To enable services or applications running outside of Snowflake to make low-latency inference calls to Cortex AI, the REST API interface is the way to go. Here is an example of what that looks like:

The trusted path to advanced inference capabilities

Snowflake is the only cloud data platform with native integration to premier models from both OpenAI and Anthropic, as well as others. By integrating Llama 4 into Snowflake Cortex AI, we are providing our customers with access to leading-edge AI models so they can build intelligent applications and data agents, all within the security, governance and unified environment of Snowflake. This powerful combination will enable enterprises to automate repetitive tasks, gain deeper insights from their data, and deliver more value to their customers.

Stay tuned for more updates on how you can start building the next generation of AI applications with Llama 4 on Snowflake Cortex AI.

Learn more

  • Join us at Summit 2025 to learn more about our latest AI innovations.

  • Get the guide to industry-leading AI and data use cases — download now.

  • Read more about Meta’s latest announcements here.

LEAVE A REPLY

Please enter your comment!
Please enter your name here

Popular Articles

cczz. com