Thousands of enterprises already use Llama models on the Databricks Data Intelligence Platform to power AI applications, agents, and workflows. Today, we’re excited to partner with Meta to bring you their latest model series—Llama 4—available today in many Databricks workspaces and rolling out across AWS, Azure, and GCP.
Llama 4 marks a major leap forward in open, multimodal AI—delivering industry-leading performance, higher quality, larger context windows, and improved cost efficiency from the Mixture of Experts (MoE) architecture. All of this is accessible through the same unified REST API, SDK, and SQL interfaces, making it easy to use alongside all your models in a secure, fully governed environment.
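To make the unified-interface claim concrete, here is a minimal sketch of what a chat request to a Llama 4 serving endpoint could look like. Databricks model serving accepts OpenAI-style chat payloads; the workspace URL and the endpoint name `databricks-llama-4-maverick` below are illustrative assumptions, so check your workspace's Serving page for the actual endpoint name.

```python
import json

# Placeholders -- substitute your own workspace URL and endpoint name.
WORKSPACE_URL = "https://<your-workspace>.cloud.databricks.com"
ENDPOINT = "databricks-llama-4-maverick"  # assumed endpoint name

# OpenAI-style chat payload, as accepted by Databricks model serving.
payload = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize our Q3 sales report."},
    ],
    "max_tokens": 256,
}

# The request would be POSTed to the endpoint's invocations URL, e.g.:
# requests.post(f"{WORKSPACE_URL}/serving-endpoints/{ENDPOINT}/invocations",
#               headers={"Authorization": f"Bearer {token}"}, json=payload)
print(json.dumps(payload, indent=2))
```

The same payload shape works across all models served on the platform, which is what lets you swap Llama 4 in behind an existing application without rewriting client code.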
Llama 4 is higher quality, faster, and more efficient
The Llama 4 models raise the bar for open foundation models—delivering significantly higher quality and faster inference compared to any previous Llama model.
At launch, we’re introducing Llama 4 Maverick, the largest and highest-quality model in today’s release from Meta. Maverick is purpose-built for developers building sophisticated AI products—combining multilingual fluency, precise image understanding, and safe assistant behavior. It enables:
- Enterprise agents that reason and respond safely across tools and workflows
- Document understanding systems that extract structured data from PDFs, scans, and forms
- Multilingual support agents that respond with cultural fluency and high-quality answers
- Creative assistants for drafting stories, marketing copy, or personalized content
And you can now build all of this with significantly better performance. Compared to Llama 3.3 (70B), Maverick delivers:
- Higher output quality across standard benchmarks
- >40% faster inference, thanks to its Mixture of Experts (MoE) architecture, which activates only a subset of model weights per token for smarter, more efficient compute.
- Longer context windows, with support for up to 1 million tokens planned, enabling longer conversations, bigger documents, and deeper context.
- Support for 12 languages (up from 8 in Llama 3.3)
Coming soon to Databricks is Llama 4 Scout—a compact, best-in-class multimodal model that fuses text, image, and video from the start. With up to 10 million tokens of context, Scout is built for advanced long-form reasoning, summarization, and visual understanding.
“With Databricks, we could automate tedious manual tasks by using LLMs to process one million+ files daily for extracting transaction and entity data from property records. We exceeded our accuracy goals by fine-tuning Meta Llama and, using Mosaic AI Model Serving, we scaled this operation massively without the need to manage a large and expensive GPU fleet.”
— Prabhu Narsina, VP Data and AI, First American
Build Domain-Specific AI Agents with Llama 4 and Mosaic AI
Connect Llama 4 to Your Enterprise Data
Connect Llama 4 to your enterprise data using Unity Catalog-governed tools to build context-aware agents. Retrieve unstructured content, call external APIs, or run custom logic to power copilots, RAG pipelines, and workflow automation. Mosaic AI makes it easy to iterate, evaluate, and improve these agents with built-in monitoring and collaboration tools—from prototype to production.
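As a sketch of what "calling external APIs" looks like in practice, tool definitions can be passed alongside the chat messages in the OpenAI-style schema, and the model decides when to invoke them. The tool name `lookup_order` and its parameters are hypothetical examples, not a Databricks or Unity Catalog API:

```python
import json

# A hypothetical enterprise tool exposed to the model. In a real agent this
# could be backed by a Unity Catalog function or an external API call.
tools = [
    {
        "type": "function",
        "function": {
            "name": "lookup_order",  # hypothetical tool name
            "description": "Fetch an order record by ID from the order system.",
            "parameters": {
                "type": "object",
                "properties": {"order_id": {"type": "string"}},
                "required": ["order_id"],
            },
        },
    }
]

# OpenAI-style chat payload that offers the tool to the model; when the model
# returns a tool call, the agent executes it and feeds the result back.
payload = {
    "messages": [
        {"role": "user", "content": "What's the status of order 8123?"}
    ],
    "tools": tools,
}
print(json.dumps(payload, indent=2))
```

Governance comes from the tool layer: because the tools an agent can reach are registered and permissioned in Unity Catalog, the same access controls that protect your data also bound what the agent can do.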
Run Scalable Inference with Your Data Pipelines
Apply Llama 4 at scale—summarizing documents, classifying support tickets, or analyzing thousands of reports—without needing to manage any infrastructure. Batch inference is deeply integrated with Databricks workflows, so you can use SQL or Python in your existing pipeline to run LLMs like Llama 4 directly on governed data with minimal overhead.
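A minimal sketch of the SQL route: Databricks exposes an `ai_query` SQL function that applies a served model to each row of a table. The table `support.tickets`, its `body` column, and the endpoint name below are placeholder assumptions:

```python
# Hedged sketch: classify every support ticket in a governed table with
# Llama 4 via the `ai_query` SQL function. Table, column, and endpoint
# names are illustrative placeholders.
sql = """
SELECT
  ticket_id,
  ai_query(
    'databricks-llama-4-maverick',  -- assumed endpoint name
    CONCAT('Classify this support ticket into a category: ', body)
  ) AS category
FROM support.tickets
"""

# In a notebook or scheduled job you would execute this with:
# spark.sql(sql)
print(sql)
```

Because the call is just a SQL expression, it drops into an existing pipeline step like any other transformation, and the results stay inside the governed tables the pipeline already writes to.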
Customize for Accuracy and Alignment
Customize Llama 4 to better fit your use case—whether it’s summarization, assistant behavior, or brand tone. Use labeled datasets or adapt models using techniques like Test-Time Adaptive Optimization (TAO) for faster iteration without annotation overhead. Reach out to your Databricks account team for early access.
“With Databricks, we were able to quickly fine-tune and securely deploy Llama models to build multiple GenAI use cases like a conversation simulator for counselor training and a phase classifier for maintaining response quality. These innovations have improved our real-time crisis interventions, helping us scale faster and provide critical mental health support to those in crisis.”
— Matthew Vanderzee, CTO, Crisis Text Line
Govern AI Usage with Mosaic AI Gateway
Ensure safe, compliant model usage with Mosaic AI Gateway, which adds built-in logging, rate limiting, PII detection, and policy guardrails—so teams can scale Llama 4 securely like any other model on Databricks.
What’s Coming Next
We’re launching Llama 4 in phases, starting with Maverick on Azure, AWS, and GCP. Coming soon:
- Llama 4 Scout – Ideal for long-context reasoning with up to 10M tokens
- Higher scale Batch Inference – Run batch jobs today, with higher throughput support coming soon
- Multimodal Support – Native vision capabilities are on the way
As we expand support, you’ll be able to pick the best Llama model for your workload—whether it’s ultra-long context, high-throughput jobs, or unified text-and-vision understanding.
Get Ready for Llama 4 on Databricks
Llama 4 will be rolling out to your Databricks workspaces over the next few days.