
Gemma

Free

Google's open-weight model family released under Apache 2.0 with multimodal Gemma 4 flagships

What is Gemma?

Gemma is Google DeepMind's family of open-weight language models designed to give developers and researchers access to the same research lineage behind the flagship Gemini models, packaged as models you can download, fine-tune, and deploy however you want. Unlike Gemini itself (which remains closed-weight and API-only), Gemma is released under the Apache 2.0 license — one of the most permissive open-source licenses in existence — which means you can use Gemma commercially, modify it, redistribute it, and self-host it without paying Google or asking permission.

The Gemma 4 generation released in 2026 is multimodal (text + vision + audio) and comes in sizes ranging from a 1B mobile-friendly checkpoint up through a 31B instruction-tuned flagship that competes with Llama 3.3 70B and Mistral Large on open benchmarks. Gemma 3 lives on as a mature, stable option for production workloads that prioritize reliability over frontier performance.

Where Gemma really differentiates is in its optimization for on-device and edge deployment: Google ships official Gemma packages for MediaPipe, Keras, JAX, PyTorch, and Hugging Face Transformers, plus a CodeGemma variant tuned for programming tasks and RecurrentGemma for efficient long-context inference. If you want a free, safety-conscious open model with Google-grade pretraining data and a genuinely permissive license, Gemma is the obvious pick alongside Llama and Mistral.

⚡ Quick Verdict

Best for

Teams that want an Apache-licensed open model with Google-grade training and strong on-device deployment options

Not ideal for

Users who want the absolute largest open model or the biggest community ecosystem

Starting price

Free to download · Vertex AI API from $0.14 input / $0.40 output per million tokens

Free plan

Yes — all weights are free under Apache 2.0

Key strength

Apache 2.0 licensing plus Google-quality pretraining — the most commercially clean open model family

Limitation

Smaller ecosystem and flagship size than Llama

Bottom line: Gemma scores 4.3/5 — the top pick when you need an Apache-licensed open model or efficient on-device deployment. Choose Gemma 4 31B-IT for server deployments and Gemma 1B/2B for mobile.

Pricing

Weights — Free: All Gemma models are released under the Apache 2.0 license and can be downloaded from Hugging Face, Kaggle, or ai.google.dev/gemma at no cost. Commercial use, modification, and redistribution are all permitted with standard Apache 2.0 attribution.

Google Cloud Vertex AI (if using managed API): Gemma 4 31B-IT from roughly $0.14 per million input tokens and $0.40 per million output tokens. Smaller Gemma models are proportionally cheaper.

Third-party inference: Groq, Together AI, and Fireworks serve Gemma models at competitive per-token rates, typically $0.05-$0.50 per million tokens depending on size. Self-hosting is free beyond hardware costs — Gemma 1B and 2B run on consumer laptops.
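To see what these rates mean in practice, here is a minimal sketch that estimates monthly API spend from the per-million-token prices quoted above. The 50M/10M token volumes are illustrative assumptions, not figures from Google.

```python
def monthly_cost_usd(input_tokens: int, output_tokens: int,
                     in_rate: float, out_rate: float) -> float:
    """Cost in USD given per-million-token rates."""
    return input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate

# Rates quoted above: Vertex AI Gemma 4 31B-IT at $0.14 in / $0.40 out.
# Example workload (assumed): 50M input + 10M output tokens per month.
vertex = monthly_cost_usd(50_000_000, 10_000_000, 0.14, 0.40)
print(f"Vertex AI: ${vertex:.2f}/month")  # → Vertex AI: $11.00/month
```

At these prices, even moderate workloads cost single-digit dollars per month, which is why the self-hosting-vs-API decision usually hinges on data privacy and latency rather than raw cost.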

Key Features

  • Apache 2.0 license — unrestricted commercial use
  • Gemma 4 multimodal (text, vision, audio) flagship family
  • Sizes from 1B (mobile) up to 31B (server-class)
  • CodeGemma variant tuned for programming tasks
  • RecurrentGemma for efficient long-context inference
  • Official Keras, JAX, PyTorch, and Transformers support
  • MediaPipe integration for on-device mobile deployment
  • Strong safety tuning and Responsible AI Toolkit

Pros & Cons

Pros

  • Apache 2.0 license is genuinely permissive — fewer restrictions than Llama's Community License
  • Excellent on-device story with MediaPipe and 1B/2B sizes
  • Backed by Google DeepMind research and pretraining quality
  • Multimodal Gemma 4 competes with much larger closed models

Cons

  • Ecosystem smaller than Llama — fewer fine-tunes and tools
  • Flagship Gemma 4 31B is smaller than Llama 3.3 70B
  • Less community-driven support than Mistral or Qwen

✅ Pricing verified April 2026 · ✅ Independently reviewed · ✅ Scoring methodology

FAQ

Is Gemma really Apache 2.0?

Yes. Unlike Llama's Community License, Gemma is released under the standard Apache 2.0 license with no monthly-active-user threshold, no attribution requirement beyond normal Apache notices, and no restrictions on the types of applications you can build. This makes Gemma arguably the cleanest open-weight LLM family for commercial use in 2026 — there is no hidden scale clause to worry about.

How does Gemma 4 compare to Llama 4?

Llama 4 Maverick and Scout are larger (17B active / 109B-400B total) and score higher on most reasoning benchmarks. Gemma 4 31B is smaller but punches above its weight on multimodal and coding benchmarks, and its Apache 2.0 license is more permissive than Llama's community license. For most production workloads, Llama still wins on raw capability, but Gemma is often preferred for on-device and strict-licensing deployments.

Can I run Gemma on my laptop?

Absolutely. Gemma 1B and 2B run comfortably on a modern MacBook or any machine with 8-16GB of RAM via Ollama, LM Studio, or llama.cpp. Gemma 4 9B works well with a consumer GPU like an RTX 4070 or Mac M3 with 24GB unified memory. The 31B flagship needs server-class GPUs (A100/H100 or 2x consumer cards with quantization).
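A quick rule-of-thumb sketch for checking whether a given model size fits your hardware: weight memory is roughly parameters × bits-per-weight ÷ 8, plus overhead for the KV cache and activations. The 20% overhead factor is a rough assumption, not an official figure.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: int,
                     overhead: float = 1.2) -> float:
    """Approximate RAM/VRAM needed to run a model, adding ~20%
    overhead for KV cache and activations (rule of thumb, not exact)."""
    weight_bytes = params_billions * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 1e9

# Typical 4-bit quantized footprints (as produced by Ollama/llama.cpp):
for size, name in [(2, "Gemma 2B"), (9, "Gemma 4 9B"), (31, "Gemma 4 31B")]:
    print(f"{name}: ~{weight_memory_gb(size, 4):.1f} GB at 4-bit")
```

This lines up with the guidance above: a 4-bit 2B model fits easily in 8 GB, a 9B model fits a 24 GB consumer GPU, and the 31B flagship needs roughly 19 GB even quantized (and far more at 16-bit), pushing it into server-class territory.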

What is CodeGemma?

CodeGemma is a specialized Gemma variant fine-tuned for programming tasks — code completion, infilling, and instruction following for coding. It comes in 2B and 7B sizes and is optimized for low-latency code assistance scenarios like IDE plugins and inline completion. CodeGemma competes with StarCoder2 and DeepSeek Coder in the open-weight coding model category.

Where can I use Gemma via API?

Google Cloud Vertex AI hosts Gemma with official Google SLAs. Third-party providers including Groq, Together AI, Fireworks, and OpenRouter all serve Gemma at pay-per-token rates. You can also run it locally with Ollama or Hugging Face Transformers for free.
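Most of the third-party providers above expose an OpenAI-compatible chat endpoint, so calling Gemma looks the same regardless of host. The sketch below only builds the request payload (no network call); the model slug is a hypothetical example — check your provider's model list for the exact identifier.

```python
import json

# Hypothetical slug -- the real identifier varies by provider.
MODEL = "google/gemma-4-31b-it"

def build_chat_request(prompt: str, api_key: str) -> tuple[dict, dict]:
    """Build headers and body for an OpenAI-compatible chat endpoint
    (the shape OpenRouter, Together AI, and Fireworks all accept)."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }
    return headers, body

headers, body = build_chat_request(
    "Summarize the Apache 2.0 license in one line.", "sk-...")
print(json.dumps(body, indent=2))
```

POST this body to the provider's `/chat/completions` URL with any HTTP client; switching providers usually means changing only the base URL and the model slug.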

Does Gemma support function calling?

Gemma 4 supports tool use and structured JSON output, though the ecosystem is less mature than Llama or GPT-4 function calling. For production agent workflows, you may need to use a wrapper library or rely on prompt-level JSON coercion. Google continues to improve tool-use training in newer Gemma releases.

Is Gemma safe for sensitive data?

When self-hosted, Gemma keeps all data on your own infrastructure — ideal for healthcare, finance, and government workloads where closed APIs are a compliance barrier. Google also publishes a Responsible AI Toolkit with Gemma, including safety classifiers, prompt filters, and red-teaming guidance.

📋 Good to know

Setup

Download weights from Hugging Face or Kaggle, run with Ollama, Keras, Transformers, or MediaPipe for mobile.

Privacy

Self-hosting keeps all data local. Google Vertex AI usage subject to standard GCP data policies.

When to upgrade

Move from Gemma 1B/2B to Gemma 4 9B or 31B when your workload needs stronger reasoning or multimodal capability.

Learning curve

Low — Ollama and Hugging Face Transformers work out of the box with one-line installs.

Explore more

Compare Gemma with alternatives

  • Gemma vs Llama · Full comparison →
  • Gemma vs Mistral · Full comparison →
  • Gemma vs Qwen · Full comparison →
  • Gemma vs Gemini · Full comparison →