Cohere Rerank
Paid. Cohere's Rerank API for boosting RAG retrieval quality, priced at $2 per 1,000 search requests.
What is Cohere Rerank?
Cohere Rerank is a specialized API product from Cohere designed to dramatically improve retrieval quality in RAG (retrieval-augmented generation) pipelines. In a typical RAG setup, you embed documents into a vector database, retrieve the top 20-100 most relevant documents for a user query using vector similarity, and pass them to an LLM as context. The problem: vector similarity is a noisy signal, and the top 5 documents by cosine similarity are often not the truly most relevant ones.

Rerank solves this by taking your top-K vector search results and re-scoring them with a specialized cross-encoder model that reads each query-document pair together and outputs a much more accurate relevance score. Teams that adopt Rerank routinely see 10-30% improvements in end-to-end RAG answer quality without changing their embedding model or LLM.

Pricing is simple: $2 per 1,000 search requests, where each search is one query paired with up to 100 documents of under 500 tokens each. Documents longer than 500 tokens are automatically chunked.

As of 2026, Cohere offers Rerank 3.5 as the production default and Rerank 4 Fast as a newer, faster variant with strong accuracy for latency-sensitive use cases. Rerank is available via the Cohere API directly, on AWS Bedrock and Azure AI, and through aggregators like OpenRouter. It is one of the cheapest ways to meaningfully improve RAG quality.
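The two-stage flow described above can be sketched in a few lines. This is a minimal illustration, not Cohere's SDK: `rerank_fn` stands in for the real reranker, and `toy_scorer` is a deliberately naive term-overlap heuristic used only so the sketch runs on its own.

```python
# Sketch of the two-stage retrieval flow: vector search produces
# candidates, a reranker re-scores them, and only the best survive.
# rerank_fn is a stand-in for a real reranker such as Cohere's API.

def rerank_pipeline(query, candidates, rerank_fn, top_n=5):
    """Re-score vector-search candidates and keep the best top_n."""
    scored = [(doc, rerank_fn(query, doc)) for doc in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [doc for doc, _ in scored[:top_n]]

def toy_scorer(query, doc):
    """Toy relevance scorer: fraction of query terms appearing in the
    document. A real cross-encoder reads the full query-document pair."""
    q_terms = set(query.lower().split())
    d_terms = set(doc.lower().split())
    return len(q_terms & d_terms) / max(len(q_terms), 1)

docs = [
    "pricing for the rerank endpoint",
    "how to reset your password",
    "rerank endpoint pricing is two dollars per thousand searches",
]
best = rerank_pipeline("rerank pricing", docs, toy_scorer, top_n=2)
```

In production, `toy_scorer` would be replaced by a call to the Rerank API over the top 20-100 vector search hits, and `best` would become the LLM context.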
⚡ Quick Verdict
Best for: Any team running a RAG pipeline that wants a major quality boost without switching embedding models or LLMs
Not for: Apps that don't use RAG or don't need to rank more than a few documents
Price: $2 per 1,000 search requests
Free tier: Limited-rate trial key available
Biggest strength: Drop-in RAG quality improvement at predictable per-search pricing
Biggest weakness: Adds latency and cost to every retrieval call
Bottom line: Cohere Rerank scores 4.4/5 — the highest ROI improvement you can make to an existing RAG pipeline. Use Rerank 3.5 for production quality, Rerank 4 Fast when latency matters most.
Pricing
Rerank 3.5 — $2 per 1,000 search requests: The standard Rerank model with strong multilingual support. One search equals one query with up to 100 documents under 500 tokens each. Documents longer than 500 tokens are automatically split into chunks, and each chunk counts toward the document total.
Rerank 4 Fast: A newer variant optimized for lower latency with competitive accuracy. Pricing is comparable to Rerank 3.5; available via the Cohere API and OpenRouter.
Availability: Cohere API, AWS Bedrock, Azure AI Studio, and third-party gateways like OpenRouter. Identical per-search pricing across platforms.
Free trial tier: Cohere offers a limited-rate free trial key for development and testing.
Key Features
- $2 per 1,000 search requests — simple unit pricing
- One search = query + up to 100 documents
- Automatic document chunking for long texts
- Rerank 3.5 (production default) and Rerank 4 Fast
- Multilingual support across 100+ languages
- Drop-in improvement for any RAG pipeline
- Available on Cohere API, AWS Bedrock, Azure AI
- Works with any embedding model and vector database
Pros & Cons
Pros
- Cheapest way to meaningfully improve RAG answer quality
- Works as a drop-in upgrade without changing embeddings or LLM
- Simple per-search pricing is easy to forecast
- Multilingual support across 100+ languages
Cons
- No free tier beyond limited trial
- Adds one extra API call per query to your RAG pipeline
- Long documents count as multiple billing units after chunking
FAQ
How much does Cohere Rerank actually improve RAG quality?
Most teams see 10-30% improvements in end-to-end answer quality metrics when adding Rerank between their vector retrieval step and their LLM context assembly. The exact gains depend on your domain, query complexity, and embedding model quality — noisier retrieval pipelines benefit the most.
What counts as one search?
One search request equals one query paired with up to 100 documents where each document is under 500 tokens (roughly 2,000 characters). If any of your documents are longer than 500 tokens, Cohere automatically splits them into multiple chunks, and each chunk counts as an additional document toward the 100-document limit.
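The chunking rule makes billing easy to estimate in code. The sketch below applies the terms stated above ($2 per 1,000 searches, 500-token chunks, 100 documents per search); the function names and the idea of passing in precomputed token counts are illustrative choices, not part of Cohere's API.

```python
import math

PRICE_PER_1000_SEARCHES = 2.00   # USD, per the pricing above
TOKENS_PER_CHUNK = 500           # documents longer than this are split
MAX_DOCS_PER_SEARCH = 100        # chunks count toward this limit

def billing_units(doc_token_counts):
    """How many of the 100 per-search document slots a document list
    uses. Each 500-token chunk of a long document consumes one slot."""
    return sum(math.ceil(t / TOKENS_PER_CHUNK) for t in doc_token_counts)

def estimated_cost(num_searches):
    """Cost in USD for a given number of search requests."""
    return num_searches * PRICE_PER_1000_SEARCHES / 1000

# Three 400-token docs plus one 1,200-token doc: the long doc splits
# into 3 chunks, so this search uses 6 of its 100 document slots.
units = billing_units([400, 400, 400, 1200])
within_limit = units <= MAX_DOCS_PER_SEARCH
cost = estimated_cost(50_000)   # 50k searches at $2 per 1,000
```

A pipeline doing 50,000 reranked queries a month would therefore cost $100 for the rerank step, regardless of how the documents chunk, since billing is per search request.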
How does Rerank compare to better embeddings?
Rerank and embedding quality both affect RAG performance, but they address different problems. Better embeddings help the initial retrieval step. Rerank then re-scores that top-K using a more powerful cross-encoder model. In practice, using a solid embedding model plus Rerank beats using an excellent embedding model alone, and is almost always cheaper.
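The architectural difference can be made concrete with toy stand-ins. Nothing here is a real model: `embed` fakes a bi-encoder with bag-of-words sets, and `cross_score` fakes a cross-encoder with substring matching. The point is structural: the bi-encoder scores two independently precomputed representations, while the cross-encoder reads the raw query-document pair together.

```python
def embed(text):
    """Bi-encoder stand-in: a bag-of-words set computed per text,
    independently. Real embeddings are dense vectors stored in a DB."""
    return set(text.lower().split())

def bi_encoder_score(query_vec, doc_vec):
    """Similarity of two precomputed representations (Jaccard here,
    cosine in a real system). The model never sees both texts jointly,
    so it can only compare whatever each representation preserved."""
    union = query_vec | doc_vec
    return len(query_vec & doc_vec) / len(union) if union else 0.0

def cross_score(query, doc):
    """Cross-encoder stand-in: reads the raw pair together (here, a
    crude substring check per query term), so nothing is lost to a
    fixed-size representation. Accurate but must run once per pair."""
    q_terms = query.lower().split()
    d = doc.lower()
    return sum(1.0 for t in q_terms if t in d) / max(len(q_terms), 1)

q = "rerank api pricing"
doc = "pricing details for the rerank api"
bi = bi_encoder_score(embed(q), embed(doc))   # diluted by extra doc terms
cross = cross_score(q, doc)                   # full query coverage found
```

This is also why rerankers are used on a short candidate list rather than the whole corpus: the pairwise scoring that makes them accurate is too expensive to run over millions of documents.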
Does Rerank work with any vector database?
Yes. Rerank is completely model-agnostic and works as a post-processing step after any vector database retrieval — Pinecone, Weaviate, Qdrant, Chroma, pgvector, Elasticsearch, OpenSearch, or even simple numpy cosine similarity. No changes to your vector database are required.
What languages does Rerank support?
Cohere Rerank 3.5 supports 100+ languages including English, Spanish, French, German, Portuguese, Italian, Dutch, Chinese, Japanese, Korean, Arabic, Hindi, and dozens of others. For multilingual RAG pipelines where queries and documents may be in different languages, Rerank is one of the strongest cross-lingual reranking options available.
How is this different from the main Cohere tool page?
This page focuses specifically on Rerank — Cohere's RAG-focused reranking API. The main Cohere review covers Cohere as a company, including Command R+ LLMs, Embed, classification APIs, and enterprise features.
📋 Good to know
Getting started: Sign up at cohere.com/api, get a free trial key, and call the /v1/rerank endpoint with a query and a list of documents.
Data privacy: Cohere does not train on customer API data. SOC 2 Type II, and HIPAA-eligible on AWS Bedrock.
Model choice: Start with Rerank 3.5 for production; test Rerank 4 Fast when latency matters.
Integration effort: Very low. A single API call accepts a query plus documents and returns scored indices.
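The quick-start above amounts to one JSON POST. The sketch below assembles a request body without sending it; the field names (`query`, `documents`, `top_n`) follow Cohere's documented rerank API, but the model id `"rerank-v3.5"` and the exact endpoint path are assumptions here, so check Cohere's API reference for current values.

```python
import json

# Assumed endpoint path; verify against Cohere's current API reference.
API_URL = "https://api.cohere.com/v1/rerank"

payload = {
    "model": "rerank-v3.5",   # assumed model id for Rerank 3.5
    "query": "What does a rerank search request cost?",
    "documents": [
        "Rerank is billed at $2 per 1,000 search requests.",
        "How to rotate your API key.",
    ],
    "top_n": 1,   # return only the single best-scoring document
}

# Serialized request body; send as a POST with an
# "Authorization: Bearer <API key>" header, e.g. via urllib.request
# or the cohere Python SDK.
body = json.dumps(payload)
```

The response contains relevance scores and the indices of the winning documents, which is why integration stays at "one extra API call" between retrieval and LLM context assembly.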