What is a Token Limit?
The maximum number of tokens that can be processed in a single API request.
Definition
Token limits define the maximum input and output sizes for AI API calls. Each model has specific limits: GPT-4o has a 128K-token context window, while recent Claude models support 200K. The limit covers both your prompt and the AI's response. Exceeding the limit requires chunking strategies, summarization, or switching to a model with a larger context window.
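Before sending a request, you can check whether a prompt fits the model's window. The sketch below is a minimal illustration using the rough "~4 characters per token" heuristic; the function names and the 128K default are illustrative assumptions, and a real tokenizer (e.g. OpenAI's tiktoken) would give exact counts.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the ~4 characters per token heuristic.

    Real tokenizers give exact counts; this is only a quick approximation.
    """
    return max(1, len(text) // 4)


def fits_in_context(prompt: str, max_output_tokens: int,
                    context_window: int = 128_000) -> bool:
    """Check whether a prompt plus a reserved output budget fits the window.

    The context window covers both input and output, so the tokens reserved
    for the response must be subtracted from what the prompt may use.
    """
    return estimate_tokens(prompt) + max_output_tokens <= context_window


print(fits_in_context("Summarize this report.", max_output_tokens=1_000))
```

Reserving output tokens up front matters: a prompt that exactly fills the window leaves the model no room to respond.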
💡 Example
If you try to paste a 300-page PDF into ChatGPT, you will hit the token limit. Solutions include using Claude (200K tokens = ~150K words), splitting the document into sections, or using RAG to retrieve only relevant passages.
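The "split the document into sections" strategy can be sketched as below. This is a minimal, assumption-laden example: it uses the ~4 characters per token heuristic, splits on paragraph boundaries where possible, and hard-splits any single paragraph that exceeds the budget. Function names and the budget values are illustrative, not from any particular library.

```python
def chunk_text(text: str, chunk_tokens: int = 1_000,
               chars_per_token: int = 4) -> list[str]:
    """Split text into chunks that each stay under a token budget.

    Prefers paragraph boundaries so chunks remain readable; a single
    oversized paragraph is hard-split at the character limit.
    """
    limit = chunk_tokens * chars_per_token  # budget in characters
    chunks, current = [], ""
    for para in text.split("\n\n"):
        if len(para) > limit:
            # Flush the running chunk, then hard-split the long paragraph.
            if current:
                chunks.append(current)
                current = ""
            while len(para) > limit:
                chunks.append(para[:limit])
                para = para[limit:]
        if current and len(current) + len(para) + 2 > limit:
            chunks.append(current)   # adding this paragraph would overflow
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent as its own request, or embedded and indexed so that a RAG pipeline retrieves only the passages relevant to a query.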
Related concepts
Large Language Model (LLM): A type of AI trained on massive text datasets to understand and generate human language.
Token: The basic unit of text that AI models process, roughly 4 characters or 0.75 words.
Context Window: The maximum amount of text an AI model can process in a single conversation.
API: A way for developers to programmatically access AI models in their own applications.