Core Concepts

What is Inference?

The process of running a trained AI model to generate predictions or outputs.

Definition

Inference is the process of using a trained AI model to generate outputs from new inputs. When you send a prompt to ChatGPT and receive a response, that is inference. Inference costs (compute, memory, latency) are a major factor in AI API pricing and deployment decisions. Faster inference means quicker responses and lower costs.

💡 Example

Every time you press "Send" in ChatGPT, the GPT-4 model performs inference, processing your tokens through its neural network layers to generate a response. API pricing reflects this inference compute cost.
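The idea above, a fixed, already-trained model mapping new inputs to outputs, can be sketched with a toy next-word predictor. This is an illustrative sketch only: the vocabulary and weights here are made up, and a real LLM's forward pass involves billions of learned parameters rather than a lookup table.

```python
import math

# Toy "trained" model: fixed weights mapping an input word to logits over
# a tiny vocabulary. In a real LLM these weights come from training;
# here they are made-up numbers purely for illustration.
VOCAB = ["hello", "world", "inference", "model"]
WEIGHTS = {
    "hello": [0.1, 2.0, 0.3, 0.1],  # "hello" -> most likely "world"
    "run":   [0.1, 0.2, 2.5, 0.4],  # "run" -> most likely "inference"
}

def softmax(logits):
    """Convert raw scores into a probability distribution."""
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def infer(prompt_word):
    """One inference step: apply fixed weights to a new input.

    No learning happens here; the model only uses what it already
    "knows" -- which is exactly what distinguishes inference from training.
    """
    logits = WEIGHTS.get(prompt_word, [1.0] * len(VOCAB))
    probs = softmax(logits)
    best = max(range(len(VOCAB)), key=lambda i: probs[i])
    return VOCAB[best]

print(infer("hello"))  # -> "world"
```

Every ChatGPT response is, conceptually, many repetitions of a step like `infer`: the model picks the next token, appends it, and runs again, which is why inference compute scales with output length.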

Related concepts

LLM (Large Language Model)

A type of AI trained on massive text datasets to understand and generate human language.

Token

The basic unit of text that AI models process: roughly 4 characters or 0.75 words.
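The 4-characters-per-token figure is a rule of thumb, but it is handy for quick cost estimates. A minimal sketch (real tokenizers such as byte-pair encoding give exact counts that vary with the text):

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the ~4 characters per token heuristic.

    This is only an approximation; actual tokenizers (e.g. BPE) split
    text differently depending on vocabulary and language.
    """
    return max(1, round(len(text) / 4))

prompt = "Every time you press Send, the model performs inference."
print(estimate_tokens(prompt))  # 56 characters -> ~14 tokens
```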

API (Application Programming Interface)

A way for developers to programmatically access AI models in their own applications.

