What is Inference?
The process of running a trained AI model to generate predictions or outputs.
Definition
Inference is the process of using a trained AI model to generate outputs from new inputs. When you send a prompt to ChatGPT and receive a response, that is inference. Inference costs (compute, memory, latency) are a major factor in AI API pricing and deployment decisions. Faster inference means quicker responses and lower costs.
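The definition above can be sketched in a few lines of plain Python. This is a deliberately minimal illustration, not how ChatGPT actually works: a "trained" model here is just a set of fixed parameters (the weights below are hypothetical), and inference is a single forward pass applying those parameters to a new input, with no learning involved.

```python
# Minimal sketch of inference: applying a trained model's fixed
# parameters to a new, unseen input. The weights are hypothetical --
# in practice they would come from a prior training run.

def predict(weights, bias, features):
    """Run inference: one forward pass through a linear model.
    No parameters are updated -- that would be training."""
    return sum(w * x for w, x in zip(weights, features)) + bias

# "Trained" parameters (hypothetical) and a new input.
trained_weights = [0.5, -1.2, 2.0]
trained_bias = 0.1
new_input = [1.0, 0.5, 2.0]

prediction = predict(trained_weights, trained_bias, new_input)
print(prediction)
```

The key distinction this illustrates: training produces the parameters, while inference only reads them. That is why inference can be optimized aggressively (quantization, caching, batching) without changing what the model "knows".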
💡 Example
Every time you press "Send" in ChatGPT, the GPT-4 model performs inference, processing your tokens through its neural network layers to generate a response. API pricing reflects this inference compute cost.
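Because API pricing reflects inference compute, per-request cost is usually a simple function of token counts. The sketch below shows that arithmetic; the per-1K-token rates are hypothetical placeholders, not real prices for any provider.

```python
# Hedged sketch: estimating per-request inference cost from token
# counts. The rates are hypothetical, not actual API prices.

PRICE_PER_1K_INPUT = 0.01   # hypothetical $ per 1,000 input tokens
PRICE_PER_1K_OUTPUT = 0.03  # hypothetical $ per 1,000 output tokens

def inference_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost scales with tokens processed, since every token passes
    through the network's layers during inference."""
    return ((input_tokens / 1000) * PRICE_PER_1K_INPUT
            + (output_tokens / 1000) * PRICE_PER_1K_OUTPUT)

# A ~500-word prompt is roughly 667 tokens (about 0.75 words/token).
cost = inference_cost(input_tokens=667, output_tokens=300)
print(f"${cost:.4f}")
```

Note that output tokens are typically priced higher than input tokens, since each generated token requires a full forward pass of its own.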
Related concepts
Large Language Model (LLM): A type of AI trained on massive text datasets to understand and generate human language.
Token: The basic unit of text that AI models process, roughly 4 characters or 0.75 words.
API: A way for developers to programmatically access AI models in their own applications.