The chat completions endpoint is the primary way to interact with AI models through the gateway. You send a conversation — a list of messages with roles — and the model returns its reply. The endpoint is fully compatible with the OpenAI chat completions API, so any code written for OpenAI works here by changing only the base URL.Documentation Index
Fetch the complete documentation index at: https://docs.metask.ai/llms.txt
Use this file to discover all available pages before exploring further.
Endpoint
Request parameters
The ID of the model to use for this request. Use the GET /v1/models endpoint to retrieve the list of available model IDs.Example:
"gpt-4o-mini"An array of message objects representing the conversation history. Messages are processed in order.
When
true, the response is sent as a stream of server-sent events (SSE) rather than a single JSON object. Each event contains a partial completion delta. The stream ends with a data: [DONE] event.Sampling temperature between
0 and 2. Lower values produce more deterministic output; higher values produce more varied output. Values above 1.5 can lead to incoherent responses.The maximum number of tokens to generate in the completion. When omitted, the model uses its default context limit. Setting this lower helps control costs.
Nucleus sampling parameter between
0 and 1. The model considers only the tokens comprising the top top_p probability mass. Use either temperature or top_p, not both.Response fields
A unique identifier for this completion, prefixed with
chatcmpl-.Always
"chat.completion" for non-streaming responses.Unix timestamp (seconds) of when the completion was created.
The model ID that was used to generate this completion.
An array of completion choices. Most requests return a single choice at index
0.Token usage for this request.
Examples
Streaming
Setstream: true to receive the response as a series of server-sent events. Each event carries a delta with the next chunk of text. This is useful for displaying output to users in real time.