Token Usage and Billing Explained

Every AI model request you make through 元任务 AI 网关 consumes tokens, and your account balance decreases accordingly. Understanding how token counting works helps you predict costs, read your usage data accurately, and make informed decisions when choosing between models.

What are tokens?

Tokens are the units AI models use to process text. In English, a token is roughly 3–4 characters, or about 75% of a word. As a practical approximation:

1,000 tokens ≈ 750 words
A short paragraph ≈ 100–200 tokens
A detailed technical prompt ≈ 500–2,000 tokens

Non-English text, code, and special characters may tokenize differently and often consume more tokens per word than English prose.
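For quick budgeting, the rule of thumb above can be turned into a small estimator. This is a minimal sketch that assumes the 1,000 tokens ≈ 750 words ratio holds for plain English prose; the function names are illustrative and not part of any gateway SDK, and real counts for code or non-English text will usually be higher.

```python
def estimate_tokens_from_words(word_count: int) -> int:
    """Estimate tokens from a word count using 1,000 tokens per 750 words.

    Integer arithmetic rounds up, so the estimate errs on the high side.
    """
    return (word_count * 4 + 2) // 3


def estimate_tokens(text: str) -> int:
    """Estimate tokens for English prose by counting whitespace-separated words."""
    return estimate_tokens_from_words(len(text.split()))


if __name__ == "__main__":
    prompt = "Summarize the main obligations of the Kyoto Protocol in two sentences."
    print(estimate_tokens(prompt))
```

Treat the result as a planning figure only; the counts reported by the gateway for each request are the ones that determine billing.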

How the gateway meters usage

Every request has two token components:

Input tokens (also called prompt tokens): the text you send — your system prompt, conversation history, and user message.
Output tokens (also called completion tokens): the text the model generates in response.

Both are counted and billed. Input and output tokens may be priced differently depending on the model; more capable models typically cost more per token. The gateway records token consumption for every successful request and deducts the corresponding amount from your account balance.
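The arithmetic behind a per-request charge can be sketched as follows. The model name and the per-million-token prices here are hypothetical placeholders, not actual gateway rates; substitute the prices listed for the model you use.

```python
from decimal import Decimal

# Hypothetical prices per 1,000,000 tokens, in your account's currency unit.
# These numbers are placeholders for illustration only.
PRICES = {
    "example-model": {"input": Decimal("2.50"), "output": Decimal("10.00")},
}

TOKENS_PER_MILLION = Decimal(1_000_000)


def request_cost(model: str, prompt_tokens: int, completion_tokens: int) -> Decimal:
    """Return the cost of one request, pricing input and output tokens separately."""
    price = PRICES[model]
    input_cost = prompt_tokens * price["input"]
    output_cost = completion_tokens * price["output"]
    return (input_cost + output_cost) / TOKENS_PER_MILLION


if __name__ == "__main__":
    # 312 input tokens and 187 output tokens at the placeholder prices above.
    print(request_cost("example-model", 312, 187))
```

Decimal is used instead of float so that very small per-request amounts add up without rounding drift when summed over many requests.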

Reading usage from API responses

Every chat completions response includes a usage object that reports the token counts for that request:

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1714000000,
  "model": "gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The Kyoto Protocol is an international treaty..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 312,
    "completion_tokens": 187,
    "total_tokens": 499
  }
}

Field	Description
`usage.prompt_tokens`	Tokens consumed by your input (system prompt + messages)
`usage.completion_tokens`	Tokens generated in the model’s response
`usage.total_tokens`	Sum of prompt and completion tokens

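Use total_tokens to gauge the overall size of a single request. Because input and output tokens may be priced differently, the cost of a request depends on prompt_tokens and completion_tokens individually. Over time, tracking prompt_tokens separately helps you identify whether a growing conversation history or a large system prompt is driving up costs.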
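The fields described above can be read from a response body with standard JSON parsing. This sketch works on a trimmed copy of the example response and does not call the gateway; the helper name summarize_usage is illustrative, and the prompt share it computes is one possible way to watch for input growth.

```python
import json

# Trimmed copy of the example response shown above.
SAMPLE_RESPONSE = """
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "model": "gpt-4o",
  "usage": {
    "prompt_tokens": 312,
    "completion_tokens": 187,
    "total_tokens": 499
  }
}
"""


def summarize_usage(body: str) -> tuple[int, int, int, float]:
    """Return (prompt, completion, total, prompt_share) from a response body."""
    usage = json.loads(body)["usage"]
    prompt = usage["prompt_tokens"]
    completion = usage["completion_tokens"]
    total = usage["total_tokens"]
    # Fraction of the request's tokens that came from the input side.
    prompt_share = prompt / total if total else 0.0
    return prompt, completion, total, prompt_share


if __name__ == "__main__":
    prompt, completion, total, share = summarize_usage(SAMPLE_RESPONSE)
    print(f"prompt={prompt} completion={completion} total={total} share={share:.0%}")
```

A prompt share that climbs steadily across the turns of a conversation suggests that history or the system prompt, rather than the model's answers, accounts for most of the spend.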

Checking your balance and usage

The dashboard shows the following:

Your current account balance
Historical token usage by model and time period
Individual request logs with per-request token counts

Managing your balance

Top up your balance

In the dashboard, navigate to Billing and select Add Balance. Choose an amount and complete the payment. Your balance is available immediately after the transaction is confirmed.

Apply a promo code

If you have a promo code, enter it in the Promo Code field in the Billing section. Promo codes add credit directly to your balance.

Monitor usage

Review your usage regularly in the dashboard to catch unexpected consumption early. Check the Usage section for per-request and per-model breakdowns.

When your balance reaches zero, API requests will be rejected with an authentication or quota error. Top up your balance before it runs out to avoid interruptions in production.

Tips for managing costs

Choose the right model for the task. Smaller, faster models like gpt-4o-mini or claude-3-haiku-20240307 cost significantly less per token than frontier models. Use them for classification, extraction, summarization, and other tasks that don’t require the highest capability.
Keep system prompts concise. Your system prompt is included in prompt_tokens for every request in a session. A 2,000-token system prompt adds 2,000 tokens of cost to every single call. Trim it to what’s necessary.
Limit conversation history. In multi-turn conversations, you send the full message history with each request. Truncate or summarize older messages to prevent unbounded token growth.
Use streaming for long responses. Streaming ("stream": true) doesn’t reduce token usage, but it lets you stop generation early if the model begins producing irrelevant output, avoiding wasted completion tokens.
Set max_tokens limits. Cap the maximum response length with the max_tokens parameter to prevent unexpectedly long completions from consuming more tokens than you need.
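Two of the tips above, limiting conversation history and capping max_tokens, can be combined when building a request body. This is a minimal sketch under stated assumptions: token counts are approximated at about 4 characters per token rather than measured, the messages list uses the common role/content shape, and trim_history and build_request are hypothetical helpers, not gateway functions.

```python
def rough_tokens(text: str) -> int:
    """Approximate token count, assuming about 4 characters per token."""
    return max(1, len(text) // 4)


def trim_history(messages: list[dict], budget: int) -> list[dict]:
    """Keep system messages plus the most recent messages that fit the budget."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    used = sum(rough_tokens(m["content"]) for m in system)
    kept = []
    # Walk backwards from the newest message and stop at the first one that no longer fits.
    for message in reversed(rest):
        cost = rough_tokens(message["content"])
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    return system + list(reversed(kept))


def build_request(model: str, messages: list[dict],
                  budget: int = 1000, max_tokens: int = 256) -> dict:
    """Build a request body with trimmed history and a cap on response length."""
    return {
        "model": model,
        "messages": trim_history(messages, budget),
        "max_tokens": max_tokens,
    }


if __name__ == "__main__":
    history = [
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "First question " * 50},
        {"role": "assistant", "content": "First answer " * 50},
        {"role": "user", "content": "What is the Kyoto Protocol?"},
    ]
    body = build_request("gpt-4o-mini", history, budget=200)
    print(len(body["messages"]), body["max_tokens"])
```

Dropping the oldest messages is the simplest policy; summarizing them into a single short message, as the tip suggests, keeps more context at the price of an extra model call.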

Token counts in the usage field reflect the actual tokens processed by the model. They may differ slightly from estimates produced by local tokenizer libraries, which are useful for budgeting but not authoritative.
