Per-Token Billing for LLM Inference in 2 Lines of Code
Meter every OpenAI, Anthropic, and Google API call. Set per-token budgets, track cross-provider costs, and bill your users automatically. Works with any LLM SDK.
How it works
```typescript
import { settlegrid } from '@settlegrid/mcp'
import OpenAI from 'openai'

const sg = settlegrid.init({
  toolSlug: 'my-llm-proxy',
  pricing: { model: 'per-token', inputCostPer1k: 0.3, outputCostPer1k: 1.2 },
})

const openai = new OpenAI()

const billedCompletion = sg.wrap(async (args: { prompt: string }) => {
  const response = await openai.chat.completions.create({
    model: 'gpt-4o',
    messages: [{ role: 'user', content: args.prompt }],
  })
  return { content: [{ type: 'text', text: response.choices[0].message.content }] }
})
```

Supported providers
SettleGrid works with any provider. Here are the most common ones for LLM inference and AI model workloads.
| Provider | Models | Pricing |
|---|---|---|
| OpenAI | GPT-4o, GPT-4o-mini, o1, o3 | $2–60/M tokens |
| Anthropic | Claude Opus, Sonnet, Haiku | $0.25–75/M tokens |
| Google Gemini | Gemini 2.5 Pro, Flash | $1.25–10/M tokens |
| DeepSeek | DeepSeek V3, R1 | $0.14–2.19/M tokens |
| Groq | Llama, Mixtral on LPU | $0.04–0.88/M tokens |
| Together AI | 100+ open models | $0.10–18/M tokens |
| Fireworks AI | Optimized inference | $0.10–3/M tokens |
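Tracking costs across providers means normalizing their usage metadata first, since each SDK reports token counts under different field names. A minimal sketch (the `normalizeUsage` helper and the `TokenUsage` shape are illustrative, not part of any SDK; the field names match each provider's current response metadata):

```typescript
// A normalized usage shape assumed by this sketch (not part of any SDK).
interface TokenUsage { inputTokens: number; outputTokens: number }

// Field names per provider response metadata:
//   OpenAI:    usage.prompt_tokens / usage.completion_tokens
//   Anthropic: usage.input_tokens / usage.output_tokens
//   Gemini:    usageMetadata.promptTokenCount / usageMetadata.candidatesTokenCount
function normalizeUsage(
  provider: 'openai' | 'anthropic' | 'gemini',
  raw: any,
): TokenUsage {
  switch (provider) {
    case 'openai':
      return { inputTokens: raw.usage.prompt_tokens, outputTokens: raw.usage.completion_tokens }
    case 'anthropic':
      return { inputTokens: raw.usage.input_tokens, outputTokens: raw.usage.output_tokens }
    case 'gemini':
      return {
        inputTokens: raw.usageMetadata.promptTokenCount,
        outputTokens: raw.usageMetadata.candidatesTokenCount,
      }
  }
}
```

Once usage is in one shape, a single pricing config can meter calls from any of the providers above.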
Why per-token billing?
LLM inference costs scale with token usage. Per-token billing lets you pass through exact costs to end users, set per-user budget caps, and automatically track input vs output token spend across multiple providers. SettleGrid meters tokens from the response metadata and settles in real time, so you never eat costs from runaway prompts.
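The arithmetic behind that metering is straightforward. A minimal sketch (the `tokenCost` helper and its types are illustrative, not the SettleGrid SDK), using the rates from the pricing config above:

```typescript
// Token usage as reported in a response's metadata.
interface Usage { prompt_tokens: number; completion_tokens: number }

// Per-1K-token rates, mirroring the `pricing` config shown earlier.
interface Pricing { inputCostPer1k: number; outputCostPer1k: number }

// Computes the billed amount for one call: input and output tokens
// are metered separately, each at its own per-1K rate.
function tokenCost(usage: Usage, pricing: Pricing): number {
  const input = (usage.prompt_tokens / 1000) * pricing.inputCostPer1k
  const output = (usage.completion_tokens / 1000) * pricing.outputCostPer1k
  return input + output
}

// Example: 2,000 input tokens and 500 output tokens at $0.30/$1.20 per 1K
const cost = tokenCost(
  { prompt_tokens: 2000, completion_tokens: 500 },
  { inputCostPer1k: 0.3, outputCostPer1k: 1.2 },
)
// cost = 2 * 0.3 + 0.5 * 1.2 = 1.20
```

Because output tokens typically cost several times more than input tokens, metering them separately is what lets you pass through exact costs rather than averaging.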
$106B Total Addressable Market · 7 Supported Providers · 2 min Setup Time
Frequently asked questions
How does per-token billing work with SettleGrid?
Can I set per-user budget caps?
Does this work with streaming responses?
Can I use different pricing for different models?
What if my LLM provider changes their pricing?
Start billing LLM inference and AI models today
Add per-token billing to your LLM inference or AI model service in under 2 minutes. No upfront costs, no contracts.