# Token Proxy
Local AI API gateway for OpenAI / Gemini / Anthropic. Runs on your machine, tracks token usage in SQLite, offers priority-based load balancing, optional API format conversion (OpenAI Chat/Responses ↔ Anthropic Messages, plus Gemini ↔ OpenAI/Anthropic, including SSE/tools/images), and one-click setup for Claude Code / Codex.
Default listen port: 9208 (release) / 19208 (debug builds).
## Features

- Multiple providers: `openai`, `openai-response`, `anthropic`, `gemini`, `kiro`
- Built-in routing + optional format conversion (OpenAI Chat ⇄ Responses; Anthropic Messages ↔ OpenAI; Gemini ↔ OpenAI/Anthropic; SSE supported)
- Per-upstream priority + two balancing strategies (fill-first / round-robin)
- Model alias mapping (exact / `prefix*` / wildcard `*`) and response model rewrite
- Local access key (Authorization) + upstream key injection
- SQLite-powered dashboard (requests, tokens, cached tokens, latency, recent)
- macOS tray live token rate (optional)
## Screenshots

Dashboard · Core · Upstreams · Add upstream
## Quick start

- Install: move `Token Proxy.app` to `/Applications`. If blocked: `xattr -cr /Applications/Token\ Proxy.app`.
- Launch the app. The proxy starts automatically.
- Open the Config File tab, edit, and save (this writes `config.jsonc` in the Tauri config dir). Defaults are usable; just paste your upstream API keys.
- Call via curl (example with local auth):
```bash
curl -X POST \
  -H "Authorization: Bearer YOUR_LOCAL_KEY" \
  -H "Content-Type: application/json" \
  http://127.0.0.1:9208/v1/chat/completions \
  -d '{"model":"gpt-4.1-mini","messages":[{"role":"user","content":"hi"}]}'
```

You can also call using the Anthropic Messages format (useful for Claude Code clients):
```bash
curl -X POST \
  -H "x-api-key: YOUR_LOCAL_KEY" \
  -H "Content-Type: application/json" \
  http://127.0.0.1:9208/v1/messages \
  -d '{"model":"claude-3-5-sonnet-20241022","max_tokens":256,"messages":[{"role":"user","content":[{"type":"text","text":"hi"}]}]}'
```

## Configuration

- File: `config.jsonc` (comments and trailing commas allowed)
- Location: Tauri AppConfig directory (resolved automatically by the app)
### Global fields

| Field | Default | Notes |
|---|---|---|
| `host` | `127.0.0.1` | Listen address (IPv6 allowed; it will be bracketed in URLs) |
| `port` | `9208` release / `19208` debug | Change if the port is taken |
| `local_api_key` | `null` | When set, local auth uses format-specific headers (see Auth rules); local auth inputs are not forwarded upstream. |
| `app_proxy_url` | `null` | Proxy for the app updater and a placeholder for upstreams (`"$app_proxy_url"`). Supports http/https/socks5/socks5h. |
| `log_level` | `silent` | Log verbosity; levels include `silent`, `debug`, and `trace` (debug/trace bodies are capped; see Observability). |
| `max_request_body_bytes` | `20971520` (20 MiB) | `0` = fall back to the default. Protects inbound body size. |
| `tray_token_rate.enabled` | `true` | macOS tray live rate; harmless elsewhere. |
| `tray_token_rate.format` | `split` | `combined` (total), `split` (↑in ↓out), `both` (total plus split) |
| `enable_api_format_conversion` | `true` | Allow OpenAI/Anthropic/Gemini fallback via request/response body and SSE stream conversion. |
| `upstream_strategy` | `priority_fill_first` | `priority_fill_first` (default) keeps trying the highest-priority group in list order; `priority_round_robin` rotates within each priority group. |
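Taken together, a minimal `config.jsonc` sketch (values are illustrative; upstream entries are detailed in the next table):

```jsonc
// config.jsonc: comments and trailing commas are allowed
{
  "host": "127.0.0.1",
  "port": 9208,                               // 19208 in debug builds
  "log_level": "silent",
  "enable_api_format_conversion": true,
  "upstream_strategy": "priority_fill_first", // or "priority_round_robin"
  "tray_token_rate": { "enabled": true, "format": "split" },
  "upstreams": [ /* see the next table */ ],
}
```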
### Upstream fields (`upstreams[]`)

| Field | Default | Notes |
|---|---|---|
| `id` | required | Unique per upstream. |
| `provider` | required | One of `openai`, `openai-response`, `anthropic`, `gemini`, `kiro`. |
| `base_url` | required | Full base URL; overlapping path parts are de-duplicated. (May be empty for `kiro`.) |
| `api_key` | `null` | Provider-specific bearer/key; overrides request headers. |
| `kiro_account_id` | `null` | Required when `provider` is `kiro`. |
| `preferred_endpoint` | `null` | `kiro` only: `ide` or `cli`. |
| `proxy_url` | `null` | Per-upstream proxy; supports http/https/socks5/socks5h; no system proxy by default. The `$app_proxy_url` placeholder is allowed. |
| `priority` | `0` | Higher = tried earlier. Grouped by priority, then by list order (or round-robin). |
| `enabled` | `true` | Disabled upstreams are skipped. |
| `model_mappings` | `{}` | Exact / `prefix*` / `*`. Priority: exact > longest prefix > wildcard. The response echoes the original alias. |
| `overrides.header` | `{}` | Set or remove headers (`null` removes). Hop-by-hop, `Host`, and `Content-Length` are always ignored. |
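A sketch of a single `upstreams[]` entry (the id, key, header names, and target models are placeholders; `model_mappings` keys are the requested aliases, values the upstream model):

```jsonc
{
  "id": "openai-main",                 // placeholder; must be unique
  "provider": "openai",
  "base_url": "https://api.openai.com/v1",
  "api_key": "sk-...",                 // overrides request-header auth
  "priority": 10,                      // higher groups are tried first
  "enabled": true,
  "model_mappings": {
    // exact > longest prefix* > wildcard *
    "my-default": "gpt-4.1",           // exact alias (placeholder)
    "claude-*": "gpt-4.1-mini",        // prefix match
    "*": "gpt-4.1-mini",               // catch-all
  },
  "overrides": {
    "header": {
      "x-team": "demo",                // set a header (placeholder)
      "x-unwanted": null,              // null removes the header
    },
  },
}
```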
## Routing rules

- Gemini: `/v1beta/models/*:generateContent` and `*:streamGenerateContent` → `gemini` (SSE supported).
- Anthropic: `/v1/messages` (and subpaths) and `/v1/complete` → `anthropic` (Kiro shares the same format).
- OpenAI: `/v1/chat/completions` → `openai`; `/v1/responses` → `openai-response`.
- Other paths: choose the provider with the highest configured priority; the tie-break is `openai` > `openai-response` > `anthropic`.
- If the preferred provider is missing but `enable_api_format_conversion` is `true`, the proxy auto-converts request/response bodies and streams between supported formats (including SSE); see the sketch after this list.
- If `openai` is missing for `/v1/chat/completions`: fallback can be `openai-response`, `anthropic`, or `gemini` (priority-based; the tie-break prefers `openai-response`).
- For `/v1/messages`: choose between `anthropic` and `kiro` by priority; the tie-break uses the upstream id.
- If neither `anthropic` nor `kiro` exists for `/v1/messages` and `enable_api_format_conversion` is `true`: fallback can be `openai-response`, `openai`, or `gemini` (priority-based; the tie-break prefers `openai-response`).
- If `openai-response` is missing for `/v1/responses`: fallback can be `openai`, `anthropic`, or `gemini` (priority-based; the tie-break prefers `openai`).
- If `gemini` is missing for `/v1beta/models/*:generateContent`: fallback can be `openai-response`, `openai`, or `anthropic` (priority-based; the tie-break prefers `openai-response`).
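With conversion enabled, an OpenAI-format call can be served by an Anthropic-only config; a sketch (the id and key are placeholders):

```jsonc
{
  "enable_api_format_conversion": true,
  "upstreams": [
    {
      // The only upstream is Anthropic, so a request to
      // /v1/chat/completions (OpenAI format) falls back here:
      // body and SSE stream are converted to/from Anthropic Messages.
      "id": "anthropic-only",          // placeholder
      "provider": "anthropic",
      "base_url": "https://api.anthropic.com",
      "api_key": "sk-ant-...",         // placeholder
    },
  ],
}
```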
## Auth rules

- Local access: when `local_api_key` is set, requests must carry the format-specific key. These local auth inputs are stripped and not forwarded upstream.
  - OpenAI / Responses: `Authorization: Bearer <key>`
  - Anthropic `/v1/messages`: `x-api-key` or `x-anthropic-api-key`
  - Gemini: `x-goog-api-key` or `?key=...`
- When `local_api_key` is enabled, request headers are not used for upstream auth; configure `upstreams[].api_key` instead (see the sketch below).
- Upstream auth resolution (per request):
  - OpenAI: `upstream.api_key` → `x-openai-api-key` → `Authorization` (only if `local_api_key` is not set) → error.
  - Anthropic: `upstream.api_key` → `x-api-key` / `x-anthropic-api-key` → error. A missing `anthropic-version` is auto-filled with `2023-06-01`.
  - Gemini: `upstream.api_key` → `x-goog-api-key` → query `?key=...` → error.
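Combining both sides, a sketch with a local key and an upstream key (both values are placeholders):

```jsonc
{
  // Clients must now send the format-specific local key, e.g.
  // "Authorization: Bearer local-secret" on /v1/chat/completions;
  // it is checked locally and stripped before forwarding.
  "local_api_key": "local-secret",
  "upstreams": [
    {
      "id": "anthropic-main",          // placeholder
      "provider": "anthropic",
      "base_url": "https://api.anthropic.com",
      // With local auth enabled, this is the only credential used
      // upstream; request headers are ignored for upstream auth.
      "api_key": "sk-ant-...",         // placeholder
    },
  ],
}
```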
## Load balancing & retries

- Priorities: higher `priority` groups first; inside a group, use list order (fill-first) or round-robin (if `priority_round_robin`).
- Retryable conditions: network timeout/connect errors, or status 403/429/307/5xx except 504/524. Retries stay within the same priority group; then the next lower priority group is tried (see the sketch below).
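A sketch of how priorities and the strategy interact (ids and hosts are placeholders):

```jsonc
{
  "upstream_strategy": "priority_fill_first",
  "upstreams": [
    // Group 10 is exhausted before group 0 is consulted.
    // fill-first tries a then b (list order); priority_round_robin
    // would rotate between a and b instead.
    { "id": "a", "provider": "openai", "base_url": "https://api.openai.com/v1", "priority": 10 },
    { "id": "b", "provider": "openai", "base_url": "https://mirror.example.com/v1", "priority": 10 },
    // Tried only after every priority-10 upstream failed with a
    // retryable error (timeout, 403/429/307, or 5xx except 504/524).
    { "id": "c", "provider": "openai", "base_url": "https://backup.example.com/v1", "priority": 0 },
  ],
}
```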
## Observability

- SQLite log: `data.db` in the config dir. Stores per-request stats (tokens, cached tokens, latency, model, upstream).
- Token rate: the macOS tray shows live total or split rates (configurable via `tray_token_rate`).
- Debug/trace log bodies are capped at 64 KiB.
- In-app Dashboard page visualizes totals, per-provider stats, time series, and recent requests (page size 50, offset supported).
## One-click setup

- Claude Code: writes `~/.claude/settings.json` `env` (`ANTHROPIC_BASE_URL`, plus `ANTHROPIC_AUTH_TOKEN` when a local key is set), as sketched below.
- Codex: writes `~/.codex/config.toml` (`model_provider = "token_proxy"` and `[model_providers.token_proxy].base_url` → `http://127.0.0.1:<port>/v1`) and `~/.codex/auth.json` (`OPENAI_API_KEY`).
- A `.token_proxy.bak` file is created before overwriting; restart the CLI to apply.
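For reference, the Claude Code file ends up roughly in this shape (a sketch; the app writes the actual contents):

```jsonc
// ~/.claude/settings.json (sketch, default port)
{
  "env": {
    "ANTHROPIC_BASE_URL": "http://127.0.0.1:9208",
    // Written only when a local key is configured:
    "ANTHROPIC_AUTH_TOKEN": "YOUR_LOCAL_KEY"
  }
}
```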
## Troubleshooting

- Port already in use? Change `port` in `config.jsonc`; remember to update your client base URL.
- Got 401? If `local_api_key` is set, you must send the format-specific local key (OpenAI/Responses: `Authorization`, Anthropic: `x-api-key`, Gemini: `x-goog-api-key` or `?key=`). With local auth enabled, configure upstream keys in `upstreams[].api_key`.
- Got 504? The upstream did not send response headers or the first body chunk within 120 s. For streaming responses, a 120 s idle timeout between chunks may also close the connection.
- 413 Payload Too Large? The body exceeded `max_request_body_bytes` (default 20 MiB) or the 4 MiB transform limit for format-conversion requests.
- Why no system proxy? By design, `reqwest` is built with `.no_proxy()`; set a per-upstream `proxy_url` if needed.



