# SwitchAI API Reference

The SwitchAI API is a robust, scalable proxy service for managing interactions with multiple LLM providers, including OpenAI, Anthropic, and Groq. It provides unified access to 50+ models with intelligent routing, automatic fallback, and integrated billing.

**Base URL:** `https://api.traylinx.com/v1`
## Quick Start

```bash
curl -X POST https://api.traylinx.com/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-oss-120b",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 100
  }'
```
## Authentication

All API requests require a valid API key passed in the `Authorization` header:
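```text
Authorization: Bearer YOUR_API_KEY
```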
API keys are validated by the Stargate Auth Sidecar, which intercepts requests and injects user context via the `User-Info` header. You can generate API keys from the Traylinx Console.

**Note:** For agent-to-agent (A2A) authentication, use the SentinelPass SDKs instead.
## Endpoints

### Chat Completions
Create a chat completion with any supported model.
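`POST /chat/completions`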
**Request Body:**

| Parameter | Type | Required | Description |
|---|---|---|---|
| `model` | string | ✅ | Model identifier (e.g., `openai/gpt-oss-120b`, `claude-3-sonnet`) |
| `messages` | array | ✅ | Array of message objects with `role` and `content` |
| `max_tokens` | integer | ❌ | Maximum tokens to generate (default: model-specific) |
| `temperature` | float | ❌ | Sampling temperature (0-2, default: 1) |
| `stream` | boolean | ❌ | Enable streaming responses (default: `false`) |
| `top_p` | float | ❌ | Nucleus sampling parameter (0-1) |
**Example Request:**

```json
{
  "model": "claude-3-sonnet",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"}
  ],
  "max_tokens": 150,
  "temperature": 0.7
}
```
**Response:**

```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1703980800,
  "model": "claude-3-sonnet",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 8,
    "total_tokens": 33
  }
}
```
### Streaming Responses

Set `stream: true` to receive Server-Sent Events (SSE). Pass `-N` to curl to disable output buffering so events print as they arrive:

```bash
curl -N -X POST https://api.traylinx.com/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "openai/gpt-oss-120b", "messages": [...], "stream": true}'
```
Each SSE event contains a delta:

```text
data: {"choices":[{"delta":{"content":"The"}}]}
data: {"choices":[{"delta":{"content":" capital"}}]}
data: {"choices":[{"delta":{"content":" is"}}]}
data: [DONE]
```
## Supported Models

### OpenAI Models

| Model | Context | Capability | Input Price | Output Price |
|---|---|---|---|---|
| `openai/gpt-oss-120b` | 128K | Very High | $0.005/1K | $0.015/1K |
| `text-embedding-3-small` | 8K | Embeddings | $0.00002/1K | - |
| `text-embedding-3-large` | 8K | Embeddings | $0.00013/1K | - |
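A minimal sketch of calling an embedding model, assuming SwitchAI exposes an OpenAI-compatible `/v1/embeddings` route alongside `/v1/chat/completions` (the embeddings path is not shown elsewhere in this reference):

```bash
# Assumes an OpenAI-compatible /v1/embeddings route; confirm the path
# against your deployment before relying on it.
curl -X POST https://api.traylinx.com/v1/embeddings \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "text-embedding-3-small",
    "input": "The quick brown fox"
  }'
```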
### Anthropic Models (Claude)

| Model | Context | Capability | Input Price | Output Price |
|---|---|---|---|---|
| `claude-3-opus` | 200K | Very High | $0.015/1K | $0.075/1K |
| `claude-3-sonnet` | 200K | High | $0.003/1K | $0.015/1K |
| `claude-3-haiku` | 200K | Medium | $0.00025/1K | $0.00125/1K |
| `claude-3.5-sonnet` | 200K | Very High | $0.003/1K | $0.015/1K |
### Groq Models (High-Performance)

| Model | Context | Capability | Input Price | Output Price |
|---|---|---|---|---|
| `llama-3.3-70b-versatile` | 32K | High | $0.59/M | $0.79/M |
| `llama-3.1-8b-instant` | 8K | Medium | $0.05/M | $0.08/M |
| `mixtral-8x7b` | 32K | High | $0.24/M | $0.24/M |
| `deepseek-r1-distill-llama-70b` | 128K | Very High | $0.75/M | $0.99/M |
### Vision Models

| Model | Capability | Price |
|---|---|---|
| `llama-3.2-90b-vision` | Image understanding | $0.90/M |
| `llama-3.2-11b-vision` | Image understanding | $0.18/M |
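A sketch of an image-understanding request, assuming SwitchAI accepts OpenAI-style multimodal content parts (the `image_url` request shape is not documented in this reference):

```bash
# The content-part structure below follows the OpenAI convention and is
# an assumption; verify the vision request shape before use.
curl -X POST https://api.traylinx.com/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.2-90b-vision",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "Describe this image."},
        {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}}
      ]
    }]
  }'
```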
### Audio Models

| Model | Capability | Price |
|---|---|---|
| `whisper-v3-large` | High-quality transcription | $0.111/hour |
| `whisper-v3-turbo` | Fast transcription | $0.04/hour |
| `tts-1-hd` | Text-to-speech | $0.03/1K chars |
## Auto Mode (IRA)

SwitchAI features an Intelligent Routing Agent (IRA) that automatically selects the optimal model based on your request. Set `model: "auto"` to enable automatic routing.
### Quick Start
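Send a standard chat completion request with `model` set to `"auto"`; the body is otherwise identical to a normal request (the prompt below is illustrative):

```json
{
  "model": "auto",
  "messages": [{"role": "user", "content": "Summarize the plot of Hamlet in two sentences."}],
  "max_tokens": 150
}
```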
### Routing Strategies

IRA supports three routing strategies (see the sketch after this table for one way a strategy might be passed):

| Strategy | Description | Use Case |
|---|---|---|
| `best` | Selects the highest-capability model | Complex reasoning, coding |
| `cheapest` | Selects the most cost-effective model | High-volume, simple tasks |
| `fastest` | Selects the lowest-latency model | Real-time applications |
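This reference does not show how a strategy is selected per request; the sketch below assumes a request-level field, and the `routing_strategy` name is hypothetical:

```bash
# The "routing_strategy" field is hypothetical -- this reference does not
# document how a strategy is chosen per request.
curl -X POST https://api.traylinx.com/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "auto", "routing_strategy": "cheapest", "messages": [{"role": "user", "content": "Classify this support ticket"}]}'
```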
### Response Headers

When using `model: "auto"`, SwitchAI returns these headers in the response:

| Header | Description |
|---|---|
| `X-IRA-Model` | The actual model selected (e.g., `claude-3-sonnet`) |
| `X-IRA-Strategy` | The routing strategy used (`best`, `cheapest`, `fastest`) |
| `X-IRA-Latency` | Time in ms taken for the routing decision |
#### Example Response Headers
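An illustrative set (the header names come from the table above; the values will vary per request):

```text
X-IRA-Model: claude-3-sonnet
X-IRA-Strategy: best
X-IRA-Latency: 42
```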
## API Key Resolution

SwitchAI supports two API key modes:

### 1. Platform API Keys (Default)

Use Traylinx-managed API keys. Requests are billed to your organization's credit balance.

### 2. User-Provided API Keys (BYOK)

Bring your own OpenAI, Anthropic, or Groq keys. Store them in your Traylinx project settings, and the system will use them automatically.
**Benefits of BYOK:**

- Use models even when they're "inactive" on the platform
- Direct billing to your provider account
- Higher rate limits from your provider
**Priority Order:**

1. User API key for the provider (highest priority)
2. Environment/platform API key (fallback)
## Model Fallback System

When a primary model is unavailable, SwitchAI can automatically route to a fallback model:

```text
Request: openai/gpt-oss-120b (inactive)
    ↓
Fallback enabled? → Yes, fallback to claude-3-haiku
    ↓
Result: Request served by Claude
```
Fallback is only triggered when:

- The user does not have their own API key for the provider
- The primary model is marked as inactive
- A fallback model is configured and active
## Wallet & Billing

### Credit Validation Flow

For platform API key requests, the system validates credits in this order:

```text
1. Project Wallet → Check project credits
       ↓ (insufficient)
2. Organization Wallet → Check org credits
       ↓ (insufficient)
3. Owner Wallet → Check organization owner's personal credits
       ↓
4. Insufficient at all levels → HTTP 402 Payment Required
```
### Subscription Validation (BYOK Users)

Users with their own API keys must have an active subscription; requests without one fail with `403 Forbidden`:

```json
{
  "detail": "Valid subscription required",
  "error": {
    "organization_id": "org-123",
    "subscription_found": false
  }
}
```
## Error Responses

### Common Errors

| Status | Error | Description |
|---|---|---|
| 400 | Model not available | The requested model is inactive or doesn't exist |
| 401 | Invalid API key | The provided API key is invalid or expired |
| 402 | Payment required | Insufficient credits in all wallets |
| 403 | Subscription required | BYOK users need an active subscription |
| 429 | Rate limit exceeded | Too many requests (5000/minute limit) |
| 500 | Provider error | Upstream LLM provider returned an error |
### Error Response Format

```json
{
  "detail": "Human-readable error message",
  "error": {
    "code": "INSUFFICIENT_CREDITS",
    "required_amount": 0.05,
    "checked_wallets": [...]
  }
}
```
## Rate Limits
| Tier | Requests/Minute | Concurrent Requests |
|---|---|---|
| Free | 60 | 5 |
| Pro | 1000 | 50 |
| Enterprise | 5000 | 200 |
Rate limit headers are included in every response:
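For example (the exact header names are not documented in this reference; the `X-RateLimit-*` names below follow the common convention and are illustrative):

```text
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 997
X-RateLimit-Reset: 1703980860
```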
## Agent Models

SwitchAI supports specialized agent models for web search and scraping:

### Search Agent
```json
{
  "model": "search-engine",
  "messages": [{"role": "user", "content": "Latest AI news"}],
  "agent_config": {
    "max_results": 10,
    "search_type": "web"
  }
}
```
### Scraping Agent

```json
{
  "model": "scrap-engine",
  "messages": [{"role": "user", "content": "https://example.com"}],
  "agent_config": {
    "format": "markdown",
    "extract_links": true
  }
}
```
## SDKs (SentinelPass A2A)

For agent-to-agent (A2A) OAuth authentication between any AI agents, use the SentinelPass SDKs:

- **SentinelPass Python SDK** - A2A OAuth authentication for Python agents
- **SentinelPass JavaScript SDK** - A2A OAuth authentication with full TypeScript support

**Note:** SentinelPass works with any agent ecosystem, not just Traylinx or Stargate.
## Performance Tips

- Use streaming for long responses to reduce perceived latency
- Cache embeddings when processing the same content repeatedly
- Set an appropriate `max_tokens` to avoid unnecessary generation
- Use faster models (e.g., `llama-3.1-8b-instant`) for simple tasks
- Batch requests when processing multiple items