
SwitchAI API Reference

The SwitchAI API is a robust and scalable proxy service for managing interactions with multiple LLM providers including OpenAI, Anthropic, and Groq. It provides unified access to 50+ models with intelligent routing, automatic fallback, and integrated billing.

Base URL: https://api.traylinx.com/v1

Quick Start

curl -X POST https://api.traylinx.com/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-oss-120b",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 100
  }'

Authentication

All API requests require a valid API key passed in the Authorization header:

Authorization: Bearer YOUR_API_KEY

API keys are validated by the Stargate Auth Sidecar, which intercepts requests and injects user context via the User-Info header. You can generate API keys from the Traylinx Console.
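
The same authenticated request from Python might look like this (a minimal sketch using the third-party requests library; the SWITCHAI_API_KEY environment variable is an assumption for illustration, not a platform convention):

import os

import requests

API_KEY = os.environ["SWITCHAI_API_KEY"]  # hypothetical env var; keep keys out of source

resp = requests.post(
    "https://api.traylinx.com/v1/chat/completions",
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    json={
        "model": "openai/gpt-oss-120b",
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 100,
    },
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])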

Note: For agent-to-agent (A2A) authentication, use the SentinelPass SDKs instead.

Endpoints

Chat Completions

POST /v1/chat/completions

Create a chat completion with any supported model.

Request Body:

Parameter    Type     Required  Description
model        string   Yes       Model identifier (e.g., openai/gpt-oss-120b, claude-3-sonnet)
messages     array    Yes       Array of message objects with role and content
max_tokens   integer  No        Maximum tokens to generate (default: model-specific)
temperature  float    No        Sampling temperature (0-2, default: 1)
stream       boolean  No        Enable streaming responses (default: false)
top_p        float    No        Nucleus sampling parameter (0-1)

Example Request:

{
  "model": "claude-3-sonnet",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"}
  ],
  "max_tokens": 150,
  "temperature": 0.7
}

Response:

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1703980800,
  "model": "claude-3-sonnet",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 8,
    "total_tokens": 33
  }
}
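
As a worked example, the usage block above can be combined with the per-1K prices from the model tables below to estimate what this call cost (a sketch; actual billing is handled by the platform):

# claude-3-sonnet pricing from the tables below: $0.003/1K input, $0.015/1K output.
usage = {"prompt_tokens": 25, "completion_tokens": 8, "total_tokens": 33}

cost = (usage["prompt_tokens"] / 1000) * 0.003 + (usage["completion_tokens"] / 1000) * 0.015
print(f"estimated cost: ${cost:.6f}")  # estimated cost: $0.000195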

Streaming Responses

Set stream: true to receive Server-Sent Events (SSE):

curl -X POST https://api.traylinx.com/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "openai/gpt-oss-120b", "messages": [...], "stream": true}'

Each SSE event contains a delta:

data: {"choices":[{"delta":{"content":"The"}}]}
data: {"choices":[{"delta":{"content":" capital"}}]}
data: {"choices":[{"delta":{"content":" is"}}]}
data: [DONE]
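
From Python, the stream can be consumed line by line (a sketch using the requests library; SWITCHAI_API_KEY is a hypothetical environment variable, and real delta payloads carry more fields than the trimmed events shown above):

import json
import os

import requests

API_KEY = os.environ["SWITCHAI_API_KEY"]  # hypothetical env var

with requests.post(
    "https://api.traylinx.com/v1/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "openai/gpt-oss-120b",
        "messages": [{"role": "user", "content": "Hello!"}],
        "stream": True,
    },
    stream=True,  # keep the connection open and read incrementally
) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines(decode_unicode=True):
        if not line or not line.startswith("data: "):
            continue  # skip blank separators and any non-data lines
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break
        delta = json.loads(payload)["choices"][0]["delta"]
        print(delta.get("content", ""), end="", flush=True)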

Supported Models

OpenAI Models

Model                   Context  Capability  Input Price  Output Price
openai/gpt-oss-120b     128K     Very High   $0.005/1K    $0.015/1K
text-embedding-3-small  8K       Embeddings  $0.00002/1K  -
text-embedding-3-large  8K       Embeddings  $0.00013/1K  -

Anthropic Models (Claude)

Model              Context  Capability  Input Price  Output Price
claude-3-opus      200K     Very High   $0.015/1K    $0.075/1K
claude-3-sonnet    200K     High        $0.003/1K    $0.015/1K
claude-3-haiku     200K     Medium      $0.00025/1K  $0.00125/1K
claude-3.5-sonnet  200K     Very High   $0.003/1K    $0.015/1K

Groq Models (High-Performance)

Model                          Context  Capability  Input Price  Output Price
llama-3.3-70b-versatile        32K      High        $0.59/M      $0.79/M
llama-3.1-8b-instant           8K       Medium      $0.05/M      $0.08/M
mixtral-8x7b                   32K      High        $0.24/M      $0.24/M
deepseek-r1-distill-llama-70b  128K     Very High   $0.75/M      $0.99/M

Vision Models

Model                 Capability           Price
llama-3.2-90b-vision  Image understanding  $0.90/M
llama-3.2-11b-vision  Image understanding  $0.18/M

Audio Models

Model             Capability                  Price
whisper-v3-large  High-quality transcription  $0.111/hour
whisper-v3-turbo  Fast transcription          $0.04/hour
tts-1-hd          Text-to-speech              $0.03/1K chars

Auto Mode (IRA)

SwitchAI features an Intelligent Routing Agent (IRA) that automatically selects the optimal model based on your request. Set model: "auto" to enable automatic routing.

Quick Start

{
  "model": "auto",
  "messages": [
    {"role": "user", "content": "Tell me a joke"}
  ],
  "stream": true
}

Routing Strategies

IRA supports three routing strategies:

Strategy  Description                            Use Case
best      Selects the highest-capability model   Complex reasoning, coding
cheapest  Selects the most cost-effective model  High-volume, simple tasks
fastest   Selects the lowest-latency model       Real-time applications

Response Headers

When using model: "auto", SwitchAI returns these headers in the response:

Header          Description
X-IRA-Model     The actual model selected (e.g., claude-3-sonnet)
X-IRA-Strategy  The routing strategy used (best, cheapest, fastest)
X-IRA-Latency   Time in ms taken for the routing decision

Example Response Headers

X-IRA-Model: claude-3-sonnet
X-IRA-Strategy: best
X-IRA-Latency: 12
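
Reading the routing headers from a client (a sketch using the requests library; header names are as documented above, and requests exposes them case-insensitively):

import os

import requests

resp = requests.post(
    "https://api.traylinx.com/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['SWITCHAI_API_KEY']}"},  # hypothetical env var
    json={"model": "auto", "messages": [{"role": "user", "content": "Tell me a joke"}]},
)
print("routed to:", resp.headers.get("X-IRA-Model"))
print("strategy: ", resp.headers.get("X-IRA-Strategy"))
print("latency:  ", resp.headers.get("X-IRA-Latency"), "ms")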

API Key Resolution

SwitchAI supports two API key modes:

1. Platform API Keys (Default)

Use Traylinx-managed API keys. Requests are billed to your organization's credit balance.

Authorization: Bearer tlx_sk_...

2. User-Provided API Keys (BYOK)

Bring your own OpenAI, Anthropic, or Groq keys. Store them in your Traylinx project settings, and the system will use them automatically.

Benefits of BYOK:

- Use models even when they're "inactive" on the platform
- Direct billing to your provider account
- Higher rate limits from your provider

Priority Order:

1. User API key for the provider (highest priority)
2. Environment/platform API key (fallback)
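
Expressed as code, the resolution order looks like this (an illustrative sketch, not SwitchAI's actual implementation; the function and its inputs are hypothetical):

def resolve_api_key(provider, user_keys, platform_keys):
    """Pick the key for a provider: BYOK key first, platform key as fallback.

    Illustrative only; the real service resolves keys server-side.
    """
    if provider in user_keys:        # 1. user-provided key from project settings
        return user_keys[provider], "byok"
    if provider in platform_keys:    # 2. Traylinx-managed platform key
        return platform_keys[provider], "platform"
    raise LookupError(f"no API key available for provider {provider!r}")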


Model Fallback System

When a primary model is unavailable, SwitchAI can automatically route to a fallback model:

Request: openai/gpt-oss-120b (inactive)
Fallback enabled? → Yes, fallback to claude-3-haiku
Result: Request served by Claude

Fallback is only triggered when:

- The user does not have their own API key for the provider
- The primary model is marked as inactive
- A fallback model is configured and active
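
All three conditions must hold before a request is rerouted; as a sketch (hypothetical function, mirroring the list above):

def should_fall_back(has_user_key, primary_active, fallback_configured, fallback_active):
    """True only when every documented condition holds. Illustrative sketch."""
    return (
        not has_user_key         # no BYOK key for the provider
        and not primary_active   # primary model marked inactive
        and fallback_configured  # a fallback model is configured...
        and fallback_active      # ...and that fallback is active
    )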


Wallet & Billing

Credit Validation Flow

For platform API key requests, the system validates credits in this order:

1. Project Wallet → Check project credits
       ↓ (insufficient)
2. Organization Wallet → Check org credits
       ↓ (insufficient)
3. Owner Wallet → Check organization owner's personal credits
4. Insufficient at all levels → HTTP 402 Payment Required
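
The same cascade as code (an illustrative sketch; wallet objects and attribute names are hypothetical):

def find_funded_wallet(cost, project_wallet, org_wallet, owner_wallet):
    """Return the first wallet that can cover the request, walking
    project -> organization -> owner; None maps to HTTP 402."""
    for wallet in (project_wallet, org_wallet, owner_wallet):
        if wallet.balance >= cost:
            return wallet
    return None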

Subscription Validation (BYOK Users)

Users with their own API keys must have an active subscription:

HTTP 403 Forbidden
{
  "detail": "Valid subscription required",
  "error": {
    "organization_id": "org-123",
    "subscription_found": false
  }
}

Error Responses

Common Errors

Status  Error                  Description
400     Model not available    The requested model is inactive or doesn't exist
401     Invalid API key        The provided API key is invalid or expired
402     Payment required       Insufficient credits in all wallets
403     Subscription required  BYOK users need an active subscription
429     Rate limit exceeded    Too many requests for your tier (see Rate Limits)
500     Provider error         Upstream LLM provider returned an error

Error Response Format

{
  "detail": "Human-readable error message",
  "error": {
    "code": "INSUFFICIENT_CREDITS",
    "required_amount": 0.05,
    "checked_wallets": [...]
  }
}
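
A client might branch on the documented statuses before retrying or surfacing the error (a sketch using the requests library; the retry policy is up to you):

import requests

def post_completion(payload, api_key):
    resp = requests.post(
        "https://api.traylinx.com/v1/chat/completions",
        headers={"Authorization": f"Bearer {api_key}"},
        json=payload,
    )
    if resp.status_code == 402:
        raise RuntimeError("Insufficient credits in all wallets; top up and retry.")
    if resp.status_code == 403:
        raise RuntimeError("BYOK requests need an active subscription.")
    if resp.status_code == 429:
        raise RuntimeError("Rate limited; back off before retrying.")
    resp.raise_for_status()  # 400/401/500 surface as HTTPError
    return resp.json()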

Rate Limits

Tier        Requests/Minute  Concurrent Requests
Free        60               5
Pro         1000             50
Enterprise  5000             200

Rate limit headers are included in every response:

X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 999
X-RateLimit-Reset: 1703980860
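
One way to respect these headers is to sleep until the reset time once the remaining budget hits zero (a sketch; header names are as documented, X-RateLimit-Reset is read as Unix seconds):

import time

import requests

def wait_if_exhausted(resp):
    """Pause until X-RateLimit-Reset when no requests remain in the window."""
    remaining = int(resp.headers.get("X-RateLimit-Remaining", 1))
    if remaining == 0:
        reset_at = int(resp.headers.get("X-RateLimit-Reset", 0))
        time.sleep(max(0, reset_at - time.time()))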

Agent Models

SwitchAI supports specialized agent models for web search and scraping:

Search Agent

{
  "model": "search-engine",
  "messages": [{"role": "user", "content": "Latest AI news"}],
  "agent_config": {
    "max_results": 10,
    "search_type": "web"
  }
}

Scraping Agent

{
  "model": "scrap-engine",
  "messages": [{"role": "user", "content": "https://example.com"}],
  "agent_config": {
    "format": "markdown",
    "extract_links": true
  }
}

SDKs (SentinelPass A2A)

For agent-to-agent (A2A) OAuth authentication, use the SentinelPass SDKs.

Note: SentinelPass works with any agent ecosystem—not just Traylinx or Stargate.


Performance Tips

  1. Use streaming for long responses to reduce perceived latency
  2. Cache embeddings when processing the same content repeatedly (see the sketch after this list)
  3. Set appropriate max_tokens to avoid unnecessary generation
  4. Use faster models (e.g., llama-3.1-8b-instant) for simple tasks
  5. Batch requests when processing multiple items
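
For tip 2, a minimal in-process cache keyed by a content hash might look like this (a sketch; get_embedding is a placeholder for whatever embeddings client call you use):

import hashlib

_embedding_cache = {}

def cached_embedding(text, get_embedding):
    """Memoize embeddings by content hash so repeated inputs are embedded once.

    get_embedding is a placeholder for your actual embeddings call.
    """
    key = hashlib.sha256(text.encode("utf-8")).hexdigest()
    if key not in _embedding_cache:
        _embedding_cache[key] = get_embedding(text)
    return _embedding_cache[key]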