# SwitchAI API Reference

The SwitchAI API is a robust, scalable proxy service for managing interactions with multiple LLM providers, including OpenAI, Anthropic, and Groq. It provides unified access to 50+ models with intelligent routing, automatic fallback, and integrated billing.

**Base URL:** `https://api.traylinx.com/v1`
## Quick Start

```bash
curl -X POST https://api.traylinx.com/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-oss-120b",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 100
  }'
```
## Authentication

All API requests require a valid API key passed in the `Authorization` header:
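```text
Authorization: Bearer YOUR_API_KEY
```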
API keys are validated by the Stargate Auth Sidecar, which intercepts requests and injects user context via the `User-Info` header. You can generate API keys from the Traylinx Console.

**Note:** For agent-to-agent (A2A) authentication, use the SentinelPass SDKs instead.
## Endpoints

### Chat Completions
Create a chat completion with any supported model.
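`POST /chat/completions`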
**Request Body:**

| Parameter | Type | Required | Description |
|---|---|---|---|
| `model` | string | ✅ | Model identifier (e.g., `openai/gpt-oss-120b`, `claude-3-sonnet`) |
| `messages` | array | ✅ | Array of message objects with `role` and `content` |
| `max_tokens` | integer | ❌ | Maximum tokens to generate (default: model-specific) |
| `temperature` | float | ❌ | Sampling temperature (0-2, default: 1) |
| `stream` | boolean | ❌ | Enable streaming responses (default: `false`) |
| `top_p` | float | ❌ | Nucleus sampling parameter (0-1) |
**Example Request:**

```json
{
  "model": "claude-3-sonnet",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"}
  ],
  "max_tokens": 150,
  "temperature": 0.7
}
```
**Response:**

```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1703980800,
  "model": "claude-3-sonnet",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 8,
    "total_tokens": 33
  }
}
```
### Streaming Responses

Set `stream: true` to receive Server-Sent Events (SSE). Pass `-N` to curl to disable output buffering so events print as they arrive:

```bash
curl -N -X POST https://api.traylinx.com/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "openai/gpt-oss-120b", "messages": [...], "stream": true}'
```
Each SSE event contains a delta:

```text
data: {"choices":[{"delta":{"content":"The"}}]}
data: {"choices":[{"delta":{"content":" capital"}}]}
data: {"choices":[{"delta":{"content":" is"}}]}
data: [DONE]
```
## Supported Models

### OpenAI Models

| Model | Context | Capability | Input Price | Output Price |
|---|---|---|---|---|
| `openai/gpt-oss-120b` | 128K | Very High | $0.005/1K | $0.015/1K |
| `text-embedding-3-small` | 8K | Embeddings | $0.00002/1K | - |
| `text-embedding-3-large` | 8K | Embeddings | $0.00013/1K | - |
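A minimal sketch of calling an embedding model, assuming SwitchAI exposes an OpenAI-compatible `/v1/embeddings` route alongside `/v1/chat/completions` (the embeddings path is not shown elsewhere in this reference):

```bash
# Assumes an OpenAI-compatible /v1/embeddings route; confirm the path
# against your deployment before relying on it.
curl -X POST https://api.traylinx.com/v1/embeddings \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "text-embedding-3-small",
    "input": "The quick brown fox"
  }'
```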
### Anthropic Models (Claude)

| Model | Context | Capability | Input Price | Output Price |
|---|---|---|---|---|
| `claude-3-opus` | 200K | Very High | $0.015/1K | $0.075/1K |
| `claude-3-sonnet` | 200K | High | $0.003/1K | $0.015/1K |
| `claude-3-haiku` | 200K | Medium | $0.00025/1K | $0.00125/1K |
| `claude-3.5-sonnet` | 200K | Very High | $0.003/1K | $0.015/1K |
### Groq Models (High-Performance)

| Model | Context | Capability | Input Price | Output Price |
|---|---|---|---|---|
| `llama-3.3-70b-versatile` | 32K | High | $0.59/M | $0.79/M |
| `llama-3.1-8b-instant` | 8K | Medium | $0.05/M | $0.08/M |
| `mixtral-8x7b` | 32K | High | $0.24/M | $0.24/M |
| `deepseek-r1-distill-llama-70b` | 128K | Very High | $0.75/M | $0.99/M |
### Vision Models

| Model | Capability | Price |
|---|---|---|
| `llama-3.2-90b-vision` | Image understanding | $0.90/M |
| `llama-3.2-11b-vision` | Image understanding | $0.18/M |
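A sketch of an image-understanding request, assuming SwitchAI accepts OpenAI-style multimodal content parts (the `image_url` request shape is not documented in this reference):

```bash
# The content-part structure below follows the OpenAI convention and is
# an assumption; verify the vision request shape before use.
curl -X POST https://api.traylinx.com/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.2-90b-vision",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "Describe this image."},
        {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}}
      ]
    }]
  }'
```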
### Audio Models

| Model | Capability | Price |
|---|---|---|
| `whisper-v3-large` | High-quality transcription | $0.111/hour |
| `whisper-v3-turbo` | Fast transcription | $0.04/hour |
| `tts-1-hd` | Text-to-speech | $0.03/1K chars |
## Auto Mode (IRA)

SwitchAI features an Intelligent Routing Agent (IRA) that automatically selects the optimal model based on your request. Set `model: "auto"` to enable automatic routing.
### Quick Start
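Send a standard chat completion request with `model` set to `"auto"`; the body is otherwise identical to a normal request (the prompt below is illustrative):

```json
{
  "model": "auto",
  "messages": [{"role": "user", "content": "Summarize the plot of Hamlet in two sentences."}],
  "max_tokens": 150
}
```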
### Routing Strategies

IRA supports three routing strategies (see the sketch after this table for one way a strategy might be passed):

| Strategy | Description | Use Case |
|---|---|---|
| `best` | Selects the highest-capability model | Complex reasoning, coding |
| `cheapest` | Selects the most cost-effective model | High-volume, simple tasks |
| `fastest` | Selects the lowest-latency model | Real-time applications |
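This reference does not show how a strategy is selected per request; the sketch below assumes a request-level field, and the `routing_strategy` name is hypothetical:

```bash
# The "routing_strategy" field is hypothetical -- this reference does not
# document how a strategy is chosen per request.
curl -X POST https://api.traylinx.com/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "auto", "routing_strategy": "cheapest", "messages": [{"role": "user", "content": "Classify this support ticket"}]}'
```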
### Response Headers

When using `model: "auto"`, SwitchAI returns these headers in the response:

| Header | Description |
|---|---|
| `X-IRA-Model` | The actual model selected (e.g., `claude-3-sonnet`) |
| `X-IRA-Strategy` | The routing strategy used (`best`, `cheapest`, `fastest`) |
| `X-IRA-Latency` | Time in ms taken for the routing decision |
#### Example Response Headers
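An illustrative set (the header names come from the table above; the values will vary per request):

```text
X-IRA-Model: claude-3-sonnet
X-IRA-Strategy: best
X-IRA-Latency: 42
```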
## API Key Resolution

SwitchAI supports two API key modes:

### 1. Platform API Keys (Default)

Use Traylinx-managed API keys. Requests are billed to your organization's credit balance.

### 2. User-Provided API Keys (BYOK)

Bring your own OpenAI, Anthropic, or Groq keys. Store them in your Traylinx project settings, and the system will use them automatically.
**Benefits of BYOK:**

- Use models even when they're "inactive" on the platform
- Direct billing to your provider account
- Higher rate limits from your provider
**Priority Order:**

1. User API key for the provider (highest priority)
2. Environment/platform API key (fallback)
## Model Fallback System

When a primary model is unavailable, SwitchAI can automatically route to a fallback model:

```text
Request: openai/gpt-oss-120b (inactive)
    ↓
Fallback enabled? → Yes, fallback to claude-3-haiku
    ↓
Result: Request served by Claude
```
Fallback is only triggered when:

- The user does not have their own API key for the provider
- The primary model is marked as inactive
- A fallback model is configured and active
## Wallet & Billing

### Credit Validation Flow

For platform API key requests, the system validates credits in this order:

```text
1. Project Wallet → Check project credits
       ↓ (insufficient)
2. Organization Wallet → Check org credits
       ↓ (insufficient)
3. Owner Wallet → Check organization owner's personal credits
       ↓
4. Insufficient at all levels → HTTP 402 Payment Required
```
### Subscription Validation (BYOK Users)

Users with their own API keys must have an active subscription; requests without one fail with `403 Forbidden`:

```json
{
  "detail": "Valid subscription required",
  "error": {
    "organization_id": "org-123",
    "subscription_found": false
  }
}
```
## Error Responses

### Common Errors

| Status | Error | Description |
|---|---|---|
| 400 | Model not available | The requested model is inactive or doesn't exist |
| 401 | Invalid API key | The provided API key is invalid or expired |
| 402 | Payment required | Insufficient credits in all wallets |
| 403 | Subscription required | BYOK users need an active subscription |
| 429 | Rate limit exceeded | Too many requests (5000/minute limit) |
| 500 | Provider error | Upstream LLM provider returned an error |
### Error Response Format

```json
{
  "detail": "Human-readable error message",
  "error": {
    "code": "INSUFFICIENT_CREDITS",
    "required_amount": 0.05,
    "checked_wallets": [...]
  }
}
```
## Rate Limits
| Tier | Requests/Minute | Concurrent Requests |
|---|---|---|
| Free | 60 | 5 |
| Pro | 1000 | 50 |
| Enterprise | 5000 | 200 |
Rate limit headers are included in every response:
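For example (the exact header names are not documented in this reference; the `X-RateLimit-*` names below follow the common convention and are illustrative):

```text
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 997
X-RateLimit-Reset: 1703980860
```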
## Agent Models

SwitchAI supports specialized agent models for web search and scraping:

### Search Agent
```json
{
  "model": "search-engine",
  "messages": [{"role": "user", "content": "Latest AI news"}],
  "agent_config": {
    "max_results": 10,
    "search_type": "web"
  }
}
```
### Scraping Agent

```json
{
  "model": "scrap-engine",
  "messages": [{"role": "user", "content": "https://example.com"}],
  "agent_config": {
    "format": "markdown",
    "extract_links": true
  }
}
```
## SDKs (SentinelPass A2A)

For agent-to-agent (A2A) OAuth authentication between any AI agents, use the SentinelPass SDKs:

- **SentinelPass Python SDK** - A2A OAuth authentication for Python agents
- **SentinelPass JavaScript SDK** - A2A OAuth authentication with full TypeScript support

**Note:** SentinelPass works with any agent ecosystem, not just Traylinx or Stargate.
## Performance Tips

- Use streaming for long responses to reduce perceived latency
- Cache embeddings when processing the same content repeatedly
- Set an appropriate `max_tokens` to avoid unnecessary generation
- Use faster models (e.g., `llama-3.1-8b-instant`) for simple tasks
- Batch requests when processing multiple items