📚 Traylinx Cortex API Reference¶
Version: 2.2.0
Base URL: http://localhost:8000
Last Updated: 2025-12-04
📚 Navigation: Main README | Docs Index | Quick Reference | Integration Guide
🔐 Authentication¶
Traylinx Cortex uses a secure, token-based authentication system similar to GitHub Personal Access Tokens.
How It Works¶
- Registration: Call `POST /v1/users` with your user details to receive an API token
- Token Storage: Store the token securely (it's only shown once)
- API Calls: Include the token in all subsequent requests via the `Authorization` header
Headers¶
| Header | Required | Description |
|---|---|---|
| `Authorization` | ✅ Yes (most endpoints) | `Bearer <token>` - Your API token |
| `X-Trace-ID` | ❌ Optional | UUID for distributed tracing |
Token Format¶
Tokens follow the format: ctx_<random_string> (e.g., ctx_abc123xyz...)
Security Features¶
- Token Hashing: Tokens are hashed (SHA256) before storage - plaintext never stored
- Encryption at Rest: Sensitive data (API keys) encrypted using Fernet
- Ownership Isolation: Users can only access their own data
👤 User Management¶
Register User / Get Token¶
POST /v1/users
Creates a new user (or updates existing) and returns an API token. This is the entry point for new users.
Auth Required: ❌ No
Request Body:
{
"id": "550e8400-e29b-41d4-a716-446655440000",
"email": "user@example.com",
"first_name": "John",
"last_name": "Doe",
"switch_ai_api_key": "sk-your-api-key",
"custom_attributes": {
"company": "Acme Inc"
},
"token_name": "My Laptop"
}
| Field | Type | Required | Description |
|---|---|---|---|
| `id` | UUID | ✅ | User ID (from your auth system) |
| `email` | string | ✅ | User email address |
| `first_name` | string | ❌ | First name |
| `last_name` | string | ❌ | Last name |
| `switch_ai_api_key` | string | ❌ | Switch.AI API key (encrypted at rest) |
| `custom_attributes` | object | ❌ | Additional user metadata |
| `token_name` | string | ❌ | Name for the token (default: "Initial Token") |
Response (201 Created):
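The response body is not reproduced in this reference. An illustrative (not authoritative) shape, based on the token format and fields described elsewhere in this document:

```json
{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "token": "ctx_abc123xyz...",
  "token_name": "My Laptop"
}
```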
⚠️ Important: Save this token immediately! It's only shown once.
Get Current User¶
GET /v1/users/me
Returns details of the authenticated user.
Auth Required: ✅ Yes
Response (200 OK):
{
"id": "550e8400-e29b-41d4-a716-446655440000",
"email": "user@example.com",
"first_name": "John",
"last_name": "Doe",
"is_active": true,
"created_at": "2025-12-02T10:00:00Z",
"has_api_key": true,
"api_key_preview": "sk-...abc123"
}
| Field | Description |
|---|---|
| `has_api_key` | Whether user has a Switch.AI API key stored |
| `api_key_preview` | Masked preview of the key (null if no key set) |
Update API Key¶
PUT /v1/users/me/api-key
Update or clear the user's Switch.AI API key.
Auth Required: ✅ Yes
Request Body:
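The request body is omitted above. An illustrative example, inferred from the `switch_ai_api_key` field used at registration (the exact schema may differ):

```json
{
  "switch_ai_api_key": "sk-your-new-api-key"
}
```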
Response (200 OK):
{
"message": "API key updated successfully",
"has_api_key": true,
"api_key_preview": "sk-...abc123"
}
Create API Token¶
POST /v1/users/me/tokens
Generate a new API token for the authenticated user. Useful for multiple devices or applications.
Auth Required: ✅ Yes
Request Body:
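The request body is omitted above. An illustrative example, inferred from the `name` field shown in the token list response (the exact schema may differ):

```json
{
  "name": "Mobile App"
}
```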
Response (201 Created):
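An illustrative response shape (the actual body is not shown in this reference; as with registration, the plaintext token appears only once):

```json
{
  "id": "token-uuid-3",
  "name": "Mobile App",
  "token": "ctx_newtoken..."
}
```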
List API Tokens¶
GET /v1/users/me/tokens
List all active tokens for the authenticated user. Token values are masked for security.
Auth Required: ✅ Yes
Response (200 OK):
{
"tokens": [
{
"id": "token-uuid-1",
"name": "My Laptop",
"token_prefix": "ctx_abc123...",
"last_used_at": "2025-12-02T12:00:00Z",
"created_at": "2025-12-01T10:00:00Z"
},
{
"id": "token-uuid-2",
"name": "Mobile App",
"token_prefix": "ctx_xyz789...",
"last_used_at": null,
"created_at": "2025-12-02T09:00:00Z"
}
]
}
Revoke API Token¶
DELETE /v1/users/me/tokens/{token_id}
Revoke (delete) a specific API token. The token becomes immediately invalid.
Auth Required: ✅ Yes
Response (204 No Content)
Error Response (404 Not Found):
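The body follows the standardized error envelope described under Error Handling; the `code` value here is illustrative, not confirmed by this reference:

```json
{
  "error": {
    "code": "TOKEN_NOT_FOUND",
    "message": "Token not found",
    "trace_id": "abc-123",
    "details": {}
  }
}
```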
List Sessions¶
GET /v1/users/me/sessions
List all sessions for the current user with pagination.
Auth Required: ✅ Yes
Query Parameters:
- limit: int (default 20)
- offset: int (default 0)
- search: string (optional, filter by title)
Response (200 OK):
{
"sessions": [
{
"id": "session-uuid-1",
"title": "Trip to Tokyo",
"app_id": "my-app",
"created_at": "2025-12-02T12:00:00Z",
"updated_at": "2025-12-02T12:05:00Z",
"message_count": 5
}
],
"total": 1,
"limit": 20,
"offset": 0
}
Get User Profile¶
GET /v1/users/me/profile
Get user profile facts.
Auth Required: ✅ Yes
Query Parameters:
- app_id: string (default "default")
Response (200 OK):
{
"user_id": "user-uuid",
"app_id": "default",
"facts": {
"name": "John Doe",
"preferences": "dark mode"
},
"updated_at": "2025-12-02T12:00:00Z"
}
Update User Profile¶
PATCH /v1/users/me/profile
Update user profile facts (merge with existing).
Auth Required: ✅ Yes
Request Body:
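The request body is omitted above. An illustrative example, consistent with the merged `facts` shown in the response (the exact schema may differ):

```json
{
  "facts": {
    "location": "New York"
  }
}
```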
Response (200 OK):
{
"user_id": "user-uuid",
"app_id": "default",
"facts": {
"name": "John Doe",
"preferences": "dark mode",
"location": "New York"
},
"updated_at": "2025-12-02T12:10:00Z"
}
Delete User Profile¶
DELETE /v1/users/me/profile
Delete user profile facts.
Auth Required: ✅ Yes
Query Parameters:
- app_id: string (default "default")
Response (204 No Content)
Extract Facts from Content¶
POST /v1/users/me/profile/extract
Extract structured user facts from unstructured content using AI. This endpoint is generic and app-agnostic - any application (web, mobile, etc.) can send content in various formats.
Auth Required: ✅ Yes
Query Parameters:
- app_id: string (default "default")
Request Body:
Accepts any combination of the following fields:
| Field | Type | Description |
|---|---|---|
| `text` | string | Natural language text (e.g., "My name is Sebastian, I live in Berlin") |
| `data` | object | Structured JSON object with any fields (nested objects are flattened) |
| `raw` | string | Any raw content to analyze |
Example Requests:
From natural language:
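```json
{
  "text": "My name is Sebastian, I live in Berlin"
}
```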
From structured app data:
{
"data": {
"firstName": "John",
"lastName": "Doe",
"email": "john@example.com",
"address": {
"city": "Berlin",
"country": "Germany"
}
}
}
From raw content:
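```json
{
  "raw": "favorite_color=blue"
}
```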
Combined (all formats):
{
"text": "I love hiking and photography",
"data": {"hobby": "travel"},
"raw": "favorite_color=blue"
}
Response (200 OK):
{
"facts": {
"firstName": "John",
"lastName": "Doe",
"email": "john@example.com",
"city": "Berlin",
"country": "Germany",
"hobby": "hiking"
}
}
LLM Prompt Used:
The endpoint uses this prompt to extract facts:
You are a fact extraction system. Extract user profile facts from the following content.
Return ONLY a valid JSON object with key-value pairs representing facts about the person.
Rules:
- Keys should be camelCase (e.g., "firstName", "dogName", "favoriteColor", "workLocation")
- Keep keys short, clear, and descriptive
- Values should be the extracted information as strings
- Only extract factual, personal information about the user
- Normalize similar fields (e.g., "first_name", "firstName", "name" -> use "name" or "firstName")
- For addresses, extract as separate fields: street, city, state, country, postalCode
- For dates, keep in ISO format if possible (YYYY-MM-DD)
- If no facts can be extracted, return an empty object {}
- Do NOT include sensitive data like passwords
Content to analyze:
{combined_content}
Return only the JSON object, no explanation or markdown:
Use Cases:
- Import user profile from authentication systems
- Parse natural language chat messages
- Process form data from mobile apps
- Extract facts from any structured or unstructured data
Error Handling (LLM Response Sanitization):
LLMs can return malformed responses. The endpoint uses a robust JSON sanitizer that handles:
| Issue | How It's Handled |
|---|---|
| Markdown code blocks | Strips ```json ... ``` wrappers |
| Mixed text + JSON | Extracts JSON object from surrounding text |
| Trailing commas | Removes {"key": "value",} → {"key": "value"} |
| Single quotes | Converts Python-style 'key' to "key" |
| Unquoted keys | Converts {key: "value"} to {"key": "value"} |
| Python booleans | Converts True/False/None to true/false/null |
| Truncated JSON | Attempts to close unclosed braces |
| Empty/whitespace | Returns empty facts object {} |
| Complete failure | Fallback to regex key-value extraction |
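A few of the behaviors above can be approximated in a handful of lines. This is an illustrative sketch covering only a subset (fence stripping, object extraction, trailing commas), not the service's actual sanitizer:

```python
import json
import re


def sanitize_llm_json(text: str) -> dict:
    """Best-effort cleanup of an LLM response before JSON parsing (illustrative subset)."""
    # Strip markdown code fences such as ```json ... ```
    text = re.sub(r"```(?:json)?", "", text).strip()
    # Extract the first {...} object from any surrounding prose
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if not match:
        return {}
    candidate = match.group(0)
    # Remove trailing commas before a closing brace/bracket
    candidate = re.sub(r",\s*([}\]])", r"\1", candidate)
    try:
        return json.loads(candidate)
    except json.JSONDecodeError:
        return {}


print(sanitize_llm_json('Here you go:\n```json\n{"name": "John",}\n```'))  # {'name': 'John'}
```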
Validation Applied:
- Empty/null values are removed
- Placeholder values (unknown, N/A) are removed
- Keys limited to 64 characters
- Values limited to 1000 characters
- Invalid keys converted to camelCase
The endpoint never fails with 500 for LLM parsing issues - it gracefully returns {"facts": {}} if extraction fails completely.
🩺 Health & Observability¶
Get Basic Health¶
GET /health
Returns the basic status of the service.
Auth Required: ❌ No
Response (200 OK):
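The response body is not shown in this reference; a minimal illustrative shape would be:

```json
{
  "status": "ok"
}
```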
Get Liveness Status¶
GET /health/live
Detailed check of all dependencies (PostgreSQL, Redis, LLM).
Auth Required: ❌ No
Response (200 OK):
{
"status": "healthy",
"checks": {
"postgres": {
"status": "healthy",
"latency_ms": 1.2
},
"redis": {
"status": "healthy",
"latency_ms": 0.5
},
"llm": {
"status": "healthy"
}
},
"timestamp": "2025-12-02T12:00:00Z"
}
Possible Status Values:
- healthy - All systems operational
- degraded - Some systems have issues but service is functional
- unhealthy - Critical systems are down
Get Readiness Status¶
GET /ready
Kubernetes readiness probe. Returns 200 if ready to serve traffic, 503 otherwise.
Auth Required: ❌ No
Response (200 OK):
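The body is not shown in this reference; an illustrative shape:

```json
{
  "status": "ready"
}
```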
Response (503 Service Unavailable):
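Likewise illustrative (the actual body is not documented here):

```json
{
  "status": "not_ready"
}
```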
💬 Sessions¶
Create Session¶
POST /v1/session
Initialize a new conversation session.
Auth Required: ✅ Yes
Request Body:
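The request body is omitted above. An illustrative example, inferred from the session fields returned by `GET /v1/session/{session_id}` (the exact schema may differ):

```json
{
  "app_id": "my-app",
  "title": "Trip to Tokyo",
  "metadata": {
    "source": "web"
  }
}
```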
Response (201 Created):
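An illustrative response, mirroring the session object returned by `GET /v1/session/{session_id}`:

```json
{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "user_id": "user-123",
  "app_id": "my-app",
  "title": "Trip to Tokyo",
  "created_at": "2025-12-02T12:00:00Z"
}
```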
Get Session Details¶
GET /v1/session/{session_id}
Retrieve details of a specific session.
Auth Required: ✅ Yes
Response (200 OK):
{
"id": "550e8400-e29b-41d4-a716-446655440000",
"user_id": "user-123",
"app_id": "my-app",
"title": "Trip to Tokyo",
"created_at": "2025-12-02T12:00:00Z",
"updated_at": "2025-12-02T12:05:00Z",
"metadata": {
"source": "web"
}
}
Update Session¶
PATCH /v1/session/{session_id}
Update session title or metadata.
Auth Required: ✅ Yes
Request Body:
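The request body is omitted above. An illustrative example (both fields appear optional; the exact schema may differ):

```json
{
  "title": "Trip to Kyoto",
  "metadata": {
    "source": "mobile"
  }
}
```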
Response (200 OK):
Get Session History¶
GET /v1/session/{session_id}/history
Retrieve full message history for a session in chronological order.
Auth Required: ✅ Yes
Response (200 OK):
{
"session_id": "550e8400-e29b-41d4-a716-446655440000",
"messages": [
{
"id": "msg-uuid-1",
"role": "user",
"content": "Hello",
"token_count": 2,
"created_at": "2025-12-02T12:00:00Z"
},
{
"id": "msg-uuid-2",
"role": "assistant",
"content": "Hi there! How can I help you today?",
"token_count": 10,
"created_at": "2025-12-02T12:00:01Z"
}
]
}
Delete Session¶
DELETE /v1/session/{session_id}
Clears Short-Term Memory (STM) for the session. Long-Term Memory (LTM) is preserved.
Auth Required: ✅ Yes
Response (204 No Content)
🤖 Chat¶
Send Message¶
POST /v1/chat
Send a message to the AI agent. Supports both standard and streaming responses.
Auth Required: ✅ Yes
Request Body:
{
"session_id": "550e8400-e29b-41d4-a716-446655440000",
"message": "What is the capital of Japan?",
"config": {
"stream": false,
"model_preference": "balanced",
"switch_ai_api_key": null,
"embedding_model": null
}
}
Config Options:
| Field | Type | Default | Description |
|---|---|---|---|
| `stream` | boolean | `false` | Enable SSE streaming |
| `model_preference` | string | `"balanced"` | `"fast"`, `"balanced"`, or `"powerful"` |
| `switch_ai_api_key` | string | `null` | Override API key (uses stored key if not provided) |
| `embedding_model` | string | `null` | Embedding model (e.g., `mistral-embed`) |
💡 API Key Fallback Chain: The chat endpoint uses API keys in this priority order:
1. `switch_ai_api_key` in config (highest priority)
2. User's stored key (set via `PUT /v1/users/me/api-key`)
3. `SWITCH_AI_API_KEY` environment variable (server default)
Response (200 OK - Non-streaming):
{
"message_id": "msg-uuid",
"content": "The capital of Japan is Tokyo.",
"usage": {
"tokens_in": 20,
"tokens_out": 10
},
"cost_usd": 0.00015,
"model": "gpt-4o"
}
Response (200 OK - Streaming SSE):
When stream: true, returns Server-Sent Events:
event: message
data: {"chunk": "The", "id": "msg-123"}
event: message
data: {"chunk": " capital", "id": "msg-123"}
event: message
data: {"chunk": " of Japan is Tokyo.", "id": "msg-123"}
event: done
data: {"status": "completed", "usage": {"tokens_in": 20, "tokens_out": 10}, "cost_usd": 0.00015, "model": "gpt-4o"}
Event Types:
- message - Content chunk
- done - Stream completed with final stats
- error - Error occurred during streaming
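A client needs to split this stream into events before it can reassemble the reply. The sketch below parses SSE text of the shape shown above into `(event, data)` pairs; it is an illustration of the wire format, not an official client:

```python
import json


def parse_sse(stream_text: str) -> list:
    """Parse Server-Sent Events text into (event_type, parsed_data) pairs."""
    events = []
    event_type = None
    for line in stream_text.splitlines():
        if line.startswith("event:"):
            # Remember the event type for the data line that follows
            event_type = line.split(":", 1)[1].strip()
        elif line.startswith("data:"):
            payload = json.loads(line.split(":", 1)[1].strip())
            events.append((event_type, payload))
    return events


raw = (
    "event: message\n"
    'data: {"chunk": "The", "id": "msg-123"}\n'
    "event: done\n"
    'data: {"status": "completed"}\n'
)
for event, data in parse_sse(raw):
    print(event, data)
```

A real client would feed chunks from the HTTP response into the same line-based logic and concatenate the `chunk` fields of `message` events until a `done` event arrives.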
🧠 Memory¶
Search Memory¶
POST /v1/memory/search
Search user's long-term memory by semantic similarity. Uses the user's stored Switch.AI API key for embedding generation (set via PUT /v1/users/me/api-key).
Auth Required: ✅ Yes
Request Body:
{
"query": "What is my favorite color?",
"limit": 15,
"offset": 0,
"min_similarity": 0.6,
"app_id": "default",
"switch_ai_api_key": null
}
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `query` | string | ✅ | - | Search query text |
| `app_id` | string | ❌ | `"default"` | Application identifier |
| `limit` | integer | ❌ | 15 | Max results (1-50) |
| `offset` | integer | ❌ | 0 | Pagination offset |
| `min_similarity` | float | ❌ | 0.6 | Minimum similarity threshold (0.0-1.0) |
| `switch_ai_api_key` | string | ❌ | `null` | Override API key (uses stored key if not provided) |
Response (200 OK):
{
"results": [
{
"id": "mem-uuid-1",
"content": "User's favorite color is blue",
"similarity": 0.92,
"created_at": "2025-12-01T10:00:00Z"
}
],
"total": 1,
"limit": 15,
"offset": 0
}
💡 API Key Fallback Chain: The search endpoint uses API keys in this priority order:
1. `switch_ai_api_key` in request body (highest priority)
2. User's stored key (set via `PUT /v1/users/me/api-key`)
3. `SWITCH_AI_API_KEY` environment variable (server default)
List User Memories¶
GET /v1/memory/me
Retrieve all memories for the authenticated user with pagination support. Memories are returned with importance scores and temporal decay applied.
Auth Required: ✅ Yes
Query Parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `app_id` | string | ✅ | - | Application identifier |
| `limit` | integer | ❌ | 15 | Number of results (1-100) |
| `offset` | integer | ❌ | 0 | Pagination offset |
Request Example:
curl -X GET "https://api.traylinx.com/v1/memory/me?app_id=my-app&limit=20&offset=0" \
-H "Authorization: Bearer ctx_your_token_here"
Response (200 OK):
{
"memories": [
{
"id": "550e8400-e29b-41d4-a716-446655440000",
"content": "User is allergic to peanuts",
"importance_score": 0.9,
"tags": ["medical", "allergy"],
"created_at": "2025-12-01T10:00:00Z"
},
{
"id": "660e8400-e29b-41d4-a716-446655440001",
"content": "User prefers dark mode",
"importance_score": 0.7,
"tags": ["preference", "ui"],
"created_at": "2025-12-02T14:30:00Z"
}
],
"total": 42,
"limit": 20,
"offset": 0
}
Importance Score Levels:
- 0.9: Medical/Safety (allergies, medications, emergencies)
- 0.85: Identity (name, personal identifiers)
- 0.75: Summaries (conversation summaries)
- 0.7: Preferences (likes, dislikes, wants)
- 0.65: Decisions (plans, choices)
- 0.5: Default (general information)
Delete Single Memory¶
DELETE /v1/memory/{memory_id}
Delete a specific memory by ID. Only the memory owner can delete it.
Auth Required: ✅ Yes
Path Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| `memory_id` | UUID | ✅ | Memory identifier |
Request Example:
curl -X DELETE "https://api.traylinx.com/v1/memory/550e8400-e29b-41d4-a716-446655440000" \
-H "Authorization: Bearer ctx_your_token_here"
Response (200 OK):
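The response body is not shown in this reference; an illustrative shape might be:

```json
{
  "success": true,
  "message": "Memory deleted"
}
```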
Error Responses:
| Status | Code | Description |
|---|---|---|
| 404 | `MEMORY_NOT_FOUND` | Memory doesn't exist or not owned by user |
| 401 | `AUTHENTICATION_FAILED` | Invalid or missing token |
Delete All User Memories (GDPR)¶
DELETE /v1/memory/user/all
Delete all memories for the authenticated user. This endpoint supports GDPR "right to be forgotten" compliance.
Auth Required: ✅ Yes
Query Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| `app_id` | string | ✅ | Application identifier |
Request Example:
curl -X DELETE "https://api.traylinx.com/v1/memory/user/all?app_id=my-app" \
-H "Authorization: Bearer ctx_your_token_here"
Response (200 OK):
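The response body is not shown in this reference; an illustrative shape might be:

```json
{
  "success": true,
  "deleted_count": 42
}
```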
⚠️ Warning: This action is irreversible. All memories for the user in the specified app will be permanently deleted.
Find Duplicate Memories¶
GET /v1/memory/duplicates
Find groups of semantically similar (duplicate) memories. Useful for identifying redundant information before cleanup.
Auth Required: ✅ Yes
Query Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| `app_id` | string | ✅ | Application identifier |
Request Example:
curl -X GET "https://api.traylinx.com/v1/memory/duplicates?app_id=my-app" \
-H "Authorization: Bearer ctx_your_token_here"
Response (200 OK):
{
"groups": [
{
"canonical_id": "550e8400-e29b-41d4-a716-446655440000",
"canonical_content": "User's name is Sebastian",
"duplicates": [
{
"id": "660e8400-e29b-41d4-a716-446655440001",
"content": "User is referred to as Sebastian",
"created_at": "2025-12-02T14:30:00Z",
"similarity_to_canonical": 0.92
},
{
"id": "770e8400-e29b-41d4-a716-446655440002",
"content": "User is called Sebastian",
"created_at": "2025-12-03T10:00:00Z",
"similarity_to_canonical": 0.89
}
]
}
],
"total_duplicates": 2
}
Response Fields:
- groups: Array of duplicate groups, each containing a canonical (oldest) memory and its duplicates
- canonical_id: ID of the memory to keep (oldest in the group)
- canonical_content: Content of the canonical memory
- duplicates: Array of duplicate memories with similarity scores
- total_duplicates: Total count of duplicate memories across all groups
Merge Duplicate Memories¶
POST /v1/memory/deduplicate
Merge duplicate memory groups by keeping the oldest (canonical) memory and deleting all newer duplicates.
Auth Required: ✅ Yes
Query Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| `app_id` | string | ✅ | Application identifier |
Request Example:
curl -X POST "https://api.traylinx.com/v1/memory/deduplicate?app_id=my-app" \
-H "Authorization: Bearer ctx_your_token_here"
Response (200 OK):
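The response body is not reproduced here. An illustrative example using the documented fields:

```json
{
  "success": true,
  "deleted_count": 2,
  "groups_merged": 1
}
```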
Response Fields:
- success: Whether the operation completed successfully
- deleted_count: Total number of duplicate memories deleted
- groups_merged: Number of duplicate groups that were merged
💡 Tip: Run `GET /v1/memory/duplicates` first to preview what will be merged before calling this endpoint.
🔄 Agent-to-Agent (A2A)¶
Create Conversation (A2A)¶
POST /a2a/conversation/create
Create a session using the A2A envelope protocol.
Request Body:
{
"envelope": {
"message_id": "msg-source-1",
"sender_agent_key": "agent-client",
"timestamp": "2025-12-02T12:00:00Z"
},
"user_id": "user-123",
"app_id": "my-app"
}
Response (201 Created):
{
"envelope": {
"message_id": "msg-resp-1",
"sender_agent_key": "traylinx-cortex",
"timestamp": "2025-12-02T12:00:01Z",
"in_reply_to": "msg-source-1"
},
"session_id": "...",
"created_at": "..."
}
Chat (A2A)¶
POST /a2a/conversation/chat
Send a message using the A2A envelope protocol.
Request Body:
{
"envelope": {
"message_id": "msg-source-2",
"sender_agent_key": "agent-client",
"timestamp": "2025-12-02T12:00:10Z"
},
"action": "chat",
"session_id": "...",
"message": "Hello",
"user_id": "user-123"
}
⚠️ Error Handling¶
All errors follow a standardized format:
{
"error": {
"code": "ERROR_CODE",
"message": "Human readable message",
"trace_id": "abc-123",
"details": {}
}
}
Common Error Codes¶
| Code | HTTP Status | Description |
|---|---|---|
| `SESSION_NOT_FOUND` | 404 | Session ID does not exist |
| `VALIDATION_ERROR` | 400 | Invalid request parameters |
| `AUTHENTICATION_FAILED` | 401 | Invalid or missing token |
| `TOKEN_BUDGET_EXCEEDED` | 400 | Request exceeds token limits |
| `LLM_SERVICE_ERROR` | 502 | Upstream LLM provider failed |
| `INTERNAL_ERROR` | 500 | Unexpected server error |
📋 Implementation Status¶
✅ Fully Implemented¶
| Endpoint | Method | Description |
|---|---|---|
| `/v1/users` | POST | User registration |
| `/v1/users/me` | GET | Get current user |
| `/v1/users/me/tokens` | POST | Create token |
| `/v1/users/me/tokens` | GET | List tokens |
| `/v1/users/me/tokens/{id}` | DELETE | Revoke token |
| `/v1/users/me/api-key` | PUT | Update Switch.AI API key |
| `/v1/users/me/sessions` | GET | List user sessions |
| `/v1/users/me/profile` | GET | Get user profile |
| `/v1/users/me/profile` | PATCH | Update profile |
| `/v1/users/me/profile` | DELETE | Clear profile |
| `/v1/session` | POST | Create session |
| `/v1/session/{id}` | GET | Get session |
| `/v1/session/{id}` | PATCH | Update session |
| `/v1/session/{id}` | DELETE | Delete session |
| `/v1/session/{id}/history` | GET | Get history |
| `/v1/chat` | POST | Chat completion |
| `/v1/memory/search` | POST | Semantic memory search |
| `/v1/memory/me` | GET | List user memories |
| `/v1/memory/{id}` | DELETE | Delete single memory |
| `/v1/memory/user/all` | DELETE | Delete all user memories (GDPR) |
| `/v1/memory/duplicates` | GET | Find duplicate memory groups |
| `/v1/memory/deduplicate` | POST | Merge duplicate memories |
| `/health` | GET | Basic health |
| `/health/live` | GET | Detailed health |
| `/ready` | GET | Readiness check |
🚧 Planned (Not Yet Implemented)¶
| Endpoint | Method | Description |
|---|---|---|
| `/v1/session/{id}/context` | GET | Get assembled context |
| `/v1/memory/consolidate` | POST | Force memory consolidation |
🔧 Configuration¶
Environment Variables¶
| Variable | Required | Default | Description |
|---|---|---|---|
| `DATABASE_URL` | ✅ | - | PostgreSQL connection URL |
| `REDIS_URL` | ✅ | - | Redis connection URL |
| `ENCRYPTION_KEY` | ✅ | - | Fernet key for data encryption |
| `LLM_BASE_URL` | ❌ | Switch.AI | LLM provider base URL |
| `EMBEDDING_BASE_URL` | ❌ | Switch.AI | Embedding provider URL |
| `SWITCH_AI_API_KEY` | ❌ | - | Default API key (fallback when no user key is set) |
| `MODEL_FAST` | ❌ | `openai/gemini-2.5-flash` | Fast model with provider prefix |
| `MODEL_BALANCED` | ❌ | `openai/llama-3.3-70b-versatile` | Balanced model with provider prefix |
| `MODEL_POWERFUL` | ❌ | `openai/deepseek-r1-distill-llama-70b` | Powerful model with provider prefix |
| `CELERY_BROKER_URL` | ❌ | `redis://redis:6379/1` | Celery message broker URL |
| `CELERY_RESULT_BACKEND` | ❌ | `redis://redis:6379/2` | Celery task results store URL |
| `MEMORY_TEMPORAL_DECAY` | ❌ | `0.05` | Memory decay rate per day (0.0-1.0) |
| `MEMORY_MIN_IMPORTANCE` | ❌ | `0.3` | Minimum importance score to store memory (0.0-1.0) |
| `MEMORY_MAX_AGE_DAYS` | ❌ | `365` | Maximum age in days for memories in search |
| `MEMORY_DEDUP_SIMILARITY_THRESHOLD` | ❌ | `0.85` | Minimum similarity to consider memories as duplicates (0.0-1.0) |
| `MEMORY_DEDUP_LLM_CHECK_ENABLED` | ❌ | `true` | Enable LLM-based semantic equivalence check for borderline cases |
| `MEMORY_DEDUP_LLM_THRESHOLD_LOW` | ❌ | `0.85` | Lower bound similarity for LLM equivalence check |
| `MEMORY_DEDUP_LLM_THRESHOLD_HIGH` | ❌ | `0.95` | Upper bound similarity (above this, skip LLM check) |
⚠️ LiteLLM Provider Prefixes: Model names must include provider prefixes when using LiteLLM routing. Common prefixes:
- `openai/` - For OpenAI-compatible endpoints (including SwitchAI)
- `anthropic/` - For Anthropic models
- `google/` - For Google Gemini models
- See LiteLLM docs for full list
Memory System Configuration¶
The memory system uses temporal decay and importance scoring to prioritize relevant information:
Temporal Decay (MEMORY_TEMPORAL_DECAY):
- Controls how quickly memories lose relevance over time
- Formula: weight = exp(-decay_rate × days_old)
- Default 0.05 means memories retain ~22% weight after 30 days
- Lower values = slower decay (memories stay relevant longer)
- Higher values = faster decay (prioritize recent information)
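The decay formula above can be verified in a couple of lines (a sketch of the documented formula, not the server's code):

```python
import math


def temporal_weight(days_old: float, decay_rate: float = 0.05) -> float:
    """Relevance weight of a memory after `days_old` days: exp(-decay_rate * days_old)."""
    return math.exp(-decay_rate * days_old)


# With the default rate of 0.05, a 30-day-old memory retains ~22% weight:
print(round(temporal_weight(30), 3))  # 0.223
```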
Minimum Importance (MEMORY_MIN_IMPORTANCE):
- Filters out low-importance memories during storage
- Memories below this threshold are not saved
- Default 0.3 filters trivial information while keeping preferences (0.7) and critical info (0.9)
Maximum Age (MEMORY_MAX_AGE_DAYS):
- Excludes memories older than this from search results
- Improves performance by reducing search space
- Default 365 days (1 year)
Memory Deduplication Configuration¶
The memory system automatically deduplicates semantically similar facts during extraction to prevent storing redundant information like "User's name is Sebastian" and "User is called Sebastian" as separate memories.
Similarity Threshold (MEMORY_DEDUP_SIMILARITY_THRESHOLD):
- Minimum cosine similarity to consider two memories as duplicates
- Default 0.85 catches most semantic duplicates
- Higher values = stricter matching (fewer duplicates detected)
- Lower values = looser matching (more aggressive deduplication)
LLM Check Enabled (MEMORY_DEDUP_LLM_CHECK_ENABLED):
- When true, uses LLM to verify semantic equivalence for borderline cases
- Adds accuracy but increases latency and cost
- Set to false to rely solely on embedding similarity
LLM Threshold Range (MEMORY_DEDUP_LLM_THRESHOLD_LOW / HIGH):
- Defines the "borderline" similarity range where LLM check is triggered
- Below LOW: Not a duplicate (store the memory)
- Between LOW and HIGH: Ask LLM to verify equivalence
- Above HIGH: Definite duplicate (skip without LLM check)
- Default range [0.85, 0.95] balances accuracy and cost
Deduplication Flow:
New Fact → Normalize → Generate Embedding → Search Similar
↓
similarity < 0.85 → STORE
similarity ≥ 0.95 → SKIP (duplicate)
0.85 ≤ similarity < 0.95 → LLM Check
↓
LLM says equivalent → SKIP
LLM says different → STORE
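The threshold logic in this flow reduces to a small decision function. This is an illustrative sketch; the defaults mirror `MEMORY_DEDUP_SIMILARITY_THRESHOLD` and `MEMORY_DEDUP_LLM_THRESHOLD_LOW`/`HIGH`:

```python
def dedup_decision(similarity: float, low: float = 0.85, high: float = 0.95) -> str:
    """Classify a new fact by its best cosine similarity to existing memories."""
    if similarity < low:
        return "STORE"      # clearly new information
    if similarity >= high:
        return "SKIP"       # definite duplicate, no LLM check needed
    return "LLM_CHECK"      # borderline: ask the LLM to verify equivalence


print(dedup_decision(0.60))  # STORE
print(dedup_decision(0.90))  # LLM_CHECK
print(dedup_decision(0.97))  # SKIP
```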