
🎯 Traylinx Cortex - Developer Quick Reference

Version: 2.3.0
Last Updated: December 4, 2025

📚 Navigation: Main README | Docs Index | API Reference | Database Schema


This is your one-page cheat sheet for understanding Traylinx Cortex. For deep dives, see the full documentation.


🧠 What is Cortex?

In one sentence: A stateful, memory-enabled middleware that sits between your app and LLMs, making AI conversations actually remember things.

The Problem It Solves:

  • LLMs are stateless (goldfish memory)
  • Passing full history is expensive and slow
  • No standard way to retrieve "facts" from past conversations

The Cortex Solution:

  • Short-Term Memory (STM): Redis cache of recent messages (4,000 token budget)
  • Long-Term Memory (LTM): PostgreSQL vector store of facts (searchable via embeddings)
  • Smart Routing: LiteLLM automatically picks the best/cheapest model


📐 Architecture (30-Second Version)

User Message → Cortex API
1. Fetch recent messages from Redis (STM)
2. Search vector DB for relevant facts (LTM)
3. Build smart prompt: System + Memories + History
4. Route to LLM (GPT-4, Claude, etc)
5. Stream response back to user
6. Save to DB + trigger background worker
Background: Extract facts, embed them, store in LTM
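
Below is a minimal, self-contained sketch of that loop. In-memory dicts stand in for Redis (STM) and pgvector (LTM), and `call_llm` is a placeholder for the LiteLLM routing step; all names here are illustrative, not actual Cortex internals.

```python
from collections import defaultdict

STM: dict[str, list[dict]] = defaultdict(list)  # session_id -> recent messages (Redis stand-in)
LTM: dict[str, list[str]] = defaultdict(list)   # user_id -> stored facts (pgvector stand-in)

def search_ltm(user_id: str, query: str) -> list[str]:
    # Cortex does a vector similarity search here; keyword overlap keeps this runnable.
    words = query.lower().split()
    return [fact for fact in LTM[user_id] if any(w in fact.lower() for w in words)]

def call_llm(prompt: str) -> str:
    return f"(model response to a {len(prompt)}-char prompt)"  # placeholder for LiteLLM

def handle_chat(user_id: str, session_id: str, message: str) -> str:
    history = STM[session_id]                      # 1. fetch recent messages (STM)
    facts = search_ltm(user_id, message)           # 2. retrieve relevant facts (LTM)
    prompt = "\n".join([                           # 3. system + memories + history
        "You are a helpful assistant.", *facts,
        *(m["content"] for m in history), message])
    reply = call_llm(prompt)                       # 4./5. route to a model, respond
    STM[session_id] += [{"role": "user", "content": message},
                        {"role": "assistant", "content": reply}]
    LTM[user_id].append(message)                   # 6. background consolidation, simplified
    return reply
```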

🗃️ Database Schema (Simplified)

| Table | Purpose | Key Columns |
|-------|---------|-------------|
| users | User accounts | id, email, switch_ai_api_key_encrypted |
| api_tokens | Bearer tokens | user_id, token_hash, token_prefix |
| user_profiles | User facts | user_id, app_id, facts (JSONB) |
| sessions | Conversation threads | id, user_id, app_id, title |
| messages | Raw chat log | session_id, role, content, token_count |
| memories | Vector store | user_id, content, embedding (VECTOR) |
| usage_logs | Cost tracking | session_id, model, tokens_in/out, cost_usd |
| langgraph_checkpoints | LangGraph state persistence | thread_id, checkpoint, updated_at |

Special: memories.embedding dimensions are configurable via the EMBEDDING_DIMENSIONS env var (default: 1024 for mistral-embed)
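
A small guard like this hypothetical helper catches mismatches before they reach the vector column (see also gotcha #1 below):

```python
import os

# Default matches mistral-embed; OpenAI embedding models use 1536.
EMBEDDING_DIMENSIONS = int(os.getenv("EMBEDDING_DIMENSIONS", "1024"))

def validate_embedding(vector: list[float]) -> list[float]:
    # Reject vectors whose length disagrees with the configured column size.
    if len(vector) != EMBEDDING_DIMENSIONS:
        raise ValueError(
            f"embedding has {len(vector)} dims, expected {EMBEDDING_DIMENSIONS}; "
            "did the embedding model change without updating EMBEDDING_DIMENSIONS?"
        )
    return vector
```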


🔐 Authentication

Cortex uses bearer token authentication similar to GitHub Personal Access Tokens.

Register & Get Token

POST /v1/users
{
  "id": "uuid",
  "email": "user@example.com",
  "token_name": "My Laptop"
}
 {"access_token": "ctx_abc123...", "token_type": "Bearer"}

Use Token

curl -H "Authorization: Bearer ctx_abc123..." \
  http://localhost:8000/v1/users/me

Token Management

GET  /v1/users/me/tokens     # List all tokens
POST /v1/users/me/tokens     # Create new token
DELETE /v1/users/me/tokens/{id}  # Revoke token
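
In Python, the full lifecycle might look like this; the `token_name` body on token creation and the `id` field in its response are assumptions, so check the API Reference for exact shapes:

```python
import requests

BASE = "http://localhost:8000"

# Register and receive the initial token (shown once; store it securely).
resp = requests.post(f"{BASE}/v1/users", json={
    "id": "550e8400-e29b-41d4-a716-446655440000",
    "email": "user@example.com",
    "token_name": "My Laptop",
})
token = resp.json()["access_token"]
headers = {"Authorization": f"Bearer {token}"}

# List existing tokens, mint a second one, then revoke it.
tokens = requests.get(f"{BASE}/v1/users/me/tokens", headers=headers).json()
new = requests.post(f"{BASE}/v1/users/me/tokens", headers=headers,
                    json={"token_name": "CI runner"}).json()
requests.delete(f"{BASE}/v1/users/me/tokens/{new['id']}", headers=headers)
```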

🔌 API Endpoints (Core)

Create Session

POST /v1/session
Authorization: Bearer ctx_abc123...
{
  "app_id": "my_app"
}
 {"session_id": "uuid"}

Send Message

POST /v1/chat
Authorization: Bearer ctx_abc123...
{
  "session_id": "uuid",
  "message": "Hello",
  "config": {
    "stream": true,
    "model_preference": "balanced",
    "switch_ai_api_key": "sk-user-key",
    "embedding_model": "mistral-embed"
  }
}
→ SSE stream of chunks
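
Consuming that stream from Python could look like the following; the `data:` line framing is standard SSE, while the `[DONE]` sentinel and the chunk schema are assumptions borrowed from the OpenAI convention:

```python
import json
import requests

resp = requests.post(
    "http://localhost:8000/v1/chat",
    headers={"Authorization": "Bearer ctx_abc123..."},
    json={"session_id": "uuid", "message": "Hello",
          "config": {"stream": True, "model_preference": "balanced"}},
    stream=True,  # keep the connection open and read chunks as they arrive
)
for line in resp.iter_lines():
    if line.startswith(b"data: "):
        payload = line[len(b"data: "):]
        if payload == b"[DONE]":  # assumed end-of-stream sentinel
            break
        print(json.loads(payload), flush=True)
```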

Clear Context

DELETE /v1/session/{session_id}
Authorization: Bearer ctx_abc123...
# Clears STM (Redis), keeps LTM

User Profile

# Get profile facts
GET /v1/users/me/profile?app_id=default

# Update profile facts (merge)
PATCH /v1/users/me/profile?app_id=default
{ "facts": { "name": "Sebastian", "location": "Madrid" } }

# Extract facts from any content (AI-powered)
POST /v1/users/me/profile/extract?app_id=default
{
  "text": "I'm a developer living in Berlin",     // Natural language
  "data": {"firstName": "John", "city": "NYC"},  // Structured JSON
  "raw": "name=Sebastian loc=Spain"              // Raw content
}
 {"facts": {"job": "developer", "city": "Berlin"}}

# Delete all profile facts
DELETE /v1/users/me/profile?app_id=default
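
A quick illustration of the merge semantics: successive PATCH calls accumulate facts instead of replacing the whole object.

```python
import requests

URL = "http://localhost:8000/v1/users/me/profile"
headers = {"Authorization": "Bearer ctx_abc123..."}
params = {"app_id": "default"}

requests.patch(URL, params=params, headers=headers,
               json={"facts": {"name": "Sebastian"}})
requests.patch(URL, params=params, headers=headers,
               json={"facts": {"location": "Madrid"}})

# Both facts are now present: {"name": "Sebastian", "location": "Madrid"}
profile = requests.get(URL, params=params, headers=headers).json()
```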

🧩 Key Technologies

| Component | Tech | Why? |
|-----------|------|------|
| Web Framework | FastAPI | Async, auto-docs, fast |
| Orchestration | LangGraph | State machines for conversations |
| LLM Router | LiteLLM | 100+ models, fallbacks, retries |
| Database | PostgreSQL 16 | Reliable, ACID, pgvector support |
| Vector Search | pgvector | HNSW index = millisecond queries |
| Cache | Redis 7 | Ultra-fast STM, Celery queue |
| Background Jobs | Celery | Async memory consolidation |
| PII Scrubbing | Presidio | Redact credit cards, SSNs |
| Auth | Traylinx Sentinel | A2A authentication |

🔑 Key Concepts

Token Budget

  • STM: 4,000 tokens (recent conversation)
  • LTM: 1,000 tokens (relevant facts from vector search)
  • Total Context: ~5,000 tokens (well under GPT-4's limit)
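
A minimal sketch of enforcing the STM budget with tiktoken (counting before sending, per gotcha #2 below); newest messages are kept, oldest dropped:

```python
import tiktoken

STM_BUDGET = 4_000
enc = tiktoken.get_encoding("cl100k_base")

def trim_to_budget(messages: list[str], budget: int = STM_BUDGET) -> list[str]:
    # Walk newest-to-oldest, keeping messages until the budget is spent.
    kept: list[str] = []
    total = 0
    for msg in reversed(messages):
        tokens = len(enc.encode(msg))
        if total + tokens > budget:
            break
        kept.append(msg)
        total += tokens
    return list(reversed(kept))  # restore chronological order
```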

Memory Consolidation (Background)

After every chat turn:

  1. Check if STM > 4,000 tokens → summarize the oldest messages
  2. Extract new facts from the conversation
  3. Normalize facts (e.g., "User is called X" → "User's name is X")
  4. Deduplicate against existing memories (similarity ≥ 0.85)
  5. Embed facts → store in the memories table

Memory Deduplication

Prevents storing semantically identical facts with different phrasings (see the similarity sketch below):

  • Automatic: New facts are deduplicated during extraction
  • Manual cleanup: GET /v1/memory/duplicates + POST /v1/memory/deduplicate
  • Configurable: Adjust thresholds via MEMORY_DEDUP_* env vars
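
The core of the check is a cosine-similarity comparison against the threshold. This standalone sketch shows the idea; Cortex itself compares pgvector embeddings in the database:

```python
import math

DEDUP_THRESHOLD = 0.85  # mirrors the similarity >= 0.85 rule above

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def is_duplicate(new_vec: list[float], existing: list[list[float]]) -> bool:
    # A new fact is a duplicate if any stored embedding is close enough.
    return any(cosine_similarity(new_vec, v) >= DEDUP_THRESHOLD for v in existing)
```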

Memory Search

Search endpoints support pagination and similarity thresholds:

  • Pagination: All list/search endpoints support limit (default 15) and offset
  • Similarity Threshold: min_similarity (default 0.6) filters noise from semantic searches
  • Ordering: Results are ordered by raw similarity (highest first) for consistency
  • Configurable: Set the MEMORY_SEARCH_MIN_SIMILARITY env var for the default threshold

LLM Routing Strategies

  • Fast: uses gpt-4o-mini or haiku (cheap)
  • Balanced: uses gpt-4o or claude-3-5-sonnet
  • Powerful: uses o1-preview or claude-opus
  • Dynamic Keys: Pass switch_ai_api_key in config for per-user billing

Base URLs: Configure LLM_BASE_URL and EMBEDDING_BASE_URL env vars to point to your proxy. API keys are passed dynamically per-request for billing attribution.
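
A hedged sketch of how a preference could map to a concrete model and how the per-user key flows through LiteLLM; the mapping table is illustrative, not Cortex's exact routing logic:

```python
import os
import litellm

# Illustrative mapping; mirrors the strategy names above.
PREFERENCE_MAP = {
    "fast": "gpt-4o-mini",
    "balanced": "gpt-4o",
    "powerful": "o1-preview",
}

def route_chat(messages: list[dict], preference: str, user_api_key: str):
    return litellm.completion(
        model=PREFERENCE_MAP.get(preference, "gpt-4o"),
        messages=messages,
        api_key=user_api_key,                # per-user billing attribution
        api_base=os.getenv("LLM_BASE_URL"),  # proxy, if configured
        stream=True,
    )
```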


🛡️ Security Features

  1. Token Authentication: SHA256 hashed tokens, never stored in plaintext
  2. API Key Encryption: User API keys encrypted at rest (Fernet)
  3. PII Scrubbing: Credit cards, SSNs, phone numbers auto-redacted before LLM
  4. Multi-Tenancy: All queries filtered by (app_id, user_id)
  5. Row-Level Security: PostgreSQL RLS enforces data isolation
  6. Sentinel Auth: All A2A requests require valid Bearer token
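
Items 1 and 2 in Python terms; key handling is simplified here, and production code would load the Fernet key from secret storage rather than generating it inline:

```python
import hashlib
import secrets
from cryptography.fernet import Fernet

def issue_token() -> tuple[str, str]:
    raw = "ctx_" + secrets.token_urlsafe(32)
    return raw, hashlib.sha256(raw.encode()).hexdigest()  # only the hash is stored

def verify_token(presented: str, stored_hash: str) -> bool:
    digest = hashlib.sha256(presented.encode()).hexdigest()
    return secrets.compare_digest(digest, stored_hash)  # constant-time compare

fernet = Fernet(Fernet.generate_key())      # demo only: load the key from secret storage
encrypted = fernet.encrypt(b"sk-user-key")  # API key at rest
plaintext = fernet.decrypt(encrypted)       # decrypted when calling the LLM
```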

📊 Observability

Tracing: Every request gets a trace_id (LangSmith or OpenTelemetry)
Logging: Structured JSON logs
Metrics: Cost per user, tokens, latency

Example Log:

{
  "trace_id": "abc-123",
  "event": "llm_response",
  "model": "gpt-4o",
  "tokens_in": 50,
  "tokens_out": 120,
  "cost_usd": 0.00015,
  "latency_ms": 850
}
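
One way to emit such a log; structlog is my assumption here, since the page only specifies structured JSON:

```python
import structlog

structlog.configure(processors=[structlog.processors.JSONRenderer()])
log = structlog.get_logger()

# Produces a JSON line shaped like the example above.
log.info("llm_response", trace_id="abc-123", model="gpt-4o",
         tokens_in=50, tokens_out=120, cost_usd=0.00015, latency_ms=850)
```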


🚀 Development Phases

| Phase | Goal | Duration |
|-------|------|----------|
| 1. Skeleton | FastAPI + Sentinel auth | 2 days |
| 2. Data Layer | Postgres + Redis setup | 2 days |
| 3. Orchestrator | LangGraph + LiteLLM | 3 days |
| 4. LTM | Vector search + Celery | 3 days |
| 5. Production | PII, tests, monitoring | 1+ week |

Total: ~2-3 weeks for MVP


✅ Definition of Done (DoD)

A deployment is production-ready when:

  • [x] POST /v1/chat returns streaming responses
  • [x] Memory Test: "My name is X" → restart → "What is my name?" = correct (see the test sketch below)
  • [x] Security Test: Invalid token → 401
  • [x] PII Test: "My card is 4111..." → Gets redacted
  • [x] Health checks pass (/health, /health/live)
  • [x] Code passes mypy + ruff
  • [x] Test coverage ≥ 80%
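
The Memory Test maps naturally to an integration test. This hypothetical version reuses the endpoint shapes from earlier on this page and assumes a non-streaming response body contains the answer text:

```python
import requests

BASE = "http://localhost:8000"
HEADERS = {"Authorization": "Bearer ctx_abc123..."}

def new_session() -> str:
    resp = requests.post(f"{BASE}/v1/session", headers=HEADERS, json={"app_id": "test"})
    return resp.json()["session_id"]

def chat(session_id: str, message: str) -> str:
    resp = requests.post(f"{BASE}/v1/chat", headers=HEADERS,
                         json={"session_id": session_id, "message": message,
                               "config": {"stream": False}})  # assumed non-streaming mode
    return resp.text

def test_memory_survives_new_session():
    chat(new_session(), "My name is Ada.")
    # A fresh session has an empty STM, so the answer must come from LTM.
    assert "Ada" in chat(new_session(), "What is my name?")
```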

🐛 Common Gotchas

  1. Embedding Mismatch: If you change embedding models, set EMBEDDING_DIMENSIONS env var to match (e.g., 1024 for mistral-embed, 1536 for OpenAI)
  2. Token Overflow: Always count tokens BEFORE sending to LLM (use tiktoken)
  3. Session Expiry: Redis STM has a 24h TTL; LTM persists forever
  4. PII in Vectors: Never store unsanitized content in memories table
  5. Celery Retry: Background jobs WILL fail - ensure retry logic configured
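
For gotcha #5, Celery's auto-retry options cover the common case; the task name and settings below are illustrative:

```python
from celery import Celery

app = Celery("cortex", broker="redis://localhost:6379/0")

# Retries up to 3 times with exponential backoff on any exception.
@app.task(autoretry_for=(Exception,), retry_backoff=True, max_retries=3)
def consolidate_memory(session_id: str) -> None:
    ...  # extract facts, embed them, store in LTM
```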

📚 Where to Learn More

| Question | Document |
|----------|----------|
| Why does this exist? | docs/gcam_research.md |
| How does it work? | docs/specs_v2_final.md |
| What's the database design? | docs/DATABASE_SCHEMA.md |
| How do I build it? | docs/implementation_plan_v2_final.md |
| How does it integrate? | docs/integration_guide.md |

🎓 Mental Model

Think of Cortex as a brain prosthetic for LLMs:

  • Hippocampus (LTM): Stores long-term facts
  • Working Memory (STM): Keeps recent context active
  • Cerebellum (LangGraph): Coordinates the retrieval-thinking-response flow
  • Corpus Callosum (LiteLLM): Connects to different "thought processes" (models)

Without Cortex, your LLM is like someone with amnesia: brilliant in the moment, but it forgets everything immediately.


💬 Quick Commands

# Start everything
docker-compose up -d

# Run migrations
alembic upgrade head

# Start API
uvicorn app.main:app --reload

# Run tests
pytest tests/ -v --cov=app

# Type check
mypy app/

# Lint
ruff check app/

# Register user & get token
curl -X POST localhost:8000/v1/users \
  -H "Content-Type: application/json" \
  -d '{"id": "550e8400-e29b-41d4-a716-446655440000", "email": "test@example.com"}'

# Create session (with token)
curl -X POST localhost:8000/v1/session \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ctx_your_token_here" \
  -d '{"app_id": "test"}'

Need Help? Read docs/specs_v2_final.md → Contact team → Start coding! 🚀


**[← Back to Docs Index](./README.md)** | **[Main README](../README.md)** | **[API Reference →](./api_reference.md)**