
🎯 Traylinx Cortex - Developer Quick Reference

Version: 2.3.0
Last Updated: December 4, 2025

📚 Navigation: Main README | Docs Index | API Reference | Database Schema


This is your one-page cheat sheet for understanding Traylinx Cortex. For deep dives, see the full documentation.


🧠 What is Cortex?

In one sentence: A stateful, memory-enabled middleware that sits between your app and LLMs, making AI conversations actually remember things.

The Problem It Solves:

  • LLMs are stateless (goldfish memory)
  • Passing full history is expensive and slow
  • No standard way to retrieve "facts" from past conversations

The Cortex Solution:

  • Short-Term Memory (STM): Redis cache of recent messages (4,000 token budget)
  • Long-Term Memory (LTM): PostgreSQL vector store of facts (searchable via embeddings)
  • Smart Routing: LiteLLM automatically picks the best/cheapest model


📐 Architecture (30-Second Version)

User Message → Cortex API
1. Fetch recent messages from Redis (STM)
2. Search vector DB for relevant facts (LTM)
3. Build smart prompt: System + Memories + History
4. Route to LLM (GPT-4, Claude, etc)
5. Stream response back to user
6. Save to DB + trigger background worker
Background: Extract facts, embed them, store in LTM
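
Below is a minimal, self-contained sketch of that loop. In-memory dicts stand in for Redis (STM) and pgvector (LTM), and `call_llm` is a placeholder for the LiteLLM routing step; all names here are illustrative, not actual Cortex internals.

```python
from collections import defaultdict

STM: dict[str, list[dict]] = defaultdict(list)  # session_id -> recent messages (Redis stand-in)
LTM: dict[str, list[str]] = defaultdict(list)   # user_id -> stored facts (pgvector stand-in)

def search_ltm(user_id: str, query: str) -> list[str]:
    # Cortex does a vector similarity search here; keyword overlap keeps this runnable.
    words = query.lower().split()
    return [fact for fact in LTM[user_id] if any(w in fact.lower() for w in words)]

def call_llm(prompt: str) -> str:
    return f"(model response to a {len(prompt)}-char prompt)"  # placeholder for LiteLLM

def handle_chat(user_id: str, session_id: str, message: str) -> str:
    history = STM[session_id]                      # 1. fetch recent messages (STM)
    facts = search_ltm(user_id, message)           # 2. retrieve relevant facts (LTM)
    prompt = "\n".join([                           # 3. system + memories + history
        "You are a helpful assistant.", *facts,
        *(m["content"] for m in history), message])
    reply = call_llm(prompt)                       # 4./5. route to a model, respond
    STM[session_id] += [{"role": "user", "content": message},
                        {"role": "assistant", "content": reply}]
    LTM[user_id].append(message)                   # 6. background consolidation, simplified
    return reply
```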

🗃️ Database Schema (Simplified)

| Table | Purpose | Key Columns |
|-------|---------|-------------|
| users | User accounts | id, email, switch_ai_api_key_encrypted |
| api_tokens | Bearer tokens | user_id, token_hash, token_prefix |
| user_profiles | User facts | user_id, app_id, facts (JSONB) |
| sessions | Conversation threads | id, user_id, app_id, title |
| messages | Raw chat log | session_id, role, content, token_count |
| memories | Vector store | user_id, content, embedding (VECTOR) |
| usage_logs | Cost tracking | session_id, model, tokens_in/out, cost_usd |
| langgraph_checkpoints | LangGraph state persistence | thread_id, checkpoint, updated_at |

Special: memories.embedding dimensions are configurable via the EMBEDDING_DIMENSIONS env var (default: 1024 for mistral-embed)
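
A small guard like this hypothetical helper catches mismatches before they reach the vector column (see also gotcha #1 below):

```python
import os

# Default matches mistral-embed; OpenAI embedding models use 1536.
EMBEDDING_DIMENSIONS = int(os.getenv("EMBEDDING_DIMENSIONS", "1024"))

def validate_embedding(vector: list[float]) -> list[float]:
    # Reject vectors whose length disagrees with the configured column size.
    if len(vector) != EMBEDDING_DIMENSIONS:
        raise ValueError(
            f"embedding has {len(vector)} dims, expected {EMBEDDING_DIMENSIONS}; "
            "did the embedding model change without updating EMBEDDING_DIMENSIONS?"
        )
    return vector
```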


🔐 Authentication

Cortex uses bearer token authentication similar to GitHub Personal Access Tokens.

Register & Get Token

POST /v1/users
{
  "id": "uuid",
  "email": "user@example.com",
  "token_name": "My Laptop"
}
 {"access_token": "ctx_abc123...", "token_type": "Bearer"}

Use Token

curl -H "Authorization: Bearer ctx_abc123..." \
  http://localhost:8000/v1/users/me

Token Management

GET  /v1/users/me/tokens     # List all tokens
POST /v1/users/me/tokens     # Create new token
DELETE /v1/users/me/tokens/{id}  # Revoke token
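
In Python, the full lifecycle might look like this; the `token_name` body on token creation and the `id` field in its response are assumptions, so check the API Reference for exact shapes:

```python
import requests

BASE = "http://localhost:8000"

# Register and receive the initial token (shown once; store it securely).
resp = requests.post(f"{BASE}/v1/users", json={
    "id": "550e8400-e29b-41d4-a716-446655440000",
    "email": "user@example.com",
    "token_name": "My Laptop",
})
token = resp.json()["access_token"]
headers = {"Authorization": f"Bearer {token}"}

# List existing tokens, mint a second one, then revoke it.
tokens = requests.get(f"{BASE}/v1/users/me/tokens", headers=headers).json()
new = requests.post(f"{BASE}/v1/users/me/tokens", headers=headers,
                    json={"token_name": "CI runner"}).json()
requests.delete(f"{BASE}/v1/users/me/tokens/{new['id']}", headers=headers)
```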

🔌 API Endpoints (Core)

Create Session

POST /v1/session
Authorization: Bearer ctx_abc123...
{
  "app_id": "my_app"
}
 {"session_id": "uuid"}

Send Message

POST /v1/chat
Authorization: Bearer ctx_abc123...
{
  "session_id": "uuid",
  "message": "Hello",
  "config": {
    "stream": true,
    "model_preference": "balanced",
    "switch_ai_api_key": "sk-user-key",
    "embedding_model": "mistral-embed"
  }
}
→ SSE stream of chunks
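
Consuming that stream from Python could look like the following; the `data:` line framing is standard SSE, while the `[DONE]` sentinel and the chunk schema are assumptions borrowed from the OpenAI convention:

```python
import json
import requests

resp = requests.post(
    "http://localhost:8000/v1/chat",
    headers={"Authorization": "Bearer ctx_abc123..."},
    json={"session_id": "uuid", "message": "Hello",
          "config": {"stream": True, "model_preference": "balanced"}},
    stream=True,  # keep the connection open and read chunks as they arrive
)
for line in resp.iter_lines():
    if line.startswith(b"data: "):
        payload = line[len(b"data: "):]
        if payload == b"[DONE]":  # assumed end-of-stream sentinel
            break
        print(json.loads(payload), flush=True)
```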

Clear Context

DELETE /v1/session/{session_id}
Authorization: Bearer ctx_abc123...
# Clears STM (Redis), keeps LTM

User Profile

# Get profile facts
GET /v1/users/me/profile?app_id=default

# Update profile facts (merge)
PATCH /v1/users/me/profile?app_id=default
{ "facts": { "name": "Sebastian", "location": "Madrid" } }

# Extract facts from any content (AI-powered)
POST /v1/users/me/profile/extract?app_id=default
{
  "text": "I'm a developer living in Berlin",     // Natural language
  "data": {"firstName": "John", "city": "NYC"},  // Structured JSON
  "raw": "name=Sebastian loc=Spain"              // Raw content
}
 {"facts": {"job": "developer", "city": "Berlin"}}

# Delete all profile facts
DELETE /v1/users/me/profile?app_id=default
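
A quick illustration of the merge semantics: successive PATCH calls accumulate facts instead of replacing the whole object.

```python
import requests

URL = "http://localhost:8000/v1/users/me/profile"
headers = {"Authorization": "Bearer ctx_abc123..."}
params = {"app_id": "default"}

requests.patch(URL, params=params, headers=headers,
               json={"facts": {"name": "Sebastian"}})
requests.patch(URL, params=params, headers=headers,
               json={"facts": {"location": "Madrid"}})

# Both facts are now present: {"name": "Sebastian", "location": "Madrid"}
profile = requests.get(URL, params=params, headers=headers).json()
```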

🧩 Key Technologies

| Component | Tech | Why? |
|-----------|------|------|
| Web Framework | FastAPI | Async, auto-docs, fast |
| Orchestration | LangGraph | State machines for conversations |
| LLM Router | LiteLLM | 100+ models, fallbacks, retries |
| Database | PostgreSQL 16 | Reliable, ACID, pgvector support |
| Vector Search | pgvector | HNSW index = millisecond queries |
| Cache | Redis 7 | Ultra-fast STM, Celery queue |
| Background Jobs | Celery | Async memory consolidation |
| PII Scrubbing | Presidio | Redact credit cards, SSNs |
| Auth | Traylinx Sentinel | A2A authentication |

🔑 Key Concepts

Token Budget

  • STM: 4,000 tokens (recent conversation)
  • LTM: 1,000 tokens (relevant facts from vector search)
  • Total Context: ~5,000 tokens (well under GPT-4's limit)
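
A minimal sketch of enforcing the STM budget with tiktoken (counting before sending, per gotcha #2 below); newest messages are kept, oldest dropped:

```python
import tiktoken

STM_BUDGET = 4_000
enc = tiktoken.get_encoding("cl100k_base")

def trim_to_budget(messages: list[str], budget: int = STM_BUDGET) -> list[str]:
    # Walk newest-to-oldest, keeping messages until the budget is spent.
    kept: list[str] = []
    total = 0
    for msg in reversed(messages):
        tokens = len(enc.encode(msg))
        if total + tokens > budget:
            break
        kept.append(msg)
        total += tokens
    return list(reversed(kept))  # restore chronological order
```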

Memory Consolidation (Background)

After every chat turn:

  1. Check if STM > 4,000 tokens → summarize the oldest messages
  2. Extract new facts from the conversation
  3. Normalize facts (e.g., "User is called X" → "User's name is X")
  4. Deduplicate against existing memories (similarity ≥ 0.85)
  5. Embed facts → store in the memories table

Memory Deduplication

Prevents storing semantically identical facts with different phrasings (see the similarity sketch below):

  • Automatic: New facts are deduplicated during extraction
  • Manual cleanup: GET /v1/memory/duplicates + POST /v1/memory/deduplicate
  • Configurable: Adjust thresholds via MEMORY_DEDUP_* env vars
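
The core of the check is a cosine-similarity comparison against the threshold. This standalone sketch shows the idea; Cortex itself compares pgvector embeddings in the database:

```python
import math

DEDUP_THRESHOLD = 0.85  # mirrors the similarity >= 0.85 rule above

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def is_duplicate(new_vec: list[float], existing: list[list[float]]) -> bool:
    # A new fact is a duplicate if any stored embedding is close enough.
    return any(cosine_similarity(new_vec, v) >= DEDUP_THRESHOLD for v in existing)
```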

Memory Search

Search endpoints support pagination and similarity thresholds:

  • Pagination: All list/search endpoints support limit (default 15) and offset
  • Similarity Threshold: min_similarity (default 0.6) filters noise from semantic searches
  • Ordering: Results are ordered by raw similarity (highest first) for consistency
  • Configurable: Set the MEMORY_SEARCH_MIN_SIMILARITY env var for the default threshold

LLM Routing Strategies

  • Fast: uses gpt-4o-mini or haiku (cheap)
  • Balanced: uses gpt-4o or claude-3-5-sonnet
  • Powerful: uses o1-preview or claude-opus
  • Dynamic Keys: Pass switch_ai_api_key in config for per-user billing

Base URLs: Configure LLM_BASE_URL and EMBEDDING_BASE_URL env vars to point to your proxy. API keys are passed dynamically per-request for billing attribution.
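
A hedged sketch of how a preference could map to a concrete model and how the per-user key flows through LiteLLM; the mapping table is illustrative, not Cortex's exact routing logic:

```python
import os
import litellm

# Illustrative mapping; mirrors the strategy names above.
PREFERENCE_MAP = {
    "fast": "gpt-4o-mini",
    "balanced": "gpt-4o",
    "powerful": "o1-preview",
}

def route_chat(messages: list[dict], preference: str, user_api_key: str):
    return litellm.completion(
        model=PREFERENCE_MAP.get(preference, "gpt-4o"),
        messages=messages,
        api_key=user_api_key,                # per-user billing attribution
        api_base=os.getenv("LLM_BASE_URL"),  # proxy, if configured
        stream=True,
    )
```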


🛡️ Security Features

  1. Token Authentication: SHA256 hashed tokens, never stored in plaintext
  2. API Key Encryption: User API keys encrypted at rest (Fernet)
  3. PII Scrubbing: Credit cards, SSNs, phone numbers auto-redacted before LLM
  4. Multi-Tenancy: All queries filtered by (app_id, user_id)
  5. Row-Level Security: PostgreSQL RLS enforces data isolation
  6. Sentinel Auth: All A2A requests require valid Bearer token
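
Items 1 and 2 in Python terms; key handling is simplified here, and production code would load the Fernet key from secret storage rather than generating it inline:

```python
import hashlib
import secrets
from cryptography.fernet import Fernet

def issue_token() -> tuple[str, str]:
    raw = "ctx_" + secrets.token_urlsafe(32)
    return raw, hashlib.sha256(raw.encode()).hexdigest()  # only the hash is stored

def verify_token(presented: str, stored_hash: str) -> bool:
    digest = hashlib.sha256(presented.encode()).hexdigest()
    return secrets.compare_digest(digest, stored_hash)  # constant-time compare

fernet = Fernet(Fernet.generate_key())      # demo only: load the key from secret storage
encrypted = fernet.encrypt(b"sk-user-key")  # API key at rest
plaintext = fernet.decrypt(encrypted)       # decrypted when calling the LLM
```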

📊 Observability

Tracing: Every request gets a trace_id (LangSmith or OpenTelemetry)
Logging: Structured JSON logs
Metrics: Cost per user, tokens, latency

Example Log:

{
  "trace_id": "abc-123",
  "event": "llm_response",
  "model": "gpt-4o",
  "tokens_in": 50,
  "tokens_out": 120,
  "cost_usd": 0.00015,
  "latency_ms": 850
}
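
One way to emit such a log; structlog is my assumption here, since the page only specifies structured JSON:

```python
import structlog

structlog.configure(processors=[structlog.processors.JSONRenderer()])
log = structlog.get_logger()

# Produces a JSON line shaped like the example above.
log.info("llm_response", trace_id="abc-123", model="gpt-4o",
         tokens_in=50, tokens_out=120, cost_usd=0.00015, latency_ms=850)
```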


🚀 Development Phases

| Phase | Goal | Duration |
|-------|------|----------|
| 1. Skeleton | FastAPI + Sentinel auth | 2 days |
| 2. Data Layer | Postgres + Redis setup | 2 days |
| 3. Orchestrator | LangGraph + LiteLLM | 3 days |
| 4. LTM | Vector search + Celery | 3 days |
| 5. Production | PII, tests, monitoring | 1+ week |

Total: ~2-3 weeks for MVP


✅ Definition of Done (DoD)

A deployment is production-ready when:

  • [x] POST /v1/chat returns streaming responses
  • [x] Memory Test: "My name is X" → restart → "What is my name?" = correct (see the test sketch below)
  • [x] Security Test: Invalid token → 401
  • [x] PII Test: "My card is 4111..." → Gets redacted
  • [x] Health checks pass (/health, /health/live)
  • [x] Code passes mypy + ruff
  • [x] Test coverage ≥ 80%
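
The Memory Test maps naturally to an integration test. This hypothetical version reuses the endpoint shapes from earlier on this page and assumes a non-streaming response body contains the answer text:

```python
import requests

BASE = "http://localhost:8000"
HEADERS = {"Authorization": "Bearer ctx_abc123..."}

def new_session() -> str:
    resp = requests.post(f"{BASE}/v1/session", headers=HEADERS, json={"app_id": "test"})
    return resp.json()["session_id"]

def chat(session_id: str, message: str) -> str:
    resp = requests.post(f"{BASE}/v1/chat", headers=HEADERS,
                         json={"session_id": session_id, "message": message,
                               "config": {"stream": False}})  # assumed non-streaming mode
    return resp.text

def test_memory_survives_new_session():
    chat(new_session(), "My name is Ada.")
    # A fresh session has an empty STM, so the answer must come from LTM.
    assert "Ada" in chat(new_session(), "What is my name?")
```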

🐛 Common Gotchas

  1. Embedding Mismatch: If you change embedding models, set EMBEDDING_DIMENSIONS env var to match (e.g., 1024 for mistral-embed, 1536 for OpenAI)
  2. Token Overflow: Always count tokens BEFORE sending to LLM (use tiktoken)
  3. Session Expiry: Redis STM has a 24h TTL; LTM persists forever
  4. PII in Vectors: Never store unsanitized content in memories table
  5. Celery Retry: Background jobs WILL fail - ensure retry logic configured
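
For gotcha #5, Celery's auto-retry options cover the common case; the task name and settings below are illustrative:

```python
from celery import Celery

app = Celery("cortex", broker="redis://localhost:6379/0")

# Retries up to 3 times with exponential backoff on any exception.
@app.task(autoretry_for=(Exception,), retry_backoff=True, max_retries=3)
def consolidate_memory(session_id: str) -> None:
    ...  # extract facts, embed them, store in LTM
```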

📚 Where to Learn More

| Question | Document |
|----------|----------|
| Why does this exist? | docs/gcam_research.md |
| How does it work? | docs/specs_v2_final.md |
| What's the database design? | docs/DATABASE_SCHEMA.md |
| How do I build it? | docs/implementation_plan_v2_final.md |
| How does it integrate? | docs/integration_guide.md |

🎓 Mental Model

Think of Cortex as a brain prosthetic for LLMs:

  • Hippocampus (LTM): Stores long-term facts
  • Working Memory (STM): Keeps recent context active
  • Cerebellum (LangGraph): Coordinates the retrieval-thinking-response flow
  • Corpus Callosum (LiteLLM): Connects to different "thought processes" (models)

Without Cortex, your LLM is like someone with amnesia: brilliant in the moment, but it forgets everything immediately.


💬 Quick Commands

# Start everything
docker-compose up -d

# Run migrations
alembic upgrade head

# Start API
uvicorn app.main:app --reload

# Run tests
pytest tests/ -v --cov=app

# Type check
mypy app/

# Lint
ruff check app/

# Register user & get token
curl -X POST localhost:8000/v1/users \
  -H "Content-Type: application/json" \
  -d '{"id": "550e8400-e29b-41d4-a716-446655440000", "email": "test@example.com"}'

# Create session (with token)
curl -X POST localhost:8000/v1/session \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ctx_your_token_here" \
  -d '{"app_id": "test"}'

Need Help? Read docs/specs_v2_final.md → Contact team → Start coding! 🚀


**[← Back to Docs Index](./README.md)** | **[Main README](../README.md)** | **[API Reference →](./api_reference.md)**