🎯 Traylinx Cortex - Developer Quick Reference¶
Version: 2.3.0
Last Updated: December 4, 2025
📚 Navigation: Main README | Docs Index | API Reference | Database Schema
This is your one-page cheat sheet for understanding Traylinx Cortex. For deep dives, see the full documentation.
🧠 What is Cortex?¶
In one sentence: A stateful, memory-enabled middleware that sits between your app and LLMs, making AI conversations actually remember things.
The Problem It Solves:
- LLMs are stateless (goldfish memory)
- Passing full history is expensive and slow
- No standard way to retrieve "facts" from past conversations
The Cortex Solution:
- Short-Term Memory (STM): Redis cache of recent messages (4,000 token budget)
- Long-Term Memory (LTM): PostgreSQL vector store of facts (searchable via embeddings)
- Smart Routing: LiteLLM automatically picks the best/cheapest model
📐 Architecture (30-Second Version)¶
User Message → Cortex API
↓
1. Fetch recent messages from Redis (STM)
2. Search vector DB for relevant facts (LTM)
3. Build smart prompt: System + Memories + History
4. Route to LLM (GPT-4, Claude, etc.)
5. Stream response back to user
6. Save to DB + trigger background worker
↓
Background: Extract facts, embed them, store in LTM
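In code terms, one turn looks roughly like this - a toy sketch with in-memory dicts standing in for Redis and pgvector, not the actual Cortex internals:
STM: dict[str, list[str]] = {}   # session_id -> recent messages (Redis in the real system)
LTM: dict[str, list[str]] = {}   # user_id -> stored facts (pgvector in the real system)

def handle_turn(user_id: str, session_id: str, message: str) -> str:
    history = STM.get(session_id, [])[-10:]          # 1. fetch recent messages
    facts = LTM.get(user_id, [])                     # 2. real Cortex does a vector search here
    prompt = "\n".join(["SYSTEM: you have memory.", *facts, *history, f"USER: {message}"])  # 3.
    reply = f"(model reply built from {len(prompt)} chars of context)"  # 4./5. LLM call + streaming
    STM.setdefault(session_id, []).extend([f"USER: {message}", f"ASSISTANT: {reply}"])      # 6. save
    return reply                                     # a Celery job consolidates in the background

print(handle_turn("u1", "s1", "My name is Ada"))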
🗃️ Database Schema (Simplified)¶
| Table | Purpose | Key Columns |
|---|---|---|
| users | User accounts | id, email, switch_ai_api_key_encrypted |
| api_tokens | Bearer tokens | user_id, token_hash, token_prefix |
| user_profiles | User facts | user_id, app_id, facts (JSONB) |
| sessions | Conversation threads | id, user_id, app_id, title |
| messages | Raw chat log | session_id, role, content, token_count |
| memories | Vector store | user_id, content, embedding (VECTOR) |
| usage_logs | Cost tracking | session_id, model, tokens_in/out, cost_usd |
| langgraph_checkpoints | LangGraph state persistence | thread_id, checkpoint, updated_at |
Special: the memories.embedding vector dimension is configurable via the EMBEDDING_DIMENSIONS env var (default: 1024 for mistral-embed)
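A typical read against that table uses pgvector's cosine-distance operator. A sketch with psycopg - the connection string is a placeholder and the query shape is an assumption, but the table and columns match the schema above:
import psycopg

vec = "[" + ",".join(["0.1"] * 1024) + "]"   # placeholder 1024-dim embedding as a pgvector literal

with psycopg.connect("postgresql://localhost/cortex") as conn:
    rows = conn.execute(
        """
        SELECT content, 1 - (embedding <=> %s::vector) AS similarity
        FROM memories
        WHERE user_id = %s
        ORDER BY embedding <=> %s::vector    -- <=> is cosine distance; the HNSW index makes this fast
        LIMIT 5
        """,
        (vec, "00000000-0000-0000-0000-000000000000", vec),
    ).fetchall()
print(rows)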
🔐 Authentication¶
Cortex uses bearer token authentication similar to GitHub Personal Access Tokens.
Register & Get Token¶
POST /v1/users
{
"id": "uuid",
"email": "user@example.com",
"token_name": "My Laptop"
}
→ {"access_token": "ctx_abc123...", "token_type": "Bearer"}
Use Token¶
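Pass the token in the Authorization header on every subsequent request:
GET /v1/users/me/tokens
Authorization: Bearer ctx_abc123...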
Token Management¶
GET /v1/users/me/tokens # List all tokens
POST /v1/users/me/tokens # Create new token
DELETE /v1/users/me/tokens/{id} # Revoke token
🔌 API Endpoints (Core)¶
Create Session¶
POST /v1/session
Authorization: Bearer ctx_abc123...
{
"app_id": "my_app"
}
→ {"session_id": "uuid"}
Send Message¶
POST /v1/chat
Authorization: Bearer ctx_abc123...
{
"session_id": "uuid",
"message": "Hello",
"config": {
"stream": true,
"model_preference": "balanced",
"switch_ai_api_key": "sk-user-key",
"embedding_model": "mistral-embed"
}
}
→ SSE stream of chunks
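Consuming the stream from Python, assuming standard SSE framing with data: lines (adjust to the actual event format):
import httpx

headers = {"Authorization": "Bearer ctx_abc123..."}
payload = {"session_id": "uuid", "message": "Hello", "config": {"stream": True}}

with httpx.stream("POST", "http://localhost:8000/v1/chat",
                  json=payload, headers=headers, timeout=None) as resp:
    for line in resp.iter_lines():
        if line.startswith("data: "):      # standard SSE framing (assumed)
            print(line[len("data: "):], flush=True)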
Clear Context¶
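The exact route isn't reproduced in this cheat sheet, so treat the shape below as an assumption and confirm the path in the API Reference:
DELETE /v1/session/{session_id}/context   # hypothetical path - confirm in the API Reference
Authorization: Bearer ctx_abc123...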
User Profile¶
# Get profile facts
GET /v1/users/me/profile?app_id=default
# Update profile facts (merge)
PATCH /v1/users/me/profile?app_id=default
{ "facts": { "name": "Sebastian", "location": "Madrid" } }
# Extract facts from any content (AI-powered)
POST /v1/users/me/profile/extract?app_id=default
{
"text": "I'm a developer living in Berlin", // Natural language
"data": {"firstName": "John", "city": "NYC"}, // Structured JSON
"raw": "name=Sebastian loc=Spain" // Raw content
}
→ {"facts": {"job": "developer", "city": "Berlin"}}
# Delete all profile facts
DELETE /v1/users/me/profile?app_id=default
🧩 Key Technologies¶
| Component | Tech | Why? |
|---|---|---|
| Web Framework | FastAPI | Async, auto-docs, fast |
| Orchestration | LangGraph | State machines for conversations |
| LLM Router | LiteLLM | 100+ models, fallbacks, retries |
| Database | PostgreSQL 16 | Reliable, ACID, pgvector support |
| Vector Search | pgvector | HNSW index = millisecond queries |
| Cache | Redis 7 | Ultra-fast STM, Celery queue |
| Background Jobs | Celery | Async memory consolidation |
| PII Scrubbing | Presidio | Redact credit cards, SSNs |
| Auth | Traylinx Sentinel | A2A authentication |
🔑 Key Concepts¶
Token Budget¶
- STM: 4,000 tokens (recent conversation)
- LTM: 1,000 tokens (relevant facts from vector search)
- Total Context: ~5,000 tokens (well under GPT-4's limit)
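tiktoken (mentioned under Gotchas below) makes the budgeting concrete - a sketch, assuming the cl100k_base encoding:
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")   # assumed encoding; match it to your model

def trim_to_budget(messages: list[str], budget: int = 4000) -> list[str]:
    """Keep the newest messages whose combined token count fits the STM budget."""
    kept: list[str] = []
    used = 0
    for msg in reversed(messages):           # walk newest -> oldest
        n = len(enc.encode(msg))
        if used + n > budget:
            break
        kept.append(msg)
        used += n
    return kept[::-1]                        # back to chronological order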
Memory Consolidation (Background)¶
After every chat turn:
1. Check if STM > 4,000 tokens → Summarize oldest messages
2. Extract new facts from the conversation
3. Normalize facts (e.g., "User is called X" → "User's name is X")
4. Deduplicate against existing memories (similarity ≥ 0.85)
5. Embed facts → Store in memories table
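A toy version of steps 2-5 - the string matching below is a stand-in, since real extraction prompts an LLM and real deduplication compares embeddings:
def consolidate(turn_text: str, existing_facts: list[str]) -> list[str]:
    # 2. "extract" candidate facts (the real pipeline asks an LLM)
    candidates = [s.strip() for s in turn_text.split(".") if " is " in s]
    # 3. normalize phrasing
    normalized = [c.replace("User is called", "User's name is") for c in candidates]
    # 4. dedupe (the real pipeline compares embeddings at similarity >= 0.85)
    new_facts = [f for f in normalized if f not in existing_facts]
    # 5. the real pipeline would now embed new_facts and insert them into memories
    return new_facts

print(consolidate("User is called Ada.", []))   # ["User's name is Ada"]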
Memory Deduplication¶
Prevents storing semantically identical facts with different phrasings:
- Automatic: New facts are deduplicated during extraction
- Manual cleanup: GET /v1/memory/duplicates + POST /v1/memory/deduplicate
- Configurable: Adjust thresholds via MEMORY_DEDUP_* env vars
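The core check is plain cosine similarity - a sketch with numpy (in production the comparison runs inside pgvector):
import numpy as np

DEDUP_THRESHOLD = 0.85   # mirrors the similarity >= 0.85 rule; tune via MEMORY_DEDUP_* vars

def is_duplicate(candidate: np.ndarray, existing: list[np.ndarray]) -> bool:
    """True if the candidate embedding is near-identical to any stored one."""
    return any(
        float(np.dot(candidate, emb) / (np.linalg.norm(candidate) * np.linalg.norm(emb)))
        >= DEDUP_THRESHOLD
        for emb in existing
    )

stored = np.array([1.0, 0.0])
new = np.array([0.99, 0.1])
print(is_duplicate(new, [stored]))   # True - same fact, different phrasing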
Memory Search¶
Search endpoints now support pagination and similarity thresholds:
- Pagination: All list/search endpoints support limit (default 15) and offset
- Similarity Threshold: min_similarity (default 0.6) filters noise from semantic searches
- Ordering: Results ordered by raw similarity (highest first) for consistency
- Configurable: Set MEMORY_SEARCH_MIN_SIMILARITY env var for default threshold
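For example (the /v1/memory/search path is an assumption modeled on the sibling /v1/memory/* endpoints above - confirm it in the API Reference):
GET /v1/memory/search?query=where+does+the+user+live&limit=15&offset=0&min_similarity=0.6
Authorization: Bearer ctx_abc123...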
LLM Routing Strategies¶
- Fast: uses gpt-4o-mini or haiku (cheap)
- Balanced: uses gpt-4o or claude-3-5-sonnet
- Powerful: uses o1-preview or claude-opus
- Dynamic Keys: Pass switch_ai_api_key in config for per-user billing
Base URLs: Configure LLM_BASE_URL and EMBEDDING_BASE_URL env vars to point to your proxy.
API keys are passed dynamically per-request for billing attribution.
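The routing idea in miniature with LiteLLM - the strategy map is illustrative, with model names taken from the tiers above:
import litellm

STRATEGY_MODELS = {
    "fast": "gpt-4o-mini",
    "balanced": "gpt-4o",
    "powerful": "o1-preview",
}

def route(preference: str, messages: list[dict], user_key: str | None = None):
    return litellm.completion(
        model=STRATEGY_MODELS.get(preference, "gpt-4o"),
        messages=messages,
        api_key=user_key,                 # dynamic per-user key for billing attribution
        # api_base="https://your-proxy",  # normally driven by LLM_BASE_URL
    )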
🛡️ Security Features¶
- Token Authentication: SHA256 hashed tokens, never stored in plaintext
- API Key Encryption: User API keys encrypted at rest (Fernet)
- PII Scrubbing: Credit cards, SSNs, phone numbers auto-redacted before LLM
- Multi-Tenancy: All queries filtered by (app_id, user_id)
- Row-Level Security: PostgreSQL RLS enforces data isolation
- Sentinel Auth: All A2A requests require valid Bearer token
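The encryption-at-rest piece is small - a sketch with Fernet from the cryptography package (key handling simplified here):
from cryptography.fernet import Fernet

key = Fernet.generate_key()          # in production, load this from a secret manager
fernet = Fernet(key)

ciphertext = fernet.encrypt(b"sk-user-api-key")   # what lands in switch_ai_api_key_encrypted
print(fernet.decrypt(ciphertext))                 # b'sk-user-api-key'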
📊 Observability¶
Tracing: Every request gets a trace_id (LangSmith or OpenTelemetry)
Logging: Structured JSON logs
Metrics: Cost per user, tokens, latency
Example Log:
{
"trace_id": "abc-123",
"event": "llm_response",
"model": "gpt-4o",
"tokens_in": 50,
"tokens_out": 120,
"cost_usd": 0.00015,
"latency_ms": 850
}
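Emitting logs in that shape needs nothing exotic - a stdlib sketch (the project may well use a dedicated structured-logging library instead):
import json
import logging

logging.basicConfig(level=logging.INFO, format="%(message)s")

def log_event(**fields) -> None:
    logging.info(json.dumps(fields))   # one JSON object per line

log_event(trace_id="abc-123", event="llm_response", model="gpt-4o",
          tokens_in=50, tokens_out=120, cost_usd=0.00015, latency_ms=850)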
🚀 Development Phases¶
| Phase | Goal | Duration |
|---|---|---|
| 1. Skeleton | FastAPI + Sentinel auth | 2 days |
| 2. Data Layer | Postgres + Redis setup | 2 days |
| 3. Orchestrator | LangGraph + LiteLLM | 3 days |
| 4. LTM | Vector search + Celery | 3 days |
| 5. Production | PII, tests, monitoring | 1+ week |
Total: ~2-3 weeks for MVP
✅ Definition of Done (DoD)¶
A deployment is production-ready when:
- [x] POST /v1/chat returns streaming responses
- [x] Memory Test: "My name is X" → restart → "What is my name?" = correct
- [x] Security Test: Invalid token → 401
- [x] PII Test: "My card is 4111..." → Gets redacted
- [x] Health checks pass (/health, /health/live)
- [x] Code passes mypy + ruff
- [x] Test coverage ≥ 80%
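The security item, for instance, is a short pytest - assuming the app/main.py layout implied by the uvicorn command under Quick Commands:
from fastapi.testclient import TestClient

from app.main import app   # the same app uvicorn serves

def test_invalid_token_returns_401() -> None:
    client = TestClient(app)
    resp = client.post(
        "/v1/session",
        json={"app_id": "test"},
        headers={"Authorization": "Bearer not-a-real-token"},
    )
    assert resp.status_code == 401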
🐛 Common Gotchas¶
- Embedding Mismatch: If you change embedding models, set the EMBEDDING_DIMENSIONS env var to match (e.g., 1024 for mistral-embed, 1536 for OpenAI)
- Token Overflow: Always count tokens BEFORE sending to the LLM (use tiktoken)
- Session Expiry: Redis STM has a 24h TTL - LTM persists forever
- PII in Vectors: Never store unsanitized content in the memories table
- Celery Retry: Background jobs WILL fail - make sure retry logic is configured
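For the last point, Celery's built-in autoretry covers most of it - the task name and backoff values below are illustrative:
from celery import Celery

celery_app = Celery("cortex", broker="redis://localhost:6379/0")

@celery_app.task(bind=True, autoretry_for=(Exception,), retry_backoff=True, max_retries=3)
def consolidate_memory(self, session_id: str) -> None:
    """Background consolidation; retried with exponential backoff on failure."""
    ...  # extract facts, embed, store in the memories table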
📚 Where to Learn More¶
| Question | Document |
|---|---|
| Why does this exist? | docs/gcam_research.md |
| How does it work? | docs/specs_v2_final.md |
| What's the database design? | docs/DATABASE_SCHEMA.md |
| How do I build it? | docs/implementation_plan_v2_final.md |
| How does it integrate? | docs/integration_guide.md |
🎓 Mental Model¶
Think of Cortex as a brain prosthetic for LLMs:
- Hippocampus (LTM): Stores long-term facts
- Working Memory (STM): Keeps recent context active
- Cerebellum (LangGraph): Coordinates the retrieval-thinking-response flow
- Corpus Callosum (LiteLLM): Connects to different "thought processes" (models)
Without Cortex, your LLM is like someone with amnesia - brilliant in the moment, but forgets everything immediately.
💬 Quick Commands¶
# Start everything
docker-compose up -d
# Run migrations
alembic upgrade head
# Start API
uvicorn app.main:app --reload
# Run tests
pytest tests/ -v --cov=app
# Type check
mypy app/
# Lint
ruff check app/
# Register user & get token
curl -X POST localhost:8000/v1/users \
-H "Content-Type: application/json" \
-d '{"id": "550e8400-e29b-41d4-a716-446655440000", "email": "test@example.com"}'
# Create session (with token)
curl -X POST localhost:8000/v1/session \
-H "Content-Type: application/json" \
-H "Authorization: Bearer ctx_your_token_here" \
-d '{"app_id": "test"}'
Need Help? Read docs/specs_v2_final.md → Contact team → Start coding! 🚀