Cortex documentation ingestion runbook
How Traylinx product documentation becomes searchable knowledge for the in-app Linx assistant.
What this powers
The Traylinx Personal Assistant uses Cortex for long-term memory and documentation retrieval. Public docs, user manuals, and product references are embedded into the Cortex memories table under a shared system identity. When a user asks about Traylinx, Tytus, tytusOS, OpenClaw, Hermes, Atomek, JULI3TA, shared folders, billing, install steps, or platform features, Cortex can retrieve these documentation chunks as grounding context.
Source of truth
Current documentation sources indexed into Cortex:
| Source tag | Local source | Purpose |
|---|---|---|
| docs-hub | /Users/sebastian/Projects/makakoo/agents/traylinx-docs-hub/docs | Public docs site at https://docs.traylinx.com |
| traylinx-user-manuals | /Users/sebastian/projects/makakoo/front/web_apps/traylinx/docs/user-manuals | App repo user manuals and support docs |
| traylinx-public-references | /Users/sebastian/projects/makakoo/front/web_apps/traylinx/public/references | llms.txt reference corpus and product pages |
Tytus-specific documents that must stay fresh:
- docs/hubs/user-guide/features/tytus-os.md
- docs/hubs/cookbooks/tytus-private-pod-quickstart.md
- traylinx/docs/user-manuals/05_tytus_os.md
- traylinx/public/references/tytus.md
- traylinx/src/utils/agenticPrompt.js
- traylinx/src/generated/faqKnowledgeBase.json
Embedding model
Use: mistral-embed
Current vector dimension: 1024
Do not use stale local .env defaults if they point to an inactive model. On 2026-05-14 the old qwen3-embedding:0.6b value failed with:
Override with EMBEDDING_MODEL=mistral-embed when running ingestion.
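The model override and the 1024-dimension requirement can be guarded before any rows are written. The following is a minimal sketch, assuming the embeddings endpoint returns the usual OpenAI-style shape (`data[i].embedding`). The names `resolve_model`, `check_dimensions`, `STALE_MODELS`, and `EXPECTED_DIM` are hypothetical and are not taken from the ingestion script.

```python
import os

# Assumption: illustrative names, not part of scripts/ingest_to_cortex.py.
EXPECTED_DIM = 1024                      # vector dimension stated in this runbook
STALE_MODELS = {"qwen3-embedding:0.6b"}  # value that failed on 2026-05-14


def resolve_model(env=None):
    """Return the embedding model name, refusing a known stale default."""
    env = os.environ if env is None else env
    model = env.get("EMBEDDING_MODEL", "mistral-embed")
    if model in STALE_MODELS:
        raise ValueError(f"{model} is inactive; set EMBEDDING_MODEL=mistral-embed")
    return model


def check_dimensions(response, expected=EXPECTED_DIM):
    """Extract vectors from an OpenAI-style embeddings response and verify their size."""
    vectors = [item["embedding"] for item in response["data"]]
    for index, vector in enumerate(vectors):
        if len(vector) != expected:
            raise ValueError(
                f"embedding {index} has {len(vector)} dimensions, expected {expected}"
            )
    return vectors
```

Failing early on a dimension mismatch keeps a misconfigured model from producing vectors that the `embedding` column cannot store.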
Database model
Cortex stores documentation in Postgres with pgvector.
Table: memories
System identity: user_id = SYSTEM_TRAYLINX, app_id = traylinx_ground_truth
Relevant columns:
| Column | Type | Meaning |
|---|---|---|
| id | uuid | memory row id |
| user_id | varchar | SYSTEM_TRAYLINX for shared docs |
| app_id | varchar | traylinx_ground_truth |
| content | text | markdown chunk with source header |
| embedding | vector | 1024-dimensional embedding |
| tags | text[] | docs, source tag, category, filename |
| importance_score | numeric | currently 1.0 for docs chunks |
| valid_at | timestamptz | insertion time |
| created_at | timestamptz | insertion time |
Refresh is destructive and atomic: delete old SYSTEM_TRAYLINX/traylinx_ground_truth rows and insert the new corpus in one transaction.
Ingestion script
Script: scripts/ingest_to_cortex.py
Capabilities:
- Reads one or more Markdown files/directories.
- Strips MDX-only components.
- Chunks by ## headers.
- Tags each chunk with source/category/file.
- Calls the OpenAI-compatible embeddings endpoint.
- Either writes directly to Cortex Postgres or exports JSONL for in-cluster insertion.
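The chunking and tagging steps listed above can be sketched as follows. This is an illustration under stated assumptions, not the script's actual code: `chunk_by_h2` and `build_tags` are hypothetical names, the category is assumed to be the first directory under the docs root, and MDX stripping is left out.

```python
from pathlib import PurePosixPath

FENCE = "`" * 3  # Markdown code-fence marker


def chunk_by_h2(markdown):
    """Split Markdown into chunks, starting a new chunk at each '## ' heading."""
    chunks, current, in_fence = [], [], False
    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence  # '##' lines inside fenced code are not headings
        if line.startswith("## ") and not in_fence and current:
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    return [chunk for chunk in chunks if chunk]


def build_tags(source, rel_path):
    """Return tags in the order used by the memories table: docs, source, category, filename."""
    path = PurePosixPath(rel_path)
    # Assumption: category is the first directory below the docs root.
    category = path.parts[0] if len(path.parts) > 1 else "root"
    return ["docs", source, category, path.name]
```

Because Postgres arrays are 1-indexed, the source tag in this layout is `tags[2]`, which is the position the verification query groups by.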
Required environment variables are normally loaded from scripts/.env:
The production database is DigitalOcean managed Postgres and may reject direct connections from a developer machine. If a direct DB connection times out, export the embeddings locally and insert them from inside the Cortex Kubernetes pod.
Dry run
Use dry run before any refresh:
cd /Users/sebastian/Projects/makakoo/agents/traylinx-docs-hub
python3 scripts/ingest_to_cortex.py --dry-run \
--docs-dir /Users/sebastian/Projects/makakoo/agents/traylinx-docs-hub/docs --source docs-hub \
--docs-dir /Users/sebastian/projects/makakoo/front/web_apps/traylinx/docs/user-manuals --source traylinx-user-manuals \
--docs-dir /Users/sebastian/projects/makakoo/front/web_apps/traylinx/public/references --source traylinx-public-references
Expected current scale after the Tytus refresh: about 988 chunks.
Direct refresh path
Use this only when the machine can reach the production database:
cd /Users/sebastian/Projects/makakoo/agents/traylinx-docs-hub
EMBEDDING_MODEL=mistral-embed python3 scripts/ingest_to_cortex.py --mode refresh \
--docs-dir /Users/sebastian/Projects/makakoo/agents/traylinx-docs-hub/docs --source docs-hub \
--docs-dir /Users/sebastian/projects/makakoo/front/web_apps/traylinx/docs/user-manuals --source traylinx-user-manuals \
--docs-dir /Users/sebastian/projects/makakoo/front/web_apps/traylinx/public/references --source traylinx-public-references
Production-safe Kubernetes path
Use this path when DigitalOcean Postgres times out from the laptop.
1. Export local embeddings
cd /Users/sebastian/Projects/makakoo/agents/traylinx-docs-hub
OUT=/tmp/traylinx-cortex-docs-embeddings-$(date +%Y%m%d).jsonl
EMBEDDING_MODEL=mistral-embed python3 scripts/ingest_to_cortex.py --mode refresh --export-embeddings "$OUT" \
--docs-dir /Users/sebastian/Projects/makakoo/agents/traylinx-docs-hub/docs --source docs-hub \
--docs-dir /Users/sebastian/projects/makakoo/front/web_apps/traylinx/docs/user-manuals --source traylinx-user-manuals \
--docs-dir /Users/sebastian/projects/makakoo/front/web_apps/traylinx/public/references --source traylinx-public-references
2. Copy JSONL into the Cortex pod
POD=$(kubectl -n production get pod -l app=cortex-ms -o jsonpath='{.items[0].metadata.name}')
kubectl -n production cp "$OUT" "$POD":/tmp/traylinx-cortex-docs-embeddings.jsonl
3. Insert from inside the pod
The pod has network access to Postgres and has asyncpg installed in /opt/venv. Use a small inserter that:
- Loads /tmp/traylinx-cortex-docs-embeddings.jsonl.
- Normalizes DATABASE_URL from postgresql+asyncpg:// to postgresql://.
- Removes sslmode=require from the query and passes ssl='require' to asyncpg.connect.
- Deletes old docs rows where user_id='SYSTEM_TRAYLINX' and app_id='traylinx_ground_truth'.
- Inserts all rows into memories with embedding::vector and tags::text[].
- Commits once.
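An inserter along the lines described above might look like the sketch below. It is not the script used in production. The JSONL keys (`content`, `embedding`, `tags`), the client-side `uuid4` ids, and the function names are assumptions for illustration, and the real `memories` table may require columns that are not listed in this runbook.

```python
import asyncio
import json
import os
import uuid
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

JSONL_PATH = "/tmp/traylinx-cortex-docs-embeddings.jsonl"
USER_ID = "SYSTEM_TRAYLINX"
APP_ID = "traylinx_ground_truth"

INSERT_SQL = """
INSERT INTO memories
    (id, user_id, app_id, content, embedding, tags, importance_score, valid_at, created_at)
VALUES ($1, $2, $3, $4, $5::vector, $6::text[], 1.0, now(), now())
"""


def normalize_dsn(url):
    """Convert a SQLAlchemy-style URL into (dsn, ssl) arguments for asyncpg.connect."""
    parts = urlsplit(url)
    scheme = "postgresql" if parts.scheme == "postgresql+asyncpg" else parts.scheme
    query = parse_qsl(parts.query, keep_blank_values=True)
    ssl = "require" if ("sslmode", "require") in query else None
    kept = [(key, value) for key, value in query if key != "sslmode"]
    return urlunsplit((scheme, parts.netloc, parts.path, urlencode(kept), "")), ssl


def load_rows(path):
    """Read the exported JSONL into parameter tuples (field names are assumed)."""
    rows = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if not line.strip():
                continue
            record = json.loads(line)
            # pgvector accepts a text literal such as "[0.1,0.2]" cast to vector.
            vector = "[" + ",".join(str(float(x)) for x in record["embedding"]) + "]"
            rows.append((uuid.uuid4(), USER_ID, APP_ID,
                         record["content"], vector, list(record["tags"])))
    return rows


async def refresh(rows, database_url):
    """Delete the old docs rows and insert the new corpus in one transaction."""
    import asyncpg  # imported lazily; installed in the pod's /opt/venv

    dsn, ssl = normalize_dsn(database_url)
    conn = await asyncpg.connect(dsn, ssl=ssl)
    try:
        async with conn.transaction():  # delete and insert commit together
            await conn.execute(
                "DELETE FROM memories WHERE user_id = $1 AND app_id = $2",
                USER_ID, APP_ID,
            )
            await conn.executemany(INSERT_SQL, rows)
    finally:
        await conn.close()


if __name__ == "__main__":
    asyncio.run(refresh(load_rows(JSONL_PATH), os.environ["DATABASE_URL"]))
```

Wrapping the delete and the inserts in a single transaction means a failed insert rolls back the delete, so the assistant never sees an empty documentation corpus.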
The 2026-05-14 Tytus refresh used this path and replaced:
Verification queries
Run from inside the Cortex pod:
select tags[2] as source, count(*)
from memories
where user_id = 'SYSTEM_TRAYLINX'
and app_id = 'traylinx_ground_truth'
group by tags[2]
order by count(*) desc;
Expected current sources: docs-hub, traylinx-user-manuals, and traylinx-public-references.
Verify Tytus Windows install content exists:
select substring(content from 1 for 260)
from memories
where user_id = 'SYSTEM_TRAYLINX'
and app_id = 'traylinx_ground_truth'
and content ilike '%powershell -c%install.ps1%'
limit 3;
Expected: hits from the Tytus private pod quickstart, Personal Assistant Tytus help page, and TytusOS feature page.
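The per-source count check can also be scripted once the query result is available as a mapping of source tag to row count. The sketch below is hypothetical: `check_sources` is not part of the tooling, and the 10% tolerance around the roughly 988 chunks reported by the dry run is an arbitrary assumption.

```python
EXPECTED_SOURCES = {"docs-hub", "traylinx-user-manuals", "traylinx-public-references"}


def check_sources(counts, expected_total=988, tolerance=0.10):
    """Compare per-source row counts against the runbook's expectations.

    counts maps a source tag to its row count, as returned by the group-by query.
    Returns a list of human-readable problems; an empty list means the check passed.
    """
    found = set(counts)
    problems = [f"missing source: {s}" for s in sorted(EXPECTED_SOURCES - found)]
    problems += [f"unexpected source: {s}" for s in sorted(found - EXPECTED_SOURCES)]
    total = sum(counts.values())
    # Assumption: a total more than `tolerance` away from the dry-run figure is suspicious.
    if abs(total - expected_total) > expected_total * tolerance:
        problems.append(f"total {total} is far from the expected ~{expected_total}")
    return problems
```

Update `expected_total` whenever a new dry run reports a different chunk count, otherwise the check will flag legitimate growth of the corpus.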
Frontend bot knowledge layers
Cortex embeddings are not the only source of bot behavior. Keep all four layers aligned:
- src/utils/agenticPrompt.js: Linx system prompt and current Tytus facts.
- src/generated/faqKnowledgeBase.json: quick-action question labels.
- public/references/*.md plus public/llms.txt: public machine-readable reference corpus.
- Cortex memories table: embedded retrieval corpus.
If answers are stale, check the layers in that order, then regenerate Cortex embeddings.