Cortex documentation ingestion runbook

How Traylinx product documentation becomes searchable knowledge for the in-app Linx assistant.

What this powers

The Traylinx Personal Assistant uses Cortex for long-term memory and documentation retrieval. Public docs, user manuals, and product references are embedded into the Cortex memories table under a shared system identity. When a user asks about Traylinx, Tytus, tytusOS, OpenClaw, Hermes, Atomek, JULI3TA, shared folders, billing, install steps, or platform features, Cortex can retrieve these documentation chunks as grounding context.

Source of truth

Current documentation sources indexed into Cortex:

| Source tag | Local source | Purpose |
| --- | --- | --- |
| docs-hub | /Users/sebastian/Projects/makakoo/agents/traylinx-docs-hub/docs | Public docs site at https://docs.traylinx.com |
| traylinx-user-manuals | /Users/sebastian/projects/makakoo/front/web_apps/traylinx/docs/user-manuals | App repo user manuals and support docs |
| traylinx-public-references | /Users/sebastian/projects/makakoo/front/web_apps/traylinx/public/references | llms.txt reference corpus and product pages |

Tytus-specific documents that must stay fresh:

  • docs/hubs/user-guide/features/tytus-os.md
  • docs/hubs/cookbooks/tytus-private-pod-quickstart.md
  • traylinx/docs/user-manuals/05_tytus_os.md
  • traylinx/public/references/tytus.md
  • traylinx/src/utils/agenticPrompt.js
  • traylinx/src/generated/faqKnowledgeBase.json

Embedding model

Use:

mistral-embed

Current vector dimension:

1024

Do not use stale local .env defaults if they point to an inactive model. On 2026-05-14 the old qwen3-embedding:0.6b value failed with:

Model 'qwen3-embedding:0.6b' is not available or not active.

Override with EMBEDDING_MODEL=mistral-embed when running ingestion.
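
A quick pre-flight check can catch a wrong model before a full ingestion run. The sketch below is illustrative and not taken from ingest_to_cortex.py: it assumes EMBEDDING_API_URL is the full embeddings endpoint and that the response follows the usual OpenAI-compatible shape (`data[i].embedding`). The helper names are hypothetical.

```python
import json
import os
import urllib.request

EXPECTED_DIM = 1024  # vector dimension stated in this runbook


def build_embedding_request(texts, model=None):
    """Build an OpenAI-style embeddings payload (assumed request shape)."""
    model = model or os.environ.get("EMBEDDING_MODEL", "mistral-embed")
    return {"model": model, "input": list(texts)}


def check_dimensions(vectors, expected=EXPECTED_DIM):
    """Raise ValueError if any vector does not have the expected length."""
    for i, vec in enumerate(vectors):
        if len(vec) != expected:
            raise ValueError(
                f"vector {i} has {len(vec)} dimensions, expected {expected}"
            )
    return len(vectors)


def embed(texts):
    """POST the texts to EMBEDDING_API_URL and return validated vectors.

    Assumes the URL already points at the embeddings route and that the
    API key is sent as a bearer token.
    """
    request = urllib.request.Request(
        os.environ["EMBEDDING_API_URL"],
        data=json.dumps(build_embedding_request(texts)).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ['EMBEDDING_API_KEY']}",
        },
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        body = json.load(response)
    vectors = [item["embedding"] for item in body["data"]]
    check_dimensions(vectors)
    return vectors
```

Calling `embed(["hello"])` with the ingestion environment loaded should either return one 1024-element vector or fail fast with the provider's error or a dimension mismatch.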

Database model

Cortex stores documentation in Postgres with pgvector.

Table:

memories

System identity:

user_id = SYSTEM_TRAYLINX
app_id  = traylinx_ground_truth

Relevant columns:

| Column | Type | Meaning |
| --- | --- | --- |
| id | uuid | memory row id |
| user_id | varchar | SYSTEM_TRAYLINX for shared docs |
| app_id | varchar | traylinx_ground_truth |
| content | text | markdown chunk with source header |
| embedding | vector | 1024-dimensional embedding |
| tags | text[] | docs, source tag, category, filename |
| importance_score | numeric | currently 1.0 for docs chunks |
| valid_at | timestamptz | insertion time |
| created_at | timestamptz | insertion time |

A refresh is destructive but atomic: it deletes every existing SYSTEM_TRAYLINX/traylinx_ground_truth row and inserts the new corpus in a single transaction, so a failed run leaves the previous corpus in place.

Ingestion script

Script:

/Users/sebastian/Projects/makakoo/agents/traylinx-docs-hub/scripts/ingest_to_cortex.py

Capabilities:

  • Reads one or more Markdown files/directories.
  • Strips MDX-only components.
  • Chunks by ## headers.
  • Tags each chunk with source/category/file.
  • Calls the OpenAI-compatible embeddings endpoint.
  • Either writes directly to Cortex Postgres or exports JSONL for in-cluster insertion.
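
The chunking and tagging steps above can be sketched as follows. This is a minimal illustration, not the code in ingest_to_cortex.py: the function name, the "Overview" title for text before the first heading, and the exact source-header format are assumptions. The tag order follows the tags column described in the database model (docs, source tag, category, filename).

```python
def chunk_by_h2(markdown, source, category, filename):
    """Split Markdown into one chunk per level-2 (##) heading.

    Lines inside fenced code blocks are never treated as headings.
    """
    chunks = []
    title, body, in_fence = "Overview", [], False

    def flush():
        # Emit the section collected so far, skipping empty sections.
        text = "\n".join(body).strip()
        if text:
            chunks.append({
                # Hypothetical source header; the real format may differ.
                "content": f"Source: {source}/{filename} | {title}\n\n{text}",
                "tags": ["docs", source, category, filename],
            })

    for line in markdown.splitlines():
        if line.lstrip().startswith("```"):
            in_fence = not in_fence
        if line.startswith("## ") and not in_fence:
            flush()
            title, body = line[3:].strip(), []
        else:
            body.append(line)
    flush()
    return chunks
```

Keeping the source tag at a fixed position in the tag list matters because the verification queries group by that position.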

Required environment variables are normally loaded from scripts/.env:

CORTEX_DATABASE_URL
EMBEDDING_API_URL
EMBEDDING_API_KEY
EMBEDDING_MODEL

The production database is DigitalOcean managed Postgres and may reject direct connections from a developer machine. If the direct database connection times out, export the embeddings locally and insert them from inside the Cortex Kubernetes pod.

Dry run

Use dry run before any refresh:

cd /Users/sebastian/Projects/makakoo/agents/traylinx-docs-hub
python3 scripts/ingest_to_cortex.py --dry-run \
  --docs-dir /Users/sebastian/Projects/makakoo/agents/traylinx-docs-hub/docs --source docs-hub \
  --docs-dir /Users/sebastian/projects/makakoo/front/web_apps/traylinx/docs/user-manuals --source traylinx-user-manuals \
  --docs-dir /Users/sebastian/projects/makakoo/front/web_apps/traylinx/public/references --source traylinx-public-references

Expected current scale after the Tytus refresh: about 988 chunks.

Direct refresh path

Use this only when the machine can reach the production database:

cd /Users/sebastian/Projects/makakoo/agents/traylinx-docs-hub
EMBEDDING_MODEL=mistral-embed python3 scripts/ingest_to_cortex.py --mode refresh \
  --docs-dir /Users/sebastian/Projects/makakoo/agents/traylinx-docs-hub/docs --source docs-hub \
  --docs-dir /Users/sebastian/projects/makakoo/front/web_apps/traylinx/docs/user-manuals --source traylinx-user-manuals \
  --docs-dir /Users/sebastian/projects/makakoo/front/web_apps/traylinx/public/references --source traylinx-public-references

Production-safe Kubernetes path

Use this path when DigitalOcean Postgres times out from the laptop.

1. Export local embeddings

cd /Users/sebastian/Projects/makakoo/agents/traylinx-docs-hub
OUT=/tmp/traylinx-cortex-docs-embeddings-$(date +%Y%m%d).jsonl
EMBEDDING_MODEL=mistral-embed python3 scripts/ingest_to_cortex.py --mode refresh --export-embeddings "$OUT" \
  --docs-dir /Users/sebastian/Projects/makakoo/agents/traylinx-docs-hub/docs --source docs-hub \
  --docs-dir /Users/sebastian/projects/makakoo/front/web_apps/traylinx/docs/user-manuals --source traylinx-user-manuals \
  --docs-dir /Users/sebastian/projects/makakoo/front/web_apps/traylinx/public/references --source traylinx-public-references

2. Copy JSONL into the Cortex pod

POD=$(kubectl -n production get pod -l app=cortex-ms -o jsonpath='{.items[0].metadata.name}')
kubectl -n production cp "$OUT" "$POD":/tmp/traylinx-cortex-docs-embeddings.jsonl

3. Insert from inside the pod

The pod has network access to Postgres and has asyncpg installed in /opt/venv. Use a small inserter that:

  1. Loads /tmp/traylinx-cortex-docs-embeddings.jsonl.
  2. Normalizes DATABASE_URL from postgresql+asyncpg:// to postgresql://.
  3. Removes sslmode=require from the query and passes ssl='require' to asyncpg.connect.
  4. Deletes old docs rows where user_id='SYSTEM_TRAYLINX' and app_id='traylinx_ground_truth'.
  5. Inserts all rows into memories with embedding::vector and tags::text[].
  6. Commits once.
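
The inserter steps above can be sketched as follows. Treat it as a starting point under stated assumptions rather than the script used on 2026-05-14: the JSONL field names (`content`, `embedding`, `tags`) are guesses about the export format, and the insert assumes `id`, `valid_at`, and `created_at` have server-side defaults. The embedding is passed as text and cast, so no pgvector codec has to be registered with asyncpg.

```python
import asyncio
import json
import os
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

JSONL_PATH = "/tmp/traylinx-cortex-docs-embeddings.jsonl"
USER_ID, APP_ID = "SYSTEM_TRAYLINX", "traylinx_ground_truth"


def normalize_dsn(url):
    """Return (dsn, needs_ssl): drop the +asyncpg suffix and the sslmode param."""
    parts = urlsplit(url)
    scheme = parts.scheme.split("+", 1)[0]
    query = parse_qsl(parts.query)
    needs_ssl = ("sslmode", "require") in query
    kept = [(key, value) for key, value in query if key != "sslmode"]
    dsn = urlunsplit((scheme, parts.netloc, parts.path, urlencode(kept), ""))
    return dsn, needs_ssl


def vector_literal(values):
    """Format floats as a pgvector text literal such as [0.1,0.2]."""
    return "[" + ",".join(repr(float(v)) for v in values) + "]"


def load_rows(path=JSONL_PATH):
    """Read one JSON object per non-empty line."""
    with open(path, encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


async def refresh(rows):
    import asyncpg  # available in the pod's /opt/venv per this runbook

    dsn, needs_ssl = normalize_dsn(os.environ["DATABASE_URL"])
    conn = await asyncpg.connect(dsn, ssl="require" if needs_ssl else None)
    try:
        # Delete and insert share one transaction, so they commit together.
        async with conn.transaction():
            await conn.execute(
                "delete from memories where user_id = $1 and app_id = $2",
                USER_ID, APP_ID,
            )
            await conn.executemany(
                "insert into memories"
                " (user_id, app_id, content, embedding, tags, importance_score)"
                " values ($1, $2, $3, $4::text::vector, $5::text[], 1.0)",
                [
                    (USER_ID, APP_ID, row["content"],
                     vector_literal(row["embedding"]), row["tags"])
                    for row in rows
                ],
            )
    finally:
        await conn.close()
```

Inside the pod, running `asyncio.run(refresh(load_rows()))` with the /opt/venv interpreter performs the refresh. If any insert fails, the transaction rolls back and the previous rows remain.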

The 2026-05-14 Tytus refresh used this path and replaced the whole corpus. Row counts for that run:

before: 621
inserted: 988
after: 988

Verification queries

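Run from inside the Cortex pod. The first query relies on the source tag being the second element of tags (Postgres arrays are 1-indexed):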

select tags[2] as source, count(*)
from memories
where user_id = 'SYSTEM_TRAYLINX'
  and app_id = 'traylinx_ground_truth'
group by tags[2]
order by count(*) desc;

Expected current sources:

docs-hub                  836
traylinx-public-references 78
traylinx-user-manuals      74

Verify Tytus Windows install content exists:

select substring(content from 1 for 260)
from memories
where user_id = 'SYSTEM_TRAYLINX'
  and app_id = 'traylinx_ground_truth'
  and content ilike '%powershell -c%install.ps1%'
limit 3;

Expected: hits from the Tytus private pod quickstart, Personal Assistant Tytus help page, and TytusOS feature page.

Frontend bot knowledge layers

Cortex embeddings are not the only source of bot behavior. Keep all four layers aligned:

  1. src/utils/agenticPrompt.js: Linx system prompt and current Tytus facts.
  2. src/generated/faqKnowledgeBase.json: quick-action question labels.
  3. public/references/*.md plus public/llms.txt: public machine-readable reference corpus.
  4. Cortex memories table: embedded retrieval corpus.

If answers are stale, check the layers in that order, then regenerate Cortex embeddings.