Cortex documentation ingestion runbook

How Traylinx product documentation becomes searchable knowledge for the in-app Linx assistant.

What this powers

The Traylinx Personal Assistant uses Cortex for long-term memory and documentation retrieval. Public docs, user manuals, and product references are embedded into the Cortex memories table under a shared system identity. When a user asks about Traylinx, Tytus, tytusOS, OpenClaw, Hermes, Atomek, JULI3TA, shared folders, billing, install steps, or platform features, Cortex can retrieve these documentation chunks as grounding context.

Source of truth

Current documentation sources indexed into Cortex:

Source tag	Local source	Purpose
`docs-hub`	`/Users/sebastian/Projects/makakoo/agents/traylinx-docs-hub/docs`	Public docs site at `https://docs.traylinx.com`
`traylinx-user-manuals`	`/Users/sebastian/projects/makakoo/front/web_apps/traylinx/docs/user-manuals`	App repo user manuals and support docs
`traylinx-public-references`	`/Users/sebastian/projects/makakoo/front/web_apps/traylinx/public/references`	`llms.txt` reference corpus and product pages

Tytus-specific documents that must stay fresh:

docs/hubs/user-guide/features/tytus-os.md
docs/hubs/cookbooks/tytus-private-pod-quickstart.md
traylinx/docs/user-manuals/05_tytus_os.md
traylinx/public/references/tytus.md
traylinx/src/utils/agenticPrompt.js
traylinx/src/generated/faqKnowledgeBase.json

Embedding model

Use:

mistral-embed

Current vector dimension:

1024

Do not use stale local .env defaults if they point to an inactive model. On 2026-05-14 the old qwen3-embedding:0.6b value failed with:

Model 'qwen3-embedding:0.6b' is not available or not active.

Override with EMBEDDING_MODEL=mistral-embed when running ingestion.
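
A minimal pre-flight sketch for the model override, assuming the embeddings endpoint follows the usual OpenAI-compatible request shape (`model` plus `input`, bearer token auth). The helper names `resolve_model`, `build_request`, and `check_dimensions` are hypothetical and not taken from `ingest_to_cortex.py`; the 1024 value comes from the `embedding` column described in the Database model section.

```python
import json
import os
import urllib.request

EXPECTED_MODEL = "mistral-embed"  # model named in this runbook
EXPECTED_DIM = 1024               # dimension of the embedding column


def resolve_model(env=None):
    """Return the embedding model, refusing stale .env defaults."""
    env = os.environ if env is None else env
    model = env.get("EMBEDDING_MODEL", "")
    if model != EXPECTED_MODEL:
        raise ValueError(
            f"EMBEDDING_MODEL is {model!r}; expected {EXPECTED_MODEL!r}"
        )
    return model


def build_request(texts, env=None):
    """Build (but do not send) an OpenAI-compatible embeddings request."""
    env = os.environ if env is None else env
    body = json.dumps({"model": resolve_model(env), "input": list(texts)})
    return urllib.request.Request(
        # Assumption: EMBEDDING_API_URL is the full URL of the embeddings route.
        env["EMBEDDING_API_URL"],
        data=body.encode("utf-8"),
        headers={
            "Authorization": f"Bearer {env['EMBEDDING_API_KEY']}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def check_dimensions(vectors):
    """Fail fast if any returned vector does not match the table's dimension."""
    for i, vec in enumerate(vectors):
        if len(vec) != EXPECTED_DIM:
            raise ValueError(f"vector {i} has {len(vec)} dims, expected {EXPECTED_DIM}")
    return vectors
```

Running a check like this before embedding the whole corpus surfaces an inactive model after one cheap call instead of partway through the batch.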

Database model

Cortex stores documentation in Postgres with pgvector.

Table:

memories

System identity:

user_id = SYSTEM_TRAYLINX
app_id  = traylinx_ground_truth

Relevant columns:

Column	Type	Meaning
`id`	`uuid`	memory row id
`user_id`	`varchar`	`SYSTEM_TRAYLINX` for shared docs
`app_id`	`varchar`	`traylinx_ground_truth`
`content`	`text`	markdown chunk with source header
`embedding`	`vector`	1024-dimensional embedding
`tags`	`text[]`	`docs`, source tag, category, filename
`importance_score`	`numeric`	currently `1.0` for docs chunks
`valid_at`	`timestamptz`	insertion time
`created_at`	`timestamptz`	insertion time

Refresh is destructive and atomic: delete old SYSTEM_TRAYLINX/traylinx_ground_truth rows and insert the new corpus in one transaction.
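
The column table above implies that each chunk becomes one row with a fixed tag order, and the verification queries rely on the source tag sitting in the second position. The sketch below shows one way a row could be assembled. `build_tags`, `build_row`, the category rule, and the exact source header format are all hypothetical; the real script may derive them differently.

```python
from pathlib import PurePosixPath

USER_ID = "SYSTEM_TRAYLINX"
APP_ID = "traylinx_ground_truth"


def build_tags(source, rel_path):
    """Tags in the order the runbook lists: docs, source tag, category, filename."""
    path = PurePosixPath(rel_path)
    # Assumption: category is the first directory under the docs root,
    # or "root" for files that sit directly in it.
    category = path.parts[0] if len(path.parts) > 1 else "root"
    return ["docs", source, category, path.name]


def build_row(source, rel_path, chunk, embedding):
    """Assemble one memories row as a plain dict."""
    # Assumption: the source header is a single line naming the origin file.
    header = f"[source: {source}/{rel_path}]"
    return {
        "user_id": USER_ID,
        "app_id": APP_ID,
        "content": f"{header}\n\n{chunk}",
        "embedding": embedding,
        "tags": build_tags(source, rel_path),
        "importance_score": 1.0,
    }
```

Postgres arrays are 1-indexed, so `tags[2]` in SQL corresponds to `tags[1]` in the Python list.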

Ingestion script

Script:

/Users/sebastian/Projects/makakoo/agents/traylinx-docs-hub/scripts/ingest_to_cortex.py

Capabilities:

Reads one or more Markdown files/directories.
Strips MDX-only components.
Chunks by ## headers.
Tags each chunk with source/category/file.
Calls the OpenAI-compatible embeddings endpoint.
Either writes directly to Cortex Postgres or exports JSONL for in-cluster insertion.
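
The stripping and chunking steps above can be sketched as follows. This is an illustration, not the script's actual code: it assumes "MDX-only components" means top-level `import`/`export` lines and JSX tags whose names start with an uppercase letter, and the function names are hypothetical.

```python
import re

# Assumption: MDX-only syntax is limited to import/export statements at the
# start of a line and capitalised JSX tags such as <Tabs> or <Card title="x" />.
MDX_STATEMENT = re.compile(r"^(import|export)\s.+$", re.MULTILINE)
MDX_TAG = re.compile(r"</?[A-Z][A-Za-z0-9.]*(\s[^<>]*)?/?>")


def strip_mdx(text):
    """Remove MDX statements and component tags, keeping the text inside them."""
    text = MDX_STATEMENT.sub("", text)
    return MDX_TAG.sub("", text)


def chunk_by_h2(text):
    """Split Markdown into chunks that each begin at a '## ' heading.

    Anything before the first '## ' heading becomes its own leading chunk.
    """
    chunks, current = [], []
    for line in text.splitlines():
        if line.startswith("## ") and current:
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    return [c for c in chunks if c]
```

A simple line scan like this would also split on a `## ` line inside a fenced code block and would drop a prose line that happens to start with "import "; a production chunker would need to track fence state and be stricter about what counts as an MDX statement.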

Required environment variables are normally loaded from scripts/.env:

CORTEX_DATABASE_URL
EMBEDDING_API_URL
EMBEDDING_API_KEY
EMBEDDING_MODEL

The production database is DigitalOcean managed Postgres and may reject direct connections from a developer machine. If a direct DB connection times out, export the embeddings locally and insert them from inside the Cortex Kubernetes pod.
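
Before choosing between the direct path and the Kubernetes path, a quick check of the environment and of TCP reachability can save a long wait on a connection timeout. The sketch below is a hypothetical helper, not part of `ingest_to_cortex.py`; it only tests whether a socket can be opened to the database host and does not authenticate.

```python
import os
import socket
from urllib.parse import urlsplit

REQUIRED_VARS = (
    "CORTEX_DATABASE_URL",
    "EMBEDDING_API_URL",
    "EMBEDDING_API_KEY",
    "EMBEDDING_MODEL",
)


def missing_vars(env=None):
    """Return the required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


def db_host_port(url):
    """Extract (host, port) from a Postgres URL, defaulting to port 5432."""
    parts = urlsplit(url)
    return parts.hostname, parts.port or 5432


def can_reach(url, timeout=5.0):
    """Return True if a TCP connection to the database host opens in time."""
    host, port = db_host_port(url)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers timeouts, refusals, and DNS failures
        return False
```

If `can_reach(os.environ["CORTEX_DATABASE_URL"])` returns `False`, that points toward the export-and-insert path rather than a direct refresh.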

Dry run

Use dry run before any refresh:

cd /Users/sebastian/Projects/makakoo/agents/traylinx-docs-hub
python3 scripts/ingest_to_cortex.py --dry-run \
  --docs-dir /Users/sebastian/Projects/makakoo/agents/traylinx-docs-hub/docs --source docs-hub \
  --docs-dir /Users/sebastian/projects/makakoo/front/web_apps/traylinx/docs/user-manuals --source traylinx-user-manuals \
  --docs-dir /Users/sebastian/projects/makakoo/front/web_apps/traylinx/public/references --source traylinx-public-references

Expected current scale after the Tytus refresh: about 988 chunks.

Direct refresh path

Use this only when the machine can reach the production database:

cd /Users/sebastian/Projects/makakoo/agents/traylinx-docs-hub
EMBEDDING_MODEL=mistral-embed python3 scripts/ingest_to_cortex.py --mode refresh \
  --docs-dir /Users/sebastian/Projects/makakoo/agents/traylinx-docs-hub/docs --source docs-hub \
  --docs-dir /Users/sebastian/projects/makakoo/front/web_apps/traylinx/docs/user-manuals --source traylinx-user-manuals \
  --docs-dir /Users/sebastian/projects/makakoo/front/web_apps/traylinx/public/references --source traylinx-public-references

Production-safe Kubernetes path

Use this path when DigitalOcean Postgres times out from the laptop.

1. Export local embeddings

cd /Users/sebastian/Projects/makakoo/agents/traylinx-docs-hub
OUT=/tmp/traylinx-cortex-docs-embeddings-$(date +%Y%m%d).jsonl
EMBEDDING_MODEL=mistral-embed python3 scripts/ingest_to_cortex.py --mode refresh --export-embeddings "$OUT" \
  --docs-dir /Users/sebastian/Projects/makakoo/agents/traylinx-docs-hub/docs --source docs-hub \
  --docs-dir /Users/sebastian/projects/makakoo/front/web_apps/traylinx/docs/user-manuals --source traylinx-user-manuals \
  --docs-dir /Users/sebastian/projects/makakoo/front/web_apps/traylinx/public/references --source traylinx-public-references

2. Copy JSONL into the Cortex pod

POD=$(kubectl -n production get pod -l app=cortex-ms -o jsonpath='{.items[0].metadata.name}')
kubectl -n production cp "$OUT" "$POD":/tmp/traylinx-cortex-docs-embeddings.jsonl

3. Insert from inside the pod

The pod has network access to Postgres and has asyncpg installed in /opt/venv. Use a small inserter that:

Loads /tmp/traylinx-cortex-docs-embeddings.jsonl.
Normalizes DATABASE_URL from postgresql+asyncpg:// to postgresql://.
Removes sslmode=require from the query and passes ssl='require' to asyncpg.connect.
Deletes old docs rows where user_id='SYSTEM_TRAYLINX' and app_id='traylinx_ground_truth'.
Inserts all rows into memories with embedding::vector and tags::text[].
Commits once.
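
The inserter steps above can be sketched as follows. This is one possible shape, not the script used on 2026-05-14. It assumes each JSONL line carries `content`, `embedding`, and `tags` fields (the real export format may differ), that `gen_random_uuid()` is available for the `id` column, and that the vector is passed as text and cast server-side so no pgvector codec needs to be registered with asyncpg.

```python
import json
import os
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

JSONL_PATH = "/tmp/traylinx-cortex-docs-embeddings.jsonl"
USER_ID, APP_ID = "SYSTEM_TRAYLINX", "traylinx_ground_truth"


def normalize_dsn(url):
    """Return (dsn, ssl): plain postgresql:// scheme with sslmode moved to ssl."""
    parts = urlsplit(url)
    scheme = "postgresql" if parts.scheme == "postgresql+asyncpg" else parts.scheme
    query = parse_qsl(parts.query)
    ssl = "require" if ("sslmode", "require") in query else None
    query = [(k, v) for k, v in query if k != "sslmode"]
    return urlunsplit((scheme, parts.netloc, parts.path, urlencode(query), "")), ssl


def load_rows(path):
    """Read one JSON object per non-empty line."""
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def to_vector_literal(embedding):
    """Render a list of floats as pgvector's text form, e.g. [0.1,0.2]."""
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"


async def refresh(rows):
    import asyncpg  # imported lazily; installed in the pod's /opt/venv

    dsn, ssl = normalize_dsn(os.environ["DATABASE_URL"])
    conn = await asyncpg.connect(dsn, ssl=ssl)
    try:
        # Delete and insert share one transaction, so a failure rolls back both.
        async with conn.transaction():
            await conn.execute(
                "delete from memories where user_id = $1 and app_id = $2",
                USER_ID, APP_ID,
            )
            await conn.executemany(
                """insert into memories
                     (id, user_id, app_id, content, embedding, tags,
                      importance_score, valid_at, created_at)
                   values (gen_random_uuid(), $1, $2, $3,
                           $4::text::vector, $5::text[], 1.0, now(), now())""",
                [
                    (USER_ID, APP_ID, r["content"],
                     to_vector_literal(r["embedding"]), r["tags"])
                    for r in rows
                ],
            )
    finally:
        await conn.close()
```

Inside the pod this would be run with something like `asyncio.run(refresh(load_rows(JSONL_PATH)))` after `import asyncio`. Counting rows before and after the call gives the before/after figures recorded for each refresh.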

The 2026-05-14 Tytus refresh used this path and replaced the corpus with these row counts:

before: 621
inserted: 988
after: 988

Verification queries

Run from inside the Cortex pod:

select tags[2] as source, count(*)
from memories
where user_id = 'SYSTEM_TRAYLINX'
  and app_id = 'traylinx_ground_truth'
group by tags[2]
order by count(*) desc;

Expected current sources:

docs-hub                  836
traylinx-public-references 78
traylinx-user-manuals      74
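
The per-source counts can be checked mechanically once the query result is in hand. The sketch below compares a `{source: count}` mapping against the expected figures listed above; `diff_counts` is a hypothetical helper, and the expected numbers are only valid for the corpus as of the 2026-05-14 refresh.

```python
EXPECTED_COUNTS = {
    "docs-hub": 836,
    "traylinx-public-references": 78,
    "traylinx-user-manuals": 74,
}


def diff_counts(actual, expected=EXPECTED_COUNTS):
    """Return {source: (expected, actual)} for every source whose count differs.

    Sources missing on either side are treated as having a count of 0.
    """
    sources = sorted(set(expected) | set(actual))
    return {
        s: (expected.get(s, 0), actual.get(s, 0))
        for s in sources
        if expected.get(s, 0) != actual.get(s, 0)
    }
```

An empty result means every source matches. The three expected counts sum to 988, the total inserted by that refresh.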

Verify Tytus Windows install content exists:

select substring(content from 1 for 260)
from memories
where user_id = 'SYSTEM_TRAYLINX'
  and app_id = 'traylinx_ground_truth'
  and content ilike '%powershell -c%install.ps1%'
limit 3;

Expected: hits from the Tytus private pod quickstart, Personal Assistant Tytus help page, and TytusOS feature page.

Frontend bot knowledge layers

Cortex embeddings are not the only source of bot behavior. Keep all four layers aligned:

src/utils/agenticPrompt.js: Linx system prompt and current Tytus facts.
src/generated/faqKnowledgeBase.json: quick-action question labels.
public/references/*.md plus public/llms.txt: public machine-readable reference corpus.
Cortex memories table: embedded retrieval corpus.

If answers are stale, check the layers in that order, then regenerate Cortex embeddings.