🍳 Cookbook: Agentic Upload Engine

Build an intelligent file ingestion system that automatically extracts, translates, and understands uploaded documents using OCR and AI agents.

🎯 What You'll Build

A service that:

1. Accepts file uploads (PDF, images, JSON, CSV)
2. Automatically performs OCR on images
3. Translates content if needed
4. Returns structured, searchable data

🏗️ Phase 1: Clone & Setup

1. Get the Code

git clone https://github.com/traylinx/agentic-upload-engines.git
cd agentic-upload-engines

2. Install Dependencies

pip install -r requirements.txt

3. Configure Environment

cp .env.example .env
nano .env

Key Variables:

# LLM Configuration
LLM_MODEL=gpt-4o
LLM_API_KEY=your-api-key
LLM_BASE_URL=https://switchai.traylinx.com/v1

# File Engine (Required!)
FILE_ENGINE_BASE_URL=https://api.traylinx.com/file-engine
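A minimal sketch of how a service could read these variables at startup and fail early when the file engine URL is absent. The `Settings` class and `load_settings` function are hypothetical names, not part of the repository, and treating only `FILE_ENGINE_BASE_URL` as mandatory is an assumption based on the "Required!" note above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the variables listed in .env."""
    file_engine_base_url: str
    llm_model: Optional[str] = None
    llm_api_key: Optional[str] = None
    llm_base_url: Optional[str] = None


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from a mapping (defaults to the process environment)."""
    env = os.environ if env is None else env
    file_engine_url = env.get("FILE_ENGINE_BASE_URL")
    if not file_engine_url:
        # Assumption: only this variable is mandatory, as the comment in .env suggests.
        raise RuntimeError("FILE_ENGINE_BASE_URL is required but not set")
    return Settings(
        file_engine_base_url=file_engine_url,
        llm_model=env.get("LLM_MODEL"),
        llm_api_key=env.get("LLM_API_KEY"),
        llm_base_url=env.get("LLM_BASE_URL"),
    )
```

Passing a plain dictionary instead of relying on `os.environ` keeps the loader easy to exercise in tests.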

🚀 Phase 2: Run the Service

uvicorn api:app --reload
# Server running at http://localhost:8000

🧪 Phase 3: Upload a Simple File

Upload a PDF

curl -X POST http://localhost:8000/v1/upload \
  -H "Authorization: Bearer your-token" \
  -F "file=@invoice.pdf"

Response:

{
  "file_id": "file_abc123",
  "filename": "invoice.pdf",
  "extracted_text": "Invoice #12345\nDate: 2025-01-15\nTotal: $500.00",
  "metadata": {
    "pages": 1,
    "has_images": false
  }
}
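The same upload can be issued from Python. The sketch below uses only the standard library and builds the `multipart/form-data` body by hand. The helper names (`encode_multipart`, `build_upload_request`, `upload`) are hypothetical, and the endpoint, header, and `file` field name are taken from the curl command above.

```python
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path
from typing import Dict, Optional, Tuple


def encode_multipart(path: Path, fields: Optional[Dict[str, str]] = None) -> Tuple[bytes, str]:
    """Encode one file plus optional text fields as multipart/form-data."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in (fields or {}).items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n{value}\r\n'.encode()
        )
    mime = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
    parts.append(
        f'--{boundary}\r\nContent-Disposition: form-data; name="file"; filename="{path.name}"\r\n'
        f"Content-Type: {mime}\r\n\r\n".encode()
    )
    parts.append(path.read_bytes())
    parts.append(f"\r\n--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_upload_request(base_url: str, token: str, path,
                         fields: Optional[Dict[str, str]] = None) -> urllib.request.Request:
    """Prepare the POST /v1/upload request without sending it."""
    body, content_type = encode_multipart(Path(path), fields)
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/v1/upload",
        data=body,
        headers={"Authorization": f"Bearer {token}", "Content-Type": content_type},
        method="POST",
    )


def upload(base_url: str, token: str, path,
           fields: Optional[Dict[str, str]] = None) -> dict:
    """Send the file and return the decoded JSON response."""
    request = build_upload_request(base_url, token, path, fields)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With the server from Phase 2 running, `upload("http://localhost:8000", "your-token", "invoice.pdf")` should return a dictionary shaped like the response above. Extra form fields, if the service accepts any, can be passed through `fields`.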

📷 Phase 4: OCR Processing

Upload an image and automatically extract text.

Upload an Image

curl -X POST http://localhost:8000/v1/upload \
  -H "Authorization: Bearer your-token" \
  -F "file=@receipt.jpg"

The engine:

1. Detects that the file is an image
2. Sends it to the OCR processor
3. Returns the extracted text

🌍 Phase 5: Upload + Translate

Process a document AND translate it in one request.

curl -X POST http://localhost:8000/v1/upload \
  -H "Authorization: Bearer your-token" \
  -F "file=@german_contract.pdf" \
  -F "translate_to=en"

Response:

{
  "file_id": "file_xyz789",
  "original_text": "Vertrag zwischen...",
  "translated_text": "Contract between...",
  "source_language": "de",
  "target_language": "en"
}
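The plain upload response carries `extracted_text`, while the translation response carries `original_text` and `translated_text`. Client code may want one accessor that works for both shapes. The helper below is a hypothetical sketch that assumes the key names are exactly those shown in the two example responses.

```python
from typing import Any, Dict


def readable_text(response: Dict[str, Any]) -> str:
    """Prefer the translation, then the original text, then the plain extraction."""
    for key in ("translated_text", "original_text", "extracted_text"):
        value = response.get(key)
        if value:
            return value
    # None of the known text fields is present or non-empty.
    return ""
```

For the response above this returns `"Contract between..."`. For the Phase 3 response it falls through to `extracted_text`.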

🔄 The Processing Pipeline

          File Upload
             │
             ▼
     ┌───────────────┐
     │ Type Detector │ ← PDF? Image? JSON?
     └───────┬───────┘
             │
     ┌───────┴───────┐
     │               │
     ▼               ▼
┌─────────┐   ┌───────────┐
│   OCR   │   │  Parser   │
│ (Images)│   │ (Text/PDF)│
└────┬────┘   └─────┬─────┘
     │              │
     └──────┬───────┘
            ▼
     ┌─────────────┐
     │ Translator? │ ← If translate_to provided
     └──────┬──────┘
            ▼
     ┌─────────────┐
     │   Output    │
     └─────────────┘
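The flow in the diagram can be sketched as a single function with the OCR, parsing, and translation steps injected as callables. This is an illustration of the control flow only, not the engine's actual code. `process_upload`, its parameters, and the output keys are assumptions, and type detection is reduced here to a file-extension check.

```python
from pathlib import Path
from typing import Callable, Dict, Optional

# Image extensions from the supported file types table.
IMAGE_EXTENSIONS = {".jpg", ".png", ".webp"}


def process_upload(
    filename: str,
    data: bytes,
    ocr: Callable[[bytes], str],
    parse: Callable[[bytes], str],
    translate: Callable[[str, str], str],
    translate_to: Optional[str] = None,
) -> Dict[str, str]:
    """Route the file to OCR or the parser, then translate if requested."""
    suffix = Path(filename).suffix.lower()
    # Type detector: images go to OCR, everything else to the parser.
    text = ocr(data) if suffix in IMAGE_EXTENSIONS else parse(data)
    result = {"filename": filename, "original_text": text}
    # Translator step runs only when a target language was provided.
    if translate_to:
        result["translated_text"] = translate(text, translate_to)
        result["target_language"] = translate_to
    return result
```

Injecting the three steps keeps the routing logic testable with simple stand-in functions before any real OCR or translation backend is wired in.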

🔐 Phase 6: A2A Authentication

Enable machine-to-machine file uploads.

Add Credentials to .env

TRAYLINX_CLIENT_ID=ag-xxx
TRAYLINX_CLIENT_SECRET=ts-xxx

Upload as an Agent

curl -X POST http://localhost:8000/v1/upload \
  -H "X-Agent-Secret-Token: your-token" \
  -H "X-Agent-User-Id: your-agent-id" \
  -F "file=@data.csv"
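The cookbook shows two ways to authenticate an upload: a Bearer token in the earlier phases and the two `X-Agent-*` headers here. A client that supports both might pick the headers as in the sketch below. `auth_headers` is a hypothetical helper. How the agent token is obtained from `TRAYLINX_CLIENT_ID` and `TRAYLINX_CLIENT_SECRET` is not covered in this section, so the function simply takes the token as input.

```python
from typing import Dict, Optional


def auth_headers(
    bearer_token: Optional[str] = None,
    agent_token: Optional[str] = None,
    agent_id: Optional[str] = None,
) -> Dict[str, str]:
    """Return headers for either a user (Bearer) or an agent (A2A) upload."""
    if agent_token or agent_id:
        # Agent uploads send both headers, as in the curl example.
        if not (agent_token and agent_id):
            raise ValueError("Agent uploads need both a secret token and an agent id")
        return {"X-Agent-Secret-Token": agent_token, "X-Agent-User-Id": agent_id}
    if bearer_token:
        return {"Authorization": f"Bearer {bearer_token}"}
    raise ValueError("No credentials supplied")
```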

📊 Supported File Types

| Type | Extensions | Processing |
|---|---|---|
| Documents | `.pdf`, `.docx` | Text extraction |
| Images | `.jpg`, `.png`, `.webp` | OCR |
| Data | `.json`, `.csv` | Parsing |
| Text | `.txt`, `.md` | Direct read |
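A client can check a file against this table before uploading it. The lookup below is a hypothetical sketch: the mapping mirrors the table, and the processing labels are illustrative strings rather than values returned by the service. Extensions not listed in the table, such as `.jpeg`, are rejected here.

```python
from pathlib import Path

# Extension -> processing step, copied from the supported file types table.
PROCESSING_BY_EXTENSION = {
    ".pdf": "text_extraction",
    ".docx": "text_extraction",
    ".jpg": "ocr",
    ".png": "ocr",
    ".webp": "ocr",
    ".json": "parsing",
    ".csv": "parsing",
    ".txt": "direct_read",
    ".md": "direct_read",
}


def processing_for(filename: str) -> str:
    """Return the processing step for a filename, or raise if unsupported."""
    suffix = Path(filename).suffix.lower()
    try:
        return PROCESSING_BY_EXTENSION[suffix]
    except KeyError:
        raise ValueError(f"Unsupported file type: {suffix or filename!r}") from None
```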

📚 Next Steps

- Build a file search index with vector embeddings
- Add webhook callbacks for async processing
- Integrate with the RAG Cookbook for document Q&A