
Traylinx Router Agent

Status: ✅ Production Ready (Sprint 1 & 2 Complete)
Version: 2.0.0
Port: 8080


Overview

The Router Agent is the central message broker for the Traylinx platform. It provides two core capabilities:

  1. Request Routing (Sprint 1) - Routes requests to agents based on capabilities
  2. Event Publishing (Sprint 2) - Publishes events to multiple subscribers in parallel

Think of it as a smart switchboard that knows which agent can handle which request, combined with a broadcast system that delivers events to interested parties.


🎯 Key Features

Request Routing (Sprint 1)

  • ✅ Capability-Based Discovery - Find agents via Registry
  • ✅ Automatic Agent Selection - Pick best agent by score
  • ✅ Request Forwarding - Forward to target agent with A2A auth
  • ✅ Retry Logic - Try multiple agents on failure
  • ✅ Error Handling - Graceful degradation
  • ✅ Performance Tracking - Report stats back to Registry

Event Publishing (Sprint 2)

  • ✅ Subscriber Discovery - Query Subscription Service
  • ✅ Parallel Fan-out - Deliver to multiple subscribers concurrently
  • ✅ Delivery Tracking - Success/failure per subscriber
  • ✅ Error Recovery - Continue on partial failures
  • ✅ Metrics Collection - Track event delivery stats

Technical Features

  • ✅ A2A Authentication - All endpoints secured
  • ✅ Health Checks - Monitor Registry & Subscription Service
  • ✅ Structured Logging - Comprehensive operation logs
  • ✅ Metrics Endpoint - In-memory stats (Prometheus-ready)
  • ✅ Dependency Injection - Clean architecture
  • ✅ Async/Await - High-performance async operations

Architecture

Request Routing Flow

1. Client sends request with capabilities
2. Router queries Registry for matching agents
3. Registry returns ranked list of agents
4. Router selects best agent (by score)
5. Router forwards request to agent
6. On failure, Router tries the next agent (up to 3 attempts total)
7. Router reports stats back to Registry
8. Return response to client

Event Publishing Flow

1. Publisher sends event to Router
2. Router queries Subscription Service for subscribers
3. Subscription Service returns matching subscribers
4. Router fans out event to all subscribers in parallel
5. Router tracks delivery success/failure
6. Return delivery summary to publisher

Quick Start

Prerequisites

  • Python 3.9+
  • Poetry
  • Agent Registry running (port 8000)
  • Subscription Service running (port 8001)

1. Install Dependencies

poetry install

2. Configure Environment

Create a .env file:

# Registry Service
REGISTRY_SERVICE_URL=http://localhost:8000

# Subscription Service
SUBSCRIPTION_SERVICE_URL=http://localhost:8001

# Router Configuration
ROUTER_TIMEOUT_SECONDS=30
ROUTER_RETRY_ATTEMPTS=3
ROUTER_MAX_AGENTS_TO_TRY=3

# Event Publishing
EVENT_FANOUT_TIMEOUT_SECONDS=30
EVENT_MAX_PARALLEL_DELIVERIES=50

# Agent Identity
ROUTER_AGENT_KEY=traylinx-router-agent

3. Start Router

poetry run uvicorn app.main:app --host 0.0.0.0 --port 8080 --reload

4. Verify Health

# Liveness
curl http://localhost:8080/health

# Readiness (checks Registry & Subscription Service)
curl http://localhost:8080/ready

# Metrics
curl http://localhost:8080/metrics

5. View API Docs

Open http://localhost:8080/docs


API Reference

Base URL

http://localhost:8080

Authentication

All endpoints require A2A authentication:

Authorization: Bearer {access_token}
X-Agent-Key: {agent_key}
X-Agent-Secret-Token: {agent_secret}

🔀 1. Request Routing

Route a request to an agent based on capabilities.

Endpoint: POST /a2a/route

Request

{
  "capabilities": [
    {"key": "domain", "value": "flights"},
    {"key": "op", "value": "search"}
  ],
  "endpoint": "/a2a/search",
  "payload": {
    "query": "NYC to LAX",
    "date": "2025-12-01"
  },
  "timeout": 30
}

Response (Success)

{
  "success": true,
  "data": {
    "results": [
      {"flight": "AA123", "price": 299}
    ]
  },
  "agent_key": "flights-search-agent",
  "latency_ms": 245
}

Response (Failure)

{
  "success": false,
  "error": "No agents found matching capabilities",
  "latency_ms": 50
}
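To make the request shape concrete, here is a minimal client-side sketch that uses only the Python standard library. `build_route_request` is a hypothetical helper, not part of any Traylinx SDK, and the credential values are placeholders. It builds the request object but does not send it.

```python
import json
import urllib.request

ROUTER_URL = "http://localhost:8080"


def build_route_request(capabilities, endpoint, payload, *, access_token,
                        agent_key, agent_secret, timeout=30):
    """Build (but do not send) a POST /a2a/route request. Hypothetical helper."""
    body = {
        # {"domain": "flights"} becomes [{"key": "domain", "value": "flights"}]
        "capabilities": [{"key": k, "value": v} for k, v in capabilities.items()],
        "endpoint": endpoint,
        "payload": payload,
        "timeout": timeout,
    }
    headers = {
        "Content-Type": "application/json",
        # The three A2A headers listed under Authentication
        "Authorization": f"Bearer {access_token}",
        "X-Agent-Key": agent_key,
        "X-Agent-Secret-Token": agent_secret,
    }
    return urllib.request.Request(
        f"{ROUTER_URL}/a2a/route",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


request = build_route_request(
    {"domain": "flights", "op": "search"},
    "/a2a/search",
    {"query": "NYC to LAX", "date": "2025-12-01"},
    access_token="<access_token>",  # placeholder
    agent_key="<agent_key>",        # placeholder
    agent_secret="<agent_secret>",  # placeholder
)
```

Passing `request` to `urllib.request.urlopen` would send it. Note that `urlopen` raises `urllib.error.HTTPError` for error statuses such as 404 or 503, so a real client would catch that and read the JSON error body from the exception.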

Error Scenarios

| Error                | HTTP Status | Description                      |
|----------------------|-------------|----------------------------------|
| No agents found      | 404         | No agents match the capabilities |
| All agents failed    | 503         | All candidate agents failed      |
| Registry unavailable | 503         | Cannot connect to Registry       |
| Agent timeout        | 504         | Agent didn't respond in time     |
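One way to keep this mapping in a single place is to attach the status code to an exception class. The sketch below is illustrative only: the class names are hypothetical (the contents of app/exceptions.py are not shown in this document), and the 500 fallback for unexpected errors is an assumption.

```python
class RoutingError(Exception):
    """Base class for routing failures (all names here are hypothetical)."""
    status_code = 500


class NoAgentsFoundError(RoutingError):
    status_code = 404  # no agents match the capabilities


class AllAgentsFailedError(RoutingError):
    status_code = 503  # every candidate agent failed


class RegistryUnavailableError(RoutingError):
    status_code = 503  # cannot connect to Registry


class AgentTimeoutError(RoutingError):
    status_code = 504  # agent didn't respond in time


def to_error_response(exc: Exception) -> tuple:
    """Map an exception to (HTTP status, failure response body)."""
    # Assumption: anything that is not a RoutingError becomes a 500.
    status = exc.status_code if isinstance(exc, RoutingError) else 500
    return status, {"success": False, "error": str(exc)}
```

For example, `to_error_response(NoAgentsFoundError("No agents found matching capabilities"))` yields status 404 with the failure body shown earlier (minus `latency_ms`, which the caller would add).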

Routing Behavior

  1. Agent Discovery: Query Registry with capabilities
  2. Agent Selection: Pick agent with highest score (success_rate, latency, freshness)
  3. Request Forwarding: Forward to {agent.base_url}{endpoint}
  4. Retry Logic: On failure, try next agent (up to 3 total attempts)
  5. Stats Reporting: Report success/failure/latency to Registry
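The five steps above can be sketched as a small loop. This is a simplified, synchronous illustration, not the Router's actual code: `route_with_fallback`, `discover`, `forward`, and `report` are hypothetical names, the Registry and agent calls are injected as plain callables, and each agent record is assumed to carry `agent_key` and `score` fields.

```python
from typing import Any, Callable, Optional


def route_with_fallback(
    capabilities: list,
    discover: Callable[[list], list],
    forward: Callable[[dict], Any],
    report: Callable[[str, bool], None] = lambda agent_key, ok: None,
    max_attempts: int = 3,
) -> dict:
    """Try the best-scoring agents in order until one succeeds."""
    # 1. Agent discovery: ask the Registry for matching agents.
    agents = discover(capabilities)
    if not agents:
        return {"success": False, "error": "No agents found matching capabilities"}

    # 2. Agent selection: highest score first (assumed "score" field).
    ranked = sorted(agents, key=lambda a: a["score"], reverse=True)

    last_error: Optional[str] = None
    # 3-4. Forward the request; on failure move on to the next candidate.
    for agent in ranked[:max_attempts]:
        try:
            data = forward(agent)
        except Exception as exc:
            report(agent["agent_key"], False)  # 5. report the failure
            last_error = str(exc)
            continue
        report(agent["agent_key"], True)  # 5. report the success
        return {"success": True, "data": data, "agent_key": agent["agent_key"]}

    return {"success": False, "error": f"All agents failed (last error: {last_error})"}
```

Injecting the collaborators as callables keeps the retry logic testable without a running Registry. The real service is async and also records latency, which this sketch leaves out.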

📡 2. Event Publishing

Publish an event to all subscribers.

Endpoint: POST /a2a/event

Request

{
  "event_type": "job.completed",
  "job_id": "job-123",
  "payload": {
    "status": "success",
    "result": "All tests passed",
    "duration": 120
  },
  "timeout": 30
}

Response (Success)

{
  "success": true,
  "delivered": 3,
  "failed": 0,
  "total_subscribers": 3,
  "latency_ms": 150,
  "errors": null
}

Response (Partial Failure)

{
  "success": true,
  "delivered": 2,
  "failed": 1,
  "total_subscribers": 3,
  "latency_ms": 200,
  "errors": [
    {
      "agent_key": "agent-c",
      "error": "Connection timeout"
    }
  ]
}

Response (No Subscribers)

{
  "success": true,
  "delivered": 0,
  "failed": 0,
  "total_subscribers": 0,
  "latency_ms": 10
}

Event Publishing Behavior

  1. Query Subscribers: Call Subscription Service with event details
  2. Subscriber Discovery: Get list of matching agents
  3. Parallel Fan-out: Deliver to all subscribers concurrently (up to 50 at once)
  4. Track Results: Count successful and failed deliveries
  5. Return Summary: Report delivery statistics to publisher

Event Delivery

  • Target Endpoint: {agent.base_url}/a2a/event/receive
  • Concurrency: Up to 50 parallel deliveries
  • Timeout: Configurable per event (default 30s)
  • Error Handling: Continue on partial failures
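A bounded parallel fan-out of this kind is commonly built from `asyncio.Semaphore` and `asyncio.gather`. The sketch below shows the idea under stated assumptions: `fan_out` and `deliver` are hypothetical names, subscribers are represented by their agent keys only, and the actual HTTP call to the subscriber is left to the injected `deliver` coroutine.

```python
import asyncio
from typing import Awaitable, Callable


async def fan_out(
    subscribers: list,
    deliver: Callable[[str], Awaitable[None]],
    max_parallel: int = 50,
    timeout: float = 30.0,
) -> dict:
    """Deliver to every subscriber concurrently and summarise the results."""
    limiter = asyncio.Semaphore(max_parallel)

    async def deliver_one(agent_key: str):
        # The semaphore caps how many deliveries are in flight at once.
        async with limiter:
            try:
                await asyncio.wait_for(deliver(agent_key), timeout)
                return None
            except Exception as exc:
                # A failed delivery is recorded, not raised, so the other
                # subscribers still receive the event.
                return {"agent_key": agent_key, "error": str(exc) or type(exc).__name__}

    results = await asyncio.gather(*(deliver_one(key) for key in subscribers))
    errors = [r for r in results if r is not None]
    return {
        "success": True,
        "delivered": len(subscribers) - len(errors),
        "failed": len(errors),
        "total_subscribers": len(subscribers),
        "errors": errors or None,
    }
```

Catching exceptions inside each task, rather than letting `gather` propagate the first one, is what allows the summary to report partial failures.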

🏥 Health & Monitoring

Health Endpoints

Liveness Probe

GET /health

Returns 200 if the service is running.

Readiness Probe

GET /ready

Checks connectivity to:

  • Agent Registry
  • Subscription Service

Returns 200 if both are healthy, 503 otherwise.

Response:

{
  "service": "traylinx-router-agent",
  "ready": true,
  "registry": "healthy",
  "subscription_service": "healthy"
}
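The readiness rule (200 only when both dependencies are healthy) reduces to a small pure function. In this sketch `readiness` is a hypothetical name, and the "unhealthy" label is an assumption, since the document only shows the healthy case.

```python
def readiness(registry_ok: bool, subscription_ok: bool) -> tuple:
    """Return (HTTP status, body) for the readiness probe."""

    def label(ok: bool) -> str:
        # "unhealthy" is an assumed label; only "healthy" appears in the docs.
        return "healthy" if ok else "unhealthy"

    ready = registry_ok and subscription_ok
    body = {
        "service": "traylinx-router-agent",
        "ready": ready,
        "registry": label(registry_ok),
        "subscription_service": label(subscription_ok),
    }
    return (200 if ready else 503), body
```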

Metrics

GET /metrics

Returns in-memory metrics:

{
  "uptime_seconds": 3600,
  "routing_requests_total": 1523,
  "routing_requests_success": 1487,
  "routing_requests_failed": 36,
  "success_rate": 97.6,
  "latency_avg_ms": 234.5,
  "latency_p95_ms": 450.2,
  "latency_p99_ms": 890.1,
  "agent_selection_counts": {
    "flights-search-agent": 892,
    "hotels-search-agent": 595
  },
  "event_publishes": 234,
  "event_deliveries": 702,
  "event_failures": 12
}
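For readers wondering how the derived fields relate to the raw counters, the sketch below computes `success_rate` and the latency figures from in-memory samples. It is an illustration only: `summarize` and `percentile` are hypothetical names, and the nearest-rank percentile method is an assumption about how p95/p99 might be calculated.

```python
import math


def percentile(samples: list, pct: float) -> float:
    """Nearest-rank percentile; returns 0.0 for an empty sample list."""
    if not samples:
        return 0.0
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(success: int, failed: int, latencies_ms: list) -> dict:
    """Build the routing part of the metrics payload from raw counters."""
    total = success + failed
    count = len(latencies_ms)
    return {
        "routing_requests_total": total,
        "routing_requests_success": success,
        "routing_requests_failed": failed,
        # Percentage of successful requests, one decimal place
        "success_rate": round(100 * success / total, 1) if total else 0.0,
        "latency_avg_ms": round(sum(latencies_ms) / count, 1) if count else 0.0,
        "latency_p95_ms": percentile(latencies_ms, 95),
        "latency_p99_ms": percentile(latencies_ms, 99),
    }
```

With the counters from the sample response, 1487 successes out of 1523 requests gives a `success_rate` of 97.6.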

Configuration

Environment Variables

| Variable                      | Default               | Description                  |
|-------------------------------|-----------------------|------------------------------|
| REGISTRY_SERVICE_URL          | http://localhost:8000 | Agent Registry URL           |
| SUBSCRIPTION_SERVICE_URL      | http://localhost:8001 | Subscription Service URL     |
| ROUTER_TIMEOUT_SECONDS        | 30                    | Default request timeout      |
| ROUTER_RETRY_ATTEMPTS         | 3                     | Max retry attempts           |
| ROUTER_MAX_AGENTS_TO_TRY      | 3                     | Max agents to try per request |
| EVENT_FANOUT_TIMEOUT_SECONDS  | 30                    | Event delivery timeout       |
| EVENT_MAX_PARALLEL_DELIVERIES | 50                    | Max parallel event deliveries |
| ROUTER_AGENT_KEY              | traylinx-router-agent | Router's agent key           |
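As an illustration of how these variables and defaults fit together, the following sketch loads them with the standard library only. `RouterSettings` is a hypothetical class; the project lists Pydantic v2 among its dependencies, so the real app/config.py may well look different.

```python
import os
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class RouterSettings:
    """Defaults mirror the table above (hypothetical stand-in for app/config.py)."""
    registry_service_url: str = "http://localhost:8000"
    subscription_service_url: str = "http://localhost:8001"
    router_timeout_seconds: int = 30
    router_retry_attempts: int = 3
    router_max_agents_to_try: int = 3
    event_fanout_timeout_seconds: int = 30
    event_max_parallel_deliveries: int = 50
    router_agent_key: str = "traylinx-router-agent"

    @classmethod
    def from_env(cls, env=None) -> "RouterSettings":
        """Read overrides from env (defaults to os.environ)."""
        env = os.environ if env is None else env
        overrides = {}
        for field in fields(cls):
            raw = env.get(field.name.upper())
            if raw is not None:
                # Convert to the type of the default: int("10") or str(...)
                overrides[field.name] = type(field.default)(raw)
        return cls(**overrides)
```

Calling `RouterSettings.from_env()` at startup picks up anything exported from the .env file; unset variables fall back to the defaults.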

A2A (Agent-to-Agent) Authentication

Incoming Requests

The Router protects all endpoints with @require_a2a_auth:

from fastapi import APIRouter
from traylinx_auth_client import require_a2a_auth

# Assumed location: app/models.py holds the request/response models
from app.models import RouteRequest

router = APIRouter()

@router.post("/a2a/route")
@require_a2a_auth
async def route_request(request: RouteRequest):
    # Only authenticated agents reach here
    ...

Outgoing Requests

When calling other services, the Router uses get_request_headers():

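from traylinx_auth_client import get_request_headers
import httpx

# forward_to_agent is an illustrative wrapper name, not part of the library
async def forward_to_agent(agent_url: str, payload: dict) -> httpx.Response:
    # Build the A2A auth headers for the outgoing call
    headers = get_request_headers()
    async with httpx.AsyncClient() as client:
        return await client.post(
            agent_url,
            headers=headers,
            json=payload
        )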

This automatically includes:

  • Authorization: Bearer {access_token}
  • X-Agent-Key: {agent_key}
  • X-Agent-Secret-Token: {agent_secret}


📁 Project Structure

traylinx_router_agent/
├── app/
│   ├── __init__.py
│   ├── main.py                    # FastAPI application
│   ├── config.py                  # Settings
│   ├── models.py                  # Request/Response models
│   ├── exceptions.py              # Custom exceptions
│   ├── dependencies.py            # Dependency injection
│   ├── routers/
│   │   ├── __init__.py
│   │   ├── health.py              # Health checks
│   │   ├── router.py              # Routing endpoint
│   │   └── events.py              # Event endpoint
│   └── services/
│       ├── __init__.py
│       ├── registry_client.py     # Registry API client
│       ├── subscription_client.py # Subscription API client
│       ├── agent_client.py        # Agent communication
│       ├── routing_service.py     # Routing logic
│       ├── event_service.py       # Event fan-out logic
│       └── metrics.py             # Metrics collection
├── tests/
│   ├── __init__.py
│   ├── test_router.py             # Unit tests
│   └── integration_test.py        # Integration tests
├── pyproject.toml                 # Dependencies
├── poetry.lock                    # Locked dependencies
├── README.md                      # This file
└── API_REFERENCE.md               # Detailed API docs

🧪 Testing

Unit Tests

poetry run pytest tests/test_router.py -v

Integration Tests

Requires all services to be running:

# Terminal 1: Start Registry
cd ../traylinx_agent_registry
poetry run uvicorn app.main:app --port 8000

# Terminal 2: Start Subscription Service
cd ../traylinx_subscription_service
poetry run uvicorn app.main:app --port 8001

# Terminal 3: Start Router
cd ../traylinx_router_agent
poetry run uvicorn app.main:app --port 8080

# Terminal 4: Run tests
poetry run pytest tests/integration_test.py -v

🐛 Troubleshooting

Router Won't Start

# Check dependencies
poetry install

# Check Python version
python --version  # Should be 3.9+

# Run with debug logging
LOG_LEVEL=DEBUG poetry run uvicorn app.main:app --port 8080

Registry Connection Issues

# Test Registry connectivity
curl http://localhost:8000/health

# Check REGISTRY_SERVICE_URL
echo $REGISTRY_SERVICE_URL

Subscription Service Connection Issues

# Test Subscription Service connectivity
curl http://localhost:8001/health

# Check SUBSCRIPTION_SERVICE_URL
echo $SUBSCRIPTION_SERVICE_URL

Routing Failures

Common issues:

  1. No agents found: Check Registry has agents with matching capabilities
  2. All agents failed: Check target agents are running and healthy
  3. Authentication errors: Verify A2A auth setup

Check logs for detailed error messages.


Performance

Routing Performance

  • Average Latency: ~200-300ms (Registry lookup + agent call)
  • P95 Latency: ~500ms
  • P99 Latency: ~1000ms
  • Throughput: 100+ requests/second (single instance)

Event Publishing Performance

  • Fan-out Latency: ~50-200ms for 10 subscribers
  • Parallel Deliveries: Up to 50 concurrent
  • Throughput: 50+ events/second (single instance)

Optimization Tips

  1. Use caching: Consider caching Registry responses
  2. Tune timeouts: Adjust based on agent response times
  3. Scale horizontally: Run multiple Router instances
  4. Monitor metrics: Watch /metrics for bottlenecks
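For the first tip, a small time-to-live cache in front of the Registry lookup is one possible approach. The sketch below is a generic illustration: `TTLCache` and `get_or_load` are hypothetical names, and the right TTL depends on how quickly agent scores change, which this document does not specify.

```python
import time
from typing import Any, Callable


class TTLCache:
    """Cache values for a fixed number of seconds (illustrative sketch)."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested
        self._entries = {}   # key -> (stored_at, value)

    def get_or_load(self, key, loader: Callable[[], Any]) -> Any:
        """Return the cached value, or call loader() if missing or expired."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]
        value = loader()
        self._entries[key] = (now, value)
        return value
```

A cache key could be a sorted tuple of the capability pairs, for example `(("domain", "flights"), ("op", "search"))`. The trade-off is freshness: a cached agent list will not reflect score changes or newly failed agents until the entry expires.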

🚦 Production Checklist

Before deploying to production:

  • [ ] Configure production service URLs
  • [ ] Set appropriate timeouts
  • [ ] Configure log aggregation
  • [ ] Add Prometheus metrics
  • [ ] Set up monitoring alerts
  • [ ] Load test routing and events
  • [ ] Configure container orchestration
  • [ ] Set up horizontal pod autoscaling
  • [ ] Review retry and timeout settings
  • [ ] Test failure scenarios

Related Documentation

  • API Reference: API_REFERENCE.md
  • Agent Registry: ../traylinx_agent_registr../index.md
  • Subscription Service: ../traylinx_subscription_servic../index.md
  • Ecosystem Architecture: ../TRAYLINX_API_DOCUMENTATION.md
  • Development Status: ../TRAYLINX_DEVELOPMENT_STATUS.md
  • Sprint 2 Summary: ../SPRINT_2_COMPLETE.md

🔄 Version History

v2.0.0 (Sprint 2)

  • ✅ Added event publishing endpoint
  • ✅ Integrated Subscription Service client
  • ✅ Implemented parallel event fan-out
  • ✅ Added event metrics
  • ✅ Updated health checks

v1.0.0 (Sprint 1)

  • ✅ Capability-based routing
  • ✅ Registry integration
  • ✅ Retry logic
  • ✅ Performance tracking
  • ✅ A2A authentication

🤝 Support

For issues or questions:

  1. Check logs for error messages
  2. Verify Registry and Subscription Service are healthy
  3. Test A2A authentication
  4. Review configuration settings


📄 License

[Your License Here]


Built with: FastAPI, httpx, traylinx_auth_client, Pydantic v2