NAT Traversal & Circuit Relay v2

Last Updated: January 2026
Stargate Version: v0.2.0+

Overview

Traylinx Stargate v0.2.0 adds NAT traversal so that agents behind firewalls and NAT can still communicate: when a direct connection is not possible, traffic is routed through Circuit Relay v2.

The NAT Problem

Many agents run behind Network Address Translation (NAT), which blocks unsolicited inbound connections and so prevents other peers from dialing them directly:

```text
Agent A (NAT)  ❌  Cannot connect directly  ❌  Agent B (NAT)
     │                                              │
     │         ✅ Solution: Relay Node ✅           │
     │                     │                        │
     └─────────────────────┴────────────────────────┘
```

Common NAT scenarios:

- Home networks behind routers
- Corporate networks with firewalls
- Cloud instances with security groups
- Mobile devices on cellular networks

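Circuit Relay v2

Circuit Relay v2 is a libp2p protocol that lets peers communicate through publicly reachable relay nodes. A peer that cannot accept inbound connections reserves a slot on a relay, and other peers then reach it through that relay.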
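In libp2p, a peer that is only reachable through a relay is addressed by appending `/p2p-circuit` and the target's peer ID to the relay's multiaddr. The helper below is an illustrative sketch of that address format; `relayed_addr` is a hypothetical name, not part of the Stargate API, and the peer IDs are placeholders.

```python
def relayed_addr(relay_multiaddr: str, target_peer_id: str) -> str:
    """Build a libp2p circuit address: <relay addr>/p2p-circuit/p2p/<target>."""
    # The relay address must identify the relay itself via a /p2p/<id> component
    if "/p2p/" not in relay_multiaddr:
        raise ValueError("relay multiaddr must end with /p2p/<relay-peer-id>")
    if not target_peer_id:
        raise ValueError("target peer ID must not be empty")
    # Strip any trailing slash so the components join cleanly
    return f"{relay_multiaddr.rstrip('/')}/p2p-circuit/p2p/{target_peer_id}"


# Example with placeholder IDs (not real relays or peers)
addr = relayed_addr("/ip4/203.0.113.42/tcp/4001/p2p/QmRelayExample", "QmPeerBExample")
print(addr)
# /ip4/203.0.113.42/tcp/4001/p2p/QmRelayExample/p2p-circuit/p2p/QmPeerBExample
```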

Architecture

```text
┌─────────────────────────────────────────────────────────────┐
│                    CIRCUIT RELAY v2 FLOW                    │
└─────────────────────────────────────────────────────────────┘

Peer A (NAT)              Relay Node              Peer B (NAT)
     │                    (Public IP)                   │
     │                         │                        │
     │  1. Connect to relay    │                        │
     │────────────────────────▶│                        │
     │                         │  2. Make reservation   │
     │                         │◀───────────────────────│
     │                         │                        │
     │  3. Request relay to B  │                        │
     │────────────────────────▶│                        │
     │                         │  4. Establish circuit  │
     │                         │───────────────────────▶│
     │                         │                        │
     │  5. Data flows through relay                     │
     │◀────────────────────────┼───────────────────────▶│
```
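In Circuit Relay v2, the relay forwards a connection only to a peer that has previously made a reservation with it (steps 2 to 4 above). The toy model below sketches that rule in memory; `ToyRelay` and its methods are hypothetical, and it ignores transport, reservation expiry, and data limits. The status strings mirror names used in the libp2p Circuit Relay v2 specification.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRelay:
    """In-memory model of relay bookkeeping (no networking, no expiry)."""
    max_reservations: int = 128
    reservations: set[str] = field(default_factory=set)
    circuits: list[tuple[str, str]] = field(default_factory=list)

    def reserve(self, peer_id: str) -> str:
        # Step 2: a NATed peer asks the relay to hold a slot for it (hop RESERVE)
        full = len(self.reservations) >= self.max_reservations
        if full and peer_id not in self.reservations:
            return "RESERVATION_REFUSED"
        self.reservations.add(peer_id)
        return "OK"

    def connect(self, src: str, dst: str) -> str:
        # Step 3: src asks for a circuit to dst (hop CONNECT);
        # the relay only forwards to peers holding a reservation
        if dst not in self.reservations:
            return "NO_RESERVATION"
        # Step 4: the relay would now open the stop stream to dst; we just record it
        self.circuits.append((src, dst))
        return "OK"


relay = ToyRelay()
print(relay.connect("PeerA", "PeerB"))  # NO_RESERVATION
print(relay.reserve("PeerB"))           # OK
print(relay.connect("PeerA", "PeerB"))  # OK
```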

Connection Priority

Stargate uses a three-tier fallback strategy:

1. Direct P2P (lowest latency): attempts a direct connection first
2. Circuit Relay v2 (medium latency): falls back to a relay if the direct connection fails
3. NATS Relay (highest reliability): final fallback for maximum compatibility
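The fallback order amounts to trying each tier in turn and keeping the first that succeeds. This is a generic asyncio sketch, not Stargate's transport code; `connect_with_fallback` and the tier callables are hypothetical, and it assumes each tier signals failure by raising `ConnectionError`.

```python
import asyncio
from typing import Awaitable, Callable, Sequence

# A tier is a (name, connect) pair; connect() raises ConnectionError on failure
Tier = tuple[str, Callable[[], Awaitable[object]]]


async def connect_with_fallback(tiers: Sequence[Tier]) -> tuple[str, object]:
    """Try each tier in order and return (tier_name, connection) for the first success."""
    errors: list[str] = []
    for name, connect in tiers:
        try:
            return name, await connect()
        except ConnectionError as exc:
            # Remember why this tier failed, then fall through to the next one
            errors.append(f"{name}: {exc}")
    raise ConnectionError("all tiers failed: " + "; ".join(errors))


# Hypothetical tiers: direct fails, so the relay tier is used
async def direct():
    raise ConnectionError("peer is behind NAT")

async def relay():
    return "relayed-conn"

async def nats():
    return "nats-conn"

print(asyncio.run(connect_with_fallback([("direct", direct), ("relay", relay), ("nats", nats)])))
# ('relay', 'relayed-conn')
```

Ordering the tiers by expected latency means the slower but more robust paths are only paid for when the faster ones fail.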

Features

1. Automatic NAT Detection

Stargate automatically detects your NAT configuration:

```python
import asyncio

from traylinx_stargate import StarGateNode


async def main():
    node = StarGateNode(display_name="my-agent")
    await node.start()

    # Check NAT status
    nat = node.get_status()["nat_status"]
    print(f"NAT Type: {nat['nat_type']}")
    print(f"Public Addresses: {nat['public_addrs']}")
    print(f"Requires Relay: {nat['requires_relay']}")

    await node.stop()


asyncio.run(main())
```

NAT types detected:

- `public`: direct internet connection
- `private`: behind NAT (10.x, 172.16-31.x, 192.168.x)
- `symmetric`: symmetric NAT (requires relay)
- `unknown`: unable to determine
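As a rough illustration of how these categories can be told apart, the sketch below classifies a node from its local IPv4 address and the external (IP, port) mappings that different remote peers report seeing. It is not Stargate's detection logic: `classify_nat` and `is_rfc1918` are hypothetical, and the symmetric check uses a common heuristic (different observers see different external mappings for the same local socket).

```python
import ipaddress

# RFC 1918 private IPv4 ranges (the 10.x, 172.16-31.x and 192.168.x blocks listed above)
PRIVATE_NETS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_rfc1918(addr: str) -> bool:
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in PRIVATE_NETS)


def classify_nat(local_ip: str, observed: list[tuple[str, int]]) -> str:
    """Return 'public', 'private', 'symmetric' or 'unknown'.

    `observed` holds the (external_ip, external_port) that each remote
    observer reports for the same local listening socket.
    """
    if not observed:
        return "unknown"  # nobody has told us how we look from outside
    if len(set(observed)) > 1:
        return "symmetric"  # the mapping changes per destination
    external_ip, _ = observed[0]
    if external_ip == local_ip and not is_rfc1918(local_ip):
        return "public"  # no address translation on the path
    return "private"


print(classify_nat("192.168.1.5", [("203.0.113.7", 40001), ("203.0.113.7", 52817)]))  # symmetric
```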

2. Connection Pooling

Reuses connections to frequently contacted peers:

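```python
# Inside an async function, after `await node.start()`
# Connections to the same peer are pooled automatically
result1 = await node.call("peer_id", "method1", {})
result2 = await node.call("peer_id", "method2", {})  # Reuses the pooled connection

# Inspect pool state (_connection_pool is a private attribute)
node.transport._connection_pool.get_stats()
# e.g. {'size': 5, 'max_size': 50, 'ttl': 300.0}
```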

Benefits:

- Reduced latency for repeated calls
- Lower network overhead
- Automatic connection cleanup
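A pool with these properties can be sketched as a least-recently-used map whose entries expire after an idle TTL. `TTLConnectionPool` is hypothetical, not Stargate's internal pool; it reuses the `max_size` and `ttl` defaults from the stats above, accepts an injectable clock for testing, and simply drops evicted entries where a real pool would also close them.

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Optional


class TTLConnectionPool:
    """Keeps up to max_size connections; idle ones expire after ttl seconds."""

    def __init__(self, max_size: int = 50, ttl: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_size = max_size
        self.ttl = ttl
        self._clock = clock
        self._conns: "OrderedDict[str, tuple[Any, float]]" = OrderedDict()

    def get(self, peer_id: str) -> Optional[Any]:
        entry = self._conns.get(peer_id)
        if entry is None:
            return None
        conn, last_used = entry
        if self._clock() - last_used > self.ttl:
            del self._conns[peer_id]  # expired: the caller must reconnect
            return None
        # Refresh the idle timer and mark as most recently used
        self._conns[peer_id] = (conn, self._clock())
        self._conns.move_to_end(peer_id)
        return conn

    def put(self, peer_id: str, conn: Any) -> None:
        self._conns[peer_id] = (conn, self._clock())
        self._conns.move_to_end(peer_id)
        while len(self._conns) > self.max_size:
            self._conns.popitem(last=False)  # evict the least recently used entry

    def get_stats(self) -> dict:
        return {"size": len(self._conns), "max_size": self.max_size, "ttl": self.ttl}
```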

3. Connection Retry with Exponential Backoff

Automatic retry for transient failures:

```python
# Inside an async function; retry is automatic with configurable parameters
payload = {"message": "hello"}
result = await node.call(
    "peer_id",
    "method",
    payload,
    max_retries=3,           # Default: 3
    backoff_factor=2.0,      # Default: 2.0 (exponential)
)
```

Retry schedule with the defaults (`max_retries=3`, `backoff_factor=2.0`), up to four attempts in total:

- Attempt 1: immediate
- Attempt 2: after 2 seconds
- Attempt 3: after 4 seconds
- Attempt 4: after 8 seconds
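The schedule follows delay = backoff_factor ** n before retry n. Below is a minimal asyncio sketch of that policy, assuming only `ConnectionError` counts as a transient failure; `retry_with_backoff` is hypothetical and not the code inside `node.call`. The `sleep` parameter is injectable so the delays can be checked without waiting.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def retry_with_backoff(
    op: Callable[[], Awaitable[T]],
    max_retries: int = 3,
    backoff_factor: float = 2.0,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> T:
    """Run op(); on ConnectionError wait backoff_factor ** n seconds before retry n."""
    if max_retries < 0:
        raise ValueError("max_retries must be >= 0")
    for attempt in range(max_retries + 1):
        try:
            return await op()
        except ConnectionError:
            if attempt == max_retries:
                raise  # out of retries: surface the last error
            # Delays of 2, 4, 8 seconds with the defaults
            await sleep(backoff_factor ** (attempt + 1))
```

Production implementations often add random jitter to each delay so that many clients recovering from the same outage do not retry in lockstep.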

4. Comprehensive Metrics

Track connection performance:

```python
# Get metrics for a specific peer (None if nothing has been recorded yet)
metrics = node.transport.get_metrics("peer_id")

if metrics is None:
    print("No metrics recorded for this peer yet")
else:
    print(f"Connection Type: {metrics.connection_type}")  # direct/relay/nats
    print(f"Latency: {metrics.latency_ms}ms")
    print(f"Success Rate: {metrics.success_rate:.1%}")
    print(f"Total Requests: {metrics.total_requests}")
    print(f"Failed Requests: {metrics.failed_requests}")
```
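One common way to maintain such per-peer counters is shown below, assuming success rate = (total - failed) / total and an exponentially weighted moving average (EWMA) for latency. `PeerMetrics` is a hypothetical stand-in, not Stargate's `ConnectionMetrics` class, and the smoothing weight `alpha` is an arbitrary illustrative choice.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PeerMetrics:
    """Hypothetical per-peer counters, loosely mirroring the fields printed above."""
    connection_type: str = "direct"  # "direct", "relay" or "nats"
    total_requests: int = 0
    failed_requests: int = 0
    latency_ms: float = 0.0          # EWMA of observed latencies
    latency_samples: int = 0
    alpha: float = 0.2               # weight given to the newest sample

    def record(self, ok: bool, latency_ms: Optional[float] = None) -> None:
        self.total_requests += 1
        if not ok:
            self.failed_requests += 1
        if latency_ms is not None:
            if self.latency_samples == 0:
                self.latency_ms = latency_ms  # the first sample seeds the average
            else:
                self.latency_ms += self.alpha * (latency_ms - self.latency_ms)
            self.latency_samples += 1

    @property
    def success_rate(self) -> float:
        # Treat "no requests yet" as 1.0 so idle peers are not flagged as failing
        if self.total_requests == 0:
            return 1.0
        return (self.total_requests - self.failed_requests) / self.total_requests


m = PeerMetrics()
m.record(True, 100.0)
m.record(False)
m.record(True, 50.0)
print(f"{m.success_rate:.1%}, {m.latency_ms}ms")  # 66.7%, 90.0ms
```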

5. Relay Health Monitoring

Automatic health checks for relay nodes:

```python
# Health checks run every 5 seconds
# Failures detected within 10 seconds

# Get relay health status
health = node.transport.get_relay_health_status()

for relay, status in health.items():
    print(f"Relay: {relay}")
    print(f"  Healthy: {status['is_healthy']}")
    print(f"  Consecutive Failures: {status['consecutive_failures']}")
    print(f"  Last Success: {status['last_success']}")
```
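With probes every 5 seconds, marking a relay unhealthy after two consecutive failed probes fits the stated 10-second detection window: a relay that goes down right after a successful probe fails at 5 s and again at 10 s. The tracker below sketches that bookkeeping; `RelayHealth` and the threshold of two are assumptions, not Stargate's implementation, and scheduling the probes is left out.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class RelayHealth:
    """Tracks probe results for one relay; unhealthy after `threshold` failures in a row."""
    threshold: int = 2
    consecutive_failures: int = 0
    last_success: Optional[float] = None

    @property
    def is_healthy(self) -> bool:
        return self.consecutive_failures < self.threshold

    def record_probe(self, ok: bool, now: Optional[float] = None) -> None:
        if ok:
            self.consecutive_failures = 0  # one success clears the failure streak
            self.last_success = time.time() if now is None else now
        else:
            self.consecutive_failures += 1

    def as_dict(self) -> dict:
        # Same keys as the status dicts printed in the example above
        return {
            "is_healthy": self.is_healthy,
            "consecutive_failures": self.consecutive_failures,
            "last_success": self.last_success,
        }
```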

Using Circuit Relay v2

Quick Start

```python
import asyncio
from traylinx_stargate import StarGateNode

async def main():
    # Create node with libp2p transport
    node = StarGateNode(
        display_name="my-agent",
        transport="libp2p"
    )
    await node.start()

    try:
        # Enable Circuit Relay v2 (uses the default Traylinx relays, which are
        # placeholders until production relays are deployed)
        await node.transport.enable_circuit_relay_v2()

        # Now you can connect to peers through relays ("peer_id" is a placeholder)
        result = await node.call("peer_id", "ping", {"message": "hello"})
        print(result)
    finally:
        # Always shut the node down, even if the call fails
        await node.stop()

asyncio.run(main())
```

Custom Relay Nodes

Use your own relay infrastructure:

```python
# Configure custom relay nodes
custom_relays = [
    "/ip4/203.0.113.42/tcp/4001/p2p/QmRelay1...",
    "/ip4/203.0.113.43/tcp/4001/p2p/QmRelay2...",
]

await node.transport.enable_circuit_relay_v2(relay_addrs=custom_relays)
```
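libp2p relay addresses normally end in the relay's `/p2p/<peer-id>` component, which lets the dialing node verify it reached the intended relay. A small pre-flight check can catch malformed entries before they are passed to `enable_circuit_relay_v2`. The helpers below are hypothetical and only handle addresses in which every protocol takes exactly one value (the `/ip4/.../tcp/.../p2p/...` and `/dns4/...` forms used on this page); value-less components such as `/p2p-circuit` need a proper multiaddr parser.

```python
def split_multiaddr(addr: str) -> list[tuple[str, str]]:
    """Split '/proto/value/...' into [(proto, value), ...] for two-part protocols only."""
    parts = addr.strip("/").split("/")
    if len(parts) % 2 != 0:
        raise ValueError(f"odd number of components in {addr!r}")
    return list(zip(parts[0::2], parts[1::2]))


def relay_peer_id(addr: str) -> str:
    """Return the relay's peer ID, i.e. the value of the last /p2p/ component."""
    ids = [value for proto, value in split_multiaddr(addr) if proto == "p2p"]
    if not ids:
        raise ValueError(f"{addr!r} has no /p2p/<peer-id> component")
    return ids[-1]


# Validate every configured relay before enabling Circuit Relay v2
for a in ["/ip4/203.0.113.42/tcp/4001/p2p/QmRelayA", "/dns4/relay.example.com/tcp/4001/p2p/QmRelayB"]:
    print(relay_peer_id(a))
```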

Running a Relay Node

Deploy your own relay node for private networks:

```shell
# Start a relay node
traylinx stargate relay --port 4001 --max-connections 1000

# The relay will display its multiaddr:
# Relay node started: QmYourRelayPeerID...
# Multiaddrs: ['/ip4/203.0.113.42/tcp/4001/p2p/QmYourRelayPeerID...']
```

See Also: Relay Node Deployment Guide

Default Relay Nodes

Stargate includes default relay nodes operated by Traylinx:

| Relay | Address | Location | Status |
|---|---|---|---|
| relay1.traylinx.io | `/dns4/relay1.traylinx.io/tcp/4001/p2p/QmRelay1...` | US East | Planned |
| relay2.traylinx.io | `/dns4/relay2.traylinx.io/tcp/4001/p2p/QmRelay2...` | EU West | Planned |

Note: Default relay addresses are placeholders and will be updated when production relays are deployed.

Performance Characteristics

Latency Comparison

| Connection Type | Typical Latency | Use Case |
|---|---|---|
| Direct P2P | 5-50 ms | Same network or public IPs |
| Circuit Relay v2 | 50-200 ms | NAT traversal required |
| NATS Relay | 100-300 ms | Maximum compatibility |

Resource Usage

Connection pool:

- Default maximum: 50 connections
- TTL: 300 seconds (5 minutes)
- Memory: ~2-4 MB per connection

Relay health checks:

- Interval: 5 seconds
- Failure detection: < 10 seconds
- Bandwidth: minimal (~1 KB per check)
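These figures translate into a simple capacity budget. The sketch below does the arithmetic, assuming the pool is completely full and taking 1 KB as 1,024 bytes; the function names are hypothetical. A full pool costs 50 × 2-4 MB = 100-200 MB, and one probe every 5 seconds is 86,400 / 5 = 17,280 probes per relay per day.

```python
def pool_memory_range_mb(max_connections: int = 50,
                         per_conn_mb: tuple[float, float] = (2.0, 4.0)) -> tuple[float, float]:
    """Worst-case pool memory when every slot is in use."""
    low, high = per_conn_mb
    return max_connections * low, max_connections * high


def health_check_bytes_per_day(interval_s: float = 5.0, bytes_per_check: int = 1024,
                               relays: int = 1) -> float:
    """Probe traffic per day for `relays` relays, each probed every `interval_s` seconds."""
    checks_per_day = 86_400 / interval_s
    return checks_per_day * bytes_per_check * relays


print(pool_memory_range_mb())                                   # (100.0, 200.0)
print(round(health_check_bytes_per_day(relays=3) / 1e6, 1))     # 53.1 (MB per day, three relays)
```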

Troubleshooting

Connection Failures

Symptom: Cannot connect to peer

Diagnosis:

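```python
# Check NAT status
status = node.get_status()
print(status['nat_status'])

# Check relay status (_circuit_relay_enabled and _connected_relays are
# private attributes and may change between releases)
if node.transport._circuit_relay_enabled:
    print(f"Relays: {node.transport._connected_relays}")
else:
    print("Circuit Relay v2 not enabled")
```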

Solutions:

1. Enable Circuit Relay v2: `await node.transport.enable_circuit_relay_v2()`
2. Check relay health: `node.transport.get_relay_health_status()`
3. Verify that your firewall allows outbound TCP connections to the relay port (4001 in the examples on this page)
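The diagnosis and solution steps above can be folded into a small checklist function. This is a hypothetical helper, not part of Stargate; it only assumes the shapes shown on this page (a `nat_status` dict with `requires_relay`, and per-relay status dicts with `is_healthy`).

```python
def suggest_fixes(nat_status: dict, relay_enabled: bool, relay_health: dict) -> list[str]:
    """Map the diagnostic values above to the matching solution steps."""
    fixes = []
    if nat_status.get("requires_relay") and not relay_enabled:
        fixes.append("enable Circuit Relay v2")
    if relay_enabled:
        healthy = [r for r, s in relay_health.items() if s.get("is_healthy")]
        if not healthy:
            fixes.append("no healthy relays: check relay health and addresses")
    if not fixes:
        # Nothing obviously wrong locally: the network path is the next suspect
        fixes.append("check that outbound TCP to the relay port (e.g. 4001) is allowed")
    return fixes
```

In a running agent it could be called as `suggest_fixes(node.get_status()["nat_status"], node.transport._circuit_relay_enabled, node.transport.get_relay_health_status())`, keeping in mind that the middle argument reads a private attribute.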

High Latency

Symptom: Slow response times

Diagnosis:

```python
# Check connection metrics
metrics = node.transport.get_metrics("peer_id")
print(f"Connection Type: {metrics.connection_type}")
print(f"Latency: {metrics.latency_ms}ms")
```

Solutions:

1. If the connection is relayed, deploy relay nodes geographically closer to your peers
2. Check network connectivity
3. If both peers have public IPs, check why a direct connection is not being used

Relay Failures

Symptom: Relay connections failing

Diagnosis:

```python
# Check relay health
health = node.transport.get_relay_health_status()
for relay, status in health.items():
    if not status['is_healthy']:
        print(f"Unhealthy relay: {relay}")
        print(f"Failures: {status['consecutive_failures']}")
```

Solutions:

1. Configure multiple relay nodes for redundancy
2. Deploy your own relay nodes
3. Check relay node logs for issues

Best Practices

1. Use Multiple Relays

Configure at least two or three relay nodes for redundancy:

```python
# Hostnames need /dns4/ (or /dns/); /ip4/ only accepts a literal IPv4 address
relays = [
    "/dns4/relay1.example.com/tcp/4001/p2p/QmRelay1...",
    "/dns4/relay2.example.com/tcp/4001/p2p/QmRelay2...",
    "/dns4/relay3.example.com/tcp/4001/p2p/QmRelay3...",
]
await node.transport.enable_circuit_relay_v2(relay_addrs=relays)
```

2. Monitor Connection Metrics

Track performance over time:

```python
# Periodically check metrics (get_metrics() with no argument returns all peers)
all_metrics = node.transport.get_metrics()
for peer_id, metrics in all_metrics.items():
    if metrics.success_rate < 0.9:  # Less than 90% success
        print(f"Warning: Low success rate for {peer_id}")
```
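To run that check periodically without blocking the agent, it can live in a background asyncio task that stops cleanly when signalled. `monitor` and `low_success_peers` are hypothetical helpers; the only assumption about Stargate is that `node.transport.get_metrics()` returns a dict of objects with a `success_rate` attribute, as in the example above.

```python
import asyncio
from typing import Any, Callable, Mapping


def low_success_peers(all_metrics: Mapping[str, Any], threshold: float = 0.9) -> list[str]:
    """Peer IDs whose success_rate is below the threshold, sorted for stable output."""
    return sorted(pid for pid, m in all_metrics.items() if m.success_rate < threshold)


async def monitor(get_all_metrics: Callable[[], Mapping[str, Any]], interval_s: float,
                  stop: asyncio.Event, warn: Callable[[str], None] = print) -> None:
    """Check metrics every interval_s seconds until `stop` is set."""
    while not stop.is_set():
        for pid in low_success_peers(get_all_metrics()):
            warn(f"Warning: Low success rate for {pid}")
        try:
            # Sleep for one interval, but wake up early if stop is set
            await asyncio.wait_for(stop.wait(), timeout=interval_s)
        except asyncio.TimeoutError:
            pass
```

Usage inside the agent: `stop = asyncio.Event()`, then `task = asyncio.create_task(monitor(node.transport.get_metrics, 60, stop))`; call `stop.set()` and `await task` on shutdown.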

3. Handle Connection Failures Gracefully

Built-in retries only cover transient errors; when a call still fails, fall back in your application:

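```python
async def call_with_fallback(node, peer_id, method, payload, fallback):
    """Call a peer; if the call still fails after built-in retries, use `fallback`."""
    try:
        return await node.call(peer_id, method, payload)
    except ConnectionError:
        # Try an alternative peer or method supplied by the caller
        return await fallback(payload)
```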

4. Deploy Relay Nodes for Production

For production deployments, run your own relay infrastructure:

- Deploy in multiple geographic regions
- Use high-bandwidth servers (1+ Gbps)
- Monitor relay health and performance
- Configure appropriate connection limits

See: Relay Node Deployment Guide

Security Considerations

Relay Node Trust

Relay nodes can see:

- Connection metadata (who is connecting to whom, by peer ID)
- Message sizes and timing

Relay nodes cannot see:

- Message content (encrypted end-to-end)
- Private keys

Recommendation: Only use trusted relay nodes or deploy your own.

End-to-End Encryption

All messages are encrypted regardless of connection type:

```python
# Messages are automatically encrypted with the Noise protocol
# No additional configuration needed
result = await node.call("peer_id", "method", {"sensitive": "data"})
```

Rate Limiting

Relay nodes enforce connection limits:

```yaml
relay:
  max_connections: 1000  # Per relay node
  bandwidth_limit_mbps: 100  # Future feature
```
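A relay enforcing `max_connections` has to reject, rather than queue, new circuits once it is full; the Circuit Relay v2 specification defines a `RESOURCE_LIMIT_EXCEEDED` status for that case. The counter below is a hypothetical sketch of the admission check only, intended for a single asyncio event loop (it is not thread-safe) and not taken from the Stargate relay.

```python
class ConnectionLimiter:
    """Admits at most max_connections concurrent relayed connections."""

    def __init__(self, max_connections: int = 1000) -> None:
        self.max_connections = max_connections
        self.active = 0

    def try_acquire(self) -> bool:
        # Refuse instead of waiting: a full relay should reject new circuits immediately
        if self.active >= self.max_connections:
            return False
        self.active += 1
        return True

    def release(self) -> None:
        if self.active == 0:
            raise RuntimeError("release() called with no active connections")
        self.active -= 1
```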

API Reference

`enable_circuit_relay_v2(relay_addrs=None, hop_enabled=False)`

Enable Circuit Relay v2 for NAT traversal.

Parameters:

- `relay_addrs` (list[str] | None): List of relay node multiaddrs. If None, uses the default Traylinx relay nodes.
- `hop_enabled` (bool): If True, this node can act as a relay for others. Default: False

Returns:

- bool: True if relay was successfully enabled

Raises:

- ConnectionError: If not connected to the network

`get_metrics(peer_id=None)`

Get connection metrics for one or all peers.

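Parameters:

- `peer_id` (str | None): Specific peer ID. If None, returns metrics for all peers.

Returns:

- `dict[str, ConnectionMetrics]` when `peer_id` is None; otherwise the peer's `ConnectionMetrics`, or None if no metrics have been recorded for that peer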

`get_relay_health_status(relay_addr=None)`

Get health status for relay nodes.

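Parameters:

- `relay_addr` (str | None): Specific relay address. If None, returns health status for all relays.

Returns:

- dict: Health status information, keyed by relay address when `relay_addr` is None; each status contains `is_healthy`, `consecutive_failures`, and `last_success`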
