Troubleshooting AgentWeave

This section helps you diagnose and resolve issues when working with AgentWeave. Whether you're encountering identity problems, authorization failures, or connectivity issues, we'll guide you through systematic troubleshooting.

How to Approach Troubleshooting

AgentWeave's security architecture has multiple layers, so issues can occur at different levels. Follow this systematic approach:

1. Identify the Layer

Determine which layer is failing:

┌─────────────────────────────────────┐
│ Application Logic                   │  ← Your code
├─────────────────────────────────────┤
│ Authorization (OPA)                 │  ← Policy enforcement
├─────────────────────────────────────┤
│ Transport (mTLS)                    │  ← Network communication
├─────────────────────────────────────┤
│ Identity (SPIFFE)                   │  ← Cryptographic identity
└─────────────────────────────────────┘

Common symptoms by layer:

  • Identity: "Cannot connect to SPIRE", "SVID expired"
  • Transport: "Connection timeout", "TLS handshake failed"
  • Authorization: "Policy denied request", "OPA connection refused"
  • Application: Task errors, invalid payloads, business logic issues
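The symptom list above can serve as a rough first-pass triage step. The sketch below is a hypothetical shell function (`classify_symptom` is not part of AgentWeave). It matches an error message against the example strings listed above and prints the layer to start with. Real messages may be worded differently, so treat an `unknown` result as a cue to read the full log.

```shell
#!/bin/sh
# classify_symptom MESSAGE: hypothetical triage helper, not an AgentWeave command.
# Maps an error message to the layer most likely responsible, using the
# example symptoms listed in this guide. Matching is case-sensitive.
classify_symptom() {
  case "$1" in
    *SPIRE*|*SVID*)
      echo identity ;;
    *"Connection timeout"*|*"TLS handshake"*)
      echo transport ;;
    *OPA*|*"Policy denied"*)
      echo authorization ;;
    *)
      echo unknown ;;   # application errors or unrecognised wording
  esac
}

# Example:
# classify_symptom "TLS handshake failed: certificate unknown"   # prints: transport
```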

2. Gather Information

Collect diagnostic data before attempting fixes:

# Check agent health
curl https://localhost:8443/health

# Validate agent configuration
agentweave validate config.yaml

# Check identity status
agentweave identity show

# Test authorization
agentweave authz check --caller CALLER_ID --target TARGET_ID --action ACTION

# View logs with timestamps
agentweave logs --follow --timestamps
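When gathering information, it helps to save each command's output in one place so nothing gets lost between attempts. The sketch below is a hypothetical wrapper: the `capture` and `collect_diagnostics` names and the output directory layout are assumptions. It reuses only the commands shown above, keeps going when one of them fails, and records each exit code. `agentweave logs --follow` is left out because it never exits on its own.

```shell
#!/bin/sh
# capture NAME CMD...: run CMD, save stdout+stderr to "$OUT_DIR/NAME.txt",
# and append "NAME <exit-code>" to "$OUT_DIR/status.txt".
# Hypothetical helper; OUT_DIR must be set by the caller.
capture() {
  name=$1
  shift
  "$@" >"$OUT_DIR/$name.txt" 2>&1
  rc=$?
  echo "$name $rc" >>"$OUT_DIR/status.txt"
  return 0   # never abort the collection because one check failed
}

# collect_diagnostics [CONFIG]: run the checks from this guide into a
# timestamped directory and print that directory's path.
collect_diagnostics() {
  config=${1:-config.yaml}
  OUT_DIR="agentweave-diag-$(date +%Y%m%d-%H%M%S)"
  mkdir -p "$OUT_DIR" || return 1
  capture health   curl -sS https://localhost:8443/health
  capture validate agentweave validate "$config"
  capture identity agentweave identity show
  capture spire    agentweave spire check
  capture opa      agentweave opa check
  echo "$OUT_DIR"
}
```

Attach the resulting directory (after removing anything sensitive) when asking for help, so others see the same evidence you did.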

3. Check the Basics

Before diving deep, verify fundamentals:

  • Infrastructure running (SPIRE, OPA)
  • Network connectivity
  • Configuration file valid
  • SPIRE registration exists
  • Correct SPIFFE socket path
  • Sufficient permissions
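The socket path and permission checks above can be done from a shell. This sketch assumes the socket location comes from the `SPIFFE_ENDPOINT_SOCKET` variable defined by the SPIFFE Workload Endpoint specification (for example `unix:///tmp/spire-agent/public/api.sock`). When that variable is unset, it falls back to `/tmp/spire-agent/public/api.sock`, a common SPIRE default. Your deployment, and AgentWeave's own socket setting, may differ. The function names are hypothetical.

```shell
#!/bin/sh
# Assumed fallback when SPIFFE_ENDPOINT_SOCKET is not set; adjust for your setup.
DEFAULT_SPIFFE_SOCKET=/tmp/spire-agent/public/api.sock

# spiffe_socket_path: print the filesystem path of the Workload API socket,
# stripping the "unix://" scheme from SPIFFE_ENDPOINT_SOCKET.
spiffe_socket_path() {
  endpoint=${SPIFFE_ENDPOINT_SOCKET:-unix://$DEFAULT_SPIFFE_SOCKET}
  echo "${endpoint#unix://}"
}

# check_spiffe_socket: fail with a hint if the socket is missing or this
# user cannot read/write it (connecting to a Unix socket needs write access).
check_spiffe_socket() {
  sock=$(spiffe_socket_path)
  if [ ! -S "$sock" ]; then
    echo "no socket at $sock (is the SPIRE agent running? is the path right?)" >&2
    return 1
  fi
  if [ ! -r "$sock" ] || [ ! -w "$sock" ]; then
    echo "socket $sock exists but this user lacks read/write permission" >&2
    return 1
  fi
  echo "ok: $sock"
}
```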

4. Enable Debug Logging

Get more detailed information:

# config.yaml
observability:
  logging:
    level: "DEBUG"
    format: "json"  # Structured logs for easier parsing

5. Isolate the Problem

Test components individually:

# Test SPIRE connectivity
agentweave spire check

# Test OPA connectivity
agentweave opa check

# Test mTLS to target agent
agentweave ping TARGET_SPIFFE_ID

# Validate configuration
agentweave validate config.yaml
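Each layer depends on the ones below it, so it usually pays to test bottom-up and stop at the first failure. Below is a minimal sketch that assumes `agentweave spire check`, `agentweave ping`, and `agentweave opa check` return a non-zero exit status on failure; this is an assumption, so confirm it against your version. The function names are hypothetical, and each check is a small function you can override.

```shell
#!/bin/sh
# One wrapper per layer, each around a command from this guide.
check_identity()      { agentweave spire check; }
check_transport()     { agentweave ping "$1"; }
check_authorization() { agentweave opa check; }

# first_failing_layer TARGET_SPIFFE_ID: run the checks bottom-up
# (identity -> transport -> authorization), print the first layer that
# fails and return 1; print "all layers ok" and return 0 otherwise.
first_failing_layer() {
  target=$1
  if ! check_identity >/dev/null 2>&1; then
    echo identity; return 1
  fi
  if ! check_transport "$target" >/dev/null 2>&1; then
    echo transport; return 1
  fi
  if ! check_authorization >/dev/null 2>&1; then
    echo authorization; return 1
  fi
  echo "all layers ok"
}

# Usage:
# first_failing_layer spiffe://example.com/agent/target
```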

Quick Diagnostic Commands

Check Infrastructure Status

# SPIRE Server health
docker exec spire-server spire-server healthcheck

# SPIRE Agent health
docker exec spire-agent spire-agent healthcheck

# OPA health
curl http://localhost:8181/health

# List SPIRE entries
docker exec spire-server spire-server entry show
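To check the whole infrastructure at once, the health commands above can be wrapped in a small pass/fail summary. This sketch assumes the container names `spire-server` and `spire-agent` used above and OPA listening on `localhost:8181`. `report` and `infra_status` are hypothetical names.

```shell
#!/bin/sh
# report NAME CMD...: run CMD quietly and print "[ OK ] NAME" or "[FAIL] NAME".
# Returns CMD's success/failure so callers can count failures.
report() {
  name=$1
  shift
  if "$@" >/dev/null 2>&1; then
    echo "[ OK ] $name"
    return 0
  fi
  echo "[FAIL] $name"
  return 1
}

# infra_status: check SPIRE Server, SPIRE Agent and OPA; the exit status is
# the number of failed checks (0 means everything answered as healthy).
infra_status() {
  failed=0
  report "SPIRE Server" docker exec spire-server spire-server healthcheck || failed=$((failed + 1))
  report "SPIRE Agent"  docker exec spire-agent spire-agent healthcheck   || failed=$((failed + 1))
  report "OPA"          curl -fsS http://localhost:8181/health           || failed=$((failed + 1))
  return "$failed"
}
```

`curl -f` makes an HTTP error status count as a failure, which plain `curl` would not.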

Test Connectivity

# Test local agent health
curl https://localhost:8443/health

# Test agent-to-agent connectivity
agentweave ping spiffe://example.com/agent/target

# Check agent card discovery
curl https://target-agent:8443/.well-known/agent.json

View Logs

# Agent logs
agentweave logs --level DEBUG

# SPIRE Server logs
docker logs spire-server --tail 100 --follow

# SPIRE Agent logs
docker logs spire-agent --tail 100 --follow

# OPA logs
docker logs opa --tail 100 --follow

Diagnostic Tools

AgentWeave provides several CLI tools for troubleshooting:

agentweave validate

Validates configuration files:

# Validate config
agentweave validate config.yaml

# Validate with environment substitution
agentweave validate config.yaml --env production

# Strict validation (enforce all security requirements)
agentweave validate config.yaml --strict

agentweave identity

Check identity status:

# Show current identity
agentweave identity show

# Show SVID details
agentweave identity show --verbose

# Test SPIRE connection
agentweave identity test

agentweave authz

Test authorization policies:

# Check if caller can perform action
agentweave authz check \
  --caller spiffe://example.com/agent/caller \
  --target spiffe://example.com/agent/target \
  --action process_data

# Test with custom input
agentweave authz check --input input.json

# Show policy decision trace
agentweave authz check --trace
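The `--input input.json` option above takes a hand-written authorization input. The exact schema AgentWeave expects is not documented here. The field names below (`caller`, `target`, `action`, mirroring the CLI flags) are therefore assumptions. The sketch builds such a file and can also query OPA's REST Data API directly. The policy path `agentweave/authz/allow` is hypothetical; replace it with your policy's package and rule.

```shell
#!/bin/sh
# authz_input CALLER TARGET ACTION: print a JSON authorization input.
# Field names are assumptions; values must not contain double quotes.
authz_input() {
  printf '{"caller": "%s", "target": "%s", "action": "%s"}\n' "$1" "$2" "$3"
}

# Assumed local OPA address and policy rule path; override as needed.
OPA_URL=${OPA_URL:-http://localhost:8181}
POLICY_PATH=${POLICY_PATH:-agentweave/authz/allow}

# opa_query CALLER TARGET ACTION: POST the input to OPA's Data API,
# wrapped in {"input": ...} as OPA expects, and print OPA's JSON reply.
opa_query() {
  printf '{"input": %s}\n' "$(authz_input "$1" "$2" "$3")" |
    curl -sS -X POST -H 'Content-Type: application/json' \
      --data-binary @- "$OPA_URL/v1/data/$POLICY_PATH"
}

# Usage:
# authz_input spiffe://example.com/agent/caller \
#   spiffe://example.com/agent/target process_data > input.json
# agentweave authz check --input input.json
```

If OPA replies with a JSON object that has no `result` field, the path matched no rule at all. That usually points to a wrong package or rule name rather than a policy denial.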

agentweave ping

Test connectivity to other agents:

# Ping an agent
agentweave ping spiffe://example.com/agent/target

# Ping with timeout
agentweave ping spiffe://example.com/agent/target --timeout 5s

# Verbose output
agentweave ping spiffe://example.com/agent/target --verbose
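Right after startup, a peer may not yet have its SVID, so a single failed ping is not always conclusive. Below is a minimal retry-with-backoff sketch. `retry` is a hypothetical helper, not an AgentWeave flag. It re-runs any command and doubles the wait after each failure.

```shell
#!/bin/sh
# retry ATTEMPTS INITIAL_DELAY CMD...: run CMD up to ATTEMPTS times,
# sleeping INITIAL_DELAY, 2*INITIAL_DELAY, ... seconds between attempts.
# Returns 0 on the first success, otherwise CMD's last exit status.
retry() {
  attempts=$1
  delay=$2
  shift 2
  n=1
  while :; do
    "$@" && return 0
    rc=$?
    if [ "$n" -ge "$attempts" ]; then
      echo "giving up after $n attempts" >&2
      return "$rc"
    fi
    sleep "$delay"
    delay=$((delay * 2))
    n=$((n + 1))
  done
}

# Usage:
# retry 5 1 agentweave ping spiffe://example.com/agent/target --timeout 5s
```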

agentweave health

Check overall agent health:

# Health check
agentweave health

# Detailed health report
agentweave health --verbose

# JSON output for monitoring
agentweave health --format json

Common Issue Categories

Identity Issues

Problems with SPIFFE/SPIRE and cryptographic identity:

  • Cannot connect to SPIRE agent
  • SVID expired or invalid
  • Trust domain mismatch
  • Registration entry not found
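Trust domain mismatches are easier to spot once the trust domain is extracted from each SPIFFE ID. Per the SPIFFE ID specification, the trust domain is the part between `spiffe://` and the first `/`. Below is a minimal sketch; `trust_domain` and `same_trust_domain` are hypothetical helpers.

```shell
#!/bin/sh
# trust_domain SPIFFE_ID: print the trust domain, e.g.
# spiffe://example.com/agent/target -> example.com
trust_domain() {
  case "$1" in
    spiffe://*) ;;
    *) echo "not a SPIFFE ID: $1" >&2; return 1 ;;
  esac
  rest=${1#spiffe://}
  echo "${rest%%/*}"
}

# same_trust_domain ID_A ID_B: succeed only if both IDs share a trust domain.
same_trust_domain() {
  a=$(trust_domain "$1") || return 1
  b=$(trust_domain "$2") || return 1
  [ "$a" = "$b" ]
}

# Usage: compare your ID (from `agentweave identity show`) with the target's.
# same_trust_domain spiffe://example.com/agent/caller spiffe://partner.org/agent/target \
#   || echo "trust domain mismatch: check federation and the allowed trust domains"
```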

Authorization Issues

Problems with OPA and policy enforcement:

  • OPA connection refused
  • Policy denied request
  • Circuit breaker open
  • Policy compilation errors

Transport Issues

Problems with mTLS and network connectivity:

  • Connection timeout
  • TLS handshake failed
  • Peer verification failed
  • Certificate verification errors
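For handshake and peer-verification failures, `openssl s_client` shows what each side presents. This sketch assumes you exported the agent's SVID with `spire-agent api fetch x509 -write DIR`, which writes `svid.0.pem`, `svid.0.key` and `bundle.0.pem`. The target address and the function names are assumptions. `s_client` checks the certificate chain against the bundle but not the SPIFFE ID, so `peer_spiffe_id` prints the peer's URI SAN for you to compare with the expected ID. The `-ext` option needs OpenSSL 1.1.1 or later.

```shell
#!/bin/sh
# tls_probe HOST:PORT SVID_DIR: attempt an mTLS handshake using the files
# written by `spire-agent api fetch x509 -write SVID_DIR`; with
# -verify_return_error the handshake aborts if the peer chain does not verify.
tls_probe() {
  hostport=$1
  dir=$2
  openssl s_client -connect "$hostport" \
    -cert "$dir/svid.0.pem" -key "$dir/svid.0.key" \
    -CAfile "$dir/bundle.0.pem" -verify_return_error </dev/null
}

# peer_spiffe_id HOST:PORT SVID_DIR: print the subjectAltName (URI SAN)
# of the certificate the peer presented.
peer_spiffe_id() {
  tls_probe "$1" "$2" 2>/dev/null |
    openssl x509 -noout -ext subjectAltName
}

# Usage:
# tls_probe target-agent:8443 /tmp/svid
# peer_spiffe_id target-agent:8443 /tmp/svid
```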

Configuration Issues

Problems with configuration files and validation:

  • Configuration validation failed
  • Required field missing
  • Security violations in production
  • Invalid YAML syntax
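A frequent cause of invalid YAML syntax is a tab character used for indentation, which YAML does not allow. Below is a quick pre-check to run before `agentweave validate` that lists the offending lines. `yaml_tab_lines` is a hypothetical helper.

```shell
#!/bin/sh
# yaml_tab_lines FILE: print "line:content" for every line whose leading
# indentation contains a tab. Exit status 0 means tabs were found.
yaml_tab_lines() {
  tab=$(printf '\t')
  grep -n "^[ $tab]*$tab" "$1"
}

# Usage:
# yaml_tab_lines config.yaml && echo "replace the tabs above with spaces"
```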

A2A Protocol Issues

Problems with Agent-to-Agent communication:

  • Agent not discovered
  • Invalid task state
  • Malformed requests
  • Protocol version mismatch
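When an agent is not discovered, first confirm that the target actually serves its agent card at the well-known path used earlier in this guide. In the sketch below, `agent_card_url` and `check_agent_card` are hypothetical. It uses `python3 -m json.tool` only to confirm that the response is valid JSON. Add client-certificate options to `curl` if the endpoint requires mTLS.

```shell
#!/bin/sh
# agent_card_url BASE_URL: build the agent card URL, tolerating a trailing slash.
agent_card_url() {
  echo "${1%/}/.well-known/agent.json"
}

# check_agent_card BASE_URL: fetch the card; fail on HTTP errors (curl -f
# prints nothing, so the JSON check fails) or on invalid JSON.
check_agent_card() {
  url=$(agent_card_url "$1")
  curl -fsS "$url" | python3 -m json.tool >/dev/null &&
    echo "agent card OK: $url"
}

# Usage:
# check_agent_card https://target-agent:8443
```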

Getting Help

If you can't resolve your issue:

1. Search Existing Issues

Check whether others have already reported the same problem in the project's issue tracker.

2. Ask the Community

For usage questions and open-ended discussion, ask the AgentWeave community before filing a bug.

3. Report a Bug

If you've found a bug, please create an issue with:

  • AgentWeave version (agentweave --version)
  • Python version
  • Operating system
  • Configuration (sanitized)
  • Full error message and stack trace
  • Steps to reproduce
  • Expected vs actual behavior
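Most of the environment details above can be collected in one go. In this sketch, `bug_report_env` and `sanitize_config` are hypothetical helpers. The redaction rule is a simple assumption: mask the values of keys whose names contain `token`, `secret`, `password`, `passwd` or `api_key`. Always review the sanitized output by eye before posting it.

```shell
#!/bin/sh
# bug_report_env: print version and platform details for an issue report.
bug_report_env() {
  echo "AgentWeave: $(agentweave --version 2>&1)"
  echo "Python: $(python3 --version 2>&1)"
  echo "OS: $(uname -srm)"
}

# sanitize_config: read YAML on stdin and replace secret-looking values.
# Handles single-line "key: value" pairs only; the keyword must be lowercase.
sanitize_config() {
  sed -E 's/^([[:space:]]*[A-Za-z0-9_]*(token|secret|password|passwd|api_key)[A-Za-z0-9_]*:[[:space:]]*).+$/\1"<redacted>"/'
}

# Usage:
# bug_report_env > report.txt
# sanitize_config < config.yaml >> report.txt
```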

See Getting Help for detailed guidance on reporting issues.


Prevention Best Practices

Avoid common issues by following these practices:

Development

  • Always validate configs: Run agentweave validate before deploying
  • Use debug logging: Start with DEBUG level logging during development
  • Test policies locally: Use agentweave authz check before deploying policies
  • Monitor SVID expiry: Set TTL appropriately and monitor rotation
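To monitor SVID expiry, `openssl x509 -checkend N` exits non-zero when a certificate expires within the next N seconds. Below is a sketch for a cron job or alert script. It assumes the SVID is written to disk with `spire-agent api fetch x509 -write DIR`, so the certificate is at `DIR/svid.0.pem`. `svid_expiring` and the 10-minute default window are assumptions; tune the window to your SVID TTL. A file written once is a snapshot and is not rotated, so re-fetch before each check.

```shell
#!/bin/sh
# svid_expiring CERT_FILE [SECONDS]: exit 0 if the certificate expires within
# SECONDS (default 600), 1 if it is valid beyond that, 2 if it is unreadable.
svid_expiring() {
  cert=$1
  window=${2:-600}
  [ -r "$cert" ] || return 2
  if openssl x509 -noout -checkend "$window" -in "$cert" >/dev/null; then
    return 1   # still valid beyond the window
  fi
  return 0
}

# Usage:
# spire-agent api fetch x509 -write /tmp/svid/ >/dev/null &&
#   svid_expiring /tmp/svid/svid.0.pem && echo "SVID expires within 10 minutes"
```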

Production

  • Health checks: Implement readiness and liveness probes
  • Monitoring: Set up metrics and alerts (see Monitoring Guide)
  • Log aggregation: Send logs to centralized logging system
  • Audit trails: Enable audit logging for security events
  • Graceful degradation: Handle SPIRE/OPA failures appropriately

Security

  • Default deny: Always use default_action: deny in production
  • Least privilege: Grant minimal required permissions
  • Regular rotation: Use short SVID TTLs (1 hour or less)
  • Trust domain validation: Verify allowed trust domains
  • TLS 1.3 only: Enforce modern TLS versions
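To confirm that TLS 1.3 is enforced, try to negotiate TLS 1.2 and check that the handshake is refused. The sketch uses `openssl s_client -tls1_2` and presents the agent's SVID, so a refusal is not simply caused by a missing client certificate. It assumes the `svid.0.pem`/`svid.0.key` layout written by `spire-agent api fetch x509 -write DIR`. The host, port, and `accepts_tls12` name are assumptions.

```shell
#!/bin/sh
# accepts_tls12 HOST:PORT SVID_DIR: succeed if the endpoint completes a
# TLS 1.2 handshake (it should NOT when TLS 1.3 is enforced).
accepts_tls12() {
  openssl s_client -connect "$1" -tls1_2 \
    -cert "$2/svid.0.pem" -key "$2/svid.0.key" </dev/null >/dev/null 2>&1
}

# Usage:
# if accepts_tls12 localhost:8443 /tmp/svid; then
#   echo "WARNING: TLS 1.2 accepted" >&2
# else
#   echo "TLS 1.2 refused"
# fi
```

A closed port or network error also makes the probe fail, so confirm that the same command with `-tls1_3` succeeds before reading a failure as proof of enforcement.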
