Troubleshooting AgentWeave
This section helps you diagnose and resolve issues when working with AgentWeave. Whether you're encountering identity problems, authorization failures, or connectivity issues, we'll guide you through systematic troubleshooting.
How to Approach Troubleshooting
AgentWeave's security architecture has multiple layers, so issues can occur at different levels. Follow this systematic approach:
1. Identify the Layer
Determine which layer is failing:
┌─────────────────────────────────────┐
│          Application Logic          │ ← Your code
├─────────────────────────────────────┤
│         Authorization (OPA)         │ ← Policy enforcement
├─────────────────────────────────────┤
│          Transport (mTLS)           │ ← Network communication
├─────────────────────────────────────┤
│          Identity (SPIFFE)          │ ← Cryptographic identity
└─────────────────────────────────────┘
Common symptoms by layer:
- Identity: "Cannot connect to SPIRE", "SVID expired"
- Transport: "Connection timeout", "TLS handshake failed"
- Authorization: "Policy denied request", "OPA connection refused"
- Application: Task errors, invalid payloads, business logic issues
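The symptom list above can be turned into a rough first-pass triage helper. The sketch below is illustrative only: the phrases are copied from the list, while `LAYER_SYMPTOMS` and `guess_layer` are hypothetical names and not part of AgentWeave.

```python
# Symptom phrases copied from the list above; the mapping itself is illustrative.
LAYER_SYMPTOMS = {
    "identity": ["cannot connect to spire", "svid expired"],
    "transport": ["connection timeout", "tls handshake failed"],
    "authorization": ["policy denied request", "opa connection refused"],
}


def guess_layer(error_message: str) -> str:
    """Return the layer whose known symptom phrase appears in the message."""
    text = error_message.lower()
    for layer, phrases in LAYER_SYMPTOMS.items():
        if any(phrase in text for phrase in phrases):
            return layer
    # Anything unrecognized is treated as an application-level problem.
    return "application"
```

A match only suggests where to start looking; real error messages may be worded differently from the phrases listed here.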
2. Gather Information
Collect diagnostic data before attempting fixes:
# Check agent health
curl https://localhost:8443/health
# View agent configuration
agentweave validate config.yaml
# Check identity status
agentweave identity show
# Test authorization
agentweave authz check --caller CALLER_ID --target TARGET_ID --action ACTION
# View logs with timestamps
agentweave logs --follow --timestamps
3. Check the Basics
Before diving deep, verify fundamentals:
- Infrastructure running (SPIRE, OPA)
- Network connectivity
- Configuration file valid
- SPIRE registration exists
- Correct SPIFFE socket path
- Sufficient permissions
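Two of the checks above (network connectivity and the SPIFFE socket path) can be verified with the Python standard library alone. This is a minimal sketch; `is_unix_socket` and `port_open` are hypothetical helper names, and the socket path and ports to test are whatever your own configuration uses.

```python
import os
import socket
import stat


def is_unix_socket(path: str) -> bool:
    """Return True if the path exists and is a Unix domain socket."""
    try:
        return stat.S_ISSOCK(os.stat(path).st_mode)
    except OSError:
        # Missing path or insufficient permissions both count as a failure.
        return False


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `port_open("localhost", 8181)` would test the OPA port used elsewhere on this page. A successful TCP connection only shows that something is listening; it does not prove the service is healthy.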
4. Enable Debug Logging
Get more detailed information:
# config.yaml
observability:
  logging:
    level: "DEBUG"
    format: "json"  # Structured logs for easier parsing
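With JSON-formatted logs, DEBUG output can be filtered programmatically instead of read line by line. The sketch below assumes each log line is a JSON object with a `level` field; that field name (and the `message` field used in the example) is an assumption, so check your actual log output before relying on it.

```python
import json
from typing import Iterable, Iterator

# Standard severity ordering, used to compare log levels.
LEVELS = {"DEBUG": 10, "INFO": 20, "WARNING": 30, "ERROR": 40, "CRITICAL": 50}


def filter_logs(lines: Iterable[str], min_level: str = "WARNING") -> Iterator[dict]:
    """Yield parsed log records at or above min_level."""
    threshold = LEVELS[min_level]
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # Skip lines that are not JSON (e.g. startup banners).
        if not isinstance(record, dict):
            continue
        # "level" is an assumed field name; records without it are skipped.
        level = str(record.get("level", "")).upper()
        if LEVELS.get(level, 0) >= threshold:
            yield record
```

Passing an open file or `sys.stdin` as `lines` lets this sit at the end of a pipe, so that only warnings and errors remain visible while DEBUG logging is on.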
5. Isolate the Problem
Test components individually:
# Test SPIRE connectivity
agentweave spire check
# Test OPA connectivity
agentweave opa check
# Test mTLS to target agent
agentweave ping TARGET_SPIFFE_ID
# Validate configuration
agentweave validate config.yaml
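The isolation steps above can be scripted so that the first failing component is reported. This sketch reuses the commands shown above and assumes, without confirmation from this page, that each exits with a non-zero status on failure. `CHECKS`, `run_command`, and `first_failure` are hypothetical names; the ping step is omitted because it needs a target SPIFFE ID.

```python
import subprocess
from typing import Callable, Optional, Sequence, Tuple

# Commands taken from the steps above, ordered from the lowest layer upward.
CHECKS = [
    ("SPIRE connectivity", ["agentweave", "spire", "check"]),
    ("OPA connectivity", ["agentweave", "opa", "check"]),
    ("Configuration", ["agentweave", "validate", "config.yaml"]),
]


def run_command(cmd: Sequence[str]) -> bool:
    """Return True when the command exits with status 0 (assumed to mean success)."""
    try:
        return subprocess.run(cmd, capture_output=True, timeout=30).returncode == 0
    except (FileNotFoundError, subprocess.TimeoutExpired):
        # A missing executable or a hung command is reported as a failure.
        return False


def first_failure(
    checks: Sequence[Tuple[str, Sequence[str]]] = CHECKS,
    runner: Callable[[Sequence[str]], bool] = run_command,
) -> Optional[str]:
    """Run checks in order and return the name of the first one that fails."""
    for name, cmd in checks:
        if not runner(cmd):
            return name
    return None
```

Stopping at the first failure keeps attention on the lowest broken layer, since failures higher up are often just consequences of it.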
Quick Diagnostic Commands
Check Infrastructure Status
# SPIRE Server health
docker exec spire-server spire-server healthcheck
# SPIRE Agent health
docker exec spire-agent spire-agent healthcheck
# OPA health
curl http://localhost:8181/health
# List SPIRE entries
docker exec spire-server spire-server entry show
Test Connectivity
# Test local agent health
curl https://localhost:8443/health
# Test agent-to-agent connectivity
agentweave ping spiffe://example.com/agent/target
# Check agent card discovery
curl https://target-agent:8443/.well-known/agent.json
View Logs
# Agent logs
agentweave logs --level DEBUG
# SPIRE Server logs
docker logs spire-server --tail 100 --follow
# SPIRE Agent logs
docker logs spire-agent --tail 100 --follow
# OPA logs
docker logs opa --tail 100 --follow
Diagnostic Tools
AgentWeave provides several CLI tools for troubleshooting:
agentweave validate
Validates configuration files:
# Validate config
agentweave validate config.yaml
# Validate with environment substitution
agentweave validate config.yaml --env production
# Strict validation (enforce all security requirements)
agentweave validate config.yaml --strict
agentweave identity
Check identity status:
# Show current identity
agentweave identity show
# Show SVID details
agentweave identity show --verbose
# Test SPIRE connection
agentweave identity test
agentweave authz
Test authorization policies:
# Check if caller can perform action
agentweave authz check \
  --caller spiffe://example.com/agent/caller \
  --target spiffe://example.com/agent/target \
  --action process_data
# Test with custom input
agentweave authz check --input input.json
# Show policy decision trace
agentweave authz check --trace
agentweave ping
Test connectivity to other agents:
# Ping an agent
agentweave ping spiffe://example.com/agent/target
# Ping with timeout
agentweave ping spiffe://example.com/agent/target --timeout 5s
# Verbose output
agentweave ping spiffe://example.com/agent/target --verbose
agentweave health
Check overall agent health:
# Health check
agentweave health
# Detailed health report
agentweave health --verbose
# JSON output for monitoring
agentweave health --format json
Common Issue Categories
Identity Issues
Problems with SPIFFE/SPIRE and cryptographic identity:
- Cannot connect to SPIRE agent
- SVID expired or invalid
- Trust domain mismatch
- Registration entry not found
Authorization Issues
Problems with OPA and policy enforcement:
- OPA connection refused
- Policy denied request
- Circuit breaker open
- Policy compilation errors
Transport Issues
Problems with mTLS and network connectivity:
- Connection timeout
- TLS handshake failed
- Peer verification failed
- Certificate verification errors
Configuration Issues
Problems with configuration files and validation:
- Configuration validation failed
- Required field missing
- Security violations in production
- Invalid YAML syntax
A2A Protocol Issues
Problems with Agent-to-Agent communication:
- Agent not discovered
- Invalid task state
- Malformed requests
- Protocol version mismatch
Getting Help
If you can't resolve your issue:
1. Search Existing Issues
Check if others have encountered the same problem:
2. Ask the Community
For questions and discussions:
3. Report a Bug
If you've found a bug, please create an issue with:
- AgentWeave version (agentweave --version)
- Python version
- Operating system
- Configuration (sanitized)
- Full error message and stack trace
- Steps to reproduce
- Expected vs actual behavior
See Getting Help for detailed guidance on reporting issues.
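Sanitizing a configuration before attaching it to a bug report can be partly automated. The following is a minimal sketch that redacts values whose keys look sensitive in an already-parsed configuration (nested dicts and lists). The key fragments are hypothetical examples, not an AgentWeave convention, so review the result by hand before posting it.

```python
from typing import Any

# Hypothetical key fragments; adjust to match your own configuration.
SENSITIVE_FRAGMENTS = ("password", "secret", "token", "key")


def sanitize(value: Any) -> Any:
    """Return a copy of the config with sensitive-looking values redacted."""
    if isinstance(value, dict):
        return {
            k: "<redacted>"
            if any(frag in str(k).lower() for frag in SENSITIVE_FRAGMENTS)
            else sanitize(v)
            for k, v in value.items()
        }
    if isinstance(value, list):
        return [sanitize(item) for item in value]
    # Scalars under non-sensitive keys are kept unchanged.
    return value
```

The function builds new containers and leaves the original configuration untouched. Key-name matching misses secrets stored under innocuous names, and it will not catch internal hostnames or SPIFFE IDs you may also want to obscure.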
Troubleshooting Resources
- Common Issues - Quick solutions to frequent problems
- Debugging Guide - Deep-dive debugging techniques
- FAQ - Frequently asked questions
- Support - How to get help
Prevention Best Practices
Avoid common issues by following these practices:
Development
- Always validate configs: Run agentweave validate before deploying
- Use debug logging: Start with DEBUG level logging during development
- Test policies locally: Use agentweave authz check before deploying policies
- Monitor SVID expiry: Set TTL appropriately and monitor rotation
Production
- Health checks: Implement readiness and liveness probes
- Monitoring: Set up metrics and alerts (see Monitoring Guide)
- Log aggregation: Send logs to centralized logging system
- Audit trails: Enable audit logging for security events
- Graceful degradation: Handle SPIRE/OPA failures appropriately
Security
- Default deny: Always use default_action: deny in production
- Least privilege: Grant minimal required permissions
- Regular rotation: Use short SVID TTLs (1 hour or less)
- Trust domain validation: Verify allowed trust domains
- TLS 1.3 only: Enforce modern TLS versions
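Short SVID TTLs only help if rotation actually happens, so it is worth alerting when an SVID is close to expiry without having been renewed. The sketch below assumes the issue and expiry timestamps have already been obtained by some other means; the function names and the 25% threshold are illustrative choices, not AgentWeave defaults.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def remaining_fraction(
    issued_at: datetime, expires_at: datetime, now: Optional[datetime] = None
) -> float:
    """Return the fraction of the SVID lifetime still remaining, clamped to [0, 1]."""
    now = now or datetime.now(timezone.utc)
    lifetime = (expires_at - issued_at).total_seconds()
    if lifetime <= 0:
        raise ValueError("expires_at must be after issued_at")
    left = (expires_at - now).total_seconds()
    return max(0.0, min(1.0, left / lifetime))


def should_alert(
    issued_at: datetime,
    expires_at: datetime,
    now: Optional[datetime] = None,
    threshold: float = 0.25,  # Illustrative value; tune to your rotation schedule.
) -> bool:
    """Return True when less than `threshold` of the lifetime remains."""
    return remaining_fraction(issued_at, expires_at, now) < threshold
```

With a one-hour TTL and the threshold above, an alert would fire once fewer than 15 minutes remain, which suggests that rotation has stalled.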
Next Steps:
- Common Issues - Quick solutions
- Debugging Guide - Deep troubleshooting
- FAQ - Common questions