Audit Logging Guide
Audit logging is critical for security monitoring, compliance, incident response, and forensics. AgentWeave provides comprehensive audit logging for all security-relevant events.
What Gets Logged
AgentWeave logs security-relevant events at multiple layers:
1. Authorization Decisions
Every authorization check is logged:
{
"timestamp": "2024-01-15T10:30:00.123Z",
"level": "info",
"event_type": "authorization",
"caller_spiffe_id": "spiffe://example.com/agent/api-gateway/prod",
"callee_spiffe_id": "spiffe://example.com/agent/data-processor/prod",
"capability": "process_data",
"action": "execute",
"decision": "allow",
"reason": "same_trust_domain",
"trace_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
"span_id": "1234567890abcdef"
}
Logged fields:
- timestamp: ISO 8601 timestamp with milliseconds
- event_type: Type of event (authorization, capability_call, etc.)
- caller_spiffe_id: Who made the request
- callee_spiffe_id: Who received the request
- capability: Capability being invoked
- action: Specific action (execute, query, etc.)
- decision: allow or deny
- reason: Why access was allowed/denied (from OPA)
- trace_id: Distributed trace ID for correlation
- span_id: Span ID for detailed tracing
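When consuming these events downstream, it can help to validate each line before trusting it. The following is a minimal sketch, not part of AgentWeave: the `AuthorizationEvent` class and `parse_authorization_event` function are hypothetical names, and the only assumption about the log format is the field list above.

```python
import json
from dataclasses import dataclass
from datetime import datetime

# String-valued fields from the authorization event shown above.
STRING_FIELDS = (
    "caller_spiffe_id", "callee_spiffe_id", "capability", "action",
    "decision", "reason", "trace_id", "span_id",
)


@dataclass(frozen=True)
class AuthorizationEvent:  # hypothetical helper type, not an AgentWeave class
    timestamp: datetime
    caller_spiffe_id: str
    callee_spiffe_id: str
    capability: str
    action: str
    decision: str
    reason: str
    trace_id: str
    span_id: str


def parse_authorization_event(line: str) -> AuthorizationEvent:
    """Parse one JSON log line and reject records that do not match the schema."""
    record = json.loads(line)
    required = ("timestamp", "event_type") + STRING_FIELDS
    missing = [name for name in required if name not in record]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    if record["event_type"] != "authorization":
        raise ValueError(f"not an authorization event: {record['event_type']}")
    if record["decision"] not in ("allow", "deny"):
        raise ValueError(f"unexpected decision: {record['decision']}")
    # fromisoformat() on older Python versions does not accept a trailing "Z".
    timestamp = datetime.fromisoformat(record["timestamp"].replace("Z", "+00:00"))
    return AuthorizationEvent(
        timestamp=timestamp,
        **{name: record[name] for name in STRING_FIELDS},
    )
```

Rejecting malformed records early keeps later queries and reports from silently skipping events.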
2. Capability Invocations
Every capability call is logged:
{
"timestamp": "2024-01-15T10:30:00.456Z",
"level": "info",
"event_type": "capability_call",
"caller_spiffe_id": "spiffe://example.com/agent/api-gateway/prod",
"callee_spiffe_id": "spiffe://example.com/agent/data-processor/prod",
"capability": "process_data",
"status": "success",
"duration_ms": 123.45,
"trace_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
"span_id": "abcdef1234567890"
}
3. Identity Events
SVID rotation and identity changes:
{
"timestamp": "2024-01-15T10:15:00.789Z",
"level": "info",
"event_type": "svid_update",
"spiffe_id": "spiffe://example.com/agent/data-processor/prod",
"expiry": "2024-01-15T11:15:00Z",
"ttl_seconds": 3600,
"trust_domain": "example.com"
}
4. Authentication Events
mTLS handshake results:
{
"timestamp": "2024-01-15T10:30:00.012Z",
"level": "info",
"event_type": "authentication",
"peer_spiffe_id": "spiffe://example.com/agent/api-gateway/prod",
"peer_trust_domain": "example.com",
"tls_version": "1.3",
"cipher_suite": "TLS_AES_256_GCM_SHA384",
"status": "success"
}
5. Security Events
Anomalies and security-relevant events:
{
"timestamp": "2024-01-15T10:30:05.678Z",
"level": "warning",
"event_type": "security_event",
"description": "High rate of authorization denials",
"caller_spiffe_id": "spiffe://unknown.com/agent/suspicious",
"denial_count": 50,
"time_window_seconds": 60
}
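The guide does not describe how AgentWeave decides that a denial rate is "high". As an illustration only, a per-caller sliding window could produce an event of this shape. The `DenialRateMonitor` class, its threshold, and its window size are assumptions made for this sketch.

```python
from collections import defaultdict, deque
from typing import Optional


class DenialRateMonitor:  # hypothetical; not an AgentWeave component
    """Track authorization denials per caller over a sliding time window."""

    def __init__(self, threshold: int = 50, window_seconds: float = 60.0) -> None:
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._denials = defaultdict(deque)  # caller -> timestamps of recent denials

    def record_denial(self, caller_spiffe_id: str, now: float) -> Optional[dict]:
        """Record one denial; return a security event once the threshold is reached."""
        window = self._denials[caller_spiffe_id]
        window.append(now)
        # Drop denials that have fallen out of the window.
        while window and window[0] <= now - self.window_seconds:
            window.popleft()
        if len(window) >= self.threshold:
            return {
                "level": "warning",
                "event_type": "security_event",
                "description": "High rate of authorization denials",
                "caller_spiffe_id": caller_spiffe_id,
                "denial_count": len(window),
                "time_window_seconds": self.window_seconds,
            }
        return None
```

The `now` value is passed in rather than read from the clock so the logic can be tested deterministically.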
6. Agent Lifecycle Events
Startup, shutdown, configuration changes:
{
"timestamp": "2024-01-15T10:00:00.000Z",
"level": "info",
"event_type": "agent_start",
"agent_spiffe_id": "spiffe://example.com/agent/data-processor/prod",
"version": "1.0.0",
"config_hash": "sha256:abc123..."
}
Audit Log Configuration
Basic Configuration
Enable audit logging in your agent configuration:
observability:
  audit_log:
    enabled: true
    level: "info"    # debug, info, warning, error
    format: "json"   # json or text
Log Levels
Choose an appropriate log level:
observability:
  audit_log:
    level: "info"
Levels:
- debug: All events including verbose diagnostics
- info: Normal operational events (recommended)
- warning: Warnings and errors only
- error: Errors only
Recommendations:
- Production: info (captures all security events)
- Development: debug (helps debugging)
- High-volume: warning (reduces log volume)
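If the level setting acts as a simple severity threshold (an assumption; the guide does not spell out the filtering rule), the effect of each choice can be sketched as follows. `LEVEL_ORDER` and `should_log` are hypothetical names.

```python
# Severity order taken from the level list above; the numeric ranks are arbitrary.
LEVEL_ORDER = {"debug": 10, "info": 20, "warning": 30, "error": 40}


def should_log(event_level: str, configured_level: str) -> bool:
    """Return True when an event at event_level passes the configured threshold."""
    return LEVEL_ORDER[event_level] >= LEVEL_ORDER[configured_level]


# The example authorization events in this guide carry level "info".
print(should_log("info", "info"))
print(should_log("info", "warning"))
```

Under this reading, setting the level to warning would drop info-level records such as allowed authorization decisions, which is worth weighing against the reduction in volume.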
Field Selection
Control which fields are logged:
observability:
  audit_log:
    fields:
      - "timestamp"
      - "event_type"
      - "caller_spiffe_id"
      - "callee_spiffe_id"
      - "capability"
      - "action"
      - "decision"
      - "reason"
      - "trace_id"
Payload Logging
Warning: Logging payloads can expose sensitive data.
observability:
  audit_log:
    include_payloads: false  # Recommended for production
    # If you must log payloads, redact sensitive fields
    redact_fields:
      - "password"
      - "ssn"
      - "credit_card"
      - "api_key"
Best Practice: Never log payloads in production unless required for compliance and properly secured.
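To make the effect of `redact_fields` concrete, here is a minimal sketch of key-based redaction over a nested payload. It illustrates the idea only; how AgentWeave matches and masks fields internally is not documented here, so the `redact` function, the case-insensitive match, and the `[REDACTED]` marker are all assumptions.

```python
from typing import Any

# Field names from the redact_fields example above.
REDACT_FIELDS = frozenset({"password", "ssn", "credit_card", "api_key"})
MASK = "[REDACTED]"  # hypothetical placeholder value


def redact(value: Any, fields: frozenset = REDACT_FIELDS) -> Any:
    """Return a copy of value with any matching dict keys masked, at any depth."""
    if isinstance(value, dict):
        return {
            key: MASK if str(key).lower() in fields else redact(item, fields)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact(item, fields) for item in value]
    return value


payload = {"user": "alice", "password": "hunter2", "cards": [{"credit_card": "4111"}]}
print(redact(payload))
```

Key-based redaction only catches fields whose names are known in advance; sensitive values stored under unexpected keys or embedded in free text would still be logged, which is one reason to keep payload logging off.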
Log Destinations
1. File Destination
Write logs to a local file:
observability:
  audit_log:
    destination: "file"
    file_path: "/var/log/agentweave/audit.log"
    max_size_mb: 100
    max_backups: 10
    max_age_days: 30
    compress: true
Considerations:
- Set up log rotation (max_size_mb, max_backups)
- Ensure sufficient disk space
- Protect file with proper permissions (600)
- Not recommended for production (use centralized logging)
2. Syslog Destination
Send logs to a syslog server:
observability:
  audit_log:
    destination: "syslog"
    syslog_address: "logs.example.com:514"
    syslog_protocol: "tcp"  # tcp, udp, or tls
    syslog_facility: "local0"
    syslog_tag: "agentweave-audit"
Protocols:
- tcp: Reliable delivery (recommended)
- udp: Lower overhead, may lose logs
- tls: Encrypted syslog (port 6514)
TLS Syslog:
observability:
  audit_log:
    destination: "syslog"
    syslog_address: "logs.example.com:6514"
    syslog_protocol: "tls"
    syslog_tls_verify: true
    syslog_tls_ca_cert: "/etc/ssl/syslog-ca.pem"
3. Cloud Logging
AWS CloudWatch
observability:
  audit_log:
    destination: "cloudwatch"
    cloudwatch_group: "/aws/agentweave/audit"
    cloudwatch_stream: "agent-data-processor-prod"
    cloudwatch_region: "us-east-1"
Google Cloud Logging
observability:
  audit_log:
    destination: "gcp_logging"
    gcp_project: "my-project"
    gcp_log_name: "agentweave-audit"
Azure Monitor
observability:
  audit_log:
    destination: "azure_monitor"
    workspace_id: "12345678-1234-1234-1234-123456789012"
    workspace_key_env: "AZURE_WORKSPACE_KEY"
4. SIEM Integration
Splunk
observability:
  audit_log:
    destination: "splunk"
    splunk_url: "https://splunk.example.com:8088"
    splunk_token_env: "SPLUNK_HEC_TOKEN"
    splunk_index: "agentweave_audit"
    splunk_source: "agentweave"
    splunk_sourcetype: "agentweave:audit"
Elastic Stack (ELK)
observability:
  audit_log:
    destination: "elasticsearch"
    elasticsearch_url: "https://elasticsearch.example.com:9200"
    elasticsearch_index: "agentweave-audit"
    elasticsearch_api_key_env: "ELASTIC_API_KEY"
Datadog
observability:
  audit_log:
    destination: "datadog"
    datadog_api_key_env: "DD_API_KEY"
    datadog_site: "datadoghq.com"
    datadog_service: "agentweave"
    datadog_source: "audit"
Log Retention
Retention Requirements
Configure retention based on compliance needs:
| Compliance | Minimum Retention |
|---|---|
| SOC 2 | 1 year |
| HIPAA | 6 years |
| PCI DSS | 1 year (3 months online) |
| GDPR | As needed for purpose |
| FedRAMP | 1 year |
Retention Configuration
In Cloud Logging
AWS CloudWatch:
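# CloudWatch accepts only a fixed set of retention values; 2557 days is the
# 7-year option and covers HIPAA's 6-year minimum.
aws logs put-retention-policy \
  --log-group-name /aws/agentweave/audit \
  --retention-in-days 2557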
GCP Logging:
gcloud logging buckets update _Default \
  --location=global \
  --retention-days=2555
Azure Monitor:
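# Workspace-level retention is capped at 730 days; longer retention has to be
# configured per table (long-term/archive retention).
az monitor log-analytics workspace update \
  --resource-group myResourceGroup \
  --workspace-name myWorkspace \
  --retention-time 730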
In SIEM
Configure retention in your SIEM:
Splunk:
[agentweave_audit]
coldPath = $SPLUNK_DB/agentweave_audit/colddb
homePath = $SPLUNK_DB/agentweave_audit/db
thawedPath = $SPLUNK_DB/agentweave_audit/thaweddb
maxTotalDataSizeMB = 500000
# 7 years (Splunk .conf files do not support trailing comments on a setting line)
frozenTimePeriodInSecs = 220752000
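The retention figures used in this section are plain day and second counts. A quick way to derive or double-check them, assuming 365-day years (leap days ignored), is sketched below; the helper names are illustrative.

```python
SECONDS_PER_DAY = 86_400


def retention_days(years: int) -> int:
    """Days of retention for a number of 365-day years."""
    return years * 365


def retention_seconds(years: int) -> int:
    """Seconds of retention, as expected by settings such as frozenTimePeriodInSecs."""
    return retention_days(years) * SECONDS_PER_DAY


for years in (1, 6, 7):
    print(years, retention_days(years), retention_seconds(years))
```

Seven years works out to 2555 days or 220752000 seconds, which is longer than the 6-year HIPAA minimum listed in the table (2190 days).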
Archive to Cold Storage
For long-term retention, archive to object storage:
# Example: Archive to S3 after 90 days
observability:
  audit_log:
    destination: "cloudwatch"
    cloudwatch_group: "/aws/agentweave/audit"
    archive:
      enabled: true
      after_days: 90
      s3_bucket: "agentweave-audit-archive"
      s3_prefix: "audit-logs/"
Log Analysis
Common Queries
Find All Access by Specific Agent
Splunk:
index=agentweave_audit caller_spiffe_id="spiffe://example.com/agent/api-gateway/prod"
| table timestamp, callee_spiffe_id, capability, decision
Elastic:
{
"query": {
"term": {
"caller_spiffe_id": "spiffe://example.com/agent/api-gateway/prod"
}
}
}
Find All Authorization Denials
Splunk:
index=agentweave_audit event_type=authorization decision=deny
| stats count by caller_spiffe_id, reason
| sort -count
Elastic:
{
"query": {
"bool": {
"must": [
{"term": {"event_type": "authorization"}},
{"term": {"decision": "deny"}}
]
}
},
"aggs": {
"by_caller": {
"terms": {"field": "caller_spiffe_id"},
"aggs": {
"by_reason": {
"terms": {"field": "reason"}
}
}
}
}
}
Find Access to Specific Capability
Splunk:
index=agentweave_audit capability="process_sensitive_data"
| table timestamp, caller_spiffe_id, decision, duration_ms
Trace Specific Request
Splunk:
index=agentweave_audit trace_id="a1b2c3d4-e5f6-7890-abcd-ef1234567890"
| sort timestamp
| table timestamp, event_type, caller_spiffe_id, callee_spiffe_id, capability, decision
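Without a SIEM at hand, the same trace reconstruction can be done directly on a JSON-lines audit file. This is a small sketch under the assumption that each line is one JSON event with the `trace_id` and `timestamp` fields shown earlier; `trace_timeline` is a hypothetical helper.

```python
import json
from typing import Iterable, List


def trace_timeline(lines: Iterable[str], trace_id: str) -> List[dict]:
    """Return all events for one trace, ordered by timestamp."""
    events = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        event = json.loads(line)
        if event.get("trace_id") == trace_id:
            events.append(event)
    # ISO 8601 UTC timestamps in a uniform format sort correctly as strings.
    return sorted(events, key=lambda event: event["timestamp"])
```

For example, `trace_timeline(open("/var/log/agentweave/audit.log"), "a1b2c3d4-e5f6-7890-abcd-ef1234567890")` would list the authorization decision followed by the capability call from the examples above.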
Find High-Volume Callers
Splunk:
index=agentweave_audit event_type=capability_call
| stats count by caller_spiffe_id
| sort -count
| head 20
Security Queries
Detect Brute Force Attempts
Splunk:
index=agentweave_audit event_type=authorization decision=deny
| bin _time span=1m
| stats count by _time, caller_spiffe_id
| where count > 10
Detect Unusual Access Patterns
Splunk (surfaces unusual or rare caller/callee/capability combinations):
index=agentweave_audit event_type=authorization
| stats count by caller_spiffe_id, callee_spiffe_id, capability
| where count < 10
Find Access Outside Business Hours
Splunk:
index=agentweave_audit event_type=capability_call
| eval hour=tonumber(strftime(_time, "%H"))
| where hour < 6 OR hour > 20
| table timestamp, caller_spiffe_id, capability
Detect Lateral Movement
Splunk (flags callers that reach many different agents):
index=agentweave_audit event_type=capability_call
| stats dc(callee_spiffe_id) as unique_targets by caller_spiffe_id
| where unique_targets > 10
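The lateral-movement heuristic is a distinct count of callees per caller. An equivalent offline check over parsed events might look like the sketch below; the function name and the default threshold of 10 (taken from the query above) are illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable


def wide_callers(events: Iterable[dict], max_targets: int = 10) -> Dict[str, int]:
    """Map each caller that contacted more than max_targets distinct callees to that count."""
    targets = defaultdict(set)
    for event in events:
        if event.get("event_type") == "capability_call":
            targets[event["caller_spiffe_id"]].add(event["callee_spiffe_id"])
    return {
        caller: len(callees)
        for caller, callees in targets.items()
        if len(callees) > max_targets
    }
```

A fixed threshold is a blunt instrument: a gateway agent may legitimately fan out to many callees, so in practice the limit would probably need to be tuned per caller role.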
Alerting on Security Events
Prometheus Alerts
groups:
  - name: agentweave-security
    rules:
      # High denial rate
      - alert: HighAuthzDenialRate
        expr: rate(agentweave_authz_denied_total[5m]) > 10
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High authorization denial rate"
          description: "{{ $value }} denials per second in last 5 minutes"
      # Unknown caller
      - alert: UnknownCallerAttempt
        expr: agentweave_authz_denied_total{reason="unknown_caller"} > 0
        labels:
          severity: critical
        annotations:
          summary: "Unknown agent attempted access"
          description: "Agent not recognized"
      # SVID rotation failure
      - alert: SVIDRotationFailed
        expr: agentweave_svid_rotation_errors_total > 0
        labels:
          severity: critical
        annotations:
          summary: "SVID rotation failed"
          description: "Agent failed to rotate SVID"
      # Unusual capability usage
      - alert: UnusualAdminCapability
        expr: rate(agentweave_capability_calls_total{capability="admin"}[1h]) > 1
        labels:
          severity: warning
        annotations:
          summary: "Unusual admin capability usage"
SIEM Alerts
Splunk Alert: Multiple Failures from Same Caller
index=agentweave_audit event_type=authorization decision=deny
| bin _time span=5m
| stats count by _time, caller_spiffe_id
| where count > 20
Action: Send email, create ticket, trigger webhook
Elastic Watcher: Access to Sensitive Capability
{
"trigger": {
"schedule": {"interval": "5m"}
},
"input": {
"search": {
"request": {
"indices": ["agentweave-audit"],
"body": {
"query": {
"bool": {
"must": [
{"term": {"capability": "delete_all_data"}},
{"range": {"timestamp": {"gte": "now-5m"}}}
]
}
}
}
}
}
},
"condition": {
"compare": {"ctx.payload.hits.total": {"gt": 0}}
},
"actions": {
"send_email": {
"email": {
"to": "security@example.com",
"subject": "Critical: delete_all_data capability invoked",
"body": "Someone invoked delete_all_data capability. Review immediately."
}
}
}
}
Compliance Reporting
SOC 2 Audit Report
Generate report of all authorization decisions:
Splunk:
index=agentweave_audit event_type=authorization
earliest=-30d@d latest=now
| stats count by decision, reason
| eventstats sum(count) as total
| eval percentage=round((count/total)*100, 2)
| table decision, reason, count, percentage
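The report boils down to counting authorization events per (decision, reason) pair and dividing each count by the overall total. A minimal offline equivalent, assuming events have already been parsed into dictionaries, could look like this; `decision_report` is a hypothetical helper.

```python
from collections import Counter
from typing import Iterable, List


def decision_report(events: Iterable[dict]) -> List[dict]:
    """Summarize authorization events by decision and reason, with percentages."""
    counts = Counter(
        (event["decision"], event["reason"])
        for event in events
        if event.get("event_type") == "authorization"
    )
    total = sum(counts.values())
    # most_common() yields nothing when there are no events, so total is never 0 here.
    return [
        {
            "decision": decision,
            "reason": reason,
            "count": count,
            "percentage": round(count / total * 100, 2),
        }
        for (decision, reason), count in counts.most_common()
    ]
```

The percentage is computed against all authorization events in the period, so the rows of one report sum to roughly 100 (subject to rounding).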
HIPAA Access Report
Who accessed PHI and when:
Splunk:
index=agentweave_audit capability="get_patient_data"
earliest=-1y@y latest=now
| table timestamp, caller_spiffe_id, decision, trace_id
| sort timestamp desc
PCI DSS Cardholder Data Access
Splunk:
index=agentweave_audit capability="process_payment"
earliest=-1y@y latest=now
| stats count by caller_spiffe_id, decision
| table caller_spiffe_id, decision, count
Log Security
Protect Log Files
If using file destination:
# Set proper permissions
chmod 600 /var/log/agentweave/audit.log
chown agentweave:agentweave /var/log/agentweave/audit.log
# Prevent modification
chattr +a /var/log/agentweave/audit.log # Append-only
Encrypt Logs in Transit
Use TLS for syslog:
observability:
  audit_log:
    destination: "syslog"
    syslog_protocol: "tls"
    syslog_tls_verify: true
Sign Logs
For tamper-evidence, consider log signing:
observability:
  audit_log:
    signing:
      enabled: true
      key_path: "/etc/agentweave/signing-key.pem"
      algorithm: "RS256"
Each log entry then includes a signature:
{
"timestamp": "2024-01-15T10:30:00.123Z",
"event_type": "authorization",
"caller_spiffe_id": "spiffe://example.com/agent/api-gateway",
// ... other fields ...
"signature": "eyJhbGciOiJSUzI1NiIs..."
}
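The exact signing scheme (which bytes are signed and how the signature is encoded) is not specified in this guide. To show the general shape of tamper-evident entries, the sketch below signs a canonical JSON form of each entry. Note the substitution: it uses HMAC-SHA256 from the Python standard library rather than RS256, because RSA signing needs a third-party library; the canonicalization rule and function names are likewise assumptions.

```python
import hashlib
import hmac
import json


def canonical_bytes(entry: dict) -> bytes:
    """Serialize an entry deterministically, excluding any existing signature."""
    unsigned = {key: value for key, value in entry.items() if key != "signature"}
    return json.dumps(unsigned, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_entry(entry: dict, key: bytes) -> dict:
    """Return a copy of the entry with an HMAC-SHA256 signature attached."""
    signature = hmac.new(key, canonical_bytes(entry), hashlib.sha256).hexdigest()
    return {**entry, "signature": signature}


def verify_entry(entry: dict, key: bytes) -> bool:
    """Recompute the signature and compare it in constant time."""
    expected = hmac.new(key, canonical_bytes(entry), hashlib.sha256).hexdigest()
    provided = str(entry.get("signature", ""))
    return hmac.compare_digest(expected.encode("utf-8"), provided.encode("utf-8"))
```

With an asymmetric scheme such as RS256, verifiers only need the public key, so a compromised log consumer cannot forge entries; an HMAC key, by contrast, must be kept secret on both sides.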
Immutable Storage
Use write-once storage for compliance:
- AWS S3: Object Lock
- GCP: Bucket lock
- Azure: Immutable blob storage
AWS S3 Example:
aws s3api put-object-lock-configuration \
--bucket agentweave-audit-archive \
--object-lock-configuration '{
"ObjectLockEnabled": "Enabled",
"Rule": {
"DefaultRetention": {
"Mode": "COMPLIANCE",
"Years": 7
}
}
}'
Best Practices
Do's
✅ Enable audit logging in production
observability:
  audit_log:
    enabled: true
✅ Send logs to centralized SIEM
observability:
  audit_log:
    destination: "syslog"
    syslog_address: "siem.example.com:514"
✅ Configure appropriate retention
observability:
  audit_log:
    retention_days: 2555  # 7 years; covers HIPAA's 6-year minimum
✅ Set up automated alerts
# Prometheus alerts, SIEM alerts, etc.
✅ Review logs regularly
- Daily: Security events
- Weekly: Access patterns
- Monthly: Compliance reports
✅ Test log pipeline
# Ensure logs are reaching SIEM
agentweave test-audit-log
Don'ts
❌ Don't log sensitive payloads
observability:
  audit_log:
    include_payloads: false  # Keep this false!
❌ Don't use only local file logging in production
# ❌ Bad for production
observability:
  audit_log:
    destination: "file"

# ✅ Good for production
observability:
  audit_log:
    destination: "syslog"
❌ Don't ignore log volume
- Monitor log volume metrics
- Set up alerts for unusual volume
- Have capacity planning
❌ Don't forget log security
- Encrypt in transit (TLS)
- Protect access (RBAC)
- Prevent tampering (immutable storage)
Troubleshooting
Logs Not Appearing
Check agent logs:
kubectl logs -n agentweave pod/data-processor-abc123 | grep audit
Verify configuration:
agentweave validate config/production.yaml
Test connectivity:
# Syslog
nc -zv logs.example.com 514
# HTTPS
curl -I https://splunk.example.com:8088
High Log Volume
Reduce verbosity:
observability:
  audit_log:
    level: "warning"  # Instead of "info"
Filter events:
observability:
  audit_log:
    exclude_events:
      - "health_check"
      - "heartbeat"
Sample logs:
observability:
  audit_log:
    sampling:
      enabled: true
      rate: 0.1  # Log 10% of events
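How AgentWeave selects the sampled events is not described here. One possible policy, sketched below purely as an illustration, is to sample deterministically by trace ID so that all events of a kept trace stay together, while always keeping denials and warning/error events. Both the policy and the `keep_event` function are assumptions.

```python
import hashlib


def keep_event(event: dict, rate: float = 0.1) -> bool:
    """Decide whether to keep an event under trace-consistent sampling."""
    # Never sample away the events that matter most for security review.
    if event.get("decision") == "deny" or event.get("level") in ("warning", "error"):
        return True
    # Hash the trace ID into [0, 1) so every event of a trace gets the same verdict.
    digest = hashlib.sha256(str(event.get("trace_id", "")).encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < rate
```

Sampling audit logs trades completeness for volume, so it may conflict with compliance requirements that expect every access to be recorded; it is worth checking those requirements before enabling it.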
Summary
Audit logging provides:
- Security monitoring: Detect attacks and anomalies
- Compliance: Evidence for auditors
- Forensics: Investigate incidents
- Operational insights: Understand access patterns
Key Recommendations:
- Enable audit logging in production
- Send logs to centralized SIEM
- Configure retention per compliance requirements
- Set up automated alerts
- Review logs regularly
- Protect logs from tampering
Next Steps
- Configure audit logging: See Configuration Reference
- Set up monitoring: See Observability Guide
- Review compliance: See Compliance
- Understand threats: See Threat Model