# Production Logging Configuration Comprehensive production logging setup with performance optimization, resilience, and observability features. ## Overview The framework provides production-ready logging configurations optimized for different deployment scenarios: - **Standard Production**: Balanced configuration for typical production workloads - **High Performance**: Optimized for high-throughput applications with sampling - **Production with Aggregation**: Reduces log volume by 70-90% while preserving critical logs - **Debug**: Temporary configuration for production troubleshooting - **Staging**: Development-friendly configuration for staging environments ## Configuration Options ### 1. Standard Production (Recommended) **Use Case**: Default production setup for most applications **Features**: - Resilient logging with automatic fallback - Buffered writes for performance (100 entries, 5s flush) - 14-day rotating log files - Structured logs with request/trace context - INFO level and above - Performance metrics included **Setup**: ```php use App\Framework\Logging\ProductionLogConfig; $logConfig = ProductionLogConfig::production( logPath: '/var/log/app', requestIdGenerator: $container->get(RequestIdGenerator::class) ); ``` **Log Files**: - `/var/log/app/app.log` - Primary application logs (14 days retention) - `/var/log/app/fallback.log` - Fallback when primary fails (7 days retention) **Performance**: - Write Latency: <1ms (buffered) - Throughput: 10,000+ logs/second - Disk I/O: Minimized via buffering ### 2. High Performance with Sampling **Use Case**: High-traffic applications (>100 req/s) where log volume is critical **Features**: - Intelligent sampling reduces volume by 80-90% - Always logs ERROR and CRITICAL - Larger buffer (500 entries, 10s flush) - Minimal processors for reduced overhead **Setup**: ```php $logConfig = ProductionLogConfig::highPerformance( logPath: '/var/log/app', requestIdGenerator: $container->get(RequestIdGenerator::class) ); ``` **Sampling Strategy**: ``` DEBUG: 5% sampled (1 in 20) INFO: 10% sampled (1 in 10) WARNING: 50% sampled (1 in 2) ERROR: 100% (always logged) CRITICAL: 100% (always logged) ``` **Performance**: - Write Latency: <0.5ms (buffered + sampling) - Throughput: 50,000+ logs/second - Disk I/O: 80% reduction vs. standard ### 3. Production with Aggregation (High Volume) **Use Case**: Applications with repetitive log patterns (e.g., API gateways, proxies) **Features**: - Aggregates identical messages over time window - Reduces log volume by 70-90% - Preserves all ERROR and CRITICAL logs - Aggregation summary logged periodically **Setup**: ```php $logConfig = ProductionLogConfig::productionWithAggregation( logPath: '/var/log/app', requestIdGenerator: $container->get(RequestIdGenerator::class) ); ``` **Aggregation Example**: ``` Before Aggregation (1000 entries): [INFO] User login successful (x1000) After Aggregation (1 entry): [INFO] User login successful (count: 1000, first: 2025-01-15 10:00:00, last: 2025-01-15 10:05:00) ``` **Performance**: - Write Latency: <1ms - Throughput: 20,000+ logs/second - Disk I/O: 70-90% reduction - Aggregation Window: 60 seconds ### 4. Debug Configuration **Use Case**: Temporary production debugging (short-term troubleshooting) **Features**: - DEBUG level enabled - Smaller buffer for faster feedback (50 entries, 2s flush) - Extensive performance metrics - 3-day retention (auto-cleanup) **Setup**: ```php $logConfig = ProductionLogConfig::debug(logPath: '/var/log/app'); ``` **⚠️ Warning**: High overhead - use sparingly and disable after debugging **Performance Impact**: - 5-10x higher log volume - 2-3ms write latency - Increased disk I/O ### 5. Staging Environment **Use Case**: Pre-production staging environment **Features**: - DEBUG level for development visibility - Production-like resilience features - 7-day retention - Full processor stack for testing **Setup**: ```php $logConfig = ProductionLogConfig::staging(logPath: '/var/log/app'); ``` ## Integration with Application ### Environment-Based Configuration ```php use App\Framework\Config\Environment; use App\Framework\Config\EnvKey; use App\Framework\Logging\ProductionLogConfig; $env = $container->get(Environment::class); $logPath = $env->get(EnvKey::LOG_PATH, '/var/log/app'); $logConfig = match ($env->get(EnvKey::APP_ENV)) { 'production' => ProductionLogConfig::productionWithAggregation( logPath: $logPath, requestIdGenerator: $container->get(RequestIdGenerator::class) ), 'staging' => ProductionLogConfig::staging($logPath), 'debug' => ProductionLogConfig::debug($logPath), default => ProductionLogConfig::production( logPath: $logPath, requestIdGenerator: $container->get(RequestIdGenerator::class) ) }; // Register in DI container $container->singleton(LogConfig::class, $logConfig); ``` ### Environment Variables ```env # .env.production APP_ENV=production LOG_PATH=/var/log/app LOG_LEVEL=INFO LOG_ENABLE_SAMPLING=true LOG_ENABLE_AGGREGATION=true LOG_BUFFER_SIZE=100 LOG_FLUSH_INTERVAL=5 # .env.staging APP_ENV=staging LOG_PATH=/var/log/app LOG_LEVEL=DEBUG LOG_ENABLE_SAMPLING=false LOG_ENABLE_AGGREGATION=false ``` ## Log Rotation and Retention All production configurations use rotating file handlers: **Rotation Strategy**: - Daily rotation at midnight - Compressed archives (.gz) for old logs - Automatic cleanup of old files **Retention Policies**: ``` Production: 14 days (app.log), 7 days (fallback.log) High Perf: 7 days (app.log), 3 days (fallback.log) Debug: 3 days (debug.log), 1 day (fallback.log) Staging: 7 days (staging.log), 3 days (fallback.log) ``` **Disk Space Requirements**: - Standard Production: ~2-5 GB (14 days) - With Sampling: ~500 MB (7 days) - With Aggregation: ~300 MB (14 days) ## Log Format ### Structured JSON Logs All production configurations output structured JSON: ```json { "timestamp": "2025-01-15T10:00:00+00:00", "level": "INFO", "message": "User login successful", "context": { "user_id": "12345", "ip_address": "203.0.113.42" }, "request_id": "req_8f3a9b2c1d", "trace_id": "trace_7e4f2a1b", "span_id": "span_9c6d3e8f", "performance": { "memory_mb": 45.2, "execution_time_ms": 23.5 } } ``` ### Log Processors **RequestIdProcessor**: - Adds unique request ID to all logs - Enables request tracing across services - Integration with RequestIdGenerator **TraceContextProcessor**: - Adds distributed tracing context (trace_id, span_id) - OpenTelemetry compatible - Cross-service correlation **PerformanceProcessor**: - Memory usage at log time - Execution time since request start - CPU usage (optional) **MetricsCollectingProcessor** (with Aggregation): - Collects log volume metrics - Error rate tracking - Performance metrics aggregation ## Monitoring and Alerting ### Health Check Integration ```php use App\Framework\Health\Checks\LoggingHealthCheck; // Automatically registered via HealthCheckManagerInitializer // Checks: // - Log files writable // - Disk space available // - No fallback handler activation // - Log handler performance within SLA ``` ### Metrics Exposed **Via `/metrics` endpoint** (Prometheus format): ```prometheus # Log volume log_entries_total{level="info"} 15234 log_entries_total{level="error"} 23 # Log processing performance log_write_duration_seconds{percentile="p50"} 0.001 log_write_duration_seconds{percentile="p95"} 0.003 log_write_duration_seconds{percentile="p99"} 0.005 # Disk usage log_disk_usage_bytes{path="/var/log/app"} 2147483648 log_disk_available_bytes{path="/var/log/app"} 10737418240 # Handler health log_fallback_activations_total 0 log_buffer_full_events_total 2 ``` ### Alerting Rules (Example) ```yaml # Prometheus Alert Rules groups: - name: logging rules: - alert: HighErrorRate expr: rate(log_entries_total{level="error"}[5m]) > 10 for: 5m labels: severity: warning annotations: summary: "High error log rate detected" - alert: LogDiskSpaceLow expr: log_disk_available_bytes / log_disk_usage_bytes < 0.1 for: 10m labels: severity: critical annotations: summary: "Log disk space critically low" - alert: FallbackHandlerActive expr: rate(log_fallback_activations_total[5m]) > 0 for: 1m labels: severity: warning annotations: summary: "Log fallback handler activated" ``` ## Performance Tuning ### Buffer Size Optimization **Small Buffers (50-100)**: Lower latency, more disk I/O **Large Buffers (500-1000)**: Higher throughput, higher memory **Tuning Guide**: ```php // Low latency requirement (<100ms flush) bufferSize: 50, flushIntervalSeconds: 0.1 // Balanced (recommended) bufferSize: 100, flushIntervalSeconds: 5.0 // High throughput (>50k logs/s) bufferSize: 500, flushIntervalSeconds: 10.0 ``` ### Sampling Configuration ```php use App\Framework\Logging\Sampling\SamplingConfig; // Conservative sampling (production default) SamplingConfig::production(); // INFO: 10%, DEBUG: 5% // Aggressive sampling (high load) SamplingConfig::highLoad(); // INFO: 5%, DEBUG: 2% // Custom sampling new SamplingConfig( debugRate: 0.01, // 1% infoRate: 0.05, // 5% warningRate: 0.25, // 25% errorRate: 1.0, // 100% criticalRate: 1.0 // 100% ); ``` ### Aggregation Configuration ```php use App\Framework\Logging\Aggregation\AggregationConfig; // Standard aggregation (1 minute window) AggregationConfig::production(); // Extended aggregation (5 minute window) new AggregationConfig( enabled: true, windowSeconds: 300, minLevel: LogLevel::DEBUG, excludedPatterns: ['Critical error', 'Fatal exception'] ); ``` ## Troubleshooting ### Issue: High Disk Usage **Diagnosis**: ```bash # Check log sizes du -sh /var/log/app/* # Check retention policy ls -lh /var/log/app/app.log* ``` **Solutions**: 1. Enable sampling: `ProductionLogConfig::highPerformance()` 2. Enable aggregation: `ProductionLogConfig::productionWithAggregation()` 3. Reduce retention: Modify `maxFiles` parameter 4. Increase log level: `minLevel: LogLevel::WARNING` ### Issue: Fallback Handler Activated **Diagnosis**: ```bash # Check fallback logs tail -f /var/log/app/fallback.log # Check metrics curl http://localhost/metrics | grep log_fallback ``` **Common Causes**: - Disk full or permissions error - Log file corruption - Handler exception or crash **Solutions**: 1. Check disk space: `df -h /var/log/app` 2. Check permissions: `ls -la /var/log/app` 3. Review error logs: `tail -100 /var/log/app/fallback.log` ### Issue: High Log Write Latency **Diagnosis**: ```bash # Check metrics curl http://localhost/metrics | grep log_write_duration # Check disk I/O iostat -x 5 ``` **Solutions**: 1. Increase buffer size: `bufferSize: 200` 2. Increase flush interval: `flushIntervalSeconds: 10.0` 3. Enable sampling: `ProductionLogConfig::highPerformance()` 4. Use faster disk (SSD recommended) ### Issue: Logs Missing or Incomplete **Diagnosis**: ```bash # Check buffer status curl http://localhost/health/detailed | jq '.checks.logging' # Check flush events curl http://localhost/metrics | grep log_buffer_full_events ``` **Common Causes**: - Application crash before buffer flush - Buffer overflow (logs dropped) - Aggressive sampling configuration **Solutions**: 1. Enable `flushOnError: true` in BufferedLogHandler 2. Reduce buffer size for more frequent flushes 3. Review sampling configuration 4. Check application error logs ## Best Practices ### 1. Use Environment-Specific Configurations ```php // ✅ Good: Environment-aware $logConfig = match ($env) { 'production' => ProductionLogConfig::productionWithAggregation(), 'staging' => ProductionLogConfig::staging(), default => ProductionLogConfig::debug() }; // ❌ Bad: Hardcoded debug in production $logConfig = ProductionLogConfig::debug(); ``` ### 2. Always Include Request Context ```php // ✅ Good: Request ID for tracing $logger->info('User login', [ 'user_id' => $userId, 'request_id' => $request->getRequestId() ]); // ❌ Bad: No context for debugging $logger->info('User login'); ``` ### 3. Use Appropriate Log Levels ```php // ✅ Good: Proper severity levels $logger->debug('Cache miss for key', ['key' => $key]); $logger->info('User logged in', ['user_id' => $userId]); $logger->warning('Rate limit approaching', ['current' => 90, 'limit' => 100]); $logger->error('Payment failed', ['order_id' => $orderId, 'error' => $e->getMessage()]); $logger->critical('Database connection lost', ['attempts' => 3]); // ❌ Bad: Everything as INFO $logger->info('Payment failed'); ``` ### 4. Avoid Logging Sensitive Data ```php // ✅ Good: Masked or excluded $logger->info('Payment processed', [ 'order_id' => $orderId, 'amount' => $amount, 'card_last4' => $card->getLast4(), ]); // ❌ Bad: Sensitive data exposed $logger->info('Payment processed', [ 'credit_card' => $card->getNumber(), 'cvv' => $card->getCvv() ]); ``` ### 5. Monitor Log Health ```php // Set up health checks $healthCheckManager->registerHealthCheck( new LoggingHealthCheck($logger, $logPath) ); // Monitor metrics $metricsCollector->track([ 'log_volume' => $logger->getMetrics()->getTotalLogs(), 'error_rate' => $logger->getMetrics()->getErrorRate(), 'disk_usage' => $diskMonitor->getUsage($logPath) ]); ``` ## Production Checklist ### Pre-Deployment - [ ] Log directory created: `/var/log/app` - [ ] Permissions set: `chown www-data:www-data /var/log/app` - [ ] Disk space allocated: Minimum 5GB free - [ ] Rotation configured: logrotate or built-in rotation - [ ] Environment configured: `.env.production` with correct settings ### Configuration - [ ] Production config selected (standard/sampling/aggregation) - [ ] Request ID generator integrated - [ ] Processors configured appropriately - [ ] Buffer size tuned for workload - [ ] Sampling rates validated (if enabled) - [ ] Aggregation tested (if enabled) ### Monitoring - [ ] Health check endpoint verified: `/health/detailed` - [ ] Metrics endpoint verified: `/metrics` - [ ] Prometheus/monitoring integration tested - [ ] Alert rules configured - [ ] Log aggregation tool configured (ELK, Datadog, etc.) ### Testing - [ ] Log writing tested in production environment - [ ] Fallback handler tested (simulate primary failure) - [ ] Log rotation tested (manual trigger) - [ ] Performance tested under load - [ ] Disk space monitoring tested ## Support and Troubleshooting For issues with production logging: 1. Check health endpoint: `curl http://localhost/health/detailed | jq '.checks.logging'` 2. Check metrics: `curl http://localhost/metrics | grep log_` 3. Review fallback logs: `tail -100 /var/log/app/fallback.log` 4. Verify disk space: `df -h /var/log/app` 5. Check permissions: `ls -la /var/log/app` For further assistance, see: - Framework Documentation: `/docs/claude/guidelines.md` - Error Handling Guide: `/docs/claude/error-handling.md` - Performance Monitoring: `/docs/claude/performance-monitoring.md`