# Production Deployment Documentation - Distributed Queue System

## Overview

This documentation describes the production deployment of the distributed queue processing system of the custom PHP framework.

## System Requirements

### Minimum Requirements

- **PHP**: 8.3+
- **MySQL/PostgreSQL**: 8.0+ / 13+
- **Redis**: 7.0+ (optional, for Redis-based queues)
- **RAM**: 2 GB per worker node
- **CPU**: 2 cores per worker node
- **Disk**: 10 GB for logs and temporary files

### Recommended Production Environment

- **Load Balancer**: Nginx/HAProxy
- **Database**: MySQL 8.0+ with a primary/replica setup
- **Caching**: Redis Cluster for high availability
- **Monitoring**: Prometheus + Grafana
- **Logging**: ELK stack (Elasticsearch, Logstash, Kibana)

## Deployment Steps

### 1. Database Setup

```bash
# Run migrations
php console.php db:migrate

# Verify migrations
php console.php db:status
```

**Expected tables:**

- `queue_workers` - worker registration
- `distributed_locks` - distributed locking
- `job_assignments` - job-worker assignments
- `worker_health_checks` - worker health monitoring
- `failover_events` - failover event tracking

### 2. Environment Configuration

**Production environment (`.env.production`):**

```bash
# Database configuration
DB_HOST=production-db-cluster
DB_PORT=3306
DB_NAME=framework_production
DB_USER=queue_user
DB_PASS=secure_production_password

# Queue configuration
QUEUE_DRIVER=database
QUEUE_DEFAULT=default

# Worker configuration
WORKER_HEALTH_CHECK_INTERVAL=30
WORKER_REGISTRATION_TTL=300
FAILOVER_CHECK_INTERVAL=60

# Performance tuning
DB_POOL_SIZE=20
DB_MAX_IDLE_TIME=3600
CACHE_TTL=3600

# Monitoring
LOG_LEVEL=info
PERFORMANCE_MONITORING=true
HEALTH_CHECK_ENDPOINT=/health
```
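As a pre-deployment sanity check, the presence of the required variables in the env file can be verified with a small script. This is a hedged sketch, not part of the framework CLI: the variable list mirrors the sample `.env.production` above, and for demonstration the script writes a self-contained temp file that deliberately omits `DB_PASS`.

```shell
#!/bin/sh
# Hypothetical pre-flight check (a sketch, not part of the framework CLI):
# verify that every variable the queue system expects is present in an
# env file before rolling out. The variable list mirrors .env.production.

# For demonstration only: write a sample env file that is missing DB_PASS.
ENV_FILE="$(mktemp)"
cat > "$ENV_FILE" <<'EOF'
DB_HOST=production-db-cluster
DB_PORT=3306
DB_NAME=framework_production
DB_USER=queue_user
QUEUE_DRIVER=database
EOF

REQUIRED="DB_HOST DB_PORT DB_NAME DB_USER DB_PASS QUEUE_DRIVER"
missing=""
for var in $REQUIRED; do
    grep -q "^${var}=" "$ENV_FILE" || missing="$missing $var"
done

if [ -n "$missing" ]; then
    echo "Missing variables:$missing"
else
    echo "All required variables present"
fi
rm -f "$ENV_FILE"
```

In a real pipeline, point the script at the deployed `.env.production` instead of the temp file and fail the rollout on any missing variable.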
### 3. Worker Node Deployment

**Docker Compose for a worker node:**

```yaml
version: '3.8'

services:
  queue-worker:
    image: custom-php-framework:production
    environment:
      - NODE_ROLE=worker
      - WORKER_QUEUES=default,emails,reports
      - WORKER_CONCURRENCY=4
      - DB_HOST=${DB_HOST}
      - DB_NAME=${DB_NAME}
      - DB_USER=${DB_USER}
      - DB_PASS=${DB_PASS}
    command: php console.php worker:start
    restart: unless-stopped
    deploy:
      replicas: 3
      resources:
        limits:
          memory: 2G
          cpus: '2'
        reservations:
          memory: 1G
          cpus: '1'
    healthcheck:
      test: ["CMD", "php", "console.php", "worker:health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 10s
```

### 4. Load Balancer Configuration

**Nginx configuration:**

```nginx
upstream queue_workers {
    least_conn;
    server worker-node-1:80 max_fails=3 fail_timeout=30s;
    server worker-node-2:80 max_fails=3 fail_timeout=30s;
    server worker-node-3:80 max_fails=3 fail_timeout=30s;
}

server {
    listen 80;
    server_name queue.production.example.com;

    location /health {
        proxy_pass http://queue_workers/health;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        access_log off;
    }

    location /admin/queue {
        proxy_pass http://queue_workers/admin/queue;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;

        # Admin access only from internal IPs
        allow 10.0.0.0/8;
        allow 172.16.0.0/12;
        allow 192.168.0.0/16;
        deny all;
    }
}
```
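The `least_conn` directive above routes each new request to the upstream with the fewest active connections. A minimal sketch of that selection policy (node names and connection counts are illustrative, not live data):

```shell
#!/bin/sh
# Sketch of the least_conn policy nginx applies above: route the next
# request to the upstream with the fewest active connections.
# Node names and connection counts are illustrative.
nodes="worker-node-1:12 worker-node-2:4 worker-node-3:9"

best=""
best_count=999999
for entry in $nodes; do
    node=${entry%%:*}          # part before the colon
    count=${entry##*:}         # part after the colon
    if [ "$count" -lt "$best_count" ]; then
        best=$node
        best_count=$count
    fi
done

echo "route next request to: $best ($best_count active connections)"
```

For queue workers whose requests are short and uniform, `least_conn` and round-robin behave similarly; `least_conn` pays off when job-triggering requests have uneven duration.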
### 5. Monitoring Setup

**Prometheus metrics configuration:**

```yaml
# prometheus.yml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'queue-workers'
    static_configs:
      - targets: ['worker-node-1:9090', 'worker-node-2:9090', 'worker-node-3:9090']
    metrics_path: '/metrics'
    scrape_interval: 30s

  - job_name: 'queue-system'
    static_configs:
      - targets: ['queue.production.example.com:80']
    metrics_path: '/admin/metrics'
    scrape_interval: 60s
```

**Grafana dashboard queries:**

```promql
# Worker health status
sum(rate(worker_health_checks_total[5m])) by (status)

# Job processing rate
rate(jobs_processed_total[5m])

# Queue length
queue_length{queue_name=~".*"}

# Worker CPU usage
worker_cpu_usage_percent

# Database connection pool utilization (%)
db_connection_pool_active / db_connection_pool_max * 100
```

## Operational Commands

### Worker Management

```bash
# Start a worker
php console.php worker:start --queues=default,emails --concurrency=4

# Check worker status
php console.php worker:list

# Stop a worker (graceful shutdown)
php console.php worker:stop --worker-id=worker_123

# Stop all workers
php console.php worker:stop-all

# Worker health check
php console.php worker:health

# Run failover recovery
php console.php worker:failover-recovery

# Deregister a worker
php console.php worker:deregister --worker-id=worker_123

# Worker statistics
php console.php worker:stats
```

### System Monitoring

```bash
# System health check
curl -f http://queue.production.example.com/health

# Worker status API
curl http://queue.production.example.com/admin/queue/workers

# Queue statistics
curl http://queue.production.example.com/admin/queue/stats

# Metrics endpoint
curl http://queue.production.example.com/admin/metrics
```

## Performance Tuning

### Database Optimization

**MySQL configuration (my.cnf):**

```ini
[mysqld]
# InnoDB settings for the queue system
innodb_buffer_pool_size = 2G
innodb_log_file_size = 512M
innodb_flush_log_at_trx_commit = 2
innodb_lock_wait_timeout = 5

# Connection settings
max_connections = 500
max_connect_errors = 100000
connect_timeout = 10
wait_timeout = 28800

# Note: do not configure the query cache (query_cache_type,
# query_cache_size, query_cache_limit) here -- it was removed
# entirely in MySQL 8.0 and these settings are no longer valid.
```

**Recommended indexes:**

```sql
-- Queue workers performance
CREATE INDEX idx_worker_status_updated ON queue_workers(status, updated_at);
CREATE INDEX idx_worker_queues ON queue_workers(queues(255));

-- Distributed locks performance
CREATE INDEX idx_lock_expires_worker ON distributed_locks(expires_at, worker_id);

-- Job assignments performance
CREATE INDEX idx_assignment_worker_time ON job_assignments(worker_id, assigned_at);
CREATE INDEX idx_assignment_queue_time ON job_assignments(queue_name, assigned_at);

-- Health checks performance
CREATE INDEX idx_health_worker_time ON worker_health_checks(worker_id, checked_at);
CREATE INDEX idx_health_status_time ON worker_health_checks(status, checked_at);

-- Failover events performance
CREATE INDEX idx_failover_worker_time ON failover_events(failed_worker_id, failover_at);
CREATE INDEX idx_failover_event_type ON failover_events(event_type, failover_at);
```

### Application Performance

**PHP configuration (php.ini):**

```ini
; Memory limits
memory_limit = 2G
max_execution_time = 300

; OPcache for production
opcache.enable = 1
opcache.memory_consumption = 256
opcache.max_accelerated_files = 20000
opcache.validate_timestamps = 0

; Sessions (not needed for workers)
session.auto_start = 0
```

**Worker concurrency tuning:**

```bash
# Light jobs (emails, notifications)
php console.php worker:start --concurrency=8

# Heavy jobs (reports, exports)
php console.php worker:start --concurrency=2

# Mixed workload
php console.php worker:start --concurrency=4
```

## Security Configuration

### Network Security

**Firewall rules (iptables):**

```bash
# Worker nodes among each other (health checks)
iptables -A INPUT -p tcp --dport 8080 -s 10.0.1.0/24 -j ACCEPT

# Database access only from worker nodes
iptables -A INPUT -p tcp --dport 3306 -s 10.0.1.0/24 -j ACCEPT
# Redis access (if used)
iptables -A INPUT -p tcp --dport 6379 -s 10.0.1.0/24 -j ACCEPT

# Admin interface only from the management network
iptables -A INPUT -p tcp --dport 80 -s 10.0.0.0/24 -j ACCEPT
```

### Database Security

```sql
-- Dedicated queue user with minimal privileges
CREATE USER 'queue_user'@'10.0.1.%' IDENTIFIED BY 'secure_production_password';

GRANT SELECT, INSERT, UPDATE, DELETE ON framework_production.queue_workers TO 'queue_user'@'10.0.1.%';
GRANT SELECT, INSERT, UPDATE, DELETE ON framework_production.distributed_locks TO 'queue_user'@'10.0.1.%';
GRANT SELECT, INSERT, UPDATE, DELETE ON framework_production.job_assignments TO 'queue_user'@'10.0.1.%';
GRANT SELECT, INSERT, UPDATE, DELETE ON framework_production.worker_health_checks TO 'queue_user'@'10.0.1.%';
GRANT SELECT, INSERT, UPDATE, DELETE ON framework_production.failover_events TO 'queue_user'@'10.0.1.%';

FLUSH PRIVILEGES;
```

## Disaster Recovery

### Backup Strategy

**Database backup:**

```bash
#!/bin/bash
# daily-queue-backup.sh

DATE=$(date +%Y%m%d_%H%M%S)
BACKUP_DIR="/backup/queue-system"

# Back up only the queue-related tables
# (mysqldump expects the database name before the table list)
mysqldump --single-transaction \
  --routines --triggers \
  framework_production \
  queue_workers distributed_locks job_assignments worker_health_checks failover_events \
  > $BACKUP_DIR/queue_backup_$DATE.sql

# Retention: 30 days
find $BACKUP_DIR -name "queue_backup_*.sql" -mtime +30 -delete
```

**Worker state backup:**

```bash
# Save the worker configuration
php console.php worker:export-config > /backup/worker-config-$(date +%Y%m%d).json
```

### Recovery Procedures

**Database recovery:**

```bash
# Restore the tables
mysql framework_production < /backup/queue_backup_YYYYMMDD_HHMMSS.sql

# Verify migrations
php console.php db:status

# Restart the workers
docker-compose restart queue-worker
```

**Worker recovery:**

```bash
# Clean up crashed workers
php console.php worker:cleanup-crashed

# Run failover for lost jobs
php console.php worker:failover-recovery
# Restart the worker pool
docker-compose up -d --scale queue-worker=3
```

## Monitoring & Alerting

### Critical Alerts

**Prometheus alert rules (alerts.yml):**

```yaml
groups:
  - name: queue-system
    rules:
      - alert: WorkerDown
        expr: up{job="queue-workers"} == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Queue worker {{ $labels.instance }} is down"

      - alert: HighJobBacklog
        expr: queue_length > 1000
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High job backlog in queue {{ $labels.queue_name }}"

      - alert: WorkerHealthFailing
        expr: rate(worker_health_checks_failed_total[5m]) > 0.1
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Worker health checks failing"

      - alert: DatabaseConnectionExhaustion
        expr: db_connection_pool_active / db_connection_pool_max > 0.9
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Database connection pool near exhaustion"
```

### Log Monitoring

**Logstash configuration:**

```ruby
input {
  file {
    path => "/var/log/queue-workers/*.log"
    type => "queue-worker"
    codec => json
  }
}

filter {
  if [type] == "queue-worker" {
    if [level] == "ERROR" or [level] == "CRITICAL" {
      mutate { add_tag => ["alert"] }
    }
  }
}

output {
  if "alert" in [tags] {
    email {
      to => "ops-team@example.com"
      subject => "Queue System Alert: %{[level]} in %{[component]}"
      body => "Error: %{[message]}\nContext: %{[context]}"
    }
  }
  elasticsearch {
    hosts => ["elasticsearch:9200"]
    index => "queue-logs-%{+YYYY.MM.dd}"
  }
}
```

## Maintenance Procedures

### Routine Maintenance

**Weekly maintenance script:**

```bash
#!/bin/bash
# weekly-queue-maintenance.sh

echo "Starting weekly queue system maintenance..."
# Clean up old health-check entries (> 30 days)
php console.php queue:cleanup-health-checks --days=30

# Clean up old failover events (> 90 days)
php console.php queue:cleanup-failover-events --days=90

# Clean up orphaned job assignments
php console.php queue:cleanup-orphaned-assignments

# Refresh database statistics
mysql framework_production -e "ANALYZE TABLE queue_workers, distributed_locks, job_assignments, worker_health_checks, failover_events;"

# Worker pool health check
php console.php worker:validate-pool

echo "Weekly maintenance completed."
```

**Rolling updates:**

```bash
#!/bin/bash
# rolling-update-workers.sh

WORKER_NODES=("worker-node-1" "worker-node-2" "worker-node-3")

for node in "${WORKER_NODES[@]}"; do
    echo "Updating $node..."

    # Graceful shutdown
    ssh $node "docker exec queue-worker php console.php worker:stop --graceful"

    # Wait for shutdown
    sleep 30

    # Update the container
    ssh $node "docker-compose pull && docker-compose up -d"

    # Wait for startup
    sleep 60

    # Health check
    ssh $node "docker exec queue-worker php console.php worker:health" || exit 1

    echo "$node updated successfully"
done

echo "Rolling update completed."
```

## Troubleshooting

### Common Issues

**1. A worker does not start:**

```bash
# Check the database connection
php console.php db:status

# Check the worker configuration
php console.php worker:validate-config

# Check system resources
free -h && df -h

# Check the Docker logs
docker-compose logs queue-worker
```

**2. Jobs remain stuck in the queue:**

```bash
# Check worker status
php console.php worker:list

# Check distributed locks
php console.php queue:list-locks

# Force failover recovery
php console.php worker:failover-recovery --force

# Clear stale locks
php console.php queue:clear-stale-locks
```
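Stuck jobs are usually held by stale locks, and the cleanup above hinges on the lock TTL: a lock whose `expires_at` lies in the past can be reclaimed safely. A minimal sketch of that check with illustrative epoch timestamps (the assumed semantics of `queue:clear-stale-locks`, which reads `expires_at` from the `distributed_locks` table):

```shell
#!/bin/sh
# Sketch of the staleness test behind queue:clear-stale-locks (assumed
# semantics: a row in distributed_locks whose expires_at is in the past
# can be reclaimed). Timestamps are illustrative epoch seconds.
now=$(date +%s)
lock_expires=$((now - 120))    # this lock expired two minutes ago

if [ "$lock_expires" -lt "$now" ]; then
    echo "lock is stale - safe to clear"
else
    echo "lock is still held"
fi
```

Because the comparison relies on wall-clock time, keep all worker nodes NTP-synchronized; clock skew larger than the lock TTL can cause live locks to be cleared.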
**3. Performance issues:**

```bash
# Check database performance
php console.php db:performance-stats

# Check worker resource usage
php console.php worker:resource-stats

# Analyze slow queries
mysql framework_production -e "SELECT * FROM performance_schema.events_statements_summary_by_digest ORDER BY avg_timer_wait DESC LIMIT 10;"
```

### Emergency Procedures

**Worker pool restart:**

```bash
# Graceful restart of all workers
docker-compose exec queue-worker php console.php worker:stop-all --graceful

# Wait for shutdown
sleep 60

# Restart the containers
docker-compose restart queue-worker

# Verify the restart
php console.php worker:list
```

**Database failover:**

```bash
# Switch to the backup database
sed -i 's/DB_HOST=primary-db/DB_HOST=backup-db/' .env.production

# Restart the worker pool
docker-compose restart queue-worker

# Verify the connection
php console.php db:status
```

## Performance Benchmarks

Based on the performance tests, the following benchmarks can be expected:

### Single Worker

- **Job distribution**: < 10 ms per job
- **Worker selection**: < 5 ms per selection
- **Lock acquisition**: < 2 ms per lock
- **Health check**: < 1 ms per check

### Multi-Worker Setup (3 Workers)

- **Throughput**: 500+ jobs/second
- **Load balancing**: even distribution within ±5%
- **Failover time**: < 30 seconds
- **Resource usage**: < 80% CPU at full load

### Database Performance

- **Connection pool**: 95%+ efficiency
- **Query response**: < 10 ms for standard operations
- **Lock contention**: < 1% under normal load

This documentation should serve as the basis for the production deployment and be updated regularly based on operational experience.
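These figures allow a rough capacity estimate. A back-of-the-envelope sketch, assuming the documented ~500 jobs/second across 3 workers scales linearly (an assumption, not a measurement) and adding ~30% headroom:

```shell
#!/bin/sh
# Back-of-the-envelope capacity estimate from the benchmarks above:
# ~500 jobs/second across 3 workers, assumed to scale linearly (an
# assumption, not a measurement), plus ~30% headroom.
per_worker=$((500 / 3))        # ~166 jobs/s per worker
target=1000                    # desired sustained jobs/s

# ceil(target * 1.3 / per_worker) using integer arithmetic
needed=$(( (target * 13 / 10 + per_worker - 1) / per_worker ))

echo "workers needed for ${target} jobs/s: $needed"
```

Treat the result as a starting point for load testing, not a sizing guarantee: database lock contention grows with worker count, so throughput will fall below linear at some pool size.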