# Production Deployment Documentation - Distributed Queue System
## Overview
This documentation describes the production deployment of the Custom PHP Framework's distributed queue processing system.
## System Requirements
### Minimum Requirements
- **PHP**: 8.3+
- **MySQL/PostgreSQL**: 8.0+ / 13+
- **Redis**: 7.0+ (optional, for Redis-based queues)
- **RAM**: 2GB per worker node
- **CPU**: 2 cores per worker node
- **Disk**: 10GB for logs and temporary files
### Recommended Production Environment
- **Load Balancer**: Nginx/HAProxy
- **Database**: MySQL 8.0+ with primary/replica replication
- **Caching**: Redis Cluster for high availability
- **Monitoring**: Prometheus + Grafana
- **Logging**: ELK Stack (Elasticsearch, Logstash, Kibana)
## Deployment Steps
### 1. Database Setup
```bash
# Run migrations
php console.php db:migrate
# Verify migrations
php console.php db:status
```
**Expected tables:**
- `queue_workers` - Worker Registration
- `distributed_locks` - Distributed Locking
- `job_assignments` - Job-Worker Assignments
- `worker_health_checks` - Worker Health Monitoring
- `failover_events` - Failover Event Tracking
### 2. Environment Configuration
**Production environment (.env.production):**
```bash
# Database Configuration
DB_HOST=production-db-cluster
DB_PORT=3306
DB_NAME=framework_production
DB_USER=queue_user
DB_PASS=secure_production_password
# Queue Configuration
QUEUE_DRIVER=database
QUEUE_DEFAULT=default
# Worker Configuration
WORKER_HEALTH_CHECK_INTERVAL=30
WORKER_REGISTRATION_TTL=300
FAILOVER_CHECK_INTERVAL=60
# Performance Tuning
DB_POOL_SIZE=20
DB_MAX_IDLE_TIME=3600
CACHE_TTL=3600
# Monitoring
LOG_LEVEL=info
PERFORMANCE_MONITORING=true
HEALTH_CHECK_ENDPOINT=/health
```
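Before starting workers, it pays to fail fast when a required setting is missing. This is only a sketch: the `require` helper is hypothetical, not a framework command, and the example values mirror the `.env.production` keys above.

```bash
#!/bin/sh
# Hypothetical pre-start sanity check; variable names follow .env.production.
require() {
  eval "val=\${$1:-}"
  [ -n "$val" ] || { echo "missing required setting: $1" >&2; return 1; }
}

# Example values; in production they are loaded from .env.production.
DB_HOST=production-db-cluster
DB_NAME=framework_production
DB_USER=queue_user

require DB_HOST && require DB_NAME && require DB_USER && echo "environment ok"
```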
### 3. Worker Node Deployment
**Docker Compose for a worker node:**
```yaml
version: '3.8'
services:
  queue-worker:
    image: custom-php-framework:production
    environment:
      - NODE_ROLE=worker
      - WORKER_QUEUES=default,emails,reports
      - WORKER_CONCURRENCY=4
      - DB_HOST=${DB_HOST}
      - DB_NAME=${DB_NAME}
      - DB_USER=${DB_USER}
      - DB_PASS=${DB_PASS}
    command: php console.php worker:start
    restart: unless-stopped
    deploy:
      replicas: 3
      resources:
        limits:
          memory: 2G
          cpus: '2'
        reservations:
          memory: 1G
          cpus: '1'
    healthcheck:
      test: ["CMD", "php", "console.php", "worker:health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 10s
```
### 4. Load Balancer Configuration
**Nginx Configuration:**
```nginx
upstream queue_workers {
    least_conn;
    server worker-node-1:80 max_fails=3 fail_timeout=30s;
    server worker-node-2:80 max_fails=3 fail_timeout=30s;
    server worker-node-3:80 max_fails=3 fail_timeout=30s;
}

server {
    listen 80;
    server_name queue.production.example.com;

    location /health {
        proxy_pass http://queue_workers/health;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        access_log off;
    }

    location /admin/queue {
        proxy_pass http://queue_workers/admin/queue;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        # Admin access from internal IPs only
        allow 10.0.0.0/8;
        allow 172.16.0.0/12;
        allow 192.168.0.0/16;
        deny all;
    }
}
```
### 5. Monitoring Setup
**Prometheus Metrics Configuration:**
```yaml
# prometheus.yml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'queue-workers'
    static_configs:
      - targets: ['worker-node-1:9090', 'worker-node-2:9090', 'worker-node-3:9090']
    metrics_path: '/metrics'
    scrape_interval: 30s

  - job_name: 'queue-system'
    static_configs:
      - targets: ['queue.production.example.com:80']
    metrics_path: '/admin/metrics'
    scrape_interval: 60s
```
**Grafana Dashboard Queries:**
```promql
# Worker Health Status
sum(rate(worker_health_checks_total[5m])) by (status)
# Job Processing Rate
rate(jobs_processed_total[5m])
# Queue Length
queue_length{queue_name=~".*"}
# Worker CPU Usage
worker_cpu_usage_percent
# Database Connection Pool
db_connection_pool_active / db_connection_pool_max * 100
```
## Operational Commands
### Worker Management
```bash
# Start a worker
php console.php worker:start --queues=default,emails --concurrency=4
# Check worker status
php console.php worker:list
# Stop a worker (graceful shutdown)
php console.php worker:stop --worker-id=worker_123
# Stop all workers
php console.php worker:stop-all
# Worker health check
php console.php worker:health
# Run failover recovery
php console.php worker:failover-recovery
# Deregister a worker
php console.php worker:deregister --worker-id=worker_123
# Worker statistics
php console.php worker:stats
```
### System Monitoring
```bash
# System health check
curl -f http://queue.production.example.com/health
# Worker status API
curl http://queue.production.example.com/admin/queue/workers
# Queue statistics
curl http://queue.production.example.com/admin/queue/stats
# Metrics endpoint
curl http://queue.production.example.com/admin/metrics
```
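For automated probing (cron, load balancer checks), the curl call above can be wrapped in a small helper. This is a sketch, not a framework feature; the logic is shown with the HTTP status code passed in directly so it is testable without network access.

```bash
#!/bin/sh
# Hypothetical probe around the /health endpoint. In production the code
# would come from:
#   curl -s -o /dev/null -w '%{http_code}' http://queue.production.example.com/health
check_health() {
  case "$1" in
    200) echo "healthy";   return 0 ;;
    *)   echo "unhealthy"; return 1 ;;
  esac
}

check_health 200
```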
## Performance Tuning
### Database Optimization
**MySQL Configuration (my.cnf):**
```ini
[mysqld]
# InnoDB settings for the queue system
innodb_buffer_pool_size = 2G
innodb_log_file_size = 512M
innodb_flush_log_at_trx_commit = 2
innodb_lock_wait_timeout = 5
# Connection settings
max_connections = 500
max_connect_errors = 100000
connect_timeout = 10
wait_timeout = 28800
# Note: the query cache (query_cache_*) was removed in MySQL 8.0;
# do not configure it on the 8.0+ servers required here.
```
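The 2G buffer pool follows the common rule of thumb of dedicating roughly half the host's RAM when the database shares the node with other services (more on a dedicated server). A quick sizing sketch, assuming a 4GB node:

```bash
#!/bin/sh
# Rule-of-thumb sizing (assumption: 4GB node shared with other services).
ram_mb=4096
pool_mb=$((ram_mb * 50 / 100))   # ~50% of RAM
echo "innodb_buffer_pool_size = ${pool_mb}M"
```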
**Recommended indexes:**
```sql
-- Queue Workers Performance
CREATE INDEX idx_worker_status_updated ON queue_workers(status, updated_at);
CREATE INDEX idx_worker_queues ON queue_workers(queues(255));
-- Distributed Locks Performance
CREATE INDEX idx_lock_expires_worker ON distributed_locks(expires_at, worker_id);
-- Job Assignments Performance
CREATE INDEX idx_assignment_worker_time ON job_assignments(worker_id, assigned_at);
CREATE INDEX idx_assignment_queue_time ON job_assignments(queue_name, assigned_at);
-- Health Checks Performance
CREATE INDEX idx_health_worker_time ON worker_health_checks(worker_id, checked_at);
CREATE INDEX idx_health_status_time ON worker_health_checks(status, checked_at);
-- Failover Events Performance
CREATE INDEX idx_failover_worker_time ON failover_events(failed_worker_id, failover_at);
CREATE INDEX idx_failover_event_type ON failover_events(event_type, failover_at);
```
### Application Performance
**PHP Configuration (php.ini):**
```ini
; Memory limits
memory_limit = 2G
max_execution_time = 300
; OPcache for production
opcache.enable = 1
opcache.memory_consumption = 256
opcache.max_accelerated_files = 20000
opcache.validate_timestamps = 0
; Sessions (not needed for workers)
session.auto_start = 0
```
**Worker Concurrency Tuning:**
```bash
# Light jobs (emails, notifications)
php console.php worker:start --concurrency=8
# Heavy jobs (reports, exports)
php console.php worker:start --concurrency=2
# Mixed workload
php console.php worker:start --concurrency=4
```
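The values above can be derived from the 2-core node size with a simple heuristic: oversubscribe I/O-bound queues, run one job per core for CPU-bound queues. This is an assumption for illustration, not a framework rule.

```bash
#!/bin/sh
# Concurrency heuristic per 2-core worker node (see minimum requirements).
cores=2
light=$((cores * 4))   # I/O-bound (emails, notifications): oversubscribe 4x
heavy=$cores           # CPU-bound (reports, exports): one job per core
mixed=$((cores * 2))   # mixed workload
echo "light=$light heavy=$heavy mixed=$mixed"
```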
## Security Configuration
### Network Security
**Firewall Rules (iptables):**
```bash
# Worker nodes talking to each other (health checks)
iptables -A INPUT -p tcp --dport 8080 -s 10.0.1.0/24 -j ACCEPT
# Database access from worker nodes only
iptables -A INPUT -p tcp --dport 3306 -s 10.0.1.0/24 -j ACCEPT
# Redis access (if used)
iptables -A INPUT -p tcp --dport 6379 -s 10.0.1.0/24 -j ACCEPT
# Admin interface from the management network only
iptables -A INPUT -p tcp --dport 80 -s 10.0.0.0/24 -j ACCEPT
```
### Database Security
```sql
-- Dedicated queue user with minimal privileges
CREATE USER 'queue_user'@'10.0.1.%' IDENTIFIED BY 'secure_production_password';
GRANT SELECT, INSERT, UPDATE, DELETE ON framework_production.queue_workers TO 'queue_user'@'10.0.1.%';
GRANT SELECT, INSERT, UPDATE, DELETE ON framework_production.distributed_locks TO 'queue_user'@'10.0.1.%';
GRANT SELECT, INSERT, UPDATE, DELETE ON framework_production.job_assignments TO 'queue_user'@'10.0.1.%';
GRANT SELECT, INSERT, UPDATE, DELETE ON framework_production.worker_health_checks TO 'queue_user'@'10.0.1.%';
GRANT SELECT, INSERT, UPDATE, DELETE ON framework_production.failover_events TO 'queue_user'@'10.0.1.%';
FLUSH PRIVILEGES;
```
## Disaster Recovery
### Backup Strategy
**Database Backup:**
```bash
#!/bin/bash
# daily-queue-backup.sh
DATE=$(date +%Y%m%d_%H%M%S)
BACKUP_DIR="/backup/queue-system"
mkdir -p "$BACKUP_DIR"
# Back up only the queue-related tables
# (mysqldump expects the database name before the table list)
mysqldump --single-transaction \
  --routines --triggers \
  framework_production \
  queue_workers distributed_locks job_assignments worker_health_checks failover_events \
  > "$BACKUP_DIR/queue_backup_$DATE.sql"
# Retention: 30 days
find "$BACKUP_DIR" -name "queue_backup_*.sql" -mtime +30 -delete
```
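The retention rule is easy to dry-run on a scratch directory before trusting it with real backups; a sketch using GNU touch/find:

```bash
#!/bin/sh
# Dry run of the 30-day retention rule on a scratch directory (GNU coreutils).
BACKUP_DIR=$(mktemp -d)
touch "$BACKUP_DIR/queue_backup_recent.sql"
touch -d '40 days ago' "$BACKUP_DIR/queue_backup_stale.sql"
find "$BACKUP_DIR" -name "queue_backup_*.sql" -mtime +30 -delete
ls "$BACKUP_DIR"
```

Only the recent backup survives the `find ... -mtime +30 -delete` pass.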
**Worker State Backup:**
```bash
# Back up worker configuration
php console.php worker:export-config > /backup/worker-config-$(date +%Y%m%d).json
```
### Recovery Procedures
**Database Recovery:**
```bash
# Restore tables
mysql framework_production < /backup/queue_backup_YYYYMMDD_HHMMSS.sql
# Verify migrations
php console.php db:status
# Restart workers
docker-compose restart queue-worker
```
**Worker Recovery:**
```bash
# Clean up crashed workers
php console.php worker:cleanup-crashed
# Failover for lost jobs
php console.php worker:failover-recovery
# Restart the worker pool
docker-compose up -d --scale queue-worker=3
```
## Monitoring & Alerting
### Critical Alerts
**Prometheus Alert Rules (alerts.yml):**
```yaml
groups:
  - name: queue-system
    rules:
      - alert: WorkerDown
        expr: up{job="queue-workers"} == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Queue worker {{ $labels.instance }} is down"

      - alert: HighJobBacklog
        expr: queue_length > 1000
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High job backlog in queue {{ $labels.queue_name }}"

      - alert: WorkerHealthFailing
        expr: rate(worker_health_checks_failed_total[5m]) > 0.1
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Worker health checks failing"

      - alert: DatabaseConnectionExhaustion
        expr: db_connection_pool_active / db_connection_pool_max > 0.9
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Database connection pool near exhaustion"
```
### Log Monitoring
**Logstash Configuration:**
```ruby
input {
  file {
    path  => "/var/log/queue-workers/*.log"
    type  => "queue-worker"
    codec => json
  }
}

filter {
  if [type] == "queue-worker" {
    if [level] == "ERROR" or [level] == "CRITICAL" {
      mutate {
        add_tag => ["alert"]
      }
    }
  }
}

output {
  if "alert" in [tags] {
    email {
      to      => "ops-team@example.com"
      subject => "Queue System Alert: %{[level]} in %{[component]}"
      body    => "Error: %{[message]}\nContext: %{[context]}"
    }
  }
  elasticsearch {
    hosts => ["elasticsearch:9200"]
    index => "queue-logs-%{+YYYY.MM.dd}"
  }
}
```
## Maintenance Procedures
### Routine Maintenance
**Weekly Maintenance Script:**
```bash
#!/bin/bash
# weekly-queue-maintenance.sh
echo "Starting weekly queue system maintenance..."
# Clean up old health check entries (> 30 days)
php console.php queue:cleanup-health-checks --days=30
# Clean up old failover events (> 90 days)
php console.php queue:cleanup-failover-events --days=90
# Clean up orphaned job assignments
php console.php queue:cleanup-orphaned-assignments
# Refresh database statistics
mysql framework_production -e "ANALYZE TABLE queue_workers, distributed_locks, job_assignments, worker_health_checks, failover_events;"
# Worker pool health check
php console.php worker:validate-pool
echo "Weekly maintenance completed."
```
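To run this unattended, the script can be scheduled via cron; the path, log file, and time below are examples, not fixed conventions.

```bash
# crontab entry: Sundays at 03:00 (script path is an example)
0 3 * * 0 /opt/scripts/weekly-queue-maintenance.sh >> /var/log/queue-maintenance.log 2>&1
```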
**Rolling Updates:**
```bash
#!/bin/bash
# rolling-update-workers.sh
WORKER_NODES=("worker-node-1" "worker-node-2" "worker-node-3")

for node in "${WORKER_NODES[@]}"; do
  echo "Updating $node..."
  # Graceful shutdown
  ssh "$node" "docker exec queue-worker php console.php worker:stop --graceful"
  # Wait for shutdown
  sleep 30
  # Update container
  ssh "$node" "docker-compose pull && docker-compose up -d"
  # Wait for startup
  sleep 60
  # Health check
  ssh "$node" "docker exec queue-worker php console.php worker:health" || exit 1
  echo "$node updated successfully"
done

echo "Rolling update completed."
```
## Troubleshooting
### Common Issues
**1. Worker does not start:**
```bash
# Check Database Connection
php console.php db:status
# Check Worker Configuration
php console.php worker:validate-config
# Check System Resources
free -h && df -h
# Check Docker Logs
docker-compose logs queue-worker
```
**2. Jobs get stuck in the queue:**
```bash
# Check Worker Status
php console.php worker:list
# Check Distributed Locks
php console.php queue:list-locks
# Force Failover Recovery
php console.php worker:failover-recovery --force
# Clear Stale Locks
php console.php queue:clear-stale-locks
```
**3. Performance Issues:**
```bash
# Check Database Performance
php console.php db:performance-stats
# Check Worker Resource Usage
php console.php worker:resource-stats
# Analyze Slow Queries
mysql framework_production -e "SELECT * FROM performance_schema.events_statements_summary_by_digest ORDER BY avg_timer_wait DESC LIMIT 10;"
```
### Emergency Procedures
**Worker Pool Restart:**
```bash
# Gracefully stop all workers
docker-compose exec queue-worker php console.php worker:stop-all --graceful
# Wait for shutdown
sleep 60
# Restart Container
docker-compose restart queue-worker
# Verify restart
php console.php worker:list
```
**Database Failover:**
```bash
# Switch to backup database
sed -i 's/DB_HOST=primary-db/DB_HOST=backup-db/' .env.production
# Restart worker pool
docker-compose restart queue-worker
# Verify connection
php console.php db:status
```
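The sed substitution is worth dry-running against a scratch copy before touching the live `.env.production`; a minimal sketch:

```bash
#!/bin/sh
# Dry run of the failover substitution on a scratch copy.
env_copy=$(mktemp)
echo "DB_HOST=primary-db" > "$env_copy"
sed -i 's/DB_HOST=primary-db/DB_HOST=backup-db/' "$env_copy"
grep '^DB_HOST=' "$env_copy"
```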
## Performance Benchmarks
Based on the performance tests, the following benchmarks can be expected:
### Single Worker
- **Job Distribution**: < 10ms per job
- **Worker Selection**: < 5ms per selection
- **Lock Acquisition**: < 2ms per lock
- **Health Check**: < 1ms per check
### Multi-Worker Setup (3 workers)
- **Throughput**: 500+ jobs/second
- **Load Balancing**: even distribution within ±5%
- **Failover Time**: < 30 seconds
- **Resource Usage**: < 80% CPU at full load
### Database Performance
- **Connection Pool**: 95%+ efficiency
- **Query Response**: < 10ms for standard operations
- **Lock Contention**: < 1% under normal load
This documentation should serve as the basis for the production deployment and be updated regularly based on operational experience.