# Production Deployment Documentation - Distributed Queue System

## Overview

This documentation describes the production deployment of the Custom PHP Framework's distributed queue processing system.

## System Requirements

### Minimum Requirements

- **PHP**: 8.3+
- **MySQL/PostgreSQL**: 8.0+ / 13+
- **Redis**: 7.0+ (optional, for Redis-based queues)
- **RAM**: 2 GB per worker node
- **CPU**: 2 cores per worker node
- **Disk**: 10 GB for logs and temporary files

### Recommended Production Environment

- **Load balancer**: Nginx/HAProxy
- **Database**: MySQL 8.0+ with a primary/replica setup
- **Caching**: Redis Cluster for high availability
- **Monitoring**: Prometheus + Grafana
- **Logging**: ELK stack (Elasticsearch, Logstash, Kibana)

## Deployment Steps

### 1. Database Setup

```bash
# Run the migrations
php console.php db:migrate

# Verify the migrations
php console.php db:status
```

**Expected tables:**

- `queue_workers` - worker registration
- `distributed_locks` - distributed locking
- `job_assignments` - job-to-worker assignments
- `worker_health_checks` - worker health monitoring
- `failover_events` - failover event tracking

### 2. Environment Configuration

**Production environment (.env.production):**

```bash
# Database configuration
DB_HOST=production-db-cluster
DB_PORT=3306
DB_NAME=framework_production
DB_USER=queue_user
DB_PASS=secure_production_password

# Queue configuration
QUEUE_DRIVER=database
QUEUE_DEFAULT=default

# Worker configuration
WORKER_HEALTH_CHECK_INTERVAL=30
WORKER_REGISTRATION_TTL=300
FAILOVER_CHECK_INTERVAL=60

# Performance tuning
DB_POOL_SIZE=20
DB_MAX_IDLE_TIME=3600
CACHE_TTL=3600

# Monitoring
LOG_LEVEL=info
PERFORMANCE_MONITORING=true
HEALTH_CHECK_ENDPOINT=/health
```
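
The worker settings interact: assuming a worker refreshes its registration on each successful health check, `WORKER_REGISTRATION_TTL` must comfortably exceed `WORKER_HEALTH_CHECK_INTERVAL`, or healthy workers expire between checks. A minimal pre-flight sketch — the `check_worker_ttl` helper and the 3x safety factor are illustrative assumptions, not part of the framework:

```shell
# Hypothetical pre-flight check: require the registration TTL to cover at
# least three health-check intervals.
check_worker_ttl() {
    local interval="$1" ttl="$2"
    (( ttl >= 3 * interval ))
}

# With the values from .env.production above (interval=30, TTL=300):
if check_worker_ttl 30 300; then
    echo "worker TTL config OK"
else
    echo "WORKER_REGISTRATION_TTL too low" >&2
fi
```

Running such a check before `worker:start` turns a subtle flapping-registration problem into an immediate, visible deploy failure.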

### 3. Worker Node Deployment

**Docker Compose for a worker node:**

```yaml
version: '3.8'
services:
  queue-worker:
    image: custom-php-framework:production
    environment:
      - NODE_ROLE=worker
      - WORKER_QUEUES=default,emails,reports
      - WORKER_CONCURRENCY=4
      - DB_HOST=${DB_HOST}
      - DB_NAME=${DB_NAME}
      - DB_USER=${DB_USER}
      - DB_PASS=${DB_PASS}
    command: php console.php worker:start
    restart: unless-stopped
    deploy:
      replicas: 3
      resources:
        limits:
          memory: 2G
          cpus: '2'
        reservations:
          memory: 1G
          cpus: '1'
    healthcheck:
      test: ["CMD", "php", "console.php", "worker:health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 10s
```

### 4. Load Balancer Configuration

**Nginx configuration:**

```nginx
upstream queue_workers {
    least_conn;
    server worker-node-1:80 max_fails=3 fail_timeout=30s;
    server worker-node-2:80 max_fails=3 fail_timeout=30s;
    server worker-node-3:80 max_fails=3 fail_timeout=30s;
}

server {
    listen 80;
    server_name queue.production.example.com;

    location /health {
        proxy_pass http://queue_workers/health;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        access_log off;
    }

    location /admin/queue {
        proxy_pass http://queue_workers/admin/queue;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;

        # Admin access from internal IPs only
        allow 10.0.0.0/8;
        allow 172.16.0.0/12;
        allow 192.168.0.0/16;
        deny all;
    }
}
```

### 5. Monitoring Setup

**Prometheus metrics configuration:**

```yaml
# prometheus.yml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'queue-workers'
    static_configs:
      - targets: ['worker-node-1:9090', 'worker-node-2:9090', 'worker-node-3:9090']
    metrics_path: '/metrics'
    scrape_interval: 30s

  - job_name: 'queue-system'
    static_configs:
      - targets: ['queue.production.example.com:80']
    metrics_path: '/admin/metrics'
    scrape_interval: 60s
```

**Grafana dashboard queries:**

```promql
# Worker health status
sum(rate(worker_health_checks_total[5m])) by (status)

# Job processing rate
rate(jobs_processed_total[5m])

# Queue length
queue_length{queue_name=~".*"}

# Worker CPU usage
worker_cpu_usage_percent

# Database connection pool utilization (%)
db_connection_pool_active / db_connection_pool_max * 100
```

## Operational Commands

### Worker Management

```bash
# Start a worker
php console.php worker:start --queues=default,emails --concurrency=4

# Check worker status
php console.php worker:list

# Stop a worker (graceful shutdown)
php console.php worker:stop --worker-id=worker_123

# Stop all workers
php console.php worker:stop-all

# Worker health check
php console.php worker:health

# Run failover recovery
php console.php worker:failover-recovery

# Deregister a worker
php console.php worker:deregister --worker-id=worker_123

# Worker statistics
php console.php worker:stats
```

### System Monitoring

```bash
# System health check
curl -f http://queue.production.example.com/health

# Worker status API
curl http://queue.production.example.com/admin/queue/workers

# Queue statistics
curl http://queue.production.example.com/admin/queue/stats

# Metrics endpoint
curl http://queue.production.example.com/admin/metrics
```
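
During incidents these endpoints are typically polled in a loop rather than hit once. A small retry wrapper makes scripted health probes tolerant of transient failures; this is a generic sketch — the `retry_health` name and the fixed 1-second pause are assumptions, not framework tooling:

```shell
# Retry a probe command up to N times with a short pause between attempts.
# Usage: retry_health <attempts> <command...>
retry_health() {
    local attempts="$1"; shift
    local i
    for (( i = 1; i <= attempts; i++ )); do
        "$@" && return 0
        sleep 1
    done
    return 1
}

# Example, using the health endpoint from the commands above:
# retry_health 5 curl -fsS http://queue.production.example.com/health
```

The wrapper returns the usual shell success/failure code, so it slots directly into deployment scripts and `|| exit 1` guards.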

## Performance Tuning

### Database Optimization

**MySQL configuration (my.cnf):**

```ini
[mysqld]
# InnoDB settings for the queue system
innodb_buffer_pool_size = 2G
innodb_log_file_size = 512M
innodb_flush_log_at_trx_commit = 2
innodb_lock_wait_timeout = 5

# Connection settings
max_connections = 500
max_connect_errors = 100000
connect_timeout = 10
wait_timeout = 28800

# Note: the query cache (query_cache_type, query_cache_size) was removed in
# MySQL 8.0; do not configure it on the 8.0+ servers recommended above, or
# the server will fail to start on the unknown variables.
```

**Recommended indexes:**

```sql
-- Queue workers performance
CREATE INDEX idx_worker_status_updated ON queue_workers(status, updated_at);
CREATE INDEX idx_worker_queues ON queue_workers(queues(255));

-- Distributed locks performance
CREATE INDEX idx_lock_expires_worker ON distributed_locks(expires_at, worker_id);

-- Job assignments performance
CREATE INDEX idx_assignment_worker_time ON job_assignments(worker_id, assigned_at);
CREATE INDEX idx_assignment_queue_time ON job_assignments(queue_name, assigned_at);

-- Health checks performance
CREATE INDEX idx_health_worker_time ON worker_health_checks(worker_id, checked_at);
CREATE INDEX idx_health_status_time ON worker_health_checks(status, checked_at);

-- Failover events performance
CREATE INDEX idx_failover_worker_time ON failover_events(failed_worker_id, failover_at);
CREATE INDEX idx_failover_event_type ON failover_events(event_type, failover_at);
```

### Application Performance

**PHP configuration (php.ini):**

```ini
; Memory limits
memory_limit = 2G
max_execution_time = 300

; OPcache for production
opcache.enable = 1
opcache.memory_consumption = 256
opcache.max_accelerated_files = 20000
opcache.validate_timestamps = 0

; Sessions (not needed for workers)
session.auto_start = 0
```

**Worker concurrency tuning:**

```bash
# Light jobs (emails, notifications)
php console.php worker:start --concurrency=8

# Heavy jobs (reports, exports)
php console.php worker:start --concurrency=2

# Mixed workload
php console.php worker:start --concurrency=4
```
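
These values match the 2-core worker nodes from the minimum requirements; on differently sized nodes they should scale with the core count. A sketch of that rule of thumb — the `suggest_concurrency` helper and its per-workload multipliers are illustrative assumptions derived from the ratios above, not framework defaults:

```shell
# Derive a concurrency value from core count and workload class, mirroring
# the examples above (on 2 cores: light=8, mixed=4, heavy=2).
suggest_concurrency() {
    local cores="$1" workload="$2"
    case "$workload" in
        light) echo $(( cores * 4 )) ;;  # I/O-bound jobs tolerate oversubscription
        heavy) echo "$cores" ;;          # CPU-bound jobs: one slot per core
        *)     echo $(( cores * 2 )) ;;  # mixed workload
    esac
}
```

For example, `suggest_concurrency 2 light` yields 8, matching the light-job command above; feed the result into `--concurrency=` when provisioning heterogeneous nodes.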

## Security Configuration

### Network Security

**Firewall rules (iptables):**

```bash
# Worker nodes among each other (health checks)
iptables -A INPUT -p tcp --dport 8080 -s 10.0.1.0/24 -j ACCEPT

# Database access from worker nodes only
iptables -A INPUT -p tcp --dport 3306 -s 10.0.1.0/24 -j ACCEPT

# Redis access (if used)
iptables -A INPUT -p tcp --dport 6379 -s 10.0.1.0/24 -j ACCEPT

# Admin interface from the management network only
iptables -A INPUT -p tcp --dport 80 -s 10.0.0.0/24 -j ACCEPT
```

### Database Security

```sql
-- Dedicated queue user with minimal privileges
CREATE USER 'queue_user'@'10.0.1.%' IDENTIFIED BY 'secure_production_password';

GRANT SELECT, INSERT, UPDATE, DELETE ON framework_production.queue_workers TO 'queue_user'@'10.0.1.%';
GRANT SELECT, INSERT, UPDATE, DELETE ON framework_production.distributed_locks TO 'queue_user'@'10.0.1.%';
GRANT SELECT, INSERT, UPDATE, DELETE ON framework_production.job_assignments TO 'queue_user'@'10.0.1.%';
GRANT SELECT, INSERT, UPDATE, DELETE ON framework_production.worker_health_checks TO 'queue_user'@'10.0.1.%';
GRANT SELECT, INSERT, UPDATE, DELETE ON framework_production.failover_events TO 'queue_user'@'10.0.1.%';

FLUSH PRIVILEGES;
```

## Disaster Recovery

### Backup Strategy

**Database backup:**

```bash
#!/bin/bash
# daily-queue-backup.sh

DATE=$(date +%Y%m%d_%H%M%S)
BACKUP_DIR="/backup/queue-system"

# Back up only the queue-related tables
# (mysqldump expects the database name before the table names)
mysqldump --single-transaction \
    --routines --triggers \
    framework_production \
    queue_workers distributed_locks job_assignments worker_health_checks failover_events \
    > "$BACKUP_DIR/queue_backup_$DATE.sql"

# Retention: 30 days
find "$BACKUP_DIR" -name "queue_backup_*.sql" -mtime +30 -delete
```
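
One weakness of running the dump and the `find … -delete` rotation in the same script is that a failed dump can still trigger rotation. A hedged sketch of a guard — the `prune_if_backup_ok` helper is an assumption added here, not part of the script above:

```shell
# Prune old dumps only after verifying the newest dump is non-empty, so a
# failed mysqldump never rotates away the last good backup.
prune_if_backup_ok() {
    local new_dump="$1" dir="$2" keep_days="$3"
    if [ ! -s "$new_dump" ]; then
        echo "backup failed or empty: $new_dump" >&2
        return 1
    fi
    find "$dir" -name "queue_backup_*.sql" -mtime +"$keep_days" -delete
}
```

In the backup script this would replace the bare `find` line, e.g. `prune_if_backup_ok "$BACKUP_DIR/queue_backup_$DATE.sql" "$BACKUP_DIR" 30`.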

**Worker state backup:**

```bash
# Back up the worker configuration
php console.php worker:export-config > /backup/worker-config-$(date +%Y%m%d).json
```

### Recovery Procedures

**Database recovery:**

```bash
# Restore the tables
mysql framework_production < /backup/queue_backup_YYYYMMDD_HHMMSS.sql

# Verify the migrations
php console.php db:status

# Restart the workers
docker-compose restart queue-worker
```

**Worker recovery:**

```bash
# Clean up crashed workers
php console.php worker:cleanup-crashed

# Failover for lost jobs
php console.php worker:failover-recovery

# Restart the worker pool
docker-compose up -d --scale queue-worker=3
```

## Monitoring & Alerting

### Critical Alerts

**Prometheus alert rules (alerts.yml):**

```yaml
groups:
  - name: queue-system
    rules:
      - alert: WorkerDown
        expr: up{job="queue-workers"} == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Queue worker {{ $labels.instance }} is down"

      - alert: HighJobBacklog
        expr: queue_length > 1000
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High job backlog in queue {{ $labels.queue_name }}"

      - alert: WorkerHealthFailing
        expr: rate(worker_health_checks_failed_total[5m]) > 0.1
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Worker health checks failing"

      - alert: DatabaseConnectionExhaustion
        expr: db_connection_pool_active / db_connection_pool_max > 0.9
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Database connection pool near exhaustion"
```

### Log Monitoring

**Logstash configuration:**

```ruby
input {
  file {
    path => "/var/log/queue-workers/*.log"
    type => "queue-worker"
    codec => json
  }
}

filter {
  if [type] == "queue-worker" {
    if [level] == "ERROR" or [level] == "CRITICAL" {
      mutate {
        add_tag => ["alert"]
      }
    }
  }
}

output {
  if "alert" in [tags] {
    email {
      to => "ops-team@example.com"
      subject => "Queue System Alert: %{[level]} in %{[component]}"
      body => "Error: %{[message]}\nContext: %{[context]}"
    }
  }

  elasticsearch {
    hosts => ["elasticsearch:9200"]
    index => "queue-logs-%{+YYYY.MM.dd}"
  }
}
```

## Maintenance Procedures

### Routine Maintenance

**Weekly maintenance script:**

```bash
#!/bin/bash
# weekly-queue-maintenance.sh

echo "Starting weekly queue system maintenance..."

# Clean up old health-check entries (> 30 days)
php console.php queue:cleanup-health-checks --days=30

# Clean up old failover events (> 90 days)
php console.php queue:cleanup-failover-events --days=90

# Clean up orphaned job assignments
php console.php queue:cleanup-orphaned-assignments

# Refresh database statistics
mysql framework_production -e "ANALYZE TABLE queue_workers, distributed_locks, job_assignments, worker_health_checks, failover_events;"

# Worker pool health check
php console.php worker:validate-pool

echo "Weekly maintenance completed."
```

**Rolling updates:**

```bash
#!/bin/bash
# rolling-update-workers.sh

WORKER_NODES=("worker-node-1" "worker-node-2" "worker-node-3")

for node in "${WORKER_NODES[@]}"; do
    echo "Updating $node..."

    # Graceful shutdown
    ssh "$node" "docker exec queue-worker php console.php worker:stop --graceful"

    # Wait for shutdown
    sleep 30

    # Update the container
    ssh "$node" "docker-compose pull && docker-compose up -d"

    # Wait for startup
    sleep 60

    # Health check
    ssh "$node" "docker exec queue-worker php console.php worker:health" || exit 1

    echo "$node updated successfully"
done

echo "Rolling update completed."
```
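
The fixed `sleep 30` assumes every worker drains within 30 seconds; long-running jobs can exceed that and be killed mid-flight. A sketch of an active drain wait — the `wait_for_drain` helper, its 5-second poll interval, and the idea of a command that prints the active-job count are all assumptions, since the commands documented here expose no such counter directly:

```shell
# Poll a command that prints the number of active jobs; return success once
# it reports zero, or failure after the timeout elapses.
# Usage: wait_for_drain <timeout-seconds> <command...>
wait_for_drain() {
    local timeout="$1"; shift
    local waited=0 active
    while (( waited < timeout )); do
        active=$("$@") || return 1
        (( active == 0 )) && return 0
        sleep 5
        (( waited += 5 ))
    done
    return 1
}

# Example: wait_for_drain 300 some_command_printing_active_job_count
```

Replacing the fixed sleep with such a wait lets the rolling update proceed as soon as a node is actually idle, and abort loudly when it never drains.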

## Troubleshooting

### Common Issues

**1. Worker does not start:**

```bash
# Check the database connection
php console.php db:status

# Check the worker configuration
php console.php worker:validate-config

# Check system resources
free -h && df -h

# Check the Docker logs
docker-compose logs queue-worker
```

**2. Jobs stuck in the queue:**

```bash
# Check worker status
php console.php worker:list

# Check distributed locks
php console.php queue:list-locks

# Force failover recovery
php console.php worker:failover-recovery --force

# Clear stale locks
php console.php queue:clear-stale-locks
```

**3. Performance issues:**

```bash
# Check database performance
php console.php db:performance-stats

# Check worker resource usage
php console.php worker:resource-stats

# Analyze slow queries
mysql framework_production -e "SELECT * FROM performance_schema.events_statements_summary_by_digest ORDER BY avg_timer_wait DESC LIMIT 10;"
```

### Emergency Procedures

**Worker pool restart:**

```bash
# Graceful restart of all workers
docker-compose exec queue-worker php console.php worker:stop-all --graceful

# Wait for shutdown
sleep 60

# Restart the container
docker-compose restart queue-worker

# Verify the restart
php console.php worker:list
```

**Database failover:**

```bash
# Switch to backup database
sed -i 's/DB_HOST=primary-db/DB_HOST=backup-db/' .env.production

# Restart worker pool
docker-compose restart queue-worker

# Verify connection
php console.php db:status
```

## Performance Benchmarks

Based on the performance tests, the following benchmarks can be expected:

### Single Worker

- **Job distribution**: < 10 ms per job
- **Worker selection**: < 5 ms per selection
- **Lock acquisition**: < 2 ms per lock
- **Health check**: < 1 ms per check

### Multi-Worker Setup (3 Workers)

- **Throughput**: 500+ jobs/second
- **Load balancing**: even distribution within ±5%
- **Failover time**: < 30 seconds
- **Resource usage**: < 80% CPU under full load

### Database Performance

- **Connection pool**: 95%+ efficiency
- **Query response**: < 10 ms for standard operations
- **Lock contention**: < 1% under normal load

This documentation should serve as the baseline for the production deployment and be updated regularly based on operational experience.