
Production Deployment Documentation - Distributed Queue System

Overview

This documentation describes the production deployment of the custom PHP framework's distributed queue processing system.

System Requirements

Minimum Requirements

  • PHP: 8.3+
  • MySQL/PostgreSQL: 8.0+ / 13+
  • Redis: 7.0+ (optional, for Redis-based queues)
  • RAM: 2 GB per worker node
  • CPU: 2 cores per worker node
  • Disk: 10 GB for logs and temporary files
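
For capacity planning, the per-node figures above multiply out directly; a minimal shell sketch (the pool size of 3 is an example value, matching the replica count used later in this document):

```shell
# Sizing sketch: total RAM and CPU for a pool of worker nodes,
# using the per-node minimums above (2 GB RAM, 2 cores each).
workers=3                      # example pool size
echo "RAM:  $((workers * 2)) GB"
echo "CPU:  $((workers * 2)) cores"
```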

Recommended Production Environment

  • Load Balancer: Nginx/HAProxy
  • Database: MySQL 8.0+ with a master/replica setup
  • Caching: Redis Cluster for high availability
  • Monitoring: Prometheus + Grafana
  • Logging: ELK Stack (Elasticsearch, Logstash, Kibana)

Deployment Steps

1. Database Setup

# Run migrations
php console.php db:migrate

# Verify migrations
php console.php db:status

Expected tables:

  • queue_workers - Worker Registration
  • distributed_locks - Distributed Locking
  • job_assignments - Job-Worker Assignments
  • worker_health_checks - Worker Health Monitoring
  • failover_events - Failover Event Tracking

2. Environment Configuration

Production environment file (.env.production):

# Database Configuration
DB_HOST=production-db-cluster
DB_PORT=3306
DB_NAME=framework_production
DB_USER=queue_user
DB_PASS=secure_production_password

# Queue Configuration
QUEUE_DRIVER=database
QUEUE_DEFAULT=default

# Worker Configuration
WORKER_HEALTH_CHECK_INTERVAL=30
WORKER_REGISTRATION_TTL=300
FAILOVER_CHECK_INTERVAL=60

# Performance Tuning
DB_POOL_SIZE=20
DB_MAX_IDLE_TIME=3600
CACHE_TTL=3600

# Monitoring
LOG_LEVEL=info
PERFORMANCE_MONITORING=true
HEALTH_CHECK_ENDPOINT=/health
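
A missing variable in this file typically surfaces only as a worker crash at startup. A pre-flight check can fail fast instead; `require_vars` is a hypothetical helper (not part of the framework), and the variable names match the .env.production example above:

```shell
# Pre-flight sketch: refuse to deploy if required variables are unset.
require_vars() {
  for var in "$@"; do
    eval "val=\${$var:-}"
    if [ -z "$val" ]; then
      echo "ERROR: $var is not set" >&2
      return 1
    fi
  done
}

# Example: check database and queue settings before launching workers
if require_vars DB_HOST DB_NAME DB_USER DB_PASS QUEUE_DRIVER; then
  echo "environment OK"
else
  echo "fix the environment before deploying" >&2
fi
```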

3. Worker Node Deployment

Docker Compose for a worker node:

version: '3.8'
services:
  queue-worker:
    image: custom-php-framework:production
    environment:
      - NODE_ROLE=worker
      - WORKER_QUEUES=default,emails,reports
      - WORKER_CONCURRENCY=4
      - DB_HOST=${DB_HOST}
      - DB_NAME=${DB_NAME}
      - DB_USER=${DB_USER}
      - DB_PASS=${DB_PASS}
    command: php console.php worker:start
    restart: unless-stopped
    deploy:
      replicas: 3
      resources:
        limits:
          memory: 2G
          cpus: '2'
        reservations:
          memory: 1G
          cpus: '1'
    healthcheck:
      test: ["CMD", "php", "console.php", "worker:health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 10s

4. Load Balancer Configuration

Nginx Configuration:

upstream queue_workers {
    least_conn;
    server worker-node-1:80 max_fails=3 fail_timeout=30s;
    server worker-node-2:80 max_fails=3 fail_timeout=30s;
    server worker-node-3:80 max_fails=3 fail_timeout=30s;
}

server {
    listen 80;
    server_name queue.production.example.com;

    location /health {
        proxy_pass http://queue_workers/health;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        access_log off;
    }

    location /admin/queue {
        proxy_pass http://queue_workers/admin/queue;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;

        # Admin access from internal IPs only
        allow 10.0.0.0/8;
        allow 172.16.0.0/12;
        allow 192.168.0.0/16;
        deny all;
    }
}

5. Monitoring Setup

Prometheus Metrics Configuration:

# prometheus.yml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'queue-workers'
    static_configs:
      - targets: ['worker-node-1:9090', 'worker-node-2:9090', 'worker-node-3:9090']
    metrics_path: '/metrics'
    scrape_interval: 30s

  - job_name: 'queue-system'
    static_configs:
      - targets: ['queue.production.example.com:80']
    metrics_path: '/admin/metrics'
    scrape_interval: 60s

Grafana Dashboard Queries:

# Worker Health Status
sum(rate(worker_health_checks_total[5m])) by (status)

# Job Processing Rate
rate(jobs_processed_total[5m])

# Queue Length
queue_length{queue_name=~".*"}

# Worker CPU Usage
worker_cpu_usage_percent

# Database Connection Pool
db_connection_pool_active / db_connection_pool_max * 100

Operational Commands

Worker Management

# Start a worker
php console.php worker:start --queues=default,emails --concurrency=4

# Check worker status
php console.php worker:list

# Stop a worker (graceful shutdown)
php console.php worker:stop --worker-id=worker_123

# Stop all workers
php console.php worker:stop-all

# Worker health check
php console.php worker:health

# Run failover recovery
php console.php worker:failover-recovery

# Deregister a worker
php console.php worker:deregister --worker-id=worker_123

# Worker statistics
php console.php worker:stats

System Monitoring

# System health check
curl -f http://queue.production.example.com/health

# Worker status API
curl http://queue.production.example.com/admin/queue/workers

# Queue statistics
curl http://queue.production.example.com/admin/queue/stats

# Metrics endpoint
curl http://queue.production.example.com/admin/metrics
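
For unattended checks (e.g. from cron), a transient network blip should not page anyone immediately. A small retry wrapper is a common pattern; the retry count and delay below are illustrative assumptions, and the endpoint is the one shown above:

```shell
# Retry sketch: run a command up to N times before giving up.
retry() {
  attempts=$1; shift
  i=1
  while ! "$@"; do
    if [ "$i" -ge "$attempts" ]; then
      return 1
    fi
    i=$((i + 1))
    sleep "${RETRY_DELAY:-5}"
  done
}

# Usage:
#   retry 3 curl -fsS http://queue.production.example.com/health
```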

Performance Tuning

Database Optimization

MySQL Configuration (my.cnf):

[mysqld]
# InnoDB settings for the queue system
innodb_buffer_pool_size = 2G
innodb_log_file_size = 512M
innodb_flush_log_at_trx_commit = 2
innodb_lock_wait_timeout = 5

# Connection Settings
max_connections = 500
max_connect_errors = 100000
connect_timeout = 10
wait_timeout = 28800

# Note: the query cache was removed in MySQL 8.0; for read-heavy
# workloads, cache at the application layer (e.g. Redis) instead.

Recommended indexes:

-- Queue Workers Performance
CREATE INDEX idx_worker_status_updated ON queue_workers(status, updated_at);
CREATE INDEX idx_worker_queues ON queue_workers(queues(255));

-- Distributed Locks Performance
CREATE INDEX idx_lock_expires_worker ON distributed_locks(expires_at, worker_id);

-- Job Assignments Performance
CREATE INDEX idx_assignment_worker_time ON job_assignments(worker_id, assigned_at);
CREATE INDEX idx_assignment_queue_time ON job_assignments(queue_name, assigned_at);

-- Health Checks Performance
CREATE INDEX idx_health_worker_time ON worker_health_checks(worker_id, checked_at);
CREATE INDEX idx_health_status_time ON worker_health_checks(status, checked_at);

-- Failover Events Performance
CREATE INDEX idx_failover_worker_time ON failover_events(failed_worker_id, failover_at);
CREATE INDEX idx_failover_event_type ON failover_events(event_type, failover_at);

Application Performance

PHP Configuration (php.ini):

; Memory Limits
memory_limit = 2G
max_execution_time = 300

; OPcache for production
opcache.enable = 1
opcache.memory_consumption = 256
opcache.max_accelerated_files = 20000
opcache.validate_timestamps = 0

; Sessions (not needed for workers)
session.auto_start = 0

Worker Concurrency Tuning:

# Light jobs (emails, notifications)
php console.php worker:start --concurrency=8

# Heavy jobs (reports, exports)
php console.php worker:start --concurrency=2

# Mixed workload
php console.php worker:start --concurrency=4
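
The three presets above follow a simple heuristic (an assumption for illustration, not a framework rule): roughly 2x the core count for light I/O-bound jobs, and about half the cores (at least 1) for heavy CPU-bound jobs. As a shell sketch:

```shell
# Heuristic sketch: suggest a --concurrency value from the core count.
suggest_concurrency() {
  cores=$1
  case $2 in
    light) echo $((cores * 2)) ;;
    heavy) c=$((cores / 2)); [ "$c" -ge 1 ] || c=1; echo "$c" ;;
    *)     echo "$cores" ;;
  esac
}

# On the 2-core minimum nodes above, "light" yields 4, "heavy" yields 1
suggest_concurrency "$(nproc)" light
```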

Security Configuration

Network Security

Firewall Rules (iptables):

# Worker nodes talking to each other (health checks)
iptables -A INPUT -p tcp --dport 8080 -s 10.0.1.0/24 -j ACCEPT

# Database access from worker nodes only
iptables -A INPUT -p tcp --dport 3306 -s 10.0.1.0/24 -j ACCEPT

# Redis access (if used)
iptables -A INPUT -p tcp --dport 6379 -s 10.0.1.0/24 -j ACCEPT

# Admin interface from the management network only
iptables -A INPUT -p tcp --dport 80 -s 10.0.0.0/24 -j ACCEPT

Database Security

-- Dedicated queue user with minimal privileges
CREATE USER 'queue_user'@'10.0.1.%' IDENTIFIED BY 'secure_production_password';

GRANT SELECT, INSERT, UPDATE, DELETE ON framework_production.queue_workers TO 'queue_user'@'10.0.1.%';
GRANT SELECT, INSERT, UPDATE, DELETE ON framework_production.distributed_locks TO 'queue_user'@'10.0.1.%';
GRANT SELECT, INSERT, UPDATE, DELETE ON framework_production.job_assignments TO 'queue_user'@'10.0.1.%';
GRANT SELECT, INSERT, UPDATE, DELETE ON framework_production.worker_health_checks TO 'queue_user'@'10.0.1.%';
GRANT SELECT, INSERT, UPDATE, DELETE ON framework_production.failover_events TO 'queue_user'@'10.0.1.%';

FLUSH PRIVILEGES;

Disaster Recovery

Backup Strategy

Database Backup:

#!/bin/bash
# daily-queue-backup.sh

DATE=$(date +%Y%m%d_%H%M%S)
BACKUP_DIR="/backup/queue-system"

# Back up only the queue-related tables
# (mysqldump expects the database name first, then the table list)
mysqldump --single-transaction \
  --routines --triggers \
  framework_production \
  queue_workers distributed_locks job_assignments worker_health_checks failover_events \
  > "$BACKUP_DIR/queue_backup_$DATE.sql"

# Retention: 30 days
find $BACKUP_DIR -name "queue_backup_*.sql" -mtime +30 -delete
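
A dump that silently failed is worse than no dump, since the retention step above will eventually delete the last good copy. A minimal verification step (hypothetical helper, reusing the naming scheme from the script above) could run right after the dump:

```shell
# Sanity-check sketch: fail if the newest dump is missing or empty.
verify_backup() {
  latest=$(ls -t "$1"/queue_backup_*.sql 2>/dev/null | head -n 1)
  [ -n "$latest" ] && [ -s "$latest" ]
}

# Usage:
#   verify_backup "$BACKUP_DIR" || echo "queue backup failed" >&2
```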

Worker State Backup:

# Back up the worker configuration
php console.php worker:export-config > /backup/worker-config-$(date +%Y%m%d).json

Recovery Procedures

Database Recovery:

# Restore the tables
mysql framework_production < /backup/queue_backup_YYYYMMDD_HHMMSS.sql

# Verify migrations
php console.php db:status

# Restart the workers
docker-compose restart queue-worker
docker-compose restart queue-worker

Worker Recovery:

# Clean up crashed workers
php console.php worker:cleanup-crashed

# Run failover recovery for lost jobs
php console.php worker:failover-recovery

# Restart the worker pool
docker-compose up -d --scale queue-worker=3

Monitoring & Alerting

Critical Alerts

Prometheus Alert Rules (alerts.yml):

groups:
  - name: queue-system
    rules:
      - alert: WorkerDown
        expr: up{job="queue-workers"} == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Queue worker {{ $labels.instance }} is down"

      - alert: HighJobBacklog
        expr: queue_length > 1000
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High job backlog in queue {{ $labels.queue_name }}"

      - alert: WorkerHealthFailing
        expr: rate(worker_health_checks_failed_total[5m]) > 0.1
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Worker health checks failing"

      - alert: DatabaseConnectionExhaustion
        expr: db_connection_pool_active / db_connection_pool_max > 0.9
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Database connection pool near exhaustion"

Log Monitoring

Logstash Configuration:

input {
  file {
    path => "/var/log/queue-workers/*.log"
    type => "queue-worker"
    codec => json
  }
}

filter {
  if [type] == "queue-worker" {
    if [level] == "ERROR" or [level] == "CRITICAL" {
      mutate {
        add_tag => ["alert"]
      }
    }
  }
}

output {
  if "alert" in [tags] {
    email {
      to => "ops-team@example.com"
      subject => "Queue System Alert: %{[level]} in %{[component]}"
      body => "Error: %{[message]}\nContext: %{[context]}"
    }
  }

  elasticsearch {
    hosts => ["elasticsearch:9200"]
    index => "queue-logs-%{+YYYY.MM.dd}"
  }
}

Maintenance Procedures

Routine Maintenance

Weekly Maintenance Script:

#!/bin/bash
# weekly-queue-maintenance.sh

echo "Starting weekly queue system maintenance..."

# Clean up old health check entries (> 30 days)
php console.php queue:cleanup-health-checks --days=30

# Clean up old failover events (> 90 days)
php console.php queue:cleanup-failover-events --days=90

# Clean up orphaned job assignments
php console.php queue:cleanup-orphaned-assignments

# Update database statistics
mysql framework_production -e "ANALYZE TABLE queue_workers, distributed_locks, job_assignments, worker_health_checks, failover_events;"

# Worker-Pool Health Check
php console.php worker:validate-pool

echo "Weekly maintenance completed."

Rolling Updates:

#!/bin/bash
# rolling-update-workers.sh

WORKER_NODES=("worker-node-1" "worker-node-2" "worker-node-3")

for node in "${WORKER_NODES[@]}"; do
    echo "Updating $node..."

    # Graceful Shutdown
    ssh $node "docker exec queue-worker php console.php worker:stop --graceful"

    # Wait for shutdown
    sleep 30

    # Update Container
    ssh $node "docker-compose pull && docker-compose up -d"

    # Wait for startup
    sleep 60

    # Health Check
    ssh $node "docker exec queue-worker php console.php worker:health" || exit 1

    echo "$node updated successfully"
done

echo "Rolling update completed."

Troubleshooting

Common Issues

1. Worker does not start:

# Check Database Connection
php console.php db:status

# Check Worker Configuration
php console.php worker:validate-config

# Check System Resources
free -h && df -h

# Check Docker Logs
docker-compose logs queue-worker

2. Jobs are stuck in the queue:

# Check Worker Status
php console.php worker:list

# Check Distributed Locks
php console.php queue:list-locks

# Force Failover Recovery
php console.php worker:failover-recovery --force

# Clear Stale Locks
php console.php queue:clear-stale-locks

3. Performance Issues:

# Check Database Performance
php console.php db:performance-stats

# Check Worker Resource Usage
php console.php worker:resource-stats

# Analyze Slow Queries
mysql framework_production -e "SELECT * FROM performance_schema.events_statements_summary_by_digest ORDER BY avg_timer_wait DESC LIMIT 10;"

Emergency Procedures

Worker Pool Restart:

# Gracefully restart all workers
docker-compose exec queue-worker php console.php worker:stop-all --graceful

# Wait for shutdown
sleep 60

# Restart Container
docker-compose restart queue-worker

# Verify restart
php console.php worker:list

Database Failover:

# Switch to backup database
sed -i 's/DB_HOST=primary-db/DB_HOST=backup-db/' .env.production

# Restart worker pool
docker-compose restart queue-worker

# Verify connection
php console.php db:status

Performance Benchmarks

Based on the performance tests, the following benchmarks can be expected:

Single Worker

  • Job Distribution: < 10ms per job
  • Worker Selection: < 5ms per selection
  • Lock Acquisition: < 2ms per lock
  • Health Check: < 1ms per check

Multi-Worker Setup (3 Workers)

  • Throughput: 500+ jobs/second
  • Load Balancing: even distribution ±5%
  • Failover Time: < 30 seconds
  • Resource Usage: < 80% CPU at full load

Database Performance

  • Connection Pool: 95%+ efficiency
  • Query Response: < 10ms for standard operations
  • Lock Contention: < 1% under normal load

This documentation should serve as the basis for the production deployment and be updated regularly based on operational experience.