michaelschiemer/docs/deployment/docker-compose-production.md
Michael Schiemer fc3d7e6357 feat(Production): Complete production deployment infrastructure
- Add comprehensive health check system with multiple endpoints
- Add Prometheus metrics endpoint
- Add production logging configurations (5 strategies)
- Add complete deployment documentation suite:
  * QUICKSTART.md - 30-minute deployment guide
  * DEPLOYMENT_CHECKLIST.md - Printable verification checklist
  * DEPLOYMENT_WORKFLOW.md - Complete deployment lifecycle
  * PRODUCTION_DEPLOYMENT.md - Comprehensive technical reference
  * production-logging.md - Logging configuration guide
  * ANSIBLE_DEPLOYMENT.md - Infrastructure as Code automation
  * README.md - Navigation hub
  * DEPLOYMENT_SUMMARY.md - Executive summary
- Add deployment scripts and automation
- Add DEPLOYMENT_PLAN.md - Concrete plan for immediate deployment
- Update README with production-ready features

All production infrastructure is now complete and ready for deployment.
2025-10-25 19:18:37 +02:00


Production Docker Compose Configuration

Production Docker Compose configuration with security hardening, performance optimization, and monitoring for the Custom PHP Framework.

Overview

The project uses the Docker Compose overlay pattern: files passed later with -f extend and override the earlier ones.

  • Base: docker-compose.yml - development environment
  • Production: docker-compose.production.yml - production-specific overrides

Usage

# Start the production stack
docker-compose -f docker-compose.yml \
               -f docker-compose.production.yml \
               --env-file .env.production \
               up -d

# Rebuild images first (after code or Dockerfile changes)
docker-compose -f docker-compose.yml \
               -f docker-compose.production.yml \
               --env-file .env.production \
               up -d --build

# Stop the stack
docker-compose -f docker-compose.yml \
               -f docker-compose.production.yml \
               down

# Show logs
docker-compose -f docker-compose.yml \
               -f docker-compose.production.yml \
               logs -f [service]

# Service status, including health state
docker-compose -f docker-compose.yml \
               -f docker-compose.production.yml \
               ps
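Every command above repeats the same two -f flags. A small shell function can wrap them; this is a sketch, and the name `dc` is hypothetical (not part of the repository). It also passes `--env-file` to every subcommand (including `down`, `logs`, and `ps`), so variable interpolation is identical across commands.

```bash
# Hypothetical helper (not part of the repository): wraps the overlay files
# and the production env file so every Compose call sees the same configuration.
dc() {
  docker-compose -f docker-compose.yml \
                 -f docker-compose.production.yml \
                 --env-file .env.production \
                 "$@"
}

# Usage:
#   dc up -d --build
#   dc logs -f php
#   dc ps
```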

Production Overrides

1. Web (Nginx) Service

Restart Policy:

restart: always  # Restart whenever the container exits (also after a Docker daemon restart)

SSL/TLS Configuration:

volumes:
  - certbot-conf:/etc/letsencrypt:ro
  - certbot-www:/var/www/certbot:ro
  • Let's Encrypt certificates via Certbot
  • Read-only mounts for security

Health Checks:

healthcheck:
  test: ["CMD", "curl", "-f", "https://localhost/health"]
  interval: 15s
  timeout: 5s
  retries: 5
  start_period: 30s
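  • HTTPS health check against the /health endpoint
  • 15-second interval for fast failure detection
  • After 5 consecutive failures the container is marked unhealthy. Docker Compose does not restart unhealthy containers by itself; restart: always only applies when the container exits.
  • The Let's Encrypt certificate is issued for the public domain, not for localhost, so curl may reject it during TLS verification. If the check fails for that reason, curl -fk or --resolve <domain>:443:127.0.0.1 avoids the mismatch. curl must also be present in the Nginx image.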

Resource Limits:

deploy:
  resources:
    limits:
      memory: 512M
      cpus: '1.0'
    reservations:
      memory: 256M
      cpus: '0.5'
  • Nginx is lightweight, so the limits are moderate

Logging:

logging:
  driver: json-file
  options:
    max-size: "10m"
    max-file: "5"
    compress: "true"
    labels: "service,environment"
  • JSON format for log aggregation (compatible with the ELK stack)
  • 10 MB per file × 5 files = 50 MB maximum
  • Rotated files are compressed

2. PHP Service

Restart Policy:

restart: always

Build Configuration:

build:
  args:
    - ENV=production
    - COMPOSER_INSTALL_FLAGS=--no-dev --optimize-autoloader --classmap-authoritative
  • --no-dev: no development dependencies
  • --optimize-autoloader: converts PSR-0/PSR-4 rules into a classmap
  • --classmap-authoritative: loads classes only from the classmap, with no filesystem lookups (implies --optimize-autoloader)

Environment:

environment:
  - APP_ENV=production
  - APP_DEBUG=false  # Debug OFF in production!
  - PHP_MEMORY_LIMIT=512M
  - PHP_MAX_EXECUTION_TIME=30
  - XDEBUG_MODE=off  # Xdebug off for performance

Health Checks:

healthcheck:
  test: ["CMD", "php-fpm-healthcheck"]
  interval: 15s
  timeout: 5s
  retries: 5
  start_period: 30s
  • PHP-FPM health check via a custom script (php-fpm-healthcheck), which must be installed in the image
  • Fast failure detection

Resource Limits:

deploy:
  resources:
    limits:
      memory: 1G
      cpus: '2.0'
    reservations:
      memory: 512M
      cpus: '1.0'
  • PHP needs more memory than Nginx
  • 2 CPUs for parallel request processing
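PHP_MEMORY_LIMIT=512M caps each PHP-FPM process individually, not the pool as a whole. The number of FPM children therefore has to fit inside the 1G container limit. A rough estimate is (container limit − reserve) ÷ average memory per child. The sketch below performs that integer arithmetic. The 128 MB reserve and the 64 MB per child are assumed example values; measure the real per-child usage before setting pm.max_children.

```bash
# Rough pm.max_children estimate, all values in MB.
# Assumption: average_child_mb is a measured average per FPM child,
# not PHP_MEMORY_LIMIT (which is only a per-process upper bound).
fpm_max_children() {
  limit_mb=$1          # container memory limit, e.g. 1024 for "1G"
  reserve_mb=$2        # headroom for the FPM master and shared memory (opcache)
  average_child_mb=$3
  echo $(( (limit_mb - reserve_mb) / average_child_mb ))
}

fpm_max_children 1024 128 64   # prints 14
```

If every child actually reached the 512M memory_limit, only two would fit: `fpm_max_children 1024 0 512` prints 2.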

Volumes:

volumes:
  - storage-logs:/var/www/html/storage/logs:rw
  - storage-cache:/var/www/html/storage/cache:rw
  - storage-queue:/var/www/html/storage/queue:rw
  - storage-discovery:/var/www/html/storage/discovery:rw
  - storage-uploads:/var/www/html/storage/uploads:rw
  • Only the required named Docker volumes
  • NO host mounts, for security
  • Application code is baked into the image (not mounted)

3. Database (PostgreSQL 16) Service

Restart Policy:

restart: always

Production Configuration:

volumes:
  - db_data:/var/lib/postgresql/data
  - ./docker/postgres/postgresql.production.conf:/etc/postgresql/postgresql.conf:ro
  - ./docker/postgres/init:/docker-entrypoint-initdb.d:ro
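  • Production-tuned postgresql.production.conf. The official postgres image only uses a file mounted at this path if the container is started with -c config_file=/etc/postgresql/postgresql.conf.
  • Init scripts for schema setup. They run only on first start, while the data directory is still empty.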

Resource Limits:

deploy:
  resources:
    limits:
      memory: 2G
      cpus: '2.0'
    reservations:
      memory: 1G
      cpus: '1.0'
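  • PostgreSQL needs memory for shared_buffers (2GB in the config). This equals the 2G container limit and leaves no room for per-connection memory (work_mem, maintenance_work_mem). Either lower shared_buffers (a common starting point is about 25% of available memory) or raise the limit. Otherwise the OOM killer may terminate PostgreSQL under load.
  • 2 CPUs for parallel query processing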
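To check whether the mounted production config is actually in effect, and what shared_buffers really is, you can ask PostgreSQL directly. This is a sketch. The container name `db` and the user `postgres` are taken from the commands elsewhere in this document.

```bash
# Print the config file PostgreSQL loaded and the effective shared_buffers value.
# psql flags: -t tuples only, -A unaligned output, -c run a single command.
pg_effective_settings() {
  for setting in config_file shared_buffers; do
    printf '%s = %s\n' "$setting" \
      "$(docker exec db psql -U postgres -tAc "SHOW $setting")"
  done
}
```

If config_file reports /var/lib/postgresql/data/postgresql.conf instead of /etc/postgresql/postgresql.conf, the production file is mounted but not used.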

Health Checks:

healthcheck:
  test: ["CMD-SHELL", "pg_isready -U ${DB_USERNAME:-postgres} -d ${DB_DATABASE:-michaelschiemer}"]
  interval: 10s
  timeout: 3s
  retries: 5
  start_period: 30s
  • pg_isready for a fast connection check
  • 10-second interval (more frequent than Nginx and PHP)

Logging:

logging:
  driver: json-file
  options:
    max-size: "20m"  # Larger log files for PostgreSQL
    max-file: "10"
    compress: "true"
  • PostgreSQL logs more (slow queries, checkpoints, etc.)
  • 20 MB per file × 10 files = 200 MB maximum

4. Redis Service

Restart Policy:

restart: always

Resource Limits:

deploy:
  resources:
    limits:
      memory: 512M
      cpus: '1.0'
    reservations:
      memory: 256M
      cpus: '0.5'
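  • Redis is memory-based; the limits are moderate
  • Set Redis maxmemory below the 512M limit so that Redis applies its eviction policy (or rejects writes) instead of being killed by the OOM killer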

Health Checks:

healthcheck:
  test: ["CMD", "redis-cli", "--raw", "incr", "ping"]
  interval: 10s
  timeout: 3s
  retries: 5
  start_period: 10s
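  • redis-cli --raw incr ping increments a key named ping (it does not send the PING command), so the check also fails when Redis rejects writes. Plain redis-cli ping, as used under Troubleshooting, is a read-only alternative.
  • Fast start (10s start_period)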

5. Queue Worker Service

Restart Policy:

restart: always

Environment:

environment:
  - APP_ENV=production
  - WORKER_DEBUG=false
  - WORKER_SLEEP_TIME=100000
  - WORKER_MAX_JOBS=10000
  • Production mode without debug output
  • Each worker process handles up to 10,000 jobs, then exits and is started fresh by restart: always

Resource Limits:

deploy:
  resources:
    limits:
      memory: 2G
      cpus: '2.0'
    reservations:
      memory: 1G
      cpus: '1.0'
  replicas: 2  # 2 Worker-Instanzen
  • Workers need memory for job processing
  • 2 replicas for parallelism; each replica gets the full limits (4G and 4 CPUs in total)

Graceful Shutdown:

stop_grace_period: 60s
  • 60 seconds for running jobs to finish before Docker sends SIGKILL
  • Prevents aborted jobs, provided the worker stops taking new jobs when it receives SIGTERM
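A common pattern is to set a flag in the SIGTERM handler and check it between jobs, so the current job completes inside the grace period. Below is a shell sketch of that pattern. process_next_job is a hypothetical placeholder for "handle exactly one job". A PHP worker would implement the same idea with pcntl_signal.

```bash
# Graceful-shutdown pattern: on SIGTERM, finish the current job, then exit.
# process_next_job is hypothetical; it stands for handling exactly one job.
worker_loop() {
  max_jobs=${1:-10000}   # mirrors WORKER_MAX_JOBS
  processed=0
  stop_requested=0
  trap 'stop_requested=1' TERM   # only set a flag; never abort mid-job
  while [ "$stop_requested" -eq 0 ] && [ "$processed" -lt "$max_jobs" ]; do
    process_next_job
    processed=$((processed + 1))
  done
  trap - TERM
  echo "$processed"
}
```

The shell defers the trap until the current foreground command returns, so a job that is already running is never interrupted by the flag.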

Logging:

logging:
  driver: json-file
  options:
    max-size: "20m"
    max-file: "10"
    compress: "true"
  • Workers log extensively (job start, completion, errors)
  • 200 MB log storage maximum per worker container

6. Certbot Service

Restart Policy:

restart: always

Auto-Renewal:

entrypoint: "/bin/sh -c 'trap exit TERM; while :; do certbot renew --webroot -w /var/www/certbot --quiet; sleep 12h & wait $${!}; done;'"
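  • Renewal check every 12 hours; certbot renew only replaces certificates close to expiry (within 30 days by default)
  • Webroot challenge served by Nginx from /var/www/certbot
  • $$ escapes $ for Compose, so the shell receives ${!}, the PID of the background sleep. Running sleep in the background and waiting on it lets the TERM trap fire immediately on docker stop.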

Volumes:

volumes:
  - certbot-conf:/etc/letsencrypt
  - certbot-www:/var/www/certbot
  - certbot-logs:/var/log/letsencrypt
  • Certificates are shared with Nginx (mounted read-only there)
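Certbot writes renewed files into certbot-conf, but Nginx keeps serving the certificate it loaded at startup until it reloads its configuration. One option is a host cron job that reloads Nginx regularly. This is a sketch that assumes the Compose service name `web` used in the rolling-update commands. nginx -s reload re-reads the configuration and certificates without dropping open connections.

```bash
# Reload Nginx inside the running web container so renewed certificates are served.
# exec -T disables pseudo-TTY allocation, which is not available under cron.
reload_nginx() {
  docker-compose -f docker-compose.yml \
                 -f docker-compose.production.yml \
                 --env-file .env.production \
                 exec -T web nginx -s reload
}
```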

Network Configuration

Security Isolation:

networks:
  frontend:
    driver: bridge
  backend:
    driver: bridge
    internal: true  # Backend network is internal (no internet access)
  cache:
    driver: bridge
    internal: true  # Cache network is internal

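Network segmentation:

  • Frontend: Nginx, Certbot (internet access)
  • Backend: PHP, PostgreSQL, Queue Worker (NO internet access); Nginx must also join this network to reach PHP-FPM
  • Cache: Redis (NO internet access); PHP and Queue Worker must also join this network to reach Redis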

Security Benefits:

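  • Backend services cannot open connections to the internet
  • Makes data exfiltration harder after a compromise
  • Least-privilege network segmentation (one building block of a zero-trust architecture)
  • Consequence: PHP and the workers cannot reach external services (e.g. SMTP or third-party APIs) directly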

Volumes Configuration

SSL/TLS Volumes:

certbot-conf:
  driver: local
certbot-www:
  driver: local
certbot-logs:
  driver: local

Application Storage Volumes:

storage-logs:
  driver: local
storage-cache:
  driver: local
storage-queue:
  driver: local
storage-discovery:
  driver: local
storage-uploads:
  driver: local

Database Volume:

db_data:
  driver: local
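  # Optional: store the volume data at a fixed host path (e.g. a dedicated disk).
  # This relocates the live data directory; it is not a backup.
  # driver_opts:
  #   type: none
  #   o: bind
  #   device: /mnt/db-backups/michaelschiemer-prod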

Volume Best Practices:

  • All volumes use driver: local (named volumes, not host mounts)
  • Optionally, the database volume can be bound to a dedicated host path; this does not replace backups (see Backup Strategy)
  • No development host mounts in production

Logging Strategy

JSON Logging für alle Services:

logging:
  driver: json-file
  options:
    max-size: "10m"  # varies per service
    max-file: "5"    # varies per service
    compress: "true"
    labels: "service,environment"

Log Rotation:

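| Service           | Max size | Max files | Total storage               |
|-------------------|----------|-----------|-----------------------------|
| Nginx             | 10 MB    | 5         | 50 MB                       |
| PHP               | 10 MB    | 10        | 100 MB                      |
| PostgreSQL        | 20 MB    | 10        | 200 MB                      |
| Redis             | 10 MB    | 5         | 50 MB                       |
| Queue Worker (x2) | 20 MB    | 10        | 400 MB (200 MB per replica) |
| Certbot           | 5 MB     | 3         | 15 MB                       |
| TOTAL             |          |           | 815 MB                      |

Rotation applies per container, so each queue-worker replica has its own log files. Because rotated files are compressed, the totals are upper bounds.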
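The worst-case budget is max-size × max-file per container. The awk sketch below recomputes it and counts each queue-worker replica separately, because json-file rotation works per container. The "service size_mb files" input format is an assumption chosen for illustration.

```bash
# Sum the worst-case log storage (size_mb * max_files) in MB.
# Input: one "container size_mb max_files" line per container.
log_budget_mb() {
  awk '{ total += $2 * $3 } END { print total }'
}

printf '%s\n' \
  "nginx 10 5" "php 10 10" "postgres 20 10" "redis 10 5" \
  "queue-worker-1 20 10" "queue-worker-2 20 10" "certbot 5 3" | log_budget_mb
# prints 815
```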

Log Aggregation:

  • JSON format for the ELK stack (Elasticsearch, Logstash, Kibana)
  • Labels identify the service
  • Compressed log files for storage efficiency

Resource Allocation

Total Resource Requirements:

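| Service           | Memory limit | Memory reservation | CPU limit | CPU reservation |
|-------------------|--------------|--------------------|-----------|-----------------|
| Nginx             | 512M         | 256M               | 1.0       | 0.5             |
| PHP               | 1G           | 512M               | 2.0       | 1.0             |
| PostgreSQL        | 2G           | 1G                 | 2.0       | 1.0             |
| Redis             | 512M         | 256M               | 1.0       | 0.5             |
| Queue Worker (x2) | 4G           | 2G                 | 4.0       | 2.0             |
| TOTAL             | 8GB          | 4GB                | 10 CPUs   | 5 CPUs          |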

Server Sizing Recommendations:

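  • Minimum: 8GB RAM, 4 CPUs (memory equals the sum of all limits and leaves no headroom; CPU is oversubscribed, so services compete for CPU under load)
  • Recommended: 16GB RAM, 8 CPUs (headroom for the OS and load spikes)
  • Optimal: 32GB RAM, 16 CPUs (production with a monitoring stack)

Note: deploy.resources and replicas are applied by Docker Compose v2. The legacy Python docker-compose v1 ignores the deploy section unless it runs with --compatibility.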

Health Checks

Health Check Strategy:

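| Service    | Check                          | Interval | Timeout | Retries | Start period |
|------------|--------------------------------|----------|---------|---------|--------------|
| Nginx      | curl -f https://localhost/health | 15s    | 5s      | 5       | 30s          |
| PHP        | php-fpm-healthcheck            | 15s      | 5s      | 5       | 30s          |
| PostgreSQL | pg_isready                     | 10s      | 3s      | 5       | 30s          |
| Redis      | redis-cli --raw incr ping      | 10s      | 3s      | 5       | 10s          |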

Health Check Benefits:

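  • Failures become visible immediately (status "unhealthy")
  • Docker Compose does not restart unhealthy containers on its own; restart policies only react when a container exits. Automatic recovery needs Docker Swarm or an external watchdog.
  • Health status is shown by docker-compose ps, and depends_on with condition: service_healthy can use it to order startup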
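Because plain Compose leaves unhealthy containers running, a small watchdog run from cron on the host can take over recovery. This is a minimal sketch. It uses the health=unhealthy filter of docker ps and restarts each match. Third-party tools such as willfarrell/autoheal follow the same idea.

```bash
# Restart every container whose health check currently reports "unhealthy".
restart_unhealthy() {
  for name in $(docker ps --filter health=unhealthy --format '{{.Names}}'); do
    echo "restarting $name"
    docker restart "$name" >/dev/null
  done
}
```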

Deployment Workflow

Initial Deployment

# 1. Prepare the server (see production-prerequisites.md)

# 2. Configure .env.production (see env-production-template.md)

# 3. Build and deploy
docker-compose -f docker-compose.yml \
               -f docker-compose.production.yml \
               --env-file .env.production \
               up -d --build

# 4. Initialize SSL certificates
docker exec php php console.php ssl:init

# 5. Run database migrations
docker exec php php console.php db:migrate

# 6. Verify health checks
docker-compose -f docker-compose.yml \
               -f docker-compose.production.yml \
               ps

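Rolling Update (Minimal Downtime)

Compose stops the old container before it starts the new one, so each single-instance service (web, php) is briefly unavailable while it is recreated. True zero downtime requires more than one instance behind a load balancer.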

# 1. Pull the new version
git pull origin main

# 2. Build new images
docker-compose -f docker-compose.yml \
               -f docker-compose.production.yml \
               --env-file .env.production \
               build --no-cache

# 3. Update one service at a time
# Nginx
docker-compose -f docker-compose.yml \
               -f docker-compose.production.yml \
               up -d --no-deps web

# PHP (after Nginx)
docker-compose -f docker-compose.yml \
               -f docker-compose.production.yml \
               up -d --no-deps php

# Queue Worker (after PHP)
docker-compose -f docker-compose.yml \
               -f docker-compose.production.yml \
               up -d --no-deps --scale queue-worker=2 queue-worker

# 4. Verify health checks
docker-compose -f docker-compose.yml \
               -f docker-compose.production.yml \
               ps
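ps shows only a snapshot. During a rolling update, it helps to wait until a container reports healthy before updating the next service. The following sketch polls docker inspect; the default timeout of 120 seconds and the 5-second poll interval are assumptions. With Compose v2, docker compose up --wait is a built-in alternative.

```bash
# Wait until a container's health status is "healthy", or give up after a timeout.
# Usage: wait_healthy <container> [timeout_seconds]
wait_healthy() {
  container=$1
  timeout=${2:-120}
  waited=0
  health=""
  while [ "$waited" -lt "$timeout" ]; do
    health=$(docker inspect --format '{{.State.Health.Status}}' "$container" 2>/dev/null)
    if [ "$health" = "healthy" ]; then
      echo "$container is healthy"
      return 0
    fi
    sleep 5
    waited=$((waited + 5))
  done
  echo "$container not healthy after ${timeout}s (last status: ${health:-unknown})" >&2
  return 1
}

# Example, after updating PHP:
#   wait_healthy php 180
```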

Rollback Strategy

# 1. Check out the previous commit
git log --oneline -5
git checkout <previous-commit>

# 2. Rebuild and deploy
docker-compose -f docker-compose.yml \
               -f docker-compose.production.yml \
               --env-file .env.production \
               up -d --build

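# 3. Database rollback (if needed)
#    If the rollback needs the migration files of the newer release, run it
#    before step 2, while the newer image is still deployed.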
docker exec php php console.php db:rollback 1

Monitoring

Container Status

# Status of all services
docker-compose -f docker-compose.yml \
               -f docker-compose.production.yml \
               ps

# Detailed information
docker inspect <container-name>

Resource Usage

# CPU/Memory Usage
docker stats

# Specific services
docker stats php db redis
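To compare usage against the limits quickly, run docker stats --no-stream with a Go template; it prints one line per container. A short awk filter can then flag containers above a memory threshold. This is a sketch, and the default threshold of 80% is an arbitrary example.

```bash
# List containers whose memory usage exceeds a percentage of their limit.
# --no-stream takes a single sample instead of updating continuously.
mem_over() {
  threshold=${1:-80}
  docker stats --no-stream --format '{{.Name}} {{.MemPerc}}' |
    awk -v t="$threshold" '{ pct = $2; sub(/%$/, "", pct); if (pct + 0 > t + 0) print $1, $2 }'
}
```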

Logs

# All logs (follow)
docker-compose -f docker-compose.yml \
               -f docker-compose.production.yml \
               logs -f

# A specific service
docker-compose -f docker-compose.yml \
               -f docker-compose.production.yml \
               logs -f php

# Last N lines (here: 100)
docker-compose -f docker-compose.yml \
               -f docker-compose.production.yml \
               logs --tail=100 php

Health Check Status

# Health Check Logs
docker inspect --format='{{json .State.Health}}' php | jq

# Health History
docker inspect --format='{{range .State.Health.Log}}{{.Start}} {{.ExitCode}} {{.Output}}{{println}}{{end}}' php

Backup Strategy

Database Backup

# Manual backup
docker exec db pg_dump -U postgres michaelschiemer_prod > backup_$(date +%Y%m%d_%H%M%S).sql

# Automated backup: save the following as /etc/cron.daily/postgres-backup (chmod +x)
#!/bin/bash
set -euo pipefail   # fail if pg_dump fails, not only gzip
docker exec db pg_dump -U postgres michaelschiemer_prod | gzip > /mnt/backups/michaelschiemer_$(date +%Y%m%d).sql.gz
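Daily dumps accumulate, so a cron script usually also prunes old files. This sketch uses find -mtime. The 14-day retention period is an assumed value, and the directory and file pattern match the script above.

```bash
# Delete compressed dumps older than N days (default 14) and print what was removed.
prune_backups() {
  dir=${1:-/mnt/backups}
  days=${2:-14}
  find "$dir" -maxdepth 1 -type f -name 'michaelschiemer_*.sql.gz' -mtime +"$days" -print -delete
}
```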

Volume Backup

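# Back up one volume (here db_data). Stop the db service first: copying a live
# PostgreSQL data directory can produce an inconsistent archive.
# The michaelschiemer_ prefix is the Compose project name.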
docker run --rm \
  -v michaelschiemer_db_data:/data:ro \
  -v $(pwd)/backups:/backup \
  alpine tar czf /backup/db_data_$(date +%Y%m%d).tar.gz -C /data .
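The same docker run/tar approach extends to the application volumes through a loop. This is a sketch. The project prefix and the volume list are passed as arguments; michaelschiemer_ in the example is an assumption, so check the actual names with docker volume ls. db_data is intentionally left to pg_dump.

```bash
# Archive each named volume into ./backups/<volume>_<date>.tar.gz
# using a throwaway Alpine container with the volume mounted read-only.
# Usage: backup_volumes <project_prefix> <volume>...
backup_volumes() {
  prefix=$1
  shift
  stamp=$(date +%Y%m%d)
  mkdir -p backups
  for volume in "$@"; do
    docker run --rm \
      -v "${prefix}${volume}:/data:ro" \
      -v "$(pwd)/backups:/backup" \
      alpine tar czf "/backup/${volume}_${stamp}.tar.gz" -C /data .
  done
}

# Example:
#   backup_volumes michaelschiemer_ storage-uploads storage-queue certbot-conf
```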

Troubleshooting

Service Won't Start

# Check logs
docker-compose -f docker-compose.yml \
               -f docker-compose.production.yml \
               logs <service>

# Check configuration
docker-compose -f docker-compose.yml \
               -f docker-compose.production.yml \
               config

Health Check Failing

# Manual health check
docker exec php php-fpm-healthcheck
docker exec db pg_isready -U postgres
docker exec redis redis-cli ping

# Check health logs
docker inspect --format='{{json .State.Health}}' <container> | jq

Memory Issues

# Check memory usage
docker stats

# Increase limits in docker-compose.production.yml
# Then restart service
docker-compose -f docker-compose.yml \
               -f docker-compose.production.yml \
               up -d --no-deps <service>

Network Issues

# Check networks (names carry the Compose project prefix; take the exact name from docker network ls)
docker network ls
docker network inspect michaelschiemer-prod_backend

# Test connectivity (ping and nc may not be installed in the image)
docker exec php ping db
docker exec php nc -zv db 5432
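If ping and nc are missing from the PHP image, PHP itself can perform the TCP check with fsockopen. This is a sketch. Hosts and ports follow the service names used above; example.com is an arbitrary external host.

```bash
# TCP connectivity check from inside the php container using PHP's fsockopen().
# Exit status 0 = port reachable, 1 = not reachable (3-second timeout).
check_tcp() {
  host=$1
  port=$2
  if docker exec php php -r "exit(@fsockopen('$host', $port, \$errno, \$errstr, 3) ? 0 : 1);"; then
    echo "$host:$port reachable"
  else
    echo "$host:$port NOT reachable"
    return 1
  fi
}

# Examples:
#   check_tcp db 5432          # should succeed
#   check_tcp redis 6379       # requires php to be attached to the cache network
#   check_tcp example.com 443  # expected to fail if php is only on internal networks
```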

Security Considerations

1. Network Isolation

  • Backend network is internal (no internet access)
  • Cache network is internal
  • Only frontend services expose ports

2. Volume Security

  • No host mounts for application code (code is baked into the image); the only bind mounts are the read-only PostgreSQL config files
  • Read-only mounts where possible (SSL certificates)
  • Named Docker volumes (managed by Docker)

3. Secrets Management

  • Use .env.production (not committed to git)
  • Store sensitive values in a secrets manager such as HashiCorp Vault
  • No secrets in docker-compose files

4. Resource Limits

  • Memory limits contain runaway processes (the container is OOM-killed instead of the host running out of memory)
  • CPU limits keep one service from starving the others
  • Restart policies for automatic recovery

5. Logging

  • JSON logging for security monitoring
  • Log rotation prevents disk exhaustion
  • Compressed logs for storage efficiency

Best Practices

  1. Always use .env.production - Never commit production secrets
  2. Test updates in staging first - Use same docker-compose setup
  3. Monitor resource usage - Adjust limits based on metrics
  4. Regular backups - Automate database and volume backups
  5. Health checks - Ensure all services have working health checks
  6. Log aggregation - Send logs to centralized logging system (ELK)
  7. SSL renewal - Monitor Certbot logs for renewal issues
  8. Security updates - Regularly update Docker images

See Also

  • Prerequisites: docs/deployment/production-prerequisites.md
  • Environment Configuration: docs/deployment/env-production-template.md
  • SSL Setup: docs/deployment/ssl-setup.md
  • Database Migrations: docs/deployment/database-migration-strategy.md
  • Logging Configuration: docs/deployment/logging-configuration.md