Michael Schiemer fc3d7e6357 feat(Production): Complete production deployment infrastructure

- Add comprehensive health check system with multiple endpoints
- Add Prometheus metrics endpoint
- Add production logging configurations (5 strategies)
- Add complete deployment documentation suite:
  * QUICKSTART.md - 30-minute deployment guide
  * DEPLOYMENT_CHECKLIST.md - Printable verification checklist
  * DEPLOYMENT_WORKFLOW.md - Complete deployment lifecycle
  * PRODUCTION_DEPLOYMENT.md - Comprehensive technical reference
  * production-logging.md - Logging configuration guide
  * ANSIBLE_DEPLOYMENT.md - Infrastructure as Code automation
  * README.md - Navigation hub
  * DEPLOYMENT_SUMMARY.md - Executive summary
- Add deployment scripts and automation
- Add DEPLOYMENT_PLAN.md - Concrete plan for immediate deployment
- Update README with production-ready features

All production infrastructure is now complete and ready for deployment.

2025-10-25 19:18:37 +02:00

Production Docker Compose Configuration

Production Docker Compose configuration with security hardening, performance tuning, and monitoring for the Custom PHP Framework.

Overview

The project uses the Docker Compose overlay pattern:

Base: docker-compose.yml - development environment
Production: docker-compose.production.yml - production-specific overrides

Usage

# Start the production stack
docker-compose -f docker-compose.yml \
               -f docker-compose.production.yml \
               --env-file .env.production \
               up -d

# Start with a rebuild (after changes)
docker-compose -f docker-compose.yml \
               -f docker-compose.production.yml \
               --env-file .env.production \
               up -d --build

# Stop the stack
docker-compose -f docker-compose.yml \
               -f docker-compose.production.yml \
               down

# Show logs
docker-compose -f docker-compose.yml \
               -f docker-compose.production.yml \
               logs -f [service]

# Service Health Check
docker-compose -f docker-compose.yml \
               -f docker-compose.production.yml \
               ps
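
Every command above repeats the same two -f flags. A small wrapper function can remove that repetition. The sketch below is a hypothetical helper, not part of the project: the names dc and COMPOSE_BIN are assumptions, and it assumes the shell is in the directory that holds both compose files and .env.production. It always passes --env-file, including for subcommands where the commands above omit it.

```shell
# Hypothetical helper: run docker-compose with the base file, the
# production override, and the production env file already applied.
# COMPOSE_BIN is an assumed override hook (e.g. for "docker compose" wrappers).
COMPOSE_BIN="${COMPOSE_BIN:-docker-compose}"

dc() {
  "$COMPOSE_BIN" -f docker-compose.yml \
                 -f docker-compose.production.yml \
                 --env-file .env.production \
                 "$@"
}
```

With the function loaded, dc up -d, dc logs -f php, and dc ps stand in for the longer forms.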

Production Overrides

1. Web (Nginx) Service

Restart Policy:

restart: always  # Restart the container automatically whenever it stops

SSL/TLS Configuration:

volumes:
  - certbot-conf:/etc/letsencrypt:ro
  - certbot-www:/var/www/certbot:ro

Let's Encrypt certificates via Certbot
Read-only mounts for security

Health Checks:

healthcheck:
  test: ["CMD", "curl", "-f", "https://localhost/health"]
  interval: 15s
  timeout: 5s
  retries: 5
  start_period: 30s

HTTPS health check against the /health endpoint
15-second interval for fast failure detection
5 consecutive failures before the container is marked unhealthy

Resource Limits:

deploy:
  resources:
    limits:
      memory: 512M
      cpus: '1.0'
    reservations:
      memory: 256M
      cpus: '0.5'

Nginx is lightweight, so moderate limits are sufficient

Logging:

logging:
  driver: json-file
  options:
    max-size: "10m"
    max-file: "5"
    compress: "true"
    labels: "service,environment"

JSON format for log aggregation (ELK Stack compatible)
10MB per file, 5 files = 50MB total
Rotated files are compressed

2. PHP Service

Restart Policy:

restart: always

Build Configuration:

build:
  args:
    - ENV=production
    - COMPOSER_INSTALL_FLAGS=--no-dev --optimize-autoloader --classmap-authoritative

--no-dev: no development dependencies
--optimize-autoloader: converts PSR-4/PSR-0 autoload rules into a classmap
--classmap-authoritative: autoloads from the classmap only, with no filesystem lookups (performance)

Environment:

environment:
  - APP_ENV=production
  - APP_DEBUG=false  # Debug OFF in production!
  - PHP_MEMORY_LIMIT=512M
  - PHP_MAX_EXECUTION_TIME=30
  - XDEBUG_MODE=off  # Xdebug off for performance

Health Checks:

healthcheck:
  test: ["CMD", "php-fpm-healthcheck"]
  interval: 15s
  timeout: 5s
  retries: 5
  start_period: 30s

PHP-FPM health check via a custom script
Fast failure detection

Resource Limits:

deploy:
  resources:
    limits:
      memory: 1G
      cpus: '2.0'
    reservations:
      memory: 512M
      cpus: '1.0'

PHP needs more memory than Nginx
2 CPUs for parallel request processing

Volumes:

volumes:
  - storage-logs:/var/www/html/storage/logs:rw
  - storage-cache:/var/www/html/storage/cache:rw
  - storage-queue:/var/www/html/storage/queue:rw
  - storage-discovery:/var/www/html/storage/discovery:rw
  - storage-uploads:/var/www/html/storage/uploads:rw

Only the necessary Docker volumes
NO host mounts, for security
Application code is baked into the image (not mounted)

3. Database (PostgreSQL 16) Service

Restart Policy:

restart: always

Production Configuration:

volumes:
  - db_data:/var/lib/postgresql/data
  - ./docker/postgres/postgresql.production.conf:/etc/postgresql/postgresql.conf:ro
  - ./docker/postgres/init:/docker-entrypoint-initdb.d:ro

Production-tuned postgresql.production.conf
Init scripts for schema setup

Resource Limits:

deploy:
  resources:
    limits:
      memory: 2G
      cpus: '2.0'
    reservations:
      memory: 1G
      cpus: '1.0'

PostgreSQL needs memory for shared_buffers (2GB in the config)
2 CPUs for parallel query processing

Health Checks:

healthcheck:
  test: ["CMD-SHELL", "pg_isready -U ${DB_USERNAME:-postgres} -d ${DB_DATABASE:-michaelschiemer}"]
  interval: 10s
  timeout: 3s
  retries: 5
  start_period: 30s

pg_isready for a fast connection check
10-second interval (more frequent than the 15 seconds used for Nginx and PHP)

Logging:

logging:
  driver: json-file
  options:
    max-size: "20m"  # Larger log files for PostgreSQL
    max-file: "10"
    compress: "true"

PostgreSQL logs more (slow queries, checkpoints, etc.)
20MB per file, 10 files = 200MB total

4. Redis Service

Restart Policy:

restart: always

Resource Limits:

deploy:
  resources:
    limits:
      memory: 512M
      cpus: '1.0'
    reservations:
      memory: 256M
      cpus: '0.5'

Redis is memory-based; moderate limits

Health Checks:

healthcheck:
  test: ["CMD", "redis-cli", "--raw", "incr", "ping"]
  interval: 10s
  timeout: 3s
  retries: 5
  start_period: 10s

redis-cli --raw incr ping increments a counter key named ping, which verifies that Redis accepts commands
Fast start (10s start_period)

5. Queue Worker Service

Restart Policy:

restart: always

Environment:

environment:
  - APP_ENV=production
  - WORKER_DEBUG=false
  - WORKER_SLEEP_TIME=100000
  - WORKER_MAX_JOBS=10000

Production mode without debug output
10,000 jobs per worker lifecycle

Resource Limits:

deploy:
  resources:
    limits:
      memory: 2G
      cpus: '2.0'
    reservations:
      memory: 1G
      cpus: '1.0'
  replicas: 2  # 2 worker instances

Workers need memory for job processing
2 replicas for parallelism

Graceful Shutdown:

stop_grace_period: 60s

60 seconds for running jobs to complete before shutdown
Prevents jobs from being aborted mid-run

Logging:

logging:
  driver: json-file
  options:
    max-size: "20m"
    max-file: "10"
    compress: "true"

Workers log verbosely (job start, completion, errors)
200MB total log storage

6. Certbot Service

Restart Policy:

restart: always

Auto-Renewal:

entrypoint: "/bin/sh -c 'trap exit TERM; while :; do certbot renew --webroot -w /var/www/certbot --quiet; sleep 12h & wait $${!}; done;'"

Renewal check every 12 hours (certbot renew only replaces certificates that are close to expiry)
Webroot challenge served through Nginx

Volumes:

volumes:
  - certbot-conf:/etc/letsencrypt
  - certbot-www:/var/www/certbot
  - certbot-logs:/var/log/letsencrypt

Certificates are shared with Nginx

Network Configuration

Security Isolation:

networks:
  frontend:
    driver: bridge
  backend:
    driver: bridge
    internal: true  # Backend network is internal (no internet access)
  cache:
    driver: bridge
    internal: true  # Cache network is internal

Network segmentation:

Frontend: Nginx, Certbot (internet access)
Backend: PHP, PostgreSQL, Queue Worker (NO internet access)
Cache: Redis (NO internet access)

Security Benefits:

Backend services cannot initiate connections to the outside world
Limits data exfiltration if a service is compromised
Zero-trust-style network architecture
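
Whether a network really is internal can be checked against the running Docker daemon. A minimal sketch, assuming the networks carry the project prefix used in the Troubleshooting section (michaelschiemer-prod_backend); the function name is_internal_network and the DOCKER_BIN variable are hypothetical.

```shell
# DOCKER_BIN is an assumed override hook; it defaults to the docker CLI.
DOCKER_BIN="${DOCKER_BIN:-docker}"

# is_internal_network NETWORK
# Succeeds only if Docker reports the network with Internal=true.
is_internal_network() {
  internal="$("$DOCKER_BIN" network inspect --format '{{.Internal}}' "$1" 2>/dev/null)"
  [ "$internal" = "true" ]
}
```

For example, is_internal_network michaelschiemer-prod_backend || echo "backend network is NOT internal" prints a warning when the isolation is missing.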

Volumes Configuration

SSL/TLS Volumes:

certbot-conf:
  driver: local
certbot-www:
  driver: local
certbot-logs:
  driver: local

Application Storage Volumes:

storage-logs:
  driver: local
storage-cache:
  driver: local
storage-queue:
  driver: local
storage-discovery:
  driver: local
storage-uploads:
  driver: local

Database Volume:

db_data:
  driver: local
  # Optional: External volume for backups
  # driver_opts:
  #   type: none
  #   o: bind
  #   device: /mnt/db-backups/michaelschiemer-prod

Volume Best Practices:

All named volumes use driver: local (Docker-managed, not host mounts)
For backups: optionally use an external volume for the database
No development host mounts in production

Logging Strategy

JSON logging for all services:

logging:
  driver: json-file
  options:
    max-size: "10m"  # varies by service
    max-file: "5"    # varies by service
    compress: "true"
    labels: "service,environment"

Log Rotation:

Service	Max Size	Max Files	Total Storage
Nginx	10MB	5	50MB
PHP	10MB	10	100MB
PostgreSQL	20MB	10	200MB
Redis	10MB	5	50MB
Queue Worker	20MB	10	200MB
Certbot	5MB	3	15MB
TOTAL			615MB
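
The TOTAL row is simply max-size times max-file, summed over the services. A small sketch that re-derives it from the table rows; the function name log_total_mb and the whitespace-separated input layout are illustrative assumptions. The figures are per container, so a service running two replicas can use up to twice its row.

```shell
# log_total_mb: reads "service max_size_mb max_files" rows on stdin
# and prints the worst-case log storage in MB.
log_total_mb() {
  awk 'NF == 3 { total += $2 * $3 } END { print total + 0 }'
}

log_total_mb <<'EOF'
nginx         10  5
php           10 10
postgresql    20 10
redis         10  5
queue-worker  20 10
certbot        5  3
EOF
# prints: 615
```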

Log Aggregation:

JSON format for the ELK Stack (Elasticsearch, Logstash, Kibana)
Labels for service identification
Compressed log files for storage efficiency

Resource Allocation

Total Resource Requirements:

Service	Memory Limit	Memory Reservation	CPU Limit	CPU Reservation
Nginx	512M	256M	1.0	0.5
PHP	1G	512M	2.0	1.0
PostgreSQL	2G	1G	2.0	1.0
Redis	512M	256M	1.0	0.5
Queue Worker (x2)	4G	2G	4.0	2.0
TOTAL	8GB	4GB	10 CPUs	5 CPUs
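
The limit totals can be re-derived from the per-service values, with the queue worker counted once per replica. A minimal sketch, assuming memory is written in MiB (1G = 1024); the function name sum_limits and the input layout are illustrative only.

```shell
# sum_limits: reads "service memory_mib cpus replicas" rows on stdin
# and prints "<total_memory_mib> <total_cpus>".
sum_limits() {
  awk 'NF == 4 { mem += $2 * $4; cpu += $3 * $4 }
       END     { printf "%d %.1f\n", mem, cpu }'
}

sum_limits <<'EOF'
nginx         512  1.0 1
php           1024 2.0 1
postgresql    2048 2.0 1
redis         512  1.0 1
queue-worker  2048 2.0 2
EOF
# prints: 8192 10.0   (8 GiB, 10 CPUs)
```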

Server Sizing Recommendations:

Minimum: 8GB RAM, 4 CPUs (covers the summed memory limits)
Recommended: 16GB RAM, 8 CPUs (headroom for the OS and load spikes)
Optimal: 32GB RAM, 16 CPUs (production with monitoring)

Health Checks

Health Check Strategy:

Service	Endpoint	Interval	Timeout	Retries	Start Period
Nginx	HTTPS /health	15s	5s	5	30s
PHP	php-fpm-healthcheck	15s	5s	5	30s
PostgreSQL	pg_isready	10s	3s	5	30s
Redis	redis-cli incr ping	10s	3s	5	10s

Health Check Benefits:

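Failures are detected automatically and surfaced as a container health status
restart: always restarts containers that exit; restarting containers that are merely unhealthy requires an orchestrator such as Docker Swarm or an external watchdog
Health status is visible via docker-compose ps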

Deployment Workflow

Initial Deployment

# 1. Prepare the server (see production-prerequisites.md)

# 2. Configure .env.production (see env-production-template.md)

# 3. Build and deploy
docker-compose -f docker-compose.yml \
               -f docker-compose.production.yml \
               --env-file .env.production \
               up -d --build

# 4. Initialize SSL certificates
docker exec php php console.php ssl:init

# 5. Run database migrations
docker exec php php console.php db:migrate

# 6. Verify health checks
docker-compose -f docker-compose.yml \
               -f docker-compose.production.yml \
               ps

Rolling Update (Zero-Downtime)

# 1. Pull the new version
git pull origin main

# 2. Build new images
docker-compose -f docker-compose.yml \
               -f docker-compose.production.yml \
               --env-file .env.production \
               build --no-cache

# 3. Rolling update (one service at a time)
# Nginx
docker-compose -f docker-compose.yml \
               -f docker-compose.production.yml \
               up -d --no-deps web

# PHP (after Nginx)
docker-compose -f docker-compose.yml \
               -f docker-compose.production.yml \
               up -d --no-deps php

# Queue Worker (after PHP)
docker-compose -f docker-compose.yml \
               -f docker-compose.production.yml \
               up -d --no-deps --scale queue-worker=2 queue-worker

# 4. Verify health checks
docker-compose -f docker-compose.yml \
               -f docker-compose.production.yml \
               ps
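
Between the steps of a rolling update it can help to block until the updated container reports healthy before moving on to the next service. The following is a sketch under stated assumptions, not the project's deployment script: wait_healthy and DOCKER_BIN are hypothetical names, and the container is assumed to be reachable under the plain name used elsewhere in this document (for example php).

```shell
# DOCKER_BIN is an assumed override hook; it defaults to the docker CLI.
DOCKER_BIN="${DOCKER_BIN:-docker}"

# wait_healthy CONTAINER [TIMEOUT_SECONDS]
# Polls the container's health status once per second.
# Returns 0 once it is "healthy", 1 if the timeout expires first.
wait_healthy() {
  container="$1"
  timeout="${2:-120}"
  elapsed=0
  status=""
  while [ "$elapsed" -lt "$timeout" ]; do
    status="$("$DOCKER_BIN" inspect --format '{{.State.Health.Status}}' "$container" 2>/dev/null)"
    if [ "$status" = "healthy" ]; then
      echo "$container is healthy"
      return 0
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  echo "$container not healthy after ${timeout}s (last status: ${status:-unknown})" >&2
  return 1
}
```

A typical use is to chain it after an update step, e.g. run the up -d --no-deps php command and then wait_healthy php 120 before updating the queue workers.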

Rollback Strategy

# 1. Check out the previous Git commit
git log --oneline -5
git checkout <previous-commit>

# 2. Rebuild and deploy
docker-compose -f docker-compose.yml \
               -f docker-compose.production.yml \
               --env-file .env.production \
               up -d --build

# 3. Database rollback (if needed)
docker exec php php console.php db:rollback 1

Monitoring

Container Status

# Status of all services
docker-compose -f docker-compose.yml \
               -f docker-compose.production.yml \
               ps

# Detailed information
docker inspect <container-name>

Resource Usage

# CPU/Memory Usage
docker stats

# Specific services
docker stats php db redis

Logs

# All logs (follow)
docker-compose -f docker-compose.yml \
               -f docker-compose.production.yml \
               logs -f

# Specific service
docker-compose -f docker-compose.yml \
               -f docker-compose.production.yml \
               logs -f php

# Last N lines
docker-compose -f docker-compose.yml \
               -f docker-compose.production.yml \
               logs --tail=100 php

Health Check Status

# Health Check Logs
docker inspect --format='{{json .State.Health}}' php | jq

# Health History
docker inspect --format='{{range .State.Health.Log}}{{.Start}} {{.ExitCode}} {{.Output}}{{end}}' php

Backup Strategy

Database Backup

# Manual Backup
docker exec db pg_dump -U postgres michaelschiemer_prod > backup_$(date +%Y%m%d_%H%M%S).sql

# Automated Backup (Cron)
# /etc/cron.daily/postgres-backup
#!/bin/bash
docker exec db pg_dump -U postgres michaelschiemer_prod | gzip > /mnt/backups/michaelschiemer_$(date +%Y%m%d).sql.gz
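
In the cron pipeline above the exit status is that of gzip, so a failed pg_dump can still leave a small but valid .gz file behind. A post-backup check can catch that case. The sketch below is a hypothetical helper (verify_backup is not part of the project) that tests archive integrity and rejects dumps that decompress to nothing.

```shell
# verify_backup FILE
# Succeeds only if FILE is an intact gzip archive with non-empty content.
verify_backup() {
  file="$1"
  # The archive must exist and pass gzip's integrity test.
  [ -f "$file" ] || return 1
  gzip -t "$file" 2>/dev/null || return 1
  # The decompressed dump must contain at least one byte.
  bytes="$(gzip -dc "$file" 2>/dev/null | wc -c)"
  [ "$bytes" -gt 0 ]
}
```

In the cron script it could run right after the dump, e.g. verify_backup "$backup_file" || echo "backup check failed" >&2, where backup_file is an assumed variable holding the path written by pg_dump.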

Volume Backup

# Back up the database volume (repeat for other volumes as needed)
docker run --rm \
  -v michaelschiemer_db_data:/data:ro \
  -v $(pwd)/backups:/backup \
  alpine tar czf /backup/db_data_$(date +%Y%m%d).tar.gz -C /data .

Troubleshooting

Service Won't Start

# Check logs
docker-compose -f docker-compose.yml \
               -f docker-compose.production.yml \
               logs <service>

# Check configuration
docker-compose -f docker-compose.yml \
               -f docker-compose.production.yml \
               config

Health Check Failing

# Manual health check
docker exec php php-fpm-healthcheck
docker exec db pg_isready -U postgres
docker exec redis redis-cli ping

# Check health logs
docker inspect --format='{{json .State.Health}}' <container> | jq

Memory Issues

# Check memory usage
docker stats

# Increase limits in docker-compose.production.yml
# Then restart service
docker-compose -f docker-compose.yml \
               -f docker-compose.production.yml \
               up -d --no-deps <service>

Network Issues

# Check networks
docker network ls
docker network inspect michaelschiemer-prod_backend

# Test connectivity
docker exec php ping -c 3 db
docker exec php nc -zv db 5432

Security Considerations

1. Network Isolation

✅ Backend network is internal (no internet access)
✅ Cache network is internal
✅ Only frontend services expose ports

2. Volume Security

✅ No host mounts (application code in image)
✅ Read-only mounts where possible (SSL certificates)
✅ Named Docker volumes (managed by Docker)

3. Secrets Management

✅ Use .env.production (not committed to git)
✅ Use Vault for sensitive data
✅ No secrets in docker-compose files

4. Resource Limits

✅ All services have memory limits (prevent OOM)
✅ CPU limits prevent resource starvation
✅ Restart policies for automatic recovery

5. Logging

✅ JSON logging for security monitoring
✅ Log rotation prevents disk exhaustion
✅ Compressed logs for storage efficiency

Best Practices

Always use .env.production - Never commit production secrets
Test updates in staging first - Use same docker-compose setup
Monitor resource usage - Adjust limits based on metrics
Regular backups - Automate database and volume backups
Health checks - Ensure all services have working health checks
Log aggregation - Send logs to centralized logging system (ELK)
SSL renewal - Monitor Certbot logs for renewal issues
Security updates - Regularly update Docker images
