michaelschiemer/docs/deployment/docker-compose-production.md
Michael Schiemer fc3d7e6357 feat(Production): Complete production deployment infrastructure
2025-10-25 19:18:37 +02:00

# Production Docker Compose Configuration
Production Docker Compose configuration with security hardening, performance optimization, and monitoring for the Custom PHP Framework.
## Overview
The project uses the Docker Compose overlay pattern:
- **Base**: `docker-compose.yml` - development environment
- **Production**: `docker-compose.production.yml` - production-specific overrides
## Usage
```bash
# Start the production stack
docker-compose -f docker-compose.yml \
  -f docker-compose.production.yml \
  --env-file .env.production \
  up -d
# Start with a rebuild (after changes)
docker-compose -f docker-compose.yml \
  -f docker-compose.production.yml \
  --env-file .env.production \
  up -d --build
# Stop the stack
docker-compose -f docker-compose.yml \
  -f docker-compose.production.yml \
  down
# Show logs
docker-compose -f docker-compose.yml \
  -f docker-compose.production.yml \
  logs -f [service]
# Check service health
docker-compose -f docker-compose.yml \
  -f docker-compose.production.yml \
  ps
```
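Because every command repeats the same two `-f` flags and the env file, a small wrapper can cut down on typing errors. The following is a minimal sketch, not part of the project: the function name `dc_prod` and the `COMPOSE_BIN` variable are assumptions introduced for illustration.

```shell
# Hypothetical helper: wraps the compose flags used throughout this document.
# COMPOSE_BIN is an assumed override so the binary can be swapped
# (for example COMPOSE_BIN="docker compose").
COMPOSE_BIN="${COMPOSE_BIN:-docker-compose}"

dc_prod() {
  # $COMPOSE_BIN is intentionally unquoted so a two-word command still works.
  $COMPOSE_BIN \
    -f docker-compose.yml \
    -f docker-compose.production.yml \
    --env-file .env.production \
    "$@"
}
```

With the function sourced into the shell, `dc_prod up -d`, `dc_prod ps`, or `dc_prod logs -f php` would expand to the long forms shown above.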
## Production Overrides
### 1. Web (Nginx) Service
**Restart Policy**:
```yaml
restart: always # Restart automatically whenever the container exits
```
**SSL/TLS Configuration**:
```yaml
volumes:
  - certbot-conf:/etc/letsencrypt:ro
  - certbot-www:/var/www/certbot:ro
```
- Let's Encrypt certificates via Certbot
- Read-only mounts for security
**Health Checks**:
```yaml
healthcheck:
  test: ["CMD", "curl", "-f", "https://localhost/health"]
  interval: 15s
  timeout: 5s
  retries: 5
  start_period: 30s
```
- HTTPS health check against the `/health` endpoint (curl verifies the certificate, so it must be valid for `localhost` or the check needs `-k`)
- 15-second interval for fast failure detection
- 5 failed retries before the container is marked `unhealthy`
**Resource Limits**:
```yaml
deploy:
  resources:
    limits:
      memory: 512M
      cpus: '1.0'
    reservations:
      memory: 256M
      cpus: '0.5'
```
- Nginx is lightweight, so moderate limits are enough
**Logging**:
```yaml
logging:
  driver: json-file
  options:
    max-size: "10m"
    max-file: "5"
    compress: "true"
    labels: "service,environment"
```
- JSON format for log aggregation (ELK Stack compatible)
- 10MB per file, 5 files = 50MB total
- Rotated files are compressed
### 2. PHP Service
**Restart Policy**:
```yaml
restart: always
```
**Build Configuration**:
```yaml
build:
  args:
    - ENV=production
    - COMPOSER_INSTALL_FLAGS=--no-dev --optimize-autoloader --classmap-authoritative
```
- `--no-dev`: no development dependencies
- `--optimize-autoloader`: converts PSR-4/PSR-0 autoloading into a classmap
- `--classmap-authoritative`: classes are loaded from the classmap only, with no filesystem lookups (performance)
**Environment**:
```yaml
environment:
  - APP_ENV=production
  - APP_DEBUG=false # Debug must be OFF in production!
  - PHP_MEMORY_LIMIT=512M
  - PHP_MAX_EXECUTION_TIME=30
  - XDEBUG_MODE=off # Xdebug disabled for performance
```
**Health Checks**:
```yaml
healthcheck:
  test: ["CMD", "php-fpm-healthcheck"]
  interval: 15s
  timeout: 5s
  retries: 5
  start_period: 30s
```
- PHP-FPM health check via a custom script
- Fast failure detection
**Resource Limits**:
```yaml
deploy:
  resources:
    limits:
      memory: 1G
      cpus: '2.0'
    reservations:
      memory: 512M
      cpus: '1.0'
```
- PHP needs more memory than Nginx
- 2 CPUs for parallel request processing
**Volumes**:
```yaml
volumes:
  - storage-logs:/var/www/html/storage/logs:rw
  - storage-cache:/var/www/html/storage/cache:rw
  - storage-queue:/var/www/html/storage/queue:rw
  - storage-discovery:/var/www/html/storage/discovery:rw
  - storage-uploads:/var/www/html/storage/uploads:rw
```
- Only the necessary Docker volumes
- **NO host mounts**, for security
- Application code lives in the image (not mounted)
### 3. Database (PostgreSQL 16) Service
**Restart Policy**:
```yaml
restart: always
```
**Production Configuration**:
```yaml
volumes:
  - db_data:/var/lib/postgresql/data
  - ./docker/postgres/postgresql.production.conf:/etc/postgresql/postgresql.conf:ro
  - ./docker/postgres/init:/docker-entrypoint-initdb.d:ro
```
- Production-tuned `postgresql.production.conf`
- Init scripts for schema setup (run only when the data directory is empty)
**Resource Limits**:
```yaml
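deploy:
  resources:
    limits:
      memory: 2G
      cpus: '2.0'
    reservations:
      memory: 1G
      cpus: '1.0'
```
- PostgreSQL needs memory for `shared_buffers` (2GB in the config; this equals the 2G container limit and leaves no room for connection and work memory, so one of the two values likely needs adjusting)
- 2 CPUs for parallel query processing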
**Health Checks**:
```yaml
healthcheck:
  test: ["CMD-SHELL", "pg_isready -U ${DB_USERNAME:-postgres} -d ${DB_DATABASE:-michaelschiemer}"]
  interval: 10s
  timeout: 3s
  retries: 5
  start_period: 30s
```
- `pg_isready` for a fast connection check
- 10-second interval (more frequent than Nginx and PHP)
**Logging**:
```yaml
logging:
  driver: json-file
  options:
    max-size: "20m" # Larger log files for PostgreSQL
    max-file: "10"
    compress: "true"
```
- PostgreSQL logs more (slow queries, checkpoints, etc.)
- 20MB per file, 10 files = 200MB total
### 4. Redis Service
**Restart Policy**:
```yaml
restart: always
```
**Resource Limits**:
```yaml
deploy:
  resources:
    limits:
      memory: 512M
      cpus: '1.0'
    reservations:
      memory: 256M
      cpus: '0.5'
```
- Redis keeps its data in memory; the limits are moderate
**Health Checks**:
```yaml
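healthcheck:
  test: ["CMD", "redis-cli", "--raw", "incr", "ping"]
  interval: 10s
  timeout: 3s
  retries: 5
  start_period: 10s
```
- `redis-cli` connection check (the command shown increments a counter key named `ping`, unlike the plain `redis-cli ping` used under Troubleshooting)
- Fast start (10s `start_period`)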
### 5. Queue Worker Service
**Restart Policy**:
```yaml
restart: always
```
**Environment**:
```yaml
environment:
  - APP_ENV=production
  - WORKER_DEBUG=false
  - WORKER_SLEEP_TIME=100000
  - WORKER_MAX_JOBS=10000
```
- Production mode without debug output
- 10,000 jobs per worker lifecycle
**Resource Limits**:
```yaml
deploy:
  resources:
    limits:
      memory: 2G
      cpus: '2.0'
    reservations:
      memory: 1G
      cpus: '1.0'
  replicas: 2 # 2 worker instances
```
- Workers need memory for job processing
- **2 replicas** for parallelism
**Graceful Shutdown**:
```yaml
stop_grace_period: 60s
```
- 60 seconds for in-flight jobs to finish before shutdown
- Prevents jobs from being aborted
**Logging**:
```yaml
logging:
  driver: json-file
  options:
    max-size: "20m"
    max-file: "10"
    compress: "true"
```
- Workers log verbosely (job start, completion, errors)
- 200MB total log storage
### 6. Certbot Service
**Restart Policy**:
```yaml
restart: always
```
**Auto-Renewal**:
```yaml
entrypoint: "/bin/sh -c 'trap exit TERM; while :; do certbot renew --webroot -w /var/www/certbot --quiet; sleep 12h & wait $${!}; done;'"
```
- Runs `certbot renew` every 12 hours; certificates are only replaced when they are close to expiry
- Webroot challenge served through Nginx
**Volumes**:
```yaml
volumes:
  - certbot-conf:/etc/letsencrypt
  - certbot-www:/var/www/certbot
  - certbot-logs:/var/log/letsencrypt
```
- Certificates are shared with Nginx
## Network Configuration
**Security Isolation**:
```yaml
networks:
  frontend:
    driver: bridge
  backend:
    driver: bridge
    internal: true # Backend network is internal (no internet access)
  cache:
    driver: bridge
    internal: true # Cache network is internal
```
**Network Segmentation**:
- **Frontend**: Nginx, Certbot (internet access)
- **Backend**: PHP, PostgreSQL, Queue Worker (NO internet access)
- **Cache**: Redis (NO internet access)
**Security Benefits**:
- Backend services cannot open connections to the outside
- Limits data exfiltration if a service is compromised
- Zero-trust network architecture
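The isolation described above can be spot-checked after deployment. The sketch below reads the `Internal` flag that `docker network inspect` reports for each network. The function names are hypothetical, and the network names in the usage note assume the `michaelschiemer-prod` project prefix that appears in the Troubleshooting section.

```shell
# Prints "true" or "false" depending on the network's Internal flag.
network_is_internal() {
  docker network inspect --format '{{.Internal}}' "$1"
}

# Returns non-zero when any given network is missing or not internal.
assert_internal() {
  for net in "$@"; do
    if [ "$(network_is_internal "$net" 2>/dev/null)" != "true" ]; then
      echo "network $net is not internal" >&2
      return 1
    fi
  done
  return 0
}
```

A call such as `assert_internal michaelschiemer-prod_backend michaelschiemer-prod_cache` could then run as a post-deploy check; the second network name is an assumption derived from the first.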
## Volumes Configuration
**SSL/TLS Volumes**:
```yaml
certbot-conf:
  driver: local
certbot-www:
  driver: local
certbot-logs:
  driver: local
```
**Application Storage Volumes**:
```yaml
storage-logs:
  driver: local
storage-cache:
  driver: local
storage-queue:
  driver: local
storage-discovery:
  driver: local
storage-uploads:
  driver: local
```
**Database Volume**:
```yaml
db_data:
  driver: local
  # Optional: External volume for backups
  # driver_opts:
  #   type: none
  #   o: bind
  #   device: /mnt/db-backups/michaelschiemer-prod
```
**Volume Best Practices**:
- All named volumes use `driver: local` (no host mounts)
- For backups: optional external volume for the database
- No development host mounts in production
## Logging Strategy
**JSON Logging** for all services:
```yaml
logging:
  driver: json-file
  options:
    max-size: "10m" # Depends on the service
    max-file: "5" # Depends on the service
    compress: "true"
    labels: "service,environment"
```
**Log Rotation**:
| Service | Max Size | Max Files | Total Storage |
|---------|----------|-----------|---------------|
| Nginx | 10MB | 5 | 50MB |
| PHP | 10MB | 10 | 100MB |
| PostgreSQL | 20MB | 10 | 200MB |
| Redis | 10MB | 5 | 50MB |
| Queue Worker | 20MB | 10 | 200MB |
| Certbot | 5MB | 3 | 15MB |
| **TOTAL** | | | **615MB** |
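
The total in the table is the sum of max size times max files per service. A short sketch of that arithmetic, where the helper name `log_budget_mb` and the `SIZExFILES` argument format are assumptions made for illustration:

```shell
# Sums worst-case log storage in MB from arguments of the form SIZExFILES,
# for example 10x5 means max-size 10MB with max-file 5.
log_budget_mb() {
  total=0
  for pair in "$@"; do
    size=${pair%%x*}   # part before the "x"
    files=${pair##*x}  # part after the "x"
    total=$((total + size * files))
  done
  echo "$total"
}

# Nginx, PHP, PostgreSQL, Redis, Queue Worker, Certbot
log_budget_mb 10x5 10x10 20x10 10x5 20x10 5x3   # prints 615
```

Since the `json-file` driver rotates logs per container, the two queue worker replicas may each keep their own 200MB, so the practical ceiling can be somewhat higher than the table total.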
**Log Aggregation**:
- JSON format for the ELK Stack (Elasticsearch, Logstash, Kibana)
- Labels for service identification
- Compressed log files for storage efficiency
## Resource Allocation
**Total Resource Requirements**:
| Service | Memory Limit | Memory Reservation | CPU Limit | CPU Reservation |
|---------|--------------|-------------------|-----------|-----------------|
| Nginx | 512M | 256M | 1.0 | 0.5 |
| PHP | 1G | 512M | 2.0 | 1.0 |
| PostgreSQL | 2G | 1G | 2.0 | 1.0 |
| Redis | 512M | 256M | 1.0 | 0.5 |
| Queue Worker (x2) | 4G | 2G | 4.0 | 2.0 |
| **TOTAL** | **8GB** | **4GB** | **10 CPUs** | **5 CPUs** |
**Server Sizing Recommendations**:
- **Minimum**: 8GB RAM, 4 CPUs (matches the summed memory limits; note that the CPU reservations above already total 5)
- **Recommended**: 16GB RAM, 8 CPUs (headroom for the OS and load spikes)
- **Optimal**: 32GB RAM, 16 CPUs (production with monitoring)
## Health Checks
**Health Check Strategy**:
| Service | Endpoint | Interval | Timeout | Retries | Start Period |
|---------|----------|----------|---------|---------|--------------|
| Nginx | HTTPS /health | 15s | 5s | 5 | 30s |
| PHP | php-fpm-healthcheck | 15s | 5s | 5 | 30s |
| PostgreSQL | pg_isready | 10s | 3s | 5 | 30s |
| Redis | redis-cli ping | 10s | 3s | 5 | 10s |
**Health Check Benefits**:
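- Failing services are flagged as `unhealthy`, so recovery can start quickly
- Restarts can be limited to unhealthy services; note that plain Docker Compose only marks a container `unhealthy`, and restarting it requires an orchestrator such as Swarm or an external watchdog
- Health status is visible via `docker-compose ps`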
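During deployments it can help to block until a container actually reports `healthy` instead of polling `ps` by hand. The following is a minimal sketch that reads `.State.Health.Status` via `docker inspect`. The function names are hypothetical, and the container name in the usage note assumes the `php` name used in the `docker exec` examples of this document.

```shell
# Prints the Docker health status of a container
# (typically starting, healthy, or unhealthy).
health_status() {
  docker inspect --format '{{.State.Health.Status}}' "$1"
}

# Polls until the container is healthy or the timeout (seconds) expires.
# Usage: wait_healthy <container> [timeout] [interval]
wait_healthy() {
  container=$1
  timeout=${2:-120}
  interval=${3:-5}
  elapsed=0
  while [ "$elapsed" -lt "$timeout" ]; do
    if [ "$(health_status "$container" 2>/dev/null)" = "healthy" ]; then
      echo "$container is healthy"
      return 0
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  echo "$container did not become healthy within ${timeout}s" >&2
  return 1
}
```

For example, `wait_healthy php 120 5` would wait up to two minutes and return a non-zero exit code on timeout, which makes it usable as a gate between update steps.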
## Deployment Workflow
### Initial Deployment
```bash
# 1. Prepare the server (see production-prerequisites.md)
# 2. Configure .env.production (see env-production-template.md)
# 3. Build and deploy
docker-compose -f docker-compose.yml \
  -f docker-compose.production.yml \
  --env-file .env.production \
  up -d --build
# 4. Initialize SSL certificates
docker exec php php console.php ssl:init
# 5. Run database migrations
docker exec php php console.php db:migrate
# 6. Verify health checks
docker-compose -f docker-compose.yml \
  -f docker-compose.production.yml \
  ps
```
### Rolling Update (Zero-Downtime)
```bash
# 1. Pull the new version
git pull origin main
# 2. Build new images
docker-compose -f docker-compose.yml \
  -f docker-compose.production.yml \
  --env-file .env.production \
  build --no-cache
# 3. Rolling update (one service at a time)
# Nginx
docker-compose -f docker-compose.yml \
  -f docker-compose.production.yml \
  up -d --no-deps web
# PHP (after Nginx)
docker-compose -f docker-compose.yml \
  -f docker-compose.production.yml \
  up -d --no-deps php
# Queue Worker (after PHP)
docker-compose -f docker-compose.yml \
  -f docker-compose.production.yml \
  up -d --no-deps --scale queue-worker=2 queue-worker
# 4. Verify health checks
docker-compose -f docker-compose.yml \
  -f docker-compose.production.yml \
  ps
```
### Rollback Strategy
```bash
# 1. Check out the previous Git commit
git log --oneline -5
git checkout <previous-commit>
# 2. Rebuild and deploy
docker-compose -f docker-compose.yml \
  -f docker-compose.production.yml \
  --env-file .env.production \
  up -d --build
# 3. Database rollback (if needed)
docker exec php php console.php db:rollback 1
```
## Monitoring
### Container Status
```bash
# Status of all services
docker-compose -f docker-compose.yml \
  -f docker-compose.production.yml \
  ps
# Detailed information
docker inspect <container-name>
```
### Resource Usage
```bash
# CPU/Memory Usage
docker stats
# Per service
docker stats php db redis
```
### Logs
```bash
# All logs (follow)
docker-compose -f docker-compose.yml \
  -f docker-compose.production.yml \
  logs -f
# Per service
docker-compose -f docker-compose.yml \
  -f docker-compose.production.yml \
  logs -f php
# Last N lines
docker-compose -f docker-compose.yml \
  -f docker-compose.production.yml \
  logs --tail=100 php
```
### Health Check Status
```bash
# Health Check Logs
docker inspect --format='{{json .State.Health}}' php | jq
# Health History
docker inspect --format='{{range .State.Health.Log}}{{.Start}} {{.ExitCode}} {{.Output}}{{end}}' php
```
## Backup Strategy
### Database Backup
```bash
# Manual Backup
docker exec db pg_dump -U postgres michaelschiemer_prod > backup_$(date +%Y%m%d_%H%M%S).sql
# Automated backup (cron): save the two lines below as
# /etc/cron.daily/postgres-backup and make the file executable
#!/bin/bash
docker exec db pg_dump -U postgres michaelschiemer_prod | gzip > /mnt/backups/michaelschiemer_$(date +%Y%m%d).sql.gz
```
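Daily dumps accumulate, so the cron job usually needs a retention step. The sketch below deletes compressed dumps older than a given number of days. The function name `prune_backups` and the 14-day default are assumptions; the file name pattern follows the cron example above.

```shell
# Deletes compressed dumps older than the given number of days.
# Usage: prune_backups <directory> [days]
prune_backups() {
  dir=$1
  days=${2:-14}
  # -mtime +N matches files last modified more than N days ago.
  find "$dir" -maxdepth 1 -type f \
    -name 'michaelschiemer_*.sql.gz' \
    -mtime +"$days" -delete
}
```

Appending `prune_backups /mnt/backups 14` to the daily script would keep roughly two weeks of history; the right window depends on available disk space and recovery requirements.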
### Volume Backup
```bash
# Back up the database volume (repeat per volume as needed)
docker run --rm \
  -v michaelschiemer_db_data:/data:ro \
  -v $(pwd)/backups:/backup \
  alpine tar czf /backup/db_data_$(date +%Y%m%d).tar.gz -C /data .
```
## Troubleshooting
### Service Won't Start
```bash
# Check logs
docker-compose -f docker-compose.yml \
-f docker-compose.production.yml \
logs <service>
# Check configuration
docker-compose -f docker-compose.yml \
-f docker-compose.production.yml \
config
```
### Health Check Failing
```bash
# Manual health check
docker exec php php-fpm-healthcheck
docker exec db pg_isready -U postgres
docker exec redis redis-cli ping
# Check health logs
docker inspect --format='{{json .State.Health}}' <container> | jq
```
### Memory Issues
```bash
# Check memory usage
docker stats
# Increase limits in docker-compose.production.yml
# Then restart service
docker-compose -f docker-compose.yml \
-f docker-compose.production.yml \
up -d --no-deps <service>
```
### Network Issues
```bash
# Check networks
docker network ls
docker network inspect michaelschiemer-prod_backend
# Test connectivity
docker exec php ping db
docker exec php nc -zv db 5432
```
## Security Considerations
### 1. Network Isolation
- ✅ Backend network is internal (no internet access)
- ✅ Cache network is internal
- ✅ Only frontend services expose ports
### 2. Volume Security
- ✅ No host mounts for application code (code is baked into the image)
- ✅ Read-only mounts where possible (SSL certificates)
- ✅ Named Docker volumes (managed by Docker)
### 3. Secrets Management
- ✅ Use `.env.production` (not committed to git)
- ✅ Use Vault for sensitive data
- ✅ No secrets in docker-compose files
### 4. Resource Limits
- ✅ Services have memory limits (a runaway process cannot exhaust host memory)
- ✅ CPU limits prevent resource starvation
- ✅ Restart policies for automatic recovery
### 5. Logging
- ✅ JSON logging for security monitoring
- ✅ Log rotation prevents disk exhaustion
- ✅ Compressed logs for storage efficiency
## Best Practices
1. **Always use `.env.production`** - Never commit production secrets
2. **Test updates in staging first** - Use same docker-compose setup
3. **Monitor resource usage** - Adjust limits based on metrics
4. **Regular backups** - Automate database and volume backups
5. **Health checks** - Ensure all services have working health checks
6. **Log aggregation** - Send logs to centralized logging system (ELK)
7. **SSL renewal** - Monitor Certbot logs for renewal issues
8. **Security updates** - Regularly update Docker images
## See Also
- **Prerequisites**: `docs/deployment/production-prerequisites.md`
- **Environment Configuration**: `docs/deployment/env-production-template.md`
- **SSL Setup**: `docs/deployment/ssl-setup.md`
- **Database Migrations**: `docs/deployment/database-migration-strategy.md`
- **Logging Configuration**: `docs/deployment/logging-configuration.md`