# Production Docker Compose Configuration

Production Docker Compose configuration with security hardening, performance tuning, and monitoring for the Custom PHP Framework.

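## Overview

The project uses the Docker Compose overlay pattern. Files passed later with `-f` are merged over earlier ones, so the production file only lists what differs:
- **Base**: `docker-compose.yml` - development environment
- **Production**: `docker-compose.production.yml` - production-specific overrides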

## Usage

```bash
# Start the production stack
docker-compose -f docker-compose.yml \
               -f docker-compose.production.yml \
               --env-file .env.production \
               up -d

# With build (after changes)
docker-compose -f docker-compose.yml \
               -f docker-compose.production.yml \
               --env-file .env.production \
               up -d --build

# Stop the stack
docker-compose -f docker-compose.yml \
               -f docker-compose.production.yml \
               down

# Show logs
docker-compose -f docker-compose.yml \
               -f docker-compose.production.yml \
               logs -f [service]

# Service health check
docker-compose -f docker-compose.yml \
               -f docker-compose.production.yml \
               ps
```
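
Because every command repeats the same `-f` and `--env-file` flags, a small shell wrapper can keep them in one place. The following is a minimal sketch, not part of the project: the function name `dc` and the `COMPOSE_BIN` override variable are assumptions made for illustration.

```bash
# Hypothetical wrapper (not part of the project): pins both compose files
# and the env file so every call targets the same overlay stack.
# COMPOSE_BIN is an assumed override hook, e.g. for dry runs.
dc() {
  "${COMPOSE_BIN:-docker-compose}" \
    -f docker-compose.yml \
    -f docker-compose.production.yml \
    --env-file .env.production \
    "$@"
}
```

With this in place, `dc up -d`, `dc logs -f php`, and `dc ps` correspond to the commands above.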

## Production Overrides

### 1. Web (Nginx) Service

**Restart Policy**:
```yaml
restart: always  # Restart automatically whenever the container exits
```

**SSL/TLS Configuration**:
```yaml
volumes:
  - certbot-conf:/etc/letsencrypt:ro
  - certbot-www:/var/www/certbot:ro
```
- Let's Encrypt certificates issued via Certbot
- Read-only mounts, so Nginx cannot modify certificates or challenge files

**Health Checks**:
```yaml
healthcheck:
  test: ["CMD", "curl", "-f", "https://localhost/health"]
  interval: 15s
  timeout: 5s
  retries: 5
  start_period: 30s
```
- HTTPS health check against the `/health` endpoint
- 15-second interval for fast failure detection
- 5 consecutive failures mark the container as `unhealthy`

**Resource Limits**:
```yaml
deploy:
  resources:
    limits:
      memory: 512M
      cpus: '1.0'
    reservations:
      memory: 256M
      cpus: '0.5'
```
- Nginx is lightweight, so moderate limits suffice

**Logging**:
```yaml
logging:
  driver: json-file
  options:
    max-size: "10m"
    max-file: "5"
    compress: "true"
    labels: "service,environment"
```
- JSON format for log aggregation (ELK Stack compatible)
- 10MB per file, 5 files = 50MB total
- Rotated files are compressed

### 2. PHP Service

**Restart Policy**:
```yaml
restart: always
```

**Build Configuration**:
```yaml
build:
  args:
    - ENV=production
    - COMPOSER_INSTALL_FLAGS=--no-dev --optimize-autoloader --classmap-authoritative
```
- `--no-dev`: skips development dependencies
- `--optimize-autoloader`: converts PSR-4/PSR-0 autoload rules into a class map
- `--classmap-authoritative`: resolves classes from the class map only, with no filesystem lookups (performance)

**Environment**:
```yaml
environment:
  - APP_ENV=production
  - APP_DEBUG=false  # Debug must be OFF in production!
  - PHP_MEMORY_LIMIT=512M
  - PHP_MAX_EXECUTION_TIME=30
  - XDEBUG_MODE=off  # Xdebug disabled for performance
```
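
A quick way to confirm that these values actually reach the running container is to compare its environment against the expected settings. The sketch below is an illustration only: the function name `check_prod_env` is hypothetical, and it expects `KEY=VALUE` lines on stdin, such as the output of `docker exec php env`.

```bash
# Hypothetical preflight check: reads KEY=VALUE lines from stdin and
# fails if any of the production-critical settings is missing or different.
check_prod_env() {
  local env_dump rc=0 expected
  env_dump="$(cat)"
  for expected in APP_ENV=production APP_DEBUG=false XDEBUG_MODE=off; do
    # -x matches whole lines, -F treats the pattern as a fixed string
    if ! printf '%s\n' "$env_dump" | grep -qxF "$expected"; then
      echo "missing or wrong: expected $expected" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

A typical call would be `docker exec php env | check_prod_env`, which exits non-zero if, for example, `APP_DEBUG` is still `true`.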

**Health Checks**:
```yaml
healthcheck:
  test: ["CMD", "php-fpm-healthcheck"]
  interval: 15s
  timeout: 5s
  retries: 5
  start_period: 30s
```
- PHP-FPM health check via a custom script
- Fast failure detection

**Resource Limits**:
```yaml
deploy:
  resources:
    limits:
      memory: 1G
      cpus: '2.0'
    reservations:
      memory: 512M
      cpus: '1.0'
```
- PHP needs more memory than Nginx
- 2 CPUs for parallel request processing

**Volumes**:
```yaml
volumes:
  - storage-logs:/var/www/html/storage/logs:rw
  - storage-cache:/var/www/html/storage/cache:rw
  - storage-queue:/var/www/html/storage/queue:rw
  - storage-discovery:/var/www/html/storage/discovery:rw
  - storage-uploads:/var/www/html/storage/uploads:rw
```
- Only the Docker volumes that are actually needed
- **NO host mounts**, for security
- Application code is baked into the image (not mounted)

### 3. Database (PostgreSQL 16) Service

**Restart Policy**:
```yaml
restart: always
```

**Production Configuration**:
```yaml
volumes:
  - db_data:/var/lib/postgresql/data
  - ./docker/postgres/postgresql.production.conf:/etc/postgresql/postgresql.conf:ro
  - ./docker/postgres/init:/docker-entrypoint-initdb.d:ro
```
- Production-tuned `postgresql.production.conf`
- Init scripts for schema setup (the official PostgreSQL image runs them only when the data directory is first initialized)

**Resource Limits**:
```yaml
deploy:
  resources:
    limits:
      memory: 2G
      cpus: '2.0'
    reservations:
      memory: 1G
      cpus: '1.0'
```
- PostgreSQL needs memory for `shared_buffers` (2GB in the config; note that this equals the full 2G container limit, so one of the two values may need adjusting)
- 2 CPUs for parallel query processing

**Health Checks**:
```yaml
healthcheck:
  test: ["CMD-SHELL", "pg_isready -U ${DB_USERNAME:-postgres} -d ${DB_DATABASE:-michaelschiemer}"]
  interval: 10s
  timeout: 3s
  retries: 5
  start_period: 30s
```
- `pg_isready` for a fast connection check
- 10-second interval (more frequent than the 15-second Nginx and PHP checks)

**Logging**:
```yaml
logging:
  driver: json-file
  options:
    max-size: "20m"  # Larger log files for PostgreSQL
    max-file: "10"
    compress: "true"
```
- PostgreSQL logs more (slow queries, checkpoints, etc.)
- 20MB per file, 10 files = 200MB total

### 4. Redis Service

**Restart Policy**:
```yaml
restart: always
```

**Resource Limits**:
```yaml
deploy:
  resources:
    limits:
      memory: 512M
      cpus: '1.0'
    reservations:
      memory: 256M
      cpus: '0.5'
```
- Redis holds its dataset in memory, so the 512M limit also bounds the dataset; moderate limits

**Health Checks**:
```yaml
healthcheck:
  test: ["CMD", "redis-cli", "--raw", "incr", "ping"]
  interval: 10s
  timeout: 3s
  retries: 5
  start_period: 10s
```
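- `redis-cli --raw incr ping` increments a counter key named `ping`, so the check verifies that Redis accepts commands (it is not the `PING` command)
- Fast startup (10s `start_period`)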

### 5. Queue Worker Service

**Restart Policy**:
```yaml
restart: always
```

**Environment**:
```yaml
environment:
  - APP_ENV=production
  - WORKER_DEBUG=false
  - WORKER_SLEEP_TIME=100000
  - WORKER_MAX_JOBS=10000
```
- Production mode without debug output
- 10,000 jobs per worker lifecycle

**Resource Limits**:
```yaml
deploy:
  resources:
    limits:
      memory: 2G
      cpus: '2.0'
    reservations:
      memory: 1G
      cpus: '1.0'
  replicas: 2  # 2 worker instances
```
- Workers need memory for job processing
- **2 replicas** for parallelism

**Graceful Shutdown**:
```yaml
stop_grace_period: 60s
```
- Docker waits 60 seconds after the stop signal before sending SIGKILL, giving running jobs time to finish
- Prevents jobs from being aborted mid-run

**Logging**:
```yaml
logging:
  driver: json-file
  options:
    max-size: "20m"
    max-file: "10"
    compress: "true"
```
- Workers log verbosely (job start, completion, errors)
- 200MB total log storage

### 6. Certbot Service

**Restart Policy**:
```yaml
restart: always
```

**Auto-Renewal**:
```yaml
entrypoint: "/bin/sh -c 'trap exit TERM; while :; do certbot renew --webroot -w /var/www/certbot --quiet; sleep 12h & wait $${!}; done;'"
```
- Checks for renewal every 12 hours (`certbot renew` only renews certificates that are close to expiry)
- Webroot challenge served by Nginx

**Volumes**:
```yaml
volumes:
  - certbot-conf:/etc/letsencrypt
  - certbot-www:/var/www/certbot
  - certbot-logs:/var/log/letsencrypt
```
- Certificates are shared with Nginx

## Network Configuration

**Security Isolation**:
```yaml
networks:
  frontend:
    driver: bridge
  backend:
    driver: bridge
    internal: true  # Backend network is internal (no internet access)
  cache:
    driver: bridge
    internal: true  # Cache network is internal
```

**Network Segmentation**:
- **Frontend**: Nginx, Certbot (internet access)
- **Backend**: PHP, PostgreSQL, Queue Worker (NO internet access)
- **Cache**: Redis (NO internet access)

**Security Benefits**:
- Backend services cannot initiate outbound connections
- Makes data exfiltration harder if a backend service is compromised
- Zero-Trust Network Architecture
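
Whether a network really is internal can be checked on the running host. The sketch below is an assumption-laden illustration: the function name `network_is_internal` and the `DOCKER_BIN` override are hypothetical, while `docker network inspect` and its `Internal` field are standard Docker features.

```bash
# Hypothetical check: succeeds only if the given Docker network was created
# with `internal: true`. Returns 2 if the network cannot be inspected.
network_is_internal() {
  local network="$1" internal
  internal="$("${DOCKER_BIN:-docker}" network inspect --format '{{.Internal}}' "$network")" || return 2
  [ "$internal" = "true" ]
}
```

For example, `network_is_internal michaelschiemer-prod_backend` should succeed, and the same call for the frontend network should fail. The actual network names depend on the Compose project name.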

## Volumes Configuration

**SSL/TLS Volumes**:
```yaml
certbot-conf:
  driver: local
certbot-www:
  driver: local
certbot-logs:
  driver: local
```

**Application Storage Volumes**:
```yaml
storage-logs:
  driver: local
storage-cache:
  driver: local
storage-queue:
  driver: local
storage-discovery:
  driver: local
storage-uploads:
  driver: local
```

**Database Volume**:
```yaml
db_data:
  driver: local
  # Optional: External volume for backups
  # driver_opts:
  #   type: none
  #   o: bind
  #   device: /mnt/db-backups/michaelschiemer-prod
```

**Volume Best Practices**:
- All named volumes use `driver: local` (no host mounts)
- For backups: optionally an external volume for the database
- No development host mounts in production

## Logging Strategy

**JSON Logging** for all services:
```yaml
logging:
  driver: json-file
  options:
    max-size: "10m"  # varies by service
    max-file: "5"    # varies by service
    compress: "true"
    labels: "service,environment"
```

**Log Rotation**:

| Service | Max Size | Max Files | Total Storage |
|---------|----------|-----------|---------------|
| Nginx   | 10MB     | 5         | 50MB          |
| PHP     | 10MB     | 10        | 100MB         |
| PostgreSQL | 20MB  | 10        | 200MB         |
| Redis   | 10MB     | 5         | 50MB          |
| Queue Worker | 20MB | 10      | 200MB         |
| Certbot | 5MB      | 3         | 15MB          |
| **TOTAL** |        |           | **615MB**     |
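
Each row's total is simply max size times max files, and the grand total is the sum over all services. A minimal sketch of that arithmetic follows; the function name `log_budget_mb` is hypothetical, and the figures count one container per service.

```bash
# Hypothetical helper: takes pairs of <max_size_mb> <max_files> and prints
# the summed worst-case log storage in MB.
log_budget_mb() {
  local total=0
  while [ "$#" -ge 2 ]; do
    total=$(( total + $1 * $2 ))
    shift 2
  done
  echo "$total"
}

# Nginx, PHP, PostgreSQL, Redis, Queue Worker, Certbot (values from the table)
log_budget_mb 10 5  10 10  20 10  10 5  20 10  5 3   # prints 615
```

Since json-file rotation applies per container, services that run several replicas need their row counted once per replica.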

**Log Aggregation**:
- JSON format for the ELK Stack (Elasticsearch, Logstash, Kibana)
- Labels for service identification
- Compressed log files for storage efficiency

## Resource Allocation

**Total Resource Requirements**:

| Service | Memory Limit | Memory Reservation | CPU Limit | CPU Reservation |
|---------|--------------|-------------------|-----------|-----------------|
| Nginx   | 512M         | 256M              | 1.0       | 0.5             |
| PHP     | 1G           | 512M              | 2.0       | 1.0             |
| PostgreSQL | 2G        | 1G                | 2.0       | 1.0             |
| Redis   | 512M         | 256M              | 1.0       | 0.5             |
| Queue Worker (x2) | 4G | 2G              | 4.0       | 2.0             |
| **TOTAL** | **8GB**    | **4GB**           | **10 CPUs** | **5 CPUs**  |
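
The totals can be recomputed when limits or replica counts change. The following sketch assumes a hypothetical helper `sum_limits` that reads one `<service> <memory_limit_mb> <cpu_limit> <replicas>` line per service and multiplies each limit by its replica count.

```bash
# Hypothetical helper: sums memory and CPU limits across services,
# weighting each service by its number of replicas.
sum_limits() {
  awk '{ mem += $2 * $4; cpu += $3 * $4 }
       END { printf "%d MB, %g CPUs\n", mem, cpu }'
}

# Limits from the table above (1G = 1024 MB); the queue worker runs 2 replicas.
printf '%s\n' \
  "nginx 512 1.0 1" \
  "php 1024 2.0 1" \
  "postgres 2048 2.0 1" \
  "redis 512 1.0 1" \
  "queue-worker 2048 2.0 2" | sum_limits   # prints "8192 MB, 10 CPUs"
```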

**Server Sizing Recommendations**:
- **Minimum**: 8GB RAM, 4 CPUs (covers the summed memory limits; note that 4 CPUs is less than the 5 CPUs of summed reservations)
- **Recommended**: 16GB RAM, 8 CPUs (headroom for the OS and spikes)
- **Optimal**: 32GB RAM, 16 CPUs (production with monitoring)

## Health Checks

**Health Check Strategy**:

| Service | Endpoint | Interval | Timeout | Retries | Start Period |
|---------|----------|----------|---------|---------|--------------|
| Nginx   | HTTPS /health | 15s | 5s | 5 | 30s |
| PHP     | php-fpm-healthcheck | 15s | 5s | 5 | 30s |
| PostgreSQL | pg_isready | 10s | 3s | 5 | 30s |
| Redis   | redis-cli incr ping | 10s | 3s | 5 | 10s |

**Health Check Benefits**:
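- Failing services are detected automatically and reported as `unhealthy`
- `restart: always` restarts containers that exit; restarting a container that is merely `unhealthy` requires Swarm mode or an external watcher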
- Health-Status via `docker-compose ps`
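
A deploy script can wait for a container to report `healthy` before continuing. The sketch below polls the health status via `docker inspect`; the function name `wait_healthy`, its default attempt count and delay, and the `DOCKER_BIN` override are assumptions for illustration.

```bash
# Hypothetical helper: polls a container's health status until it is
# "healthy" or the attempts are used up.
# Usage: wait_healthy <container> [attempts] [delay_seconds]
wait_healthy() {
  local container="$1" attempts="${2:-20}" delay="${3:-3}" status="" i=1
  while [ "$i" -le "$attempts" ]; do
    status="$("${DOCKER_BIN:-docker}" inspect \
      --format '{{.State.Health.Status}}' "$container" 2>/dev/null || true)"
    if [ "$status" = "healthy" ]; then
      return 0
    fi
    sleep "$delay"
    i=$(( i + 1 ))
  done
  echo "timeout: $container is '${status:-unknown}'" >&2
  return 1
}
```

For example, `wait_healthy php 20 3` waits up to roughly one minute for the `php` container.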

## Deployment Workflow

### Initial Deployment

```bash
# 1. Prepare the server (see production-prerequisites.md)

# 2. Configure .env.production (see env-production-template.md)

# 3. Build and deploy
docker-compose -f docker-compose.yml \
               -f docker-compose.production.yml \
               --env-file .env.production \
               up -d --build

# 4. Initialize SSL certificates
docker exec php php console.php ssl:init

# 5. Run database migrations
docker exec php php console.php db:migrate

# 6. Verify health checks
docker-compose -f docker-compose.yml \
               -f docker-compose.production.yml \
               ps
```

### Rolling Update (Zero-Downtime)

```bash
# 1. Pull the new version
git pull origin main

# 2. Build new images
docker-compose -f docker-compose.yml \
               -f docker-compose.production.yml \
               --env-file .env.production \
               build --no-cache

# 3. Rolling update (service by service)
# Nginx
docker-compose -f docker-compose.yml \
               -f docker-compose.production.yml \
               --env-file .env.production \
               up -d --no-deps web

# PHP (after Nginx)
docker-compose -f docker-compose.yml \
               -f docker-compose.production.yml \
               --env-file .env.production \
               up -d --no-deps php

# Queue Worker (after PHP)
docker-compose -f docker-compose.yml \
               -f docker-compose.production.yml \
               --env-file .env.production \
               up -d --no-deps --scale queue-worker=2 queue-worker

# 4. Verify health checks
docker-compose -f docker-compose.yml \
               -f docker-compose.production.yml \
               ps
```
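
The service-by-service sequence above can be scripted so that a failed step stops the rollout instead of continuing with the next service. This is a minimal sketch under stated assumptions: the function name `rolling_update` and the `COMPOSE_BIN` override are hypothetical, and scaling options such as `--scale` are left out for brevity.

```bash
# Hypothetical helper: recreates the given services one at a time, in order,
# and aborts at the first failure so later services keep running unchanged.
rolling_update() {
  local service
  for service in "$@"; do
    echo "updating $service"
    if ! "${COMPOSE_BIN:-docker-compose}" \
         -f docker-compose.yml \
         -f docker-compose.production.yml \
         --env-file .env.production \
         up -d --no-deps "$service"; then
      echo "update of $service failed; remaining services left untouched" >&2
      return 1
    fi
  done
}
```

A call such as `rolling_update web php queue-worker` follows the order used above.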

### Rollback Strategy

```bash
# 1. Check out the previous Git commit
git log --oneline -5
git checkout <previous-commit>

# 2. Rebuild and deploy
docker-compose -f docker-compose.yml \
               -f docker-compose.production.yml \
               --env-file .env.production \
               up -d --build

# 3. Database rollback (if needed)
docker exec php php console.php db:rollback 1
```

## Monitoring

### Container Status

```bash
# Status of all services
docker-compose -f docker-compose.yml \
               -f docker-compose.production.yml \
               ps

# Detailed information
docker inspect <container-name>
```

### Resource Usage

```bash
# CPU/memory usage
docker stats

# Specific services
docker stats php db redis
```

### Logs

```bash
# All logs (follow)
docker-compose -f docker-compose.yml \
               -f docker-compose.production.yml \
               logs -f

# Specific service
docker-compose -f docker-compose.yml \
               -f docker-compose.production.yml \
               logs -f php

# Last N lines
docker-compose -f docker-compose.yml \
               -f docker-compose.production.yml \
               logs --tail=100 php
```

### Health Check Status

```bash
# Health Check Logs
docker inspect --format='{{json .State.Health}}' php | jq

# Health History
docker inspect --format='{{range .State.Health.Log}}{{.Start}} {{.ExitCode}} {{.Output}}{{end}}' php
```

## Backup Strategy

### Database Backup

```bash
# Manual backup
docker exec db pg_dump -U postgres michaelschiemer_prod > backup_$(date +%Y%m%d_%H%M%S).sql

# Automated backup (cron): save the lines below as /etc/cron.daily/postgres-backup
#!/bin/bash
set -o pipefail  # report failure if pg_dump fails, not only if gzip fails
docker exec db pg_dump -U postgres michaelschiemer_prod | gzip > /mnt/backups/michaelschiemer_$(date +%Y%m%d).sql.gz
```
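
A compressed dump is only useful if the archive is readable. The sketch below shows a basic integrity check; the function name `verify_backup` is hypothetical. It catches missing, empty, or corrupt archives, but it does not prove that the dump inside is complete.

```bash
# Hypothetical helper: fails if the backup file is missing, empty,
# or not a valid gzip archive.
verify_backup() {
  local file="$1"
  [ -s "$file" ] || { echo "missing or empty: $file" >&2; return 1; }
  gzip -t "$file" 2>/dev/null || { echo "corrupt gzip archive: $file" >&2; return 1; }
}
```

For example, `verify_backup /mnt/backups/michaelschiemer_$(date +%Y%m%d).sql.gz` could run right after the cron job.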

### Volume Backup

```bash
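# Back up the database volume (repeat for other volumes as needed)
# Note: stop the db service first; a tar of a running PostgreSQL data
# directory may be inconsistent
docker run --rm \
  -v michaelschiemer_db_data:/data:ro \
  -v "$(pwd)/backups":/backup \
  alpine tar czf /backup/db_data_$(date +%Y%m%d).tar.gz -C /data .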
```

## Troubleshooting

### Service Won't Start

```bash
# Check logs
docker-compose -f docker-compose.yml \
               -f docker-compose.production.yml \
               logs <service>

# Check configuration
docker-compose -f docker-compose.yml \
               -f docker-compose.production.yml \
               config
```

### Health Check Failing

```bash
# Manual health check
docker exec php php-fpm-healthcheck
docker exec db pg_isready -U postgres
docker exec redis redis-cli ping

# Check health logs
docker inspect --format='{{json .State.Health}}' <container> | jq
```

### Memory Issues

```bash
# Check memory usage
docker stats

# Increase limits in docker-compose.production.yml
# Then recreate the service
docker-compose -f docker-compose.yml \
               -f docker-compose.production.yml \
               --env-file .env.production \
               up -d --no-deps <service>
```
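
Before raising a limit, it helps to confirm that the container was actually stopped by the kernel's out-of-memory killer. The sketch below reads the `OOMKilled` flag from `docker inspect`; the function name `was_oom_killed` and the `DOCKER_BIN` override are assumptions for illustration.

```bash
# Hypothetical helper: succeeds if Docker recorded an OOM kill for the
# container. Returns 2 if the container cannot be inspected.
was_oom_killed() {
  local container="$1" flag
  flag="$("${DOCKER_BIN:-docker}" inspect --format '{{.State.OOMKilled}}' "$container")" || return 2
  [ "$flag" = "true" ]
}
```

For example, `was_oom_killed php && echo "php hit its memory limit"`.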

### Network Issues

```bash
# Check networks
docker network ls
docker network inspect michaelschiemer-prod_backend

# Test connectivity
docker exec php ping -c 3 db
docker exec php nc -zv db 5432
```

## Security Considerations

### 1. Network Isolation
- ✅ Backend network is internal (no internet access)
- ✅ Cache network is internal
- ✅ Only frontend services expose ports

### 2. Volume Security
- ✅ No host mounts for application code (code is baked into the image); the only bind mounts are the read-only PostgreSQL config and init scripts
- ✅ Read-only mounts where possible (SSL certificates)
- ✅ Named Docker volumes (managed by Docker)

### 3. Secrets Management
- ✅ Use `.env.production` (not committed to git)
- ✅ Use Vault for sensitive data
- ✅ No secrets in docker-compose files
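
The first point can be verified before each deployment. The sketch below uses `git ls-files --error-unmatch`, which exits non-zero for untracked paths; the function name `env_file_untracked` and the `GIT_BIN` override are hypothetical.

```bash
# Hypothetical helper: fails if the env file is tracked by git.
# Defaults to .env.production when no path is given.
env_file_untracked() {
  local file="${1:-.env.production}"
  if "${GIT_BIN:-git}" ls-files --error-unmatch -- "$file" >/dev/null 2>&1; then
    echo "$file is tracked by git" >&2
    return 1
  fi
}
```

Running `env_file_untracked || exit 1` at the top of a deploy script stops the deployment if the file was committed by mistake.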

### 4. Resource Limits
- ✅ All services have memory limits (a leak in one container cannot exhaust host memory)
- ✅ CPU limits prevent resource starvation
- ✅ Restart policies for automatic recovery

### 5. Logging
- ✅ JSON logging for security monitoring
- ✅ Log rotation prevents disk exhaustion
- ✅ Compressed logs for storage efficiency

## Best Practices

1. **Always use `.env.production`** - Never commit production secrets
2. **Test updates in staging first** - Use same docker-compose setup
3. **Monitor resource usage** - Adjust limits based on metrics
4. **Regular backups** - Automate database and volume backups
5. **Health checks** - Ensure all services have working health checks
6. **Log aggregation** - Send logs to centralized logging system (ELK)
7. **SSL renewal** - Monitor Certbot logs for renewal issues
8. **Security updates** - Regularly update Docker images

## See Also

- **Prerequisites**: `docs/deployment/production-prerequisites.md`
- **Environment Configuration**: `docs/deployment/env-production-template.md`
- **SSL Setup**: `docs/deployment/ssl-setup.md`
- **Database Migrations**: `docs/deployment/database-migration-strategy.md`
- **Logging Configuration**: `docs/deployment/logging-configuration.md`