Production Deployment Troubleshooting Checklist

Systematic troubleshooting for common deployment issues.

Issue 1: Supervisor Log File Permission Denied

Symptom

PermissionError: [Errno 13] Permission denied: '/var/log/supervisor/supervisord.log'

The container does not start because Supervisor cannot write its log file.

Diagnosis

docker logs web  # Shows the permission error
docker exec web ls -la /var/log/supervisor/  # Directory is missing or not writable

Root Cause

  • Supervisor tries to write to /var/log/supervisor/supervisord.log
  • The directory does not exist or is not writable
  • This can fail even as root in a containerized environment

Solution 1 (DOES NOT WORK)

Attempt: use /proc/self/fd/1

docker/supervisor/supervisord.conf:

logfile=/proc/self/fd/1

Error: PermissionError: [Errno 13] Permission denied: '/proc/self/fd/1'

Reason: Supervisor's Python log handler opens the log file in append mode, and in this setup that failed for both /proc/self/fd/1 and /dev/stdout.

Solution 2 (SUCCESSFUL)

Fix: /dev/null with silent=false

docker/supervisor/supervisord.conf:

[supervisord]
nodaemon=true
silent=false        ; IMPORTANT: keep logging to stdout even though logfile is /dev/null
logfile=/dev/null
logfile_maxbytes=0
pidfile=/var/run/supervisord.pid
loglevel=info

Why does this work?

  • logfile=/dev/null: no file logging
  • silent=false: with nodaemon=true, Supervisor also writes its log to stdout (false is the default in Supervisor 4.2+; setting it makes the intent explicit)
  • The logs appear in docker logs web

Verification

docker logs web
# Output:
# 2025-10-28 16:29:59,976 INFO supervisord started with pid 1
# 2025-10-28 16:30:00,980 INFO spawned: 'nginx' with pid 7
# 2025-10-28 16:30:00,982 INFO spawned: 'php-fpm' with pid 8
# 2025-10-28 16:30:02,077 INFO success: nginx entered RUNNING state
# 2025-10-28 16:30:02,077 INFO success: php-fpm entered RUNNING state

Related Files

  • docker/supervisor/supervisord.conf
  • Dockerfile.production (COPY supervisord.conf)
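
The log check from the verification step can also be scripted. The following is a minimal shell sketch, not part of the original setup: the function name supervisor_not_running is hypothetical, and it assumes the Supervisor log format shown in the verification output (date, time, level, then spawned: or success:). It reads log lines on stdin and prints every program that was spawned but never reached RUNNING.

```shell
# Reads Supervisor log lines on stdin and prints each program that was
# spawned but never logged "entered RUNNING state".
# Returns 0 when every spawned program is running, 1 otherwise.
supervisor_not_running() {
    awk -v quote="'" '
        $3 == "INFO" && $4 == "spawned:" {
            name = $5
            gsub(quote, "", name)      # the log prints the name in single quotes
            spawned[name] = 1
        }
        $3 == "INFO" && $4 == "success:" && /entered RUNNING state/ {
            running[$5] = 1
        }
        END {
            missing = 0
            for (name in spawned) {
                if (!(name in running)) { print name; missing = 1 }
            }
            exit missing
        }
    '
}

# Usage (container name "web" as in the examples above):
#   docker logs web 2>&1 | supervisor_not_running
```

The sketch only looks at the log history, so a program that reached RUNNING and crashed later still counts as running. docker exec web supervisorctl status remains the authoritative check.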

Issue 2: Web Container EACCES Errors

Symptom

2025-10-28 16:16:52,152 CRIT could not write pidfile /var/run/supervisord.pid
2025-10-28 16:16:53,154 INFO spawnerr: unknown error making dispatchers for 'nginx': EACCES
2025-10-28 16:16:53,154 INFO spawnerr: unknown error making dispatchers for 'php-fpm': EACCES

Diagnosis

# Check the container user
docker exec web whoami
# If this is not "root", that is the problem

# Check the Docker Compose config
docker inspect web | grep -i user
# Shows the user inherited from the base config

Root Cause

  • The web service in docker-compose.prod.yml does not set user: root
  • It inherits user: 1000:1000 or user: www-data from the base docker-compose.yml
  • Supervisor needs root to start the nginx and php-fpm master processes

Solution

Fix: set user: root explicitly

docker-compose.prod.yml:

web:
  image: 94.16.110.151:5000/framework:latest
  user: root  # ← ADD THIS
  # ... rest of the config

Add it to the php and queue-worker services as well:

php:
  image: 94.16.110.151:5000/framework:latest
  user: root  # ← ADD THIS

queue-worker:
  image: 94.16.110.151:5000/framework:latest
  user: root  # ← ADD THIS

Why user: root?

  • The container runs as root: Supervisor master process
  • Nginx master: root (worker processes run as www-data via nginx.conf)
  • PHP-FPM master: root (pool workers run as www-data via php-fpm.conf)

docker/php/zz-docker.production.conf:

[www]
user = www-data     ; ← worker processes run as www-data
group = www-data

Verification

docker exec web whoami
# root

docker exec web ps aux | grep -E 'nginx|php-fpm'
# root       1  supervisord
# root       7  nginx: master process
# www-data  10  nginx: worker process
# root       8  php-fpm: master process
# www-data  11  php-fpm: pool www

Related Files

  • docker-compose.prod.yml (web, php, queue-worker services)
  • docker/php/zz-docker.production.conf
  • docker/nginx/nginx.production.conf
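
The user check can be turned into a small guard for deploy scripts. This is a sketch under assumptions: container_runs_as_root is a hypothetical helper name, and it relies on the id utility being present in the image. It compares the numeric UID that docker exec runs with against 0.

```shell
# Succeeds only if commands in the container run with UID 0 (root).
# "docker exec NAME id -u" prints the numeric UID of the container's user.
# Returns 2 if docker exec itself fails (for example, container not running).
container_runs_as_root() {
    uid=$(docker exec "$1" id -u) || return 2
    [ "$uid" = "0" ]
}

# Usage, with the service names from this section:
#   for name in web php queue-worker; do
#       container_runs_as_root "$name" || echo "$name does not run as root"
#   done
```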

Issue 3: Docker Entrypoint Override Does Not Work

Symptom

The container command shows the entrypoint prepended:

docker ps
# COMMAND: "/usr/local/bin/docker-entrypoint.sh /usr/bin/supervisord -c ..."

Supervisor is not started directly but through a wrapper script.

Diagnosis

# Check the container command
docker inspect web --format='{{.Config.Entrypoint}}'
# [/usr/local/bin/docker-entrypoint.sh]

docker inspect web --format='{{.Config.Cmd}}'
# [/usr/bin/supervisord -c /etc/supervisor/conf.d/supervisord.conf]

Root Cause

  1. The base docker-compose.yml defines the web service with a separate build:

    web:
      build:
        context: docker/nginx
        dockerfile: Dockerfile
    
  2. The production override sets image: but does not clear the inherited ENTRYPOINT:

    web:
      image: 94.16.110.151:5000/framework:latest
      command: ["/usr/bin/supervisord", "-c", "/etc/supervisor/conf.d/supervisord.conf"]
    
  3. The base PHP image has an ENTRYPOINT, which gets prepended

  4. Docker Compose merge: ENTRYPOINT + CMD = final command

Solution - Iteration 1 (DOES NOT WORK)

Attempt: set only command:

docker-compose.prod.yml:

web:
  image: 94.16.110.151:5000/framework:latest
  command: ["/usr/bin/supervisord", "-c", "/etc/supervisor/conf.d/supervisord.conf"]

Result: the entrypoint is still prepended

Solution - Iteration 2 (DOES NOT WORK)

Attempt: add pull_policy: always

docker-compose.prod.yml:

web:
  image: 94.16.110.151:5000/framework:latest
  pull_policy: always  # Force registry pull
  command: ["/usr/bin/supervisord", "-c", "/etc/supervisor/conf.d/supervisord.conf"]

Result: the image is pulled from the registry, but the entrypoint is still prepended

Solution - Iteration 3 (SUCCESSFUL)

Fix: clear the entrypoint explicitly with entrypoint: []

docker-compose.prod.yml:

web:
  image: 94.16.110.151:5000/framework:latest
  pull_policy: always  # Always pull from registry, never build
  entrypoint: []       # ← IMPORTANT: clear the entrypoint
  command: ["/usr/bin/supervisord", "-c", "/etc/supervisor/conf.d/supervisord.conf"]
  user: root

Why entrypoint: []?

  • An empty array clears the inherited entrypoint completely
  • command: is then started directly as PID 1
  • Keine wrapper scripts, keine indirection

Verification

docker inspect web --format='{{.Config.Entrypoint}}'
# []  ← Empty!

docker inspect web --format='{{.Config.Cmd}}'
# [/usr/bin/supervisord -c /etc/supervisor/conf.d/supervisord.conf]

docker exec web ps aux
# PID 1: /usr/bin/supervisord -c /etc/supervisor/conf.d/supervisord.conf
# No entrypoint wrapper!

Related Files

  • docker-compose.prod.yml (web service)

Docker Compose Override Rules

Base Config + Override = Final Config

Base:
  web:
    build: docker/nginx
    → inherited ENTRYPOINT from base image

Override (insufficient):
  web:
    image: 94.16.110.151:5000/framework:latest
    command: [...]
    → ENTRYPOINT still prepended to command

Override (correct):
  web:
    image: 94.16.110.151:5000/framework:latest
    entrypoint: []         ← Clears inherited entrypoint
    command: [...]         ← Runs directly as PID 1
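
To see which of the override variants above a running container actually ended up with, the two inspect calls from the diagnosis can be wrapped in helpers. This is a hedged sketch: print_start_command and entrypoint_is_cleared are hypothetical names, and the only Docker features used are docker inspect with the json template function. A cleared entrypoint may be reported as either null or [], so both are accepted.

```shell
# Prints the entrypoint and command a container was created with, as JSON.
print_start_command() {
    docker inspect "$1" \
        --format 'entrypoint={{json .Config.Entrypoint}} cmd={{json .Config.Cmd}}'
}

# Succeeds if the container has no entrypoint (JSON null or an empty list).
# Returns 2 if docker inspect itself fails.
entrypoint_is_cleared() {
    ep=$(docker inspect "$1" --format '{{json .Config.Entrypoint}}') || return 2
    [ "$ep" = "null" ] || [ "$ep" = "[]" ]
}

# Usage:
#   print_start_command web
#   entrypoint_is_cleared web || echo "web still has an inherited entrypoint"
```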

Issue 4: Queue Worker Container Restarts

Symptom

docker ps
# queue-worker   Restarting (1) 5 seconds ago

The container is stuck in a restart loop and never becomes healthy.

Diagnosis

docker logs queue-worker
# Error: /var/www/html/worker.php not found
# or
# php: command not found

Root Cause

The base docker-compose.yml defines the queue worker command for development:

queue-worker:
  command: ["php", "/var/www/html/worker.php"]

worker.php does not exist in the production image.

Solution - Option 1: Disable the Service

Quick Fix: disable the queue worker

docker-compose.prod.yml:

queue-worker:
  deploy:
    replicas: 0  # Disable service

Solution - Option 2: Set the Correct Command

Proper Fix: use the console command

docker-compose.prod.yml:

queue-worker:
  image: 94.16.110.151:5000/framework:latest
  user: root
  command: ["php", "/var/www/html/console.php", "queue:work"]
  # or, for a Supervisor-managed worker:
  # entrypoint: []
  # command: ["/usr/bin/supervisord", "-c", "/etc/supervisor/conf.d/queue-worker-supervisord.conf"]

Verification

docker logs queue-worker
# [timestamp] INFO Queue worker started
# [timestamp] INFO Processing job: ...

Related Files

  • docker-compose.yml (base queue-worker definition)
  • docker-compose.prod.yml (production override)
  • console.php (framework console application)
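
A restart loop like the one in the symptom can be detected without reading docker ps by eye. The sketch below is an assumption-laden helper, not part of the original setup: in_restart_loop is a hypothetical name and the threshold of 3 restarts is an arbitrary default. It reads the RestartCount and State.Restarting fields that docker inspect exposes.

```shell
# Succeeds if the container looks stuck in a restart loop: Docker reports it
# as restarting, or it has been restarted more than MAX times (default 3).
# Usage: in_restart_loop NAME [MAX]
in_restart_loop() {
    max=${2:-3}
    state=$(docker inspect "$1" \
        --format '{{.RestartCount}} {{.State.Restarting}}') || return 2
    count=${state%% *}        # text before the first space
    restarting=${state##* }   # text after the last space
    [ "$restarting" = "true" ] || [ "$count" -gt "$max" ]
}

# Usage:
#   in_restart_loop queue-worker && docker logs --tail 20 queue-worker
```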

Issue 5: HTTP Port 80 Not Reachable

Symptom

curl http://94.16.110.151:8888/
# curl: (7) Failed to connect to 94.16.110.151 port 8888: Connection refused

docker exec web curl http://localhost/
# curl: (7) Failed to connect to localhost port 80: Connection refused

Diagnosis

# Check nginx listening ports
docker exec web netstat -tlnp | grep nginx
# Shows only: 0.0.0.0:443

# Check the nginx config
docker exec web cat /etc/nginx/http.d/default.conf
# No "listen 80;" block

Root Cause - Option 1: Intentional HTTPS-only

HTTP may have been disabled intentionally (security best practice).

Root Cause - Option 2: Missing HTTP Block

The nginx config has no HTTP listener, only HTTPS.

Solution - Add an HTTP→HTTPS Redirect

Fix: configure the HTTP redirect

docker/nginx/default.production.conf:

# HTTP → HTTPS Redirect
server {
    listen 80;
    server_name _;

    location / {
        return 301 https://$host$request_uri;
    }
}

# HTTPS Server
server {
    listen 443 ssl http2;
    server_name _;

    ssl_certificate /var/www/ssl/cert.pem;
    ssl_certificate_key /var/www/ssl/key.pem;

    root /var/www/html/public;
    index index.php;

    location / {
        try_files $uri $uri/ /index.php?$query_string;
    }

    location ~ \.php$ {
        fastcgi_pass php:9000;
        fastcgi_index index.php;
        include fastcgi_params;
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
    }
}

Verification

curl -I http://94.16.110.151:8888/
# HTTP/1.1 301 Moved Permanently
# Location: https://94.16.110.151/
# ($host carries no port; to redirect to the published HTTPS port, use https://$host:8443$request_uri)

curl -k -I https://94.16.110.151:8443/
# HTTP/2 200
# server: nginx

Related Files

  • docker/nginx/default.production.conf
  • Dockerfile.production (COPY nginx config)
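
The redirect verification can be scripted by parsing the response headers. This is a minimal sketch: is_https_redirect is a hypothetical helper name, and it only checks that the status is 301 and that the Location header points at an https URL, not which host or port it names.

```shell
# Reads HTTP response headers on stdin (for example from "curl -sI") and
# succeeds if the response is a 301 whose Location header uses https.
is_https_redirect() {
    # HTTP headers end in CRLF, so strip the carriage returns first.
    tr -d '\r' | awk '
        NR == 1 && $2 == "301" { status_ok = 1 }
        tolower($1) == "location:" && $2 ~ /^https:\/\// { target_ok = 1 }
        END { exit !(status_ok && target_ok) }
    '
}

# Usage, with the address from this section:
#   curl -sI http://94.16.110.151:8888/ | is_https_redirect && echo "redirect OK"
```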

General Debugging Commands

Container Inspection

# Status of all containers
docker-compose -f docker-compose.yml -f docker-compose.prod.yml ps

# Container Details
docker inspect web

# Container Logs
docker logs -f web
docker logs --tail 100 web

# Inside Container
docker exec -it web sh
docker exec -it php sh

Supervisor Debugging

# Supervisor Status
docker exec web supervisorctl status

# Supervisor Logs
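docker logs -f web  # Supervisor logs go to stdout/stderr, so read them via docker logs

# Test the Supervisor config (launches a second supervisord in the foreground; config errors show up at startup)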
docker exec web supervisord -c /etc/supervisor/conf.d/supervisord.conf -n

Nginx Debugging

# Test the nginx config
docker exec web nginx -t

# Nginx reload
docker exec web nginx -s reload

# Nginx listening ports
docker exec web netstat -tlnp | grep nginx

# Nginx processes
docker exec web ps aux | grep nginx

PHP-FPM Debugging

# PHP-FPM Status
docker exec web curl http://localhost/php-fpm-status

# Test the PHP-FPM config
docker exec web php-fpm -t

# PHP-FPM processes
docker exec web ps aux | grep php-fpm

# PHP Version
docker exec web php -v

# PHP Modules
docker exec web php -m

Network Debugging

# Port listening
docker exec web netstat -tlnp

# DNS resolution
docker exec web nslookup db
docker exec web nslookup redis

# Network connectivity
docker exec web ping db
docker exec web ping redis

# HTTP request
docker exec web curl http://localhost/

Database Debugging

# PostgreSQL Connection
docker exec php php -r "new PDO('pgsql:host=db;dbname=framework_db', 'framework_user', 'password');"

# Database Logs
docker logs db

# Connect to DB
docker exec -it db psql -U framework_user -d framework_db

# Check connections
docker exec db psql -U framework_user -d framework_db -c "SELECT count(*) FROM pg_stat_activity;"

Performance Monitoring

# Container Resource Usage
docker stats

# Disk Usage
docker system df

# Image Sizes
docker images

# Volume Sizes
docker system df -v

Checklist for a Successful Deploy

Pre-Deployment

  • Image built: docker build -f Dockerfile.production -t 94.16.110.151:5000/framework:latest .
  • Image pushed: docker push 94.16.110.151:5000/framework:latest
  • Registry available: curl http://94.16.110.151:5000/v2/_catalog
  • WireGuard VPN active: wg show
  • .env.production on the server is up to date
  • docker-compose.prod.yml on the server is up to date

Deployment

  • SSH to the server: ssh deploy@94.16.110.151
  • Pull the image: docker-compose -f docker-compose.yml -f docker-compose.prod.yml pull
  • Start the stack: docker-compose -f docker-compose.yml -f docker-compose.prod.yml up -d

Post-Deployment Verification

  • Containers running: docker-compose ps shows all as "Up (healthy)"
  • Supervisor status: docker exec web supervisorctl status shows nginx/php-fpm RUNNING
  • Nginx listening: docker exec web netstat -tlnp | grep :443
  • PHP-FPM listening: docker exec web netstat -tlnp | grep :9000
  • Application reachable: curl -k -I https://94.16.110.151:8443/ → HTTP/2 200
  • Database reachable: docker exec php php -r "new PDO(...);"
  • Redis reachable: docker exec php php -r "(new Redis())->connect('redis', 6379);"
  • Logs clean: docker logs web shows no errors
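
The post-deployment list lends itself to a small check runner. The following is a sketch only: check and the failed counter are hypothetical names, and the example wiring in the comments reuses commands from this document without claiming they are the author's script.

```shell
# Counts failed checks across calls to check().
failed=0

# Runs one check and prints PASS or FAIL with its label.
# Usage: check "label" command [args...]
check() {
    label=$1
    shift
    if "$@" >/dev/null 2>&1; then
        printf 'PASS  %s\n' "$label"
    else
        printf 'FAIL  %s\n' "$label"
        failed=$((failed + 1))
    fi
}

# Example wiring, using commands from the checklist above:
#   check "supervisor status" docker exec web supervisorctl status
#   check "nginx config"      docker exec web nginx -t
#   check "https reachable"   curl -k -sf -o /dev/null https://94.16.110.151:8443/
#   [ "$failed" -eq 0 ] || echo "$failed check(s) failed"
```

Checks that contain a pipe, such as netstat piped into grep, cannot be passed as plain arguments. Wrap them in sh -c '...' so the whole pipeline runs as one command.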

Monitoring


Quick Reference

Most Common Causes of Failure

  1. Supervisor Logging: use logfile=/dev/null + silent=false
  2. User Permissions: set user: root in docker-compose.prod.yml
  3. Entrypoint Override: set entrypoint: [] to clear the inherited entrypoint
  4. Pull Policy: use pull_policy: always to force the registry image

Most Important Config Changes

  • docker/supervisor/supervisord.conf: logfile=/dev/null, silent=false
  • docker-compose.prod.yml: user: root, entrypoint: [], pull_policy: always
  • docker/php/zz-docker.production.conf: user = www-data, group = www-data