# Production Deployment Troubleshooting Checklist
A systematic approach to resolving common deployment issues.
## Issue 1: Supervisor Log File Permission Denied
### Symptom
```
PermissionError: [Errno 13] Permission denied: '/var/log/supervisor/supervisord.log'
```
The container fails to start because Supervisor cannot write its log file.
### Diagnosis
```bash
docker logs web # Shows the permission error
docker exec web ls -la /var/log/supervisor/ # Directory is missing or not writable
```
### Root Cause
- Supervisor tries to write to `/var/log/supervisor/supervisord.log`
- The directory does not exist or is not writable
- This can fail even as root in a containerized environment
### Solution 1 (DOES NOT WORK)
**Attempt**: use `/proc/self/fd/1`
`docker/supervisor/supervisord.conf`:
```ini
logfile=/proc/self/fd/1
```
**Error**: `PermissionError: [Errno 13] Permission denied: '/proc/self/fd/1'`
**Reason**: Python's logging library (used by Supervisor) cannot open `/proc/self/fd/1` or `/dev/stdout` in append mode.
### Solution 2 (WORKS)
**Fix**: `/dev/null` with `silent=false`
`docker/supervisor/supervisord.conf`:
```ini
[supervisord]
nodaemon=true
; IMPORTANT: keeps console logging enabled even though logfile is /dev/null
silent=false
logfile=/dev/null
logfile_maxbytes=0
pidfile=/var/run/supervisord.pid
loglevel=info
```
**Why does this work?**
- `logfile=/dev/null`: no file logging
- `silent=false`: in foreground mode (`nodaemon=true`), Supervisor still writes its log to stdout
- The logs show up in `docker logs web`
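These settings can be checked before the image is built. The following is a minimal sketch, assuming the config uses plain `key=value` lines without trailing comments; the function name `check_supervisor_logging` is hypothetical and not part of the project.

```bash
#!/usr/bin/env bash
# Hypothetical helper: verify that a supervisord.conf logs to the console
# instead of a file. Assumes plain key=value lines without trailing comments.
check_supervisor_logging() {
  local conf="$1" key
  for key in 'nodaemon=true' 'logfile=/dev/null' 'logfile_maxbytes=0'; do
    # -x matches the whole line, -F treats the pattern as a fixed string
    if ! grep -qxF "$key" "$conf"; then
      echo "missing: $key"
      return 1
    fi
  done
  # silent=true would suppress the console output again
  if grep -qE '^silent[[:space:]]*=[[:space:]]*true' "$conf"; then
    echo "silent=true disables console logging"
    return 1
  fi
  echo "ok"
}
```

It could be called as `check_supervisor_logging docker/supervisor/supervisord.conf`, for example in a CI step ahead of `docker build`.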
### Verification
```bash
docker logs web
# Output:
# 2025-10-28 16:29:59,976 INFO supervisord started with pid 1
# 2025-10-28 16:30:00,980 INFO spawned: 'nginx' with pid 7
# 2025-10-28 16:30:00,982 INFO spawned: 'php-fpm' with pid 8
# 2025-10-28 16:30:02,077 INFO success: nginx entered RUNNING state
# 2025-10-28 16:30:02,077 INFO success: php-fpm entered RUNNING state
```
### Related Files
- `docker/supervisor/supervisord.conf`
- `Dockerfile.production` (COPY supervisord.conf)
---
## Issue 2: Web Container EACCES Errors
### Symptom
```
2025-10-28 16:16:52,152 CRIT could not write pidfile /var/run/supervisord.pid
2025-10-28 16:16:53,154 INFO spawnerr: unknown error making dispatchers for 'nginx': EACCES
2025-10-28 16:16:53,154 INFO spawnerr: unknown error making dispatchers for 'php-fpm': EACCES
```
### Diagnosis
```bash
# Check the container user
docker exec web whoami
# Anything other than "root" is the cause of this issue
# Check the Docker Compose configuration
docker inspect web | grep -i user
# Shows the user inherited from the base config
```
### Root Cause
- The `web` service in `docker-compose.prod.yml` does **not** set `user: root`
- It inherits `user: 1000:1000` or `user: www-data` from the base `docker-compose.yml`
- Supervisor needs root to start the nginx and php-fpm master processes
### Solution
**Fix**: set `user: root` explicitly
`docker-compose.prod.yml`:
```yaml
web:
  image: 94.16.110.151:5000/framework:latest
  user: root # ← ADD THIS
  # ... rest of the config
```
Add it to the `php` and `queue-worker` services as well:
```yaml
php:
  image: 94.16.110.151:5000/framework:latest
  user: root # ← ADD THIS
queue-worker:
  image: 94.16.110.151:5000/framework:latest
  user: root # ← ADD THIS
```
### Why user: root?
- **Container runs as root**: Supervisor master process
- **Nginx master**: root (worker processes run as www-data via nginx.conf)
- **PHP-FPM master**: root (pool workers run as www-data via php-fpm.conf)
`docker/php/zz-docker.production.conf`:
```ini
[www]
; Worker processes run as www-data
user = www-data
group = www-data
```
### Verification
```bash
docker exec web whoami
# root
docker exec web ps aux | grep -E 'nginx|php-fpm'
# root 1 supervisord
# root 7 nginx: master process
# www-data 10 nginx: worker process
# root 8 php-fpm: master process
# www-data 11 php-fpm: pool www
```
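The `whoami` check above can be wrapped in a small function so that a deploy script fails early. This is a sketch under the assumption that `id` is available inside the container; `require_root` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: fail early if a container does not run as root.
# Assumes the `id` utility exists inside the container.
require_root() {
  local container="$1" uid
  # `id -u` prints the numeric user ID; 0 means root
  uid="$(docker exec "$container" id -u)" || return 2
  if [ "$uid" -eq 0 ]; then
    echo "$container: running as root"
  else
    echo "$container: running as uid $uid, expected 0"
    return 1
  fi
}
```

For example, `require_root web && require_root php` stops at the first container that still inherits a non-root user from the base config.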
### Related Files
- `docker-compose.prod.yml` (web, php, queue-worker services)
- `docker/php/zz-docker.production.conf`
- `docker/nginx/nginx.production.conf`
---
## Issue 3: Docker Entrypoint Override Does Not Work
### Symptom
The container command shows the entrypoint prepended:
```bash
docker ps
# COMMAND: "/usr/local/bin/docker-entrypoint.sh /usr/bin/supervisord -c ..."
```
Supervisor is not started directly but through a wrapper script.
### Diagnosis
```bash
# Check the container command
docker inspect web --format='{{.Config.Entrypoint}}'
# [/usr/local/bin/docker-entrypoint.sh]
docker inspect web --format='{{.Config.Cmd}}'
# [/usr/bin/supervisord -c /etc/supervisor/conf.d/supervisord.conf]
```
### Root Cause
1. The base `docker-compose.yml` defines the `web` service with a separate build:
```yaml
web:
  build:
    context: docker/nginx
    dockerfile: Dockerfile
```
2. The production override sets `image:` and `command:` but does **not** clear the image's ENTRYPOINT:
```yaml
web:
  image: 94.16.110.151:5000/framework:latest
  command: ["/usr/bin/supervisord", "-c", "/etc/supervisor/conf.d/supervisord.conf"]
```
3. The base PHP image defines an ENTRYPOINT, which gets prepended
4. `command:` only replaces CMD, so Docker still runs ENTRYPOINT + CMD as the final command
### Solution - Iteration 1 (DOES NOT WORK)
❌ **Attempt**: set only `command:`
`docker-compose.prod.yml`:
```yaml
web:
  image: 94.16.110.151:5000/framework:latest
  command: ["/usr/bin/supervisord", "-c", "/etc/supervisor/conf.d/supervisord.conf"]
```
**Result**: the entrypoint is still prepended
### Solution - Iteration 2 (DOES NOT WORK)
❌ **Attempt**: add `pull_policy: always`
`docker-compose.prod.yml`:
```yaml
web:
  image: 94.16.110.151:5000/framework:latest
  pull_policy: always # Force registry pull
  command: ["/usr/bin/supervisord", "-c", "/etc/supervisor/conf.d/supervisord.conf"]
```
**Result**: the image is pulled from the registry, but the entrypoint is still prepended
### Solution - Iteration 3 (WORKS)
✅ **Fix**: clear the entrypoint explicitly with `entrypoint: []`
`docker-compose.prod.yml`:
```yaml
web:
  image: 94.16.110.151:5000/framework:latest
  pull_policy: always # Always pull from registry, never build
  entrypoint: [] # ← IMPORTANT: clear the entrypoint
  command: ["/usr/bin/supervisord", "-c", "/etc/supervisor/conf.d/supervisord.conf"]
  user: root
```
**Why `entrypoint: []`?**
- An empty array clears the inherited entrypoint completely
- `command:` is then started directly as PID 1
- No wrapper scripts, no indirection
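How the entrypoint and the command combine can be illustrated without Docker. The sketch below only models the rule (ENTRYPOINT followed by CMD) with plain strings; `effective_command` is a hypothetical name, and real containers pass argument arrays rather than a single string.

```bash
#!/usr/bin/env bash
# Illustrative model only: Docker starts a container with ENTRYPOINT followed
# by CMD. With an empty entrypoint, CMD is the whole command line.
effective_command() {
  local entrypoint="$1" cmd="$2"
  if [ -z "$entrypoint" ]; then
    echo "$cmd"
  else
    echo "$entrypoint $cmd"
  fi
}
```

Calling `effective_command "/usr/local/bin/docker-entrypoint.sh" "/usr/bin/supervisord -c ..."` reproduces the wrapped command from the symptom, while an empty first argument leaves only the Supervisor command.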
### Verification
```bash
docker inspect web --format='{{.Config.Entrypoint}}'
# [] ← Empty!
docker inspect web --format='{{.Config.Cmd}}'
# [/usr/bin/supervisord -c /etc/supervisor/conf.d/supervisord.conf]
docker exec web ps aux
# PID 1: /usr/bin/supervisord -c /etc/supervisor/conf.d/supervisord.conf
# No entrypoint wrapper!
```
### Related Files
- `docker-compose.prod.yml` (web service)
### Docker Compose Override Rules
```
Base Config + Override = Final Config
Base:
  web:
    build: docker/nginx
    → inherited ENTRYPOINT from base image
Override (insufficient):
  web:
    image: 94.16.110.151:5000/framework:latest
    command: [...]
    → ENTRYPOINT still prepended to command
Override (correct):
  web:
    image: 94.16.110.151:5000/framework:latest
    entrypoint: [] ← Clears inherited entrypoint
    command: [...] ← Runs directly as PID 1
```
---
## Issue 4: Queue Worker Container Restarts
### Symptom
```bash
docker ps
# queue-worker Restarting (1) 5 seconds ago
```
The container is stuck in a restart loop and never becomes healthy.
### Diagnosis
```bash
docker logs queue-worker
# Error: /var/www/html/worker.php not found
# or
# php: command not found
```
### Root Cause
The base `docker-compose.yml` defines the queue worker command for development:
```yaml
queue-worker:
  command: ["php", "/var/www/html/worker.php"]
```
`worker.php` does not exist in the production image.
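Whether a script is present in an image can be checked without starting the full stack. The following sketch assumes the image ships a `test` binary (as busybox and coreutils do); `image_has_file` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: check whether a file exists inside an image.
# Replaces the entrypoint with `test`, so no wrapper script runs.
# Assumes the image contains a `test` binary.
image_has_file() {
  local image="$1" path="$2"
  # Runs `test -f <path>` in a throwaway container; the exit code is the result
  docker run --rm --entrypoint test "$image" -f "$path"
}
```

For example, `image_has_file 94.16.110.151:5000/framework:latest /var/www/html/worker.php || echo "worker.php missing"` confirms the root cause before the override is changed.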
### Solution - Option 1: Disable the Service
✅ **Quick Fix**: disable the queue worker
`docker-compose.prod.yml`:
```yaml
queue-worker:
  deploy:
    replicas: 0 # Disable service
```
### Solution - Option 2: Set the Correct Command
✅ **Proper Fix**: use the console command
`docker-compose.prod.yml`:
```yaml
queue-worker:
  image: 94.16.110.151:5000/framework:latest
  user: root
  command: ["php", "/var/www/html/console.php", "queue:work"]
  # or, for a Supervisor-managed worker:
  # entrypoint: []
  # command: ["/usr/bin/supervisord", "-c", "/etc/supervisor/conf.d/queue-worker-supervisord.conf"]
```
### Verification
```bash
docker logs queue-worker
# [timestamp] INFO Queue worker started
# [timestamp] INFO Processing job: ...
```
### Related Files
- `docker-compose.yml` (base queue-worker definition)
- `docker-compose.prod.yml` (production override)
- `console.php` (framework console application)
---
## Issue 5: HTTP Port 80 Not Reachable
### Symptom
```bash
curl http://94.16.110.151:8888/
# curl: (7) Failed to connect to 94.16.110.151 port 8888: Connection refused
docker exec web curl http://localhost/
# curl: (7) Failed to connect to localhost port 80: Connection refused
```
### Diagnosis
```bash
# Check the ports nginx is listening on
docker exec web netstat -tlnp | grep nginx
# Shows only: 0.0.0.0:443
# Check the nginx config
docker exec web cat /etc/nginx/http.d/default.conf
# No "listen 80;" block
```
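The manual config inspection can also be scripted against the config file in the repository. This is a rough sketch that only looks for `listen` directives on port 80 and ignores `include` files; `has_http_listener` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if an nginx config file has a listener on port 80.
# Matches "listen 80;", "listen [::]:80;" and "listen 0.0.0.0:80 default_server;",
# but not "listen 8080;" or commented-out lines. Does not follow includes.
has_http_listener() {
  grep -qE '^[[:space:]]*listen[[:space:]]+([^[:space:];]*:)?80([[:space:]][^;]*)?;' "$1"
}
```

A call such as `has_http_listener docker/nginx/default.production.conf || echo "no HTTP listener"` distinguishes a missing HTTP block from a networking problem.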
### Root Cause - Option 1: Intentional HTTPS-only
HTTP may have been disabled on purpose (a security best practice).
### Root Cause - Option 2: Missing HTTP Block
The nginx config has no HTTP listener, only HTTPS.
### Solution - Add an HTTP→HTTPS Redirect
✅ **Fix**: configure an HTTP redirect
`docker/nginx/default.production.conf`:
```nginx
# HTTP → HTTPS Redirect
server {
    listen 80;
    server_name _;
    location / {
        return 301 https://$host$request_uri;
    }
}
# HTTPS Server
server {
    listen 443 ssl http2;
    server_name _;
    ssl_certificate /var/www/ssl/cert.pem;
    ssl_certificate_key /var/www/ssl/key.pem;
    root /var/www/html/public;
    index index.php;
    location / {
        try_files $uri $uri/ /index.php?$query_string;
    }
    location ~ \.php$ {
        fastcgi_pass php:9000;
        fastcgi_index index.php;
        include fastcgi_params;
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
    }
}
```
### Verification
```bash
curl -I http://94.16.110.151:8888/
# HTTP/1.1 301 Moved Permanently
# Location: https://94.16.110.151/ ($host carries no port, so the redirect targets the default HTTPS port)
curl -k -I https://94.16.110.151:8443/
# HTTP/2 200
# server: nginx
```
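To assert the redirect in a script instead of reading the headers by eye, the response can be reduced to status code and target. A minimal sketch, assuming `awk` is available on the host; `parse_redirect` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read HTTP response headers on stdin and print
# "<status> <location>" so a redirect can be asserted in a script.
parse_redirect() {
  awk '
    { sub(/\r$/, "") }                       # strip the CR at the end of each header line
    NR == 1 { status = $2 }                  # e.g. "HTTP/1.1 301 Moved Permanently"
    tolower($1) == "location:" { loc = $2 }  # header names are case-insensitive
    END { print status, loc }
  '
}
```

Usage: `curl -sI http://94.16.110.151:8888/ | parse_redirect` prints the status code followed by the redirect target, which can then be compared against the expected HTTPS URL.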
### Related Files
- `docker/nginx/default.production.conf`
- `Dockerfile.production` (COPY nginx config)
---
## General Debugging Commands
### Container Inspection
```bash
# Status of all containers
docker-compose -f docker-compose.yml -f docker-compose.prod.yml ps
# Container Details
docker inspect web
# Container Logs
docker logs -f web
docker logs --tail 100 web
# Inside Container
docker exec -it web sh
docker exec -it php sh
```
### Supervisor Debugging
```bash
# Supervisor Status
docker exec web supervisorctl status
# Supervisor logs (written to stdout/stderr, so read them via Docker)
docker logs -f web
# Test the Supervisor config (starts a second supervisord in the foreground; config errors are reported at startup)
docker exec web supervisord -c /etc/supervisor/conf.d/supervisord.conf -n
```
### Nginx Debugging
```bash
# Test the nginx config
docker exec web nginx -t
# Nginx reload
docker exec web nginx -s reload
# Nginx listening ports
docker exec web netstat -tlnp | grep nginx
# Nginx processes
docker exec web ps aux | grep nginx
```
### PHP-FPM Debugging
```bash
# PHP-FPM Status
docker exec web curl http://localhost/php-fpm-status
# Test the PHP-FPM config
docker exec web php-fpm -t
# PHP-FPM processes
docker exec web ps aux | grep php-fpm
# PHP Version
docker exec web php -v
# PHP Modules
docker exec web php -m
```
### Network Debugging
```bash
# Port listening
docker exec web netstat -tlnp
# DNS resolution
docker exec web nslookup db
docker exec web nslookup redis
# Network connectivity
docker exec web ping db
docker exec web ping redis
# HTTP request
docker exec web curl http://localhost/
```
### Database Debugging
```bash
# PostgreSQL Connection
docker exec php php -r "new PDO('pgsql:host=db;dbname=framework_db', 'framework_user', 'password');"
# Database Logs
docker logs db
# Connect to DB
docker exec -it db psql -U framework_user -d framework_db
# Check connections
docker exec db psql -U framework_user -d framework_db -c "SELECT count(*) FROM pg_stat_activity;"
```
### Performance Monitoring
```bash
# Container Resource Usage
docker stats
# Disk Usage
docker system df
# Image Sizes
docker images
# Volume Sizes
docker system df -v
```
---
## Checklist for a Successful Deploy
### Pre-Deployment
- [ ] Image built: `docker build -f Dockerfile.production -t 94.16.110.151:5000/framework:latest .`
- [ ] Image pushed: `docker push 94.16.110.151:5000/framework:latest`
- [ ] Registry available: `curl http://94.16.110.151:5000/v2/_catalog`
- [ ] WireGuard VPN active: `wg show`
- [ ] `.env.production` up to date on the server
- [ ] `docker-compose.prod.yml` up to date on the server
### Deployment
- [ ] SSH to the server: `ssh deploy@94.16.110.151`
- [ ] Pull the image: `docker-compose -f docker-compose.yml -f docker-compose.prod.yml pull`
- [ ] Start the stack: `docker-compose -f docker-compose.yml -f docker-compose.prod.yml up -d`
### Post-Deployment Verification
- [ ] Containers running: `docker-compose -f docker-compose.yml -f docker-compose.prod.yml ps` shows all "Up (healthy)"
- [ ] Supervisor status: `docker exec web supervisorctl status` shows nginx/php-fpm RUNNING
- [ ] Nginx listening: `docker exec web netstat -tlnp | grep :443`
- [ ] PHP-FPM listening: `docker exec web netstat -tlnp | grep :9000`
- [ ] Application reachable: `curl -k -I https://94.16.110.151:8443/` → HTTP/2 200
- [ ] Database reachable: `docker exec php php -r "new PDO(...);"`
- [ ] Redis reachable: `docker exec php php -r "(new Redis())->connect('redis', 6379);"`
- [ ] Logs clean: `docker logs web` shows no errors
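The verification steps above lend themselves to a small runner that executes every check and reports all failures at once rather than stopping at the first. This is a sketch, not the project's deploy tooling; `run_check` and `failures` are hypothetical names.

```bash
#!/usr/bin/env bash
# Hypothetical runner for the post-deployment checks: each check is a label
# plus a command; failures are counted instead of aborting on the first one.
failures=0
run_check() {
  local label="$1"
  shift
  # Run the remaining arguments as the check command, discarding its output
  if "$@" >/dev/null 2>&1; then
    echo "[ok]   $label"
  else
    echo "[FAIL] $label"
    failures=$((failures + 1))
  fi
}

# Example usage (commands taken from the checklist):
#   run_check "Supervisor status" docker exec web supervisorctl status
#   run_check "Nginx config"      docker exec web nginx -t
#   [ "$failures" -eq 0 ] || exit 1
```

Counting failures keeps the full picture visible after a deploy: one broken check does not hide the state of the others.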
### Monitoring
- [ ] Prometheus: http://10.8.0.1:9090 reachable
- [ ] Grafana: http://10.8.0.1:3000 reachable
- [ ] Portainer: https://10.8.0.1:9443 reachable
- [ ] Watchtower active: `docker logs watchtower` shows update checks
---
## Quick Reference
### Most Common Causes of Errors
1. **Supervisor Logging**: use `logfile=/dev/null` + `silent=false`
2. **User Permissions**: set `user: root` in docker-compose.prod.yml
3. **Entrypoint Override**: set `entrypoint: []` to clear the inherited entrypoint
4. **Pull Policy**: use `pull_policy: always` to force the registry image
### Key Config Changes
- `docker/supervisor/supervisord.conf`: `logfile=/dev/null`, `silent=false`
- `docker-compose.prod.yml`: `user: root`, `entrypoint: []`, `pull_policy: always`
- `docker/php/zz-docker.production.conf`: `user = www-data`, `group = www-data`