Docker Swarm + Traefik Deployment Guide
Production deployment guide for the Custom PHP Framework, using Docker Swarm for orchestration and Traefik as the load balancer.
Architecture Overview
Internet → Traefik (SSL Termination, Load Balancing)
↓
[Web Service - 3 Replicas]
↓ ↓
Database Redis Queue Workers
(PostgreSQL) (Cache/Sessions) (2 Replicas)
Key Components:
- Traefik v2.10: Reverse proxy, SSL termination, automatic service discovery
- Web Service: 3 replicas of PHP-FPM + Nginx (HTTP only, Traefik handles HTTPS)
- PostgreSQL 16: Single instance database (manager node)
- Redis 7: Sessions and cache (manager node)
- Queue Workers: 2 replicas for background job processing
- Docker Swarm: Native container orchestration with rolling updates and health checks
Prerequisites
- Docker Engine 28.0+ with Swarm mode enabled
- Production Server with SSH access
- SSL Certificates in ./ssl/ directory (cert.pem, key.pem)
- Environment Variables in .env file on production server
- Docker Image built and available
Initial Setup
1. Initialize Docker Swarm
On production server:
docker swarm init
Verify:
docker node ls
# Should show 1 node as Leader
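Before moving on to secrets and deployment, it can help to confirm programmatically that this node is an active Swarm manager. The following is a minimal sketch that reads two fields from `docker info`; the function name `swarm_manager_ready` is an illustrative choice and not part of the project.

```shell
# Succeeds only if this node has joined a swarm and is a manager.
swarm_manager_ready() {
  smr_state=$(docker info --format '{{.Swarm.LocalNodeState}}') || return 1
  smr_manager=$(docker info --format '{{.Swarm.ControlAvailable}}') || return 1
  if [ "$smr_state" = "active" ] && [ "$smr_manager" = "true" ]; then
    echo "swarm manager ready"
  else
    echo "not ready: state=$smr_state manager=$smr_manager" >&2
    return 1
  fi
}

# Usage: swarm_manager_ready && echo "safe to continue"
```

Secrets and stacks can only be managed from a manager node, which is why the check looks at both fields rather than only the swarm state.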
2. Create Docker Secrets
Create secrets from .env file values:
cd /home/deploy/framework
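# Load the .env values into the current shell (assumes .env uses shell-compatible KEY=value lines)
set -a; . ./.env; set +a
# Create secrets (one-time setup); printf avoids the trailing newline that echo would store in the secret
printf '%s' "$DB_PASSWORD" | docker secret create db_password -
printf '%s' "$APP_KEY" | docker secret create app_key -
printf '%s' "$VAULT_ENCRYPTION_KEY" | docker secret create vault_encryption_key -
printf '%s' "$SHOPIFY_WEBHOOK_SECRET" | docker secret create shopify_webhook_secret -
printf '%s' "$RAPIDMAIL_PASSWORD" | docker secret create rapidmail_password -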
Or use the automated script:
./scripts/setup-production-secrets.sh
Verify secrets:
docker secret ls
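Running `docker secret create` a second time for an existing name fails, so a repeatable setup needs an existence check. The sketch below shows one way to do that while reading values straight from the `.env` file. It is not the contents of `setup-production-secrets.sh`; the helper names `env_value` and `create_secret_from_env` are hypothetical, and the parser only handles plain `KEY=value` lines with optional surrounding quotes.

```shell
# Print the value of KEY from a KEY=value file without sourcing it.
env_value() {
  # $1 = key, $2 = path to .env
  ev_line=$(grep -E "^$1=" "$2" | tail -n 1)
  [ -n "$ev_line" ] || return 1
  ev_value=${ev_line#*=}
  # Strip one pair of surrounding quotes, if present.
  case "$ev_value" in
    \"*\") ev_value=${ev_value#\"}; ev_value=${ev_value%\"} ;;
    \'*\') ev_value=${ev_value#\'}; ev_value=${ev_value%\'} ;;
  esac
  printf '%s' "$ev_value"
}

# Create a secret from a .env key unless a secret of that name already exists.
create_secret_from_env() {
  # $1 = secret name, $2 = .env key, $3 = path to .env
  if docker secret inspect "$1" >/dev/null 2>&1; then
    echo "exists: $1"
    return 0
  fi
  cs_value=$(env_value "$2" "$3") || { echo "missing key: $2" >&2; return 1; }
  printf '%s' "$cs_value" | docker secret create "$1" - >/dev/null && echo "created: $1"
}

# Usage: create_secret_from_env db_password DB_PASSWORD .env
```

Because Docker secrets are immutable, changing a value means removing and recreating the secret (or creating one under a new name) rather than re-running the function.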
3. Build and Transfer Docker Image
On local machine:
Option A: Via Private Registry (if available):
# Build image
docker build -f Dockerfile.production -t 94.16.110.151:5000/framework:latest .
# Push to registry
docker push 94.16.110.151:5000/framework:latest
Option B: Direct Transfer via SSH (recommended for now):
# Build image
docker build -f Dockerfile.production -t 94.16.110.151:5000/framework:latest .
# Save and transfer to production
docker save 94.16.110.151:5000/framework:latest | \
ssh -i ~/.ssh/production deploy@94.16.110.151 'docker load'
4. Deploy Stack
On production server:
cd /home/deploy/framework
# Deploy the stack
docker stack deploy -c docker-compose.prod.yml framework
# Monitor deployment
watch docker stack ps framework
# Check service status
docker stack services framework
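For scripted deployments, watching the output by hand can be replaced with a polling loop. The sketch below treats the stack as converged once every service reports as many running replicas as desired (for example `3/3`). The function names and the `WAIT_INTERVAL` variable are illustrative assumptions.

```shell
# Succeeds when every service in the stack reports running == desired replicas.
stack_converged() {
  docker stack services "$1" --format '{{.Name}} {{.Replicas}}' |
    awk '{
      split($2, r, "/")
      if (r[1] != r[2]) { print "waiting: " $1 " " $2; pending = 1 }
    }
    END { exit (pending || NR == 0) }'
}

# Poll until the stack converges or the attempts run out.
wait_for_stack() {
  # $1 = stack name, $2 = max attempts (default 30)
  wfs_left=${2:-30}
  while [ "$wfs_left" -gt 0 ]; do
    if stack_converged "$1"; then
      echo "stack $1 converged"
      return 0
    fi
    wfs_left=$((wfs_left - 1))
    sleep "${WAIT_INTERVAL:-5}"
  done
  echo "stack $1 did not converge" >&2
  return 1
}

# Usage: wait_for_stack framework 60
```

An empty service list counts as not converged, so a mistyped stack name fails instead of reporting success.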
Health Monitoring
Check Service Status
# List all services
docker stack services framework
# Check specific service
docker service ps framework_web
# View service logs
docker service logs framework_web -f
docker service logs framework_traefik -f
docker service logs framework_db -f
Health Check Endpoints
- Main Health: http://localhost/health (via Traefik)
- Traefik Dashboard: http://traefik.localhost:8080 (manager node only)
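The health endpoint can also be probed from a script. The following is a sketch that treats any 2xx response as healthy; the function name `check_health` is an assumption. Since port 80 is described as redirecting to 443, a plain HTTP request may return a 3xx status, in which case pointing the check at the HTTPS URL is likely more useful.

```shell
# Report whether a URL answers with a 2xx status code.
check_health() {
  # $1 = URL to probe
  ch_code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$1") || ch_code=000
  case "$ch_code" in
    2??) echo "healthy ($ch_code)"; return 0 ;;
    *)   echo "unhealthy ($ch_code)"; return 1 ;;
  esac
}

# Usage: check_health http://localhost/health
```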
Expected Service Replicas
| Service | Replicas | Purpose |
|---|---|---|
| traefik | 1 | Reverse proxy + SSL |
| web | 3 | Application servers |
| db | 1 | PostgreSQL database |
| redis | 1 | Cache + sessions |
| queue-worker | 2 | Background jobs |
Rolling Updates
Update Application
- Build new image with updated code:
docker build -f Dockerfile.production -t 94.16.110.151:5000/framework:latest .
- Transfer to production (if no registry):
docker save 94.16.110.151:5000/framework:latest | \
ssh -i ~/.ssh/production deploy@94.16.110.151 'docker load'
- Update the service:
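# On production server
# If the image was loaded locally under an unchanged tag, Swarm may see no spec change; add --force to recreate the tasks
docker service update --image 94.16.110.151:5000/framework:latest framework_web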
The update will:
- Roll out to 1 container at a time (parallelism: 1)
- Wait 10 seconds between updates (delay: 10s)
- Start the new container before stopping the old one (order: start-first)
- Automatically roll back on failure (failure_action: rollback)
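The four settings listed above come from the stack's update configuration. If they ever need to be set explicitly for a single update, `docker service update` accepts equivalent flags. The sketch below wraps them in a hypothetical helper, `rolling_update`; whether the compose file already sets these values is not shown in this guide, so treat the flags as a fallback rather than a requirement.

```shell
# Update a service's image with explicit rolling-update settings.
rolling_update() {
  # $1 = service name, $2 = image reference
  docker service update \
    --update-parallelism 1 \
    --update-delay 10s \
    --update-order start-first \
    --update-failure-action rollback \
    --image "$2" \
    "$1"
}

# Usage: rolling_update framework_web 94.16.110.151:5000/framework:latest
```

With `start-first`, the node briefly runs one extra task per update step, so it needs enough spare memory for that additional container.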
Monitor Update Progress
# Watch update status
watch docker service ps framework_web
# View update logs
docker service logs framework_web -f --tail 50
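In a deployment script, the update state can be polled instead of watched. The sketch below reads `UpdateStatus.State` from `docker service inspect` and stops on a terminal state. The function names are illustrative, and treating any rollback state as a failed update is a design assumption.

```shell
# Print the service's update state, or "none" if it has never been updated.
update_state() {
  docker service inspect "$1" \
    --format '{{if .UpdateStatus}}{{.UpdateStatus.State}}{{else}}none{{end}}'
}

# Poll until the update completes, fails, or the attempts run out.
wait_for_update() {
  # $1 = service name, $2 = max attempts (default 60)
  wfu_left=${2:-60}
  while [ "$wfu_left" -gt 0 ]; do
    wfu_state=$(update_state "$1") || return 1
    case "$wfu_state" in
      completed|none)    echo "update finished: $wfu_state"; return 0 ;;
      paused|rollback_*) echo "update failed: $wfu_state" >&2; return 1 ;;
    esac
    wfu_left=$((wfu_left - 1))
    sleep "${WAIT_INTERVAL:-5}"
  done
  echo "timed out waiting for $1" >&2
  return 1
}

# Usage: wait_for_update framework_web || docker service ps framework_web --no-trunc
```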
Manual Rollback
If needed, roll back to the previous version:
docker service rollback framework_web
Troubleshooting
Service Won't Start
Check service logs:
docker service logs framework_web --tail 100
Check task failures:
docker service ps framework_web --no-trunc
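The full task list can be long, so it is often quicker to show only the tasks that recorded an error. The following sketch filters the `docker service ps` output on its error column; the helper name `failed_tasks` and the `|` field separator are illustrative choices.

```shell
# List only the tasks of a service that carry an error message.
failed_tasks() {
  # $1 = service name
  docker service ps "$1" --no-trunc \
    --format '{{.Name}}|{{.CurrentState}}|{{.Error}}' |
    awk -F'|' '$3 != "" { printf "%s: %s (%s)\n", $1, $3, $2 }'
}

# Usage: failed_tasks framework_web
```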
Container Crashing
Inspect individual container:
# Get container ID
docker ps -a | grep framework_web
# View logs
docker logs <container_id>
# Exec into running container
docker exec -it <container_id> bash
SSL/TLS Issues
Traefik handles SSL termination. Check Traefik logs:
docker service logs framework_traefik -f
Verify SSL certificates are mounted in docker-compose.prod.yml:
volumes:
- ./ssl:/ssl:ro
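An expired certificate is a common cause of TLS errors, and it can be ruled out before digging through Traefik logs. The sketch below uses the `-checkend` option of `openssl x509`, which exits non-zero if the certificate expires within the given number of seconds. The helper name and the 30-day threshold in the usage line are assumptions.

```shell
# Succeeds if the certificate is still valid for at least the given number of days.
cert_valid_for_days() {
  # $1 = path to certificate (PEM), $2 = days
  openssl x509 -in "$1" -noout -checkend "$(( $2 * 86400 ))" >/dev/null
}

# Usage: cert_valid_for_days ./ssl/cert.pem 30 || echo "certificate expires within 30 days"
```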
Database Connection Issues
Check PostgreSQL health:
docker service logs framework_db --tail 50
# Exec into db container
docker exec -it $(docker ps -q -f name=framework_db) psql -U postgres -d framework_prod
Redis Connection Issues
Check Redis availability:
docker service logs framework_redis --tail 50
# Test Redis connection
docker exec -it $(docker ps -q -f name=framework_redis) redis-cli ping
Performance Issues
Check resource usage:
# Service resource limits
docker service inspect framework_web --format='{{json .Spec.TaskTemplate.Resources}}' | jq
# Container stats
docker stats
Scaling
Scale Web Service
# Scale up to 5 replicas
docker service scale framework_web=5
# Scale down to 2 replicas
docker service scale framework_web=2
Scale Queue Workers
# Scale workers based on queue backlog
docker service scale framework_queue-worker=4
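Scaling on queue backlog comes down to a small calculation: divide the number of pending jobs by what one worker should handle, round up, and clamp the result. The sketch below shows that arithmetic. The helper name, the per-worker figure, and the bounds are all hypothetical, and how the backlog is measured depends on the queue backend, which this guide does not specify.

```shell
# Compute a worker count from the backlog, clamped to [min, max].
desired_workers() {
  # $1 = pending jobs, $2 = jobs one worker should handle, $3 = min, $4 = max
  dw_count=$(( ($1 + $2 - 1) / $2 ))   # ceiling division
  if [ "$dw_count" -lt "$3" ]; then dw_count=$3; fi
  if [ "$dw_count" -gt "$4" ]; then dw_count=$4; fi
  echo "$dw_count"
}

# Usage (backlog obtained elsewhere):
#   docker service scale framework_queue-worker="$(desired_workers "$backlog" 50 2 8)"
```

Keeping the minimum at the stack's default of 2 replicas means an empty queue never scales the workers away entirely.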
Cleanup
Remove Stack
# Remove entire stack
docker stack rm framework
# Verify removal
docker stack ls
Remove Secrets
# List secrets
docker secret ls
# Remove specific secret
docker secret rm db_password
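# Remove all framework secrets (remove the stack first; a secret still used by a service cannot be deleted)
docker secret ls --format '{{.Name}}' | grep -E '^(db_password|app_key|vault_encryption_key|shopify_webhook_secret|rapidmail_password)$' | xargs docker secret rm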
Leave Swarm
# Force leave Swarm (removes all services and secrets)
docker swarm leave --force
Network Architecture
Overlay Networks
- traefik-public: External network for Traefik ↔ Web communication
- backend: Internal network for Web ↔ Database/Redis communication
Port Mappings
| Port | Service | Purpose |
|---|---|---|
| 80 | Traefik | HTTP (redirects to 443) |
| 443 | Traefik | HTTPS (production traffic) |
| 8080 | Traefik | Dashboard (manager node only) |
Volume Management
Named Volumes
| Volume | Purpose | Mounted In |
|---|---|---|
| traefik-logs | Traefik access logs | traefik |
| storage-logs | Application logs | web, queue-worker |
| storage-uploads | User uploads | web |
| storage-queue | Queue data | queue-worker |
| db-data | PostgreSQL data | db |
| redis-data | Redis persistence | redis |
Backup Volumes
# Backup database
docker exec $(docker ps -q -f name=framework_db) pg_dump -U postgres framework_prod > backup.sql
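# Backup Redis: write the snapshot to a separate file so Redis's own snapshot (by default /data/dump.rdb in the official image) is not overwritten
docker exec $(docker ps -q -f name=framework_redis) redis-cli --rdb /data/backup.rdb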
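A fixed file name such as `backup.sql` is overwritten on every run, and a failed dump can leave an empty file behind. The following sketch writes timestamped dumps and discards empty ones. The helper name `backup_db` and the file naming scheme are assumptions; the container lookup and `pg_dump` arguments mirror the commands above.

```shell
# Dump the database to a timestamped file and print the file's path.
backup_db() {
  # $1 = target directory (default: current directory)
  bd_dir=${1:-.}
  bd_file="$bd_dir/framework_prod_$(date +%Y%m%d_%H%M%S).sql"
  bd_cid=$(docker ps -q -f name=framework_db | head -n 1)
  if [ -z "$bd_cid" ]; then
    echo "no running framework_db container on this node" >&2
    return 1
  fi
  # Keep the dump only if pg_dump succeeded and produced output.
  if docker exec "$bd_cid" pg_dump -U postgres framework_prod > "$bd_file" && [ -s "$bd_file" ]; then
    echo "$bd_file"
  else
    rm -f "$bd_file"
    return 1
  fi
}

# Usage: backup_db /home/deploy/backups
```

The target directory must already exist. `docker ps` only sees containers on the local node, which matches the single-node setup described here.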
Security Best Practices
- Secrets Management: Never commit secrets to version control; use Docker Secrets instead
- Network Isolation: Backend network is internal-only, no external access
- SSL/TLS: Traefik enforces HTTPS, redirects HTTP → HTTPS
- Health Checks: All services have health checks with automatic restart
- Resource Limits: Production services have memory/CPU limits
- Least Privilege: Containers run as www-data (not root) where possible
Phase 2 - Monitoring (Coming Soon)
- Prometheus for metrics collection
- Grafana dashboards
- Automated PostgreSQL backups
- Email/Slack alerting
Phase 3 - CI/CD (Coming Soon)
- Gitea Actions workflow
- Loki + Promtail for log aggregation
- Performance tuning
Phase 4 - High Availability (Future)
- Multi-node Swarm cluster
- Varnish CDN cache layer
- PostgreSQL Primary/Replica with pgpool
- MinIO object storage