Docker Swarm + Traefik Deployment Guide

Production deployment guide for the Custom PHP Framework, using Docker Swarm for orchestration and Traefik as reverse proxy and load balancer.

Architecture Overview

Internet → Traefik (SSL Termination, Load Balancing)
            ↓
    [Web Service - 3 Replicas]
       ↓         ↓
    Database   Redis   Queue Workers
    (PostgreSQL) (Cache/Sessions) (2 Replicas)

Key Components:

  • Traefik v2.10: Reverse proxy, SSL termination, automatic service discovery
  • Web Service: 3 replicas of PHP-FPM + Nginx (HTTP only, Traefik handles HTTPS)
  • PostgreSQL 16: Single instance database (manager node)
  • Redis 7: Sessions and cache (manager node)
  • Queue Workers: 2 replicas for background job processing
  • Docker Swarm: Native container orchestration with rolling updates and health checks

Prerequisites

  1. Docker Engine 28.0+ with Swarm mode enabled
  2. Production Server with SSH access
  3. SSL Certificates in ./ssl/ directory (cert.pem, key.pem)
  4. Environment Variables in .env file on production server
  5. Docker Image built and available

Initial Setup

1. Initialize Docker Swarm

On production server:

docker swarm init

Verify:

docker node ls
# Should show 1 node as Leader
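
For use in deployment scripts, the same verification can be sketched as a guard function. This is a minimal sketch: `require_swarm_manager` is an illustrative name that is not part of the project, and it assumes the `docker info` template fields `.Swarm.LocalNodeState` and `.Swarm.ControlAvailable` are available in the installed Docker version.

```shell
# Sketch: fail early if this node is not an active Swarm manager.
# require_swarm_manager is an illustrative helper name, not part of the project.
require_swarm_manager() {
    state=$(docker info --format '{{.Swarm.LocalNodeState}}') || return 1
    manager=$(docker info --format '{{.Swarm.ControlAvailable}}') || return 1

    if [ "$state" != "active" ]; then
        echo "Swarm is not active on this node (state: $state)" >&2
        return 1
    fi
    if [ "$manager" != "true" ]; then
        echo "This node is a worker; run deployments on a manager" >&2
        return 1
    fi
    echo "Swarm manager ready"
}
```

Calling `require_swarm_manager || exit 1` at the top of a deploy script stops the run before any stack command is attempted on the wrong node.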

2. Create Docker Secrets

Create secrets from .env file values:

cd /home/deploy/framework

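# Create secrets (one-time setup).
# Assumes the values from .env are already set in the current shell.
# printf '%s' avoids the trailing newline that echo would store in the secret.
printf '%s' "$DB_PASSWORD" | docker secret create db_password -
printf '%s' "$APP_KEY" | docker secret create app_key -
printf '%s' "$VAULT_ENCRYPTION_KEY" | docker secret create vault_encryption_key -
printf '%s' "$SHOPIFY_WEBHOOK_SECRET" | docker secret create shopify_webhook_secret -
printf '%s' "$RAPIDMAIL_PASSWORD" | docker secret create rapidmail_password -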

Or use the automated script:

./scripts/setup-production-secrets.sh

Verify secrets:

docker secret ls
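
If the values should be read directly from the .env file rather than from the shell environment, something like the following may help. It is a sketch under assumptions: `env_value` and `create_secret_from_env` are hypothetical helpers (the contents of `setup-production-secrets.sh` are not shown here), and the parser only handles simple `KEY=value` lines with optional surrounding quotes.

```shell
# Sketch: read KEY=value from an env file and store it as a Docker secret.
# env_value and create_secret_from_env are illustrative names.

# Print the value of KEY from an env file (first match, surrounding quotes stripped).
env_value() {
    key=$1
    file=${2:-.env}
    sed -n "s/^${key}=//p" "$file" | head -n 1 \
        | sed -e 's/^"\(.*\)"$/\1/' -e "s/^'\(.*\)'\$/\1/"
}

# Create the secret only if it does not exist yet; printf avoids a trailing newline.
create_secret_from_env() {
    key=$1
    name=$2
    file=${3:-.env}
    if docker secret inspect "$name" >/dev/null 2>&1; then
        echo "secret $name already exists, skipping"
        return 0
    fi
    value=$(env_value "$key" "$file")
    if [ -z "$value" ]; then
        echo "no value for $key in $file" >&2
        return 1
    fi
    printf '%s' "$value" | docker secret create "$name" - >/dev/null
    echo "secret $name created"
}
```

Usage would look like `create_secret_from_env DB_PASSWORD db_password`. Because existing secrets are skipped, the call is safe to repeat.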

3. Build and Transfer Docker Image

On local machine:

Option A: Via Private Registry (if available):

# Build image
docker build -f Dockerfile.production -t 94.16.110.151:5000/framework:latest .

# Push to registry
docker push 94.16.110.151:5000/framework:latest

Option B: Direct Transfer via SSH (recommended for now):

# Build image
docker build -f Dockerfile.production -t 94.16.110.151:5000/framework:latest .

# Save and transfer to production
docker save 94.16.110.151:5000/framework:latest | \
    ssh -i ~/.ssh/production deploy@94.16.110.151 'docker load'

4. Deploy Stack

On production server:

cd /home/deploy/framework

# Deploy the stack
docker stack deploy -c docker-compose.prod.yml framework

# Monitor deployment
watch docker stack ps framework

# Check service status
docker stack services framework

Health Monitoring

Check Service Status

# List all services
docker stack services framework

# Check specific service
docker service ps framework_web

# View service logs
docker service logs framework_web -f
docker service logs framework_traefik -f
docker service logs framework_db -f

Health Check Endpoints

Expected Service Replicas

| Service      | Replicas | Purpose             |
|--------------|----------|---------------------|
| traefik      | 1        | Reverse proxy + SSL |
| web          | 3        | Application servers |
| db           | 1        | PostgreSQL database |
| redis        | 1        | Cache + sessions    |
| queue-worker | 2        | Background jobs     |
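
The expected counts above can be compared against the live stack with a small filter. This is a sketch: `check_replicas` is an illustrative helper, and it assumes the services carry the `framework_` prefix used elsewhere in this guide.

```shell
# Sketch: compare running replica counts with the expected table above.
# Reads "name current/desired" lines on stdin, e.g. from:
#   docker stack services framework --format '{{.Name}} {{.Replicas}}'
# check_replicas and the hard-coded expected map are illustrative.
check_replicas() {
    awk '
        BEGIN {
            expected["framework_traefik"] = 1
            expected["framework_web"] = 3
            expected["framework_db"] = 1
            expected["framework_redis"] = 1
            expected["framework_queue-worker"] = 2
        }
        {
            split($2, r, "/")            # r[1] = running, r[2] = desired
            seen[$1] = 1
            if (!($1 in expected)) next  # ignore services not in the table
            if (r[1] + 0 != expected[$1]) {
                printf "%s: %d running, expected %d\n", $1, r[1], expected[$1]
                bad = 1
            }
        }
        END {
            for (name in expected)
                if (!(name in seen)) {
                    printf "%s: service missing\n", name
                    bad = 1
                }
            exit bad + 0
        }
    '
}
```

Piping the `docker stack services` output shown in the comment into `check_replicas` prints one line per mismatch and exits non-zero if any service deviates. After a deliberate scale change the hard-coded numbers need adjusting.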

Rolling Updates

Update Application

  1. Build new image with updated code:
docker build -f Dockerfile.production -t 94.16.110.151:5000/framework:latest .
  2. Transfer to production (if no registry):
docker save 94.16.110.151:5000/framework:latest | \
    ssh -i ~/.ssh/production deploy@94.16.110.151 'docker load'
  3. Update the service:
# On production server
# If the image was loaded locally under an unchanged tag, the service spec may not change
# and tasks may not be restarted; adding --force recreates them.
docker service update --image 94.16.110.151:5000/framework:latest framework_web

The update will:

  • Roll out to 1 container at a time (parallelism: 1)
  • Wait 10 seconds between updates (delay: 10s)
  • Start new container before stopping old one (order: start-first)
  • Automatically roll back on failure (failure_action: rollback)
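
The policy listed above presumably lives in the `update_config` section of docker-compose.prod.yml. As a sketch, the same settings can also be passed explicitly on the command line, which is useful when a service was created without them. `rolling_update` is an illustrative wrapper name.

```shell
# Sketch: pass the rolling-update policy explicitly on the command line.
# rolling_update is an illustrative wrapper; the flags mirror the settings listed above.
rolling_update() {
    service=$1
    image=$2
    docker service update \
        --update-parallelism 1 \
        --update-delay 10s \
        --update-order start-first \
        --update-failure-action rollback \
        --image "$image" \
        "$service"
}
```

For example: `rolling_update framework_web 94.16.110.151:5000/framework:latest`. Flags given this way are stored in the service spec, so they also apply to later updates.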

Monitor Update Progress

# Watch update status
watch docker service ps framework_web

# View update logs
docker service logs framework_web -f --tail 50
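
For unattended deployments, the update state can be polled instead of watched. The following is a sketch: `wait_for_update`, the 300 second default timeout, and the 5 second interval are illustrative choices. It relies on the `UpdateStatus.State` field of `docker service inspect`, which is only present once a service has been updated at least once.

```shell
# Sketch: poll the update state until it completes, pauses, or starts rolling back.
# wait_for_update, the default timeout and the polling interval are illustrative choices.
wait_for_update() {
    service=$1
    timeout=${2:-300}
    interval=${3:-5}
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        # The {{if}} guard avoids a template error when no update has happened yet
        state=$(docker service inspect \
            --format '{{if .UpdateStatus}}{{.UpdateStatus.State}}{{end}}' "$service")
        case "$state" in
            completed) echo "update completed"; return 0 ;;
            rollback_*|paused) echo "update failed: $state" >&2; return 1 ;;
        esac
        sleep "$interval"
        elapsed=$((elapsed + interval))
    done
    echo "timed out waiting for $service" >&2
    return 2
}
```

A deploy script could run `wait_for_update framework_web || exit 1` right after the `docker service update` call.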

Manual Rollback

If needed, roll back to the previous version:

docker service rollback framework_web

Troubleshooting

Service Won't Start

Check service logs:

docker service logs framework_web --tail 100

Check task failures:

docker service ps framework_web --no-trunc

Container Crashing

Inspect individual container:

# Get container ID
docker ps -a | grep framework_web

# View logs
docker logs <container_id>

# Exec into running container
docker exec -it <container_id> bash

SSL/TLS Issues

Traefik handles SSL termination. Check Traefik logs:

docker service logs framework_traefik -f

Verify SSL certificates are mounted in docker-compose.prod.yml:

volumes:
  - ./ssl:/ssl:ro
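
Before digging into Traefik itself, it can be worth confirming that the certificate files exist on the host and are still valid. A minimal sketch, assuming `openssl` is installed on the production server; `check_cert` and the 30-day threshold are illustrative choices.

```shell
# Sketch: confirm the certificate files exist and the certificate is not about to expire.
# check_cert and the 30-day default threshold are illustrative choices.
check_cert() {
    dir=${1:-./ssl}
    days=${2:-30}
    for f in cert.pem key.pem; do
        if [ ! -r "$dir/$f" ]; then
            echo "missing or unreadable: $dir/$f" >&2
            return 2
        fi
    done
    # -checkend exits non-zero if the certificate expires within the given number of seconds
    if ! openssl x509 -in "$dir/cert.pem" -noout -checkend $((days * 86400)) >/dev/null; then
        echo "certificate is invalid or expires within $days days" >&2
        return 1
    fi
    # Print the expiry date for reference
    openssl x509 -in "$dir/cert.pem" -noout -enddate
}
```

Run `check_cert ./ssl` from /home/deploy/framework. Exit code 2 points at a missing file, exit code 1 at the certificate itself.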

Database Connection Issues

Check PostgreSQL health:

docker service logs framework_db --tail 50

# Exec into db container
docker exec -it $(docker ps -q -f name=framework_db) psql -U postgres -d framework_prod

Redis Connection Issues

Check Redis availability:

docker service logs framework_redis --tail 50

# Test Redis connection
docker exec -it $(docker ps -q -f name=framework_redis) redis-cli ping

Performance Issues

Check resource usage:

# Service resource limits
docker service inspect framework_web --format='{{json .Spec.TaskTemplate.Resources}}' | jq

# Container stats
docker stats

Scaling

Scale Web Service

# Scale up to 5 replicas
docker service scale framework_web=5

# Scale down to 2 replicas
docker service scale framework_web=2

Scale Queue Workers

# Scale workers based on queue backlog
docker service scale framework_queue-worker=4
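
How the backlog maps to a worker count is not specified in this guide. One possible rule, sketched below, is ceiling division by a per-worker capacity, clamped to a minimum and maximum. `workers_for_backlog` and all of its defaults (100 jobs per worker, 2 to 8 workers) are hypothetical tuning values.

```shell
# Sketch: derive a worker count from the queue backlog.
# workers_for_backlog and its defaults (jobs per worker, min, max) are hypothetical.
workers_for_backlog() {
    backlog=$1
    per_worker=${2:-100}
    min=${3:-2}
    max=${4:-8}
    # Ceiling division: enough workers so each handles at most per_worker jobs
    n=$(( (backlog + per_worker - 1) / per_worker ))
    if [ "$n" -lt "$min" ]; then n=$min; fi
    if [ "$n" -gt "$max" ]; then n=$max; fi
    echo "$n"
}
```

With the backlog size in `$backlog` (how it is measured depends on the queue implementation), scaling becomes `docker service scale framework_queue-worker="$(workers_for_backlog "$backlog")"`.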

Cleanup

Remove Stack

# Remove entire stack
docker stack rm framework

# Verify removal
docker stack ls

Remove Secrets

# List secrets
docker secret ls

# Remove specific secret
docker secret rm db_password

# Remove all framework secrets
docker secret ls --format '{{.Name}}' | grep -E "^(db_password|app_key|vault_encryption_key|shopify_webhook_secret|rapidmail_password)$" | xargs docker secret rm

Leave Swarm

# Force leave Swarm (removes all services and secrets)
docker swarm leave --force

Network Architecture

Overlay Networks

  • traefik-public: External network for Traefik ↔ Web communication
  • backend: Internal network for Web ↔ Database/Redis communication

Port Mappings

| Port | Service | Purpose                       |
|------|---------|-------------------------------|
| 80   | Traefik | HTTP (redirects to 443)       |
| 443  | Traefik | HTTPS (production traffic)    |
| 8080 | Traefik | Dashboard (manager node only) |

Volume Management

Named Volumes

| Volume          | Purpose             | Mounted In        |
|-----------------|---------------------|-------------------|
| traefik-logs    | Traefik access logs | traefik           |
| storage-logs    | Application logs    | web, queue-worker |
| storage-uploads | User uploads        | web               |
| storage-queue   | Queue data          | queue-worker      |
| db-data         | PostgreSQL data     | db                |
| redis-data      | Redis persistence   | redis             |

Backup Volumes

# Backup database
docker exec $(docker ps -q -f name=framework_db) pg_dump -U postgres framework_prod > backup.sql

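# Backup Redis: stream an RDB snapshot into a separate file
# (writing to /data/dump.rdb could overwrite the server's own persistence file)
docker exec $(docker ps -q -f name=framework_redis) redis-cli --rdb /data/backup.rdb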
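
The database backup command above overwrites backup.sql on every run. A timestamped, compressed variant might look like the sketch below. `backup_db`, the `./backups` directory, and the file naming are illustrative choices; the database name and user are taken from the commands in this guide.

```shell
# Sketch: timestamped, compressed database dump.
# backup_db, the default ./backups directory and the file naming are illustrative choices.
backup_db() {
    dir=${1:-./backups}
    mkdir -p "$dir"
    cid=$(docker ps -q -f name=framework_db | head -n 1)
    if [ -z "$cid" ]; then
        echo "no running framework_db container on this node" >&2
        return 1
    fi
    out="$dir/framework_prod-$(date +%Y%m%d-%H%M%S).sql"
    # Dump first, compress afterwards, so a failed pg_dump is detected
    if docker exec "$cid" pg_dump -U postgres framework_prod > "$out"; then
        gzip -f "$out" && echo "$out.gz"
    else
        rm -f "$out"
        echo "pg_dump failed" >&2
        return 1
    fi
}
```

`backup_db /home/deploy/backups` prints the path of the created archive, which makes it easy to chain into a copy to off-server storage.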

Security Best Practices

  1. Secrets Management: Never commit secrets to version control, use Docker Secrets
  2. Network Isolation: Backend network is internal-only, no external access
  3. SSL/TLS: Traefik enforces HTTPS, redirects HTTP → HTTPS
  4. Health Checks: All services have health checks with automatic restart
  5. Resource Limits: Production services have memory/CPU limits
  6. Least Privilege: Containers run as www-data (not root) where possible

Phase 2 - Monitoring (Coming Soon)

  • Prometheus for metrics collection
  • Grafana dashboards
  • Automated PostgreSQL backups
  • Email/Slack alerting

Phase 3 - CI/CD (Coming Soon)

  • Gitea Actions workflow
  • Loki + Promtail for log aggregation
  • Performance tuning

Phase 4 - High Availability (Future)

  • Multi-node Swarm cluster
  • Varnish CDN cache layer
  • PostgreSQL Primary/Replica with pgpool
  • MinIO object storage
