feat: CI/CD pipeline setup complete - Ansible playbooks updated, secrets configured, workflow ready
381
docs/deployment/docker-swarm-deployment.md
Normal file
@@ -0,0 +1,381 @@
# Docker Swarm + Traefik Deployment Guide

Production deployment guide for the Custom PHP Framework using Docker Swarm orchestration with Traefik load balancer.

## Architecture Overview

```
Internet → Traefik (SSL Termination, Load Balancing)
                ↓
    [Web Service - 3 Replicas]
        ↓               ↓
    Database          Redis          Queue Workers
  (PostgreSQL)   (Cache/Sessions)    (2 Replicas)
```

**Key Components**:
- **Traefik v2.10**: Reverse proxy, SSL termination, automatic service discovery
- **Web Service**: 3 replicas of PHP-FPM + Nginx (HTTP only, Traefik handles HTTPS)
- **PostgreSQL 16**: Single instance database (manager node)
- **Redis 7**: Sessions and cache (manager node)
- **Queue Workers**: 2 replicas for background job processing
- **Docker Swarm**: Native container orchestration with rolling updates and health checks

## Prerequisites

1. **Docker Engine 28.0+** with Swarm mode enabled
2. **Production Server** with SSH access
3. **SSL Certificates** in `./ssl/` directory (cert.pem, key.pem)
4. **Environment Variables** in `.env` file on production server
5. **Docker Image** built and available

## Initial Setup

### 1. Initialize Docker Swarm

On production server:
```bash
docker swarm init
```

Verify:
```bash
docker node ls
# Should show 1 node as Leader
```

### 2. Create Docker Secrets

Create the secrets from the values in the `.env` file:

```bash
cd /home/deploy/framework

# Load the .env values into the current shell
# (assumes .env uses shell-compatible KEY=value lines)
set -a; . ./.env; set +a

# Create secrets (one-time setup).
# printf avoids the trailing newline that echo would store in the secret.
printf '%s' "$DB_PASSWORD" | docker secret create db_password -
printf '%s' "$APP_KEY" | docker secret create app_key -
printf '%s' "$VAULT_ENCRYPTION_KEY" | docker secret create vault_encryption_key -
printf '%s' "$SHOPIFY_WEBHOOK_SECRET" | docker secret create shopify_webhook_secret -
printf '%s' "$RAPIDMAIL_PASSWORD" | docker secret create rapidmail_password -
```

Or use the automated script:
```bash
./scripts/setup-production-secrets.sh
```

Verify secrets:
```bash
docker secret ls
```
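
The verification step can also be scripted. The following is a minimal sketch, assuming the five secret names created above; `missing_secrets` and `EXPECTED_SECRETS` are hypothetical names introduced here for illustration and are not part of the repository.

```bash
# Sketch only: list expected secrets that are not yet present in the Swarm.
# EXPECTED_SECRETS mirrors the five names created above (an assumption if
# the setup script creates a different set).
EXPECTED_SECRETS="db_password app_key vault_encryption_key shopify_webhook_secret rapidmail_password"

# Reads existing secret names on stdin (one per line), prints every
# expected name that is absent, and returns 1 if anything is missing.
missing_secrets() {
  local existing name status=0
  existing="$(cat)"
  for name in $EXPECTED_SECRETS; do
    if ! printf '%s\n' "$existing" | grep -Fqx -- "$name"; then
      printf '%s\n' "$name"
      status=1
    fi
  done
  return "$status"
}

# Usage on the manager node:
#   docker secret ls --format '{{.Name}}' | missing_secrets
```

Reading the names from stdin keeps the check independent of the Docker CLI, so it can be exercised with plain text input before running it against the Swarm.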

### 3. Build and Transfer Docker Image

On local machine:

**Option A: Via Private Registry** (if available):
```bash
# Build image
docker build -f Dockerfile.production -t 94.16.110.151:5000/framework:latest .

# Push to registry
docker push 94.16.110.151:5000/framework:latest
```

**Option B: Direct Transfer via SSH** (recommended for now):
```bash
# Build image
docker build -f Dockerfile.production -t 94.16.110.151:5000/framework:latest .

# Save and transfer to production
docker save 94.16.110.151:5000/framework:latest | \
  ssh -i ~/.ssh/production deploy@94.16.110.151 'docker load'
```

### 4. Deploy Stack

On production server:
```bash
cd /home/deploy/framework

# Deploy the stack
docker stack deploy -c docker-compose.prod.yml framework

# Monitor deployment
watch docker stack ps framework

# Check service status
docker stack services framework
```

## Health Monitoring

### Check Service Status

```bash
# List all services
docker stack services framework

# Check specific service
docker service ps framework_web

# View service logs
docker service logs framework_web -f
docker service logs framework_traefik -f
docker service logs framework_db -f
```

### Health Check Endpoints

- **Main Health**: http://localhost/health (via Traefik)
- **Traefik Dashboard**: http://traefik.localhost:8080 (manager node only)
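
After a deploy it can take a moment before the health endpoint answers. The sketch below polls it with a small retry loop; `retry` is a hypothetical helper introduced here, and the attempt count and delay are illustrative values, not settings taken from the project.

```bash
# Sketch only: retry <attempts> <delay_seconds> <command...>
# Runs the command until it succeeds or the attempts are used up.
retry() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Sleep between attempts, but not after the last one
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Usage on the production server (30 attempts, 2 seconds apart):
#   retry 30 2 curl -fsS -o /dev/null http://localhost/health
```

`curl -f` fails only on HTTP status 400 and above, so a redirect from port 80 to 443 would count as success; adding `-L` makes curl follow the redirect and test the final response instead.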

### Expected Service Replicas

| Service | Replicas | Purpose |
|---------|----------|---------|
| traefik | 1 | Reverse proxy + SSL |
| web | 3 | Application servers |
| db | 1 | PostgreSQL database |
| redis | 1 | Cache + sessions |
| queue-worker | 2 | Background jobs |
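
To compare the running state against the table, the replica column of `docker stack services` can be checked mechanically. This is a minimal sketch; `unconverged_services` is a hypothetical helper name, and it assumes input lines of the form `<name> <running>/<desired>`.

```bash
# Sketch only: print services whose running replica count differs from
# the desired count. Expects "<name> <running>/<desired>" on each line.
unconverged_services() {
  awk '{ split($2, r, "/"); if (r[1] != r[2]) print $1 }'
}

# Usage on the manager node:
#   docker stack services framework --format '{{.Name}} {{.Replicas}}' | unconverged_services
```

An empty result means every service reports as many running tasks as it requests.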

## Rolling Updates

### Update Application

1. Build new image with updated code:
```bash
docker build -f Dockerfile.production -t 94.16.110.151:5000/framework:latest .
```

2. Transfer to production (if no registry):
```bash
docker save 94.16.110.151:5000/framework:latest | \
  ssh -i ~/.ssh/production deploy@94.16.110.151 'docker load'
```

3. Update the service:
```bash
# On production server
# If the tag is unchanged and Swarm reports nothing to update, add --force
docker service update --image 94.16.110.151:5000/framework:latest framework_web
```

The update will:
- Roll out to 1 container at a time (`parallelism: 1`)
- Wait 10 seconds between updates (`delay: 10s`)
- Start new container before stopping old one (`order: start-first`)
- Automatically roll back on failure (`failure_action: rollback`)
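
The settings in parentheses above presumably live in the `deploy.update_config` section of `docker-compose.prod.yml`, which is not shown here. As a sketch, the same behavior can be requested explicitly on the command line; `rolling_update` and the `DRY_RUN` switch are hypothetical names introduced for illustration.

```bash
# Sketch only: rolling_update <image> <service>
# Passes the four update settings as explicit `docker service update` flags.
# With DRY_RUN=1 the command is printed instead of executed.
rolling_update() {
  local image="$1" service="$2"
  set -- docker service update \
    --update-parallelism 1 \
    --update-delay 10s \
    --update-order start-first \
    --update-failure-action rollback \
    --image "$image" "$service"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Usage on the production server:
#   DRY_RUN=1 rolling_update 94.16.110.151:5000/framework:latest framework_web
#   rolling_update 94.16.110.151:5000/framework:latest framework_web
```

Flags given to `docker service update` are stored in the service spec, so they also apply to later updates of the same service.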

### Monitor Update Progress

```bash
# Watch update status
watch docker service ps framework_web

# View update logs
docker service logs framework_web -f --tail 50
```

### Manual Rollback

If needed, roll back to the previous version:
```bash
docker service rollback framework_web
```

## Troubleshooting

### Service Won't Start

Check service logs:
```bash
docker service logs framework_web --tail 100
```

Check task failures:
```bash
docker service ps framework_web --no-trunc
```

### Container Crashing

Inspect individual container:
```bash
# Get container ID
docker ps -a | grep framework_web

# View logs
docker logs <container_id>

# Exec into running container
docker exec -it <container_id> bash
```

### SSL/TLS Issues

Traefik handles SSL termination. Check Traefik logs:
```bash
docker service logs framework_traefik -f
```

Verify SSL certificates are mounted in docker-compose.prod.yml:
```yaml
volumes:
  - ./ssl:/ssl:ro
```

### Database Connection Issues

Check PostgreSQL health:
```bash
docker service logs framework_db --tail 50

# Exec into db container
docker exec -it $(docker ps -q -f name=framework_db) psql -U postgres -d framework_prod
```

### Redis Connection Issues

Check Redis availability:
```bash
docker service logs framework_redis --tail 50

# Test Redis connection
docker exec -it $(docker ps -q -f name=framework_redis) redis-cli ping
```

### Performance Issues

Check resource usage:
```bash
# Service resource limits
docker service inspect framework_web --format='{{json .Spec.TaskTemplate.Resources}}' | jq

# Container stats
docker stats
```

## Scaling

### Scale Web Service

```bash
# Scale up to 5 replicas
docker service scale framework_web=5

# Scale down to 2 replicas
docker service scale framework_web=2
```

### Scale Queue Workers

```bash
# Scale workers based on queue backlog
docker service scale framework_queue-worker=4
```
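
Scaling by backlog needs a rule that maps the number of pending jobs to a worker count. The sketch below uses ceiling division with a lower and upper bound; `workers_for_backlog` is a hypothetical helper, and the jobs-per-worker figure and bounds in the usage lines are illustrative assumptions. How the backlog itself is measured depends on the queue backend and is not covered here.

```bash
# Sketch only: workers_for_backlog <backlog> <jobs_per_worker> <min> <max>
# Prints ceil(backlog / jobs_per_worker), clamped to the range [min, max].
workers_for_backlog() {
  local backlog="$1" per="$2" min="$3" max="$4" n
  n=$(( (backlog + per - 1) / per ))   # integer ceiling division
  if [ "$n" -lt "$min" ]; then n="$min"; fi
  if [ "$n" -gt "$max" ]; then n="$max"; fi
  printf '%s\n' "$n"
}

# Usage, assuming $backlog holds the current number of pending jobs:
#   docker service scale framework_queue-worker="$(workers_for_backlog "$backlog" 50 2 8)"
```

The lower bound keeps the two default workers running when the queue is empty, and the upper bound prevents a backlog spike from exhausting the node.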

## Cleanup

### Remove Stack

```bash
# Remove entire stack
docker stack rm framework

# Verify removal
docker stack ls
```

### Remove Secrets

```bash
# List secrets
docker secret ls

# Remove specific secret
docker secret rm db_password

# Remove all framework secrets
# (remove the stack first; a secret still used by a service cannot be deleted)
docker secret rm db_password app_key vault_encryption_key shopify_webhook_secret rapidmail_password
```

### Leave Swarm

```bash
# Force leave Swarm (removes all services and secrets)
docker swarm leave --force
```

## Network Architecture

### Overlay Networks

- **traefik-public**: External network for Traefik ↔ Web communication
- **backend**: Internal network for Web ↔ Database/Redis communication

### Port Mappings

| Port | Service | Purpose |
|------|---------|---------|
| 80 | Traefik | HTTP (redirects to 443) |
| 443 | Traefik | HTTPS (production traffic) |
| 8080 | Traefik | Dashboard (manager node only) |

## Volume Management

### Named Volumes

| Volume | Purpose | Mounted In |
|--------|---------|------------|
| traefik-logs | Traefik access logs | traefik |
| storage-logs | Application logs | web, queue-worker |
| storage-uploads | User uploads | web |
| storage-queue | Queue data | queue-worker |
| db-data | PostgreSQL data | db |
| redis-data | Redis persistence | redis |

### Backup Volumes

```bash
# Backup database
docker exec $(docker ps -q -f name=framework_db) pg_dump -U postgres framework_prod > backup.sql

# Backup Redis (if persistence enabled)
docker exec $(docker ps -q -f name=framework_redis) redis-cli --rdb /data/dump.rdb
```
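
Writing every dump to `backup.sql` overwrites the previous one. A minimal sketch of timestamped, compressed dump names follows; `backup_name` is a hypothetical helper and the naming scheme is an assumption, not a project convention.

```bash
# Sketch only: backup_name <database> [timestamp]
# Prints "<database>-<UTC timestamp>.sql.gz". The optional second argument
# overrides the timestamp, which is mainly useful for testing.
backup_name() {
  local db="$1" ts="${2:-$(date -u +%Y%m%dT%H%M%SZ)}"
  printf '%s-%s.sql.gz\n' "$db" "$ts"
}

# Usage on the production server:
#   docker exec "$(docker ps -q -f name=framework_db)" \
#     pg_dump -U postgres framework_prod | gzip > "$(backup_name framework_prod)"
```

UTC timestamps in this format sort chronologically by filename, which keeps later pruning of old dumps simple.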

## Security Best Practices

1. **Secrets Management**: Never commit secrets to version control; use Docker Secrets
2. **Network Isolation**: Backend network is internal-only, with no external access
3. **SSL/TLS**: Traefik enforces HTTPS and redirects HTTP → HTTPS
4. **Health Checks**: All services have health checks with automatic restart
5. **Resource Limits**: Production services have memory/CPU limits
6. **Least Privilege**: Containers run as www-data (not root) where possible

## Phase 2 - Monitoring (Coming Soon)

- Prometheus for metrics collection
- Grafana dashboards
- Automated PostgreSQL backups
- Email/Slack alerting

## Phase 3 - CI/CD (Coming Soon)

- Gitea Actions workflow
- Loki + Promtail for log aggregation
- Performance tuning

## Phase 4 - High Availability (Future)

- Multi-node Swarm cluster
- Varnish CDN cache layer
- PostgreSQL Primary/Replica with pgpool
- MinIO object storage

## References

- [Docker Swarm Documentation](https://docs.docker.com/engine/swarm/)
- [Traefik v2 Documentation](https://doc.traefik.io/traefik/)
- [Docker Secrets Management](https://docs.docker.com/engine/swarm/secrets/)