michaelschiemer/deployment/legacy/gitea-workflows/README.md
2025-11-24 21:28:25 +01:00


# Gitea CI/CD Workflows Documentation
Comprehensive guide for the automated deployment workflows using Gitea Actions.
## Overview
This project uses Gitea Actions for automated deployments to staging and production environments. The workflows are designed for:
- **Zero-downtime deployments** via rolling updates
- **Automatic rollback** on deployment failures
- **Environment-specific configurations** using Docker Compose overlays
- **Database protection** with automated backups (production)
- **Comprehensive health checks** and smoke tests
- **Deployment audit trail** via persistent logs
## Workflow Files
### 1. Staging Deployment (`deploy-staging.yml`)
**Purpose:** Automated deployment to staging environment for testing and validation.
**Triggers:**
- Push to `staging` branch
- Manual workflow dispatch
**Target:** `https://staging.michaelschiemer.de`
**Key Features:**
- Fast deployment cycle (30-second health check)
- Basic health verification
- Keeps 5 deployment backups
- No database backup (non-critical environment)
**Workflow Steps:**
1. Build Docker image with `ENV=staging`
2. Push to private registry (`localhost:5000`)
3. Deploy to staging server via SSH
4. Basic health checks
5. Automatic rollback on failure
---
### 2. Production Deployment (`deploy-production.yml`)
**Purpose:** Production deployment with enhanced safety features and verification.
**Triggers:**
- Push to `main` or `production` branches
- Manual workflow dispatch
**Target:** `https://michaelschiemer.de`
**Key Features:**
- **Database backup** before deployment (aborts on backup failure)
- **Database migrations** after container startup
- **Extended health checks** (60-second wait, multiple verification layers)
- **Smoke tests** for functional verification
- **Deployment logging** for audit trail
- **Graceful shutdown** for active request handling
- Keeps 10 deployment backups
**Workflow Steps:**
1. Build Docker image with `ENV=production`
2. Push to private registry
3. Create database backup (optional via `skip_backup` input)
4. Gracefully stop current containers
5. Deploy new containers
6. Run database migrations
7. Extended health verification
8. Smoke tests (main page + API)
9. Automatic rollback on failure
10. Log deployment outcome
11. Clean up build artifacts
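
The ordering of these steps and the rollback-on-failure behaviour can be sketched as a small shell driver. This is an illustrative sketch under stated assumptions, not the contents of `deploy-production.yml`: the function name `run_deployment` and every step name passed to it are hypothetical.

```bash
# Hypothetical sketch: run deployment steps in order and invoke a rollback
# command as soon as one step fails. Not taken from deploy-production.yml.
run_deployment() {
  local rollback_cmd="$1" step
  shift
  for step in "$@"; do
    echo "==> $step"
    if ! "$step"; then
      echo "Step '$step' failed, rolling back" >&2
      "$rollback_cmd"
      return 1
    fi
  done
  echo "Deployment succeeded"
}
```

Each step would be a shell function or script, for example `run_deployment rollback backup_database stop_containers start_containers run_migrations health_check smoke_test`, where all of those names are placeholders. Later steps are skipped once a step fails.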
---
## Required Gitea Secrets
Configure these secrets in your Gitea repository settings (`Settings` → `Secrets`).
### Staging Secrets
| Secret Name | Description | Example Value |
|-------------|-------------|---------------|
| `STAGING_HOST` | Staging server hostname or IP | `staging.example.com` or `203.0.113.42` |
| `STAGING_USER` | SSH username for staging server | `deploy` or `www-data` |
| `STAGING_SSH_KEY` | Private SSH key (PEM format) | `-----BEGIN RSA PRIVATE KEY-----...` |
| `STAGING_SSH_PORT` | SSH port (optional, defaults to 22) | `22` or `2222` |
### Production Secrets
| Secret Name | Description | Example Value |
|-------------|-------------|---------------|
| `PRODUCTION_HOST` | Production server hostname or IP | `michaelschiemer.de` or `198.51.100.10` |
| `PRODUCTION_USER` | SSH username for production server | `deploy` or `www-data` |
| `PRODUCTION_SSH_KEY` | Private SSH key (PEM format) | `-----BEGIN RSA PRIVATE KEY-----...` |
| `PRODUCTION_SSH_PORT` | SSH port (optional, defaults to 22) | `22` or `2222` |
**SSH Key Generation:**
```bash
# Generate SSH key pair (on your local machine)
# -m PEM writes the private key in PEM format (-----BEGIN RSA PRIVATE KEY-----)
ssh-keygen -t rsa -b 4096 -m PEM -f deployment_key -C "gitea-deployment"
# Copy public key to target server
ssh-copy-id -i deployment_key.pub deploy@server.example.com
# Add private key to Gitea secrets (entire content)
cat deployment_key
```
**Security Best Practices:**
- Use dedicated deployment user with minimal permissions
- Restrict SSH key to specific commands via `authorized_keys` options
- Rotate SSH keys regularly (quarterly recommended)
- Never commit SSH keys to repository
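
Restricting a key via `authorized_keys` options could look like the sketch below. It relies on the OpenSSH `restrict` and `command="..."` options (`restrict` requires OpenSSH 7.2 or newer). The helper name `restricted_key_entry` and the wrapper path in the usage comment are hypothetical.

```bash
# Build an authorized_keys entry that forces a single command and disables
# forwarding and PTY allocation (OpenSSH "restrict" option, 7.2 or newer).
# The wrapper path used in the usage example is a hypothetical placeholder.
restricted_key_entry() {
  local forced_command="$1" public_key="$2"
  printf 'restrict,command="%s" %s\n' "$forced_command" "$public_key"
}

# Usage (on the target server, as the deployment user):
#   restricted_key_entry /usr/local/bin/deploy-wrapper.sh "$(cat deployment_key.pub)" \
#     >> ~/.ssh/authorized_keys
```

With a forced command, sshd passes the command requested by the client in `SSH_ORIGINAL_COMMAND`, so the wrapper would need to allow exactly the commands the workflows run over SSH.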
---
## Manual Workflow Triggering
### Via Gitea UI
1. Navigate to your repository
2. Click `Actions` tab
3. Select the workflow (`Deploy to Staging` or `Deploy to Production`)
4. Click `Run workflow`
5. Choose branch
6. Set input parameters (if applicable)
7. Click `Run workflow`
### Via Git Push
**Staging Deployment:**
```bash
# Push to staging branch
git checkout staging
git merge develop
git push origin staging
# Workflow triggers automatically
```
**Production Deployment:**
```bash
# Push to main/production branch
git checkout main
git merge staging
git push origin main
# Workflow triggers automatically
```
### Workflow Input Parameters
**Production Workflow:**
- `force_rebuild`: Force rebuild Docker image even if code hasn't changed (default: `false`)
- `skip_backup`: Skip database backup step - **NOT RECOMMENDED** (default: `false`)
**Use Case for `skip_backup`:**
Emergency hotfix deployment when backup would cause unacceptable delay. Only use if:
- Recent backup exists
- Issue is critical (security vulnerability, production down)
- Backup failure is blocking deployment
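
Before setting `skip_backup`, the "recent backup exists" condition can be checked on the server. A minimal sketch, assuming backups are stored as `backup_*.sql` files as in the rollback section of this guide; the helper name and the 60-minute threshold in the usage comment are assumptions to adjust.

```bash
# Succeeds if DIR contains a backup_*.sql file modified within the last
# MAX_AGE_MIN minutes. The file naming mirrors storage/backups/ in this guide;
# the age threshold is an assumption.
has_recent_backup() {
  local dir="$1" max_age_min="$2"
  find "$dir" -maxdepth 1 -type f -name 'backup_*.sql' -mmin "-$max_age_min" | grep -q .
}

# Usage:
#   has_recent_backup /opt/framework-production/current/storage/backups 60 \
#     || echo "No backup from the last hour, do not skip the backup step"
```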
---
## Deployment Monitoring
### Real-Time Monitoring
**Via Gitea UI:**
1. Navigate to `Actions` tab
2. Click on running workflow
3. View real-time logs for each step
4. Check for errors or warnings
**Via Server Logs:**
```bash
# SSH to target server
ssh deploy@server.example.com
# Staging logs
tail -f /opt/framework-staging/current/storage/logs/app.log
# Production logs
tail -f /opt/framework-production/current/storage/logs/app.log
# Deployment log (production only)
tail -f /opt/framework-production/deployment.log
```
### Deployment Status Verification
**Check Container Status:**
```bash
# Staging
cd /opt/framework-staging/current
docker-compose -f docker-compose.base.yml -f docker-compose.staging.yml ps
# Production
cd /opt/framework-production/current
docker-compose -f docker-compose.base.yml -f docker-compose.prod.yml ps
```
**Health Check Endpoints:**
```bash
# Staging
curl -k https://staging.michaelschiemer.de/health
# Production
curl -k https://michaelschiemer.de/health
curl -k https://michaelschiemer.de/api/health
```
**Expected Health Response:**
```json
{
  "status": "healthy",
  "timestamp": "2025-01-28T15:30:00Z",
  "version": "2.x",
  "services": {
    "database": "connected",
    "redis": "connected",
    "queue": "running"
  }
}
```
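
To check a response like the one above from a script, a small filter can test the `status` field. This is a dependency-free sketch; the helper name `is_healthy` is hypothetical, and it assumes the field is formatted as in the example response.

```bash
# Reads a health response on stdin and succeeds only if "status" is "healthy".
# A grep-based check keeps the sketch dependency-free; a JSON parser such as jq
# would be more robust if it is available on the host.
is_healthy() {
  grep -Eq '"status"[[:space:]]*:[[:space:]]*"healthy"'
}

# Usage:
#   curl -fsS -k https://staging.michaelschiemer.de/health | is_healthy \
#     && echo "healthy" || echo "unhealthy"
```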
---
## Rollback Procedures
### Automatic Rollback
Both workflows include automatic rollback on deployment failure:
**Trigger Conditions:**
- Build failure
- Health check failure
- Smoke test failure (production)
- Database migration failure (production)
**Rollback Process:**
1. Stop failed deployment containers
2. Restore most recent backup deployment
3. Start restored containers
4. Verify rollback success
5. Log rollback event
**Note:** Automatic rollback restores the application, but **database changes are NOT rolled back automatically**. See Database Rollback (Production Only) below.
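
Step 2 of the rollback process, selecting the most recent backup deployment, can be sketched as follows. It assumes the `backup_*` directory naming used in the manual rollback commands of this guide; the workflows themselves may select the backup differently, and the helper name is hypothetical.

```bash
# Print the most recently modified backup_* directory under BASE_DIR.
# Returns non-zero if no backup exists. Assumes names contain no newlines.
latest_backup() {
  local base_dir="$1" newest
  newest=$(ls -dt "$base_dir"/backup_* 2>/dev/null | head -n 1)
  [ -n "$newest" ] && printf '%s\n' "$newest"
}

# Usage:
#   latest_backup /opt/framework-production || echo "no backup available" >&2
```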
---
### Manual Rollback
**When to use:**
- An issue is discovered after a successful deployment
- You need to roll back to a specific version (not just the previous one)
#### Application Rollback
**Staging:**
```bash
ssh deploy@staging.example.com
cd /opt/framework-staging
# List available backups
ls -lt backup_*
# Stop current deployment
cd current
docker-compose -f docker-compose.base.yml -f docker-compose.staging.yml down
cd ..
# Restore specific backup
rm -rf current
cp -r backup_20250128_143000 current
# Start restored deployment
cd current
docker-compose -f docker-compose.base.yml -f docker-compose.staging.yml up -d
# Verify
docker-compose -f docker-compose.base.yml -f docker-compose.staging.yml ps
curl -k https://staging.michaelschiemer.de/health
```
**Production:**
```bash
ssh deploy@michaelschiemer.de
cd /opt/framework-production
# List available backups
ls -lt backup_*
# Stop current deployment
cd current
docker-compose -f docker-compose.base.yml -f docker-compose.prod.yml down
cd ..
# Restore specific backup
rm -rf current
cp -r backup_20250128_150000 current
# Start restored deployment
cd current
docker-compose -f docker-compose.base.yml -f docker-compose.prod.yml up -d
# Wait for services
sleep 30
# Verify
docker-compose -f docker-compose.base.yml -f docker-compose.prod.yml ps
curl -k https://michaelschiemer.de/health
curl -k https://michaelschiemer.de/api/health
```
#### Database Rollback (Production Only)
**CRITICAL:** Database rollback is a destructive operation. Only perform it if:
- You have a confirmed backup from before the problematic deployment
- You understand the data loss implications
- The issue cannot be fixed forward
**Process:**
```bash
ssh deploy@michaelschiemer.de
cd /opt/framework-production/current
# List available database backups
ls -lt storage/backups/backup_*.sql
# Verify backup integrity
docker-compose -f docker-compose.base.yml -f docker-compose.prod.yml exec -T production-app \
php console.php db:verify-backup --file=storage/backups/backup_20250128_150000.sql
# Stop application to prevent new writes
docker-compose -f docker-compose.base.yml -f docker-compose.prod.yml stop production-app
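# Restore database from backup
# (`exec` needs a running container, so use a one-off `run --rm` container while the app is stopped)
docker-compose -f docker-compose.base.yml -f docker-compose.prod.yml run --rm -T production-app \
php console.php db:restore --file=storage/backups/backup_20250128_150000.sql --force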
# Start application
docker-compose -f docker-compose.base.yml -f docker-compose.prod.yml start production-app
# Verify
curl -k https://michaelschiemer.de/health
```
**Database Rollback Best Practices:**
- Always create new backup before rollback
- Document rollback reason in deployment log
- Notify team immediately
- Review application logs for data consistency issues
- Consider rolling forward with fix instead
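
For the first practice, a safety backup can be written under a timestamped name right before the restore. A minimal sketch: the `pre_rollback_` prefix and the helper name are hypothetical conventions, while `db:backup --output=` is the command shown in the troubleshooting section of this guide.

```bash
# Print a timestamped file name for a safety backup taken right before a
# rollback. The "pre_rollback_" prefix is a hypothetical convention; the
# timestamp format matches the backup_YYYYMMDD_HHMMSS names used in this guide.
pre_rollback_backup_path() {
  printf 'storage/backups/pre_rollback_%s.sql\n' "$(date +%Y%m%d_%H%M%S)"
}

# Usage (inside /opt/framework-production/current, before stopping the app):
#   docker-compose -f docker-compose.base.yml -f docker-compose.prod.yml exec -T production-app \
#     php console.php db:backup --output="$(pre_rollback_backup_path)"
```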
---
## Troubleshooting
### Common Issues
#### 1. Workflow Fails: "Permission denied (publickey)"
**Cause:** SSH authentication failed
**Solutions:**
- Verify SSH key is correctly added to Gitea secrets (entire key content)
- Ensure public key is in `~/.ssh/authorized_keys` on target server
- Check SSH key permissions on server (`chmod 600 ~/.ssh/authorized_keys`)
- Test SSH connection manually: `ssh -i deployment_key deploy@server.example.com`
#### 2. Health Check Fails
**Staging:**
```bash
# Check container status
docker-compose -f docker-compose.base.yml -f docker-compose.staging.yml ps
# Check logs
docker-compose -f docker-compose.base.yml -f docker-compose.staging.yml logs staging-app
# Check PHP-FPM
docker-compose -f docker-compose.base.yml -f docker-compose.staging.yml exec staging-app php -v
```
**Production:**
```bash
# Extended diagnostics
docker-compose -f docker-compose.base.yml -f docker-compose.prod.yml ps
docker-compose -f docker-compose.base.yml -f docker-compose.prod.yml logs production-app
docker-compose -f docker-compose.base.yml -f docker-compose.prod.yml exec production-app php -v
docker-compose -f docker-compose.base.yml -f docker-compose.prod.yml exec production-app pgrep php-fpm
docker-compose -f docker-compose.base.yml -f docker-compose.prod.yml exec production-redis redis-cli ping
```
#### 3. Database Backup Fails (Production)
**Symptoms:**
- Workflow aborts at step [0/6]
- Error: "Database backup failed - deployment aborted"
**Solutions:**
```bash
# Check database connection
docker-compose -f docker-compose.base.yml -f docker-compose.prod.yml exec production-app \
php console.php db:status
# Check backup directory permissions
docker-compose -f docker-compose.base.yml -f docker-compose.prod.yml exec production-app \
ls -la storage/backups/
# Fix permissions if needed
docker-compose -f docker-compose.base.yml -f docker-compose.prod.yml exec production-app \
chown -R www-data:www-data storage/backups/
# Test backup manually
docker-compose -f docker-compose.base.yml -f docker-compose.prod.yml exec production-app \
php console.php db:backup --output=storage/backups/test_backup.sql
```
#### 4. Database Migration Fails (Production)
**Symptoms:**
- Workflow fails at step [5/6]
- Error: "Database migration failed"
**Solutions:**
```bash
# Check migration status
docker-compose -f docker-compose.base.yml -f docker-compose.prod.yml exec production-app \
php console.php db:status
# Review migration logs
docker-compose -f docker-compose.base.yml -f docker-compose.prod.yml logs production-app | grep migration
# Run migration manually with verbose output
docker-compose -f docker-compose.base.yml -f docker-compose.prod.yml exec production-app \
php console.php db:migrate --force --verbose
# If migration is stuck, roll back one step and retry
docker-compose -f docker-compose.base.yml -f docker-compose.prod.yml exec production-app \
php console.php db:rollback 1
docker-compose -f docker-compose.base.yml -f docker-compose.prod.yml exec production-app \
php console.php db:migrate --force
```
#### 5. Image Push Fails
**Symptoms:**
- Workflow fails at "Push image to private registry"
- Error: "connection refused" or "unauthorized"
**Solutions:**
```bash
# Check registry is accessible from CI runner
curl http://localhost:5000/v2/_catalog
# Verify registry authentication (if configured)
docker login localhost:5000
# Check registry container is running
docker ps | grep registry
```
#### 6. Smoke Tests Fail (Production)
**Symptoms:**
- Health checks pass but smoke tests fail
- Error: "Main page failed" or "API health check failed"
**Solutions:**
```bash
# Test endpoints manually
curl -v -k https://michaelschiemer.de/
curl -v -k https://michaelschiemer.de/api/health
# Check nginx logs (the container Traefik routes requests to)
docker-compose -f docker-compose.base.yml -f docker-compose.prod.yml logs production-nginx
# Check application logs
docker-compose -f docker-compose.base.yml -f docker-compose.prod.yml logs production-app | tail -100
# Verify Traefik labels
docker inspect production-nginx | grep -A 20 Labels
```
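
To reproduce the smoke tests outside the workflow, the two endpoints can be checked by HTTP status code. This is a sketch, not the workflow's actual test: the function names are hypothetical, and treating 3xx responses as success is an assumption.

```bash
# Classify an HTTP status code: 2xx and 3xx count as success, anything else
# (including "000", which curl reports when the connection fails) as failure.
check_status() {
  local name="$1" code="$2"
  case "$code" in
    2??|3??) echo "OK   $name ($code)" ;;
    *)       echo "FAIL $name ($code)" >&2; return 1 ;;
  esac
}

# Fetch a URL and classify the response. -k mirrors the curl calls in this guide.
smoke_test() {
  local name="$1" url="$2"
  check_status "$name" "$(curl -k -s -o /dev/null -w '%{http_code}' "$url")"
}

# Usage:
#   smoke_test "main page" https://michaelschiemer.de/ &&
#   smoke_test "API health" https://michaelschiemer.de/api/health
```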
---
## Staging vs Production Differences
Comprehensive comparison of workflow behaviors.
| Feature | Staging | Production |
|---------|---------|------------|
| **Trigger Branches** | `staging` | `main`, `production` |
| **Image Tag** | `staging` | `latest` |
| **Deployment Directory** | `/opt/framework-staging/` | `/opt/framework-production/` |
| **Database Backup** | ❌ No | ✅ Yes (with abort on failure) |
| **Database Migrations** | ❌ Manual | ✅ Automatic |
| **Health Check Wait** | 30 seconds | 60 seconds |
| **Health Checks** | Basic (container status, PHP version, HTTP via nginx) | Extended (+ PHP-FPM, Traefik, Redis) |
| **Smoke Tests** | ❌ No | ✅ Yes (main page + API) |
| **Backup Retention** | 5 backups | 10 backups |
| **Container Shutdown** | `docker-compose down` (immediate) | `docker-compose stop` (graceful) |
| **Deployment Logging** | ❌ No | ✅ Yes (deployment.log) |
| **Build Artifact Cleanup** | ❌ No | ✅ Yes |
| **Target URL** | https://staging.michaelschiemer.de | https://michaelschiemer.de |
| **Manual Inputs** | `force_rebuild` | `force_rebuild`, `skip_backup` |
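
The backup retention row (5 backups on staging, 10 on production) amounts to deleting everything except the newest N `backup_*` directories. A minimal sketch, assuming the directory naming from the rollback section; the helper name is hypothetical and the workflows may implement retention differently.

```bash
# Keep the KEEP most recently modified backup_* entries under DIR and delete
# the rest. Assumes backup names contain no newlines.
prune_backups() {
  local dir="$1" keep="$2" old
  ls -dt "$dir"/backup_* 2>/dev/null | tail -n +"$((keep + 1))" | while IFS= read -r old; do
    rm -rf -- "$old"
  done
}

# Usage:
#   prune_backups /opt/framework-staging 5
#   prune_backups /opt/framework-production 10
```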
---
## Best Practices
### Development Workflow
**Recommended Branch Flow:**
```
develop → staging → main (production)
```
**Process:**
1. Develop features on feature branches
2. Merge to `develop` branch
3. When ready for testing: `git merge develop` → `staging`
4. Deploy to staging automatically
5. Test on staging environment
6. If tests pass: `git merge staging` → `main`
7. Deploy to production automatically
### Pre-Deployment Checklist
**Staging:**
- [ ] All tests pass locally
- [ ] Code reviewed and approved
- [ ] No breaking changes without migration path
- [ ] Dependencies updated in `composer.json`/`package.json`
**Production:**
- [ ] Tested on staging environment
- [ ] Database migrations tested on staging
- [ ] Performance impact assessed
- [ ] Rollback plan documented
- [ ] Team notified of deployment window
- [ ] Recent database backup verified
- [ ] Monitoring alerts configured
### Post-Deployment Verification
**Staging:**
```bash
# Basic checks
curl -k https://staging.michaelschiemer.de/health
curl -k https://staging.michaelschiemer.de/api/health
# Manual testing of new features
```
**Production:**
```bash
# Automated checks (from CI workflow)
curl -k https://michaelschiemer.de/
curl -k https://michaelschiemer.de/api/health
# Manual verification
# - Test critical user flows
# - Check analytics/monitoring dashboards
# - Review error logs
# - Verify database migrations applied
# Check deployment log
ssh deploy@michaelschiemer.de tail /opt/framework-production/deployment.log
```
### Deployment Scheduling
**Staging:** Deploy anytime during business hours
**Production:**
- **Preferred Window:** Off-peak hours (e.g., 2-6 AM local time)
- **Emergency Deployments:** Anytime (use `skip_backup` if necessary)
- **Major Releases:** Schedule during maintenance window with advance notice
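
A manual deployment script could guard against running outside the preferred window. The sketch below assumes the window is 02:00 to 05:59 server-local time (reading "2-6 AM" as excluding 06:00); the helper name is hypothetical.

```bash
# Succeeds if HOUR (0-23, optionally zero-padded) falls inside the preferred
# 02:00-05:59 window. Defaults to the current local hour.
in_deploy_window() {
  local hour="${1:-$(date +%H)}"
  hour="${hour#0}"     # drop one leading zero ("03" -> "3")
  hour="${hour:-0}"    # "0" becomes empty after stripping; restore it
  [ "$hour" -ge 2 ] && [ "$hour" -lt 6 ]
}

# Usage:
#   in_deploy_window || echo "Outside the preferred window, confirm before deploying"
```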
---
## Emergency Procedures
### Production Down - Complete Outage
**Immediate Response:**
```bash
# 1. Check container status
ssh deploy@michaelschiemer.de
cd /opt/framework-production/current
docker-compose -f docker-compose.base.yml -f docker-compose.prod.yml ps
# 2. If containers stopped, restart
docker-compose -f docker-compose.base.yml -f docker-compose.prod.yml up -d
# 3. If restart fails, rollback
cd /opt/framework-production
rm -rf current
cp -r "$(ls -dt backup_* | head -n1)" current
cd current
docker-compose -f docker-compose.base.yml -f docker-compose.prod.yml up -d
# 4. Verify recovery
curl -k https://michaelschiemer.de/health
# 5. Investigate root cause
docker-compose -f docker-compose.base.yml -f docker-compose.prod.yml logs production-app
```
### Database Corruption
**Recovery Steps:**
```bash
# 1. Stop application immediately
docker-compose -f docker-compose.base.yml -f docker-compose.prod.yml stop production-app
# 2. Verify most recent backup
ls -lt /opt/framework-production/current/storage/backups/
# 3. Restore from backup (see Database Rollback section)
# 4. Verify data integrity
# (the app container is stopped, so use a one-off `run --rm` container instead of `exec`)
docker-compose -f docker-compose.base.yml -f docker-compose.prod.yml run --rm production-app \
php console.php db:verify-integrity
# 5. Restart application
docker-compose -f docker-compose.base.yml -f docker-compose.prod.yml start production-app
```
### Failed Deployment with No Rollback
**If automatic rollback fails:**
```bash
# 1. SSH to server
ssh deploy@michaelschiemer.de
# 2. Manual rollback (see Manual Rollback section)
# 3. If rollback unavailable, emergency restore
cd /opt/framework-production
git clone https://git.michaelschiemer.de/michael/framework.git emergency-deploy
cd emergency-deploy
git checkout <last-known-good-commit>
# 4. Build and deploy manually
docker build -f docker/php/Dockerfile -t localhost:5000/framework:emergency .
docker push localhost:5000/framework:emergency
# 5. Update docker-compose.prod.yml to use emergency tag
cd /opt/framework-production/current
# Edit docker-compose.prod.yml: image: localhost:5000/framework:emergency
# 6. Deploy
docker-compose -f docker-compose.base.yml -f docker-compose.prod.yml up -d
```
---
## Monitoring and Alerting
### Recommended Monitoring
**Application Metrics:**
- Response time (target: <200ms p95)
- Error rate (target: <0.1%)
- Request throughput
- Queue depth
**Infrastructure Metrics:**
- Container health status
- CPU usage (target: <70%)
- Memory usage (target: <80%)
- Disk space (alert: <20% free)
**Database Metrics:**
- Query performance
- Connection pool utilization
- Replication lag (if applicable)
- Backup success rate
### Alert Configuration
**Critical Alerts (immediate notification):**
- Production deployment failed
- Automatic rollback triggered
- Health check failure (3 consecutive)
- Database backup failure
- Container restart loop
**Warning Alerts (review within 1 hour):**
- Staging deployment failed
- Smoke test failure
- Slow health check response (>5s)
- Disk space <30% free
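
The disk space thresholds can be evaluated with a short script. This sketch maps free space to a level, treating the "<20% free" alert from the infrastructure metrics as critical, which is an assumption. Both function names are hypothetical.

```bash
# Percentage of free space on the filesystem containing PATH (POSIX df output).
free_percent() {
  df -P "$1" | awk 'NR==2 { sub(/%/, "", $5); print 100 - $5 }'
}

# Map a free-space percentage to an alert level:
# below 20% free is critical (assumed), below 30% free is a warning.
disk_alert_level() {
  local free="$1"
  if [ "$free" -lt 20 ]; then echo critical
  elif [ "$free" -lt 30 ]; then echo warning
  else echo ok
  fi
}

# Usage:
#   disk_alert_level "$(free_percent /opt/framework-production)"
```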
---
## Additional Resources
- **Main Documentation:** `deployment/NEW_ARCHITECTURE.md`
- **Architecture Analysis:** `deployment/legacy/ARCHITECTURE_ANALYSIS.md`
- **Docker Compose Files:** Root directory (`docker-compose.*.yml`)
- **Framework Documentation:** `docs/` directory
- **Troubleshooting Guide:** `docs/guides/troubleshooting.md`
---
## Maintenance
### Regular Tasks
**Weekly:**
- Review deployment logs
- Check backup retention
- Verify health check reliability
- Update dependencies (staging first)
**Monthly:**
- Rotate SSH keys
- Review and clean old backups (>30 days)
- Test rollback procedures
- Update workflow documentation
**Quarterly:**
- Disaster recovery drill
- Performance baseline review
- Security audit of deployment process
- Workflow optimization review
---
**Last Updated:** 2025-01-28
**Workflow Version:** 1.0
**Maintained by:** DevOps Team