Legacy Deployment Architecture Analysis
Created: 2025-01-24
Status: Archived - System being redesigned
Executive Summary
This document analyzes the existing deployment architecture and records the issues that led to the decision to rebuild it from scratch.
Discovered Issues
1. Docker Swarm vs Docker Compose Confusion
Problem: System designed for Docker Swarm but running with Docker Compose
- Stack files reference Swarm features (secrets, configs)
- Docker Swarm not initialized on target server
- Local development uses Docker Compose
- Unclear whether production deployment should use Swarm or Compose
Impact: Container startup failures, service discovery issues
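The mismatch described above can be caught before any deployment runs. The following is a minimal sketch, not the project's actual tooling: it reads the Swarm state with `docker info --format '{{.Swarm.LocalNodeState}}'` and reports a problem when stack files expect Swarm on a server where Swarm is not initialized. The function names and the `intended` parameter are hypothetical.

```python
import subprocess


def read_swarm_state() -> str:
    """Ask the local Docker daemon for its Swarm state (needs the docker CLI)."""
    result = subprocess.run(
        ["docker", "info", "--format", "{{.Swarm.LocalNodeState}}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def is_swarm_active(local_node_state: str) -> bool:
    """True only when the node reports an initialized Swarm ("active")."""
    return local_node_state.strip().lower() == "active"


def preflight_problems(intended: str, local_node_state: str) -> list[str]:
    """Return a list of mismatches between the intended orchestrator and the server.

    `intended` is a hypothetical setting: either "compose" or "swarm".
    """
    if intended not in ("compose", "swarm"):
        return [f"unknown orchestrator: {intended!r}"]
    if intended == "swarm" and not is_swarm_active(local_node_state):
        return ["stack files expect Swarm, but Swarm is not initialized on this server"]
    return []
```

On a real server the call would be `preflight_problems("swarm", read_swarm_state())`; an empty list means the target matches the chosen orchestrator.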
2. Distributed Stack Files
Current Structure:
deployment/stacks/
├── traefik/ # Reverse proxy
├── postgresql-production/
├── postgresql-staging/
├── gitea/ # Git server
├── redis/
├── minio/
├── monitoring/
├── registry/
└── semaphore/
Problems:
- No clear dependency graph between stacks
- Unclear startup order
- Volume mounts shared across stacks
- Network configuration scattered
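One way to make the missing dependency graph explicit is to write it down as data and derive the startup order from it. The sketch below uses Python's standard `graphlib` module. The dependencies listed are assumptions for illustration only; the legacy stack files do not document them.

```python
from graphlib import TopologicalSorter

# Hypothetical dependencies: each stack maps to the stacks that must start first.
STACK_DEPENDENCIES = {
    "traefik": set(),
    "postgresql-production": set(),
    "redis": set(),
    "minio": set(),
    "registry": {"traefik"},
    "monitoring": {"traefik"},
    "gitea": {"traefik", "postgresql-production"},
    "semaphore": {"traefik", "postgresql-production"},
}


def startup_order(dependencies):
    """Return one valid startup order; raises graphlib.CycleError on a cycle."""
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    for position, stack in enumerate(startup_order(STACK_DEPENDENCIES), start=1):
        print(position, stack)
```

Several orders can be valid at once, so the output should be read as one acceptable sequence rather than the only one.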
3. Ansible Deployment Confusion
Ansible Usage:
- Server provisioning (install-docker.yml)
- Application deployment (sync-application-code.yml)
- Container recreation (recreate-containers-with-env.yml)
- Stack synchronization (sync-stacks.yml)
Problem: Ansible used for BOTH provisioning AND deployment
- Should only provision servers
- Deployment should be via CI/CD
- Creates unclear responsibilities
4. Environment-Specific Issues
Environments Identified:
- local - Developer machines (Docker Compose)
- staging - Hetzner server (unclear Docker Compose vs Swarm)
- production - Hetzner server (unclear Docker Compose vs Swarm)
Problems:
- No unified docker-compose files per environment
- Environment variables scattered (.env, secrets, Ansible vars)
- SSL certificates managed differently per environment
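To see how scattered variables can silently contradict each other, the sources can be merged in a fixed order while recording every key that is defined with different values in more than one place. This is a minimal sketch; the source names and the variable values in the usage comment are made up.

```python
def merge_env_sources(sources):
    """Merge (source_name, mapping) pairs in order; later sources win.

    Returns the merged mapping and a list of (key, earlier_source, later_source)
    tuples for keys whose value was overridden with a different one.
    """
    merged = {}
    origin = {}
    conflicts = []
    for name, values in sources:
        for key, value in values.items():
            if key in merged and merged[key] != value:
                conflicts.append((key, origin[key], name))
            merged[key] = value
            origin[key] = name
    return merged, conflicts


# Hypothetical usage:
#   merged, conflicts = merge_env_sources([
#       (".env", {"DB_HOST": "localhost", "DB_PORT": "5432"}),
#       ("ansible_vars", {"DB_HOST": "postgres"}),
#   ])
# conflicts then names DB_HOST as defined differently in both sources.
```

A non-empty conflict list is a signal that the environment has no single source of truth yet.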
5. Specific Container Failures
postgres-production-backup:
- Container no longer exists (it was stuck in a restart loop)
- Volume mounts not accessible:
/scripts/backup-entrypoint.sh - Exit code 255 (file not found)
- Restart policy causing loop
Root Causes:
- Relative volume paths in docker-compose.yml
- Container running from different working directory
- Stack not properly initialized
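The relative-path root cause can be illustrated with a small helper that rewrites short-syntax bind mounts (`source:target[:mode]`) so the host path no longer depends on the working directory. This is a sketch under assumptions: the base directory and the `./scripts` mount are hypothetical, and only the short volume syntax is handled.

```python
import posixpath


def absolutize_volume(spec: str, base_dir: str) -> str:
    """Make a relative host path in a short-syntax volume spec absolute.

    Named volumes, anonymous volumes, and already-absolute host paths
    are returned unchanged.
    """
    source, sep, rest = spec.partition(":")
    if not sep:
        return spec  # anonymous volume such as "/data"
    is_relative = source in (".", "..") or source.startswith(("./", "../"))
    if not is_relative:
        return spec  # named volume ("pgdata") or absolute path ("/srv/x")
    absolute = posixpath.normpath(posixpath.join(base_dir, source))
    return f"{absolute}:{rest}"


if __name__ == "__main__":
    # Hypothetical stack directory on the server.
    base = "/opt/deployment/stacks/postgresql-production"
    print(absolutize_volume("./scripts:/scripts:ro", base))
```

With an absolute source, the mount for `/scripts/backup-entrypoint.sh` resolves to the same host directory regardless of where the deployment command is started.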
6. Network Architecture Unclear
Networks Found:
- traefik-public (external)
- app-internal (external, for PostgreSQL)
- backend, cache, postgres-production-internal
Problems:
- Which stacks share which networks?
- How do services discover each other?
- Traefik routing configuration scattered
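The first two questions become answerable once network membership is recorded in one place. The sketch below inverts a stack-to-networks map and checks whether two stacks share a network, which is the precondition for resolving each other by service name. The attachments shown are assumptions, not the verified legacy configuration.

```python
from collections import defaultdict

# Hypothetical attachment map: which stack joins which network.
STACK_NETWORKS = {
    "traefik": ["traefik-public"],
    "gitea": ["traefik-public", "app-internal"],
    "postgresql-production": ["app-internal", "postgres-production-internal"],
    "redis": ["cache"],
}


def stacks_by_network(stack_networks):
    """Invert stack -> networks into network -> sorted list of stacks."""
    index = defaultdict(list)
    for stack, networks in stack_networks.items():
        for network in networks:
            index[network].append(stack)
    return {network: sorted(stacks) for network, stacks in index.items()}


def share_network(stack_a, stack_b, stack_networks):
    """True if both stacks are attached to at least one common network."""
    return bool(set(stack_networks[stack_a]) & set(stack_networks[stack_b]))
```

For example, `stacks_by_network(STACK_NETWORKS)["traefik-public"]` lists every stack Traefik can route to under these assumed attachments.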
Architecture Diagram (Current State)
┌─────────────────────────────────────────────────────────────┐
│ Server (Docker Compose? Docker Swarm? Unclear) │
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Traefik │───▶│ App │───▶│ PostgreSQL │ │
│ │ Stack │ │ Stack │ │ Stack │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
│ │ │ │ │
│ │ │ │ │
│ ┌──────▼──────┐ ┌───────▼────┐ ┌──────────▼─────┐ │
│ │ Gitea │ │ Redis │ │ MinIO │ │
│ │ Stack │ │ Stack │ │ Stack │ │
│ └─────────────┘ └────────────┘ └────────────────┘ │
│ │
│ Networks: traefik-public, app-internal, backend, cache │
│ Volumes: Relative paths, absolute paths, mixed │
│ Secrets: Docker secrets (Swarm), .env files, Ansible vars│
└─────────────────────────────────────────────────────────────┘
▲
│ Deployment via?
│ - docker-compose up?
│ - docker stack deploy?
│ - Ansible playbooks?
│ UNCLEAR
│
┌───┴────────────────────────────────────────────────┐
│ Developer Machine / CI/CD (Gitea) │
│ - Ansible playbooks in deployment/ansible/ │
│ - Stack files in deployment/stacks/ │
│ - Application code in src/ │
└─────────────────────────────────────────────────────┘
Decision Rationale: Rebuild vs Repair
Why Rebuild?
- Architectural Clarity: Current system mixes concepts (Swarm/Compose, provisioning/deployment)
- Environment Separation: Clean separation of local/staging/prod configurations
- CI/CD Integration: Design for Gitea Actions from start
- Maintainability: Single source of truth per environment
- Debugging Difficulty: Current issues are symptoms of architectural problems
What to Keep?
- ✅ Traefik configuration (reverse proxy setup is solid)
- ✅ PostgreSQL backup scripts (logic is good, just needs proper mounting)
- ✅ SSL certificate generation (Let's Encrypt integration works)
- ✅ Ansible server provisioning playbooks (keep for initial setup)
What to Redesign?
- ❌ Stack organization (too fragmented)
- ❌ Deployment method (unclear Ansible vs CI/CD)
- ❌ Environment configuration (scattered variables)
- ❌ Volume mount strategy (relative paths causing issues)
- ❌ Network architecture (unclear dependencies)
Lessons Learned
- Consistency is Key: Choose Docker Compose OR Docker Swarm, not both
- Environment Files: One docker-compose.{env}.yml per environment
- Ansible Scope: Only for server provisioning, NOT deployment
- CI/CD First: Gitea Actions should handle deployment
- Volume Paths: Always use absolute paths or named volumes
- Network Clarity: Explicit network definitions, clear service discovery
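The "one file per environment" lesson can be enforced by deriving the compose file name from the environment and rejecting anything else. This is a minimal sketch of how a CI/CD job might build its command; the function names are hypothetical, and it assumes the Compose route (not Swarm) was chosen.

```python
ENVIRONMENTS = ("local", "staging", "production")


def compose_file_for(env: str) -> str:
    """Map an environment name to its docker-compose.{env}.yml file."""
    if env not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {env!r}")
    return f"docker-compose.{env}.yml"


def compose_up_command(env: str) -> list[str]:
    """Build the argument list for starting one environment in the background."""
    return ["docker", "compose", "-f", compose_file_for(env), "up", "-d"]


if __name__ == "__main__":
    print(" ".join(compose_up_command("staging")))
```

Keeping the mapping in one function means a typo in the environment name fails immediately instead of deploying the wrong file.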
Next Steps
See deployment/NEW_ARCHITECTURE.md for the redesigned system.
Archive Contents
This deployment/legacy/ directory contains:
- Original stack files (archived)
- Ansible playbooks (reference only)
- This analysis document
DO NOT USE THESE FILES FOR NEW DEPLOYMENTS