michaelschiemer/deployment/legacy/ARCHITECTURE_ANALYSIS.md
2025-11-24 21:28:25 +01:00

# Legacy Deployment Architecture Analysis
**Created**: 2025-01-24

**Status**: Archived - System being redesigned
## Executive Summary
This document analyzes the existing deployment architecture and records the issues that led to the decision to rebuild it from scratch.
## Discovered Issues
### 1. Docker Swarm vs Docker Compose Confusion
**Problem**: System designed for Docker Swarm but running with Docker Compose
- Stack files reference Swarm features (secrets, configs)
- Docker Swarm not initialized on target server
- Local development uses Docker Compose
- Unclear which of the two the production deployment is supposed to use
**Impact**: Container startup failures, service discovery issues
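
The mismatch described above can be detected before deploying. The following is a minimal sketch, assuming the `docker` CLI is installed on the target server; `stack_needs_swarm` is a hypothetical flag that someone would have to set per stack, it is not something the legacy files provide.

```python
import subprocess


def parse_swarm_state(raw: str) -> bool:
    """Return True only when Docker reports the local Swarm state as 'active'."""
    return raw.strip().lower() == "active"


def swarm_is_active() -> bool:
    """Ask the local Docker daemon for its Swarm state (needs the docker CLI)."""
    out = subprocess.run(
        ["docker", "info", "--format", "{{.Swarm.LocalNodeState}}"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_swarm_state(out)


def check_target(stack_needs_swarm: bool, swarm_active: bool) -> str:
    """Compare what a stack expects (hypothetical flag) with the server state."""
    if stack_needs_swarm and not swarm_active:
        return "mismatch: stack expects Swarm, but Swarm is not initialized"
    return "ok"
```

On a server, `check_target(True, swarm_is_active())` would flag the situation found here: Swarm-oriented stack files on a host where Swarm was never initialized.
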
### 2. Distributed Stack Files
**Current Structure**:
```
deployment/stacks/
├── traefik/ # Reverse proxy
├── postgresql-production/
├── postgresql-staging/
├── gitea/ # Git server
├── redis/
├── minio/
├── monitoring/
├── registry/
└── semaphore/
```
**Problems**:
- No clear dependency graph between stacks
- Unclear startup order
- Volume mounts across stacks
- Network configuration scattered
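
Once the dependencies between stacks are written down, a startup order can be derived rather than guessed. This sketch uses Python's standard `graphlib`. The `DEPENDS_ON` mapping is purely hypothetical: the legacy system never documented these relationships, so the edges below are illustrative assumptions, not findings.

```python
from graphlib import TopologicalSorter

# Hypothetical map: stack -> stacks that must be running before it starts.
DEPENDS_ON = {
    "traefik": set(),
    "postgresql-production": set(),
    "redis": set(),
    "gitea": {"traefik", "postgresql-production"},
    "registry": {"traefik"},
    "monitoring": {"traefik"},
}


def startup_order(depends_on: dict[str, set[str]]) -> list[str]:
    """Return one valid startup order; raises graphlib.CycleError on cycles."""
    return list(TopologicalSorter(depends_on).static_order())


order = startup_order(DEPENDS_ON)
```

Any order returned starts each stack only after everything it depends on, and a circular dependency surfaces as an explicit error instead of a failed startup.
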
### 3. Ansible Deployment Confusion
**Ansible Usage**:
- Server provisioning (`install-docker.yml`)
- Application deployment (`sync-application-code.yml`)
- Container recreation (`recreate-containers-with-env.yml`)
- Stack synchronization (`sync-stacks.yml`)
**Problem**: Ansible used for BOTH provisioning AND deployment
- Should only provision servers
- Deployment should be via CI/CD
- Creates unclear responsibilities
### 4. Environment-Specific Issues
**Environments Identified**:
- `local` - Developer machines (Docker Compose)
- `staging` - Hetzner server (unclear whether Docker Compose or Swarm)
- `production` - Hetzner server (unclear whether Docker Compose or Swarm)
**Problems**:
- No unified docker-compose files per environment
- Environment variables scattered (.env, secrets, Ansible vars)
- SSL certificates managed differently per environment
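
Scattered variables become a concrete problem when the same key is defined with different values in different places. The sketch below shows one way to surface such conflicts. The source names mirror the three locations listed above, while the keys and values are hypothetical examples.

```python
def find_conflicts(sources: dict[str, dict[str, str]]) -> dict[str, dict[str, str]]:
    """Return keys defined with differing values, mapped to {source: value}."""
    seen: dict[str, dict[str, str]] = {}
    for source_name, variables in sources.items():
        for key, value in variables.items():
            seen.setdefault(key, {})[source_name] = value
    # A key conflicts when its sources disagree on the value.
    return {k: v for k, v in seen.items() if len(set(v.values())) > 1}


# Hypothetical contents of the three configuration locations.
sources = {
    ".env": {"DB_HOST": "localhost", "DB_PORT": "5432"},
    "secrets": {"DB_PASSWORD": "from-secret"},
    "ansible_vars": {"DB_HOST": "postgres", "DB_PORT": "5432"},
}
conflicts = find_conflicts(sources)
```

In this example only `DB_HOST` is reported, because `DB_PORT` has the same value in both places.
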
### 5. Specific Container Failures
**postgres-production-backup**:
- Container no longer exists (it was previously stuck in a restart loop)
- Volume mounts not accessible: `/scripts/backup-entrypoint.sh`
- Exit code 255 (file not found)
- Restart policy causing loop
**Root Causes**:
- Relative volume paths in docker-compose.yml
- Container running from different working directory
- Stack not properly initialized
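
The first two root causes interact: a relative bind-mount source points to a different host file depending on the directory it is resolved against. The sketch below illustrates this with plain path arithmetic. The host directories and the relative source `./scripts/backup-entrypoint.sh` are hypothetical, and which base directory is actually used depends on the tool and on how it is invoked.

```python
from pathlib import PurePosixPath


def resolve_mount_source(source: str, base_dir: str) -> str:
    """Resolve a bind-mount source the way a relative path would be expanded."""
    path = PurePosixPath(source)
    if path.is_absolute():
        return str(path)  # absolute paths do not depend on the base directory
    return str(PurePosixPath(base_dir) / path)


# Hypothetical locations: the stack directory vs. the repository root.
from_stack_dir = resolve_mount_source(
    "./scripts/backup-entrypoint.sh",
    "/opt/deployment/stacks/postgresql-production",
)
from_repo_root = resolve_mount_source(
    "./scripts/backup-entrypoint.sh", "/opt/deployment"
)
```

The two results differ, so a script that mounts correctly from one directory is missing when the same file is used from another. An absolute source or a named volume avoids this ambiguity.
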
### 6. Network Architecture Unclear
**Networks Found**:
- `traefik-public` (external)
- `app-internal` (external, for PostgreSQL)
- `backend`, `cache`, `postgres-production-internal`
**Problems**:
- Which stacks share which networks?
- How do services discover each other?
- Traefik routing configuration scattered
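
The first question can be answered mechanically once each stack's networks are listed in one place. This is a minimal sketch. The network names come from the list above, but the assignment of stacks to networks is a hypothetical example, since the real assignments are exactly what is unclear.

```python
from collections import defaultdict

# Hypothetical assignment of stacks to the networks found on the server.
STACK_NETWORKS = {
    "traefik": {"traefik-public"},
    "gitea": {"traefik-public", "app-internal"},
    "postgresql-production": {"app-internal", "postgres-production-internal"},
    "redis": {"cache"},
}


def shared_networks(stack_networks: dict[str, set[str]]) -> dict[str, list[str]]:
    """Invert stack -> networks and keep networks joined by more than one stack."""
    members: dict[str, set[str]] = defaultdict(set)
    for stack, networks in stack_networks.items():
        for network in networks:
            members[network].add(stack)
    return {n: sorted(s) for n, s in members.items() if len(s) > 1}


shared = shared_networks(STACK_NETWORKS)
```

Networks that appear in the result are the paths along which stacks can reach each other. A network with a single member either isolates that stack on purpose or is left over from an earlier setup.
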
## Architecture Diagram (Current State)
```
┌────────────────────────────────────────────────────────────┐
│ Server (Docker Compose? Docker Swarm? Unclear)             │
│                                                            │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐  │
│  │   Traefik    │───▶│     App      │───▶│  PostgreSQL  │  │
│  │    Stack     │    │    Stack     │    │    Stack     │  │
│  └──────────────┘    └──────────────┘    └──────────────┘  │
│         │                   │                   │          │
│         │                   │                   │          │
│  ┌──────▼───────┐    ┌──────▼───────┐    ┌──────▼───────┐  │
│  │    Gitea     │    │    Redis     │    │    MinIO     │  │
│  │    Stack     │    │    Stack     │    │    Stack     │  │
│  └──────────────┘    └──────────────┘    └──────────────┘  │
│                                                            │
│ Networks: traefik-public, app-internal, backend, cache     │
│ Volumes: Relative paths, absolute paths, mixed             │
│ Secrets: Docker secrets (Swarm), .env files, Ansible vars  │
└────────────────────────────────────────────────────────────┘
    │ Deployment via?
    │ - docker-compose up?
    │ - docker stack deploy?
    │ - Ansible playbooks?
    │ UNCLEAR
┌───┴────────────────────────────────────────────────────────┐
│ Developer Machine / CI/CD (Gitea)                          │
│ - Ansible playbooks in deployment/ansible/                 │
│ - Stack files in deployment/stacks/                        │
│ - Application code in src/                                 │
└────────────────────────────────────────────────────────────┘
```
## Decision Rationale: Rebuild vs Repair
### Why Rebuild?
1. **Architectural Clarity**: Current system mixes concepts (Swarm/Compose, provisioning/deployment)
2. **Environment Separation**: Clean separation of local/staging/prod configurations
3. **CI/CD Integration**: Design for Gitea Actions from start
4. **Maintainability**: Single source of truth per environment
5. **Debugging Difficulty**: Current issues are symptoms of architectural problems, so fixing them one at a time would not remove the cause
### What to Keep?
- ✅ Traefik configuration (reverse proxy setup is solid)
- ✅ PostgreSQL backup scripts (logic is good, just needs proper mounting)
- ✅ SSL certificate generation (Let's Encrypt integration works)
- ✅ Ansible server provisioning playbooks (keep for initial setup)
### What to Redesign?
- ❌ Stack organization (too fragmented)
- ❌ Deployment method (unclear Ansible vs CI/CD)
- ❌ Environment configuration (scattered variables)
- ❌ Volume mount strategy (relative paths causing issues)
- ❌ Network architecture (unclear dependencies)
## Lessons Learned
1. **Consistency is Key**: Choose Docker Compose OR Docker Swarm, not both
2. **Environment Files**: One `docker-compose.{env}.yml` per environment
3. **Ansible Scope**: Only for server provisioning, NOT deployment
4. **CI/CD First**: Gitea Actions should handle deployment
5. **Volume Paths**: Always use absolute paths or named volumes
6. **Network Clarity**: Explicit network definitions, clear service discovery
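
Lessons 1, 2 and 5 can be combined into a single, predictable deployment command. The sketch below assumes Docker Compose was the option chosen and follows the `docker-compose.{env}.yml` naming from lesson 2. The `project_dir` value is a hypothetical absolute path, and the redesigned system may do this differently.

```python
ENVIRONMENTS = ("local", "staging", "production")


def compose_command(env: str, project_dir: str) -> list[str]:
    """Build the Compose invocation for one environment from an absolute path."""
    if env not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {env!r}")
    return [
        "docker", "compose",
        "--project-directory", project_dir,  # fixes the base for relative paths
        "-f", f"{project_dir}/docker-compose.{env}.yml",
        "up", "-d",
    ]


command = compose_command("staging", "/opt/app")
```

Because the environment name is validated and the project directory is explicit, the same call produces the same command whether it runs on a developer machine or in a CI job.
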
## Next Steps
See `deployment/NEW_ARCHITECTURE.md` for the redesigned system.
## Archive Contents
This `deployment/legacy/` directory contains:
- Original stack files (archived)
- Ansible playbooks (reference only)
- This analysis document
**DO NOT USE THESE FILES FOR NEW DEPLOYMENTS**