2025-11-24 21:28:25 +01:00


Legacy Deployment Architecture Analysis

Created: 2025-01-24
Status: Archived (system being redesigned)

Executive Summary

This document analyzes the existing deployment architecture and the issues that led to the decision to rebuild it from scratch.

Discovered Issues

1. Docker Swarm vs Docker Compose Confusion

Problem: The system was designed for Docker Swarm but runs with Docker Compose

  • Stack files reference Swarm features (secrets, configs)
  • Docker Swarm not initialized on target server
  • Local development uses Docker Compose
  • No decision on which of the two production deployment should use

Impact: Container startup failures, service discovery issues
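One way to catch this mismatch before deploying is to scan each stack file for declarations that only Swarm can satisfy. The sketch below is a heuristic, not a complete list. It assumes the stack file has already been parsed into a dict (for example with PyYAML's `yaml.safe_load`). It only checks for `external: true` secrets/configs and for `deploy.placement`. The function name `swarm_only_features` and the example secret name are hypothetical.

```python
from typing import Any

# Top-level sections whose `external: true` entries must already exist on the
# platform (e.g. created with `docker secret create`, which requires Swarm).
EXTERNAL_SECTIONS = ("secrets", "configs")


def swarm_only_features(stack: dict[str, Any]) -> list[str]:
    """Heuristically list declarations in a parsed stack file that need Swarm."""
    findings: list[str] = []
    for section in EXTERNAL_SECTIONS:
        for name, spec in (stack.get(section) or {}).items():
            if isinstance(spec, dict) and spec.get("external"):
                findings.append(f"{section}.{name}: external (needs Swarm)")
    for svc_name, svc in (stack.get("services") or {}).items():
        deploy = (svc or {}).get("deploy") or {}
        # Placement constraints are a Swarm scheduling feature.
        if deploy.get("placement"):
            findings.append(f"services.{svc_name}.deploy.placement (Swarm scheduling)")
    return findings


# Hypothetical excerpt of a legacy stack file, already parsed.
example_stack = {
    "secrets": {"postgres_password": {"external": True}},
    "services": {
        "postgres": {"deploy": {"placement": {"constraints": ["node.role == manager"]}}},
        "redis": {},
    },
}
for finding in swarm_only_features(example_stack):
    print(finding)
```

An empty result does not prove that a file is Compose-safe. It only means that none of these particular Swarm features appear.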

2. Distributed Stack Files

Current Structure:

deployment/stacks/
├── traefik/              # Reverse proxy
├── postgresql-production/
├── postgresql-staging/
├── gitea/                # Git server
├── redis/
├── minio/
├── monitoring/
├── registry/
└── semaphore/

Problems:

  • No clear dependency graph between stacks
  • Unclear startup order
  • Volume mounts shared across stacks
  • Network configuration scattered
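The missing dependency graph can be made explicit and turned into a startup order with a topological sort. Below is a minimal sketch using Python's standard `graphlib` module (3.9+). The stack names come from `deployment/stacks/`. The dependency edges are assumptions for illustration only, since the legacy setup never recorded them.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical map: stack -> stacks that must be running first.
# These edges are illustrative; the legacy setup never documented them.
STACK_DEPENDENCIES: dict[str, set[str]] = {
    "traefik": set(),
    "postgresql-production": set(),
    "redis": set(),
    "minio": set(),
    "gitea": {"traefik", "postgresql-production"},
    "registry": {"traefik", "minio"},
    "monitoring": {"traefik"},
    "semaphore": {"traefik", "postgresql-production"},
}


def startup_order(deps: dict[str, set[str]]) -> list[str]:
    """Return an order in which every stack starts after its dependencies."""
    try:
        return list(TopologicalSorter(deps).static_order())
    except CycleError as exc:
        # The second element of exc.args lists the nodes forming the cycle.
        raise ValueError(f"circular stack dependency: {exc.args[1]}") from exc


print(startup_order(STACK_DEPENDENCIES))
```

Writing the graph down also gives the cycle check for free. A circular dependency between stacks is reported instead of surfacing later as a hang during startup.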

3. Ansible Deployment Confusion

Ansible Usage:

  • Server provisioning (install-docker.yml)
  • Application deployment (sync-application-code.yml)
  • Container recreation (recreate-containers-with-env.yml)
  • Stack synchronization (sync-stacks.yml)

Problem: Ansible used for BOTH provisioning AND deployment

  • Should only provision servers
  • Deployment should be via CI/CD
  • Creates unclear responsibilities
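The intended split can be stated as an explicit rule and applied to the playbooks listed above. This is a small sketch. The concern assigned to each playbook follows the list in this section. `Owner`, `PLAYBOOK_CONCERNS`, and the helper functions are hypothetical names.

```python
from enum import Enum


class Owner(Enum):
    ANSIBLE = "ansible (provisioning)"
    CI_CD = "ci/cd (deployment)"


# The rule from this section: Ansible provisions servers, CI/CD deploys.
OWNER_BY_CONCERN = {"provisioning": Owner.ANSIBLE, "deployment": Owner.CI_CD}

# Playbooks named in this analysis, tagged with the concern they address.
PLAYBOOK_CONCERNS = {
    "install-docker.yml": "provisioning",
    "sync-application-code.yml": "deployment",
    "recreate-containers-with-env.yml": "deployment",
    "sync-stacks.yml": "deployment",
}


def target_owner(concern: str) -> Owner:
    """Look up who should own a concern; unknown concerns raise KeyError."""
    return OWNER_BY_CONCERN[concern]


def playbooks_to_migrate(concerns: dict[str, str]) -> list[str]:
    """Playbooks that currently live in Ansible but belong in CI/CD."""
    return sorted(p for p, c in concerns.items() if target_owner(c) is Owner.CI_CD)


print(playbooks_to_migrate(PLAYBOOK_CONCERNS))
```

Raising on an unknown concern is deliberate. A new playbook has to be classified explicitly rather than silently defaulting to one side.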

4. Environment-Specific Issues

Environments Identified:

  • local - Developer machines (Docker Compose)
  • staging - Hetzner server (unclear whether Docker Compose or Swarm)
  • production - Hetzner server (unclear whether Docker Compose or Swarm)

Problems:

  • No single docker-compose file per environment
  • Environment variables scattered (.env, secrets, Ansible vars)
  • SSL certificates managed differently per environment
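Scattered variables are hard to debug mainly because overrides happen silently. Below is a minimal sketch that merges sources in an explicit precedence order and reports keys whose values disagree. The source names, the order (later sources win), and the variable names are assumptions, not the legacy configuration.

```python
from typing import Mapping


def merge_env(
    sources: list[tuple[str, Mapping[str, str]]],
) -> tuple[dict[str, str], list[str]]:
    """Merge variable sources in order (later wins) and report conflicts.

    A conflict is a key that two sources define with different values.
    """
    merged: dict[str, str] = {}
    origin: dict[str, str] = {}  # key -> name of the source that set it last
    conflicts: list[str] = []
    for source_name, values in sources:
        for key, value in values.items():
            if key in merged and merged[key] != value:
                conflicts.append(f"{key}: {origin[key]} -> {source_name}")
            merged[key] = value
            origin[key] = source_name
    return merged, conflicts


# Hypothetical sources mirroring the three places variables lived.
sources = [
    (".env", {"DB_HOST": "postgres", "DB_PORT": "5432"}),
    ("ansible", {"DB_HOST": "postgres-production"}),
    ("secrets", {"DB_PASSWORD": "example"}),
]
merged, conflicts = merge_env(sources)
print(conflicts)
```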

5. Specific Container Failures

postgres-production-backup:

  • Container no longer exists (it had been stuck in a restart loop)
  • Volume mounts not accessible: /scripts/backup-entrypoint.sh
  • Exit code 255 (file not found)
  • Restart policy kept restarting the failing container, producing the loop

Root Causes:

  • Relative volume paths in docker-compose.yml
  • Stack started from a different working directory than the relative paths assume
  • Stack not properly initialized
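The `postgres-production-backup` failure can be traced on paper. A relative bind-mount source only has a meaning relative to some base directory, and which base applies depends on how and from where the stack is started. The sketch below flags relative sources in short-syntax volume strings (`src:dst[:mode]`). It also shows the same `./scripts` entry resolving to two different host directories. The host paths and the `./scripts:/scripts:ro` entry are hypothetical. Only the container-side `/scripts` path comes from the failure above.

```python
import posixpath


def relative_bind_sources(volumes: list[str]) -> list[str]:
    """Return host-side sources that are relative paths (start with ".")."""
    sources = (v.split(":", 1)[0] for v in volumes)
    # Absolute paths and named volumes are not affected by the base directory.
    return [s for s in sources if s.startswith(".")]


def resolve(source: str, base_dir: str) -> str:
    """Resolve a bind source against a base directory (absolute sources win)."""
    return posixpath.normpath(posixpath.join(base_dir, source))


# Hypothetical volume list for the backup service.
volumes = [
    "./scripts:/scripts:ro",
    "postgres-data:/var/lib/postgresql/data",
    "/opt/backups:/backups",
]
for src in relative_bind_sources(volumes):
    # The same entry points at different host directories.
    print(resolve(src, "/opt/stacks/postgresql-production"))
    print(resolve(src, "/home/deploy"))
```

Lesson 5 below follows directly from this: an absolute source or a named volume resolves the same way no matter where the command is run.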

6. Network Architecture Unclear

Networks Found:

  • traefik-public (external)
  • app-internal (external, for PostgreSQL)
  • backend, cache, postgres-production-internal

Problems:

  • Which stacks share which networks?
  • How do services discover each other?
  • Traefik routing configuration scattered
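Which stacks share which networks can be answered mechanically once memberships are written down in one place. Containers attached to the same user-defined network can resolve each other by service name through Docker's embedded DNS. A shared network is therefore the precondition for service discovery. In this sketch the network names are the ones found on the server, but the stack-to-network assignments are hypothetical.

```python
# Hypothetical stack -> network membership; the real assignments were
# never documented in one place.
STACK_NETWORKS: dict[str, set[str]] = {
    "traefik": {"traefik-public"},
    "app": {"traefik-public", "app-internal", "backend", "cache"},
    "postgresql-production": {"app-internal", "postgres-production-internal"},
    "redis": {"cache"},
    "gitea": {"traefik-public"},
}


def network_members(networks: dict[str, set[str]]) -> dict[str, list[str]]:
    """Invert the mapping: network -> sorted list of stacks attached to it."""
    members: dict[str, list[str]] = {}
    for stack, nets in networks.items():
        for net in nets:
            members.setdefault(net, []).append(stack)
    return {net: sorted(stacks) for net, stacks in sorted(members.items())}


def shared_networks(networks: dict[str, set[str]], a: str, b: str) -> set[str]:
    """Networks over which stacks a and b can reach each other by name."""
    return networks[a] & networks[b]


for net, stacks in network_members(STACK_NETWORKS).items():
    print(net, stacks)
```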

Architecture Diagram (Current State)

┌─────────────────────────────────────────────────────────────┐
│ Server (Docker Compose? Docker Swarm? Unclear)              │
│                                                             │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐   │
│  │   Traefik    │───▶│     App      │───▶│  PostgreSQL  │   │
│  │   Stack      │    │   Stack      │    │    Stack     │   │
│  └──────────────┘    └──────────────┘    └──────────────┘   │
│         │                   │                   │           │
│         │                   │                   │           │
│  ┌──────▼───────┐    ┌──────▼───────┐    ┌──────▼───────┐   │
│  │    Gitea     │    │    Redis     │    │    MinIO     │   │
│  │    Stack     │    │    Stack     │    │    Stack     │   │
│  └──────────────┘    └──────────────┘    └──────────────┘   │
│                                                             │
│  Networks: traefik-public, app-internal, backend, cache     │
│  Volumes: Relative paths, absolute paths, mixed             │
│  Secrets: Docker secrets (Swarm), .env files, Ansible vars  │
└─────────────────────────────────────────────────────────────┘

    ▲
    │ Deployment via?
    │ - docker-compose up?
    │ - docker stack deploy?
    │ - Ansible playbooks?
    │ UNCLEAR
    │
┌───┴────────────────────────────────────────────────┐
│ Developer Machine / CI/CD (Gitea)                  │
│ - Ansible playbooks in deployment/ansible/         │
│ - Stack files in deployment/stacks/                │
│ - Application code in src/                         │
└────────────────────────────────────────────────────┘

Decision Rationale: Rebuild vs Repair

Why Rebuild?

  1. Architectural Clarity: Current system mixes concepts (Swarm/Compose, provisioning/deployment)
  2. Environment Separation: Clean separation of local/staging/prod configurations
  3. CI/CD Integration: Design for Gitea Actions from the start
  4. Maintainability: Single source of truth per environment
  5. Debugging Difficulty: Current issues are symptoms of architectural problems

What to Keep?

  • Traefik configuration (reverse proxy setup is solid)
  • PostgreSQL backup scripts (logic is good, just needs proper mounting)
  • SSL certificate generation (Let's Encrypt integration works)
  • Ansible server provisioning playbooks (keep for initial setup)

What to Redesign?

  • Stack organization (too fragmented)
  • Deployment method (unclear split between Ansible and CI/CD)
  • Environment configuration (scattered variables)
  • Volume mount strategy (relative paths causing issues)
  • Network architecture (unclear dependencies)

Lessons Learned

  1. Consistency is Key: Choose Docker Compose OR Docker Swarm, not both
  2. Environment Files: One docker-compose.{env}.yml per environment
  3. Ansible Scope: Only for server provisioning, NOT deployment
  4. CI/CD First: Gitea Actions should handle deployment
  5. Volume Paths: Always use absolute paths or named volumes
  6. Network Clarity: Explicit network definitions, clear service discovery
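Lessons 2 and 4 together imply one predictable command per environment, which a Gitea Actions job could run. Below is a minimal sketch that builds that command. The `docker-compose.{env}.yml` naming follows lesson 2. The `.env.{env}` file naming and the function name are assumptions.

```python
import shlex

# Environments identified in this analysis.
ENVIRONMENTS = ("local", "staging", "production")


def compose_up_command(env: str) -> list[str]:
    """Build the single deploy command for one environment."""
    if env not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {env!r}")
    return [
        "docker", "compose",
        "-f", f"docker-compose.{env}.yml",
        "--env-file", f".env.{env}",  # hypothetical per-environment variable file
        "up", "-d",
    ]


print(shlex.join(compose_up_command("production")))
# docker compose -f docker-compose.production.yml --env-file .env.production up -d
```

Rejecting unknown environment names keeps a typo such as `prod` from silently picking up the wrong file.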

Next Steps

See deployment/NEW_ARCHITECTURE.md for the redesigned system.

Archive Contents

This deployment/legacy/ directory contains:

  • Original stack files (archived)
  • Ansible playbooks (reference only)
  • This analysis document

DO NOT USE THESE FILES FOR NEW DEPLOYMENTS