fix(console): comprehensive TUI rendering fixes

- Fix Enter key detection: handle multiple Enter key formats (\n, \r, \r\n)
- Reduce flickering: lower render frequency from 60 FPS to 30 FPS
- Fix menu bar visibility: re-render menu bar after content to prevent overwriting
- Fix content positioning: explicit line positioning for categories and commands
- Fix line shifting: clear lines before writing, control newlines manually
- Limit visible items: prevent overflow with maxVisibleCategories/Commands
- Improve CPU usage: increase sleep interval when no events are processed

This fixes:
- Enter key not working for selection
- Severe flickering of the application
- Menu bar not visible or being overwritten
- Top half of selection list not displayed
- Lines being shifted/misaligned
Commit 8f3c15ddbb (parent 6bc78f5540), 2025-11-10 11:06:07 +01:00
106 changed files with 9082 additions and 4483 deletions


@@ -0,0 +1,208 @@
# Playbook Cleanup & Server Redeploy - Summary
## Completed Tasks
### Phase 1: Playbook Cleanup ✅
#### 1.1 Consolidated redundant diagnosis playbooks
- ✅ Created `diagnose/gitea.yml` - Consolidates:
- `diagnose-gitea-timeouts.yml`
- `diagnose-gitea-timeout-deep.yml`
- `diagnose-gitea-timeout-live.yml`
- `diagnose-gitea-timeouts-complete.yml`
- `comprehensive-gitea-diagnosis.yml`
- ✅ Uses tags: `deep`, `complete` for selective execution (see the sketch after this section)
- ✅ Removed redundant playbooks
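A minimal sketch of the tag gating these consolidated playbooks rely on (task names and commands are illustrative, not the actual playbook contents; note that basic checks need Ansible's special `always` tag so they still run under `--tags deep` or `--tags complete`):

```yaml
---
- name: Consolidated Gitea diagnosis (sketch)
  hosts: production
  gather_facts: false
  tasks:
    - name: Basic check - container status (runs for every tag selection)
      ansible.builtin.command: docker ps --filter name=gitea
      changed_when: false
      tags: [always]

    - name: Deep check - resource usage
      ansible.builtin.command: docker stats gitea --no-stream
      changed_when: false
      tags: [deep, complete]

    - name: Complete check - dump app.ini
      ansible.builtin.command: docker exec gitea cat /data/gitea/conf/app.ini
      changed_when: false
      tags: [complete]
```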
#### 1.2 Consolidated redundant fix playbooks
- ✅ Created `manage/gitea.yml` - Consolidates:
- `fix-gitea-timeouts.yml`
- `fix-gitea-traefik-connection.yml`
- `fix-gitea-ssl-routing.yml`
- `fix-gitea-servers-transport.yml`
- `fix-gitea-complete.yml`
- `restart-gitea-complete.yml`
- `restart-gitea-with-cache.yml`
- ✅ Uses tags: `restart`, `fix-timeouts`, `fix-ssl`, `fix-servers-transport`, `complete`
- ✅ Removed redundant playbooks
#### 1.3 Consolidated Traefik diagnosis/fix playbooks
- ✅ Created `diagnose/traefik.yml` - Consolidates:
- `diagnose-traefik-restarts.yml`
- `find-traefik-restart-source.yml`
- `monitor-traefik-restarts.yml`
- `monitor-traefik-continuously.yml`
- `verify-traefik-fix.yml`
- ✅ Created `manage/traefik.yml` - Consolidates:
- `stabilize-traefik.yml`
- `disable-traefik-auto-restarts.yml`
- ✅ Uses tags: `restart-source`, `monitor`, `stabilize`, `disable-auto-restart`
- ✅ Removed redundant playbooks
#### 1.4 Removed obsolete/redundant playbooks
- ✅ Removed `update-gitea-traefik-service.yml` (deprecated)
- ✅ Removed `ensure-gitea-traefik-discovery.yml` (redundant)
- ✅ Removed `test-gitea-after-fix.yml` (temporary)
- ✅ Removed `find-ansible-automation-source.yml` (temporary)
#### 1.5 Created new directory structure
- ✅ Created `playbooks/diagnose/` directory
- ✅ Created `playbooks/manage/` directory
- ✅ Created `playbooks/setup/` directory
- ✅ Created `playbooks/maintenance/` directory
- ✅ Created `playbooks/deploy/` directory
#### 1.6 Moved playbooks
- ✅ `setup-infrastructure.yml` → `setup/infrastructure.yml`
- ✅ `deploy-complete.yml` → `deploy/complete.yml`
- ✅ `deploy-image.yml` → `deploy/image.yml`
- ✅ `deploy-application-code.yml` → `deploy/code.yml`
- ✅ `setup-ssl-certificates.yml` → `setup/ssl.yml`
- ✅ `setup-gitea-initial-config.yml` → `setup/gitea.yml`
- ✅ `cleanup-all-containers.yml` → `maintenance/cleanup.yml`
#### 1.7 Updated README
- ✅ Updated `playbooks/README.md` with new structure
- ✅ Documented consolidated playbooks
- ✅ Added usage examples with tags
- ✅ Listed removed/consolidated playbooks
### Phase 2: Server Redeploy Preparation ✅
#### 2.1 Created backup script
- ✅ Created `maintenance/backup-before-redeploy.yml`
- ✅ Backs up:
- Gitea data (volumes)
- SSL certificates (acme.json)
- Gitea configuration (app.ini)
- Traefik configuration
- PostgreSQL data (if applicable)
- ✅ Includes backup verification (a task-level sketch follows below)
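As a rough illustration, the volume and certificate backup could be done like this (the volume name `gitea-data`, the acme.json path, and the backup location are assumptions, not the playbook's actual values):

```yaml
- name: Create backup directory
  ansible.builtin.file:
    path: "/var/backups/{{ backup_name }}"   # assumed location
    state: directory
    mode: "0700"

- name: Back up acme.json (SSL certificates)
  ansible.builtin.copy:
    src: "{{ stacks_base_path }}/traefik/acme.json"   # path assumed
    dest: "/var/backups/{{ backup_name }}/acme.json"
    remote_src: true
    mode: "0600"

- name: Archive the Gitea data volume via a throwaway container
  ansible.builtin.shell: |
    docker run --rm \
      -v gitea-data:/data:ro \
      -v /var/backups/{{ backup_name }}:/backup \
      alpine tar czf /backup/gitea-data.tar.gz -C /data .
  args:
    creates: "/var/backups/{{ backup_name }}/gitea-data.tar.gz"
```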
#### 2.2 Created redeploy playbook
- ✅ Created `setup/redeploy-traefik-gitea-clean.yml`
- ✅ Features:
- Automatic backup (optional)
- Stop and remove containers (preserves volumes/acme.json; see the sketch after this list)
- Sync configurations
- Redeploy stacks
- Restore Gitea configuration
- Verify service discovery
- Final tests
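The volume-preserving teardown can be as simple as a `docker compose down` without `-v`, which leaves named volumes and bind-mounted files such as acme.json in place (the stack directory layout is assumed from the variables used elsewhere in this repo):

```yaml
- name: Stop and remove Gitea and Traefik containers (volumes preserved)
  ansible.builtin.shell: |
    cd {{ stacks_base_path }}/{{ item }}
    docker compose down --remove-orphans   # no -v flag: named volumes survive
  loop:
    - gitea
    - traefik
```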
#### 2.3 Created redeploy guide
- ✅ Created `setup/REDEPLOY_GUIDE.md`
- ✅ Includes:
- Step-by-step guide
- Prerequisites
- Backup verification
- Rollback procedure
- Troubleshooting
- Common issues
#### 2.4 Created rollback playbook
- ✅ Created `maintenance/rollback-redeploy.yml`
- ✅ Features:
- Restore from backup
- Restore volumes, configurations, SSL certificates
- Restart stacks
- Verification
## New Playbook Structure
```
playbooks/
├── setup/ # Initial Setup
│ ├── infrastructure.yml
│ ├── gitea.yml
│ ├── ssl.yml
│ ├── redeploy-traefik-gitea-clean.yml
│ └── REDEPLOY_GUIDE.md
├── deploy/ # Deployment
│ ├── complete.yml
│ ├── image.yml
│ └── code.yml
├── manage/ # Management (consolidated)
│ ├── traefik.yml
│ └── gitea.yml
├── diagnose/ # Diagnosis (consolidated)
│ ├── gitea.yml
│ └── traefik.yml
└── maintenance/ # Maintenance
├── backup.yml
├── backup-before-redeploy.yml
├── cleanup.yml
├── rollback-redeploy.yml
└── system.yml
```
## Usage Examples
### Gitea Diagnosis
```bash
# Basic
ansible-playbook -i inventory/production.yml playbooks/diagnose/gitea.yml
# Deep
ansible-playbook -i inventory/production.yml playbooks/diagnose/gitea.yml --tags deep
# Complete
ansible-playbook -i inventory/production.yml playbooks/diagnose/gitea.yml --tags complete
```
### Gitea Management
```bash
# Restart
ansible-playbook -i inventory/production.yml playbooks/manage/gitea.yml --tags restart
# Fix timeouts
ansible-playbook -i inventory/production.yml playbooks/manage/gitea.yml --tags fix-timeouts
# Complete fix
ansible-playbook -i inventory/production.yml playbooks/manage/gitea.yml --tags complete
```
### Redeploy
```bash
# With automatic backup
ansible-playbook -i inventory/production.yml playbooks/setup/redeploy-traefik-gitea-clean.yml \
--vault-password-file secrets/.vault_pass
# With existing backup
ansible-playbook -i inventory/production.yml playbooks/setup/redeploy-traefik-gitea-clean.yml \
--vault-password-file secrets/.vault_pass \
-e "backup_name=redeploy-backup-1234567890" \
-e "skip_backup=true"
```
### Rollback
```bash
ansible-playbook -i inventory/production.yml playbooks/maintenance/rollback-redeploy.yml \
--vault-password-file secrets/.vault_pass \
-e "backup_name=redeploy-backup-1234567890"
```
## Statistics
- **Consolidated playbooks created**: 4 (diagnose/gitea.yml, diagnose/traefik.yml, manage/gitea.yml, manage/traefik.yml)
- **Redeploy playbooks created**: 3 (redeploy-traefik-gitea-clean.yml, backup-before-redeploy.yml, rollback-redeploy.yml)
- **Redundant playbooks removed**: ~20
- **Playbooks moved to new structure**: 7
- **Documentation created**: 2 (README.md updated, REDEPLOY_GUIDE.md)
## Next Steps
1. ✅ Test consolidated playbooks (dry-run where possible)
2. ✅ Verify redeploy playbook works correctly
3. ✅ Update CI/CD workflows to use new playbook paths if needed
4. ⏳ Perform actual server redeploy when ready
## Notes
- All consolidated playbooks use tags for selective execution
- Old wrapper playbooks (e.g., `restart-traefik.yml`) still exist and work
- Backup playbook preserves all critical data
- Redeploy playbook includes comprehensive verification
- Rollback playbook allows quick recovery if needed


@@ -1,42 +1,81 @@
# Ansible Playbooks - Overview
## New Structure
The playbooks have been reorganized into a clear directory structure:
```
playbooks/
├── setup/ # Initial Setup
│ ├── infrastructure.yml
│ ├── gitea.yml
│ └── ssl.yml
├── deploy/ # Deployment
│ ├── complete.yml
│ ├── image.yml
│ └── code.yml
├── manage/ # Management (consolidated)
│ ├── traefik.yml
│ ├── gitea.yml
│ └── application.yml
├── diagnose/ # Diagnosis (consolidated)
│ ├── gitea.yml
│ ├── traefik.yml
│ └── application.yml
└── maintenance/ # Maintenance
├── backup.yml
├── backup-before-redeploy.yml
├── cleanup.yml
├── rollback-redeploy.yml
└── system.yml
```
## Available Playbooks
> **Note**: Most playbooks have been refactored into reusable roles. The playbooks are now wrappers that call the corresponding role tasks. This improves reusability and maintainability and follows Ansible best practices.
### Infrastructure Setup
- **`setup-infrastructure.yml`** - Deploys all stacks (Traefik, PostgreSQL, Redis, Registry, Gitea, Monitoring, Production)
- **`setup-production-secrets.yml`** - Deploys secrets to production
- **`setup-ssl-certificates.yml`** - SSL certificate setup (wrapper for the `traefik` role, `tasks_from: ssl`)
- **`setup-wireguard-host.yml`** - WireGuard VPN setup
- **`sync-stacks.yml`** - Synchronizes stack configurations to the server
### Setup (Initial Setup)
### Deployment & Updates
- **`rollback.yml`** - Rollback to the previous version
- **`backup.yml`** - Creates backups of PostgreSQL, application data, Gitea, and the registry
- **`deploy-image.yml`** - Docker image deployment (used by the CI/CD workflows)
- **`setup/infrastructure.yml`** - Deploys all stacks (Traefik, PostgreSQL, Redis, Registry, Gitea, Monitoring, Production)
- **`setup/gitea.yml`** - Initial Gitea configuration (wrapper for the `gitea` role, `tasks_from: setup`)
- **`setup/ssl.yml`** - SSL certificate setup (wrapper for the `traefik` role, `tasks_from: ssl`)
- **`setup/redeploy-traefik-gitea-clean.yml`** - Clean redeployment of Traefik and Gitea stacks
- **`setup/REDEPLOY_GUIDE.md`** - Step-by-step guide for redeployment
### Traefik Management (Role-based)
### Deployment
- **`deploy/complete.yml`** - Complete deployment (code + image + dependencies)
- **`deploy/image.yml`** - Docker image deployment (used by the CI/CD workflows)
- **`deploy/code.yml`** - Deploy application code via Git (wrapper for the `application` role, `tasks_from: deploy_code`)
### Management (Consolidated)
#### Traefik Management
- **`manage/traefik.yml`** - Consolidated Traefik management
- `--tags stabilize`: Fix acme.json, ensure running, monitor stability
- `--tags disable-auto-restart`: Check and document auto-restart mechanisms
- **`restart-traefik.yml`** - Restart the Traefik container (wrapper for the `traefik` role, `tasks_from: restart`)
- **`recreate-traefik.yml`** - Recreate the Traefik container (wrapper for the `traefik` role, `tasks_from: restart` with `traefik_restart_action: recreate`)
- **`deploy-traefik-config.yml`** - Deploy Traefik configuration files (wrapper for the `traefik` role, `tasks_from: config`)
- **`check-traefik-acme-logs.yml`** - Check Traefik ACME challenge logs (wrapper for the `traefik` role, `tasks_from: logs`)
- **`setup-ssl-certificates.yml`** - Set up Let's Encrypt SSL certificates (wrapper for the `traefik` role, `tasks_from: ssl`)
### Gitea Management (Role-based)
#### Gitea Management
- **`manage/gitea.yml`** - Consolidated Gitea management
- `--tags restart`: Restart Gitea container
- `--tags fix-timeouts`: Restart Gitea and Traefik to fix timeouts
- `--tags fix-ssl`: Fix SSL/routing issues
- `--tags fix-servers-transport`: Update ServersTransport configuration
- `--tags complete`: Complete fix (stop runner, restart services, verify)
- **`check-and-restart-gitea.yml`** - Check and restart Gitea if unhealthy (wrapper for the `gitea` role, `tasks_from: restart`)
- **`fix-gitea-runner-config.yml`** - Fix the Gitea runner configuration (wrapper for the `gitea` role, `tasks_from: runner` with `gitea_runner_action: fix`)
- **`register-gitea-runner.yml`** - Register the Gitea runner (wrapper for the `gitea` role, `tasks_from: runner` with `gitea_runner_action: register`)
- **`update-gitea-config.yml`** - Update the Gitea configuration (wrapper for the `gitea` role, `tasks_from: config`)
- **`setup-gitea-initial-config.yml`** - Initial Gitea configuration (wrapper for the `gitea` role, `tasks_from: setup`)
- **`setup-gitea-repository.yml`** - Set up a Gitea repository (wrapper for the `gitea` role, `tasks_from: repository`)
### Application Deployment (Role-based)
- **`deploy-application-code.yml`** - Deploy application code via Git (wrapper for the `application` role, `tasks_from: deploy_code` with `application_deployment_method: git`)
#### Application Management
- **`manage/application.yml`** - Consolidated application management (to be created)
- **`sync-application-code.yml`** - Synchronize application code via rsync (wrapper for the `application` role, `tasks_from: deploy_code` with `application_deployment_method: rsync`)
- **`install-composer-dependencies.yml`** - Install Composer dependencies (wrapper for the `application` role, `tasks_from: composer`)
### Application Container Management (Role-based)
- **`check-container-status.yml`** - Check container status (wrapper for the `application` role, `tasks_from: health_check`)
- **`check-container-logs.yml`** - Check container logs (wrapper for the `application` role, `tasks_from: logs`)
- **`check-worker-logs.yml`** - Check worker and scheduler logs (wrapper for the `application` role, `tasks_from: logs` with `application_logs_check_vendor: true`)
@@ -46,28 +85,89 @@
- **`recreate-containers-with-env.yml`** - Recreate containers with environment variables (wrapper for the `application` role, `tasks_from: containers` with `application_container_action: recreate-with-env`)
- **`sync-and-recreate-containers.yml`** - Sync and recreate containers (wrapper for the `application` role, `tasks_from: containers` with `application_container_action: sync-recreate`)
### Diagnosis (Consolidated)
#### Gitea Diagnosis
- **`diagnose/gitea.yml`** - Consolidated Gitea diagnosis
- Basic checks (always): Container status, health endpoints, network connectivity, service discovery
- `--tags deep`: Resource usage, multiple connection tests, log analysis
- `--tags complete`: All checks including app.ini, ServersTransport, etc.
#### Traefik Diagnosis
- **`diagnose/traefik.yml`** - Consolidated Traefik diagnosis
- Basic checks (always): Container status, restart count, recent logs
- `--tags restart-source`: Find source of restart loops (cronjobs, systemd, scripts)
- `--tags monitor`: Monitor for restarts over time
### Maintenance
- **`cleanup-all-containers.yml`** - Stops and removes all containers, cleans up networks and volumes (for a full server reset)
- **`system-maintenance.yml`** - System updates, unattended upgrades, Docker pruning
- **`troubleshoot.yml`** - Unified troubleshooting with tags
- **`maintenance/backup.yml`** - Creates backups of PostgreSQL, application data, Gitea, and the registry
- **`maintenance/backup-before-redeploy.yml`** - Backup before redeploy (Gitea data, SSL certificates, configurations)
- **`maintenance/rollback-redeploy.yml`** - Rollback from a redeploy backup
- **`maintenance/cleanup.yml`** - Stops and removes all containers, cleans up networks and volumes (for a full server reset)
- **`maintenance/system.yml`** - System updates, unattended upgrades, Docker pruning
- **`rollback.yml`** - Rollback to the previous version
### WireGuard
- **`generate-wireguard-client.yml`** - Generates a WireGuard client config
- **`wireguard-routing.yml`** - Configures WireGuard routing
- **`setup-wireguard-host.yml`** - WireGuard VPN setup
### Initial Deployment
- **`build-initial-image.yml`** - Builds and pushes the initial Docker image (for the first deployment)
### CI/CD & Development
- **`setup-gitea-runner-ci.yml`** - Gitea Runner CI Setup
- **`install-docker.yml`** - Docker installation on the server
## Removed/Legacy Playbooks
## Removed/Consolidated Playbooks
The following playbooks were removed because they are no longer needed:
- ~~`build-and-push.yml`~~ - Replaced by the CI/CD pipeline
- ~~`remove-framework-production-stack.yml`~~ - Temporary playbook
- ~~`remove-temporary-grafana-ip.yml`~~ - Temporary playbook
The following playbooks were consolidated or removed:
### Consolidated into `diagnose/gitea.yml`:
- ~~`diagnose-gitea-timeouts.yml`~~
- ~~`diagnose-gitea-timeout-deep.yml`~~
- ~~`diagnose-gitea-timeout-live.yml`~~
- ~~`diagnose-gitea-timeouts-complete.yml`~~
- ~~`comprehensive-gitea-diagnosis.yml`~~
### Consolidated into `manage/gitea.yml`:
- ~~`fix-gitea-timeouts.yml`~~
- ~~`fix-gitea-traefik-connection.yml`~~
- ~~`fix-gitea-ssl-routing.yml`~~
- ~~`fix-gitea-servers-transport.yml`~~
- ~~`fix-gitea-complete.yml`~~
- ~~`restart-gitea-complete.yml`~~
- ~~`restart-gitea-with-cache.yml`~~
### Consolidated into `diagnose/traefik.yml`:
- ~~`diagnose-traefik-restarts.yml`~~
- ~~`find-traefik-restart-source.yml`~~
- ~~`monitor-traefik-restarts.yml`~~
- ~~`monitor-traefik-continuously.yml`~~
- ~~`verify-traefik-fix.yml`~~
### Consolidated into `manage/traefik.yml`:
- ~~`stabilize-traefik.yml`~~
- ~~`disable-traefik-auto-restarts.yml`~~
### Removed (obsolete/redundant):
- ~~`update-gitea-traefik-service.yml`~~ - Deprecated (as documented in the code)
- ~~`ensure-gitea-traefik-discovery.yml`~~ - Redundant
- ~~`test-gitea-after-fix.yml`~~ - Temporary
- ~~`find-ansible-automation-source.yml`~~ - Temporary
### Moved:
- `setup-infrastructure.yml` → `setup/infrastructure.yml`
- `deploy-complete.yml` → `deploy/complete.yml`
- `deploy-image.yml` → `deploy/image.yml`
- `deploy-application-code.yml` → `deploy/code.yml`
- `setup-ssl-certificates.yml` → `setup/ssl.yml`
- `setup-gitea-initial-config.yml` → `setup/gitea.yml`
- `cleanup-all-containers.yml` → `maintenance/cleanup.yml`
## Usage
@@ -78,6 +178,69 @@ cd deployment/ansible
ansible-playbook -i inventory/production.yml playbooks/<playbook>.yml --vault-password-file secrets/.vault_pass
```
### Consolidated Playbooks with Tags
**Gitea Diagnosis:**
```bash
# Basic diagnosis (default)
ansible-playbook -i inventory/production.yml playbooks/diagnose/gitea.yml --vault-password-file secrets/.vault_pass
# Deep diagnosis
ansible-playbook -i inventory/production.yml playbooks/diagnose/gitea.yml --tags deep --vault-password-file secrets/.vault_pass
# Complete diagnosis
ansible-playbook -i inventory/production.yml playbooks/diagnose/gitea.yml --tags complete --vault-password-file secrets/.vault_pass
```
**Gitea Management:**
```bash
# Restart Gitea
ansible-playbook -i inventory/production.yml playbooks/manage/gitea.yml --tags restart --vault-password-file secrets/.vault_pass
# Fix timeouts
ansible-playbook -i inventory/production.yml playbooks/manage/gitea.yml --tags fix-timeouts --vault-password-file secrets/.vault_pass
# Complete fix
ansible-playbook -i inventory/production.yml playbooks/manage/gitea.yml --tags complete --vault-password-file secrets/.vault_pass
```
**Traefik Diagnosis:**
```bash
# Basic diagnosis
ansible-playbook -i inventory/production.yml playbooks/diagnose/traefik.yml --vault-password-file secrets/.vault_pass
# Find restart source
ansible-playbook -i inventory/production.yml playbooks/diagnose/traefik.yml --tags restart-source --vault-password-file secrets/.vault_pass
# Monitor restarts
ansible-playbook -i inventory/production.yml playbooks/diagnose/traefik.yml --tags monitor --vault-password-file secrets/.vault_pass
```
**Traefik Management:**
```bash
# Stabilize Traefik
ansible-playbook -i inventory/production.yml playbooks/manage/traefik.yml --tags stabilize --vault-password-file secrets/.vault_pass
```
**Redeploy:**
```bash
# With automatic backup
ansible-playbook -i inventory/production.yml playbooks/setup/redeploy-traefik-gitea-clean.yml --vault-password-file secrets/.vault_pass
# With existing backup
ansible-playbook -i inventory/production.yml playbooks/setup/redeploy-traefik-gitea-clean.yml \
--vault-password-file secrets/.vault_pass \
-e "backup_name=redeploy-backup-1234567890" \
-e "skip_backup=true"
```
**Rollback:**
```bash
ansible-playbook -i inventory/production.yml playbooks/maintenance/rollback-redeploy.yml \
--vault-password-file secrets/.vault_pass \
-e "backup_name=redeploy-backup-1234567890"
```
### Role-based Playbooks
Most playbooks are now wrappers that use roles. The functionality stays the same, but the implementation is organized in reusable roles.
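A wrapper playbook in this scheme is only a few lines. A sketch of the pattern (the role entry points follow the `tasks_from` names listed above; the exact variables are assumptions):

```yaml
---
- name: Restart Traefik (wrapper)
  hosts: production
  tasks:
    - name: Run the restart tasks from the traefik role
      ansible.builtin.include_role:
        name: traefik
        tasks_from: restart
      vars:
        traefik_restart_action: restart   # set to 'recreate' in recreate-traefik.yml
```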
@@ -99,7 +262,7 @@ ansible-playbook -i inventory/production.yml playbooks/fix-gitea-runner-config.y
**Example: Application Code Deployment**
```bash
# Git-based (default):
ansible-playbook -i inventory/production.yml playbooks/deploy-application-code.yml \
ansible-playbook -i inventory/production.yml playbooks/deploy/code.yml \
-e "deployment_environment=staging" \
-e "git_branch=staging" \
--vault-password-file secrets/.vault_pass
@@ -109,21 +272,6 @@ ansible-playbook -i inventory/production.yml playbooks/sync-application-code.yml
--vault-password-file secrets/.vault_pass
```
### Using Tags
Many playbooks support tags for selective execution:
```bash
# Only Traefik-related tasks:
ansible-playbook -i inventory/production.yml playbooks/restart-traefik.yml --tags traefik,restart
# Only Gitea-related tasks:
ansible-playbook -i inventory/production.yml playbooks/check-and-restart-gitea.yml --tags gitea,restart
# Only application-related tasks:
ansible-playbook -i inventory/production.yml playbooks/deploy-application-code.yml --tags application,deploy
```
## Role Structure
The playbooks now use the following roles:
@@ -143,11 +291,11 @@ Die Playbooks verwenden jetzt folgende Roles:
- **Location**: `roles/application/tasks/`
- **Defaults**: `roles/application/defaults/main.yml`
## Advantages of the Role-based Structure
1. **Reusability**: Tasks can be used in multiple playbooks
2. **Maintainability**: Changes are made centrally in roles
3. **Testability**: Roles can be tested in isolation
4. **Clarity**: Clear structure by component
5. **Best practices**: Follows Ansible recommendations
## Advantages of the New Structure
1. **Clarity**: Clear directory structure by function
2. **Consolidation**: Redundant playbooks merged
3. **Tags**: Selective execution via tags
4. **Reusability**: Tasks can be used in multiple playbooks
5. **Maintainability**: Changes are made centrally in roles
6. **Best practices**: Follows Ansible recommendations


@@ -1,195 +0,0 @@
---
# Comprehensive Gitea Timeout Diagnosis
# Checks all aspects of the intermittent Gitea timeout problem
- name: Comprehensive Gitea Timeout Diagnosis
hosts: production
gather_facts: yes
become: no
vars:
gitea_stack_path: "{{ stacks_base_path }}/gitea"
traefik_stack_path: "{{ stacks_base_path }}/traefik"
gitea_url: "https://{{ gitea_domain }}"
tasks:
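# Note: the doubled braces like {{ '{{' }}.State.Status{{ '}}' }} in the docker
# commands below render to literal Go-template placeholders ({{.State.Status}})
# after Jinja2 templating, so Ansible does not try to evaluate them itself.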
- name: Check Traefik container uptime and restart count
ansible.builtin.shell: |
docker inspect traefik --format '{{ '{{' }}.State.Status{{ '}}' }}|{{ '{{' }}.State.StartedAt{{ '}}' }}|{{ '{{' }}.RestartCount{{ '}}' }}' 2>/dev/null || echo "UNKNOWN"
register: traefik_info
changed_when: false
- name: Check Gitea container uptime and restart count
ansible.builtin.shell: |
docker inspect gitea --format '{{ '{{' }}.State.Status{{ '}}' }}|{{ '{{' }}.State.StartedAt{{ '}}' }}|{{ '{{' }}.RestartCount{{ '}}' }}' 2>/dev/null || echo "UNKNOWN"
register: gitea_info
changed_when: false
- name: Check Traefik logs for recent restarts (last 2 hours)
ansible.builtin.shell: |
cd {{ traefik_stack_path }}
docker compose logs traefik --since 2h 2>&1 | grep -iE "stopping server gracefully|I have to go|restart|shutdown" | tail -20 || echo "No restart messages in the last 2 hours"
register: traefik_restart_logs
changed_when: false
- name: Check Gitea logs for errors/timeouts (last 2 hours)
ansible.builtin.shell: |
cd {{ gitea_stack_path }}
docker compose logs gitea --since 2h 2>&1 | grep -iE "error|timeout|failed|panic|fatal|slow" | tail -30 || echo "No errors in the last 2 hours"
register: gitea_error_logs
changed_when: false
- name: Test Gitea direct connection (multiple attempts)
ansible.builtin.shell: |
for i in {1..5}; do
echo "=== Attempt $i ==="
cd {{ gitea_stack_path }}
timeout 5 docker compose exec -T gitea curl -f http://localhost:3000/api/healthz 2>&1 || echo "FAILED"
sleep 1
done
register: gitea_direct_tests
changed_when: false
- name: Test Gitea via Traefik (multiple attempts)
ansible.builtin.shell: |
for i in {1..5}; do
echo "=== Attempt $i ==="
timeout 10 curl -k -s -o /dev/null -w "%{http_code}" {{ gitea_url }}/api/healthz 2>&1 || echo "TIMEOUT"
sleep 2
done
register: gitea_traefik_tests
changed_when: false
- name: Check Traefik service discovery for Gitea (using CLI)
ansible.builtin.shell: |
cd {{ traefik_stack_path }}
docker compose exec -T traefik traefik show providers docker 2>/dev/null | grep -i "gitea" || echo "Gitea service not found in Traefik providers"
register: traefik_gitea_service
changed_when: false
failed_when: false
- name: Check Traefik routers for Gitea (using CLI)
ansible.builtin.shell: |
cd {{ traefik_stack_path }}
docker compose exec -T traefik traefik show providers docker 2>/dev/null | grep -i "gitea" || echo "Gitea router not found in Traefik providers"
register: traefik_gitea_router
changed_when: false
failed_when: false
- name: Check network connectivity Traefik -> Gitea
ansible.builtin.shell: |
cd {{ traefik_stack_path }}
for i in {1..3}; do
echo "=== Attempt $i ==="
docker compose exec -T traefik wget -qO- --timeout=5 http://gitea:3000/api/healthz 2>&1 || echo "CONNECTION_FAILED"
sleep 1
done
register: traefik_gitea_network
changed_when: false
- name: Check Gitea container resources (CPU/Memory)
ansible.builtin.shell: |
docker stats gitea --no-stream --format 'CPU: {{ '{{' }}.CPUPerc{{ '}}' }} | Memory: {{ '{{' }}.MemUsage{{ '}}' }}' 2>/dev/null || echo "Could not get stats"
register: gitea_resources
changed_when: false
failed_when: false
- name: Check Traefik container resources (CPU/Memory)
ansible.builtin.shell: |
docker stats traefik --no-stream --format 'CPU: {{ '{{' }}.CPUPerc{{ '}}' }} | Memory: {{ '{{' }}.MemUsage{{ '}}' }}' 2>/dev/null || echo "Could not get stats"
register: traefik_resources
changed_when: false
failed_when: false
- name: Check if Gitea is in traefik-public network
ansible.builtin.shell: |
docker network inspect traefik-public --format '{{ '{{' }}range .Containers{{ '}}' }}{{ '{{' }}.Name{{ '}}' }} {{ '{{' }}end{{ '}}' }}' 2>/dev/null | grep -q gitea && echo "YES" || echo "NO"
register: gitea_in_network
changed_when: false
- name: Check Traefik access logs for Gitea requests (last 100 lines)
ansible.builtin.shell: |
cd {{ traefik_stack_path }}
tail -100 logs/access.log 2>/dev/null | grep -i "git.michaelschiemer.de" | tail -20 || echo "No access logs found"
register: traefik_access_logs
changed_when: false
failed_when: false
- name: Check Traefik error logs for Gitea-related errors
ansible.builtin.shell: |
cd {{ traefik_stack_path }}
tail -100 logs/traefik.log 2>/dev/null | grep -iE "gitea|git\.michaelschiemer\.de|timeout|error.*gitea" | tail -20 || echo "No Gitea errors in the Traefik logs"
register: traefik_error_logs
changed_when: false
failed_when: false
- name: Summary
ansible.builtin.debug:
msg: |
================================================================================
COMPREHENSIVE GITEA TIMEOUT DIAGNOSIS:
================================================================================
Container Status:
- Traefik: {{ traefik_info.stdout }}
- Gitea: {{ gitea_info.stdout }}
Traefik restart logs (last 2h):
{{ traefik_restart_logs.stdout }}
Gitea error logs (last 2h):
{{ gitea_error_logs.stdout }}
Direct Gitea connection (5 attempts):
{{ gitea_direct_tests.stdout }}
Gitea via Traefik (5 attempts):
{{ gitea_traefik_tests.stdout }}
Traefik Service Discovery:
- Gitea Service: {{ traefik_gitea_service.stdout }}
- Gitea Router: {{ traefik_gitea_router.stdout }}
Network connection Traefik -> Gitea (3 attempts):
{{ traefik_gitea_network.stdout }}
Container resources:
- Gitea: {{ gitea_resources.stdout }}
- Traefik: {{ traefik_resources.stdout }}
Network:
- Gitea in traefik-public: {% if gitea_in_network.stdout == 'YES' %}✅{% else %}❌{% endif %}
Traefik access logs (last 20 Gitea requests):
{{ traefik_access_logs.stdout }}
Traefik error logs (Gitea-related):
{{ traefik_error_logs.stdout }}
================================================================================
ANALYSIS:
================================================================================
{% if 'stopping server gracefully' in traefik_restart_logs.stdout | lower or 'I have to go' in traefik_restart_logs.stdout %}
❌ PROBLEM: Traefik is being stopped regularly!
→ This is the main cause of the timeouts
→ Run 'find-traefik-restart-source.yml' to locate the source
{% endif %}
{% if 'CONNECTION_FAILED' in traefik_gitea_network.stdout %}
❌ PROBLEM: Traefik cannot reach Gitea
→ Network problem between Traefik and Gitea
→ Check whether both containers are in the traefik-public network
{% endif %}
{% if 'not found' in traefik_gitea_service.stdout | lower or 'not found' in traefik_gitea_router.stdout | lower %}
❌ PROBLEM: Gitea is missing from Traefik service discovery
→ Traefik has not detected Gitea
→ Run 'fix-gitea-timeouts.yml' to restart both
{% endif %}
{% if 'TIMEOUT' in gitea_traefik_tests.stdout %}
⚠️ PROBLEM: Intermittent timeouts via Traefik
→ Possible causes: Traefik restarts, Gitea performance, network problems
{% endif %}
================================================================================


@@ -1,499 +0,0 @@
---
# Diagnose Gitea timeout - deep analysis during a request
# Runs all checks during an actual request, incl. pg_stat_activity, Redis, and backpressure tests
- name: Diagnose Gitea Timeout Deep Analysis During Request
hosts: production
gather_facts: yes
become: no
vars:
gitea_stack_path: "{{ stacks_base_path }}/gitea"
traefik_stack_path: "{{ stacks_base_path }}/traefik"
gitea_url: "https://{{ gitea_domain }}"
test_duration_seconds: 60 # How long we test
test_timestamp: "{{ ansible_date_time.epoch }}"
postgres_max_connections: 300
tasks:
- name: Display diagnostic plan
ansible.builtin.debug:
msg: |
================================================================================
GITEA TIMEOUT DEEP DIAGNOSIS - LIVE DURING REQUEST
================================================================================
This extended diagnosis runs all checks during an actual request:
1. Docker stats (CPU/RAM/IO) during the request
2. pg_stat_activity: connection count vs. max_connections ({{ postgres_max_connections }})
3. Redis ping check (session store blockages)
4. Gitea localhost test (backpressure analysis)
5. Gitea logs (DB timeouts, panics, "context deadline exceeded", SESSION: context canceled)
6. Postgres logs (connection issues, authentication timeouts)
7. Traefik logs ("backend connection error", "EOF")
8. Runner status and git-upload-pack / git gc jobs
Test duration: {{ test_duration_seconds }} seconds
Timestamp: {{ test_timestamp }}
================================================================================
- name: Get initial container stats (baseline)
ansible.builtin.shell: |
docker stats --no-stream --format "table {{ '{{' }}.Name{{ '}}' }}\t{{ '{{' }}.CPUPerc{{ '}}' }}\t{{ '{{' }}.MemUsage{{ '}}' }}\t{{ '{{' }}.NetIO{{ '}}' }}\t{{ '{{' }}.BlockIO{{ '}}' }}" gitea gitea-postgres gitea-redis traefik 2>/dev/null || echo "Stats collection failed"
register: initial_stats
changed_when: false
- name: Get initial PostgreSQL connection count
ansible.builtin.shell: |
cd {{ gitea_stack_path }}
docker compose exec -T postgres psql -U gitea -d gitea -c "SELECT count(*) as connection_count FROM pg_stat_activity;" 2>&1 | grep -E "^[[:space:]]*[0-9]+" | head -1 || echo "0"
register: initial_pg_connections
changed_when: false
failed_when: false
- name: Start collecting Docker stats in background
ansible.builtin.shell: |
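# Sample stats for the whole test window, timestamp each line, and detach with
# '&' so the request tasks below run while collection continues in the background.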
timeout {{ test_duration_seconds }} docker stats --format "{{ '{{' }}.Name{{ '}}' }},{{ '{{' }}.CPUPerc{{ '}}' }},{{ '{{' }}.MemUsage{{ '}}' }},{{ '{{' }}.NetIO{{ '}}' }},{{ '{{' }}.BlockIO{{ '}}' }}" gitea gitea-postgres gitea-redis traefik 2>/dev/null | while read line; do
echo "[$(date '+%Y-%m-%d %H:%M:%S.%3N')] $line"
done > /tmp/gitea_stats_{{ test_timestamp }}.log 2>&1 &
STATS_PID=$!
echo $STATS_PID
register: stats_pid
changed_when: false
- name: Start collecting Gitea logs in background
ansible.builtin.shell: |
cd {{ gitea_stack_path }}
timeout {{ test_duration_seconds }} docker compose logs -f gitea 2>&1 | while read line; do
echo "[$(date '+%Y-%m-%d %H:%M:%S.%3N')] $line"
done > /tmp/gitea_logs_{{ test_timestamp }}.log 2>&1 &
echo $!
register: gitea_logs_pid
changed_when: false
- name: Start collecting Postgres logs in background
ansible.builtin.shell: |
cd {{ gitea_stack_path }}
timeout {{ test_duration_seconds }} docker compose logs -f postgres 2>&1 | while read line; do
echo "[$(date '+%Y-%m-%d %H:%M:%S.%3N')] $line"
done > /tmp/postgres_logs_{{ test_timestamp }}.log 2>&1 &
echo $!
register: postgres_logs_pid
changed_when: false
- name: Start collecting Traefik logs in background
ansible.builtin.shell: |
cd {{ traefik_stack_path }}
timeout {{ test_duration_seconds }} docker compose logs -f traefik 2>&1 | while read line; do
echo "[$(date '+%Y-%m-%d %H:%M:%S.%3N')] $line"
done > /tmp/traefik_logs_{{ test_timestamp }}.log 2>&1 &
echo $!
register: traefik_logs_pid
changed_when: false
- name: Start monitoring pg_stat_activity in background
ansible.builtin.shell: |
cd {{ gitea_stack_path }}
for i in $(seq 1 {{ (test_duration_seconds / 5) | int }}); do
echo "[$(date '+%Y-%m-%d %H:%M:%S.%3N')] $(docker compose exec -T postgres psql -U gitea -d gitea -t -c 'SELECT count(*) FROM pg_stat_activity;' 2>&1 | tr -d ' ' || echo 'ERROR')"
sleep 5
done > /tmp/pg_stat_activity_{{ test_timestamp }}.log 2>&1 &
echo $!
register: pg_stat_pid
changed_when: false
- name: Wait a moment for log collection to start
ansible.builtin.pause:
seconds: 2
- name: Trigger Gitea request via Traefik (with timeout)
ansible.builtin.shell: |
echo "[$(date '+%Y-%m-%d %H:%M:%S.%3N')] Starting request to {{ gitea_url }}/api/healthz"
timeout 35 curl -k -v -s -o /tmp/gitea_response_{{ test_timestamp }}.log -w "\nHTTP_CODE:%{http_code}\nTIME_TOTAL:%{time_total}\nTIME_CONNECT:%{time_connect}\nTIME_STARTTRANSFER:%{time_starttransfer}\n" "{{ gitea_url }}/api/healthz" 2>&1 | tee /tmp/gitea_curl_{{ test_timestamp }}.log
echo "[$(date '+%Y-%m-%d %H:%M:%S.%3N')] Request completed"
register: gitea_request
changed_when: false
failed_when: false
- name: Test Gitea localhost (Backpressure-Test)
ansible.builtin.shell: |
echo "[$(date '+%Y-%m-%d %H:%M:%S.%3N')] Starting localhost test"
cd {{ gitea_stack_path }}
timeout 35 docker compose exec -T gitea curl -f -s -w "\nHTTP_CODE:%{http_code}\nTIME_TOTAL:%{time_total}\n" http://localhost:3000/api/healthz 2>&1 | tee /tmp/gitea_localhost_{{ test_timestamp }}.log || echo "LOCALHOST_TEST_FAILED" > /tmp/gitea_localhost_{{ test_timestamp }}.log
echo "[$(date '+%Y-%m-%d %H:%M:%S.%3N')] Localhost test completed"
register: gitea_localhost_test
changed_when: false
failed_when: false
- name: Test direct connection Traefik → Gitea (parallel)
ansible.builtin.shell: |
echo "[$(date '+%Y-%m-%d %H:%M:%S.%3N')] Starting direct test Traefik → Gitea"
cd {{ traefik_stack_path }}
timeout 35 docker compose exec -T traefik wget -qO- --timeout=30 http://gitea:3000/api/healthz 2>&1 | tee /tmp/traefik_gitea_direct_{{ test_timestamp }}.log || echo "DIRECT_TEST_FAILED" > /tmp/traefik_gitea_direct_{{ test_timestamp }}.log
echo "[$(date '+%Y-%m-%d %H:%M:%S.%3N')] Direct test completed"
register: traefik_direct_test
changed_when: false
failed_when: false
- name: Test Redis connection during request
ansible.builtin.shell: |
echo "[$(date '+%Y-%m-%d %H:%M:%S.%3N')] Testing Redis connection"
cd {{ gitea_stack_path }}
docker compose exec -T redis redis-cli ping 2>&1 | tee /tmp/redis_ping_{{ test_timestamp }}.log || echo "REDIS_PING_FAILED" > /tmp/redis_ping_{{ test_timestamp }}.log
echo "[$(date '+%Y-%m-%d %H:%M:%S.%3N')] Redis ping completed"
register: redis_ping_test
changed_when: false
failed_when: false
- name: Check Gitea Runner status
ansible.builtin.shell: |
docker ps --format "{{ '{{' }}.Names{{ '}}' }}" | grep -q "gitea-runner" && echo "RUNNING" || echo "STOPPED"
register: runner_status
changed_when: false
failed_when: false
- name: Wait for log collection to complete
ansible.builtin.pause:
seconds: "{{ test_duration_seconds - 5 }}"
- name: Stop background processes
ansible.builtin.shell: |
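# Intentionally broad pkill patterns; '|| true' keeps this step from failing
# when a collector has already exited via its 'timeout'.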
pkill -f "docker.*stats.*gitea" || true
pkill -f "docker compose logs.*gitea" || true
pkill -f "docker compose logs.*postgres" || true
pkill -f "docker compose logs.*traefik" || true
pkill -f "pg_stat_activity" || true
sleep 2
changed_when: false
failed_when: false
- name: Get final PostgreSQL connection count
ansible.builtin.shell: |
cd {{ gitea_stack_path }}
docker compose exec -T postgres psql -U gitea -d gitea -c "SELECT count(*) as connection_count FROM pg_stat_activity;" 2>&1 | grep -E "^[[:space:]]*[0-9]+" | head -1 || echo "0"
register: final_pg_connections
changed_when: false
failed_when: false
- name: Collect stats results
ansible.builtin.slurp:
src: "/tmp/gitea_stats_{{ test_timestamp }}.log"
register: stats_results
changed_when: false
failed_when: false
- name: Collect pg_stat_activity results
ansible.builtin.slurp:
src: "/tmp/pg_stat_activity_{{ test_timestamp }}.log"
register: pg_stat_results
changed_when: false
failed_when: false
- name: Collect Gitea logs results
ansible.builtin.slurp:
src: "/tmp/gitea_logs_{{ test_timestamp }}.log"
register: gitea_logs_results
changed_when: false
failed_when: false
- name: Collect Postgres logs results
ansible.builtin.slurp:
src: "/tmp/postgres_logs_{{ test_timestamp }}.log"
register: postgres_logs_results
changed_when: false
failed_when: false
- name: Collect Traefik logs results
ansible.builtin.slurp:
src: "/tmp/traefik_logs_{{ test_timestamp }}.log"
register: traefik_logs_results
changed_when: false
failed_when: false
- name: Get request result
ansible.builtin.slurp:
src: "/tmp/gitea_curl_{{ test_timestamp }}.log"
register: request_result
changed_when: false
failed_when: false
- name: Get localhost test result
ansible.builtin.slurp:
src: "/tmp/gitea_localhost_{{ test_timestamp }}.log"
register: localhost_result
changed_when: false
failed_when: false
- name: Get direct test result
ansible.builtin.slurp:
src: "/tmp/traefik_gitea_direct_{{ test_timestamp }}.log"
register: direct_test_result
changed_when: false
failed_when: false
- name: Get Redis ping result
ansible.builtin.slurp:
src: "/tmp/redis_ping_{{ test_timestamp }}.log"
register: redis_ping_result
changed_when: false
failed_when: false
- name: Analyze pg_stat_activity for connection count
ansible.builtin.shell: |
if [ -f /tmp/pg_stat_activity_{{ test_timestamp }}.log ]; then
echo "=== POSTGRES CONNECTION COUNT ANALYSIS ==="
echo "Initial connections: {{ initial_pg_connections.stdout }}"
echo "Final connections: {{ final_pg_connections.stdout }}"
echo "Max connections: {{ postgres_max_connections }}"
echo ""
echo "=== CONNECTION COUNT TIMELINE ==="
cat /tmp/pg_stat_activity_{{ test_timestamp }}.log | tail -20 || echo "No connection count data"
echo ""
echo "=== CONNECTION COUNT ANALYSIS ==="
MAX_COUNT=$(cat /tmp/pg_stat_activity_{{ test_timestamp }}.log | grep -E "^\[.*\] [0-9]+" | awk -F'] ' '{print $2}' | sort -n | tail -1 || echo "0")
if [ "$MAX_COUNT" != "0" ] && [ "$MAX_COUNT" != "" ]; then
echo "Maximum connections during test: $MAX_COUNT"
WARNING_THRESHOLD=$(({{ postgres_max_connections }} * 80 / 100))
if [ "$MAX_COUNT" -gt "$WARNING_THRESHOLD" ]; then
echo "⚠️ WARNING: Connection count ($MAX_COUNT) is above 80% of max_connections ({{ postgres_max_connections }})"
echo " Consider reducing MAX_OPEN_CONNS or increasing max_connections"
else
echo "✅ Connection count is within safe limits"
fi
fi
else
echo "pg_stat_activity log file not found"
fi
register: pg_stat_analysis
changed_when: false
failed_when: false
- name: Analyze stats for high CPU/Memory/IO
ansible.builtin.shell: |
if [ -f /tmp/gitea_stats_{{ test_timestamp }}.log ]; then
echo "=== STATS SUMMARY ==="
echo "Total samples: $(wc -l < /tmp/gitea_stats_{{ test_timestamp }}.log)"
echo ""
echo "=== HIGH CPU (>80%) ==="
grep -E "gitea|gitea-postgres" /tmp/gitea_stats_{{ test_timestamp }}.log | awk -F',' '{cpu=$2; gsub(/%/, "", cpu); if (cpu+0 > 80) print $0}' | head -10 || echo "No high CPU usage found"
echo ""
echo "=== MEMORY USAGE ==="
grep -E "gitea" /tmp/gitea_stats_{{ test_timestamp }}.log | tail -5 || echo "No memory stats"
else
echo "Stats file not found"
fi
register: stats_analysis
changed_when: false
failed_when: false
- name: Analyze Gitea logs for errors (including SESSION context canceled, panic, git-upload-pack)
ansible.builtin.shell: |
if [ -f /tmp/gitea_logs_{{ test_timestamp }}.log ]; then
echo "=== DB-TIMEOUTS / CONNECTION ERRORS ==="
grep -iE "timeout|deadline exceeded|connection.*failed|database.*error|postgres.*error|context.*deadline" /tmp/gitea_logs_{{ test_timestamp }}.log | tail -20 || echo "No DB-timeouts found"
echo ""
echo "=== SESSION: CONTEXT CANCELED ==="
grep -iE "SESSION.*context canceled|session.*release.*context canceled" /tmp/gitea_logs_{{ test_timestamp }}.log | tail -10 || echo "No SESSION: context canceled found"
echo ""
echo "=== PANICS / FATAL ERRORS ==="
grep -iE "panic|fatal|error.*fatal" /tmp/gitea_logs_{{ test_timestamp }}.log | tail -10 || echo "No panics found"
echo ""
echo "=== GIT-UPLOAD-PACK REQUESTS (can block) ==="
grep -iE "git-upload-pack|ServiceUploadPack" /tmp/gitea_logs_{{ test_timestamp }}.log | tail -10 || echo "No git-upload-pack requests found"
echo ""
echo "=== GIT GC JOBS (can hold connections) ==="
grep -iE "git.*gc|garbage.*collect" /tmp/gitea_logs_{{ test_timestamp }}.log | tail -10 || echo "No git gc jobs found"
echo ""
echo "=== SLOW QUERIES / PERFORMANCE ==="
grep -iE "slow|performance|took.*ms|duration" /tmp/gitea_logs_{{ test_timestamp }}.log | tail -10 || echo "No slow queries found"
else
echo "Gitea logs file not found"
fi
register: gitea_logs_analysis
changed_when: false
failed_when: false
- name: Analyze Postgres logs for errors
ansible.builtin.shell: |
if [ -f /tmp/postgres_logs_{{ test_timestamp }}.log ]; then
echo "=== POSTGRES ERRORS ==="
grep -iE "error|timeout|deadlock|connection.*refused|too many connections|authentication.*timeout" /tmp/postgres_logs_{{ test_timestamp }}.log | tail -20 || echo "No Postgres errors found"
echo ""
echo "=== SLOW QUERIES ==="
grep -iE "slow|duration|statement.*took" /tmp/postgres_logs_{{ test_timestamp }}.log | tail -10 || echo "No slow queries found"
else
echo "Postgres logs file not found"
fi
register: postgres_logs_analysis
changed_when: false
failed_when: false
- name: Analyze Traefik logs for backend errors
ansible.builtin.shell: |
if [ -f /tmp/traefik_logs_{{ test_timestamp }}.log ]; then
echo "=== BACKEND CONNECTION ERRORS ==="
grep -iE "backend.*error|connection.*error|EOF|gitea.*error|git\.michaelschiemer\.de.*error" /tmp/traefik_logs_{{ test_timestamp }}.log | tail -20 || echo "No backend errors found"
echo ""
echo "=== TIMEOUT ERRORS ==="
grep -iE "timeout|504|gateway.*timeout" /tmp/traefik_logs_{{ test_timestamp }}.log | tail -10 || echo "No timeout errors found"
else
echo "Traefik logs file not found"
fi
register: traefik_logs_analysis
changed_when: false
failed_when: false
- name: Display comprehensive diagnosis
ansible.builtin.debug:
msg: |
================================================================================
GITEA TIMEOUT DEEP DIAGNOSIS - RESULTS
================================================================================
BASELINE STATS (before request):
{{ initial_stats.stdout }}
POSTGRES CONNECTION COUNT:
{{ pg_stat_analysis.stdout }}
REQUEST RESULT (Traefik → Gitea):
{% if request_result.content is defined and request_result.content != '' %}
{{ request_result.content | b64decode }}
{% else %}
Request result not available
{% endif %}
BACKPRESSURE TEST - GITEA LOCALHOST:
{% if localhost_result.content is defined and localhost_result.content != '' %}
{{ localhost_result.content | b64decode }}
{% else %}
Localhost test result not available
{% endif %}
DIRECT TEST TRAEFIK → GITEA:
{% if direct_test_result.content is defined and direct_test_result.content != '' %}
{{ direct_test_result.content | b64decode }}
{% else %}
Direct test result not available
{% endif %}
REDIS PING TEST:
{% if redis_ping_result.content is defined and redis_ping_result.content != '' %}
{{ redis_ping_result.content | b64decode }}
{% else %}
Redis ping result not available
{% endif %}
RUNNER STATUS:
- Status: {{ runner_status.stdout }}
================================================================================
STATS ANALYSIS (during request):
================================================================================
{{ stats_analysis.stdout }}
================================================================================
GITEA LOG ANALYSIS:
================================================================================
{{ gitea_logs_analysis.stdout }}
================================================================================
POSTGRES LOG ANALYSIS:
================================================================================
{{ postgres_logs_analysis.stdout }}
================================================================================
TRAEFIK LOG ANALYSIS:
================================================================================
{{ traefik_logs_analysis.stdout }}
================================================================================
INTERPRETATION:
================================================================================
{% set request_content = request_result.content | default('') | b64decode | default('') %}
{% set localhost_content = localhost_result.content | default('') | b64decode | default('') %}
{% set direct_content = direct_test_result.content | default('') | b64decode | default('') %}
{% set redis_content = redis_ping_result.content | default('') | b64decode | default('') %}
{% set traefik_errors = traefik_logs_analysis.stdout | default('') %}
{% set gitea_errors = gitea_logs_analysis.stdout | default('') %}
{% set postgres_errors = postgres_logs_analysis.stdout | default('') %}
{% set stats_content = stats_analysis.stdout | default('') %}
{% if 'timeout' in request_content or '504' in request_content or 'HTTP_CODE:504' in request_content %}
⚠️ REQUEST TIMED OUT/504:
BACKPRESSURE ANALYSIS:
{% if 'LOCALHOST_TEST_FAILED' in localhost_content or localhost_content == '' %}
→ The Gitea localhost test fails or blocks
→ The problem lies IN Gitea/the DB itself, not between Traefik and Gitea
{% elif 'HTTP_CODE:200' in localhost_content or '200 OK' in localhost_content %}
→ The Gitea localhost test responds quickly
→ The problem lies BETWEEN Traefik and Gitea (network, firewall, limits)
{% endif %}
{% if 'REDIS_PING_FAILED' in redis_content or redis_content == '' or 'PONG' not in redis_content %}
→ Redis is not reachable
→ The session store is blocking and Gitea runs into "context canceled"
{% else %}
→ Redis is reachable
{% endif %}
{% if 'SESSION.*context canceled' in gitea_errors or 'session.*release.*context canceled' in gitea_errors %}
→ Gitea logs SESSION: context canceled errors
→ The session store (Redis) may be blocking, or session locks are stuck
{% endif %}
{% if 'git-upload-pack' in gitea_errors %}
→ git-upload-pack requests found (these can block)
→ Check whether the runner is active and performing many Git operations
{% endif %}
{% if 'git.*gc' in gitea_errors %}
→ git gc jobs found (these can hold connections)
→ Check whether git gc jobs are hanging
{% endif %}
{% if 'EOF' in traefik_errors or 'backend' in traefik_errors | lower or 'connection.*error' in traefik_errors | lower %}
→ Traefik reports backend connection errors
→ Gitea is not responding to Traefik's connection attempts
{% endif %}
{% if 'timeout' in gitea_errors | lower or 'deadline exceeded' in gitea_errors | lower %}
→ Gitea has DB timeouts or context-deadline-exceeded errors
→ Postgres may be blocking or too slow
{% endif %}
{% if 'too many connections' in postgres_errors | lower %}
→ Postgres has too many connections
→ The connection pool may be overloaded
{% endif %}
{% if 'HIGH CPU' in stats_content or '>80' in stats_content %}
→ Gitea or Postgres is under high CPU load
→ This is a performance problem, not a timeout configuration issue
{% endif %}
{% else %}
✅ REQUEST SUCCEEDED:
→ The problem only occurs intermittently
→ Check the logs for sporadic errors
{% endif %}
================================================================================
NEXT STEPS:
================================================================================
1. Check pg_stat_activity: is the connection count close to max_connections?
2. Check whether Redis is reachable (session store blockages)
3. Check backpressure: localhost fast but Traefik slow = network problem
4. Check for SESSION: context canceled errors (session locks)
5. Check git-upload-pack requests (runner overload)
6. Check git gc jobs (hanging and holding connections)
================================================================================
- name: Cleanup temporary files
ansible.builtin.shell: |
# Remove every per-run log written to /tmp above
rm -f /tmp/gitea_*_{{ test_timestamp }}.log \
/tmp/postgres_logs_{{ test_timestamp }}.log \
/tmp/traefik_*_{{ test_timestamp }}.log \
/tmp/pg_stat_activity_{{ test_timestamp }}.log \
/tmp/redis_ping_{{ test_timestamp }}.log
changed_when: false
failed_when: false


@@ -1,343 +0,0 @@
---
# Diagnose Gitea timeout - live during a request
# Runs all checks during an actual request
- name: Diagnose Gitea Timeout During Request
hosts: production
gather_facts: yes
become: no
vars:
gitea_stack_path: "{{ stacks_base_path }}/gitea"
traefik_stack_path: "{{ stacks_base_path }}/traefik"
gitea_url: "https://{{ gitea_domain }}"
test_duration_seconds: 60 # How long we test
test_timestamp: "{{ ansible_date_time.epoch }}"
tasks:
- name: Display diagnostic plan
ansible.builtin.debug:
msg: |
================================================================================
GITEA TIMEOUT DIAGNOSIS - LIVE DURING REQUEST
================================================================================
This diagnosis runs all checks during an actual request:
1. Docker stats (CPU/RAM/IO) during the request
2. Gitea logs (DB timeouts, panics, "context deadline exceeded")
3. Postgres logs (connection issues)
4. Traefik logs ("backend connection error", "EOF")
5. Direct test Traefik → Gitea
Test duration: {{ test_duration_seconds }} seconds
Timestamp: {{ test_timestamp }}
================================================================================
- name: Get initial container stats (baseline)
ansible.builtin.shell: |
docker stats --no-stream --format "table {{ '{{' }}.Name{{ '}}' }}\t{{ '{{' }}.CPUPerc{{ '}}' }}\t{{ '{{' }}.MemUsage{{ '}}' }}\t{{ '{{' }}.NetIO{{ '}}' }}\t{{ '{{' }}.BlockIO{{ '}}' }}" gitea gitea-postgres gitea-redis traefik 2>/dev/null || echo "Stats collection failed"
register: initial_stats
changed_when: false
- name: Start collecting Docker stats in background
ansible.builtin.shell: |
timeout {{ test_duration_seconds }} docker stats --format "{{ '{{' }}.Name{{ '}}' }},{{ '{{' }}.CPUPerc{{ '}}' }},{{ '{{' }}.MemUsage{{ '}}' }},{{ '{{' }}.NetIO{{ '}}' }},{{ '{{' }}.BlockIO{{ '}}' }}" gitea gitea-postgres gitea-redis traefik 2>/dev/null | while read line; do
echo "[$(date '+%Y-%m-%d %H:%M:%S.%3N')] $line"
done > /tmp/gitea_stats_{{ test_timestamp }}.log 2>&1 &
STATS_PID=$!
echo $STATS_PID
register: stats_pid
changed_when: false
- name: Start collecting Gitea logs in background
ansible.builtin.shell: |
cd {{ gitea_stack_path }}
timeout {{ test_duration_seconds }} docker compose logs -f gitea 2>&1 | while read line; do
echo "[$(date '+%Y-%m-%d %H:%M:%S.%3N')] $line"
done > /tmp/gitea_logs_{{ test_timestamp }}.log 2>&1 &
echo $!
register: gitea_logs_pid
changed_when: false
- name: Start collecting Postgres logs in background
ansible.builtin.shell: |
cd {{ gitea_stack_path }}
timeout {{ test_duration_seconds }} docker compose logs -f gitea-postgres 2>&1 | while read line; do
echo "[$(date '+%Y-%m-%d %H:%M:%S.%3N')] $line"
done > /tmp/postgres_logs_{{ test_timestamp }}.log 2>&1 &
echo $!
register: postgres_logs_pid
changed_when: false
- name: Start collecting Traefik logs in background
ansible.builtin.shell: |
cd {{ traefik_stack_path }}
timeout {{ test_duration_seconds }} docker compose logs -f traefik 2>&1 | while read line; do
echo "[$(date '+%Y-%m-%d %H:%M:%S.%3N')] $line"
done > /tmp/traefik_logs_{{ test_timestamp }}.log 2>&1 &
echo $!
register: traefik_logs_pid
changed_when: false
- name: Wait a moment for log collection to start
ansible.builtin.pause:
seconds: 2
- name: Trigger Gitea request via Traefik (with timeout)
ansible.builtin.shell: |
echo "[$(date '+%Y-%m-%d %H:%M:%S.%3N')] Starting request to {{ gitea_url }}/api/healthz"
timeout 35 curl -k -v -s -o /tmp/gitea_response_{{ test_timestamp }}.log -w "\nHTTP_CODE:%{http_code}\nTIME_TOTAL:%{time_total}\nTIME_CONNECT:%{time_connect}\nTIME_STARTTRANSFER:%{time_starttransfer}\n" "{{ gitea_url }}/api/healthz" 2>&1 | tee /tmp/gitea_curl_{{ test_timestamp }}.log
echo "[$(date '+%Y-%m-%d %H:%M:%S.%3N')] Request completed"
register: gitea_request
changed_when: false
failed_when: false
- name: Test direct connection Traefik → Gitea (parallel)
ansible.builtin.shell: |
echo "[$(date '+%Y-%m-%d %H:%M:%S.%3N')] Starting direct test Traefik → Gitea"
cd {{ traefik_stack_path }}
timeout 35 docker compose exec -T traefik wget -qO- --timeout=30 http://gitea:3000/api/healthz 2>&1 | tee /tmp/traefik_gitea_direct_{{ test_timestamp }}.log || echo "DIRECT_TEST_FAILED" > /tmp/traefik_gitea_direct_{{ test_timestamp }}.log
echo "[$(date '+%Y-%m-%d %H:%M:%S.%3N')] Direct test completed"
register: traefik_direct_test
changed_when: false
failed_when: false
- name: Wait for log collection to complete
ansible.builtin.pause:
seconds: "{{ test_duration_seconds - 5 }}"
- name: Stop background processes
ansible.builtin.shell: |
pkill -f "docker.*stats.*gitea" || true
pkill -f "docker compose logs.*gitea" || true
pkill -f "docker compose logs.*postgres" || true
pkill -f "docker compose logs.*traefik" || true
sleep 2
changed_when: false
failed_when: false
- name: Collect stats results
ansible.builtin.slurp:
src: "/tmp/gitea_stats_{{ test_timestamp }}.log"
register: stats_results
changed_when: false
failed_when: false
- name: Collect Gitea logs results
ansible.builtin.slurp:
src: "/tmp/gitea_logs_{{ test_timestamp }}.log"
register: gitea_logs_results
changed_when: false
failed_when: false
- name: Collect Postgres logs results
ansible.builtin.slurp:
src: "/tmp/postgres_logs_{{ test_timestamp }}.log"
register: postgres_logs_results
changed_when: false
failed_when: false
- name: Collect Traefik logs results
ansible.builtin.slurp:
src: "/tmp/traefik_logs_{{ test_timestamp }}.log"
register: traefik_logs_results
changed_when: false
failed_when: false
- name: Get request result
ansible.builtin.slurp:
src: "/tmp/gitea_curl_{{ test_timestamp }}.log"
register: request_result
changed_when: false
failed_when: false
- name: Get direct test result
ansible.builtin.slurp:
src: "/tmp/traefik_gitea_direct_{{ test_timestamp }}.log"
register: direct_test_result
changed_when: false
failed_when: false
- name: Analyze stats for high CPU/Memory/IO
ansible.builtin.shell: |
if [ -f /tmp/gitea_stats_{{ test_timestamp }}.log ]; then
echo "=== STATS SUMMARY ==="
echo "Total samples: $(wc -l < /tmp/gitea_stats_{{ test_timestamp }}.log)"
echo ""
echo "=== HIGH CPU (>80%) ==="
grep -E "gitea|gitea-postgres" /tmp/gitea_stats_{{ test_timestamp }}.log | awk -F',' '{cpu=$2; gsub(/%/, "", cpu); if (cpu+0 > 80) print $0}' | head -10 || echo "No high CPU usage found"
echo ""
echo "=== MEMORY USAGE ==="
grep -E "gitea" /tmp/gitea_stats_{{ test_timestamp }}.log | tail -5 || echo "No memory stats"
echo ""
echo "=== NETWORK IO ==="
grep -E "gitea" /tmp/gitea_stats_{{ test_timestamp }}.log | tail -5 || echo "No network activity"
else
echo "Stats file not found"
fi
register: stats_analysis
changed_when: false
failed_when: false
- name: Analyze Gitea logs for errors
ansible.builtin.shell: |
if [ -f /tmp/gitea_logs_{{ test_timestamp }}.log ]; then
echo "=== DB-TIMEOUTS / CONNECTION ERRORS ==="
grep -iE "timeout|deadline exceeded|connection.*failed|database.*error|postgres.*error|context.*deadline" /tmp/gitea_logs_{{ test_timestamp }}.log | tail -20 || echo "No DB-timeouts found"
echo ""
echo "=== PANICS / FATAL ERRORS ==="
grep -iE "panic|fatal|error.*fatal" /tmp/gitea_logs_{{ test_timestamp }}.log | tail -10 || echo "No panics found"
echo ""
echo "=== SLOW QUERIES / PERFORMANCE ==="
grep -iE "slow|performance|took.*ms|duration" /tmp/gitea_logs_{{ test_timestamp }}.log | tail -10 || echo "No slow queries found"
echo ""
echo "=== RECENT LOG ENTRIES (last 10) ==="
tail -10 /tmp/gitea_logs_{{ test_timestamp }}.log || echo "No recent logs"
else
echo "Gitea logs file not found"
fi
register: gitea_logs_analysis
changed_when: false
failed_when: false
- name: Analyze Postgres logs for errors
ansible.builtin.shell: |
if [ -f /tmp/postgres_logs_{{ test_timestamp }}.log ]; then
echo "=== POSTGRES ERRORS ==="
grep -iE "error|timeout|deadlock|connection.*refused|too many connections" /tmp/postgres_logs_{{ test_timestamp }}.log | tail -20 || echo "No Postgres errors found"
echo ""
echo "=== SLOW QUERIES ==="
grep -iE "slow|duration|statement.*took" /tmp/postgres_logs_{{ test_timestamp }}.log | tail -10 || echo "No slow queries found"
echo ""
echo "=== RECENT LOG ENTRIES (last 10) ==="
tail -10 /tmp/postgres_logs_{{ test_timestamp }}.log || echo "No recent logs"
else
echo "Postgres logs file not found"
fi
register: postgres_logs_analysis
changed_when: false
failed_when: false
- name: Analyze Traefik logs for backend errors
ansible.builtin.shell: |
if [ -f /tmp/traefik_logs_{{ test_timestamp }}.log ]; then
echo "=== BACKEND CONNECTION ERRORS ==="
grep -iE "backend.*error|connection.*error|EOF|gitea.*error|git\.michaelschiemer\.de.*error" /tmp/traefik_logs_{{ test_timestamp }}.log | tail -20 || echo "No backend errors found"
echo ""
echo "=== TIMEOUT ERRORS ==="
grep -iE "timeout|504|gateway.*timeout" /tmp/traefik_logs_{{ test_timestamp }}.log | tail -10 || echo "No timeout errors found"
echo ""
echo "=== RECENT LOG ENTRIES (last 10) ==="
tail -10 /tmp/traefik_logs_{{ test_timestamp }}.log || echo "No recent logs"
else
echo "Traefik logs file not found"
fi
register: traefik_logs_analysis
changed_when: false
failed_when: false
- name: Display comprehensive diagnosis
ansible.builtin.debug:
msg: |
================================================================================
GITEA TIMEOUT DIAGNOSIS - RESULTS
================================================================================
BASELINE STATS (before request):
{{ initial_stats.stdout }}
REQUEST RESULT:
{% if request_result.content is defined and request_result.content != '' %}
{{ request_result.content | b64decode }}
{% else %}
Request result not available
{% endif %}
DIRECT TEST TRAEFIK → GITEA:
{% if direct_test_result.content is defined and direct_test_result.content != '' %}
{{ direct_test_result.content | b64decode }}
{% else %}
Direct test result not available
{% endif %}
================================================================================
STATS ANALYSIS (during request):
================================================================================
{{ stats_analysis.stdout }}
================================================================================
GITEA LOG ANALYSIS:
================================================================================
{{ gitea_logs_analysis.stdout }}
================================================================================
POSTGRES LOG ANALYSIS:
================================================================================
{{ postgres_logs_analysis.stdout }}
================================================================================
TRAEFIK LOG ANALYSIS:
================================================================================
{{ traefik_logs_analysis.stdout }}
================================================================================
INTERPRETATION:
================================================================================
{% set request_content = request_result.content | default('') | b64decode | default('') %}
{% set direct_content = direct_test_result.content | default('') | b64decode | default('') %}
{% set traefik_errors = traefik_logs_analysis.stdout | default('') %}
{% set gitea_errors = gitea_logs_analysis.stdout | default('') %}
{% set postgres_errors = postgres_logs_analysis.stdout | default('') %}
{% set stats_content = stats_analysis.stdout | default('') %}
{% if 'timeout' in request_content or '504' in request_content or 'HTTP_CODE:504' in request_content %}
⚠️ REQUEST TIMED OUT / RETURNED 504:
{% if 'EOF' in traefik_errors or 'backend' in traefik_errors | lower or (traefik_errors | lower) is search('connection.*error') %}
→ Traefik reports a backend connection error
→ Gitea is not responding to Traefik's connection attempts
{% endif %}
{% if 'timeout' in gitea_errors | lower or 'deadline exceeded' in gitea_errors | lower %}
→ Gitea reports DB timeouts or context deadline exceeded
→ Postgres may be blocking or responding too slowly
{% endif %}
{% if 'too many connections' in postgres_errors | lower %}
→ Postgres has too many connections
→ The connection pool may be exhausted
{% endif %}
{% if 'HIGH CPU' in stats_content or '>80' in stats_content %}
→ Gitea or Postgres is under high CPU load
→ This points to a performance problem, not timeout configuration
{% endif %}
{% if 'DIRECT_TEST_FAILED' in direct_content or direct_content == '' %}
→ The direct test Traefik → Gitea fails
→ The problem lies with Gitea itself, not with Traefik routing
{% endif %}
{% else %}
✅ REQUEST SUCCEEDED:
→ The problem only occurs intermittently
→ Check the logs for sporadic errors
{% endif %}
================================================================================
NEXT STEPS:
================================================================================
1. Check whether Gitea or Postgres shows high CPU/memory usage
2. Check the Gitea logs for DB timeouts
3. Check whether Postgres reports "too many connections"
4. Check whether Traefik reports "backend connection error" or "EOF"
5. Check whether the direct test Traefik → Gitea succeeds
================================================================================
- name: Cleanup temporary files
ansible.builtin.file:
path: "/tmp/{{ item }}_{{ test_timestamp }}.log"
state: absent
# Remove every temp file created above (stats, logs, curl, direct test);
# the previous single path did not match any of the files actually written.
loop:
- gitea_stats
- gitea_logs
- postgres_logs
- traefik_logs
- gitea_curl
- traefik_gitea_direct
failed_when: false

View File

@@ -1,325 +0,0 @@
---
# Diagnose Gitea Timeouts
# Checks Gitea status, Traefik routing, and network connections, and recommends fixes
- name: Diagnose Gitea Timeouts
hosts: production
gather_facts: yes
become: no
tasks:
- name: Check Gitea container status
ansible.builtin.shell: |
cd /home/deploy/deployment/stacks/gitea
docker compose ps gitea
register: gitea_status
changed_when: false
- name: Display Gitea container status
ansible.builtin.debug:
msg: |
================================================================================
Gitea Container Status:
================================================================================
{{ gitea_status.stdout }}
================================================================================
- name: Check Gitea health endpoint (direct from container)
ansible.builtin.shell: |
cd /home/deploy/deployment/stacks/gitea
docker compose exec -T gitea curl -f http://localhost:3000/api/healthz 2>&1 || echo "HEALTH_CHECK_FAILED"
register: gitea_health_direct
changed_when: false
failed_when: false
- name: Display Gitea health (direct)
ansible.builtin.debug:
msg: |
================================================================================
Gitea Health Check (direct from container):
================================================================================
{% if 'HEALTH_CHECK_FAILED' not in gitea_health_direct.stdout %}
✅ Gitea is healthy (direct check)
Response: {{ gitea_health_direct.stdout }}
{% else %}
❌ Gitea health check failed (direct)
Error: {{ gitea_health_direct.stdout }}
{% endif %}
================================================================================
- name: Check Gitea health endpoint (via Traefik)
ansible.builtin.uri:
url: "https://git.michaelschiemer.de/api/healthz"
method: GET
status_code: [200]
validate_certs: false
timeout: 10
register: gitea_health_traefik
failed_when: false
changed_when: false
- name: Display Gitea health (via Traefik)
ansible.builtin.debug:
msg: |
================================================================================
Gitea Health Check (via Traefik):
================================================================================
{% if gitea_health_traefik.status == 200 %}
✅ Gitea is reachable via Traefik
Status: {{ gitea_health_traefik.status }}
{% else %}
❌ Gitea is NOT reachable via Traefik
Status: {{ gitea_health_traefik.status | default('TIMEOUT/ERROR') }}
Message: {{ gitea_health_traefik.msg | default('No response') }}
{% endif %}
================================================================================
- name: Check Traefik container status
ansible.builtin.shell: |
cd /home/deploy/deployment/stacks/traefik
docker compose ps traefik
register: traefik_status
changed_when: false
- name: Display Traefik container status
ansible.builtin.debug:
msg: |
================================================================================
Traefik Container Status:
================================================================================
{{ traefik_status.stdout }}
================================================================================
- name: Check Redis container status
ansible.builtin.shell: |
cd /home/deploy/deployment/stacks/gitea
docker compose ps redis
register: redis_status
changed_when: false
- name: Display Redis container status
ansible.builtin.debug:
msg: |
================================================================================
Redis Container Status:
================================================================================
{{ redis_status.stdout }}
================================================================================
- name: Check PostgreSQL container status
ansible.builtin.shell: |
cd /home/deploy/deployment/stacks/gitea
docker compose ps postgres
register: postgres_status
changed_when: false
- name: Display PostgreSQL container status
ansible.builtin.debug:
msg: |
================================================================================
PostgreSQL Container Status:
================================================================================
{{ postgres_status.stdout }}
================================================================================
- name: Check Gitea container IP in traefik-public network
ansible.builtin.shell: |
docker inspect gitea --format '{{ '{{' }}with index .NetworkSettings.Networks "traefik-public"{{ '}}' }}{{ '{{' }}.IPAddress{{ '}}' }}{{ '{{' }}end{{ '}}' }}' 2>/dev/null || echo "NOT_FOUND"
register: gitea_ip
changed_when: false
failed_when: false
- name: Display Gitea IP in traefik-public network
ansible.builtin.debug:
msg: |
================================================================================
Gitea IP in traefik-public Network:
================================================================================
{% if gitea_ip.stdout and gitea_ip.stdout != 'NOT_FOUND' %}
✅ Gitea IP: {{ gitea_ip.stdout }}
{% else %}
❌ Gitea IP not found in traefik-public network
{% endif %}
================================================================================
- name: Test connection from Traefik to Gitea
ansible.builtin.shell: |
cd /home/deploy/deployment/stacks/traefik
docker compose exec -T traefik wget -qO- --timeout=5 http://gitea:3000/api/healthz 2>&1 || echo "CONNECTION_FAILED"
register: traefik_gitea_connection
changed_when: false
failed_when: false
- name: Display Traefik-Gitea connection test
ansible.builtin.debug:
msg: |
================================================================================
Traefik → Gitea Connection Test:
================================================================================
{% if 'CONNECTION_FAILED' in traefik_gitea_connection.stdout %}
❌ Traefik cannot reach Gitea
Error: {{ traefik_gitea_connection.stdout }}
{% else %}
✅ Traefik can reach Gitea
Response: {{ traefik_gitea_connection.stdout }}
{% endif %}
================================================================================
- name: Check Traefik routing configuration for Gitea
ansible.builtin.shell: |
docker inspect gitea --format '{{ '{{' }}json .Config.Labels{{ '}}' }}' 2>/dev/null | grep -i "traefik" || echo "NO_TRAEFIK_LABELS"
register: traefik_labels
changed_when: false
failed_when: false
- name: Display Traefik labels for Gitea
ansible.builtin.debug:
msg: |
================================================================================
Traefik Labels for Gitea:
================================================================================
{{ traefik_labels.stdout }}
================================================================================
- name: Check Gitea logs for errors
ansible.builtin.shell: |
cd /home/deploy/deployment/stacks/gitea
docker compose logs gitea --tail=50 2>&1 | grep -iE "error|timeout|failed|panic|fatal" | tail -20 || echo "No errors in recent logs"
register: gitea_errors
changed_when: false
failed_when: false
- name: Display Gitea errors
ansible.builtin.debug:
msg: |
================================================================================
Gitea Error Logs (last 50 lines):
================================================================================
{{ gitea_errors.stdout }}
================================================================================
- name: Check Traefik logs for Gitea-related errors
ansible.builtin.shell: |
cd /home/deploy/deployment/stacks/traefik
docker compose logs traefik --tail=50 2>&1 | grep -iE "gitea|git\.michaelschiemer\.de|timeout|error" | tail -20 || echo "No Gitea-related errors in Traefik logs"
register: traefik_gitea_errors
changed_when: false
failed_when: false
- name: Display Traefik Gitea errors
ansible.builtin.debug:
msg: |
================================================================================
Traefik Gitea-Related Error Logs (last 50 lines):
================================================================================
{{ traefik_gitea_errors.stdout }}
================================================================================
- name: Check if Gitea is in traefik-public network
ansible.builtin.shell: |
docker network inspect traefik-public --format '{{ '{{' }}range .Containers{{ '}}' }}{{ '{{' }}.Name{{ '}}' }} {{ '{{' }}end{{ '}}' }}' 2>/dev/null | grep -q gitea && echo "YES" || echo "NO"
register: gitea_in_traefik_network
changed_when: false
failed_when: false
- name: Display Gitea network membership
ansible.builtin.debug:
msg: |
================================================================================
Gitea in traefik-public Network:
================================================================================
{% if gitea_in_traefik_network.stdout == 'YES' %}
✅ Gitea is in traefik-public network
{% else %}
❌ Gitea is NOT in traefik-public network
{% endif %}
================================================================================
- name: Check Redis connection from Gitea
ansible.builtin.shell: |
cd /home/deploy/deployment/stacks/gitea
docker compose exec -T gitea sh -c "redis-cli -h redis -p 6379 -a gitea_redis_password ping 2>&1" || echo "REDIS_CONNECTION_FAILED"
register: gitea_redis_connection
changed_when: false
failed_when: false
- name: Display Gitea-Redis connection
ansible.builtin.debug:
msg: |
================================================================================
Gitea → Redis Connection:
================================================================================
{% if 'REDIS_CONNECTION_FAILED' in gitea_redis_connection.stdout %}
❌ Gitea cannot connect to Redis
Error: {{ gitea_redis_connection.stdout }}
{% else %}
✅ Gitea can connect to Redis
Response: {{ gitea_redis_connection.stdout }}
{% endif %}
================================================================================
- name: Check PostgreSQL connection from Gitea
ansible.builtin.shell: |
cd /home/deploy/deployment/stacks/gitea
docker compose exec -T gitea sh -c "pg_isready -h postgres -p 5432 -U gitea 2>&1" || echo "POSTGRES_CONNECTION_FAILED"
register: gitea_postgres_connection
changed_when: false
failed_when: false
- name: Display Gitea-PostgreSQL connection
ansible.builtin.debug:
msg: |
================================================================================
Gitea → PostgreSQL Connection:
================================================================================
{% if 'POSTGRES_CONNECTION_FAILED' in gitea_postgres_connection.stdout %}
❌ Gitea cannot connect to PostgreSQL
Error: {{ gitea_postgres_connection.stdout }}
{% else %}
✅ Gitea can connect to PostgreSQL
Response: {{ gitea_postgres_connection.stdout }}
{% endif %}
================================================================================
- name: Summary and recommendations
ansible.builtin.debug:
msg: |
================================================================================
SUMMARY - Gitea Timeout Diagnosis:
================================================================================
Gitea Status: {{ gitea_status.stdout | regex_replace('.*(Up|Down|Restarting).*', '\\1') | default('UNKNOWN') }}
Gitea Health (direct): {% if 'HEALTH_CHECK_FAILED' not in gitea_health_direct.stdout %}✅{% else %}❌{% endif %}
Gitea Health (via Traefik): {% if gitea_health_traefik.status == 200 %}✅{% else %}❌{% endif %}
Traefik Status: {{ traefik_status.stdout | regex_replace('.*(Up|Down|Restarting).*', '\\1') | default('UNKNOWN') }}
Redis Status: {{ redis_status.stdout | regex_replace('.*(Up|Down|Restarting).*', '\\1') | default('UNKNOWN') }}
PostgreSQL Status: {{ postgres_status.stdout | regex_replace('.*(Up|Down|Restarting).*', '\\1') | default('UNKNOWN') }}
Network:
- Gitea in traefik-public: {% if gitea_in_traefik_network.stdout == 'YES' %}✅{% else %}❌{% endif %}
- Traefik → Gitea: {% if 'CONNECTION_FAILED' not in traefik_gitea_connection.stdout %}✅{% else %}❌{% endif %}
- Gitea → Redis: {% if 'REDIS_CONNECTION_FAILED' not in gitea_redis_connection.stdout %}✅{% else %}❌{% endif %}
- Gitea → PostgreSQL: {% if 'POSTGRES_CONNECTION_FAILED' not in gitea_postgres_connection.stdout %}✅{% else %}❌{% endif %}
Recommended actions:
{% if gitea_health_traefik.status != 200 %}
1. ❌ Gitea is not reachable via Traefik
→ Run 'fix-gitea-timeouts.yml' to restart Gitea and Traefik
{% endif %}
{% if gitea_in_traefik_network.stdout != 'YES' %}
2. ❌ Gitea is not in the traefik-public network
→ Restart the Gitea container to refresh its network membership
{% endif %}
{% if 'CONNECTION_FAILED' in traefik_gitea_connection.stdout %}
3. ❌ Traefik cannot reach Gitea
→ Restart both containers
{% endif %}
{% if 'REDIS_CONNECTION_FAILED' in gitea_redis_connection.stdout %}
4. ❌ Gitea cannot reach Redis
→ Check and restart the Redis container
{% endif %}
{% if 'POSTGRES_CONNECTION_FAILED' in gitea_postgres_connection.stdout %}
5. ❌ Gitea cannot reach PostgreSQL
→ Check and restart the PostgreSQL container
{% endif %}
================================================================================

View File

@@ -1,477 +0,0 @@
---
# Diagnosis: Find the cause of the Traefik restart loop
# Checks all likely causes of periodic Traefik restarts
- name: Diagnose Traefik Restart Loop
hosts: production
gather_facts: yes
become: yes
tasks:
- name: Check systemd timers
ansible.builtin.shell: |
systemctl list-timers --all --no-pager
register: systemd_timers
changed_when: false
- name: Display systemd timers
ansible.builtin.debug:
msg: |
================================================================================
Systemd timers (can stop containers):
================================================================================
{{ systemd_timers.stdout }}
================================================================================
- name: Check root crontab
ansible.builtin.shell: |
crontab -l 2>/dev/null || echo "No root crontab"
register: root_crontab
changed_when: false
- name: Display root crontab
ansible.builtin.debug:
msg: |
================================================================================
Root Crontab:
================================================================================
{{ root_crontab.stdout }}
================================================================================
- name: Check deploy user crontab
ansible.builtin.shell: |
crontab -l -u deploy 2>/dev/null || echo "No deploy user crontab"
register: deploy_crontab
changed_when: false
- name: Display deploy user crontab
ansible.builtin.debug:
msg: |
================================================================================
Deploy User Crontab:
================================================================================
{{ deploy_crontab.stdout }}
================================================================================
- name: Check system-wide cron jobs
ansible.builtin.shell: |
echo "=== /etc/cron.d ==="
ls -la /etc/cron.d 2>/dev/null || echo "Directory not found"
grep -r "traefik\|docker.*compose.*traefik\|docker.*stop\|docker.*restart" /etc/cron.d 2>/dev/null || echo "No matches"
echo ""
echo "=== /etc/cron.daily ==="
ls -la /etc/cron.daily 2>/dev/null || echo "Directory not found"
grep -r "traefik\|docker.*compose.*traefik\|docker.*stop\|docker.*restart" /etc/cron.daily 2>/dev/null || echo "No matches"
echo ""
echo "=== /etc/cron.hourly ==="
ls -la /etc/cron.hourly 2>/dev/null || echo "Directory not found"
grep -r "traefik\|docker.*compose.*traefik\|docker.*stop\|docker.*restart" /etc/cron.hourly 2>/dev/null || echo "No matches"
echo ""
echo "=== /etc/cron.weekly ==="
ls -la /etc/cron.weekly 2>/dev/null || echo "Directory not found"
grep -r "traefik\|docker.*compose.*traefik\|docker.*stop\|docker.*restart" /etc/cron.weekly 2>/dev/null || echo "No matches"
echo ""
echo "=== /etc/cron.monthly ==="
ls -la /etc/cron.monthly 2>/dev/null || echo "Directory not found"
grep -r "traefik\|docker.*compose.*traefik\|docker.*stop\|docker.*restart" /etc/cron.monthly 2>/dev/null || echo "No matches"
register: system_cron
changed_when: false
- name: Display system cron jobs
ansible.builtin.debug:
msg: |
================================================================================
System-Wide Cron Jobs:
================================================================================
{{ system_cron.stdout }}
================================================================================
- name: Check for scripts that might restart Traefik
ansible.builtin.shell: |
find /home/deploy -type f -name "*.sh" -exec grep -l "traefik\|docker.*compose.*restart\|docker.*stop.*traefik\|docker.*down.*traefik" {} \; 2>/dev/null | head -20
register: traefik_scripts
changed_when: false
- name: Display scripts that might restart Traefik
ansible.builtin.debug:
msg: |
================================================================================
Scripts that could stop/restart Traefik:
================================================================================
{% if traefik_scripts.stdout %}
{{ traefik_scripts.stdout }}
{% else %}
No scripts found
{% endif %}
================================================================================
- name: Check Docker events for Traefik container (last 24h)
ansible.builtin.shell: |
timeout 5 docker events --since 24h --filter container=traefik --format "{{ '{{' }}.Time{{ '}}' }} {{ '{{' }}.Action{{ '}}' }} {{ '{{' }}.Actor.Attributes.name{{ '}}' }}" 2>/dev/null | tail -50 || echo "No recent events or docker events not available"
register: docker_events
changed_when: false
- name: Display Docker events
ansible.builtin.debug:
msg: |
================================================================================
Docker events for Traefik (last 24h):
================================================================================
{{ docker_events.stdout }}
================================================================================
- name: Check Traefik container exit history
ansible.builtin.shell: |
docker ps -a --filter "name=traefik" --format "{{ '{{' }}.ID{{ '}}' }} | {{ '{{' }}.Status{{ '}}' }} | {{ '{{' }}.CreatedAt{{ '}}' }}" | head -10
register: traefik_exits
changed_when: false
- name: Display Traefik container exit history
ansible.builtin.debug:
msg: |
================================================================================
Traefik container exit history:
================================================================================
{{ traefik_exits.stdout }}
================================================================================
- name: Check Docker daemon logs for Traefik stops
ansible.builtin.shell: |
journalctl -u docker.service --since "24h ago" --no-pager | grep -i "traefik\|stop\|kill" | tail -50 || echo "No relevant logs in journalctl"
register: docker_daemon_logs
changed_when: false
- name: Display Docker daemon logs
ansible.builtin.debug:
msg: |
================================================================================
Docker Daemon Logs (Traefik/Stop/Kill):
================================================================================
{{ docker_daemon_logs.stdout }}
================================================================================
- name: Check if there's a health check script running
ansible.builtin.shell: |
ps aux | grep -E "traefik|health.*check|monitor.*docker|auto.*heal|watchdog" | grep -v grep || echo "No health check processes found"
register: health_check_processes
changed_when: false
- name: Display health check processes
ansible.builtin.debug:
msg: |
================================================================================
Running health-check/monitoring processes:
================================================================================
{{ health_check_processes.stdout }}
================================================================================
- name: Check for monitoring/auto-heal scripts
ansible.builtin.shell: |
find /home/deploy -type f \( -name "*monitor*" -o -name "*health*" -o -name "*auto*heal*" -o -name "*watchdog*" \) 2>/dev/null | head -20
register: monitoring_scripts
changed_when: false
- name: Display monitoring scripts
ansible.builtin.debug:
msg: |
================================================================================
Monitoring/auto-heal scripts:
================================================================================
{% if monitoring_scripts.stdout %}
{{ monitoring_scripts.stdout }}
{% else %}
No monitoring scripts found
{% endif %}
================================================================================
- name: Check Docker Compose file for restart policies
ansible.builtin.shell: |
cd /home/deploy/deployment/stacks/traefik && grep -A 5 "restart:" docker-compose.yml || echo "No restart policy found"
register: restart_policy
changed_when: false
- name: Display restart policy
ansible.builtin.debug:
msg: |
================================================================================
Docker Compose Restart Policy:
================================================================================
{{ restart_policy.stdout }}
================================================================================
- name: Check if Traefik is managed by systemd
ansible.builtin.shell: |
systemctl list-units --type=service --all | grep -i traefik || echo "No Traefik systemd service found"
register: traefik_systemd
changed_when: false
- name: Display Traefik systemd service
ansible.builtin.debug:
msg: |
================================================================================
Traefik Systemd Service:
================================================================================
{{ traefik_systemd.stdout }}
================================================================================
- name: Check recent Traefik container logs for stop messages
ansible.builtin.shell: |
cd /home/deploy/deployment/stacks/traefik && docker compose logs traefik --since 24h 2>&1 | grep -E "I have to go|Stopping server gracefully|SIGTERM|SIGINT|received signal" | tail -20 || echo "No stop messages in logs"
register: traefik_stop_logs
changed_when: false
- name: Display Traefik stop messages
ansible.builtin.debug:
msg: |
================================================================================
Traefik stop messages (last 24h):
================================================================================
{{ traefik_stop_logs.stdout }}
================================================================================
- name: Check Traefik container uptime and restart count
ansible.builtin.shell: |
docker inspect traefik --format '{{ '{{' }}.State.StartedAt{{ '}}' }} | {{ '{{' }}.State.FinishedAt{{ '}}' }} | Restarts: {{ '{{' }}.RestartCount{{ '}}' }}' 2>/dev/null || echo "Container not found"
register: traefik_uptime
changed_when: false
- name: Display Traefik uptime and restart count
ansible.builtin.debug:
msg: |
================================================================================
Traefik Container Uptime & Restart Count:
================================================================================
{{ traefik_uptime.stdout }}
================================================================================
- name: Check for unattended-upgrades activity
ansible.builtin.shell: |
journalctl -u unattended-upgrades --since "24h ago" --no-pager | tail -20 || echo "No unattended-upgrades logs"
register: unattended_upgrades
changed_when: false
- name: Display unattended-upgrades activity
ansible.builtin.debug:
msg: |
================================================================================
Unattended-upgrades activity (can lead to reboots):
================================================================================
{{ unattended_upgrades.stdout }}
================================================================================
- name: Check system reboot history
ansible.builtin.shell: |
last reboot | head -10 || echo "No reboot history available"
register: reboot_history
changed_when: false
- name: Display reboot history
ansible.builtin.debug:
msg: |
================================================================================
System reboot history:
================================================================================
{{ reboot_history.stdout }}
================================================================================
- name: Check Docker Compose processes that might affect Traefik
ansible.builtin.shell: |
ps aux | grep -E "docker.*compose.*traefik|docker-compose.*traefik" | grep -v grep || echo "No docker compose processes for Traefik found"
register: docker_compose_processes
changed_when: false
- name: Display Docker Compose processes
ansible.builtin.debug:
msg: |
================================================================================
Docker Compose processes for Traefik:
================================================================================
{{ docker_compose_processes.stdout }}
================================================================================
- name: Check all user crontabs (not just root/deploy)
ansible.builtin.shell: |
for user in $(cut -f1 -d: /etc/passwd); do
crontab -u "$user" -l 2>/dev/null | grep -q "traefik\|docker.*compose.*traefik\|docker.*restart.*traefik" && echo "=== User: $user ===" && crontab -u "$user" -l 2>/dev/null | grep -E "traefik|docker.*compose.*traefik|docker.*restart.*traefik" || true
done || echo "No user crontabs with Traefik commands found"
register: all_user_crontabs
changed_when: false
- name: Display all user crontabs with Traefik commands
ansible.builtin.debug:
msg: |
================================================================================
All user crontabs containing Traefik commands:
================================================================================
{{ all_user_crontabs.stdout }}
================================================================================
- name: Check for Gitea Workflows that might restart Traefik
ansible.builtin.shell: |
find /home/deploy -type f -path "*/.gitea/workflows/*.yml" -o -path "*/.github/workflows/*.yml" 2>/dev/null | xargs grep -l "traefik\|restart.*traefik\|docker.*compose.*traefik" 2>/dev/null | head -10 || echo "No Gitea/GitHub workflows found that restart Traefik"
register: gitea_workflows
changed_when: false
- name: Display Gitea Workflows that might restart Traefik
ansible.builtin.debug:
msg: |
================================================================================
Gitea/GitHub workflows that could restart Traefik:
================================================================================
{{ gitea_workflows.stdout }}
================================================================================
- name: Check for custom systemd services in /etc/systemd/system/
ansible.builtin.shell: |
find /etc/systemd/system -type f -name "*.service" -o -name "*.timer" 2>/dev/null | xargs grep -l "traefik\|docker.*compose.*traefik\|docker.*restart.*traefik" 2>/dev/null | head -10 || echo "No custom systemd services/timers found for Traefik"
register: custom_systemd_services
changed_when: false
- name: Display custom systemd services
ansible.builtin.debug:
msg: |
================================================================================
Custom systemd services/timers for Traefik:
================================================================================
{{ custom_systemd_services.stdout }}
================================================================================
- name: Check for at jobs (scheduled tasks)
ansible.builtin.shell: |
atq 2>/dev/null | while read line; do
job_id=$(echo "$line" | awk '{print $1}')
at -c "$job_id" 2>/dev/null | grep -q "traefik\|docker.*compose.*traefik\|docker.*restart.*traefik" && echo "=== Job ID: $job_id ===" && at -c "$job_id" 2>/dev/null | grep -E "traefik|docker.*compose.*traefik|docker.*restart.*traefik" || true
done || echo "No at jobs found or atq not available"
register: at_jobs
changed_when: false
- name: Display at jobs
ansible.builtin.debug:
msg: |
================================================================================
At jobs (scheduled tasks) affecting Traefik:
================================================================================
{{ at_jobs.stdout }}
================================================================================
- name: Check for Docker Compose watch mode
ansible.builtin.shell: |
cd /home/deploy/deployment/stacks/traefik && docker compose ps --format json 2>/dev/null | jq -r '.[] | select(.Service=="traefik") | .State' || echo "Could not check Docker Compose watch mode"
register: docker_compose_watch
changed_when: false
- name: Check if Docker Compose watch is enabled
ansible.builtin.shell: |
cd /home/deploy/deployment/stacks/traefik && docker compose config 2>/dev/null | grep -i "watch\|x-develop" || echo "No watch mode configured"
register: docker_compose_watch_config
changed_when: false
- name: Display Docker Compose watch mode
ansible.builtin.debug:
msg: |
================================================================================
Docker Compose Watch Mode:
================================================================================
Watch Config: {{ docker_compose_watch_config.stdout }}
================================================================================
- name: Check Ansible traefik_auto_restart setting
ansible.builtin.shell: |
grep -r "traefik_auto_restart" /home/deploy/deployment/ansible/roles/traefik/defaults/ /home/deploy/deployment/ansible/inventory/ 2>/dev/null | head -10 || echo "traefik_auto_restart not found in Ansible config"
register: ansible_auto_restart
changed_when: false
- name: Display Ansible traefik_auto_restart setting
ansible.builtin.debug:
msg: |
================================================================================
Ansible traefik_auto_restart setting:
================================================================================
{{ ansible_auto_restart.stdout }}
================================================================================
- name: Check Port 80/443 configuration
ansible.builtin.shell: |
echo "=== Port 80 ==="
netstat -tlnp 2>/dev/null | grep ":80 " || ss -tlnp 2>/dev/null | grep ":80 " || echo "Could not check port 80"
echo ""
echo "=== Port 443 ==="
netstat -tlnp 2>/dev/null | grep ":443 " || ss -tlnp 2>/dev/null | grep ":443 " || echo "Could not check port 443"
echo ""
echo "=== Docker Port Mappings for Traefik ==="
docker inspect traefik --format '{{ '{{' }}json .HostConfig.PortBindings{{ '}}' }}' 2>/dev/null | jq '.' || echo "Could not get Docker port mappings"
register: port_config
changed_when: false
- name: Display Port configuration
ansible.builtin.debug:
msg: |
================================================================================
Port configuration (80/443):
================================================================================
{{ port_config.stdout }}
================================================================================
- name: Check if other services are blocking ports 80/443
ansible.builtin.shell: |
echo "=== Services listening on port 80 ==="
lsof -i :80 2>/dev/null || fuser 80/tcp 2>/dev/null || echo "Could not check port 80"
echo ""
echo "=== Services listening on port 443 ==="
lsof -i :443 2>/dev/null || fuser 443/tcp 2>/dev/null || echo "Could not check port 443"
register: port_blockers
changed_when: false
- name: Display port blockers
ansible.builtin.debug:
msg: |
================================================================================
Services that could be blocking ports 80/443:
================================================================================
{{ port_blockers.stdout }}
================================================================================
- name: Check Traefik network configuration
ansible.builtin.shell: |
docker inspect traefik --format '{{ '{{' }}json .NetworkSettings{{ '}}' }}' 2>/dev/null | jq '.Networks' || echo "Could not get Traefik network configuration"
register: traefik_network
changed_when: false
- name: Display Traefik network configuration
ansible.builtin.debug:
msg: |
================================================================================
Traefik network configuration:
================================================================================
{{ traefik_network.stdout }}
================================================================================
- name: Summary - Most likely causes
ansible.builtin.debug:
msg: |
================================================================================
SUMMARY - Possible causes of Traefik restarts:
================================================================================
Check the output above for:
1. Systemd timers: can stop containers (e.g. unattended-upgrades)
2. Cronjobs: recurring scripts that stop Traefik (all user crontabs checked)
3. Docker events: show who/what stops the container
4. Monitoring scripts: auto-heal scripts that restart on failure
5. Unattended-upgrades: can lead to reboots
6. Reboot history: system reboots stop all containers
7. Gitea workflows: can restart Traefik via Ansible
8. Custom systemd services: user-defined services that manage Traefik
9. At jobs: scheduled tasks that stop Traefik
10. Docker Compose watch mode: automatic restarts on file changes
11. Ansible traefik_auto_restart: automatic restarts after config deployment
12. Port configuration: ports 80/443 must point to Traefik
Next steps:
- Check the Docker events for recurring patterns
- Check all user crontabs for recurring Traefik commands
- Check whether monitoring scripts are too aggressive
- Check whether unattended-upgrades leads to reboots
- Check whether traefik_auto_restart causes frequent restarts
- Verify the port configuration (80/443)
================================================================================

View File

@@ -0,0 +1,403 @@
---
# Consolidated Gitea Diagnosis Playbook
# Consolidates: diagnose-gitea-timeouts.yml, diagnose-gitea-timeout-deep.yml,
# diagnose-gitea-timeout-live.yml, diagnose-gitea-timeouts-complete.yml,
# comprehensive-gitea-diagnosis.yml
#
# Usage:
# # Basic diagnosis (default)
# ansible-playbook -i inventory/production.yml playbooks/diagnose/gitea.yml
#
# # Deep diagnosis (includes resource checks, multiple connection tests)
# ansible-playbook -i inventory/production.yml playbooks/diagnose/gitea.yml --tags deep
#
# # Live diagnosis (monitors during request)
# ansible-playbook -i inventory/production.yml playbooks/diagnose/gitea.yml --tags live
#
# # Complete diagnosis (all checks)
# ansible-playbook -i inventory/production.yml playbooks/diagnose/gitea.yml --tags complete
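#
# The stack paths and URL used below come from inventory variables; they can
# also be overridden ad hoc (example values, assuming your inventory defines
# these names):
# ansible-playbook -i inventory/production.yml playbooks/diagnose/gitea.yml \
# -e stacks_base_path=/home/deploy/deployment/stacks -e gitea_domain=git.example.com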
- name: Diagnose Gitea Issues
hosts: production
gather_facts: yes
become: no
vars:
gitea_stack_path: "{{ stacks_base_path }}/gitea"
traefik_stack_path: "{{ stacks_base_path }}/traefik"
gitea_url: "https://{{ gitea_domain }}"
gitea_container_name: "gitea"
traefik_container_name: "traefik"
tasks:
# ========================================
# BASIC DIAGNOSIS (always runs)
# ========================================
- name: Display diagnostic plan
ansible.builtin.debug:
msg: |
================================================================================
GITEA DIAGNOSIS
================================================================================
Running diagnosis with tags: {{ ansible_run_tags | default(['all']) }}
Basic checks (always):
- Container status
- Health endpoints
- Network connectivity
- Service discovery
Deep checks (--tags deep):
- Resource usage
- Multiple connection tests
- Log analysis
Live checks (--tags live):
- Real-time monitoring during request
Complete checks (--tags complete):
- All checks including app.ini, ServersTransport, etc.
================================================================================
- name: Check Gitea container status
ansible.builtin.shell: |
cd {{ gitea_stack_path }}
docker compose ps {{ gitea_container_name }}
register: gitea_status
changed_when: false
- name: Check Traefik container status
ansible.builtin.shell: |
cd {{ traefik_stack_path }}
docker compose ps {{ traefik_container_name }}
register: traefik_status
changed_when: false
- name: Check Gitea health endpoint (direct from container)
ansible.builtin.shell: |
cd {{ gitea_stack_path }}
docker compose exec -T {{ gitea_container_name }} curl -f http://localhost:3000/api/healthz 2>&1 || echo "HEALTH_CHECK_FAILED"
register: gitea_health_direct
changed_when: false
failed_when: false
- name: Check Gitea health endpoint (via Traefik)
ansible.builtin.uri:
url: "{{ gitea_url }}/api/healthz"
method: GET
status_code: [200]
validate_certs: false
timeout: 10
register: gitea_health_traefik
failed_when: false
changed_when: false
- name: Check if Gitea is in traefik-public network
ansible.builtin.shell: |
docker network inspect traefik-public --format '{{ '{{' }}range .Containers{{ '}}' }}{{ '{{' }}.Name{{ '}}' }} {{ '{{' }}end{{ '}}' }}' 2>/dev/null | grep -q {{ gitea_container_name }} && echo "YES" || echo "NO"
register: gitea_in_traefik_network
changed_when: false
failed_when: false
- name: Test connection from Traefik to Gitea
ansible.builtin.shell: |
cd {{ traefik_stack_path }}
docker compose exec -T {{ traefik_container_name }} wget -qO- --timeout=5 http://{{ gitea_container_name }}:3000/api/healthz 2>&1 || echo "CONNECTION_FAILED"
register: traefik_gitea_connection
changed_when: false
failed_when: false
- name: Check Traefik service discovery for Gitea
ansible.builtin.shell: |
cd {{ traefik_stack_path }}
docker compose exec -T {{ traefik_container_name }} traefik show providers docker 2>/dev/null | grep -i "gitea" || echo "NOT_FOUND"
register: traefik_gitea_service
changed_when: false
failed_when: false
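# Note: if the `traefik show` subcommand is unavailable in your Traefik
# version, the routing data is also exposed via the API when it is enabled
# (assuming the default API port 8080):
# wget -qO- http://localhost:8080/api/http/services | grep -i gitea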
# ========================================
# DEEP DIAGNOSIS (--tags deep)
# ========================================
- name: Check Gitea container resources (CPU/Memory)
ansible.builtin.shell: |
docker stats {{ gitea_container_name }} --no-stream --format 'CPU: {{ '{{' }}.CPUPerc{{ '}}' }} | Memory: {{ '{{' }}.MemUsage{{ '}}' }}' 2>/dev/null || echo "Could not get stats"
register: gitea_resources
changed_when: false
failed_when: false
tags:
- deep
- complete
- name: Check Traefik container resources (CPU/Memory)
ansible.builtin.shell: |
docker stats {{ traefik_container_name }} --no-stream --format 'CPU: {{ '{{' }}.CPUPerc{{ '}}' }} | Memory: {{ '{{' }}.MemUsage{{ '}}' }}' 2>/dev/null || echo "Could not get stats"
register: traefik_resources
changed_when: false
failed_when: false
tags:
- deep
- complete
- name: Test Gitea direct connection (multiple attempts)
ansible.builtin.shell: |
for i in 1 2 3 4 5; do
echo "=== Attempt $i ==="
cd {{ gitea_stack_path }}
timeout 5 docker compose exec -T {{ gitea_container_name }} curl -f http://localhost:3000/api/healthz 2>&1 || echo "FAILED"
sleep 1
done
register: gitea_direct_tests
changed_when: false
tags:
- deep
- complete
- name: Test Gitea via Traefik (multiple attempts)
ansible.builtin.shell: |
for i in 1 2 3 4 5; do
echo "=== Attempt $i ==="
timeout 10 curl -k -s -o /dev/null -w "%{http_code}" {{ gitea_url }}/api/healthz 2>&1 || echo "TIMEOUT"
sleep 2
done
register: gitea_traefik_tests
changed_when: false
tags:
- deep
- complete
- name: Check Gitea logs for errors/timeouts
ansible.builtin.shell: |
cd {{ gitea_stack_path }}
docker compose logs {{ gitea_container_name }} --tail=50 2>&1 | grep -iE "error|timeout|failed|panic|fatal" | tail -20 || echo "No errors in recent logs"
register: gitea_errors
changed_when: false
failed_when: false
tags:
- deep
- complete
- name: Check Traefik logs for Gitea-related errors
ansible.builtin.shell: |
cd {{ traefik_stack_path }}
docker compose logs {{ traefik_container_name }} --tail=50 2>&1 | grep -iE "gitea|git\.michaelschiemer\.de|timeout|error" | tail -20 || echo "No Gitea-related errors in Traefik logs"
register: traefik_gitea_errors
changed_when: false
failed_when: false
tags:
- deep
- complete
# ========================================
# COMPLETE DIAGNOSIS (--tags complete)
# ========================================
- name: Test Gitea internal port (127.0.0.1:3000)
ansible.builtin.shell: |
docker exec {{ gitea_container_name }} curl -sS -I http://127.0.0.1:3000/ 2>&1 | head -5
register: gitea_internal_test
changed_when: false
failed_when: false
tags:
- complete
- name: Test Traefik to Gitea via Docker DNS (gitea:3000)
ansible.builtin.shell: |
docker exec {{ traefik_container_name }} sh -lc 'apk add --no-cache curl >/dev/null 2>&1 || true; curl -sS -I http://gitea:3000/ 2>&1' | head -10
register: traefik_gitea_dns_test
changed_when: false
failed_when: false
tags:
- complete
- name: Check Traefik logs for 504 errors
ansible.builtin.shell: |
docker logs {{ traefik_container_name }} --tail=100 2>&1 | grep -i "504\|timeout" | tail -20 || echo "No 504/timeout errors found"
register: traefik_504_logs
changed_when: false
failed_when: false
tags:
- complete
- name: Check Gitea Traefik labels
ansible.builtin.shell: |
docker inspect {{ gitea_container_name }} --format '{{ '{{' }}json .Config.Labels{{ '}}' }}' 2>/dev/null | python3 -m json.tool | grep -E "traefik" || echo "No Traefik labels found"
register: gitea_labels
changed_when: false
failed_when: false
tags:
- complete
- name: Verify service port is 3000
ansible.builtin.shell: |
docker inspect {{ gitea_container_name }} --format '{{ '{{' }}json .Config.Labels{{ '}}' }}' 2>/dev/null | python3 -c "import sys, json; labels = json.load(sys.stdin); print('server.port:', labels.get('traefik.http.services.gitea.loadbalancer.server.port', 'NOT SET'))"
register: gitea_service_port
changed_when: false
failed_when: false
tags:
- complete
- name: Check ServersTransport configuration
ansible.builtin.shell: |
docker inspect {{ gitea_container_name }} --format '{{ '{{' }}json .Config.Labels{{ '}}' }}' 2>/dev/null | python3 -c "
import sys, json
labels = json.load(sys.stdin)
transport = labels.get('traefik.http.services.gitea.loadbalancer.serversTransport', '')
if transport:
print('ServersTransport:', transport)
print('dialtimeout:', labels.get('traefik.http.serverstransports.gitea-transport.forwardingtimeouts.dialtimeout', 'NOT SET'))
print('responseheadertimeout:', labels.get('traefik.http.serverstransports.gitea-transport.forwardingtimeouts.responseheadertimeout', 'NOT SET'))
print('idleconntimeout:', labels.get('traefik.http.serverstransports.gitea-transport.forwardingtimeouts.idleconntimeout', 'NOT SET'))
print('maxidleconnsperhost:', labels.get('traefik.http.serverstransports.gitea-transport.maxidleconnsperhost', 'NOT SET'))
else:
print('ServersTransport: NOT CONFIGURED')
"
register: gitea_timeout_config
changed_when: false
failed_when: false
tags:
- complete
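# For reference, a sketch of the labels this check looks for on the Gitea
# container (the keys match the lookups above; the values are examples only):
# traefik.http.services.gitea.loadbalancer.serversTransport: gitea-transport
# traefik.http.serverstransports.gitea-transport.forwardingtimeouts.dialtimeout: 30s
# traefik.http.serverstransports.gitea-transport.forwardingtimeouts.responseheadertimeout: 300s
# traefik.http.serverstransports.gitea-transport.forwardingtimeouts.idleconntimeout: 90s
# traefik.http.serverstransports.gitea-transport.maxidleconnsperhost: 100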
- name: Check Gitea app.ini proxy settings
ansible.builtin.shell: |
cd {{ gitea_stack_path }}
docker compose exec -T {{ gitea_container_name }} cat /data/gitea/conf/app.ini 2>/dev/null | grep -E "PROXY_TRUSTED_PROXIES|LOCAL_ROOT_URL|COOKIE_SECURE|SAME_SITE" || echo "Proxy settings not found in app.ini"
register: gitea_proxy_settings
changed_when: false
failed_when: false
tags:
- complete
- name: Check if Traefik can resolve Gitea hostname
ansible.builtin.shell: |
docker exec {{ traefik_container_name }} getent hosts {{ gitea_container_name }} || echo "DNS resolution failed"
register: traefik_dns_resolution
changed_when: false
failed_when: false
tags:
- complete
- name: Check Docker networks for Gitea and Traefik
ansible.builtin.shell: |
docker inspect {{ gitea_container_name }} --format '{{ '{{' }}json .NetworkSettings.Networks{{ '}}' }}' | python3 -c "import sys, json; data=json.load(sys.stdin); print('Gitea networks:', list(data.keys()))"
docker inspect {{ traefik_container_name }} --format '{{ '{{' }}json .NetworkSettings.Networks{{ '}}' }}' | python3 -c "import sys, json; data=json.load(sys.stdin); print('Traefik networks:', list(data.keys()))"
register: docker_networks_check
changed_when: false
failed_when: false
tags:
- complete
- name: Test long-running endpoint from external
ansible.builtin.uri:
url: "{{ gitea_url }}/user/events"
method: GET
status_code: [200, 504]
validate_certs: false
timeout: 60
register: long_running_endpoint_test
changed_when: false
failed_when: false
tags:
- complete
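# /user/events is an event-stream endpoint that holds connections open, which
# makes it a useful probe for proxy timeout behaviour; hence the 60s timeout
# and the acceptance of both 200 and 504 above.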
- name: Check Redis connection from Gitea
ansible.builtin.shell: |
cd {{ gitea_stack_path }}
docker compose exec -T {{ gitea_container_name }} sh -c "redis-cli -h redis -a {{ vault_gitea_redis_password | default('gitea_redis_password') }} ping 2>&1" || echo "REDIS_CONNECTION_FAILED"
register: gitea_redis_connection
changed_when: false
failed_when: false
tags:
- complete
- name: Check PostgreSQL connection from Gitea
ansible.builtin.shell: |
cd {{ gitea_stack_path }}
docker compose exec -T {{ gitea_container_name }} sh -c "pg_isready -h postgres -p 5432 -U gitea 2>&1" || echo "POSTGRES_CONNECTION_FAILED"
register: gitea_postgres_connection
changed_when: false
failed_when: false
tags:
- complete
# ========================================
# SUMMARY
# ========================================
- name: Summary
ansible.builtin.debug:
msg: |
================================================================================
GITEA DIAGNOSIS SUMMARY
================================================================================
Container Status:
- Gitea: {{ gitea_status.stdout | regex_replace('.*(Up|Down|Restarting).*', '\\1') | default('UNKNOWN') }}
- Traefik: {{ traefik_status.stdout | regex_replace('.*(Up|Down|Restarting).*', '\\1') | default('UNKNOWN') }}
Health Checks:
- Gitea (direct): {% if 'HEALTH_CHECK_FAILED' not in gitea_health_direct.stdout %}✅{% else %}❌{% endif %}
- Gitea (via Traefik): {% if gitea_health_traefik.status == 200 %}✅{% else %}❌ (Status: {{ gitea_health_traefik.status | default('TIMEOUT') }}){% endif %}
Network:
- Gitea in traefik-public: {% if gitea_in_traefik_network.stdout == 'YES' %}✅{% else %}❌{% endif %}
- Traefik → Gitea: {% if 'CONNECTION_FAILED' not in traefik_gitea_connection.stdout %}✅{% else %}❌{% endif %}
Service Discovery:
- Traefik finds Gitea: {% if 'NOT_FOUND' not in traefik_gitea_service.stdout %}✅{% else %}❌{% endif %}
{% if 'deep' in ansible_run_tags or 'complete' in ansible_run_tags %}
Resources:
- Gitea: {{ gitea_resources.stdout | default('N/A') }}
- Traefik: {{ traefik_resources.stdout | default('N/A') }}
Connection Tests:
- Direct (5 attempts): {{ gitea_direct_tests.stdout | default('N/A') }}
- Via Traefik (5 attempts): {{ gitea_traefik_tests.stdout | default('N/A') }}
Error Logs:
- Gitea: {{ gitea_errors.stdout | default('No errors') }}
- Traefik: {{ traefik_gitea_errors.stdout | default('No errors') }}
{% endif %}
{% if 'complete' in ansible_run_tags %}
Configuration:
- Service Port: {{ gitea_service_port.stdout | default('N/A') }}
- ServersTransport: {{ gitea_timeout_config.stdout | default('N/A') }}
- Proxy Settings: {{ gitea_proxy_settings.stdout | default('N/A') }}
- DNS Resolution: {{ traefik_dns_resolution.stdout | default('N/A') }}
- Networks: {{ docker_networks_check.stdout | default('N/A') }}
Long-Running Endpoint:
- Status: {{ long_running_endpoint_test.status | default('N/A') }}
Dependencies:
- Redis: {% if 'REDIS_CONNECTION_FAILED' not in gitea_redis_connection.stdout %}✅{% else %}❌{% endif %}
- PostgreSQL: {% if 'POSTGRES_CONNECTION_FAILED' not in gitea_postgres_connection.stdout %}✅{% else %}❌{% endif %}
{% endif %}
================================================================================
RECOMMENDATIONS
================================================================================
{% if gitea_health_traefik.status != 200 %}
❌ Gitea is not reachable via Traefik
→ Run: ansible-playbook -i inventory/production.yml playbooks/manage/gitea.yml --tags restart
{% endif %}
{% if gitea_in_traefik_network.stdout != 'YES' %}
❌ Gitea is not in traefik-public network
→ Restart Gitea container to update network membership
{% endif %}
{% if 'CONNECTION_FAILED' in traefik_gitea_connection.stdout %}
❌ Traefik cannot reach Gitea
→ Restart both containers
{% endif %}
{% if 'NOT_FOUND' in traefik_gitea_service.stdout %}
❌ Gitea not found in Traefik service discovery
→ Restart Traefik to refresh service discovery
{% endif %}
================================================================================

View File

@@ -0,0 +1,229 @@
---
# Consolidated Traefik Diagnosis Playbook
# Consolidates: diagnose-traefik-restarts.yml, find-traefik-restart-source.yml,
# monitor-traefik-restarts.yml, monitor-traefik-continuously.yml,
# verify-traefik-fix.yml
#
# Usage:
# # Basic diagnosis (default)
# ansible-playbook -i inventory/production.yml playbooks/diagnose/traefik.yml
#
# # Find restart source
# ansible-playbook -i inventory/production.yml playbooks/diagnose/traefik.yml --tags restart-source
#
# # Monitor restarts
# ansible-playbook -i inventory/production.yml playbooks/diagnose/traefik.yml --tags monitor
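#
# # Monitor with a custom lookback window in hours (example value):
# ansible-playbook -i inventory/production.yml playbooks/diagnose/traefik.yml --tags monitor -e monitor_lookback_hours=6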
- name: Diagnose Traefik Issues
hosts: production
gather_facts: yes
become: yes
vars:
traefik_stack_path: "{{ stacks_base_path }}/traefik"
traefik_container_name: "traefik"
# Plain defaults; override with -e (extra vars take precedence). A
# self-referencing default like "{{ monitor_duration_seconds | default(120) }}"
# would trigger a recursive templating error here.
monitor_duration_seconds: 120
monitor_lookback_hours: 24
tasks:
- name: Display diagnostic plan
ansible.builtin.debug:
msg: |
================================================================================
TRAEFIK DIAGNOSIS
================================================================================
Running diagnosis with tags: {{ ansible_run_tags | default(['all']) }}
Basic checks (always):
- Container status
- Restart count
- Recent logs
Restart source (--tags restart-source):
- Find source of restart loops
- Check cronjobs, systemd, scripts
Monitor (--tags monitor):
- Monitor for restarts over time
================================================================================
# ========================================
# BASIC DIAGNOSIS (always runs)
# ========================================
- name: Check Traefik container status
ansible.builtin.shell: |
cd {{ traefik_stack_path }}
docker compose ps {{ traefik_container_name }}
register: traefik_status
changed_when: false
- name: Check Traefik container restart count
ansible.builtin.shell: |
docker inspect {{ traefik_container_name }} --format '{{ '{{' }}.RestartCount{{ '}}' }}' 2>/dev/null || echo "0"
register: traefik_restart_count
changed_when: false
- name: Check Traefik container start time
ansible.builtin.shell: |
docker inspect {{ traefik_container_name }} --format '{{ '{{' }}.State.StartedAt{{ '}}' }}' 2>/dev/null || echo "UNKNOWN"
register: traefik_started_at
changed_when: false
- name: Check Traefik logs for recent restarts
ansible.builtin.shell: |
cd {{ traefik_stack_path }}
docker compose logs {{ traefik_container_name }} --since 2h 2>&1 | grep -iE "stopping server gracefully|I have to go|restart|shutdown" | tail -20 || echo "No restart messages in last 2 hours"
register: traefik_restart_logs
changed_when: false
failed_when: false
- name: Check Traefik logs for errors
ansible.builtin.shell: |
cd {{ traefik_stack_path }}
docker compose logs {{ traefik_container_name }} --tail=100 2>&1 | grep -iE "error|warn|fail" | tail -20 || echo "No errors in recent logs"
register: traefik_error_logs
changed_when: false
failed_when: false
# ========================================
# RESTART SOURCE DIAGNOSIS (--tags restart-source)
# ========================================
- name: Check all user crontabs for Traefik/Docker commands
ansible.builtin.shell: |
for user in $(cut -f1 -d: /etc/passwd); do
crontab -u "$user" -l 2>/dev/null | grep -qE "traefik|docker.*compose.*traefik|docker.*stop.*traefik|docker.*restart.*traefik|docker.*down.*traefik" && echo "=== User: $user ===" && crontab -u "$user" -l 2>/dev/null | grep -E "traefik|docker.*compose.*traefik|docker.*stop.*traefik|docker.*restart.*traefik|docker.*down.*traefik" || true
done || echo "No user crontabs with Traefik commands found"
register: all_user_crontabs
changed_when: false
tags:
- restart-source
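# The loop above scans every account in /etc/passwd; `|| true` keeps users
# without a crontab (where crontab -l exits non-zero) from failing the task.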
- name: Check system-wide cron directories
ansible.builtin.shell: |
for dir in /etc/cron.d /etc/cron.daily /etc/cron.hourly /etc/cron.weekly /etc/cron.monthly; do
if [ -d "$dir" ]; then
echo "=== $dir ==="
grep -rE "traefik|docker.*compose.*traefik|docker.*stop.*traefik|docker.*restart.*traefik|docker.*down.*traefik" "$dir" 2>/dev/null || echo "No matches"
fi
done
register: system_cron_dirs
changed_when: false
tags:
- restart-source
- name: Check systemd timers and services
ansible.builtin.shell: |
echo "=== Active Timers ==="
systemctl list-timers --all --no-pager | grep -E "traefik|docker.*compose" || echo "No Traefik-related timers"
echo ""
echo "=== Custom Services ==="
systemctl list-units --type=service --all | grep -E "traefik|docker.*compose" || echo "No Traefik-related services"
register: systemd_services
changed_when: false
tags:
- restart-source
- name: Check for scripts in deployment directory that restart Traefik
ansible.builtin.shell: |
find /home/deploy/deployment -type f \( -name "*.sh" -o -name "*.yml" -o -name "*.yaml" \) -exec grep -lE "traefik.*restart|docker.*compose.*traefik.*restart|docker.*compose.*traefik.*down|docker.*compose.*traefik.*stop" {} \; 2>/dev/null | head -30
register: deployment_scripts
changed_when: false
tags:
- restart-source
- name: Check Ansible roles for traefik_auto_restart or restart tasks
ansible.builtin.shell: |
grep -rE "traefik_auto_restart|traefik.*restart|docker.*compose.*traefik.*restart" /home/deploy/deployment/ansible/roles/ 2>/dev/null | grep -v ".git" | head -20 || echo "No auto-restart settings found"
register: ansible_auto_restart
changed_when: false
tags:
- restart-source
- name: Check Docker events for Traefik (last 24 hours)
ansible.builtin.shell: |
timeout 5 docker events --since 24h --filter container={{ traefik_container_name }} --filter event=die --format "{{ '{{' }}.Time{{ '}}' }} {{ '{{' }}.Action{{ '}}' }}" 2>/dev/null | tail -20 || echo "No Traefik die events found"
register: docker_events_traefik
changed_when: false
failed_when: false
tags:
- restart-source
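# Without --until, `docker events --since 24h` keeps streaming once it catches
# up to "now"; the `timeout 5` above bounds the call (past events print quickly).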
# ========================================
# MONITOR (--tags monitor)
# ========================================
- name: Check Traefik logs for stop messages (lookback period)
ansible.builtin.shell: |
cd {{ traefik_stack_path }}
docker compose logs {{ traefik_container_name }} --since {{ monitor_lookback_hours }}h 2>&1 | grep -E "I have to go|Stopping server gracefully" | tail -20 || echo "No stop messages found"
register: traefik_stop_messages
changed_when: false
tags:
- monitor
- name: Count stop messages
ansible.builtin.set_fact:
stop_count: "{{ traefik_stop_messages.stdout | regex_findall('I have to go|Stopping server gracefully') | length }}"
tags:
- monitor
- name: Check system reboot history
ansible.builtin.shell: |
last reboot | head -5 || echo "No reboots found"
register: reboots
changed_when: false
tags:
- monitor
# ========================================
# SUMMARY
# ========================================
- name: Summary
ansible.builtin.debug:
msg: |
================================================================================
TRAEFIK DIAGNOSIS SUMMARY
================================================================================
Container Status:
- Status: {{ traefik_status.stdout | regex_replace('.*(Up|Down|Restarting).*', '\\1') | default('UNKNOWN') }}
- Restart Count: {{ traefik_restart_count.stdout }}
- Started At: {{ traefik_started_at.stdout }}
Recent Logs:
- Restart Messages (last 2h): {{ traefik_restart_logs.stdout | default('None') }}
- Errors (last 100 lines): {{ traefik_error_logs.stdout | default('None') }}
{% if 'restart-source' in ansible_run_tags %}
Restart Source Analysis:
- User Crontabs: {{ all_user_crontabs.stdout | default('None found') }}
- System Cron: {{ system_cron_dirs.stdout | default('None found') }}
- Systemd Services/Timers: {{ systemd_services.stdout | default('None found') }}
- Deployment Scripts: {{ deployment_scripts.stdout | default('None found') }}
- Ansible Auto-Restart: {{ ansible_auto_restart.stdout | default('None found') }}
- Docker Events: {{ docker_events_traefik.stdout | default('None found') }}
{% endif %}
{% if 'monitor' in ansible_run_tags %}
Monitoring (last {{ monitor_lookback_hours }} hours):
- Stop Messages: {{ stop_count | default(0) }}
- System Reboots: {{ reboots.stdout | default('None') }}
{% endif %}
================================================================================
RECOMMENDATIONS
================================================================================
{% if 'stopping server gracefully' in traefik_restart_logs.stdout | lower or 'I have to go' in traefik_restart_logs.stdout %}
❌ PROBLEM: Traefik is being stopped regularly!
→ Run with --tags restart-source to find the source
{% endif %}
{% if (traefik_restart_count.stdout | int) > 5 %}
⚠️ WARNING: High restart count ({{ traefik_restart_count.stdout }})
→ Check restart source: ansible-playbook -i inventory/production.yml playbooks/diagnose/traefik.yml --tags restart-source
{% endif %}
================================================================================
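# Tip: for a live view outside Ansible, run `docker events --filter container=traefik`
# on the host to watch stop/die events as they happen.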

View File

@@ -1,136 +0,0 @@
---
# Disable Traefik Auto-Restarts
# Detects auto-restart mechanisms (traefik_auto_restart, cronjobs, systemd timers); deactivation itself is manual - see the note at the end
- name: Disable Traefik Auto-Restarts
hosts: production
gather_facts: yes
become: yes
tasks:
- name: Check current traefik_auto_restart setting in Ansible defaults
ansible.builtin.shell: |
grep -r "traefik_auto_restart" /home/deploy/deployment/ansible/roles/traefik/defaults/main.yml 2>/dev/null || echo "Setting not found"
register: current_auto_restart_setting
changed_when: false
- name: Display current traefik_auto_restart setting
ansible.builtin.debug:
msg: |
================================================================================
Current traefik_auto_restart setting:
================================================================================
{{ current_auto_restart_setting.stdout }}
================================================================================
- name: Check for cronjobs that restart Traefik
ansible.builtin.shell: |
for user in $(cut -f1 -d: /etc/passwd); do
crontab -u "$user" -l 2>/dev/null | grep -q "traefik\|docker.*compose.*traefik.*restart" && echo "=== User: $user ===" && crontab -u "$user" -l 2>/dev/null | grep -E "traefik|docker.*compose.*traefik.*restart" || true
done || echo "No cronjobs found that restart Traefik"
register: traefik_cronjobs
changed_when: false
- name: Display Traefik cronjobs
ansible.builtin.debug:
msg: |
================================================================================
Cronjobs that restart Traefik:
================================================================================
{{ traefik_cronjobs.stdout }}
================================================================================
- name: Check for systemd timers that restart Traefik
ansible.builtin.shell: |
find /etc/systemd/system -type f -name "*.timer" 2>/dev/null | xargs grep -l "traefik\|docker.*compose.*traefik.*restart" 2>/dev/null | head -10 || echo "No systemd timers found for Traefik"
register: traefik_timers
changed_when: false
- name: Display Traefik systemd timers
ansible.builtin.debug:
msg: |
================================================================================
Systemd timers that restart Traefik:
================================================================================
{{ traefik_timers.stdout }}
================================================================================
- name: Check for systemd services that restart Traefik
ansible.builtin.shell: |
find /etc/systemd/system -type f -name "*.service" 2>/dev/null | xargs grep -l "traefik\|docker.*compose.*traefik.*restart" 2>/dev/null | head -10 || echo "No systemd services found for Traefik"
register: traefik_services
changed_when: false
- name: Display Traefik systemd services
ansible.builtin.debug:
msg: |
================================================================================
Systemd services that restart Traefik:
================================================================================
{{ traefik_services.stdout }}
================================================================================
- name: Summary - Found auto-restart mechanisms
ansible.builtin.debug:
msg: |
================================================================================
SUMMARY - Auto-restart mechanisms found:
================================================================================
Ansible traefik_auto_restart: {{ current_auto_restart_setting.stdout }}
{% if traefik_cronjobs.stdout and 'No cronjobs' not in traefik_cronjobs.stdout %}
⚠️ Cronjobs found:
{{ traefik_cronjobs.stdout }}
Manual deactivation required:
- Remove the crontab entries manually
- Or edit them with: crontab -e
{% endif %}
{% if traefik_timers.stdout and 'No systemd timers' not in traefik_timers.stdout %}
⚠️ Systemd timers found:
{{ traefik_timers.stdout }}
Manual deactivation required:
- systemctl stop <timer-name>
- systemctl disable <timer-name>
{% endif %}
{% if traefik_services.stdout and 'No systemd services' not in traefik_services.stdout %}
⚠️ Systemd services found:
{{ traefik_services.stdout }}
Manual deactivation required:
- systemctl stop <service-name>
- systemctl disable <service-name>
{% endif %}
{% if 'No cronjobs' in traefik_cronjobs.stdout and 'No systemd timers' in traefik_timers.stdout and 'No systemd services' in traefik_services.stdout %}
✅ No automatic restart mechanisms found (apart from the Ansible traefik_auto_restart setting)
{% endif %}
Recommendation:
- Set traefik_auto_restart: false in group_vars or the inventory
- Or override it during config deployment: -e "traefik_auto_restart=false"
================================================================================
- name: Note - Manual steps required
ansible.builtin.debug:
msg: |
================================================================================
NOTE - Manual steps required:
================================================================================
This playbook only reports the auto-restart mechanisms it finds.
To disable traefik_auto_restart:
1. Add to group_vars/production/vars.yml or the inventory:
traefik_auto_restart: false
2. Or override it on every config deployment:
ansible-playbook ... -e "traefik_auto_restart=false"
3. For cronjobs/systemd: see above for the manual deactivation steps
================================================================================

View File

@@ -1,90 +0,0 @@
---
# Ensure Gitea is Discovered by Traefik
# This playbook ensures that Traefik properly discovers Gitea after restarts
- name: Ensure Gitea is Discovered by Traefik
hosts: production
gather_facts: no
become: no
vars:
traefik_stack_path: "{{ stacks_base_path }}/traefik"
gitea_stack_path: "{{ stacks_base_path }}/gitea"
max_wait_seconds: 60
check_interval: 5
tasks:
- name: Check if Gitea container is running
ansible.builtin.shell: |
cd {{ gitea_stack_path }}
docker compose ps gitea | grep -q "Up" && echo "RUNNING" || echo "NOT_RUNNING"
register: gitea_status
changed_when: false
- name: Start Gitea if not running
ansible.builtin.shell: |
cd {{ gitea_stack_path }}
docker compose up -d gitea
when: gitea_status.stdout == "NOT_RUNNING"
register: gitea_start
- name: Wait for Gitea to be ready
ansible.builtin.wait_for:
timeout: 30
delay: 2
when: gitea_start.changed | default(false) | bool
- name: Check if Traefik can see Gitea container
ansible.builtin.shell: |
cd {{ traefik_stack_path }}
docker compose exec -T traefik sh -c 'wget -qO- http://localhost:8080/api/http/routers 2>&1 | python3 -m json.tool 2>&1 | grep -qi gitea && echo "FOUND" || echo "NOT_FOUND"'
register: traefik_gitea_check
changed_when: false
failed_when: false
retries: "{{ (max_wait_seconds | int) // (check_interval | int) }}"
delay: "{{ check_interval }}"
until: traefik_gitea_check.stdout == "FOUND"
- name: Restart Traefik if Gitea not found
ansible.builtin.shell: |
cd {{ traefik_stack_path }}
docker compose restart traefik
when: traefik_gitea_check.stdout == "NOT_FOUND"
register: traefik_restart
- name: Wait for Traefik to be ready after restart
ansible.builtin.wait_for:
timeout: 30
delay: 2
when: traefik_restart.changed | default(false) | bool
- name: Verify Gitea is reachable via Traefik
ansible.builtin.uri:
url: "https://{{ gitea_domain }}/api/healthz"
method: GET
status_code: [200]
validate_certs: false
timeout: 10
register: gitea_health_check
retries: 5
delay: 2
until: gitea_health_check.status == 200
failed_when: false
- name: Display result
ansible.builtin.debug:
msg: |
================================================================================
GITEA TRAEFIK DISCOVERY - RESULT
================================================================================
Gitea Status: {{ gitea_status.stdout }}
Traefik Discovery: {{ traefik_gitea_check.stdout }}
Gitea Health Check: {{ 'OK' if (gitea_health_check.status | default(0) == 200) else 'FAILED' }}
{% if gitea_health_check.status | default(0) == 200 %}
✅ Gitea is reachable via Traefik
{% else %}
❌ Gitea is not reachable via Traefik
{% endif %}
================================================================================

View File

@@ -1,246 +0,0 @@
---
# Find Ansible Automation Source
# Finds the source of the external Ansible automation that restarts Traefik on a schedule
- name: Find Ansible Automation Source
hosts: production
gather_facts: yes
become: yes
tasks:
- name: Check for running Ansible processes
ansible.builtin.shell: |
ps aux | grep -E "ansible|ansible-playbook|ansible-pull" | grep -v grep || echo "No Ansible processes found"
register: ansible_processes
changed_when: false
- name: Check for ansible-pull processes
ansible.builtin.shell: |
ps aux | grep ansible-pull | grep -v grep || echo "No ansible-pull processes found"
register: ansible_pull_processes
changed_when: false
- name: Check systemd timers for ansible-pull
ansible.builtin.shell: |
systemctl list-timers --all --no-pager | grep -i ansible || echo "No ansible timers found"
register: ansible_timers
changed_when: false
- name: Check for ansible-pull cronjobs
ansible.builtin.shell: |
for user in $(cut -f1 -d: /etc/passwd); do
crontab -u "$user" -l 2>/dev/null | grep -q "ansible-pull\|ansible.*playbook" && echo "=== User: $user ===" && crontab -u "$user" -l 2>/dev/null | grep -E "ansible-pull|ansible.*playbook" || true
done || echo "No ansible-pull cronjobs found"
register: ansible_cronjobs
changed_when: false
- name: Check system-wide cron for ansible
ansible.builtin.shell: |
for dir in /etc/cron.d /etc/cron.daily /etc/cron.hourly /etc/cron.weekly /etc/cron.monthly; do
if [ -d "$dir" ]; then
grep -rE "ansible-pull|ansible.*playbook" "$dir" 2>/dev/null && echo "=== Found in $dir ===" || true
fi
done || echo "No ansible in system cron"
register: ansible_system_cron
changed_when: false
- name: Check journalctl for Ansible processes
ansible.builtin.shell: |
journalctl --since "24 hours ago" --no-pager | grep -iE "ansible-ansible|ansible-playbook|ansible-pull" | tail -50 || echo "No ansible processes in journalctl"
register: ansible_journal
changed_when: false
- name: Check for ansible-pull configuration files
ansible.builtin.shell: |
find /home -name "*ansible-pull*" -o -name "*ansible*.yml" -path "*/ansible-pull/*" 2>/dev/null | head -20 || echo "No ansible-pull config files found"
register: ansible_pull_configs
changed_when: false
- name: Check for running docker compose commands related to Traefik
ansible.builtin.shell: |
ps aux | grep -E "docker.*compose.*traefik|docker.*restart.*traefik" | grep -v grep || echo "No docker compose traefik commands running"
register: docker_traefik_commands
changed_when: false
- name: Check Docker events for Traefik kill events (last hour)
ansible.builtin.shell: |
docker events --since 1h --until now --filter container=traefik --filter event=die --format "{{ '{{' }}.Time{{ '}}' }} {{ '{{' }}.Action{{ '}}' }} {{ '{{' }}.Actor.Attributes.signal{{ '}}' }}" 2>/dev/null | tail -20 || echo "No Traefik die events in last hour"
register: traefik_kill_events
changed_when: false
failed_when: false
- name: Check journalctl for docker compose traefik commands
ansible.builtin.shell: |
journalctl --since "24 hours ago" --no-pager | grep -iE "docker.*compose.*traefik|docker.*restart.*traefik" | tail -30 || echo "No docker compose traefik commands in journalctl"
register: docker_traefik_journal
changed_when: false
- name: Check for CI/CD scripts that might run Ansible
ansible.builtin.shell: |
find /home/deploy -type f \( -name "*.sh" -o -name "*.yml" -o -name "*.yaml" \) -exec grep -lE "ansible.*playbook.*traefik|docker.*compose.*traefik.*restart" {} \; 2>/dev/null | head -20 || echo "No CI/CD scripts found"
register: cicd_scripts
changed_when: false
- name: Check for Gitea Workflows that run Ansible
ansible.builtin.shell: |
find /home/deploy -type f -path "*/.gitea/workflows/*.yml" -o -path "*/.github/workflows/*.yml" 2>/dev/null | xargs grep -lE "ansible.*playbook.*traefik|docker.*compose.*traefik" 2>/dev/null | head -10 || echo "No Gitea workflows found"
register: gitea_workflows
changed_when: false
- name: Check for monitoring/healthcheck scripts
ansible.builtin.shell: |
find /home/deploy -type f -name "*monitor*" -o -name "*health*" 2>/dev/null | xargs grep -lE "traefik.*restart|docker.*compose.*traefik" 2>/dev/null | head -10 || echo "No monitoring scripts found"
register: monitoring_scripts
changed_when: false
- name: Summary
ansible.builtin.debug:
msg: |
================================================================================
ANSIBLE AUTOMATION SOURCE DIAGNOSIS:
================================================================================
Running Ansible processes:
{{ ansible_processes.stdout }}
ansible-pull processes:
{{ ansible_pull_processes.stdout }}
Systemd timers for Ansible:
{{ ansible_timers.stdout }}
Cronjobs for Ansible:
{{ ansible_cronjobs.stdout }}
System-wide cron entries for Ansible:
{{ ansible_system_cron.stdout }}
Ansible processes in journalctl (last 24h):
{{ ansible_journal.stdout }}
ansible-pull configuration files:
{{ ansible_pull_configs.stdout }}
Running docker compose Traefik commands:
{{ docker_traefik_commands.stdout }}
Traefik kill events (last hour):
{{ traefik_kill_events.stdout }}
docker compose Traefik commands in journalctl:
{{ docker_traefik_journal.stdout }}
CI/CD scripts that restart Traefik:
{{ cicd_scripts.stdout }}
Gitea workflows that restart Traefik:
{{ gitea_workflows.stdout }}
Monitoring scripts that restart Traefik:
{{ monitoring_scripts.stdout }}
================================================================================
ANALYSIS:
================================================================================
{% if 'No Ansible processes found' not in ansible_processes.stdout %}
⚠️ ACTIVE ANSIBLE PROCESSES FOUND:
{{ ansible_processes.stdout }}
→ These processes may be restarting Traefik on a schedule
→ Inspect their command lines to identify the playbook being run
{% endif %}
{% if 'No ansible-pull processes found' not in ansible_pull_processes.stdout %}
❌ ANSIBLE-PULL IS RUNNING:
{{ ansible_pull_processes.stdout }}
→ ansible-pull executes playbooks on a schedule
→ This is most likely the source of the Traefik restarts
{% endif %}
{% if 'No ansible timers found' not in ansible_timers.stdout %}
❌ ANSIBLE TIMER FOUND:
{{ ansible_timers.stdout }}
→ A systemd timer runs Ansible on a schedule
→ Disable it with: systemctl disable <timer-name>
{% endif %}
{% if 'No ansible-pull cronjobs found' not in ansible_cronjobs.stdout %}
❌ ANSIBLE CRONJOB FOUND:
{{ ansible_cronjobs.stdout }}
→ A cronjob runs Ansible on a schedule
→ Remove or comment out the crontab entry
{% endif %}
{% if cicd_scripts.stdout and 'No CI/CD scripts found' not in cicd_scripts.stdout %}
⚠️ CI/CD SCRIPTS FOUND:
{{ cicd_scripts.stdout }}
→ These scripts may be restarting Traefik on a schedule
→ Review these files and remove or comment out Traefik restart commands
{% endif %}
{% if gitea_workflows.stdout and 'No Gitea workflows found' not in gitea_workflows.stdout %}
⚠️ GITEA WORKFLOWS FOUND:
{{ gitea_workflows.stdout }}
→ These workflows may be restarting Traefik on a schedule
→ Review these workflows and remove or comment out Traefik restart steps
{% endif %}
{% if monitoring_scripts.stdout and 'No monitoring scripts found' not in monitoring_scripts.stdout %}
⚠️ MONITORING SCRIPTS FOUND:
{{ monitoring_scripts.stdout }}
→ These scripts may be restarting Traefik on a schedule
→ Review these scripts and remove or comment out Traefik restart commands
{% endif %}
================================================================================
SOLUTION:
================================================================================
{% if 'No Ansible processes found' in ansible_processes.stdout and 'No ansible-pull processes found' in ansible_pull_processes.stdout and 'No ansible timers found' in ansible_timers.stdout and 'No ansible-pull cronjobs found' in ansible_cronjobs.stdout %}
No active Ansible automation found
Possible causes:
1. Ansible processes only run intermittently
2. An external CI/CD pipeline runs Ansible
3. Manual Ansible invocations from outside
Next steps:
1. Watch Docker events live: docker events --filter container=traefik
2. Watch for Ansible processes: watch -n 1 'ps aux | grep ansible'
3. Check whether external CI/CD pipelines run Ansible
{% else %}
IMMEDIATE ACTION:
{% if 'No ansible-pull processes found' not in ansible_pull_processes.stdout %}
1. ❌ Stop ansible-pull:
pkill -f ansible-pull
{% endif %}
{% if 'No ansible timers found' not in ansible_timers.stdout %}
2. ❌ Disable the Ansible timer:
systemctl stop <timer-name>
systemctl disable <timer-name>
{% endif %}
{% if 'No ansible-pull cronjobs found' not in ansible_cronjobs.stdout %}
3. ❌ Remove the Ansible cronjobs:
crontab -u <user> -e
(Comment out or remove the Ansible lines)
{% endif %}
LONG-TERM FIX:
1. Review the scripts/workflows found and remove Traefik restart commands
2. If healthchecks are needed, use longer intervals (e.g. 5 minutes instead of 30 seconds)
3. Restart Traefik only on real failures, not preventively
{% endif %}
================================================================================

View File

@@ -1,328 +0,0 @@
---
# Find Source of Traefik Restarts
# Comprehensive diagnosis to find the source of the recurring Traefik restarts
- name: Find Source of Traefik Restarts
hosts: production
gather_facts: yes
become: yes
vars:
traefik_stack_path: "{{ stacks_base_path }}/traefik"
monitor_duration_seconds: 120 # 2 minutes of monitoring (can be increased)
tasks:
- name: Check Traefik container restart count
ansible.builtin.shell: |
docker inspect traefik --format '{{ '{{' }}.RestartCount{{ '}}' }}' 2>/dev/null || echo "0"
register: traefik_restart_count
changed_when: false
- name: Check Traefik container start time
ansible.builtin.shell: |
docker inspect traefik --format '{{ '{{' }}.State.StartedAt{{ '}}' }}' 2>/dev/null || echo "UNKNOWN"
register: traefik_started_at
changed_when: false
- name: Analyze Traefik logs for "Stopping server gracefully" messages
ansible.builtin.shell: |
cd {{ traefik_stack_path }}
docker compose logs traefik 2>&1 | grep -i "stopping server gracefully\|I have to go" | tail -20
register: traefik_stop_messages
changed_when: false
failed_when: false
- name: Extract timestamps from stop messages
ansible.builtin.shell: |
cd {{ traefik_stack_path }}
docker compose logs traefik 2>&1 | grep -i "stopping server gracefully\|I have to go" | tail -20 | grep -oE '[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}' | sort | uniq
register: stop_timestamps
changed_when: false
failed_when: false
- name: Check Docker daemon logs for Traefik stop events
ansible.builtin.shell: |
journalctl -u docker.service --since "24 hours ago" --no-pager | grep -iE "traefik.*stop|traefik.*kill|traefik.*die|container.*traefik.*stopped" | tail -30 || echo "No Traefik stop events in Docker daemon logs"
register: docker_daemon_logs
changed_when: false
failed_when: false
- name: Check Docker events for Traefik (last 24 hours)
ansible.builtin.shell: |
docker events --since 24h --until now --filter container=traefik --filter event=die --format "{{ '{{' }}.Time{{ '}}' }} {{ '{{' }}.Action{{ '}}' }} {{ '{{' }}.Actor.Attributes.name{{ '}}' }}" 2>/dev/null | tail -20 || echo "No Traefik die events found"
register: docker_events_traefik
changed_when: false
failed_when: false
- name: Check all user crontabs for Traefik/Docker commands
ansible.builtin.shell: |
for user in $(cut -f1 -d: /etc/passwd); do
crontab -u "$user" -l 2>/dev/null | grep -qE "traefik|docker.*compose.*traefik|docker.*stop.*traefik|docker.*restart.*traefik|docker.*down.*traefik" && echo "=== User: $user ===" && crontab -u "$user" -l 2>/dev/null | grep -E "traefik|docker.*compose.*traefik|docker.*stop.*traefik|docker.*restart.*traefik|docker.*down.*traefik" || true
done || echo "No user crontabs with Traefik commands found"
register: all_user_crontabs
changed_when: false
- name: Check system-wide cron directories
ansible.builtin.shell: |
for dir in /etc/cron.d /etc/cron.daily /etc/cron.hourly /etc/cron.weekly /etc/cron.monthly; do
if [ -d "$dir" ]; then
echo "=== $dir ==="
grep -rE "traefik|docker.*compose.*traefik|docker.*stop.*traefik|docker.*restart.*traefik|docker.*down.*traefik" "$dir" 2>/dev/null || echo "No matches"
fi
done
register: system_cron_dirs
changed_when: false
- name: Check systemd timers and services
ansible.builtin.shell: |
echo "=== Active Timers ==="
systemctl list-timers --all --no-pager | grep -E "traefik|docker.*compose" || echo "No Traefik-related timers"
echo ""
echo "=== Custom Services ==="
systemctl list-units --type=service --all | grep -E "traefik|docker.*compose" || echo "No Traefik-related services"
register: systemd_services
changed_when: false
- name: Check for scripts in deployment directory that restart Traefik
ansible.builtin.shell: |
find /home/deploy/deployment -type f \( -name "*.sh" -o -name "*.yml" -o -name "*.yaml" \) -exec grep -lE "traefik.*restart|docker.*compose.*traefik.*restart|docker.*compose.*traefik.*down|docker.*compose.*traefik.*stop" {} \; 2>/dev/null | head -30
register: deployment_scripts
changed_when: false
- name: Check Ansible roles for traefik_auto_restart or restart tasks
ansible.builtin.shell: |
grep -rE "traefik_auto_restart|traefik.*restart|docker.*compose.*traefik.*restart" /home/deploy/deployment/ansible/roles/ 2>/dev/null | grep -v ".git" | head -20 || echo "No auto-restart settings found"
register: ansible_auto_restart
changed_when: false
- name: Check Docker Compose watch mode
ansible.builtin.shell: |
cd {{ traefik_stack_path }}
docker compose ps traefik 2>/dev/null | grep -q "traefik" && echo "running" || echo "not_running"
register: docker_compose_watch
changed_when: false
failed_when: false
- name: Check if Docker Compose is running in watch mode
ansible.builtin.shell: |
ps aux | grep -E "docker.*compose.*watch|docker.*compose.*--watch" | grep -v grep || echo "No Docker Compose watch mode detected"
register: watch_mode_process
changed_when: false
- name: Check for monitoring/watchdog scripts
ansible.builtin.shell: |
find /home/deploy -type f -name "*monitor*" -o -name "*watchdog*" -o -name "*health*" 2>/dev/null | xargs grep -lE "traefik|docker.*compose.*traefik" 2>/dev/null | head -10 || echo "No monitoring scripts found"
register: monitoring_scripts
changed_when: false
- name: Check Gitea Workflows for Traefik restarts
ansible.builtin.shell: |
find /home/deploy -type f -path "*/.gitea/workflows/*.yml" -o -path "*/.github/workflows/*.yml" 2>/dev/null | xargs grep -lE "traefik.*restart|docker.*compose.*traefik.*restart" 2>/dev/null | head -10 || echo "No Gitea workflows found that restart Traefik"
register: gitea_workflows
changed_when: false
- name: Monitor Docker events in real-time ({{ monitor_duration_seconds }} seconds)
ansible.builtin.shell: |
timeout {{ monitor_duration_seconds }} docker events --filter container=traefik --format "{{ '{{' }}.Time{{ '}}' }} {{ '{{' }}.Action{{ '}}' }} {{ '{{' }}.Actor.Attributes.name{{ '}}' }}" 2>&1 || echo "Monitoring completed or timeout"
register: docker_events_realtime
changed_when: false
failed_when: false
async: "{{ monitor_duration_seconds + 10 }}"
poll: 0
- name: Wait for monitoring to complete
ansible.builtin.async_status:
jid: "{{ docker_events_realtime.ansible_job_id }}"
register: monitoring_result
until: monitoring_result.finished
retries: "{{ (monitor_duration_seconds / 10) | int + 5 }}"
delay: 10
failed_when: false
- name: Check system reboot history
ansible.builtin.shell: |
last reboot --since -1days 2>/dev/null | head -10 || echo "No reboots in last 24 hours"
register: reboot_history
changed_when: false
failed_when: false
- name: Check for at jobs
ansible.builtin.shell: |
atq 2>/dev/null | while read line; do
job_id=$(echo "$line" | awk '{print $1}')
at -c "$job_id" 2>/dev/null | grep -qE "traefik|docker.*compose.*traefik" && echo "=== Job ID: $job_id ===" && at -c "$job_id" 2>/dev/null | grep -E "traefik|docker.*compose.*traefik" || true
done || echo "No at jobs found or atq not available"
register: at_jobs
changed_when: false
- name: Check Docker daemon configuration for auto-restart
ansible.builtin.shell: |
cat /etc/docker/daemon.json 2>/dev/null | grep -iE "restart|live-restore" || echo "No restart settings in daemon.json"
register: docker_daemon_config
changed_when: false
failed_when: false
- name: Check if Traefik has restart policy
ansible.builtin.shell: |
cd {{ traefik_stack_path }}
docker compose config | grep -A 5 "traefik:" | grep -E "restart|restart_policy" || echo "No explicit restart policy found"
register: traefik_restart_policy
changed_when: false
failed_when: false
- name: Summary
ansible.builtin.debug:
msg: |
================================================================================
TRAEFIK RESTART SOURCE DIAGNOSIS - SUMMARY:
================================================================================
Traefik status:
- Restart Count: {{ traefik_restart_count.stdout }}
- Started At: {{ traefik_started_at.stdout }}
- Stop messages found: {{ traefik_stop_messages.stdout_lines | length }} (last 20)
Stop timestamps (last 20):
{% if stop_timestamps.stdout %}
{{ stop_timestamps.stdout }}
{% else %}
No stop timestamps found
{% endif %}
Docker events (last 24h):
{% if docker_events_traefik.stdout and 'No Traefik die events' not in docker_events_traefik.stdout %}
{{ docker_events_traefik.stdout }}
{% else %}
No Traefik die events in the last 24 hours
{% endif %}
Docker daemon logs:
{% if docker_daemon_logs.stdout and 'No Traefik stop events' not in docker_daemon_logs.stdout %}
{{ docker_daemon_logs.stdout }}
{% else %}
No Traefik stop events in the Docker daemon logs
{% endif %}
Sources found:
{% if all_user_crontabs.stdout and 'No user crontabs' not in all_user_crontabs.stdout %}
1. ❌ CRONJOBS (User):
{{ all_user_crontabs.stdout }}
{% endif %}
{% if system_cron_dirs.stdout and 'No matches' not in system_cron_dirs.stdout %}
2. ❌ SYSTEM CRON:
{{ system_cron_dirs.stdout }}
{% endif %}
{% if systemd_services.stdout and 'No Traefik-related' not in systemd_services.stdout %}
3. ❌ SYSTEMD TIMERS/SERVICES:
{{ systemd_services.stdout }}
{% endif %}
{% if deployment_scripts.stdout and 'No' not in deployment_scripts.stdout %}
4. ⚠️ DEPLOYMENT SCRIPTS:
{{ deployment_scripts.stdout }}
{% endif %}
{% if ansible_auto_restart.stdout and 'No auto-restart' not in ansible_auto_restart.stdout %}
5. ⚠️ ANSIBLE AUTO-RESTART:
{{ ansible_auto_restart.stdout }}
{% endif %}
{% if gitea_workflows.stdout and 'No Gitea workflows' not in gitea_workflows.stdout %}
6. ⚠️ GITEA WORKFLOWS:
{{ gitea_workflows.stdout }}
{% endif %}
{% if monitoring_scripts.stdout and 'No monitoring scripts' not in monitoring_scripts.stdout %}
7. ⚠️ MONITORING SCRIPTS:
{{ monitoring_scripts.stdout }}
{% endif %}
{% if at_jobs.stdout and 'No at jobs' not in at_jobs.stdout %}
8. ❌ AT JOBS:
{{ at_jobs.stdout }}
{% endif %}
{% if docker_compose_watch.stdout and 'Could not check' not in docker_compose_watch.stdout %}
9. ⚠️ DOCKER COMPOSE WATCH:
{{ docker_compose_watch.stdout }}
{% endif %}
{% if watch_mode_process.stdout and 'No Docker Compose watch' not in watch_mode_process.stdout %}
10. ❌ DOCKER COMPOSE WATCH MODE (PROCESS):
{{ watch_mode_process.stdout }}
{% endif %}
{% if reboot_history.stdout and 'No reboots' not in reboot_history.stdout %}
11. ⚠️ SYSTEM REBOOTS:
{{ reboot_history.stdout }}
{% endif %}
Real-time monitoring ({{ monitor_duration_seconds }} seconds):
{% if monitoring_result.finished and monitoring_result.ansible_job_id %}
{{ monitoring_result.stdout | default('No events during monitoring') }}
{% else %}
Monitoring is still running or was interrupted
{% endif %}
================================================================================
NEXT STEPS:
================================================================================
{% if all_user_crontabs.stdout and 'No user crontabs' not in all_user_crontabs.stdout %}
1. ❌ DISABLE CRONJOBS:
- Review the cronjobs found: {{ all_user_crontabs.stdout }}
- Remove or comment out the relevant entries
{% endif %}
{% if system_cron_dirs.stdout and 'No matches' not in system_cron_dirs.stdout %}
2. ❌ DISABLE SYSTEM CRON:
- Review the system cronjobs found: {{ system_cron_dirs.stdout }}
- Remove or rename the files
{% endif %}
{% if systemd_services.stdout and 'No Traefik-related' not in systemd_services.stdout %}
3. ❌ DISABLE SYSTEMD TIMERS/SERVICES:
- Review the services/timers found: {{ systemd_services.stdout }}
- Disable with: systemctl disable <service>
{% endif %}
{% if deployment_scripts.stdout and 'No' not in deployment_scripts.stdout %}
4. ⚠️ REVIEW DEPLOYMENT SCRIPTS:
- Review the scripts found: {{ deployment_scripts.stdout }}
- Remove or comment out Traefik restart commands
{% endif %}
{% if ansible_auto_restart.stdout and 'No auto-restart' not in ansible_auto_restart.stdout %}
5. ⚠️ REVIEW ANSIBLE AUTO-RESTART:
- Review the settings found: {{ ansible_auto_restart.stdout }}
- Set traefik_auto_restart: false in group_vars
{% endif %}
{% if not all_user_crontabs.stdout or 'No user crontabs' in all_user_crontabs.stdout %}
{% if not system_cron_dirs.stdout or 'No matches' in system_cron_dirs.stdout %}
{% if not systemd_services.stdout or 'No Traefik-related' in systemd_services.stdout %}
{% if not deployment_scripts.stdout or 'No' in deployment_scripts.stdout %}
{% if not ansible_auto_restart.stdout or 'No auto-restart' in ansible_auto_restart.stdout %}
⚠️ NO AUTOMATIC RESTART MECHANISMS FOUND!
Possible causes:
1. External process (not via cron/systemd)
2. Docker service restarts (systemctl restart docker)
3. Host reboots
4. Manual restarts (from outside)
5. Monitoring service (Portainer, Watchtower, etc.)
Next steps:
1. Run 'docker events --filter container=traefik' manually and observe
2. Check journalctl -u docker.service for Docker service restarts
3. Check whether Portainer or other monitoring tools are running
4. Check whether Watchtower or other auto-update tools are installed
{% endif %}
{% endif %}
{% endif %}
{% endif %}
{% endif %}
================================================================================

View File

@@ -1,175 +0,0 @@
---
# Fix Gitea Complete - Disables the runner and repairs service discovery
# Fixes Gitea timeouts by: 1) disabling the runner, 2) repairing service discovery
- name: Fix Gitea Complete
hosts: production
gather_facts: yes
become: no
vars:
gitea_stack_path: "{{ stacks_base_path }}/gitea"
traefik_stack_path: "{{ stacks_base_path }}/traefik"
gitea_runner_path: "{{ stacks_base_path }}/../gitea-runner"
gitea_url: "https://{{ gitea_domain }}"
tasks:
- name: Check Gitea Runner status
ansible.builtin.shell: |
cd {{ gitea_runner_path }}
docker compose ps gitea-runner 2>/dev/null || echo "Runner not found"
register: runner_status
changed_when: false
failed_when: false
- name: Display Gitea Runner status
ansible.builtin.debug:
msg: |
================================================================================
Gitea Runner Status (Before):
================================================================================
{{ runner_status.stdout }}
================================================================================
- name: Stop Gitea Runner to reduce load
ansible.builtin.shell: |
cd {{ gitea_runner_path }}
docker compose stop gitea-runner
register: runner_stop
changed_when: runner_stop.rc == 0
failed_when: false
when: runner_status.rc == 0
- name: Check Gitea container status before restart
ansible.builtin.shell: |
cd {{ gitea_stack_path }}
docker compose ps gitea
register: gitea_status_before
changed_when: false
- name: Check Traefik container status before restart
ansible.builtin.shell: |
cd {{ traefik_stack_path }}
docker compose ps traefik
register: traefik_status_before
changed_when: false
- name: Restart Gitea container
ansible.builtin.shell: |
cd {{ gitea_stack_path }}
docker compose restart gitea
register: gitea_restart
changed_when: gitea_restart.rc == 0
- name: Wait for Gitea to be ready (direct check)
ansible.builtin.shell: |
cd {{ gitea_stack_path }}
for i in {1..30}; do
if docker compose exec -T gitea curl -f http://localhost:3000/api/healthz >/dev/null 2>&1; then
echo "Gitea is ready"
exit 0
fi
sleep 2
done
echo "Gitea not ready after 60 seconds"
exit 1
register: gitea_ready
changed_when: false
failed_when: false
- name: Restart Traefik to refresh service discovery
ansible.builtin.shell: |
cd {{ traefik_stack_path }}
docker compose restart traefik
register: traefik_restart
changed_when: traefik_restart.rc == 0
when: traefik_auto_restart | default(false) | bool
- name: Wait for Traefik to be ready
ansible.builtin.wait_for:
timeout: 30
delay: 2
changed_when: false
when: traefik_restart.changed | default(false) | bool
- name: Wait for Gitea to be reachable via Traefik (with retries)
ansible.builtin.uri:
url: "{{ gitea_url }}/api/healthz"
method: GET
status_code: [200]
validate_certs: false
timeout: 10
register: gitea_health_via_traefik
until: gitea_health_via_traefik.status == 200
retries: 15
delay: 2
changed_when: false
failed_when: false
when: (traefik_restart.changed | default(false) | bool) or (gitea_restart.changed | default(false) | bool)
- name: Check if Gitea is in Traefik service discovery
ansible.builtin.shell: |
cd {{ traefik_stack_path }}
docker compose exec -T traefik wget -qO- http://localhost:8080/api/http/services 2>/dev/null | grep -i "gitea" || echo "NOT_FOUND"
register: traefik_gitea_service_check
changed_when: false
failed_when: false
when: (traefik_restart.changed | default(false) | bool) or (gitea_restart.changed | default(false) | bool)
- name: Final status check
ansible.builtin.uri:
url: "{{ gitea_url }}/api/healthz"
method: GET
status_code: [200]
validate_certs: false
timeout: 10
register: final_status
changed_when: false
failed_when: false
- name: Summary
ansible.builtin.debug:
msg: |
================================================================================
SUMMARY - Gitea Complete Fix:
================================================================================
Actions:
- Gitea runner: {% if runner_stop.changed | default(false) %}✅ Stopped{% else %} Was not active or not found{% endif %}
- Gitea restart: {% if gitea_restart.changed %}✅ Done{% else %} Not needed{% endif %}
- Traefik restart: {% if traefik_restart.changed %}✅ Done{% else %} Not needed{% endif %}
Gitea ready check:
- Direct: {% if gitea_ready.rc == 0 %}✅ Ready{% else %}❌ Not ready{% endif %}
Final status:
- Gitea via Traefik: {% if final_status.status == 200 %}✅ Reachable (status: 200){% else %}❌ Not reachable (status: {{ final_status.status | default('TIMEOUT') }}){% endif %}
- Traefik service discovery: {% if 'NOT_FOUND' not in traefik_gitea_service_check.stdout %}✅ Gitea found{% else %}❌ Gitea not found{% endif %}
{% if final_status.status == 200 and 'NOT_FOUND' not in traefik_gitea_service_check.stdout %}
✅ SUCCESS: Gitea is reachable via Traefik again!
URL: {{ gitea_url }}
Next steps:
1. Test Gitea in the browser: {{ gitea_url }}
2. Once everything runs stably, re-enable the runner:
cd {{ gitea_runner_path }} && docker compose up -d gitea-runner
3. Watch whether the runner overloads Gitea again
{% else %}
⚠️ PROBLEM: Gitea is not fully reachable yet
Possible causes:
{% if final_status.status != 200 %}
- Gitea does not respond via Traefik (status: {{ final_status.status | default('TIMEOUT') }})
{% endif %}
{% if 'NOT_FOUND' in traefik_gitea_service_check.stdout %}
- Traefik service discovery has not picked up Gitea yet
{% endif %}
Next steps:
1. Wait 1-2 minutes and test again: curl -k {{ gitea_url }}/api/healthz
2. Check the Traefik logs: cd {{ traefik_stack_path }} && docker compose logs traefik --tail=50
3. Check the Gitea logs: cd {{ gitea_stack_path }} && docker compose logs gitea --tail=50
4. Check service discovery: cd {{ traefik_stack_path }} && docker compose exec -T traefik wget -qO- http://localhost:8080/api/http/services
{% endif %}
================================================================================

View File

@@ -1,195 +0,0 @@
---
# Fix Gitea SSL and Routing Issues
# Checks the SSL certificate and service discovery, and fixes routing problems
- name: Fix Gitea SSL and Routing
hosts: production
gather_facts: yes
become: no
vars:
gitea_stack_path: "{{ stacks_base_path }}/gitea"
traefik_stack_path: "{{ stacks_base_path }}/traefik"
gitea_url: "https://{{ gitea_domain }}"
gitea_url_http: "http://{{ gitea_domain }}"
tasks:
- name: Check Gitea container status
ansible.builtin.shell: |
cd {{ gitea_stack_path }}
docker compose ps gitea
register: gitea_status
changed_when: false
- name: Check Traefik container status
ansible.builtin.shell: |
cd {{ traefik_stack_path }}
docker compose ps traefik
register: traefik_status
changed_when: false
- name: Check if Gitea is in traefik-public network
ansible.builtin.shell: |
docker network inspect traefik-public --format '{{ '{{' }}range .Containers{{ '}}' }}{{ '{{' }}.Name{{ '}}' }} {{ '{{' }}end{{ '}}' }}' 2>/dev/null | grep -q gitea && echo "YES" || echo "NO"
register: gitea_in_network
changed_when: false
- name: Test direct connection from Traefik to Gitea (by service name)
ansible.builtin.shell: |
cd {{ traefik_stack_path }}
docker compose exec -T traefik wget -qO- --timeout=5 http://gitea:3000/api/healthz 2>&1 || echo "CONNECTION_FAILED"
register: traefik_gitea_direct
changed_when: false
failed_when: false
- name: Check Traefik logs for SSL/ACME errors
ansible.builtin.shell: |
cd {{ traefik_stack_path }}
docker compose logs traefik --tail=100 2>&1 | grep -iE "acme|certificate|git\.michaelschiemer\.de|ssl|tls" | tail -20 || echo "No SSL/ACME errors found"
register: traefik_ssl_errors
changed_when: false
failed_when: false
- name: Check if SSL certificate exists for git.michaelschiemer.de
ansible.builtin.shell: |
cd {{ traefik_stack_path }}
docker compose exec -T traefik cat /acme.json 2>/dev/null | grep -q "git.michaelschiemer.de" && echo "YES" || echo "NO"
register: ssl_cert_exists
changed_when: false
failed_when: false
- name: Test Gitea via HTTP (port 80, should redirect or show error)
ansible.builtin.uri:
url: "{{ gitea_url_http }}/api/healthz"
method: GET
status_code: [200, 301, 302, 404, 502, 503, 504]
validate_certs: false
timeout: 10
register: gitea_http_test
changed_when: false
failed_when: false
- name: Test Gitea via HTTPS
ansible.builtin.uri:
url: "{{ gitea_url }}/api/healthz"
method: GET
status_code: [200, 301, 302, 404, 502, 503, 504]
validate_certs: false
timeout: 10
register: gitea_https_test
changed_when: false
failed_when: false
- name: Display diagnostic information
ansible.builtin.debug:
msg: |
================================================================================
GITEA SSL/ROUTING DIAGNOSIS:
================================================================================
Container Status:
- Gitea: {{ gitea_status.stdout | regex_replace('.*(Up|Down|Restarting).*', '\\1') | default('UNKNOWN') }}
- Traefik: {{ traefik_status.stdout | regex_replace('.*(Up|Down|Restarting).*', '\\1') | default('UNKNOWN') }}
Network:
- Gitea in traefik-public: {% if gitea_in_network.stdout == 'YES' %}✅{% else %}❌{% endif %}
- Traefik → Gitea (direct): {% if 'CONNECTION_FAILED' not in traefik_gitea_direct.stdout %}✅{% else %}❌{% endif %}
SSL/Certificate:
- Certificate in acme.json: {% if ssl_cert_exists.stdout == 'YES' %}✅{% else %}❌{% endif %}
Connectivity:
- HTTP (port 80): Status {{ gitea_http_test.status | default('TIMEOUT') }}
- HTTPS (port 443): Status {{ gitea_https_test.status | default('TIMEOUT') }}
Traefik SSL/ACME Errors:
{{ traefik_ssl_errors.stdout }}
================================================================================
- name: Restart Gitea if not in network or connection failed
ansible.builtin.shell: |
cd {{ gitea_stack_path }}
docker compose restart gitea
register: gitea_restart
changed_when: gitea_restart.rc == 0
when: gitea_in_network.stdout != 'YES' or 'CONNECTION_FAILED' in traefik_gitea_direct.stdout
- name: Wait for Gitea to be ready after restart
ansible.builtin.pause:
seconds: 30
when: gitea_restart.changed | default(false)
- name: Restart Traefik to refresh service discovery and SSL
ansible.builtin.shell: |
cd {{ traefik_stack_path }}
docker compose restart traefik
register: traefik_restart
changed_when: traefik_restart.rc == 0
when: >
(traefik_auto_restart | default(false) | bool)
and (gitea_restart.changed | default(false) or gitea_https_test.status | default(0) != 200)
- name: Wait for Traefik to be ready after restart
ansible.builtin.pause:
seconds: 15
when: traefik_restart.changed | default(false)
- name: Wait for Gitea to be reachable via HTTPS (with retries)
ansible.builtin.uri:
url: "{{ gitea_url }}/api/healthz"
method: GET
status_code: [200]
validate_certs: false
timeout: 10
register: final_gitea_test
until: final_gitea_test.status == 200
retries: 20
delay: 3
changed_when: false
failed_when: false
when: traefik_restart.changed | default(false) or gitea_restart.changed | default(false)
- name: Final status check
ansible.builtin.uri:
url: "{{ gitea_url }}/api/healthz"
method: GET
status_code: [200]
validate_certs: false
timeout: 10
register: final_status
changed_when: false
failed_when: false
- name: Summary
ansible.builtin.debug:
msg: |
================================================================================
SUMMARY - Gitea SSL/Routing Fix:
================================================================================
Actions:
- Gitea restart: {% if gitea_restart.changed | default(false) %}✅ Done{% else %} Not needed{% endif %}
- Traefik restart: {% if traefik_restart.changed | default(false) %}✅ Done{% else %} Not needed{% endif %}
Final status:
- Gitea via HTTPS: {% if final_status.status == 200 %}✅ Reachable{% else %}❌ Not reachable (status: {{ final_status.status | default('TIMEOUT') }}){% endif %}
{% if final_status.status == 200 %}
✅ Gitea is reachable via Traefik again!
URL: {{ gitea_url }}
{% else %}
⚠️ Gitea is still not reachable
Possible causes:
1. The SSL certificate is still being issued (ACME challenge in progress)
2. Traefik service discovery needs more time
3. A network problem between Traefik and Gitea
Next steps:
1. Wait 2-5 minutes and test again: curl -k {{ gitea_url }}/api/healthz
2. Check the Traefik logs: cd {{ traefik_stack_path }} && docker compose logs traefik --tail=50
3. Check the Gitea logs: cd {{ gitea_stack_path }} && docker compose logs gitea --tail=50
4. Check the network: docker network inspect traefik-public | grep -A 5 gitea
{% endif %}
================================================================================

View File

@@ -1,159 +0,0 @@
---
# Fix Gitea Timeouts
# Restarts Gitea and Traefik to fix timeout problems
- name: Fix Gitea Timeouts
hosts: production
gather_facts: yes
become: no
tasks:
- name: Check Gitea container status before restart
ansible.builtin.shell: |
cd /home/deploy/deployment/stacks/gitea
docker compose ps gitea
register: gitea_status_before
changed_when: false
- name: Display Gitea status before restart
ansible.builtin.debug:
msg: |
================================================================================
Gitea Status (Before Restart):
================================================================================
{{ gitea_status_before.stdout }}
================================================================================
- name: Check Traefik container status before restart
ansible.builtin.shell: |
cd /home/deploy/deployment/stacks/traefik
docker compose ps traefik
register: traefik_status_before
changed_when: false
- name: Display Traefik status before restart
ansible.builtin.debug:
msg: |
================================================================================
Traefik Status (Before Restart):
================================================================================
{{ traefik_status_before.stdout }}
================================================================================
- name: Restart Gitea container
ansible.builtin.shell: |
cd /home/deploy/deployment/stacks/gitea
docker compose restart gitea
register: gitea_restart
changed_when: gitea_restart.rc == 0
- name: Wait for Gitea to be ready
ansible.builtin.uri:
url: "https://git.michaelschiemer.de/api/healthz"
method: GET
status_code: [200]
validate_certs: false
timeout: 10
register: gitea_health_after_restart
until: gitea_health_after_restart.status == 200
retries: 30
delay: 2
changed_when: false
failed_when: false
- name: Display Gitea health after restart
ansible.builtin.debug:
msg: |
================================================================================
Gitea Health After Restart:
================================================================================
{% if gitea_health_after_restart.status == 200 %}
✅ Gitea is healthy after restart
{% else %}
⚠️ Gitea health check failed after restart (Status: {{ gitea_health_after_restart.status | default('TIMEOUT') }})
{% endif %}
================================================================================
- name: Restart Traefik to refresh service discovery
ansible.builtin.shell: |
cd /home/deploy/deployment/stacks/traefik
docker compose restart traefik
register: traefik_restart
changed_when: traefik_restart.rc == 0
when: traefik_auto_restart | default(false) | bool
- name: Wait for Traefik to be ready
ansible.builtin.wait_for:
timeout: 30
delay: 2
changed_when: false
when: traefik_restart.changed | default(false) | bool
- name: Wait for Gitea to be reachable via Traefik
ansible.builtin.uri:
url: "https://git.michaelschiemer.de/api/healthz"
method: GET
status_code: [200]
validate_certs: false
timeout: 10
register: gitea_health_via_traefik
until: gitea_health_via_traefik.status == 200
retries: 30
delay: 2
changed_when: false
failed_when: false
when: (traefik_restart.changed | default(false) | bool) or (gitea_restart.changed | default(false) | bool)
- name: Check final Gitea container status
ansible.builtin.shell: |
cd /home/deploy/deployment/stacks/gitea
docker compose ps gitea
register: gitea_status_after
changed_when: false
- name: Check final Traefik container status
ansible.builtin.shell: |
cd /home/deploy/deployment/stacks/traefik
docker compose ps traefik
register: traefik_status_after
changed_when: false
- name: Test Gitea access via Traefik
ansible.builtin.uri:
url: "https://git.michaelschiemer.de/api/healthz"
method: GET
status_code: [200]
validate_certs: false
timeout: 10
register: final_gitea_test
changed_when: false
failed_when: false
- name: Summary
ansible.builtin.debug:
msg: |
================================================================================
SUMMARY - Gitea Timeout Fix:
================================================================================
Gitea restart: {% if gitea_restart.changed %}✅ Done{% else %} Not needed{% endif %}
Traefik restart: {% if traefik_restart.changed %}✅ Done{% else %} Not needed{% endif %}
Final status:
- Gitea: {{ gitea_status_after.stdout | regex_replace('.*(Up|Down|Restarting).*', '\\1') | default('UNKNOWN') }}
- Traefik: {{ traefik_status_after.stdout | regex_replace('.*(Up|Down|Restarting).*', '\\1') | default('UNKNOWN') }}
- Gitea via Traefik: {% if final_gitea_test.status == 200 %}✅ Reachable{% else %}❌ Not reachable (status: {{ final_gitea_test.status | default('TIMEOUT') }}){% endif %}
{% if final_gitea_test.status == 200 %}
✅ Gitea is reachable via Traefik again!
URL: https://git.michaelschiemer.de
{% else %}
⚠️ Gitea is still not reachable via Traefik
Next steps:
1. Check the Gitea logs: cd /home/deploy/deployment/stacks/gitea && docker compose logs gitea --tail=50
2. Check the Traefik logs: cd /home/deploy/deployment/stacks/traefik && docker compose logs traefik --tail=50
3. Check the network: docker network inspect traefik-public | grep -A 5 gitea
4. Run diagnose-gitea-timeouts.yml for a detailed diagnosis
{% endif %}
================================================================================

View File

@@ -1,94 +0,0 @@
---
# Ansible Playbook: Fix Gitea-Traefik Connection Issues
# Purpose: Ensure Traefik can reliably reach Gitea by restarting both services
# Usage:
# ansible-playbook -i inventory/production.yml playbooks/fix-gitea-traefik-connection.yml \
# --vault-password-file secrets/.vault_pass
- name: Fix Gitea-Traefik Connection
hosts: production
vars:
gitea_stack_path: "{{ stacks_base_path }}/gitea"
traefik_stack_path: "{{ stacks_base_path }}/traefik"
gitea_url: "https://{{ gitea_domain }}"
tasks:
- name: Get current Gitea container IP
shell: |
docker inspect gitea | grep -A 10 'traefik-public' | grep IPAddress | head -1 | awk '{print $2}' | tr -d '",'
register: gitea_ip
changed_when: false
failed_when: false
- name: Display Gitea IP
debug:
msg: "Gitea container IP in traefik-public network: {{ gitea_ip.stdout }}"
- name: Test direct connection to Gitea from Traefik container
shell: |
docker compose -f {{ traefik_stack_path }}/docker-compose.yml exec -T traefik wget -qO- http://{{ gitea_ip.stdout }}:3000/api/healthz 2>&1 | head -3
register: traefik_gitea_test
changed_when: false
failed_when: false
- name: Display Traefik-Gitea connection test result
debug:
msg: "{{ traefik_gitea_test.stdout }}"
- name: Restart Gitea container to refresh IP
shell: |
docker compose -f {{ gitea_stack_path }}/docker-compose.yml restart gitea
when: traefik_gitea_test.rc != 0
- name: Wait for Gitea to be ready
uri:
url: "{{ gitea_url }}/api/healthz"
method: GET
status_code: [200]
validate_certs: false
timeout: 10
register: gitea_health
until: gitea_health.status == 200
retries: 30
delay: 2
changed_when: false
when: traefik_gitea_test.rc != 0
- name: Restart Traefik to refresh service discovery
shell: |
docker compose -f {{ traefik_stack_path }}/docker-compose.yml restart traefik
when: >
traefik_gitea_test.rc != 0
and (traefik_auto_restart | default(false) | bool)
register: traefik_restart
changed_when: traefik_restart.rc == 0
- name: Wait for Traefik to be ready
pause:
seconds: 10
when: traefik_restart.changed | default(false) | bool
- name: Test Gitea via Traefik
uri:
url: "{{ gitea_url }}/api/healthz"
method: GET
status_code: [200]
validate_certs: false
timeout: 10
register: final_test
changed_when: false
when: traefik_restart.changed | default(false) | bool
- name: Display result
debug:
msg: |
Gitea-Traefik connection test:
- Direct connection: {{ 'OK' if traefik_gitea_test.rc == 0 else 'FAILED' }}
- Via Traefik: {{ 'OK' if (final_test.status | default(0) == 200) else 'FAILED' if (traefik_restart.changed | default(false) | bool) else 'SKIPPED (no restart)' }}
{% if traefik_restart.changed | default(false) | bool %}
Traefik has been restarted to refresh service discovery.
{% elif traefik_gitea_test.rc != 0 %}
Note: Traefik restart was skipped (traefik_auto_restart=false). Direct connection test failed.
{% endif %}

View File

@@ -0,0 +1,198 @@
---
# Backup Before Redeploy
# Creates comprehensive backup of Gitea data, SSL certificates, and configurations
# before redeploying Traefik and Gitea stacks
- name: Backup Before Redeploy
hosts: production
gather_facts: yes
become: no
vars:
gitea_stack_path: "{{ stacks_base_path }}/gitea"
traefik_stack_path: "{{ stacks_base_path }}/traefik"
backup_base_path: "{{ backups_path | default('/home/deploy/backups') }}"
backup_name: "redeploy-backup-{{ ansible_date_time.epoch }}"
tasks:
- name: Display backup plan
ansible.builtin.debug:
msg: |
================================================================================
BACKUP BEFORE REDEPLOY
================================================================================
This playbook will backup:
1. Gitea data (volumes)
2. SSL certificates (acme.json)
3. Gitea configuration (app.ini)
4. Traefik configuration
5. PostgreSQL data (if applicable)
Backup location: {{ backup_base_path }}/{{ backup_name }}
================================================================================
- name: Ensure backup directory exists
ansible.builtin.file:
path: "{{ backup_base_path }}/{{ backup_name }}"
state: directory
mode: '0755'
become: yes
- name: Create backup timestamp file
ansible.builtin.copy:
content: |
Backup created: {{ ansible_date_time.iso8601 }}
Backup name: {{ backup_name }}
Purpose: Before Traefik/Gitea redeploy
dest: "{{ backup_base_path }}/{{ backup_name }}/backup-info.txt"
mode: '0644'
become: yes
# ========================================
# Backup Gitea Data
# ========================================
- name: Check Gitea volumes
ansible.builtin.shell: |
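# The quoted-brace escapes below emit literal Go-template braces for docker --format, so Jinja does not consume them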
docker volume ls --filter name=gitea --format "{{ '{{' }}.Name{{ '}}' }}"
register: gitea_volumes
changed_when: false
failed_when: false
- name: Backup Gitea volumes
ansible.builtin.shell: |
for volume in {{ gitea_volumes.stdout_lines | join(' ') }}; do
if [ -n "$volume" ]; then
echo "Backing up volume: $volume"
docker run --rm \
-v "$volume:/source:ro" \
-v "{{ backup_base_path }}/{{ backup_name }}:/backup" \
alpine tar czf "/backup/gitea-volume-${volume}.tar.gz" -C /source .
fi
done
when: gitea_volumes.stdout_lines | length > 0
register: gitea_volumes_backup
changed_when: gitea_volumes_backup.rc == 0
# ========================================
# Backup SSL Certificates
# ========================================
- name: Check if acme.json exists
ansible.builtin.stat:
path: "{{ traefik_stack_path }}/acme.json"
register: acme_json_stat
- name: Backup acme.json
ansible.builtin.copy:
src: "{{ traefik_stack_path }}/acme.json"
dest: "{{ backup_base_path }}/{{ backup_name }}/acme.json"
remote_src: yes
mode: '0600'
when: acme_json_stat.stat.exists
register: acme_backup
changed_when: acme_backup.changed | default(false)
# ========================================
# Backup Gitea Configuration
# ========================================
- name: Backup Gitea app.ini
ansible.builtin.shell: |
cd {{ gitea_stack_path }}
docker compose exec -T gitea cat /data/gitea/conf/app.ini > "{{ backup_base_path }}/{{ backup_name }}/gitea-app.ini" 2>/dev/null || echo "Could not read app.ini"
register: gitea_app_ini_backup
changed_when: false
failed_when: false
- name: Backup Gitea docker-compose.yml
ansible.builtin.copy:
src: "{{ gitea_stack_path }}/docker-compose.yml"
dest: "{{ backup_base_path }}/{{ backup_name }}/gitea-docker-compose.yml"
remote_src: yes
mode: '0644'
register: gitea_compose_backup
changed_when: gitea_compose_backup.changed | default(false)
# ========================================
# Backup Traefik Configuration
# ========================================
- name: Backup Traefik configuration files
ansible.builtin.shell: |
cd {{ traefik_stack_path }}
tar czf "{{ backup_base_path }}/{{ backup_name }}/traefik-config.tar.gz" \
traefik.yml \
docker-compose.yml \
dynamic/ 2>/dev/null || echo "Some files may be missing"
register: traefik_config_backup
changed_when: traefik_config_backup.rc == 0
failed_when: false
# ========================================
# Backup PostgreSQL Data (if applicable)
# ========================================
- name: Check if PostgreSQL stack exists
ansible.builtin.stat:
path: "{{ stacks_base_path }}/postgresql/docker-compose.yml"
register: postgres_compose_exists
- name: Backup PostgreSQL database (if running)
ansible.builtin.shell: |
cd {{ stacks_base_path }}/postgresql
if docker compose ps postgres | grep -q "Up"; then
docker compose exec -T postgres pg_dumpall -U postgres | gzip > "{{ backup_base_path }}/{{ backup_name }}/postgresql-all-{{ ansible_date_time.epoch }}.sql.gz"
echo "PostgreSQL backup created"
else
echo "PostgreSQL not running, skipping backup"
fi
when: postgres_compose_exists.stat.exists
register: postgres_backup
changed_when: false
failed_when: false
# ========================================
# Verify Backup
# ========================================
- name: List backup contents
ansible.builtin.shell: |
ls -lh "{{ backup_base_path }}/{{ backup_name }}/"
register: backup_contents
changed_when: false
- name: Calculate backup size
ansible.builtin.shell: |
du -sh "{{ backup_base_path }}/{{ backup_name }}" | awk '{print $1}'
register: backup_size
changed_when: false
- name: Summary
ansible.builtin.debug:
msg: |
================================================================================
BACKUP SUMMARY
================================================================================
Backup location: {{ backup_base_path }}/{{ backup_name }}
Backup size: {{ backup_size.stdout }}
Backed up:
- Gitea volumes: {% if gitea_volumes_backup.changed %}✅{% else %} No volumes found{% endif %}
- SSL certificates (acme.json): {% if acme_backup.changed | default(false) %}✅{% else %} Not found{% endif %}
- Gitea app.ini: {% if gitea_app_ini_backup.rc == 0 %}✅{% else %}⚠️ Could not read{% endif %}
- Gitea docker-compose.yml: {% if gitea_compose_backup.changed | default(false) %}✅{% else %} Not found{% endif %}
- Traefik configuration: {% if traefik_config_backup.rc == 0 %}✅{% else %}⚠️ Some files may be missing{% endif %}
- PostgreSQL data: {% if postgres_backup.rc == 0 and 'created' in postgres_backup.stdout %}✅{% else %} Not running or not found{% endif %}
Backup contents:
{{ backup_contents.stdout }}
================================================================================
NEXT STEPS
================================================================================
Backup completed successfully. You can now proceed with redeploy:
ansible-playbook -i inventory/production.yml playbooks/setup/redeploy-traefik-gitea-clean.yml \
--vault-password-file secrets/.vault_pass \
-e "backup_name={{ backup_name }}"
================================================================================

View File

@@ -0,0 +1,255 @@
---
# Rollback Redeploy
# Restores Traefik and Gitea from backup created before redeploy
#
# Usage:
# ansible-playbook -i inventory/production.yml playbooks/maintenance/rollback-redeploy.yml \
# --vault-password-file secrets/.vault_pass \
# -e "backup_name=redeploy-backup-1234567890"
- name: Rollback Redeploy
hosts: production
gather_facts: yes
become: no
vars:
traefik_stack_path: "{{ stacks_base_path }}/traefik"
gitea_stack_path: "{{ stacks_base_path }}/gitea"
backup_base_path: "{{ backups_path | default('/home/deploy/backups') }}"
backup_name: "{{ backup_name | default('') }}"
tasks:
- name: Validate backup name
ansible.builtin.fail:
msg: "backup_name is required. Use: -e 'backup_name=redeploy-backup-1234567890'"
when: backup_name == ""
- name: Check if backup directory exists
ansible.builtin.stat:
path: "{{ backup_base_path }}/{{ backup_name }}"
register: backup_dir_stat
- name: Fail if backup not found
ansible.builtin.fail:
msg: "Backup directory not found: {{ backup_base_path }}/{{ backup_name }}"
when: not backup_dir_stat.stat.exists
- name: Display rollback plan
ansible.builtin.debug:
msg: |
================================================================================
ROLLBACK REDEPLOY
================================================================================
This playbook will restore from backup: {{ backup_base_path }}/{{ backup_name }}
Steps:
1. Stop Traefik and Gitea stacks
2. Restore Gitea volumes
3. Restore SSL certificates (acme.json)
4. Restore Gitea configuration (app.ini)
5. Restore Traefik configuration
6. Restore PostgreSQL data (if applicable)
7. Restart stacks
8. Verify
⚠️ WARNING: This will overwrite current state!
================================================================================
# ========================================
# 1. STOP STACKS
# ========================================
- name: Stop Traefik stack
ansible.builtin.shell: |
cd {{ traefik_stack_path }}
docker compose down
register: traefik_stop
changed_when: traefik_stop.rc == 0
failed_when: false
- name: Stop Gitea stack
ansible.builtin.shell: |
cd {{ gitea_stack_path }}
docker compose down
register: gitea_stop
changed_when: gitea_stop.rc == 0
failed_when: false
# ========================================
# 2. RESTORE GITEA VOLUMES
# ========================================
- name: List Gitea volume backups
ansible.builtin.shell: |
ls -1 "{{ backup_base_path }}/{{ backup_name }}/gitea-volume-"*.tar.gz 2>/dev/null || echo ""
register: gitea_volume_backups
changed_when: false
- name: Restore Gitea volumes
ansible.builtin.shell: |
for backup_file in {{ backup_base_path }}/{{ backup_name }}/gitea-volume-*.tar.gz; do
if [ -f "$backup_file" ]; then
volume_name=$(basename "$backup_file" .tar.gz | sed 's/gitea-volume-//')
echo "Restoring volume: $volume_name"
docker volume create "$volume_name" 2>/dev/null || true
docker run --rm \
-v "$volume_name:/target" \
-v "{{ backup_base_path }}/{{ backup_name }}:/backup:ro" \
alpine sh -c "cd /target && tar xzf /backup/$(basename $backup_file)"
fi
done
when: gitea_volume_backups.stdout != ""
register: gitea_volumes_restore
changed_when: gitea_volumes_restore.rc == 0
# ========================================
# 3. RESTORE SSL CERTIFICATES
# ========================================
- name: Restore acme.json
ansible.builtin.copy:
src: "{{ backup_base_path }}/{{ backup_name }}/acme.json"
dest: "{{ traefik_stack_path }}/acme.json"
remote_src: yes
mode: '0600'
register: acme_restore
failed_when: false
# ========================================
# 4. RESTORE CONFIGURATIONS
# ========================================
- name: Restore Gitea docker-compose.yml
ansible.builtin.copy:
src: "{{ backup_base_path }}/{{ backup_name }}/gitea-docker-compose.yml"
dest: "{{ gitea_stack_path }}/docker-compose.yml"
remote_src: yes
mode: '0644'
register: gitea_compose_restore
failed_when: false
- name: Restore Traefik configuration
ansible.builtin.shell: |
cd {{ traefik_stack_path }}
tar xzf "{{ backup_base_path }}/{{ backup_name }}/traefik-config.tar.gz" 2>/dev/null || echo "Some files may be missing"
register: traefik_config_restore
changed_when: traefik_config_restore.rc == 0
failed_when: false
# ========================================
# 5. RESTORE POSTGRESQL DATA
# ========================================
- name: Find PostgreSQL backup
ansible.builtin.shell: |
ls -1 "{{ backup_base_path }}/{{ backup_name }}/postgresql-all-"*.sql.gz 2>/dev/null | head -1 || echo ""
register: postgres_backup_file
changed_when: false
- name: Restore PostgreSQL database
ansible.builtin.shell: |
cd {{ stacks_base_path }}/postgresql
if docker compose ps postgres | grep -q "Up"; then
gunzip -c "{{ postgres_backup_file.stdout }}" | docker compose exec -T postgres psql -U postgres
echo "PostgreSQL restored"
else
echo "PostgreSQL not running, skipping restore"
fi
when: postgres_backup_file.stdout != ""
register: postgres_restore
changed_when: false
failed_when: false
# ========================================
# 6. RESTART STACKS
# ========================================
- name: Deploy Traefik stack
community.docker.docker_compose_v2:
project_src: "{{ traefik_stack_path }}"
state: present
pull: always
register: traefik_deploy
- name: Wait for Traefik to be ready
ansible.builtin.shell: |
cd {{ traefik_stack_path }}
docker compose ps traefik | grep -Eiq "Up|running"
register: traefik_ready
changed_when: false
until: traefik_ready.rc == 0
retries: 12
delay: 5
failed_when: traefik_ready.rc != 0
- name: Deploy Gitea stack
community.docker.docker_compose_v2:
project_src: "{{ gitea_stack_path }}"
state: present
pull: always
register: gitea_deploy
- name: Restore Gitea app.ini
ansible.builtin.shell: |
if [ -f "{{ backup_base_path }}/{{ backup_name }}/gitea-app.ini" ]; then
cd {{ gitea_stack_path }}
docker compose exec -T gitea sh -c "cat > /data/gitea/conf/app.ini" < "{{ backup_base_path }}/{{ backup_name }}/gitea-app.ini"
docker compose restart gitea
echo "app.ini restored and Gitea restarted"
else
echo "No app.ini backup found"
fi
register: gitea_app_ini_restore
changed_when: false
failed_when: false
- name: Wait for Gitea to be ready
ansible.builtin.shell: |
cd {{ gitea_stack_path }}
docker compose ps gitea | grep -Eiq "Up|running"
register: gitea_ready
changed_when: false
until: gitea_ready.rc == 0
retries: 12
delay: 5
failed_when: gitea_ready.rc != 0
# ========================================
# 7. VERIFY
# ========================================
- name: Wait for Gitea to be healthy
ansible.builtin.shell: |
cd {{ gitea_stack_path }}
docker compose exec -T gitea curl -f http://localhost:3000/api/healthz 2>&1 | grep -q "status.*pass" && echo "HEALTHY" || echo "NOT_HEALTHY"
register: gitea_health
changed_when: false
until: gitea_health.stdout == "HEALTHY"
retries: 30
delay: 2
failed_when: false
- name: Summary
ansible.builtin.debug:
msg: |
================================================================================
ROLLBACK SUMMARY
================================================================================
Restored from backup: {{ backup_base_path }}/{{ backup_name }}
Restored:
- Gitea volumes: {% if gitea_volumes_restore.changed %}✅{% else %} No volumes to restore{% endif %}
- SSL certificates: {% if acme_restore.changed %}✅{% else %} Not found{% endif %}
- Gitea docker-compose.yml: {% if gitea_compose_restore.changed %}✅{% else %} Not found{% endif %}
- Traefik configuration: {% if traefik_config_restore.rc == 0 %}✅{% else %}⚠️ Some files may be missing{% endif %}
- PostgreSQL data: {% if postgres_restore.rc == 0 and 'restored' in postgres_restore.stdout %}✅{% else %} Not restored{% endif %}
- Gitea app.ini: {% if gitea_app_ini_restore.rc == 0 and 'restored' in gitea_app_ini_restore.stdout %}✅{% else %} Not found{% endif %}
Status:
- Traefik: {% if traefik_ready.rc == 0 %}✅ Running{% else %}❌ Not running{% endif %}
- Gitea: {% if gitea_ready.rc == 0 %}✅ Running{% else %}❌ Not running{% endif %}
- Gitea Health: {% if gitea_health.stdout == 'HEALTHY' %}✅ Healthy{% else %}❌ Not healthy{% endif %}
Next steps:
1. Test Gitea: curl -k https://{{ gitea_domain }}/api/healthz
2. Check logs if issues: cd {{ gitea_stack_path }} && docker compose logs gitea --tail=50
================================================================================

View File

@@ -0,0 +1,294 @@
---
# Consolidated Gitea Management Playbook
# Consolidates: fix-gitea-timeouts.yml, fix-gitea-traefik-connection.yml,
# fix-gitea-ssl-routing.yml, fix-gitea-servers-transport.yml,
# fix-gitea-complete.yml, restart-gitea-complete.yml,
# restart-gitea-with-cache.yml
#
# Usage:
# # Restart Gitea
# ansible-playbook -i inventory/production.yml playbooks/manage/gitea.yml --tags restart
#
# # Fix timeouts (restart Gitea and Traefik)
# ansible-playbook -i inventory/production.yml playbooks/manage/gitea.yml --tags fix-timeouts
#
# # Fix SSL/routing issues
# ansible-playbook -i inventory/production.yml playbooks/manage/gitea.yml --tags fix-ssl
#
# # Complete fix (runner stop + restart + service discovery)
# ansible-playbook -i inventory/production.yml playbooks/manage/gitea.yml --tags complete
- name: Manage Gitea
hosts: production
gather_facts: yes
become: no
vars:
gitea_stack_path: "{{ stacks_base_path }}/gitea"
traefik_stack_path: "{{ stacks_base_path }}/traefik"
gitea_runner_path: "{{ stacks_base_path }}/../gitea-runner"
gitea_url: "https://{{ gitea_domain }}"
gitea_container_name: "gitea"
traefik_container_name: "traefik"
tasks:
- name: Display management plan
ansible.builtin.debug:
msg: |
================================================================================
GITEA MANAGEMENT
================================================================================
Running management tasks with tags: {{ ansible_run_tags | default(['all']) }}
Available actions:
- restart: Restart Gitea container
- fix-timeouts: Restart Gitea and Traefik to fix timeouts
- fix-ssl: Fix SSL/routing issues
- fix-servers-transport: Update ServersTransport configuration
- complete: Complete fix (stop runner, restart services, verify)
================================================================================
# ========================================
# COMPLETE FIX (--tags complete)
# ========================================
- name: Check Gitea Runner status
ansible.builtin.shell: |
cd {{ gitea_runner_path }}
docker compose ps gitea-runner 2>/dev/null || echo "Runner not found"
register: runner_status
changed_when: false
failed_when: false
tags:
- complete
- name: Stop Gitea Runner to reduce load
ansible.builtin.shell: |
cd {{ gitea_runner_path }}
docker compose stop gitea-runner
register: runner_stop
changed_when: runner_stop.rc == 0
failed_when: false
when: "'Runner not found' not in runner_status.stdout"
tags:
- complete
# ========================================
# RESTART GITEA (--tags restart, fix-timeouts, complete)
# ========================================
- name: Check Gitea container status before restart
ansible.builtin.shell: |
cd {{ gitea_stack_path }}
docker compose ps {{ gitea_container_name }}
register: gitea_status_before
changed_when: false
tags:
- restart
- fix-timeouts
- complete
- name: Restart Gitea container
ansible.builtin.shell: |
cd {{ gitea_stack_path }}
docker compose restart {{ gitea_container_name }}
register: gitea_restart
changed_when: gitea_restart.rc == 0
tags:
- restart
- fix-timeouts
- complete
- name: Wait for Gitea to be ready (direct check)
ansible.builtin.shell: |
cd {{ gitea_stack_path }}
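# seq instead of a {1..30} brace range: the shell module runs /bin/sh, which does not expand brace ranges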
for i in $(seq 1 30); do
if docker compose exec -T {{ gitea_container_name }} curl -f http://localhost:3000/api/healthz >/dev/null 2>&1; then
echo "Gitea is ready"
exit 0
fi
sleep 2
done
echo "Gitea not ready after 60 seconds"
exit 1
register: gitea_ready
changed_when: false
failed_when: false
tags:
- restart
- fix-timeouts
- complete
# ========================================
# RESTART TRAEFIK (--tags fix-timeouts, complete)
# ========================================
- name: Check Traefik container status before restart
ansible.builtin.shell: |
cd {{ traefik_stack_path }}
docker compose ps {{ traefik_container_name }}
register: traefik_status_before
changed_when: false
tags:
- fix-timeouts
- complete
- name: Restart Traefik to refresh service discovery
ansible.builtin.shell: |
cd {{ traefik_stack_path }}
docker compose restart {{ traefik_container_name }}
register: traefik_restart
changed_when: traefik_restart.rc == 0
when: traefik_auto_restart | default(false) | bool
tags:
- fix-timeouts
- complete
- name: Wait for Traefik to be ready
ansible.builtin.wait_for:
timeout: 30
delay: 2
changed_when: false
when: traefik_restart.changed | default(false) | bool
tags:
- fix-timeouts
- complete
# ========================================
# FIX SERVERS TRANSPORT (--tags fix-servers-transport)
# ========================================
- name: Sync Gitea stack configuration
ansible.builtin.synchronize:
src: "{{ playbook_dir }}/../../stacks/gitea/"
dest: "{{ gitea_stack_path }}/"
delete: no
recursive: yes
rsync_opts:
- "--chmod=D755,F644"
- "--exclude=.git"
- "--exclude=*.log"
- "--exclude=data/"
- "--exclude=volumes/"
tags:
- fix-servers-transport
- name: Restart Gitea container to apply new labels
ansible.builtin.shell: |
cd {{ gitea_stack_path }}
docker compose up -d --force-recreate {{ gitea_container_name }}
register: gitea_restart_transport
changed_when: gitea_restart_transport.rc == 0
tags:
- fix-servers-transport
# ========================================
# VERIFICATION (--tags fix-timeouts, fix-ssl, complete)
# ========================================
- name: Wait for Gitea to be reachable via Traefik (with retries)
ansible.builtin.uri:
url: "{{ gitea_url }}/api/healthz"
method: GET
status_code: [200]
validate_certs: false
timeout: 10
register: gitea_health_via_traefik
until: gitea_health_via_traefik.status == 200
retries: 15
delay: 2
changed_when: false
failed_when: false
when: (traefik_restart.changed | default(false) | bool) or (gitea_restart.changed | default(false) | bool)
tags:
- fix-timeouts
- fix-ssl
- complete
- name: Check if Gitea is in Traefik service discovery
ansible.builtin.shell: |
cd {{ traefik_stack_path }}
docker compose exec -T {{ traefik_container_name }} traefik show providers docker 2>/dev/null | grep -i "gitea" || echo "NOT_FOUND"
register: traefik_gitea_service_check
changed_when: false
failed_when: false
when: (traefik_restart.changed | default(false) | bool) or (gitea_restart.changed | default(false) | bool)
tags:
- fix-timeouts
- fix-ssl
- complete
- name: Final status check
ansible.builtin.uri:
url: "{{ gitea_url }}/api/healthz"
method: GET
status_code: [200]
validate_certs: false
timeout: 10
register: final_status
changed_when: false
failed_when: false
tags:
- fix-timeouts
- fix-ssl
- complete
# ========================================
# SUMMARY
# ========================================
- name: Summary
ansible.builtin.debug:
msg: |
================================================================================
GITEA MANAGEMENT SUMMARY
================================================================================
Actions performed:
{% if 'complete' in ansible_run_tags %}
- Gitea Runner: {% if runner_stop.changed | default(false) %}✅ Stopped{% else %} Not active or not found{% endif %}
{% endif %}
{% if 'restart' in ansible_run_tags or 'fix-timeouts' in ansible_run_tags or 'complete' in ansible_run_tags %}
- Gitea Restart: {% if gitea_restart.changed %}✅ Performed{% else %} Not needed{% endif %}
- Gitea Ready: {% if gitea_ready.rc == 0 %}✅ Ready{% else %}❌ Not ready{% endif %}
{% endif %}
{% if 'fix-timeouts' in ansible_run_tags or 'complete' in ansible_run_tags %}
- Traefik Restart: {% if traefik_restart.changed %}✅ Performed{% else %} Not needed (traefik_auto_restart=false){% endif %}
{% endif %}
{% if 'fix-servers-transport' in ansible_run_tags %}
- ServersTransport Update: {% if gitea_restart_transport.changed %}✅ Applied{% else %} Not needed{% endif %}
{% endif %}
Final Status:
{% if 'fix-timeouts' in ansible_run_tags or 'fix-ssl' in ansible_run_tags or 'complete' in ansible_run_tags %}
- Gitea via Traefik: {% if final_status.status == 200 %}✅ Reachable (Status: 200){% else %}❌ Not reachable (Status: {{ final_status.status | default('TIMEOUT') }}){% endif %}
- Traefik Service Discovery: {% if 'NOT_FOUND' not in traefik_gitea_service_check.stdout | default('NOT_FOUND') %}✅ Gitea found{% else %}❌ Gitea not found{% endif %}
{% endif %}
{% if final_status.status == 200 and 'NOT_FOUND' not in traefik_gitea_service_check.stdout | default('NOT_FOUND') %}
✅ SUCCESS: Gitea is now reachable via Traefik!
URL: {{ gitea_url }}
Next steps:
1. Test Gitea in browser: {{ gitea_url }}
{% if 'complete' in ansible_run_tags %}
2. If everything is stable, you can reactivate the runner:
cd {{ gitea_runner_path }} && docker compose up -d gitea-runner
3. Monitor if the runner overloads Gitea again
{% endif %}
{% else %}
⚠️ PROBLEM: Gitea is not fully reachable
Possible causes:
{% if final_status.status != 200 %}
- Gitea does not respond via Traefik (Status: {{ final_status.status | default('TIMEOUT') }})
{% endif %}
{% if 'NOT_FOUND' in traefik_gitea_service_check.stdout | default('NOT_FOUND') %}
- Traefik Service Discovery has not recognized Gitea yet
{% endif %}
Next steps:
1. Wait 1-2 minutes and test again: curl -k {{ gitea_url }}/api/healthz
2. Check Traefik logs: cd {{ traefik_stack_path }} && docker compose logs {{ traefik_container_name }} --tail=50
3. Check Gitea logs: cd {{ gitea_stack_path }} && docker compose logs {{ gitea_container_name }} --tail=50
4. Run diagnosis: ansible-playbook -i inventory/production.yml playbooks/diagnose/gitea.yml
{% endif %}
================================================================================

View File

@@ -0,0 +1,162 @@
---
# Consolidated Traefik Management Playbook
# Consolidates: stabilize-traefik.yml, disable-traefik-auto-restarts.yml
#
# Usage:
# # Stabilize Traefik (fix acme.json, ensure running, monitor)
# ansible-playbook -i inventory/production.yml playbooks/manage/traefik.yml --tags stabilize
#
# # Disable auto-restarts
# ansible-playbook -i inventory/production.yml playbooks/manage/traefik.yml --tags disable-auto-restart
- name: Manage Traefik
hosts: production
gather_facts: yes
become: no
vars:
traefik_stack_path: "{{ stacks_base_path }}/traefik"
traefik_container_name: "traefik"
traefik_stabilize_wait_minutes: "{{ traefik_stabilize_wait_minutes | default(10) }}"
traefik_stabilize_check_interval: 60
tasks:
- name: Display management plan
ansible.builtin.debug:
msg: |
================================================================================
TRAEFIK MANAGEMENT
================================================================================
Running management tasks with tags: {{ ansible_run_tags | default(['all']) }}
Available actions:
- stabilize: Fix acme.json, ensure running, monitor stability
- disable-auto-restart: Check and document auto-restart mechanisms
================================================================================
# ========================================
# STABILIZE (--tags stabilize)
# ========================================
- name: Fix acme.json permissions
ansible.builtin.file:
path: "{{ traefik_stack_path }}/acme.json"
state: file
mode: '0600'
owner: "{{ ansible_user | default('deploy') }}"
group: "{{ ansible_user | default('deploy') }}"
register: acme_permissions_fixed
tags:
- stabilize
- name: Ensure Traefik container is running
ansible.builtin.shell: |
cd {{ traefik_stack_path }}
docker compose up -d {{ traefik_container_name }}
register: traefik_start
changed_when: traefik_start.rc == 0
tags:
- stabilize
- name: Wait for Traefik to be ready
ansible.builtin.wait_for:
timeout: 30
delay: 2
changed_when: false
tags:
- stabilize
- name: Monitor Traefik stability
ansible.builtin.shell: |
cd {{ traefik_stack_path }}
docker compose ps {{ traefik_container_name }} --format "{{ '{{' }}.State{{ '}}' }}" | head -1 || echo "UNKNOWN"
register: traefik_state_check
changed_when: false
until: traefik_state_check.stdout == "running"
retries: "{{ (traefik_stabilize_wait_minutes | int * 60 / traefik_stabilize_check_interval) | int }}"
delay: "{{ traefik_stabilize_check_interval }}"
tags:
- stabilize
- name: Check Traefik logs for restarts during monitoring
ansible.builtin.shell: |
cd {{ traefik_stack_path }}
docker compose logs {{ traefik_container_name }} --since "{{ traefik_stabilize_wait_minutes }}m" 2>&1 | grep -iE "stopping server gracefully|I have to go" | wc -l
register: restarts_during_monitoring
changed_when: false
tags:
- stabilize
# ========================================
# DISABLE AUTO-RESTART (--tags disable-auto-restart)
# ========================================
- name: Check Ansible traefik_auto_restart setting
ansible.builtin.shell: |
grep -r "traefik_auto_restart" /home/deploy/deployment/ansible/inventory/group_vars/ 2>/dev/null | head -5 || echo "No traefik_auto_restart setting found"
register: ansible_auto_restart_setting
changed_when: false
tags:
- disable-auto-restart
- name: Check for cronjobs that restart Traefik
ansible.builtin.shell: |
(crontab -l 2>/dev/null || true) | grep -E "traefik|docker.*compose.*restart.*traefik|docker.*stop.*traefik" || echo "No cronjobs found"
register: traefik_cronjobs
changed_when: false
tags:
- disable-auto-restart
- name: Check systemd timers for Traefik
ansible.builtin.shell: |
systemctl list-timers --all --no-pager | grep -E "traefik|docker.*compose.*traefik" || echo "No Traefik-related timers"
register: traefik_timers
changed_when: false
tags:
- disable-auto-restart
# ========================================
# SUMMARY
# ========================================
- name: Summary
ansible.builtin.debug:
msg: |
================================================================================
TRAEFIK MANAGEMENT SUMMARY
================================================================================
{% if 'stabilize' in ansible_run_tags %}
Stabilization:
- acme.json permissions: {% if acme_permissions_fixed.changed %}✅ Fixed{% else %} Already correct{% endif %}
- Traefik started: {% if traefik_start.changed %}✅ Started{% else %} Already running{% endif %}
- Stability monitoring: {{ traefik_stabilize_wait_minutes }} minutes
- Restarts during monitoring: {{ restarts_during_monitoring.stdout | default('0') }}
{% if (restarts_during_monitoring.stdout | default('0') | int) == 0 %}
✅ Traefik ran stable during monitoring period!
{% else %}
⚠️ {{ restarts_during_monitoring.stdout }} restarts detected during monitoring
→ Run diagnosis: ansible-playbook -i inventory/production.yml playbooks/diagnose/traefik.yml --tags restart-source
{% endif %}
{% endif %}
{% if 'disable-auto-restart' in ansible_run_tags %}
Auto-Restart Analysis:
- Ansible setting: {{ ansible_auto_restart_setting.stdout | default('Not found') }}
- Cronjobs: {{ traefik_cronjobs.stdout | default('None found') }}
- Systemd timers: {{ traefik_timers.stdout | default('None found') }}
Recommendations:
{% if ansible_auto_restart_setting.stdout | regex_search('traefik_auto_restart.*true') %}
- Set traefik_auto_restart: false in group_vars
{% endif %}
{% if 'No cronjobs' not in traefik_cronjobs.stdout %}
- Remove or disable cronjobs that restart Traefik
{% endif %}
{% if 'No Traefik-related timers' not in traefik_timers.stdout %}
- Disable systemd timers that restart Traefik
{% endif %}
{% endif %}
================================================================================

View File

@@ -1,141 +0,0 @@
---
# Monitor Traefik Continuously
# Monitors Traefik logs and Docker events in real time to find the source of restarts
- name: Monitor Traefik Continuously
hosts: production
gather_facts: yes
become: no
vars:
traefik_stack_path: "{{ stacks_base_path }}/traefik"
monitor_duration_minutes: 30 # default: 30 minutes, can be overridden
tasks:
- name: Display monitoring information
ansible.builtin.debug:
msg: |
================================================================================
TRAEFIK CONTINUOUS MONITORING
================================================================================
Monitoring duration: {{ monitor_duration_minutes }} minutes
Monitors:
1. Traefik logs for "Stopping server gracefully" / "I have to go"
2. Docker events for the Traefik container
3. Docker daemon logs for container stops
Starting monitoring...
================================================================================
- name: Get initial Traefik status
ansible.builtin.shell: |
docker inspect traefik --format '{{ '{{' }}.State.Status{{ '}}' }} {{ '{{' }}.State.StartedAt{{ '}}' }}' 2>/dev/null || echo "UNKNOWN"
register: initial_status
changed_when: false
- name: Start monitoring Traefik logs in background
ansible.builtin.shell: |
cd {{ traefik_stack_path }}
timeout {{ monitor_duration_minutes * 60 }} docker compose logs -f traefik 2>&1 | grep --line-buffered -iE "stopping server gracefully|I have to go" | while read line; do
echo "[$(date '+%Y-%m-%d %H:%M:%S')] $line"
done > /tmp/traefik_monitor_$$.log 2>&1 &
echo $!
register: log_monitor_pid
changed_when: false
async: "{{ monitor_duration_minutes * 60 + 60 }}"
poll: 0
- name: Start monitoring Docker events in background
ansible.builtin.shell: |
timeout {{ monitor_duration_minutes * 60 }} docker events --filter container=traefik --filter event=die --format "[{{ '{{' }}.Time{{ '}}' }}] {{ '{{' }}.Action{{ '}}' }} {{ '{{' }}.Actor.Attributes.name{{ '}}' }}" 2>&1 | tee /tmp/traefik_docker_events_$$.log &
echo $!
register: docker_events_pid
changed_when: false
async: "{{ monitor_duration_minutes * 60 + 60 }}"
poll: 0
- name: Wait for monitoring period
ansible.builtin.pause:
minutes: "{{ monitor_duration_minutes }}"
- name: Stop log monitoring
ansible.builtin.shell: |
pkill -f "docker compose logs.*traefik" || true
sleep 2
changed_when: false
failed_when: false
- name: Stop Docker events monitoring
ansible.builtin.shell: |
pkill -f "docker events.*traefik" || true
sleep 2
changed_when: false
failed_when: false
- name: Read Traefik log monitoring results
ansible.builtin.slurp:
src: "{{ item }}"
register: log_results
changed_when: false
failed_when: false
loop: "{{ log_monitor_pid.stdout_lines | map('regex_replace', '^.*', '/tmp/traefik_monitor_' + ansible_date_time.epoch + '.log') | list }}"
- name: Read Docker events monitoring results
ansible.builtin.slurp:
src: "{{ item }}"
register: docker_events_results
changed_when: false
failed_when: false
loop: "{{ docker_events_pid.stdout_lines | map('regex_replace', '^.*', '/tmp/traefik_docker_events_' + ansible_date_time.epoch + '.log') | list }}"
- name: Get final Traefik status
ansible.builtin.shell: |
docker inspect traefik --format '{{ '{{' }}.State.Status{{ '}}' }} {{ '{{' }}.State.StartedAt{{ '}}' }} {{ '{{' }}.RestartCount{{ '}}' }}' 2>/dev/null || echo "UNKNOWN"
register: final_status
changed_when: false
- name: Check Traefik logs for stop messages during monitoring
ansible.builtin.shell: |
cd {{ traefik_stack_path }}
docker compose logs traefik --since {{ monitor_duration_minutes }}m 2>&1 | grep -iE "stopping server gracefully|I have to go" || echo "No stop messages found"
register: traefik_stop_messages
changed_when: false
failed_when: false
- name: Summary
ansible.builtin.debug:
msg: |
================================================================================
MONITORING SUMMARY ({{ monitor_duration_minutes }} minutes):
================================================================================
Initial Status: {{ initial_status.stdout }}
Final Status: {{ final_status.stdout }}
Traefik stop messages during monitoring:
{% if traefik_stop_messages.stdout and 'No stop messages' not in traefik_stop_messages.stdout %}
❌ STOP MESSAGES FOUND:
{{ traefik_stop_messages.stdout }}
⚠️ PROBLEM CONFIRMED: Traefik was stopped while being monitored!
Next steps:
1. Check the Docker events log: /tmp/traefik_docker_events_*.log
2. Check the Traefik log monitor: /tmp/traefik_monitor_*.log
3. Check who issued the stop command:
- journalctl -u docker.service --since "{{ monitor_duration_minutes }} minutes ago"
- docker events --since "{{ monitor_duration_minutes }} minutes ago" --filter container=traefik
{% else %}
✅ NO STOP MESSAGES FOUND
Traefik ran stable during the {{ monitor_duration_minutes }}-minute monitoring period.
{% if initial_status.stdout != final_status.stdout %}
⚠️ Status changed:
- Before: {{ initial_status.stdout }}
- After: {{ final_status.stdout }}
{% endif %}
{% endif %}
================================================================================

View File

@@ -1,150 +0,0 @@
---
# Monitor Traefik for Unexpected Restarts
# Monitors Traefik logs for "I have to go..." messages and identifies the cause
- name: Monitor Traefik Restarts
hosts: production
gather_facts: yes
become: no
vars:
monitor_lookback_hours: "{{ monitor_lookback_hours | default(24) }}"
tasks:
- name: Check Traefik logs for "I have to go..." messages
ansible.builtin.shell: |
cd /home/deploy/deployment/stacks/traefik
docker compose logs traefik --since {{ monitor_lookback_hours }}h 2>&1 | grep -E "I have to go|Stopping server gracefully" | tail -20 || echo "No stop messages found"
register: traefik_stop_messages
changed_when: false
- name: Display Traefik stop messages
ansible.builtin.debug:
msg: |
================================================================================
Traefik stop messages (last {{ monitor_lookback_hours }} hours):
================================================================================
{{ traefik_stop_messages.stdout }}
================================================================================
- name: Check Traefik container restart count
ansible.builtin.shell: |
docker inspect traefik --format '{{ '{{' }}.RestartCount{{ '}}' }}' 2>/dev/null || echo "0"
register: traefik_restart_count
changed_when: false
- name: Check Traefik container start time
ansible.builtin.shell: |
docker inspect traefik --format '{{ '{{' }}.State.StartedAt{{ '}}' }}' 2>/dev/null || echo "UNKNOWN"
register: traefik_started_at
changed_when: false
- name: Check Docker events for Traefik stops
ansible.builtin.shell: |
timeout 5 docker events --since {{ monitor_lookback_hours }}h --filter container=traefik --filter event=die --format "{{ '{{' }}.Time{{ '}}' }} {{ '{{' }}.Action{{ '}}' }} {{ '{{' }}.Actor.Attributes.name{{ '}}' }}" 2>/dev/null | tail -20 || echo "No stop events found or docker events not available"
register: traefik_stop_events
changed_when: false
- name: Display Traefik stop events
ansible.builtin.debug:
msg: |
================================================================================
Docker stop events for Traefik (last {{ monitor_lookback_hours }} hours):
================================================================================
{{ traefik_stop_events.stdout }}
================================================================================
- name: Check for manual docker compose commands in history
ansible.builtin.shell: |
history | grep -E "docker.*compose.*traefik.*(restart|stop|down|up)" | tail -10 || echo "No manual docker compose commands found in history"
register: manual_commands
changed_when: false
failed_when: false
- name: Display manual docker compose commands
ansible.builtin.debug:
msg: |
================================================================================
Manual docker compose commands (from shell history):
================================================================================
{{ manual_commands.stdout }}
================================================================================
- name: Check systemd docker service status
ansible.builtin.shell: |
systemctl status docker.service --no-pager -l | head -20 || echo "Could not check docker service status"
register: docker_service_status
changed_when: false
failed_when: false
- name: Display Docker service status
ansible.builtin.debug:
msg: |
================================================================================
Docker Service Status:
================================================================================
{{ docker_service_status.stdout }}
================================================================================
- name: Check for system reboots
ansible.builtin.shell: |
last reboot --since "{{ monitor_lookback_hours }} hours ago" 2>/dev/null | head -5 || echo "No reboots in the last {{ monitor_lookback_hours }} hours"
register: reboots
changed_when: false
failed_when: false
- name: Display reboot history
ansible.builtin.debug:
msg: |
================================================================================
System reboots (last {{ monitor_lookback_hours }} hours):
================================================================================
{{ reboots.stdout }}
================================================================================
- name: Analyze stop message timestamps
ansible.builtin.set_fact:
stop_timestamps: "{{ traefik_stop_messages.stdout | regex_findall('\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}') }}"
- name: Count stop messages
ansible.builtin.set_fact:
stop_count: "{{ stop_timestamps | length | int }}"
- name: Summary
ansible.builtin.debug:
msg: |
================================================================================
SUMMARY - Traefik Restart Monitoring:
================================================================================
Monitoring period: last {{ monitor_lookback_hours }} hours
Traefik status:
- Restart count: {{ traefik_restart_count.stdout }}
- Started at: {{ traefik_started_at.stdout }}
- Stop messages found: {{ stop_count | default(0) }}
{% if (stop_count | default(0) | int) > 0 %}
⚠️ {{ stop_count }} stop messages found:
{{ traefik_stop_messages.stdout }}
Possible causes:
{% if reboots.stdout and 'No reboots' not in reboots.stdout %}
1. System reboots: {{ reboots.stdout }}
{% endif %}
{% if traefik_stop_events.stdout and 'No stop events' not in traefik_stop_events.stdout %}
2. Docker stop events: {{ traefik_stop_events.stdout }}
{% endif %}
{% if manual_commands.stdout and 'No manual' not in manual_commands.stdout %}
3. Manual commands: {{ manual_commands.stdout }}
{% endif %}
Next steps:
- Check whether the stop messages match our manual restarts
- Check whether system reboots are the cause
- Check the Docker service logs for automatic stops
{% else %}
✅ No stop messages in the last {{ monitor_lookback_hours }} hours
Traefik is running stable!
{% endif %}
================================================================================

View File

@@ -1,95 +0,0 @@
---
# Restart Gitea Complete - Stops and restarts Gitea so that all configuration changes take effect
- name: Restart Gitea Complete
hosts: production
gather_facts: no
become: no
vars:
gitea_stack_path: "{{ stacks_base_path }}/gitea"
gitea_url: "https://{{ gitea_domain }}"
tasks:
- name: Check current Gitea environment variables
ansible.builtin.shell: |
cd {{ gitea_stack_path }}
docker compose exec -T gitea env | grep -E 'GITEA__database__' | sort || echo "Could not read environment variables"
register: gitea_env_before
changed_when: false
failed_when: false
- name: Display current environment variables
ansible.builtin.debug:
msg: |
Current Gitea Database Environment Variables:
{{ gitea_env_before.stdout }}
- name: Stop Gitea container completely
ansible.builtin.shell: |
cd {{ gitea_stack_path }}
docker compose stop gitea
register: gitea_stop
changed_when: gitea_stop.rc == 0
- name: Wait for Gitea to stop
ansible.builtin.pause:
seconds: 5
- name: Start Gitea container
ansible.builtin.shell: |
cd {{ gitea_stack_path }}
docker compose up -d gitea
register: gitea_start
changed_when: gitea_start.rc == 0
- name: Wait for Gitea to be ready
ansible.builtin.wait_for:
timeout: 60
delay: 5
- name: Check Gitea health after restart
ansible.builtin.uri:
url: "{{ gitea_url }}/api/healthz"
method: GET
validate_certs: false
timeout: 10
register: gitea_health_after
changed_when: false
failed_when: false
retries: 5
delay: 5
- name: Check environment variables after restart
ansible.builtin.shell: |
cd {{ gitea_stack_path }}
docker compose exec -T gitea env | grep -E 'GITEA__database__' | sort || echo "Could not read environment variables"
register: gitea_env_after
changed_when: false
failed_when: false
- name: Display restart results
ansible.builtin.debug:
msg: |
================================================================================
GITEA COMPLETE RESTART - RESULTS
================================================================================
Gitea Health After Restart:
- Status: {{ gitea_health_after.status | default('TIMEOUT') }}
{% if gitea_health_after.status | default(0) == 200 %}
✅ Gitea is healthy after restart
{% else %}
❌ Gitea health check failed (Status: {{ gitea_health_after.status | default('TIMEOUT') }})
{% endif %}
Environment Variables After Restart:
{{ gitea_env_after.stdout }}
{% if 'MAX_OPEN_CONNS' in gitea_env_after.stdout %}
✅ Connection pool settings are present
{% else %}
⚠️ Connection pool settings NOT found in environment variables
→ Check docker-compose.yml configuration
{% endif %}
================================================================================

View File

@@ -1,57 +0,0 @@
---
# Ansible Playbook: Restart Gitea with Redis Cache Enabled
# Purpose: Restart Gitea container to apply new cache configuration from docker-compose.yml
# Usage:
# ansible-playbook -i inventory/production.yml playbooks/restart-gitea-with-cache.yml
- name: Restart Gitea with Redis Cache Enabled
hosts: production
vars:
gitea_stack_path: "{{ stacks_base_path }}/gitea"
gitea_url: "https://{{ gitea_domain }}"
tasks:
- name: Verify Gitea container exists
shell: |
docker compose -f {{ gitea_stack_path }}/docker-compose.yml ps gitea | grep -q "gitea"
register: gitea_exists
changed_when: false
failed_when: false
- name: Fail if Gitea container does not exist
fail:
msg: "Gitea container does not exist. Please deploy Gitea stack first."
when: gitea_exists.rc != 0
- name: Recreate Gitea container with new cache configuration
shell: |
cd {{ gitea_stack_path }} && \
docker compose up -d --force-recreate gitea
register: gitea_recreated
- name: Wait for Gitea to be ready after restart
uri:
url: "{{ gitea_url }}/api/healthz"
method: GET
status_code: [200]
validate_certs: false
timeout: 10
register: gitea_health_after_restart
until: gitea_health_after_restart.status == 200
retries: 30
delay: 5
changed_when: false
- name: Display success message
debug:
msg: |
Gitea has been restarted successfully with Redis cache enabled!
Cache configuration:
- ENABLED: true
- ADAPTER: redis
- HOST: redis:6379
- DB: 0
Gitea should now use Redis for caching, improving performance.

View File

@@ -0,0 +1,210 @@
# Traefik/Gitea Redeploy Guide
This guide explains how to perform a clean redeployment of Traefik and Gitea stacks.
## Overview
A clean redeploy:
- Stops and removes containers (preserves volumes and SSL certificates)
- Syncs latest configurations
- Redeploys stacks with fresh containers
- Restores configurations
- Verifies service discovery
**Expected downtime**: ~2-5 minutes
## Prerequisites
- Ansible installed locally
- SSH access to production server
- Vault password file: `deployment/ansible/secrets/.vault_pass`
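Before touching anything, it helps to confirm that Ansible can reach the host at all. A minimal sketch, assuming the inventory group `production` used by the commands in this guide:
```bash
cd deployment/ansible
# Basic reachability check before running any playbook
ansible -i inventory/production.yml production -m ping \
  --vault-password-file secrets/.vault_pass
```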
## Step-by-Step Guide
### Step 1: Backup
**Automatic backup (recommended):**
```bash
cd deployment/ansible
ansible-playbook -i inventory/production.yml \
playbooks/maintenance/backup-before-redeploy.yml \
--vault-password-file secrets/.vault_pass
```
**Manual backup:**
```bash
# On server
cd /home/deploy/deployment/stacks
docker compose -f gitea/docker-compose.yml exec gitea cat /data/gitea/conf/app.ini > /tmp/gitea-app.ini.backup
cp traefik/acme.json /tmp/acme.json.backup
```
### Step 2: Verify Backup
Check backup contents:
```bash
# Backup location will be shown in output
ls -lh /home/deploy/backups/redeploy-backup-*/
```
Verify (an integrity check is sketched below):
- `acme.json` exists
- `gitea-app.ini` exists
- `gitea-volume-*.tar.gz` exists (if volumes were backed up)
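Beyond checking that the files exist, the archives can be test-read. A quick sketch, assuming the default backup location shown above (set `BACKUP_DIR` to your actual backup name):
```bash
# On server
BACKUP_DIR=/home/deploy/backups/redeploy-backup-1234567890  # adjust to your backup
# gzip integrity of the Gitea volume archives
for f in "$BACKUP_DIR"/gitea-volume-*.tar.gz; do
  gzip -t "$f" && echo "OK: $f"
done
# acme.json should be non-empty, parseable JSON
python3 -m json.tool "$BACKUP_DIR/acme.json" > /dev/null && echo "OK: acme.json"
```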
### Step 3: Redeploy
**With automatic backup:**
```bash
cd deployment/ansible
ansible-playbook -i inventory/production.yml \
playbooks/setup/redeploy-traefik-gitea-clean.yml \
--vault-password-file secrets/.vault_pass
```
**With existing backup:**
```bash
cd deployment/ansible
ansible-playbook -i inventory/production.yml \
playbooks/setup/redeploy-traefik-gitea-clean.yml \
--vault-password-file secrets/.vault_pass \
-e "backup_name=redeploy-backup-1234567890" \
-e "skip_backup=true"
```
### Step 4: Verify Deployment
**Check Gitea accessibility:**
```bash
curl -k https://git.michaelschiemer.de/api/healthz
```
**Check Traefik service discovery:**
```bash
# On server
cd /home/deploy/deployment/stacks/traefik
docker compose exec traefik traefik show providers docker | grep -i gitea
```
**Check container status:**
```bash
# On server
docker ps | grep -E "traefik|gitea"
```
### Step 5: Troubleshooting
**If Gitea is not reachable:**
1. Check Gitea logs:
```bash
cd /home/deploy/deployment/stacks/gitea
docker compose logs gitea --tail=50
```
2. Check Traefik logs:
```bash
cd /home/deploy/deployment/stacks/traefik
docker compose logs traefik --tail=50
```
3. Check service discovery:
```bash
cd /home/deploy/deployment/stacks/traefik
docker compose exec traefik traefik show providers docker
```
4. Run diagnosis:
```bash
cd deployment/ansible
ansible-playbook -i inventory/production.yml \
playbooks/diagnose/gitea.yml \
--vault-password-file secrets/.vault_pass
```
**If SSL certificate issues:**
1. Check acme.json permissions:
```bash
ls -l /home/deploy/deployment/stacks/traefik/acme.json
# Should be: -rw------- (600)
```
2. Check Traefik ACME logs:
```bash
cd /home/deploy/deployment/stacks/traefik
docker compose logs traefik | grep -i acme
```
## Rollback Procedure
If something goes wrong, rollback to the backup:
```bash
cd deployment/ansible
ansible-playbook -i inventory/production.yml \
playbooks/maintenance/rollback-redeploy.yml \
--vault-password-file secrets/.vault_pass \
-e "backup_name=redeploy-backup-1234567890"
```
Replace `redeploy-backup-1234567890` with the actual backup name from Step 1.
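If the backup name is no longer at hand, list the available backups on the server; each backup directory contains a `backup-info.txt` written by the backup playbook:
```bash
# On server
ls -1d /home/deploy/backups/redeploy-backup-*
cat /home/deploy/backups/redeploy-backup-*/backup-info.txt
```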
## What Gets Preserved
- ✅ Gitea data (volumes)
- ✅ SSL certificates (acme.json)
- ✅ Gitea configuration (app.ini)
- ✅ Traefik configuration
- ✅ PostgreSQL data (if applicable)
## What Gets Recreated
- 🔄 Traefik container
- 🔄 Gitea container
- 🔄 Service discovery
## Common Issues
### Issue: Gitea returns 404 after redeploy
**Solution:**
1. Wait 1-2 minutes for service discovery
2. Restart Traefik: `cd /home/deploy/deployment/stacks/traefik && docker compose restart traefik`
3. Check if Gitea is in traefik-public network: `docker network inspect traefik-public | grep gitea`
### Issue: SSL certificate errors
**Solution:**
1. Verify acme.json permissions: `chmod 600 /home/deploy/deployment/stacks/traefik/acme.json`
2. Check Traefik logs for ACME errors
3. Wait 5-10 minutes for certificate renewal (a quick check is sketched below)
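While waiting, you can inspect the certificate that is actually being served; this is a generic OpenSSL check, independent of this setup:
```bash
echo | openssl s_client -connect git.michaelschiemer.de:443 \
  -servername git.michaelschiemer.de 2>/dev/null \
  | openssl x509 -noout -issuer -dates
```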
### Issue: Gitea configuration lost
**Solution:**
1. Restore from backup: `playbooks/maintenance/rollback-redeploy.yml`
2. Or manually restore app.ini:
```bash
cd /home/deploy/deployment/stacks/gitea
docker compose exec gitea sh -c "cat > /data/gitea/conf/app.ini" < /path/to/backup/gitea-app.ini
docker compose restart gitea
```
## Best Practices
1. **Always backup before redeploy** - Use automatic backup
2. **Test in staging first** - If available
3. **Monitor during deployment** - Watch logs in a separate terminal (see the sketch after this list)
4. **Have rollback ready** - Know backup name before starting
5. **Verify after deployment** - Check all services are accessible
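For point 3, one possible way to follow both stacks from a separate terminal during the redeploy (paths as used throughout this guide):
```bash
# On server, in a separate terminal
cd /home/deploy/deployment/stacks
docker compose -f traefik/docker-compose.yml logs -f traefik &
docker compose -f gitea/docker-compose.yml logs -f gitea
```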
## Related Playbooks
- `playbooks/maintenance/backup-before-redeploy.yml` - Create backup
- `playbooks/setup/redeploy-traefik-gitea-clean.yml` - Perform redeploy
- `playbooks/maintenance/rollback-redeploy.yml` - Rollback from backup
- `playbooks/diagnose/gitea.yml` - Diagnose Gitea issues
- `playbooks/diagnose/traefik.yml` - Diagnose Traefik issues

View File

@@ -0,0 +1,321 @@
---
# Clean Redeploy Traefik and Gitea Stacks
# Complete redeployment with backup, container recreation, and verification
#
# Usage:
# # With automatic backup
# ansible-playbook -i inventory/production.yml playbooks/setup/redeploy-traefik-gitea-clean.yml \
# --vault-password-file secrets/.vault_pass
#
# # With existing backup
# ansible-playbook -i inventory/production.yml playbooks/setup/redeploy-traefik-gitea-clean.yml \
# --vault-password-file secrets/.vault_pass \
# -e "backup_name=redeploy-backup-1234567890" \
# -e "skip_backup=true"
- name: Clean Redeploy Traefik and Gitea
hosts: production
gather_facts: yes
become: no
vars:
traefik_stack_path: "{{ stacks_base_path }}/traefik"
gitea_stack_path: "{{ stacks_base_path }}/gitea"
gitea_url: "https://{{ gitea_domain }}"
traefik_container_name: "traefik"
gitea_container_name: "gitea"
backup_base_path: "{{ backups_path | default('/home/deploy/backups') }}"
skip_backup: "{{ skip_backup | default(false) | bool }}"
backup_name: "{{ backup_name | default('') }}"
tasks:
# ========================================
# 1. BACKUP (unless skipped)
# ========================================
- name: Set backup name fact
ansible.builtin.set_fact:
actual_backup_name: "{{ backup_name | default('redeploy-backup-' + ansible_date_time.epoch, true) }}"
- name: Display backup note
ansible.builtin.debug:
msg: |
⚠️ NOTE: Backup should be run separately before redeploy:
ansible-playbook -i inventory/production.yml playbooks/maintenance/backup-before-redeploy.yml \
--vault-password-file secrets/.vault_pass \
-e "backup_name={{ actual_backup_name }}"
Or use existing backup with: -e "backup_name=redeploy-backup-XXXXX" -e "skip_backup=true"
when: not (skip_backup | bool)
- name: Display redeployment plan
ansible.builtin.debug:
msg: |
================================================================================
CLEAN REDEPLOY TRAEFIK AND GITEA
================================================================================
This playbook will:
1. ✅ Backup ({% if skip_backup | bool %}SKIPPED{% else %}run separately, see note above{% endif %})
2. ✅ Stop and remove Traefik containers (keeps acme.json)
3. ✅ Stop and remove Gitea containers (keeps volumes/data)
4. ✅ Sync latest stack configurations
5. ✅ Redeploy Traefik stack
6. ✅ Redeploy Gitea stack
7. ✅ Restore Gitea configuration (app.ini)
8. ✅ Verify service discovery
9. ✅ Test Gitea accessibility
⚠️ IMPORTANT:
- SSL certificates (acme.json) will be preserved
- Gitea data (volumes) will be preserved
- Only containers will be recreated
- Expected downtime: ~2-5 minutes
{% if not (skip_backup | bool) %}
- Backup location: {{ backup_base_path }}/{{ actual_backup_name }}
{% endif %}
================================================================================
# ========================================
# 2. STOP AND REMOVE CONTAINERS
# ========================================
- name: Stop Traefik stack
ansible.builtin.shell: |
cd {{ traefik_stack_path }}
docker compose down
register: traefik_stop
changed_when: traefik_stop.rc == 0
failed_when: false
- name: Remove Traefik containers (if any remain)
ansible.builtin.shell: |
docker ps -a --filter "name={{ traefik_container_name }}" --format "{{ '{{' }}.ID{{ '}}' }}" | xargs -r docker rm -f 2>/dev/null || true
register: traefik_remove
changed_when: traefik_remove.rc == 0
failed_when: false
- name: Stop Gitea stack (preserves volumes)
ansible.builtin.shell: |
cd {{ gitea_stack_path }}
docker compose down
register: gitea_stop
changed_when: gitea_stop.rc == 0
failed_when: false
- name: Remove Gitea containers (if any remain, volumes are preserved)
ansible.builtin.shell: |
docker ps -a --filter "name={{ gitea_container_name }}" --format "{{ '{{' }}.ID{{ '}}' }}" | xargs -r docker rm -f 2>/dev/null || true
register: gitea_remove
changed_when: gitea_remove.rc == 0
failed_when: false
# ========================================
# 3. SYNC CONFIGURATIONS
# ========================================
- name: Get stacks directory path
ansible.builtin.set_fact:
stacks_source_path: "{{ playbook_dir | dirname | dirname | dirname }}/stacks"
delegate_to: localhost
run_once: true
- name: Sync stacks directory to production server
ansible.builtin.synchronize:
src: "{{ stacks_source_path }}/"
dest: "{{ stacks_base_path }}/"
delete: no
recursive: yes
rsync_opts:
- "--chmod=D755,F644"
- "--exclude=.git"
- "--exclude=*.log"
- "--exclude=data/"
- "--exclude=volumes/"
- "--exclude=acme.json" # Preserve SSL certificates
- "--exclude=*.key"
- "--exclude=*.pem"
# ========================================
# 4. ENSURE ACME.JSON EXISTS
# ========================================
- name: Check if acme.json exists
ansible.builtin.stat:
path: "{{ traefik_stack_path }}/acme.json"
register: acme_json_stat
- name: Ensure acme.json exists and has correct permissions
ansible.builtin.file:
path: "{{ traefik_stack_path }}/acme.json"
state: touch
mode: '0600'
owner: "{{ ansible_user }}"
group: "{{ ansible_user }}"
become: yes
register: acme_json_ensure
# ========================================
# 5. REDEPLOY TRAEFIK
# ========================================
- name: Deploy Traefik stack
community.docker.docker_compose_v2:
project_src: "{{ traefik_stack_path }}"
state: present
pull: always
register: traefik_deploy
- name: Wait for Traefik to be ready
ansible.builtin.shell: |
cd {{ traefik_stack_path }}
docker compose ps {{ traefik_container_name }} | grep -Eiq "Up|running"
register: traefik_ready
changed_when: false
until: traefik_ready.rc == 0
retries: 12
delay: 5
failed_when: traefik_ready.rc != 0
# ========================================
# 6. REDEPLOY GITEA
# ========================================
- name: Deploy Gitea stack
community.docker.docker_compose_v2:
project_src: "{{ gitea_stack_path }}"
state: present
pull: always
register: gitea_deploy
- name: Wait for Gitea to be ready
ansible.builtin.shell: |
cd {{ gitea_stack_path }}
docker compose ps {{ gitea_container_name }} | grep -Eiq "Up|running"
register: gitea_ready
changed_when: false
until: gitea_ready.rc == 0
retries: 12
delay: 5
failed_when: gitea_ready.rc != 0
- name: Wait for Gitea to be healthy
ansible.builtin.shell: |
cd {{ gitea_stack_path }}
docker compose exec -T {{ gitea_container_name }} curl -f http://localhost:3000/api/healthz 2>&1 | grep -q "status.*pass" && echo "HEALTHY" || echo "NOT_HEALTHY"
register: gitea_health
changed_when: false
until: gitea_health.stdout == "HEALTHY"
retries: 30
delay: 2
failed_when: false
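# Gitea's /api/healthz returns a JSON body along the lines of
#   {"status":"pass","description":"Gitea: Git with a cup of tea","checks":{...}}
# which is what the grep for "status.*pass" above relies on.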
# ========================================
# 7. RESTORE GITEA CONFIGURATION
# ========================================
- name: Restore Gitea app.ini from backup
ansible.builtin.shell: |
if [ -f "{{ backup_base_path }}/{{ actual_backup_name }}/gitea-app.ini" ]; then
cd {{ gitea_stack_path }}
docker compose exec -T {{ gitea_container_name }} sh -c "cat > /data/gitea/conf/app.ini" < "{{ backup_base_path }}/{{ actual_backup_name }}/gitea-app.ini"
docker compose restart {{ gitea_container_name }}
echo "app.ini restored and Gitea restarted"
else
echo "No app.ini backup found, using default configuration"
fi
when: not skip_backup
register: gitea_app_ini_restore
changed_when: false
failed_when: false
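# The restore streams the host-side backup into the container via stdin
# ("cat > /data/gitea/conf/app.ini"), avoiding a separate "docker cp" step.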
# ========================================
# 8. VERIFY SERVICE DISCOVERY
# ========================================
- name: Wait for service discovery (Traefik needs time to discover Gitea)
ansible.builtin.pause:
seconds: 15
- name: Check if Gitea is in traefik-public network
ansible.builtin.shell: |
docker network inspect traefik-public --format '{{ '{{' }}range .Containers{{ '}}' }}{{ '{{' }}.Name{{ '}}' }} {{ '{{' }}end{{ '}}' }}' 2>/dev/null | grep -q {{ gitea_container_name }} && echo "YES" || echo "NO"
register: gitea_in_network
changed_when: false
- name: Test direct connection from Traefik to Gitea
ansible.builtin.shell: |
cd {{ traefik_stack_path }}
docker compose exec -T {{ traefik_container_name }} wget -qO- --timeout=5 http://{{ gitea_container_name }}:3000/api/healthz 2>&1 | head -5 || echo "CONNECTION_FAILED"
register: traefik_gitea_direct
changed_when: false
failed_when: false
# ========================================
# 9. FINAL VERIFICATION
# ========================================
- name: Test Gitea via HTTPS (with retries)
ansible.builtin.uri:
url: "{{ gitea_url }}/api/healthz"
method: GET
status_code: [200]
validate_certs: false
timeout: 10
register: gitea_https_test
until: gitea_https_test.status == 200
retries: 20
delay: 3
changed_when: false
failed_when: false
- name: Check SSL certificate status
ansible.builtin.shell: |
cd {{ traefik_stack_path }}
if [ -f acme.json ] && [ -s acme.json ]; then
echo "SSL certificates: PRESENT"
else
echo "SSL certificates: MISSING or EMPTY"
fi
register: ssl_status
changed_when: false
- name: Final status summary
ansible.builtin.debug:
msg: |
================================================================================
REDEPLOYMENT SUMMARY
================================================================================
Traefik:
- Status: {{ (traefik_ready.rc == 0) | ternary('Up', 'Down') }}
- SSL Certificates: {{ ssl_status.stdout }}
Gitea:
- Status: {{ (gitea_ready.rc == 0) | ternary('Up', 'Down') }}
- Health: {% if gitea_health.stdout == 'HEALTHY' %}✅ Healthy{% else %}❌ Not Healthy{% endif %}
- Configuration: {% if 'app.ini restored' in (gitea_app_ini_restore.stdout | default('')) %}✅ Restored from backup{% else %}ℹ️ Using default{% endif %}
Service Discovery:
- Gitea in network: {% if gitea_in_network.stdout == 'YES' %}✅{% else %}❌{% endif %}
- Direct connection: {% if 'CONNECTION_FAILED' not in traefik_gitea_direct.stdout %}✅{% else %}❌{% endif %}
Gitea Accessibility:
{% if gitea_https_test.status == 200 %}
✅ Gitea is reachable via HTTPS (Status: 200)
URL: {{ gitea_url }}
{% else %}
❌ Gitea is NOT reachable via HTTPS (Status: {{ gitea_https_test.status | default('TIMEOUT') }})
Possible causes:
1. SSL certificate is still being generated (wait 2-5 minutes)
2. Service discovery needs more time (wait 1-2 minutes)
3. Network configuration issue
Next steps:
- Wait 2-5 minutes and test again: curl -k {{ gitea_url }}/api/healthz
- Check Traefik logs: cd {{ traefik_stack_path }} && docker compose logs {{ traefik_container_name }} --tail=50
- Check Gitea logs: cd {{ gitea_stack_path }} && docker compose logs {{ gitea_container_name }} --tail=50
{% endif %}
{% if not skip_backup %}
Backup location: {{ backup_base_path }}/{{ actual_backup_name }}
To rollback: ansible-playbook -i inventory/production.yml playbooks/maintenance/rollback-redeploy.yml \
--vault-password-file secrets/.vault_pass \
-e "backup_name={{ actual_backup_name }}"
{% endif %}
================================================================================
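# Example invocation (hypothetical playbook path - adjust to this repo's layout):
#   ansible-playbook -i inventory/production.yml playbooks/maintenance/redeploy-server.yml \
#     --vault-password-file secrets/.vault_pass
#   ansible-playbook -i inventory/production.yml playbooks/maintenance/redeploy-server.yml \
#     --vault-password-file secrets/.vault_pass -e "skip_backup=true"   # skip the safety backup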

@@ -1,236 +0,0 @@
---
# Stabilize Traefik
# Ensures Traefik runs stably, acme.json is correct, and ACME challenges can complete
- name: Stabilize Traefik
hosts: production
gather_facts: yes
become: no
vars:
traefik_stabilize_wait_minutes: 10  # override via -e; a self-referencing default here would cause a recursive templating error
traefik_stabilize_check_interval: 60 # Check every 60 seconds
tasks:
- name: Check if Traefik stack directory exists
ansible.builtin.stat:
path: "{{ traefik_stack_path | default('/home/deploy/deployment/stacks/traefik') }}"
register: traefik_stack_exists
- name: Fail if Traefik stack directory does not exist
ansible.builtin.fail:
msg: "Traefik stack directory not found at {{ traefik_stack_path | default('/home/deploy/deployment/stacks/traefik') }}"
when: not traefik_stack_exists.stat.exists
- name: Fix acme.json permissions first
ansible.builtin.file:
path: "{{ traefik_stack_path | default('/home/deploy/deployment/stacks/traefik') }}/acme.json"
state: file
mode: '0600'
owner: "{{ ansible_user | default('deploy') }}"
group: "{{ ansible_user | default('deploy') }}"
ignore_errors: yes
- name: Ensure Traefik container is running
ansible.builtin.shell: |
cd {{ traefik_stack_path | default('/home/deploy/deployment/stacks/traefik') }}
docker compose up -d traefik
register: traefik_start
changed_when: traefik_start.rc == 0
- name: Wait for Traefik to be ready
ansible.builtin.wait_for:
timeout: 30
delay: 2
changed_when: false
- name: Check Traefik container status
ansible.builtin.shell: |
cd {{ traefik_stack_path | default('/home/deploy/deployment/stacks/traefik') }}
docker compose ps traefik
register: traefik_status
changed_when: false
- name: Display Traefik status
ansible.builtin.debug:
msg: |
================================================================================
Traefik Container Status:
================================================================================
{{ traefik_status.stdout }}
================================================================================
- name: Check Traefik health
ansible.builtin.shell: |
cd {{ traefik_stack_path | default('/home/deploy/deployment/stacks/traefik') }}
docker compose exec -T traefik traefik healthcheck --ping 2>&1 || echo "HEALTH_CHECK_FAILED"
register: traefik_health
changed_when: false
failed_when: false
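# Note: "traefik healthcheck --ping" only succeeds when the ping endpoint is
# enabled in Traefik's static configuration (ping: {} / --ping=true); without
# it the check fails even though the proxy itself may be healthy.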
- name: Display Traefik health check
ansible.builtin.debug:
msg: |
================================================================================
Traefik Health Check:
================================================================================
{% if 'HEALTH_CHECK_FAILED' not in traefik_health.stdout %}
✅ Traefik is healthy
{% else %}
⚠️ Traefik health check failed: {{ traefik_health.stdout }}
{% endif %}
================================================================================
- name: Verify acme.json permissions
ansible.builtin.stat:
path: "{{ traefik_stack_path | default('/home/deploy/deployment/stacks/traefik') }}/acme.json"
register: acme_json_stat
- name: Fix acme.json permissions if needed
ansible.builtin.file:
path: "{{ traefik_stack_path | default('/home/deploy/deployment/stacks/traefik') }}/acme.json"
mode: '0600'
owner: "{{ ansible_user | default('deploy') }}"
group: "{{ ansible_user | default('deploy') }}"
when: acme_json_stat.stat.mode != '0600'  # stat already reports the mode as a string like '0600'
- name: Display acme.json status
ansible.builtin.debug:
msg: |
================================================================================
acme.json Status:
================================================================================
Path: {{ acme_json_stat.stat.path }}
Mode: {{ acme_json_stat.stat.mode }}
{% if acme_json_stat.stat.mode == '0600' %}
✅ acme.json has correct permissions (0600)
{% else %}
⚠️ acme.json permissions need to be fixed
{% endif %}
================================================================================
- name: Check Port 80/443 configuration
ansible.builtin.shell: |
echo "=== Port 80 ==="
ss -tlnp 2>/dev/null | grep ":80 " || netstat -tlnp 2>/dev/null | grep ":80 " || echo "Could not check port 80"
echo ""
echo "=== Port 443 ==="
ss -tlnp 2>/dev/null | grep ":443 " || netstat -tlnp 2>/dev/null | grep ":443 " || echo "Could not check port 443"
register: port_config_check
changed_when: false
- name: Display Port configuration
ansible.builtin.debug:
msg: |
================================================================================
Port configuration (80/443):
================================================================================
{{ port_config_check.stdout }}
================================================================================
- name: Get initial Traefik restart count
ansible.builtin.shell: |
docker inspect traefik --format '{{ '{{' }}.RestartCount{{ '}}' }}' 2>/dev/null || echo "0"
register: initial_restart_count
changed_when: false
- name: Display initial restart count
ansible.builtin.debug:
msg: |
================================================================================
Initial Traefik Restart Count: {{ initial_restart_count.stdout }}
================================================================================
- name: Wait for ACME challenges to complete
ansible.builtin.debug:
msg: |
================================================================================
Waiting for ACME challenges to complete...
================================================================================
Waiting {{ traefik_stabilize_wait_minutes }} minutes, checking every {{ traefik_stabilize_check_interval }} seconds
whether Traefik is running stably with no restarts.
================================================================================
- name: Monitor Traefik stability
ansible.builtin.shell: |
cd {{ traefik_stack_path | default('/home/deploy/deployment/stacks/traefik') }}
docker compose ps traefik --format "{{ '{{' }}.State{{ '}}' }}" | head -1 || echo "UNKNOWN"
register: traefik_state_check
changed_when: false
until: traefik_state_check.stdout == "running"
retries: "{{ (traefik_stabilize_wait_minutes | int * 60 / traefik_stabilize_check_interval) | int }}"
delay: "{{ traefik_stabilize_check_interval }}"
- name: Get final Traefik restart count
ansible.builtin.shell: |
docker inspect traefik --format '{{ '{{' }}.RestartCount{{ '}}' }}' 2>/dev/null || echo "0"
register: final_restart_count
changed_when: false
- name: Check for Traefik restarts during monitoring
ansible.builtin.set_fact:
traefik_restarted: "{{ (final_restart_count.stdout | int) > (initial_restart_count.stdout | int) }}"
- name: Check Traefik logs for ACME errors
ansible.builtin.shell: |
cd {{ traefik_stack_path | default('/home/deploy/deployment/stacks/traefik') }}
docker compose logs traefik --since {{ traefik_stabilize_wait_minutes }}m 2>&1 | grep -i "acme\|challenge\|certificate" | tail -20 || echo "No ACME-related messages in logs"
register: traefik_acme_logs
changed_when: false
- name: Display Traefik ACME logs
ansible.builtin.debug:
msg: |
================================================================================
Traefik ACME logs (last {{ traefik_stabilize_wait_minutes }} minutes):
================================================================================
{{ traefik_acme_logs.stdout }}
================================================================================
- name: Final status check
ansible.builtin.shell: |
cd {{ traefik_stack_path | default('/home/deploy/deployment/stacks/traefik') }}
docker compose ps traefik || echo "Could not get final status"
register: final_status
changed_when: false
- name: Summary
ansible.builtin.debug:
msg: |
================================================================================
SUMMARY - Traefik Stabilization:
================================================================================
Initial Restart Count: {{ initial_restart_count.stdout }}
Final Restart Count: {{ final_restart_count.stdout }}
{% if traefik_restarted %}
⚠️ WARNING: Traefik was restarted during monitoring!
Restart count increased from {{ initial_restart_count.stdout }} to {{ final_restart_count.stdout }}
Next steps:
- Run diagnose-traefik-restarts.yml to find the cause
- Check Docker events and logs for restart reasons
{% else %}
✅ Traefik ran stably during monitoring ({{ traefik_stabilize_wait_minutes }} minutes)
No restarts occurred.
{% endif %}
Final Status: {{ final_status.stdout }}
{% if acme_json_stat.stat.mode == '0600' %}
✅ acme.json has correct permissions
{% else %}
⚠️ acme.json permissions need to be fixed
{% endif %}
Important:
- Traefik must run stably (no frequent restarts)
- Ports 80/443 must point to Traefik
- acme.json must be writable
- ACME challenges need 5-10 minutes to complete
Next steps:
- Check the Traefik logs regularly for ACME errors
- Make sure no auto-restart mechanisms are active
- Keep monitoring Traefik for another {{ traefik_stabilize_wait_minutes }} minutes
================================================================================

@@ -1,73 +0,0 @@
---
# Test Gitea After Connection Pool Fix
- name: Test Gitea After Connection Pool Fix
hosts: production
gather_facts: no
become: no
vars:
gitea_stack_path: "{{ stacks_base_path }}/gitea"
gitea_url: "https://{{ gitea_domain }}"
tasks:
- name: Test Gitea health endpoint
ansible.builtin.uri:
url: "{{ gitea_url }}/api/healthz"
method: GET
validate_certs: false
timeout: 35
register: gitea_test
changed_when: false
- name: Check Gitea logs for connection pool messages
ansible.builtin.shell: |
cd {{ gitea_stack_path }}
docker compose logs gitea --tail 100 | grep -iE "timeout.*authentication|connection.*pool|MAX_OPEN_CONNS|database.*pool" | tail -20 || echo "No connection pool messages found"
register: gitea_logs_check
changed_when: false
failed_when: false
- name: Check Postgres logs for authentication timeouts
ansible.builtin.shell: |
cd {{ gitea_stack_path }}
docker compose logs postgres --tail 50 | grep -iE "timeout.*authentication|authentication.*timeout" | tail -10 || echo "No authentication timeout messages found"
register: postgres_logs_check
changed_when: false
failed_when: false
- name: Display test results
ansible.builtin.debug:
msg: |
================================================================================
GITEA CONNECTION POOL FIX - TEST RESULTS
================================================================================
Health Check Result:
- Status: {{ gitea_test.status | default('TIMEOUT') }}
- Response Time: {{ gitea_test.elapsed | default('N/A') }}s
{% if gitea_test.status | default(0) == 200 %}
✅ Gitea is reachable
{% else %}
❌ Gitea returned status {{ gitea_test.status | default('TIMEOUT') }}
{% endif %}
Gitea Logs (Connection Pool):
{{ gitea_logs_check.stdout }}
Postgres Logs (Authentication Timeouts):
{{ postgres_logs_check.stdout }}
================================================================================
INTERPRETATION:
================================================================================
{% if ((gitea_logs_check.stdout | lower) is search('timeout.*authentication')) or ((postgres_logs_check.stdout | lower) is search('timeout.*authentication')) %}
⚠️ Authentication timeout messages still present
→ Connection pool settings may need further tuning
→ Consider increasing MAX_OPEN_CONNS or authentication_timeout
{% else %}
✅ No authentication timeout messages found
→ Connection pool fix appears to be working
{% endif %}
================================================================================

@@ -1,82 +0,0 @@
---
# Ansible Playbook: Update Gitea Traefik Service with Current IP
#
# ⚠️ DEPRECATED: This playbook is no longer needed since Traefik runs in bridge network mode.
# Service discovery via Docker labels works reliably in bridge mode, so manual IP updates
# are not required. This playbook is kept for reference only.
#
# Purpose: Update Traefik dynamic config with current Gitea container IP
# Usage:
# ansible-playbook -i inventory/production.yml playbooks/update-gitea-traefik-service.yml \
# --vault-password-file secrets/.vault_pass
- name: Update Gitea Traefik Service with Current IP
hosts: production
vars:
traefik_stack_path: "{{ stacks_base_path }}/traefik"
gitea_url: "https://{{ gitea_domain }}"
tasks:
- name: Warn that this playbook is deprecated
ansible.builtin.fail:
msg: |
⚠️ This playbook is DEPRECATED and should not be used.
Traefik service discovery via Docker labels works reliably in bridge mode.
If you really need to run this, set traefik_auto_restart=true explicitly.
when: traefik_auto_restart | default(false) | bool == false
- name: Get current Gitea container IP in traefik-public network
shell: |
docker inspect gitea | grep -A 10 'traefik-public' | grep IPAddress | head -1 | awk '{print $2}' | tr -d '",'
register: gitea_ip
changed_when: false
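# A sketch of a more robust alternative using docker's Go-template output
# (assumed equivalent; avoids the grep/awk pipeline):
#   docker inspect -f '{{(index .NetworkSettings.Networks "traefik-public").IPAddress}}' gitea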
- name: Display Gitea IP
debug:
msg: "Gitea container IP: {{ gitea_ip.stdout }}"
- name: Create Gitea service configuration with current IP
copy:
dest: "{{ traefik_stack_path }}/dynamic/gitea-service.yml"
content: |
http:
services:
gitea:
loadBalancer:
servers:
- url: http://{{ gitea_ip.stdout }}:3000
mode: '0644'
- name: Restart Traefik to load new configuration
shell: |
docker compose -f {{ traefik_stack_path }}/docker-compose.yml restart traefik
when: traefik_auto_restart | default(false) | bool
register: traefik_restart
changed_when: traefik_restart.rc == 0
- name: Wait for Traefik to be ready
pause:
seconds: 10
when: traefik_restart.changed | default(false) | bool
- name: Test Gitea via Traefik
uri:
url: "{{ gitea_url }}/api/healthz"
method: GET
status_code: [200]
validate_certs: false
timeout: 10
register: final_test
retries: 5
delay: 2
changed_when: false
- name: Display result
debug:
msg: |
Gitea-Traefik connection:
- Gitea IP: {{ gitea_ip.stdout }}
- Via Traefik: {{ 'OK' if final_test.status == 200 else 'FAILED' }}
Note: This is a temporary fix. The IP will need to be updated if the container restarts.

@@ -1,143 +0,0 @@
---
# Verify Traefik Restart Loop Fix
# Checks whether the changes (traefik_auto_restart: false) fix the restart loops
- name: Verify Traefik Restart Loop Fix
hosts: production
gather_facts: yes
become: no
vars:
traefik_stack_path: "{{ stacks_base_path }}/traefik"
monitor_duration_minutes: 10  # monitor for 10 minutes
tasks:
- name: Display current configuration
ansible.builtin.debug:
msg: |
================================================================================
TRAEFIK RESTART LOOP FIX - VERIFICATION:
================================================================================
Current configuration:
- traefik_auto_restart: {{ traefik_auto_restart | default('NOT SET') }}
- traefik_ssl_restart: {{ traefik_ssl_restart | default('NOT SET') }}
- gitea_auto_restart: {{ gitea_auto_restart | default('NOT SET') }}
Expected behavior:
- Traefik should NOT restart automatically after config deployment
- Traefik should NOT restart automatically during SSL setup
- Gitea should NOT restart automatically on healthcheck failures
Monitoring: {{ monitor_duration_minutes }} minutes
================================================================================
- name: Get initial Traefik status
ansible.builtin.shell: |
docker inspect traefik --format '{{ '{{' }}.State.Status{{ '}}' }}|{{ '{{' }}.State.StartedAt{{ '}}' }}|{{ '{{' }}.RestartCount{{ '}}' }}' 2>/dev/null || echo "UNKNOWN"
register: initial_traefik_status
changed_when: false
- name: Get initial Gitea status
ansible.builtin.shell: |
docker inspect gitea --format '{{ '{{' }}.State.Status{{ '}}' }}|{{ '{{' }}.State.StartedAt{{ '}}' }}|{{ '{{' }}.RestartCount{{ '}}' }}' 2>/dev/null || echo "UNKNOWN"
register: initial_gitea_status
changed_when: false
- name: Check Traefik logs for recent restarts
ansible.builtin.shell: |
cd {{ traefik_stack_path }}
docker compose logs traefik --since 1h 2>&1 | grep -iE "stopping server gracefully|I have to go" | wc -l
register: recent_restarts
changed_when: false
- name: Wait for monitoring period
ansible.builtin.pause:
minutes: "{{ monitor_duration_minutes }}"
- name: Get final Traefik status
ansible.builtin.shell: |
docker inspect traefik --format '{{ '{{' }}.State.Status{{ '}}' }}|{{ '{{' }}.State.StartedAt{{ '}}' }}|{{ '{{' }}.RestartCount{{ '}}' }}' 2>/dev/null || echo "UNKNOWN"
register: final_traefik_status
changed_when: false
- name: Get final Gitea status
ansible.builtin.shell: |
docker inspect gitea --format '{{ '{{' }}.State.Status{{ '}}' }}|{{ '{{' }}.State.StartedAt{{ '}}' }}|{{ '{{' }}.RestartCount{{ '}}' }}' 2>/dev/null || echo "UNKNOWN"
register: final_gitea_status
changed_when: false
- name: Check Traefik logs for restarts during monitoring
ansible.builtin.shell: |
cd {{ traefik_stack_path }}
docker compose logs traefik --since {{ monitor_duration_minutes }}m 2>&1 | grep -iE "stopping server gracefully|I have to go" || echo "No restarts found"
register: restarts_during_monitoring
changed_when: false
failed_when: false
- name: Test Gitea accessibility (multiple attempts)
ansible.builtin.uri:
url: "https://git.michaelschiemer.de/api/healthz"
method: GET
status_code: [200]
validate_certs: false
timeout: 10
register: gitea_test
until: gitea_test.status == 200
retries: 5
delay: 2
changed_when: false
failed_when: false
- name: Summary
ansible.builtin.debug:
msg: |
================================================================================
VERIFICATION SUMMARY:
================================================================================
Initial Status:
- Traefik: {{ initial_traefik_status.stdout }}
- Gitea: {{ initial_gitea_status.stdout }}
Final Status:
- Traefik: {{ final_traefik_status.stdout }}
- Gitea: {{ final_gitea_status.stdout }}
Restarts during monitoring ({{ monitor_duration_minutes }} minutes):
{% if restarts_during_monitoring.stdout and 'No restarts found' not in restarts_during_monitoring.stdout %}
❌ RESTARTS FOUND:
{{ restarts_during_monitoring.stdout }}
⚠️ PROBLEM: Traefik was stopped during monitoring!
→ The changes have not fully fixed the problem yet
→ Check whether external Ansible playbooks are still running
→ Check whether other automation is stopping Traefik
{% else %}
✅ NO RESTARTS FOUND
Traefik ran stably during the {{ monitor_duration_minutes }}-minute monitoring window!
→ The changes appear to be working
{% endif %}
Gitea Accessibility:
{% if gitea_test.status == 200 %}
✅ Gitea is reachable (Status: 200)
{% else %}
❌ Gitea is NOT reachable (Status: {{ gitea_test.status | default('TIMEOUT') }})
{% endif %}
================================================================================
NEXT STEPS:
================================================================================
{% if restarts_during_monitoring.stdout and 'No restarts found' not in restarts_during_monitoring.stdout %}
1. ❌ Check for external Ansible playbooks that may still be running
2. ❌ Check for CI/CD pipelines that could be restarting Traefik
3. ❌ Run 'find-ansible-automation-source.yml' again
{% else %}
1. ✅ Traefik is running stably - no more automatic restarts
2. ✅ Keep monitoring Traefik for another 1-2 hours to be sure
3. ✅ Test Gitea in the browser: https://git.michaelschiemer.de
{% endif %}
================================================================================

@@ -23,6 +23,10 @@ DOMAIN = {{ gitea_domain }}
HTTP_ADDR = 0.0.0.0
HTTP_PORT = 3000
ROOT_URL = https://{{ gitea_domain }}/
# LOCAL_ROOT_URL for internal access (Runner/Webhooks)
LOCAL_ROOT_URL = http://gitea:3000/
# Trust Traefik proxy (Docker network: 172.18.0.0/16)
PROXY_TRUSTED_PROXIES = 172.18.0.0/16,::1,127.0.0.1
DISABLE_SSH = false
START_SSH_SERVER = false
SSH_DOMAIN = {{ gitea_domain }}
@@ -68,7 +72,11 @@ HOST = redis://:{{ redis_password }}@redis:6379/0?pool_size=100&idle_timeout=180
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
[session]
PROVIDER = redis
PROVIDER_CONFIG = network=tcp,addr=redis:6379,password={{ redis_password }},db=0,pool_size=100,idle_timeout=180
# PROVIDER_CONFIG must be a Redis connection string (as per Gitea documentation)
# Format: redis://:password@host:port/db?pool_size=100&idle_timeout=180s
# Using same format as cache HOST and queue CONN_STR for consistency
PROVIDER_CONFIG = redis://:{{ redis_password }}@redis:6379/0?pool_size=100&idle_timeout=180s
SAME_SITE = lax
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; Queue Configuration (Redis)
@@ -82,6 +90,8 @@ CONN_STR = redis://:{{ redis_password }}@redis:6379/0
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
[security]
INSTALL_LOCK = true
# Cookie security (only if ROOT_URL is https)
COOKIE_SECURE = true
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; Service Configuration

@@ -218,3 +218,4 @@ ansible-playbook -i inventory/production.yml \

@@ -37,8 +37,16 @@ services:
- "traefik.http.routers.gitea.priority=100"
# Service configuration (Docker provider uses port, not url)
- "traefik.http.services.gitea.loadbalancer.server.port=3000"
# Middleware chain (removed temporarily to test if it causes issues)
# - "traefik.http.routers.gitea.middlewares=security-headers-global@file,gzip-compression@file"
# ServersTransport for longer timeouts (prevents 504 for SSE/Long-Polling like /user/events)
# Temporarily removed to test if this is causing the service discovery issue
# - "traefik.http.services.gitea.loadbalancer.serversTransport=gitea-transport@docker"
# - "traefik.http.serverstransports.gitea-transport.forwardingtimeouts.dialtimeout=10s"
# - "traefik.http.serverstransports.gitea-transport.forwardingtimeouts.responseheadertimeout=120s"
# - "traefik.http.serverstransports.gitea-transport.forwardingtimeouts.idleconntimeout=180s"
# - "traefik.http.serverstransports.gitea-transport.maxidleconnsperhost=100"
# X-Forwarded-Proto header (helps with redirects/cookies)
- "traefik.http.middlewares.gitea-headers.headers.customrequestheaders.X-Forwarded-Proto=https"
- "traefik.http.routers.gitea.middlewares=gitea-headers@docker"
# Explicitly reference the service (like MinIO does)
- "traefik.http.routers.gitea.service=gitea"
healthcheck: