fix(ansible): Prevent Traefik and Gitea restart loops

- Set traefik_auto_restart: false in group_vars to prevent automatic restarts after config deployment
- Set traefik_ssl_restart: false to prevent automatic restarts during SSL certificate setup
- Set gitea_auto_restart: false to prevent automatic restarts when healthcheck fails
- Modify traefik/tasks/ssl.yml to only restart if explicitly requested or acme.json was created
- Modify traefik/tasks/config.yml to respect traefik_auto_restart flag
- Modify gitea/tasks/restart.yml to respect gitea_auto_restart flag
- Add verify-traefik-fix.yml playbook to monitor Traefik stability

This fixes the issue where Traefik was restarting every minute due to
automatic restart mechanisms triggered by config deployments and health checks.
The restart loops caused 504 Gateway Timeouts for Gitea and other services.

Fixes: Traefik restart loop causing service unavailability
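
The modified role files are not shown in this view, so the following is only a minimal sketch of how the flags described above could gate a restart. The group_vars file location, the task names, the register name `acme_file`, and the use of `traefik_stack_path` inside the role are assumptions for illustration, not the commit's actual contents.

```yaml
# group_vars (exact file path is an assumption; it is not shown in this commit view)
traefik_auto_restart: false
traefik_ssl_restart: false
gitea_auto_restart: false
```

```yaml
# Hypothetical excerpt in the spirit of traefik/tasks/config.yml:
# restart only when the flag is explicitly enabled.
- name: Restart Traefik after config deployment (only when enabled)
  ansible.builtin.command:
    cmd: docker compose restart traefik
    chdir: "{{ traefik_stack_path }}"
  when: traefik_auto_restart | default(false) | bool

# Hypothetical excerpt in the spirit of traefik/tasks/ssl.yml:
# with both timestamps preserved, "touch" reports changed only when the
# file is created (or its mode is corrected).
- name: Ensure acme.json exists with restrictive permissions
  ansible.builtin.file:
    path: "{{ traefik_stack_path }}/acme.json"
    state: touch
    mode: "0600"
    modification_time: preserve
    access_time: preserve
  register: acme_file

- name: Restart Traefik after SSL setup (only when requested or acme.json is new)
  ansible.builtin.command:
    cmd: docker compose restart traefik
    chdir: "{{ traefik_stack_path }}"
  when: (traefik_ssl_restart | default(false) | bool) or acme_file.changed
```

Defaulting each flag to false in the `when` condition means a host without the variable set behaves the same as one where restarts are disabled, which matches the intent of avoiding unrequested restarts.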
2025-11-08 23:25:38 +01:00
parent aa9de7173d
commit bb7cf35e54
7 changed files with 645 additions and 0 deletions


@@ -0,0 +1,143 @@
---
# Verify Traefik Restart Loop Fix
# Checks whether the changes (traefik_auto_restart: false) stop the restart loops

- name: Verify Traefik Restart Loop Fix
  hosts: production
  gather_facts: yes
  become: no
  vars:
    traefik_stack_path: "{{ stacks_base_path }}/traefik"
    monitor_duration_minutes: 10  # monitor for 10 minutes
  tasks:
    - name: Display current configuration
      ansible.builtin.debug:
        msg: |
          ================================================================================
          TRAEFIK RESTART LOOP FIX - VERIFICATION:
          ================================================================================
          Current configuration:
          - traefik_auto_restart: {{ traefik_auto_restart | default('NOT SET') }}
          - traefik_ssl_restart: {{ traefik_ssl_restart | default('NOT SET') }}
          - gitea_auto_restart: {{ gitea_auto_restart | default('NOT SET') }}
          Expected behavior:
          - Traefik should NOT restart automatically after config deployment
          - Traefik should NOT restart automatically during SSL setup
          - Gitea should NOT restart automatically when the healthcheck fails
          Monitoring: {{ monitor_duration_minutes }} minutes
          ================================================================================

    # Output format: status|started-at timestamp|restart count
    - name: Get initial Traefik status
      ansible.builtin.shell: |
        docker inspect traefik --format '{{ '{{' }}.State.Status{{ '}}' }}|{{ '{{' }}.State.StartedAt{{ '}}' }}|{{ '{{' }}.RestartCount{{ '}}' }}' 2>/dev/null || echo "UNKNOWN"
      register: initial_traefik_status
      changed_when: false

    - name: Get initial Gitea status
      ansible.builtin.shell: |
        docker inspect gitea --format '{{ '{{' }}.State.Status{{ '}}' }}|{{ '{{' }}.State.StartedAt{{ '}}' }}|{{ '{{' }}.RestartCount{{ '}}' }}' 2>/dev/null || echo "UNKNOWN"
      register: initial_gitea_status
      changed_when: false

    # Counts shutdown messages from the last hour (registered for reference only)
    - name: Check Traefik logs for recent restarts
      ansible.builtin.shell: |
        cd {{ traefik_stack_path }}
        docker compose logs traefik --since 1h 2>&1 | grep -iE "stopping server gracefully|I have to go" | wc -l
      register: recent_restarts
      changed_when: false

    - name: Wait for monitoring period
      ansible.builtin.pause:
        minutes: "{{ monitor_duration_minutes }}"

    - name: Get final Traefik status
      ansible.builtin.shell: |
        docker inspect traefik --format '{{ '{{' }}.State.Status{{ '}}' }}|{{ '{{' }}.State.StartedAt{{ '}}' }}|{{ '{{' }}.RestartCount{{ '}}' }}' 2>/dev/null || echo "UNKNOWN"
      register: final_traefik_status
      changed_when: false

    - name: Get final Gitea status
      ansible.builtin.shell: |
        docker inspect gitea --format '{{ '{{' }}.State.Status{{ '}}' }}|{{ '{{' }}.State.StartedAt{{ '}}' }}|{{ '{{' }}.RestartCount{{ '}}' }}' 2>/dev/null || echo "UNKNOWN"
      register: final_gitea_status
      changed_when: false

    # The fallback text "No restarts found" is what the summary below checks for
    - name: Check Traefik logs for restarts during monitoring
      ansible.builtin.shell: |
        cd {{ traefik_stack_path }}
        docker compose logs traefik --since {{ monitor_duration_minutes }}m 2>&1 | grep -iE "stopping server gracefully|I have to go" || echo "No restarts found"
      register: restarts_during_monitoring
      changed_when: false
      failed_when: false

    - name: Test Gitea accessibility (multiple attempts)
      ansible.builtin.uri:
        url: "https://git.michaelschiemer.de/api/healthz"
        method: GET
        status_code: [200]
        validate_certs: false
        timeout: 10
      register: gitea_test
      until: gitea_test.status == 200
      retries: 5
      delay: 2
      changed_when: false
      failed_when: false

    - name: Summary
      ansible.builtin.debug:
        msg: |
          ================================================================================
          VERIFICATION SUMMARY:
          ================================================================================
          Initial Status:
          - Traefik: {{ initial_traefik_status.stdout }}
          - Gitea: {{ initial_gitea_status.stdout }}
          Final Status:
          - Traefik: {{ final_traefik_status.stdout }}
          - Gitea: {{ final_gitea_status.stdout }}
          Restarts during monitoring ({{ monitor_duration_minutes }} minutes):
          {% if restarts_during_monitoring.stdout and 'No restarts' not in restarts_during_monitoring.stdout %}
          ❌ RESTARTS FOUND:
          {{ restarts_during_monitoring.stdout }}
          ⚠️ PROBLEM: Traefik was stopped during monitoring!
          → The changes have not fully fixed the problem yet
          → Check whether external Ansible playbooks are still running
          → Check whether other automation is stopping Traefik
          {% else %}
          ✅ NO RESTARTS FOUND
          Traefik ran stably for the entire {{ monitor_duration_minutes }}-minute monitoring period!
          → The changes appear to work
          {% endif %}
          Gitea Accessibility:
          {% if gitea_test.status == 200 %}
          ✅ Gitea is reachable (status: 200)
          {% else %}
          ❌ Gitea is not reachable (status: {{ gitea_test.status | default('TIMEOUT') }})
          {% endif %}
          ================================================================================
          NEXT STEPS:
          ================================================================================
          {% if restarts_during_monitoring.stdout and 'No restarts' not in restarts_during_monitoring.stdout %}
          1. ❌ Check for external Ansible playbooks that may still be running
          2. ❌ Check for CI/CD pipelines that might restart Traefik
          3. ❌ Run 'find-ansible-automation-source.yml' again
          {% else %}
          1. ✅ Traefik is running stably - no more automatic restarts
          2. ✅ Keep monitoring Traefik for another 1-2 hours to be sure
          3. ✅ Test Gitea in the browser: https://git.michaelschiemer.de
          {% endif %}
          ================================================================================
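
The summary task only prints the before and after status, so the play succeeds even if Traefik restarted. As an optional sketch that is not part of this commit, a task like the one below could be appended to make the run fail in that case. It assumes the `initial_traefik_status` and `final_traefik_status` variables registered earlier in this playbook and relies on their `status|started-at|restart-count` format, so any restart changes the string.

```yaml
    # Hypothetical extra task: fail the play if Traefik's state string changed
    - name: Assert that Traefik did not restart during monitoring
      ansible.builtin.assert:
        that:
          # "UNKNOWN" is the fallback echoed when docker inspect fails
          - initial_traefik_status.stdout != "UNKNOWN"
          - initial_traefik_status.stdout == final_traefik_status.stdout
        fail_msg: "Traefik state changed: {{ initial_traefik_status.stdout }} -> {{ final_traefik_status.stdout }}"
        success_msg: "Traefik state unchanged during monitoring"
```

Comparing the whole string rather than parsing out the restart count keeps the check simple, at the cost of also failing when only the container status differs.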