The previous regex was removing port numbers from registry URLs.
Now using sed to only remove the image name part after the slash,
preserving the full registry URL including port (e.g. git.michaelschiemer.de:5000)
Registry Login Fixes:
- Filter out service names (minio, redis) from registry URL extraction
- Only recognize actual registry URLs (with TLD or port)
- Preserve port numbers in registry URLs (e.g. git.michaelschiemer.de:5000)
- Better error messages for failed logins
Traefik Restart Loop Prevention:
- Set traefik_auto_restart default to false in traefik role
- Add traefik_auto_restart, traefik_ssl_restart, gitea_auto_restart to staging vars
- Add guard to fix-gitea-traefik-connection.yml restart task
- Add guard and deprecation warning to update-gitea-traefik-service.yml
This ensures that:
- CI/CD pipelines won't cause Traefik restart loops
- Staging environment uses same safe defaults as production
- Deprecated playbooks fail by default unless explicitly enabled
- Only actual Docker registries are used for login, not service names
- Extract actual registry URLs from docker-compose files
- Login to all registries found in compose files (e.g. git.michaelschiemer.de:5000)
- This fixes the 'no basic auth credentials' error when pulling images
- The playbook now automatically detects which registry is used in compose files
- Falls back to docker_registry variable if no registry found in compose files
- Add traefik_auto_restart check to fix-gitea-timeouts.yml
- Add traefik_auto_restart check to fix-gitea-ssl-routing.yml
- Add traefik_auto_restart check to fix-gitea-complete.yml
- Set traefik_auto_restart=false in all Gitea workflow Ansible calls
- Set gitea_auto_restart=false in all Gitea workflow Ansible calls
- Add redeploy-traefik-gitea.yml playbook for clean redeployment
This prevents CI/CD pipelines from causing Traefik restart loops by
ensuring all remediation playbooks respect the traefik_auto_restart
flag, which is set to false in group_vars/production/vars.yml.
- Set traefik_auto_restart: false in group_vars to prevent automatic restarts after config deployment
- Set traefik_ssl_restart: false to prevent automatic restarts during SSL certificate setup
- Set gitea_auto_restart: false to prevent automatic restarts when healthcheck fails
- Modify traefik/tasks/ssl.yml to only restart if explicitly requested or acme.json was created
- Modify traefik/tasks/config.yml to respect traefik_auto_restart flag
- Modify gitea/tasks/restart.yml to respect gitea_auto_restart flag
- Add verify-traefik-fix.yml playbook to monitor Traefik stability
This fixes the issue where Traefik was restarting every minute due to
automatic restart mechanisms triggered by config deployments and health checks.
The restart loops caused 504 Gateway Timeouts for Gitea and other services.
Fixes: Traefik restart loop causing service unavailability
- Add deploy-traefik-config.yml to copy updated config files to server
- Deploys docker-compose.yml and traefik.yml
- Shows deployment status and next steps
- Required before restarting Traefik with new configuration
- Change health check to use docker exec traefik healthcheck
- HTTP ping endpoint requires BasicAuth (401), internal check is more reliable
- Improves health check accuracy in restart-traefik.yml playbook
- Add restart-traefik.yml playbook to restart Traefik container
- Verify Traefik health after restart
- Check for ACME challenge errors in logs
- Display status summary with next steps
- Useful after Traefik configuration changes
- Add explicit exclusion of /.well-known/acme-challenge from catch-all redirect
- Ensures ACME challenges are never redirected to HTTPS
- Traefik handles ACME challenges automatically, but explicit exclusion is safer
- Remove explicit ACME challenge router that had no service defined
- Traefik handles ACME challenges automatically when httpChallenge.entryPoint is set
- The router was interfering with automatic challenge handling
- Fixes 'Cannot retrieve the ACME challenge' errors in Traefik logs
- Increase fetch_interval from 2s to 10s to reduce load on Gitea
- Increase fetch_timeout from 5s to 30s for better error handling
- Add documentation about runner overloading Gitea and how to fix it
- Prevents 504 errors caused by runner bombarding Gitea with requests
- Add retry logic (5 retries, 10s delay) to git clone and update tasks
- Handle 504 Gateway Timeout errors from Gitea gracefully
- Fail with clear error message if all retries are exhausted
- Prevents workflow failures due to temporary Gitea unavailability
- Add FIX_RUNNER_CONFIG.md with manual steps to re-register runner
- Add fix-gitea-runner-config.yml to diagnose runner issues
- Add register-gitea-runner.yml to re-register runner via Ansible
- Fixes issue where runner falls back to GitHub on 504 errors
- Add fix-gitea-runner-config.yml to diagnose runner configuration issues
- Add register-gitea-runner.yml to re-register runner with correct Gitea URL
- Check for GitHub URLs in runner configuration (should only use git.michaelschiemer.de)
- Verify .env file has correct GITEA_INSTANCE_URL
- Fixes 504 timeouts caused by runner trying to connect to GitHub fallback
- Check Gitea container status
- Check Gitea health endpoint
- Display container logs
- Restart container if unhealthy or not running
- Wait for Gitea to be ready after restart
- Display comprehensive status summary
- Helps diagnose and fix 504 Gateway Timeout issues
- Change Docker daemon wait from TCP port 2375 to Unix socket /var/run/docker.sock
- Add Docker registry login task before docker compose up
- Ensures authentication is available when pulling images
- Fixes 'no basic auth credentials' error during image pull
- Replace append() with list concatenation (+ operator)
- Use combine filter instead of update() method
- Avoids 'unsafe append' error with AnsibleLazyTemplateList
- All operations are now immutable and safe
- Replace Python heredoc in shell script with native Ansible modules
- Use slurp to read existing daemon.json
- Use set_fact and copy modules to update configuration
- Fixes YAML parsing error with heredoc syntax
- More idempotent and Ansible-native approach
- Add Docker daemon configuration to use HTTP for git.michaelschiemer.de:5000 registry
- Configure insecure-registries in /etc/docker/daemon.json
- Add GIT_BRANCH environment variable (staging for staging, main for production)
- Set default GIT_REPOSITORY_URL if not provided
- Fixes 'http: server gave HTTP response to HTTPS client' error
- Fixes missing GIT_BRANCH variable warnings
- Change registry_accessible to string comparison ('true'/'false') instead of bool
- Fix 'argument of type bool is not iterable' error in when conditions
- Set correct owner/group for .env file (ansible_user instead of root)
- Fixes 'permission denied' error when docker compose reads .env file
- Fix 'argument of type bool is not iterable' error in image pull task
- Check if .env file exists before docker compose up
- Create minimal .env file if it doesn't exist with required variables
- Load secrets from vault file if available
- Set database and MinIO variables from vault or defaults
- Pass environment variables to docker compose command
- Fixes missing MINIO_ROOT_USER, DB_USERNAME, DB_PASSWORD, SECRETS_DIR errors
- Add registry_accessible flag to safely check registry status
- Fix 'argument of type bool is not iterable' error in when conditions
- Only pull image if registry is accessible
- Add ignore_errors to image pull task to prevent failures
- Improves handling of registry connectivity issues
- Fix recursive loop in app_name variable
- Set app_name and deploy_image using set_fact tasks
- Replace application_stack_dest with application_code_dest (consistent with other playbooks)
- Change registry URL from HTTP to HTTPS
- Add validate_certs: no for registry accessibility check
- Fixes 'Recursive loop detected' error in image deployment
- Remove container start logic - containers should be started by deploy-image.yml
- Add clear error message if container is not running
- Provides helpful instructions for manual container start if needed
- Check if container is running before executing composer
- Start container if not running
- Display detailed error output for debugging
- Fixes composer install failures when container is not running
- Change from stacks path to application code directory (/home/deploy/michaelschiemer/current)
- docker-compose files are in the application root, not in deployment/stacks
- Fixes 'no such file or directory' error for docker-compose.base.yml
- Change stacks_base_path_default from /home/deploy to /home/deploy/deployment/stacks
- Matches actual server directory structure where stacks are located
- Check if destination directory exists separately from git repo check
- Remove directory if it exists but is not a git repository
- Prevents 'destination path already exists' error during clone
- ansible.builtin.git no longer supports owner and group parameters
- Set ownership in separate file task after git operations
- Fixes 'Unsupported parameters' error
- Use git_repo_url instead of git_repository_url in tasks
- Set git_repo_url based on whether git_repository_url is provided
- This completely avoids the recursive loop issue
- Change git_repository_url to use git_repository_url_default instead of self-reference
- Fixes 'Recursive loop detected in template' error in Ansible playbook
- Test commit to verify that workflow can now:
- Use php-ci image with Ansible
- Use ANSIBLE_VAULT_PASSWORD secret for vault decryption
- Successfully deploy to staging
- Add build-ci-image-production.sh script for building CI images on production
- Add BUILD_ON_PRODUCTION.md documentation
- Fix Dockerfile to handle optional PECL extensions for PHP 8.5 RC
This fixes the issue where Gitea workflows fail with:
'Error response from daemon: pull access denied for php-ci'
- Add gitea-service.yml with proper timeout configuration
- Service definition required for Traefik to route to Gitea
- Replaces old gitea.yml file that was removed
- Change Traefik local HTTP port from 8080 to 8081 (conflict with cadvisor)
- Change Traefik dashboard port to 8093 (conflicts with cadvisor, Hyperion)
- Update Gitea SSH service IP from 172.23.0.2 to 172.23.0.3
- Note: Gitea SSH works directly via Docker port mapping in local dev
- Traefik TCP routing only needed for production (host network mode)