feat: optimize workflows with repository artifacts and add performance monitoring
Some checks failed
🚀 Build & Deploy Image / Determine Build Necessity (push) Failing after 33s
🚀 Build & Deploy Image / Build Runtime Base Image (push) Has been skipped
🚀 Build & Deploy Image / Build Docker Image (push) Has been skipped
Security Vulnerability Scan / Check for Dependency Changes (push) Successful in 32s
🚀 Build & Deploy Image / Run Tests & Quality Checks (push) Has been skipped
🚀 Build & Deploy Image / Auto-deploy to Staging (push) Has been skipped
🚀 Build & Deploy Image / Auto-deploy to Production (push) Has been skipped
Security Vulnerability Scan / Composer Security Audit (push) Has been skipped
🧊 Warm Docker Build Cache / Refresh Buildx Caches (push) Failing after 11s
📊 Monitor Workflow Performance / Monitor Workflow Performance (push) Failing after 20s
- Use repository artifacts in test and build jobs (reduces 2-3 git clones per run)
- Add comprehensive workflow performance monitoring system
- Add monitoring playbook and Gitea workflow for automated metrics collection
- Add monitoring documentation and scripts

Optimizations:
- Repository artifact caching: changes job uploads repo, test/build jobs download it
- Reduces Gitea load by eliminating redundant git operations
- Faster job starts (artifact download is typically faster than git clone)

Monitoring:
- Script for local workflow metrics collection via Gitea API
- Ansible playbook for server-side system and Gitea metrics
- Automated Gitea workflow that runs every 6 hours
- Tracks workflow durations, system load, Gitea API response times, and more
monitoring/README.md
@@ -0,0 +1,173 @@
# Workflow Performance Monitoring

This directory contains tools and metrics for monitoring workflow performance and system resources.

## Overview

The monitoring system collects metrics on:
- Workflow execution times
- Gitea load and API response times
- System resources (CPU, memory, load)
- Docker container status
- Workflow optimizations

## Components

### 1. Monitoring Script (`scripts/ci/monitor-workflow-performance.sh`)

Local script that collects workflow metrics via the Gitea API.

**Usage:**
```bash
# Credentials and target repository read by the script
export GITEA_TOKEN="your-token"
export GITEA_URL="https://git.michaelschiemer.de"
export GITHUB_REPOSITORY="michael/michaelschiemer"

./scripts/ci/monitor-workflow-performance.sh
```

**Output:**
- JSON file with metrics in `monitoring/workflow-metrics/`
- Console summary

### 2. Ansible Playbook (`deployment/ansible/playbooks/monitor-workflow-performance.yml`)

Server-side monitoring.

**Usage:**
```bash
cd deployment/ansible
ansible-playbook -i inventory/production.yml \
  playbooks/monitor-workflow-performance.yml \
  -e "monitoring_lookback_hours=24"
```

**Collected metrics:**
- System load average
- CPU and memory usage
- Docker container status
- Gitea runner status
- Gitea API response times
- Workflow log entries
- Container resource usage (Gitea, Traefik)

**Output:**
- JSON file on the server: `/home/deploy/monitoring/workflow-metrics/workflow_metrics_<timestamp>.json`
- Console summary
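
One way the "Gitea API response times" metric could be sampled is with curl's `--write-out` timing variable. The following is only a sketch, not the playbook's actual implementation: the function names `seconds_to_ms` and `measure_api_response_ms` are hypothetical, and using the `/api/v1/version` endpoint as the probe target is an assumption.

```bash
# Convert curl's time_total (seconds, e.g. "0.045") to whole milliseconds.
seconds_to_ms() {
  awk -v s="$1" 'BEGIN { printf "%.0f\n", s * 1000 }'
}

# Time a single request against a Gitea instance.
# $1: base URL of the instance (assumption: the version endpoint is reachable).
measure_api_response_ms() {
  local base_url="$1"
  local seconds
  seconds=$(curl --silent --output /dev/null --max-time 10 \
    --write-out '%{time_total}' "${base_url}/api/v1/version") || return 1
  seconds_to_ms "$seconds"
}
```

With `GITEA_URL` exported as in the script usage above, `measure_api_response_ms "$GITEA_URL"` would print a single integer that can be stored as `api_response_time_ms`.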

### 3. Gitea Workflow (`.gitea/workflows/monitor-performance.yml`)

Automated monitoring workflow that runs every 6 hours.

**Manual execution:**
- Via the Gitea UI: Actions → Monitor Workflow Performance → Run workflow
- Optional: adjust the `lookback_hours` parameter

**Output:**
- Artifact with the combined metrics (30-day retention)
- Workflow logs with a summary

## Metrics Format

### System Metrics
```json
{
  "system_metrics": {
    "load_average": "0.5",
    "cpu_usage_percent": "15.2",
    "memory_usage": "2.1G/8.0G",
    "docker_containers": "12",
    "docker_disk_usage": "5.2GB",
    "gitea_data_size": "1.2G"
  }
}
```

### Gitea Metrics
```json
{
  "gitea_metrics": {
    "runner_status": "running",
    "api_response_time_ms": 45,
    "workflow_log_entries_last_24h": 150,
    "container_stats": {
      "cpu_percent": "2.5%",
      "memory_usage": "512MiB / 2GiB",
      "memory_percent": "25.0%"
    }
  }
}
```

### Workflow Metrics
```json
{
  "workflow_metrics": {
    "build_image": {
      "average_duration_seconds": 420,
      "recent_runs": 20
    },
    "manual_deploy": {
      "average_duration_seconds": 180,
      "recent_runs": 10
    }
  }
}
```
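
To illustrate how the per-workflow fields relate to the raw data, the sketch below derives `average_duration_seconds` and `recent_runs` from a list of run durations in seconds. The helper names are hypothetical and the rounding to whole seconds is an assumption; the actual script may compute these values differently.

```bash
# Print the mean of the given durations (seconds), rounded to a whole number.
# Prints 0 when no durations are given.
average_duration_seconds() {
  if [ "$#" -eq 0 ]; then
    echo 0
    return 0
  fi
  printf '%s\n' "$@" | awk '{ sum += $1 } END { printf "%.0f\n", sum / NR }'
}

# Emit one workflow entry in the shape shown above.
workflow_summary_json() {
  local runs="$#"
  local avg
  avg=$(average_duration_seconds "$@")
  printf '{"average_duration_seconds": %s, "recent_runs": %s}\n' "$avg" "$runs"
}
```

For example, `workflow_summary_json 400 420 440` yields an entry with an average of 420 seconds over 3 runs.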

## Optimizations

The monitoring system tracks the following optimizations:

- ✅ **Repository Artifact Caching**: the repository is shared between jobs as an artifact
- ✅ **Helper Script Caching**: CI helper scripts are cached as an artifact
- ✅ **Combined Deployment Playbook**: a single playbook for all deployment steps
- ✅ **Exponential Backoff Health Checks**: retries with a delay that grows after each failed attempt
- ✅ **Concurrency Groups**: prevent parallel deployments
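
As an illustration of the exponential backoff strategy named in the list above, here is a minimal retry helper that doubles the delay after every failed attempt. It is a sketch under assumptions: the function name `retry_with_backoff`, its argument order, and the doubling factor are hypothetical and not taken from the deployment playbook.

```bash
# retry_with_backoff MAX_ATTEMPTS INITIAL_DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS is reached,
# doubling the (integer) delay between attempts.
retry_with_backoff() {
  local max_attempts="$1"
  local delay="$2"
  shift 2
  local attempt=1
  while true; do
    if "$@"; then
      return 0
    fi
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after ${attempt} attempts" >&2
      return 1
    fi
    echo "attempt ${attempt} failed, retrying in ${delay}s" >&2
    sleep "$delay"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
}
```

A health check could then be wrapped as `retry_with_backoff 5 2 curl --fail --silent --output /dev/null "https://example.com/health"`, where the URL is a placeholder. With these arguments the waits between attempts would be 2, 4, 8, and 16 seconds.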

## Interpreting the Metrics

### Good Values
- **Load Average**: < 1.0 (single-core), < number of cores (multi-core)
- **Gitea API Response**: < 100ms
- **Workflow Duration**: < 10 minutes (build), < 5 minutes (deploy)
- **Memory Usage**: < 80% of available memory

### Warning Signs
- **Load Average**: > 2.0 (may indicate overload)
- **Gitea API Response**: > 500ms (may indicate that Gitea is overloaded)
- **Workflow Duration**: > 20 minutes (may indicate inefficiencies)
- **Workflow Log Entries**: > 1000 per hour (may indicate too many workflows)
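
The thresholds above can be turned into a simple check. The sketch below is one possible reading, not part of the monitoring scripts: the function names are hypothetical, and the label `elevated` for readings between the "good" and "warning" limits is an assumption, since the lists do not name that range. The "good" limit is checked first, so on a multi-core host a load below the core count counts as `ok` even if it exceeds 2.0.

```bash
# classify_threshold VALUE GOOD_BELOW WARN_ABOVE -> ok | elevated | warning
classify_threshold() {
  awk -v v="$1" -v good="$2" -v warn="$3" 'BEGIN {
    if (v < good)      print "ok"
    else if (v > warn) print "warning"
    else               print "elevated"
  }'
}

# Load average: good below the core count, warning above 2.0.
classify_load() {
  classify_threshold "$1" "$2" 2.0
}

# Gitea API response time in milliseconds: good < 100, warning > 500.
classify_api_response_ms() {
  classify_threshold "$1" 100 500
}

# Build workflow duration in seconds: good < 10 min, warning > 20 min.
classify_build_duration_seconds() {
  classify_threshold "$1" 600 1200
}
```

On a Linux host, `classify_load "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)"` would classify the current 1-minute load average against the number of available cores.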

## Troubleshooting

### No Metrics Collected
1. Check Gitea API access (token, URL)
2. Check SSH access to the server (for the Ansible playbook)
3. Check whether the monitoring directory exists

### High System Load
1. Check running workflows
2. Check the Gitea runner status
3. Check Docker container resource usage
4. Check whether too many deployments are running in parallel

### Slow Workflows
1. Check whether repository artifacts are being used
2. Check whether helper scripts are being cached
3. Check the Docker build cache
4. Check network latency to the registry

## Next Steps

1. **Establish a baseline**: collect metrics for 1-2 weeks
2. **Analyze trends**: identify long-term trends
3. **Set up alerts**: warn when values become critical
4. **Further optimizations**: based on the collected metrics

## Further Resources

- [Gitea Actions Documentation](https://docs.gitea.com/usage/actions)
- [Ansible Best Practices](https://docs.ansible.com/ansible/latest/user_guide/playbooks_best_practices.html)
- [Docker Monitoring](https://docs.docker.com/config/containers/logging/)