# ML Model Management System - Production Deployment Guide

A complete guide to deploying the ML Model Management System, with all three integrated ML systems, to production.

## Table of Contents

1. [System Overview](#system-overview)
2. [Prerequisites](#prerequisites)
3. [Configuration](#configuration)
4. [Deployment Steps](#deployment-steps)
5. [Monitoring Setup](#monitoring-setup)
6. [Troubleshooting](#troubleshooting)
7. [Maintenance](#maintenance)
8. [Performance Baselines](#performance-baselines)
9. [Security Considerations](#security-considerations)
10. [Support and Escalation](#support-and-escalation)
11. [Appendix](#appendix)

## System Overview

The ML Model Management System provides centralized management for three ML systems:

1. **N+1 Detection** (`n1-detector` - v1.0.0)
   - Detects N+1 query patterns in database operations
   - Alert threshold: 85% accuracy
   - Auto-tuning: confidence_threshold (0.5-0.9)

2. **WAF Behavioral Analysis** (`waf-behavioral` - v1.0.0)
   - Detects malicious behavioral patterns in web requests
   - Alert threshold: 90% accuracy (critical)
   - Auto-tuning: anomaly_threshold (0.5-0.9)

3. **Queue Job Anomaly Detection** (`queue-anomaly` - v1.0.0)
   - Detects anomalous job execution patterns
   - Alert threshold: 80% accuracy
   - Auto-tuning: anomaly_threshold (0.4-0.8)

**Core Components**:
- **ModelRegistry**: Cache-based model version and configuration storage
- **ModelPerformanceMonitor**: Real-time performance metrics tracking
- **AutoTuningEngine**: Automatic threshold optimization via grid search
- **AlertingService**: Log-based alerting for performance issues
- **MLMonitoringScheduler**: Periodic monitoring jobs

## Prerequisites

### System Requirements

- PHP 8.3+
- Redis (for the cache-based model registry)
- Sufficient memory for model tracking (PHP `memory_limit` of at least 256MB)

### Framework Components

Ensure these framework components are initialized:

```php
// Required initializers:
// - ModelManagementInitializer
// - NPlusOneDetectionEngineInitializer
// - MLEnhancedWafLayerInitializer
// - JobAnomalyDetectionInitializer
// - MLMonitoringSchedulerInitializer
```

### Environment Variables

```env
# Model Management
MODEL_REGISTRY_CACHE_TTL=604800 # 7 days
AUTO_TUNING_ENABLED=true

# N+1 Detection
NPLUSONE_ML_ENABLED=true
NPLUSONE_ML_AUTO_REGISTER=true
NPLUSONE_ML_TIMEOUT_MS=5000
NPLUSONE_ML_CONFIDENCE_THRESHOLD=60.0

# WAF Behavioral
WAF_ML_ENABLED=true
WAF_ML_AUTO_REGISTER=true

# Queue Anomaly
QUEUE_ML_ENABLED=true
QUEUE_ML_AUTO_REGISTER=true
JOB_ANOMALY_MIN_CONFIDENCE=0.6
JOB_ANOMALY_THRESHOLD=50
JOB_ANOMALY_ZSCORE_THRESHOLD=3.0
JOB_ANOMALY_IQR_MULTIPLIER=1.5
```

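Before deploying, it can help to confirm that the switches listed above are actually set. The following is a minimal sketch, assuming the variables are exported into the current shell; the function name `check_ml_env` is hypothetical and not part of the framework, and the list of names it checks is only a subset chosen for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not a framework command): prints the name of every
# listed ML variable that is unset or empty, and returns non-zero if any
# are missing.
check_ml_env() {
  local missing=0 name value
  for name in \
    MODEL_REGISTRY_CACHE_TTL AUTO_TUNING_ENABLED \
    NPLUSONE_ML_ENABLED WAF_ML_ENABLED QUEUE_ML_ENABLED; do
    # Indirect lookup of the variable named by $name. The names are fixed
    # literals from the list above, so eval is safe here.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name"
      missing=1
    fi
  done
  return "$missing"
}
```

Calling `check_ml_env` in a deployment script and stopping on a non-zero return is one way to catch an incomplete `.env` early.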
## Configuration

### 1. Model Registry Configuration

The model registry stores its data in the cache, with a default TTL of 7 days:

```php
// In ModelManagementInitializer
$cacheKey = CacheKey::fromString('ml_model_registry');
// Caution: the sample .env sets MODEL_REGISTRY_CACHE_TTL in seconds (604800),
// while this snippet reads the value as days. Keep the unit consistent.
$ttl = Duration::fromDays($environment->getInt('MODEL_REGISTRY_CACHE_TTL', 7));
```

**Production Recommendations**:
- Use Redis as the cache backend (better persistence than the file cache)
- Set the TTL to at least 7 days to maintain model history
- Configure Redis persistence (AOF or RDB) for data durability

### 2. Performance Monitor Configuration

Performance monitoring tracks predictions with sliding windows:

```php
// Window configurations:
// - Short-term: last 100 predictions
// - Medium-term: last 1000 predictions
// - Long-term: all predictions since deployment
```

**Memory Considerations**:
- Each prediction: ~200 bytes
- 1000 predictions: ~200KB
- Periodic cleanup is recommended for long-running systems

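The sliding-window idea and the memory figures above can be sketched in a few lines of shell. This is an illustration only, not the `ModelPerformanceMonitor` implementation: the input format (one `predicted actual` pair per line, oldest first) and both function names are assumptions made for the example.

```bash
#!/usr/bin/env bash
# Accuracy over the most recent N predictions.
# Input (stdin): one "predicted actual" pair per line, oldest first.
window_accuracy() {
  tail -n "$1" | awk '
    $1 == $2 { correct++ }
    END { if (NR > 0) printf "%.4f\n", correct / NR; else print "n/a" }'
}

# Rough memory estimate in KB, using the ~200 bytes per prediction figure
# quoted above (1000 predictions come out at about 200 KB).
estimate_tracking_kb() {
  echo $(( $1 * 200 / 1000 ))
}
```

For example, `window_accuracy 100 < predictions.txt` would correspond to the short-term window, and `window_accuracy 1000` to the medium-term one.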
### 3. Auto-Tuning Configuration

Auto-tuning uses grid search optimization:

```php
// Auto-tuning parameters per model, shown as a plain array for illustration;
// the framework's actual configuration structure may differ.
$autoTuning = [
    // N+1 Detector
    'n1-detector' => [
        'thresholdRange' => [0.5, 0.9],
        'step' => 0.05,
        'timeWindow' => '1 hour',
        'metricToOptimize' => 'f1_score',
    ],
    // WAF Behavioral
    'waf-behavioral' => [
        'thresholdRange' => [0.5, 0.9],
        'step' => 0.05,
        'timeWindow' => '1 hour',
        'metricToOptimize' => 'f1_score',
    ],
    // Queue Anomaly
    'queue-anomaly' => [
        'thresholdRange' => [0.4, 0.8],
        'step' => 0.05,
        'timeWindow' => '1 hour',
        'metricToOptimize' => 'f1_score',
    ],
];
```

**Tuning Recommendations**:
- Auto-apply a new threshold only if it improves the optimized metric by more than 5%
- Run hourly to balance responsiveness and stability
- Monitor threshold changes via logs

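To make the grid search concrete, here is a small sketch of the technique in shell and awk. It is not the `AutoTuningEngine` code: the input format (one `score label` pair per line, label `1` for a true positive case), the function names, and the reading of "improvement > 5%" as a relative improvement are all assumptions for illustration.

```bash
#!/usr/bin/env bash
# Grid search over a decision threshold, maximizing F1.
# Input (stdin): one "score label" pair per line; label is 1 or 0.
# Usage: grid_search_f1 LOW HIGH STEP < predictions.txt
# Prints: "<best_threshold> <best_f1>" (lowest threshold wins ties).
grid_search_f1() {
  awk -v lo="$1" -v hi="$2" -v step="$3" '
    { score[NR] = $1; label[NR] = $2 }
    END {
      steps = int((hi - lo) / step + 0.5)   # integer step count avoids float drift
      best_t = lo; best_f1 = -1
      for (i = 0; i <= steps; i++) {
        t = lo + i * step
        tp = fp = fn = 0
        for (n = 1; n <= NR; n++) {
          predicted = (score[n] >= t)
          if (predicted && label[n] == 1) tp++
          else if (predicted && label[n] == 0) fp++
          else if (!predicted && label[n] == 1) fn++
        }
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when there are no positives
        f1 = (2 * tp + fp + fn > 0) ? 2 * tp / (2 * tp + fp + fn) : 0
        if (f1 > best_f1) { best_f1 = f1; best_t = t }
      }
      printf "%.2f %.4f\n", best_t, best_f1
    }'
}

# Succeeds when CANDIDATE beats CURRENT by more than MIN_PCT percent
# (relative improvement, default 5; CURRENT must be greater than zero).
should_apply() {
  awk -v cur="$1" -v cand="$2" -v min="${3:-5}" \
    'BEGIN { exit !(cur > 0 && (cand - cur) / cur * 100 > min) }'
}
```

With the N+1 detector's settings the call would be `grid_search_f1 0.5 0.9 0.05 < predictions.txt`, and `should_apply 0.80 0.85` succeeds because the relative gain is 6.25%.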
### 4. Monitoring Schedule Configuration

Four periodic jobs run at different frequencies:

```php
// Performance Monitoring - every 5 minutes
//   - Checks current accuracy against thresholds
//   - Sends alerts if accuracy drops below threshold
//   - Logs: 'ml-performance-monitoring'

// Degradation Detection - every 15 minutes
//   - Compares current vs. baseline performance
//   - Detects 5%+ degradation
//   - Logs: 'ml-degradation-detection'

// Auto-Tuning - every hour
//   - Optimizes thresholds via grid search
//   - Auto-applies if improvement > 5%
//   - Logs: 'ml-auto-tuning'

// Registry Cleanup - daily
//   - Cleans up old performance data
//   - Maintains production model list
//   - Logs: 'ml-registry-cleanup'
```

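The degradation check compares a current metric against a baseline. A minimal sketch of that comparison follows, assuming degradation is measured as a relative drop from the baseline; the guide does not say whether the 5% figure is relative or absolute, and both function names are hypothetical.

```bash
#!/usr/bin/env bash
# Relative drop of CURRENT versus BASELINE as a percentage
# (a negative result means the metric improved). BASELINE must be non-zero.
degradation_percent() {
  awk -v base="$1" -v cur="$2" \
    'BEGIN { printf "%.2f\n", (base - cur) / base * 100 }'
}

# Succeeds when the drop reaches MIN_PCT (default 5, matching the job above).
is_degraded() {
  awk -v base="$1" -v cur="$2" -v min="${3:-5}" \
    'BEGIN { exit !((base - cur) / base * 100 >= min) }'
}
```

For instance, `is_degraded 0.80 0.72` succeeds, since accuracy fell by a tenth of its baseline value.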
## Deployment Steps

### Step 1: Pre-Deployment Verification

```bash
# 1. Verify all ML systems are functional
docker exec php php console.php ml:verify-systems

# 2. Run integration tests
docker exec php ./vendor/bin/pest tests/Integration/MachineLearning/

# 3. Check environment configuration
docker exec php php console.php config:check-ml

# 4. Verify cache backend is accessible
docker exec php php console.php cache:health
```

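The four checks above can be chained so that the first failure stops the run. The wrapper below is a sketch, not a shipped script: `run_predeploy_checks` and the `ML_RUNNER` override are hypothetical names, and the commands themselves are the ones listed in this step.

```bash
#!/usr/bin/env bash
# Runs the pre-deployment checks in order and stops at the first failure.
# ML_RUNNER (hypothetical) overrides the "docker exec php" prefix, which is
# handy for dry runs or for running inside the container directly.
run_predeploy_checks() {
  local runner="${ML_RUNNER:-docker exec php}"
  local cmd
  for cmd in \
    "php console.php ml:verify-systems" \
    "./vendor/bin/pest tests/Integration/MachineLearning/" \
    "php console.php config:check-ml" \
    "php console.php cache:health"; do
    echo "==> $cmd"
    # Unquoted on purpose: runner and cmd must split into separate words.
    $runner $cmd || { echo "FAILED: $cmd" >&2; return 1; }
  done
  echo "All pre-deployment checks passed"
}
```

Setting `ML_RUNNER=echo` prints each command instead of executing it, which gives a quick dry run.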
### Step 2: Deploy Core Components

```bash
# 1. Deploy application code (including ML components)
git pull origin main

# 2. Rebuild dependencies if needed
docker exec php composer install --no-dev --optimize-autoloader

# 3. Clear and warm up caches
docker exec php php console.php cache:clear
docker exec php php console.php cache:warmup
```

### Step 3: Initialize Model Registry

```bash
# Register all three models on first deployment
docker exec php php console.php ml:register-all-models

# Expected output:
# - N+1 Detector (n1-detector v1.0.0) registered
# - WAF Behavioral (waf-behavioral v1.0.0) registered
# - Queue Anomaly (queue-anomaly v1.0.0) registered
```

### Step 4: Start Monitoring Scheduler

The monitoring scheduler is automatically initialized during framework startup.

Verify it's running:

```bash
# Check scheduled tasks
docker exec php php console.php scheduler:status

# Expected tasks:
# - ml-performance-monitoring (next run: +5 min)
# - ml-degradation-detection (next run: +15 min)
# - ml-auto-tuning (next run: +1 hour)
# - ml-registry-cleanup (next run: +1 day)
```

### Step 5: Deploy to Production

```bash
# 1. Deploy all models to production environment
docker exec php php console.php ml:deploy-to-production --all

# 2. Verify production deployment
docker exec php php console.php ml:list-production-models

# Expected output:
# - n1-detector v1.0.0 (deployed: YYYY-MM-DD HH:MM:SS)
# - waf-behavioral v1.0.0 (deployed: YYYY-MM-DD HH:MM:SS)
# - queue-anomaly v1.0.0 (deployed: YYYY-MM-DD HH:MM:SS)
```

### Step 6: Verify Monitoring

```bash
# Wait 5 minutes for first monitoring run, then check logs
docker exec php tail -f storage/logs/ml-monitoring.log

# Expected log entries:
# [INFO] ML monitoring scheduler initialized (jobs_scheduled: 4, models_monitored: 3)
# [INFO] Performance monitoring executed successfully
# [INFO] N+1 detector status: monitored (accuracy: 0.XX)
# [INFO] WAF behavioral status: monitored (accuracy: 0.XX)
# [INFO] Queue anomaly status: monitored (accuracy: 0.XX)
```

## Monitoring Setup

### 1. Log Monitoring

All ML operations are logged to:
- Main log: `storage/logs/application.log`
- ML-specific: `storage/logs/ml-monitoring.log` (if configured)

**Key log patterns to monitor**:

```bash
# Performance warnings
grep "Performance Warning" storage/logs/*.log

# Degradation alerts
grep "Performance Degraded" storage/logs/*.log

# Auto-tuning applications
grep "auto-tuned" storage/logs/*.log

# Monitoring failures
grep "monitoring failed" storage/logs/*.log
```

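For a quick overview, the four patterns above can be counted in one pass. This is a small sketch using standard `grep`; the function name `summarize_ml_log` is hypothetical.

```bash
#!/usr/bin/env bash
# Prints how many lines match each key log pattern across the given files.
# Usage: summarize_ml_log storage/logs/*.log
summarize_ml_log() {
  local pattern count
  for pattern in "Performance Warning" "Performance Degraded" \
                 "auto-tuned" "monitoring failed"; do
    # grep -c exits non-zero when the count is 0, so "|| true" keeps the
    # loop going while still capturing the printed count.
    count=$(cat -- "$@" | grep -cF -- "$pattern" || true)
    printf '%s: %s\n' "$pattern" "$count"
  done
}
```

Each output line has the form `<pattern>: <count>`, which is easy to feed into a dashboard or a cron email.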
### 2. Metrics Dashboard

Create a dashboard to track:

**N+1 Detector Metrics**:
```
- Accuracy (target: >85%)
- Total predictions per day
- False positive rate
- Average confidence score
```

**WAF Behavioral Metrics**:
```
- Accuracy (target: >90%)
- Total requests analyzed
- Blocked requests ratio
- Detection latency
```

**Queue Anomaly Metrics**:
```
- Accuracy (target: >80%)
- Total jobs analyzed
- Anomaly detection rate
- Average processing time
```

**System Metrics**:
```
- Model registry cache hit rate
- Auto-tuning improvement rate
- Alert frequency
- Scheduler job success rate
```

### 3. Alerting Rules

Configure alerts for:

```yaml
# Critical Alerts (immediate action)
- WAF accuracy < 90%
- WAF performance degraded > 5%
- Scheduler job failures > 3 consecutive

# Warning Alerts (investigate within 1 hour)
- N+1 accuracy < 85%
- Queue anomaly accuracy < 80%
- Auto-tuning improvements < 1% (indicates stagnation)
- Model registry cache misses > 10%

# Info Alerts (review during business hours)
- Auto-tuning applied successfully
- Model configuration updated
- Registry cleanup completed
```

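The accuracy rules above map each model to a threshold and a severity. A minimal sketch of that mapping follows, assuming accuracy is passed as a fraction between 0 and 1; the function name `accuracy_alert_level` is hypothetical and this is not the `AlertingService` implementation.

```bash
#!/usr/bin/env bash
# Prints "critical", "warning" or "ok" for a model's current accuracy,
# using the thresholds and severities from the alerting rules above.
# Usage: accuracy_alert_level MODEL ACCURACY   (ACCURACY as a fraction, e.g. 0.87)
accuracy_alert_level() {
  local model="$1" accuracy="$2" threshold severity
  case "$model" in
    waf-behavioral) threshold=0.90; severity=critical ;;
    n1-detector)    threshold=0.85; severity=warning ;;
    queue-anomaly)  threshold=0.80; severity=warning ;;
    *) echo "unknown model: $model" >&2; return 2 ;;
  esac
  # awk handles the floating-point comparison that plain shell tests cannot.
  if awk -v a="$accuracy" -v t="$threshold" 'BEGIN { exit !(a < t) }'; then
    echo "$severity"
  else
    echo "ok"
  fi
}
```

For example, `accuracy_alert_level waf-behavioral 0.89` reports a critical alert, while the same accuracy for `n1-detector` is fine.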
### 4. Performance Monitoring Commands

```bash
# Get current performance metrics for all models
docker exec php php console.php ml:performance-summary

# Check for degradation
docker exec php php console.php ml:check-degradation

# View auto-tuning history
docker exec php php console.php ml:auto-tuning-history

# Get model configuration
docker exec php php console.php ml:show-config n1-detector
docker exec php php console.php ml:show-config waf-behavioral
docker exec php php console.php ml:show-config queue-anomaly
```

## Troubleshooting

### Issue: Model Not Registered

**Symptoms**:
```
RuntimeException: Model not registered. Call registerCurrentModel() first.
```

**Solution**:
```bash
# Check if model exists
docker exec php php console.php ml:list-models

# Register missing model
docker exec php php console.php ml:register-model n1-detector
docker exec php php console.php ml:register-model waf-behavioral
docker exec php php console.php ml:register-model queue-anomaly

# Or register all at once
docker exec php php console.php ml:register-all-models
```

### Issue: Low Accuracy Alerts

**Symptoms**:
```
[WARNING] N+1 Detector Performance Warning: Accuracy dropped to 78.50%
```

**Investigation Steps**:

1. **Check recent prediction patterns**:
   ```bash
   docker exec php php console.php ml:performance-details n1-detector --recent 100
   ```

2. **Analyze false positives/negatives**:
   ```bash
   docker exec php php console.php ml:analyze-errors n1-detector
   ```

3. **Review ground truth data**:
   ```bash
   docker exec php php console.php ml:validate-ground-truth n1-detector
   ```

4. **Possible causes**:
   - Data drift (application behavior changed)
   - Incorrect ground truth labels
   - Model needs retraining
   - Configuration drift (thresholds changed)

**Resolution**:
- If data drift: Consider retraining model
- If ground truth issues: Update labeling process
- If config drift: Revert to known-good configuration
- If temporary: Wait for auto-tuning to adapt

### Issue: Auto-Tuning Not Improving

**Symptoms**:
```
[INFO] Auto-tuning completed (improvement_percent: 0.5%)
```

**Investigation**:

1. **Check optimization history**:
   ```bash
   docker exec php php console.php ml:auto-tuning-history n1-detector --limit 10
   ```

2. **Verify enough data for optimization**:
   ```bash
   docker exec php php console.php ml:data-sufficiency n1-detector
   ```

3. **Possible causes**:
   - Insufficient prediction data (<100 predictions)
   - Already at optimal threshold
   - Metric plateaued (reached model capacity)
   - Time window too short

**Resolution**:
- Wait for more prediction data
- Consider different optimization metric (precision, recall)
- Adjust threshold range or step size
- Increase time window for optimization

### Issue: Performance Degradation Detected

**Symptoms**:
```
[CRITICAL] WAF Behavioral Performance Degraded (degradation_percent: 8.5%)
```

**Immediate Actions**:

1. **Verify it's not a false alarm**:
   ```bash
   docker exec php php console.php ml:verify-degradation waf-behavioral
   ```

2. **Check for system changes**:
   ```bash
   # Review recent deployments
   git log --oneline --since="1 week ago"

   # Check configuration changes
   docker exec php php console.php config:diff --since="1 week ago"
   ```

3. **Rollback if necessary**:
   ```bash
   # Revert to previous model configuration
   docker exec php php console.php ml:rollback-config waf-behavioral

   # Or revert to previous model version
   docker exec php php console.php ml:rollback-version waf-behavioral
   ```

4. **Monitor recovery**:
   ```bash
   # Watch real-time performance (watch needs a terminal, hence -it)
   docker exec -it php watch -n 60 "php console.php ml:performance-summary"
   ```

### Issue: Monitoring Jobs Not Running

**Symptoms**:
```
No monitoring logs in the last 5 minutes
```

**Investigation**:

1. **Check scheduler status**:
   ```bash
   docker exec php php console.php scheduler:status
   ```

2. **Verify scheduler is running**:
   ```bash
   # Check if scheduler daemon is active
   docker exec php ps aux | grep scheduler
   ```

3. **Check for errors**:
   ```bash
   docker exec php grep "scheduler" storage/logs/application.log | tail -20
   ```

**Resolution**:
```bash
# Restart scheduler daemon
docker exec php php console.php scheduler:restart

# Or restart entire application
docker-compose restart php

# Verify monitoring resumes
docker exec php tail -f storage/logs/ml-monitoring.log
```

### Issue: Cache Miss Rate High

**Symptoms**:
```
Model registry cache miss rate: 45% (expected: <10%)
```

**Possible Causes**:
- Cache TTL too short
- Cache eviction due to memory pressure
- Cache backend not persistent

**Resolution**:

1. **Increase cache TTL**:
   ```env
   # .env
   MODEL_REGISTRY_CACHE_TTL=1209600 # 14 days
   ```

2. **Configure Redis persistence**:
   ```redis
   # redis.conf
   appendonly yes
   appendfsync everysec
   ```

3. **Monitor Redis memory**:
   ```bash
   docker exec redis redis-cli INFO memory
   ```

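Redis itself exposes hit and miss counters in the `stats` section of `INFO`, as `keyspace_hits` and `keyspace_misses`. The sketch below turns them into a miss rate. Note the assumption: these counters cover the whole Redis instance, not only model registry keys, so the result is a rough cross-check rather than the registry's own figure. The function name `redis_miss_rate` is hypothetical.

```bash
#!/usr/bin/env bash
# Reads `redis-cli INFO stats` output on stdin and prints the instance-wide
# keyspace miss rate as a percentage ("n/a" if there were no lookups yet).
redis_miss_rate() {
  awk -F: '
    { sub(/\r$/, "") }                      # INFO lines end in CRLF
    $1 == "keyspace_hits"   { hits = $2 }
    $1 == "keyspace_misses" { misses = $2 }
    END {
      total = hits + misses
      if (total > 0) printf "%.2f\n", misses / total * 100
      else print "n/a"
    }'
}
```

A typical call, reusing the container name from the step above, would be `docker exec redis redis-cli INFO stats | redis_miss_rate`.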
## Maintenance

### Daily Tasks

1. **Review performance metrics**:
   ```bash
   docker exec php php console.php ml:daily-summary
   ```

2. **Check alert log**:
   ```bash
   grep -E "(WARNING|CRITICAL)" storage/logs/ml-monitoring.log | tail -20
   ```

3. **Verify scheduler health**:
   ```bash
   docker exec php php console.php scheduler:health
   ```

### Weekly Tasks

1. **Review auto-tuning effectiveness**:
   ```bash
   docker exec php php console.php ml:auto-tuning-report --last-week
   ```

2. **Analyze degradation incidents**:
   ```bash
   docker exec php php console.php ml:degradation-report --last-week
   ```

3. **Check model configuration drift**:
   ```bash
   docker exec php php console.php ml:config-drift-report
   ```

### Monthly Tasks

1. **Performance trend analysis**:
   ```bash
   docker exec php php console.php ml:performance-trends --last-month
   ```

2. **Evaluate model improvement opportunities**:
   ```bash
   docker exec php php console.php ml:improvement-recommendations
   ```

3. **Review and update thresholds**:
   ```bash
   # Review current thresholds
   docker exec php php console.php ml:show-thresholds

   # Update if needed
   docker exec php php console.php ml:update-threshold n1-detector --accuracy 0.88
   ```

4. **Clean up old performance data**:
   ```bash
   docker exec php php console.php ml:cleanup-old-data --older-than 90days
   ```

### Quarterly Tasks

1. **Model retraining evaluation**:
   - Review accumulated prediction data
   - Evaluate if retraining is beneficial
   - Plan retraining if significant drift detected

2. **System capacity planning**:
   - Review prediction volume trends
   - Assess cache and memory requirements
   - Plan for scaling if needed

3. **Alert threshold tuning**:
   - Review alert frequency and relevance
   - Adjust thresholds based on operational experience
   - Update escalation procedures

### Model Version Upgrades

When deploying a new model version:

```bash
# 1. Register new version
docker exec php php console.php ml:register-model n1-detector --version 1.1.0

# 2. Deploy to staging first
docker exec php php console.php ml:deploy n1-detector --version 1.1.0 --env staging

# 3. Monitor staging performance
docker exec php php console.php ml:compare-versions n1-detector 1.0.0 1.1.0

# 4. If successful, deploy to production
docker exec php php console.php ml:deploy n1-detector --version 1.1.0 --env production

# 5. Monitor production rollout (watch needs a terminal, hence -it)
docker exec -it php watch -n 60 "php console.php ml:deployment-status n1-detector"

# 6. Rollback if issues detected
docker exec php php console.php ml:rollback n1-detector --to-version 1.0.0
```

## Performance Baselines

Expected performance metrics after deployment:

### N+1 Detector
```
Accuracy: 88-92%
Precision: 85-90%
Recall: 85-90%
F1-Score: 86-91%
Total Predictions: 1000+ per day
False Positive Rate: <10%
Detection Latency: <50ms
```

### WAF Behavioral
```
Accuracy: 92-96%
Precision: 90-95%
Recall: 90-95%
F1-Score: 91-95%
Total Predictions: 10,000+ per day
False Positive Rate: <5%
Detection Latency: <100ms
```

### Queue Anomaly
```
Accuracy: 82-88%
Precision: 80-85%
Recall: 80-85%
F1-Score: 81-86%
Total Predictions: 500+ per day
False Positive Rate: <15%
Detection Latency: <30ms
```

### System Performance
```
Model Registry Cache Hit Rate: >90%
Scheduler Job Success Rate: >99%
Auto-Tuning Application Rate: 20-30% (1-2 times per week)
Alert Rate: <5 alerts per day (excluding info alerts)
Memory Usage: <200MB for model tracking
```

## Security Considerations

1. **Model Configuration Access**:
   - Restrict access to model configuration updates
   - Log all configuration changes with user attribution
   - Implement approval workflow for production changes

2. **Performance Data Privacy**:
   - Anonymize sensitive data in tracking
   - Implement data retention policies
   - Ensure GDPR compliance for tracked predictions

3. **Alert Notification Security**:
   - Use secure channels for critical alerts
   - Implement alert authentication
   - Rate-limit alert notifications

## Support and Escalation

### L1 Support (Monitoring Team)
- Monitor dashboard and alerts
- Execute basic troubleshooting commands
- Escalate to L2 for unresolved issues

### L2 Support (ML Team)
- Investigate performance degradation
- Adjust thresholds and configurations
- Plan model retraining
- Escalate to L3 for system issues

### L3 Support (Core Team)
- System architecture issues
- Framework integration problems
- Performance optimization
- Long-term capacity planning

## Appendix

### Example Integration Code

See `/examples/n1-model-management-integration.php` for comprehensive integration examples.

### CLI Commands Reference

List of the ML management and scheduler commands used in this guide:

```bash
# Model Registry
ml:list-models                    # List all registered models
ml:register-model                 # Register a specific model
ml:register-all-models            # Register all three models
ml:show-config                    # Show model configuration
ml:update-config                  # Update model configuration
ml:rollback-config                # Rollback configuration
ml:list-production-models         # List production-deployed models

# Performance Monitoring
ml:performance-summary            # Current performance for all models
ml:performance-details            # Detailed performance for specific model
ml:check-degradation              # Check for performance degradation
ml:verify-degradation             # Confirm a degradation alert is not a false alarm
ml:degradation-report             # Report on degradation incidents
ml:performance-trends             # Historical performance trends
ml:daily-summary                  # Daily performance summary
ml:compare-versions               # Compare performance between versions

# Auto-Tuning
ml:auto-tuning-history            # View auto-tuning history
ml:auto-tuning-report             # Generate auto-tuning report
ml:show-thresholds                # Show current thresholds
ml:update-threshold               # Manually update threshold

# Deployment
ml:deploy                         # Deploy model to environment
ml:deploy-to-production           # Deploy to production
ml:rollback                       # Rollback to previous version
ml:rollback-version               # Revert a model to its previous version
ml:deployment-status              # Check deployment status

# Monitoring
ml:verify-systems                 # Verify all ML systems functional
scheduler:status                  # Check scheduler status
scheduler:health                  # Scheduler health check
scheduler:restart                 # Restart scheduler daemon
ml:analyze-errors                 # Analyze prediction errors
ml:validate-ground-truth          # Validate ground truth data

# Maintenance
ml:cleanup-old-data               # Clean up old performance data
ml:data-sufficiency               # Check if enough data for optimization
ml:improvement-recommendations    # Get improvement recommendations
ml:config-drift-report            # Check for configuration drift
```

### Additional Resources

- [ML Model Management Architecture](/docs/ml-model-management-architecture.md)
- [Integration Examples](/examples/n1-model-management-integration.php)
- [Integration Tests](/tests/Integration/MachineLearning/MLModelManagementIntegrationTest.php)
- [Framework Documentation](/docs/claude/)