ML Model Management System

Comprehensive system for ML model management, versioning, A/B testing, performance monitoring, and auto-tuning.

Overview

The ML Model Management System provides a complete solution for managing machine learning models in production:

  • Model Registry: centralized version management for all ML models
  • A/B Testing: traffic splitting and statistical comparison between model versions
  • Performance Monitoring: real-time accuracy tracking and drift detection
  • Auto-Tuning: automatic threshold and hyperparameter optimization

Architecture

┌─────────────────────┐
│   Model Registry    │  ← Central model management
└─────────────────────┘
         │
         ├─── ModelMetadata (Value Object)
         ├─── CacheModelRegistry (Production)
         └─── InMemoryModelRegistry (Testing)

┌─────────────────────┐
│  A/B Testing        │  ← Traffic splitting & comparison
└─────────────────────┘
         │
         ├─── ABTestConfig (Value Object)
         ├─── ABTestResult (Value Object)
         └─── ABTestingService

┌─────────────────────┐
│ Performance Monitor │  ← Real-Time Tracking
└─────────────────────┘
         │
         ├─── ModelPerformanceMonitor
         ├─── PerformanceStorage
         └─── AlertingService

┌─────────────────────┐
│  Auto-Tuning        │  ← Automatic optimization
└─────────────────────┘
         │
         └─── AutoTuningEngine

Components

1. Model Registry

Central management of ML model metadata with versioning.

Core Value Objects:

ModelMetadata

final readonly class ModelMetadata
{
    public function __construct(
        public string $modelName,
        public ModelType $modelType,
        public Version $version,  // Framework Core Version
        public array $configuration = [],
        public array $performanceMetrics = [],
        public Timestamp $createdAt = new Timestamp(),
        public ?Timestamp $deployedAt = null,
        public ?string $environment = null,
        public array $metadata = []
    ) {}
}

Factory Methods:

// For N+1 detection
$metadata = ModelMetadata::forN1Detector(
    version: Version::fromString('1.0.0'),
    configuration: ['threshold' => 0.7]
);

// For WAF behavioral analysis
$metadata = ModelMetadata::forWafBehavioral(
    version: Version::fromString('2.0.0'),
    configuration: ['window_size' => 100]
);

// For queue job anomaly detection
$metadata = ModelMetadata::forQueueAnomaly(
    version: Version::fromString('1.2.0'),
    configuration: ['min_cluster_size' => 3]
);

ModelType

enum ModelType: string
{
    case SUPERVISED = 'supervised';
    case UNSUPERVISED = 'unsupervised';
    case SEMI_SUPERVISED = 'semi_supervised';
    case REINFORCEMENT = 'reinforcement';
}
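
The type can also be used to query the registry. A minimal sketch, assuming a $registry instance of the ModelRegistry interface shown below:

// Sketch: list every registered unsupervised model by name
foreach ($registry->getByType(ModelType::UNSUPERVISED) as $model) {
    echo $model->modelName . PHP_EOL;
}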

Registry Interface:

interface ModelRegistry
{
    // CRUD Operations
    public function register(ModelMetadata $metadata): void;
    public function get(string $modelName, Version $version): ?ModelMetadata;
    public function getLatest(string $modelName): ?ModelMetadata;
    public function update(ModelMetadata $metadata): void;
    public function delete(string $modelName, Version $version): bool;

    // Querying
    public function getAll(string $modelName): array;
    public function getByType(ModelType $type): array;
    public function getByEnvironment(string $environment): array;
    public function getProductionModels(): array;

    // Utilities
    public function exists(string $modelName, Version $version): bool;
    public function getAllModelNames(): array;
    public function getVersionCount(string $modelName): int;
}

Usage:

use App\Framework\MachineLearning\ModelManagement\ModelRegistry;
use App\Framework\MachineLearning\ModelManagement\ValueObjects\ModelMetadata;
use App\Framework\Core\ValueObjects\Version;

// 1. Register a model
$metadata = ModelMetadata::forN1Detector(
    version: Version::fromString('1.0.0'),
    configuration: ['threshold' => 0.7, 'window_size' => 100]
);

$metadata = $metadata->withPerformanceMetrics([
    'accuracy' => 0.92,
    'precision' => 0.89,
    'recall' => 0.88,
    'f1_score' => 0.885,
]);

$registry->register($metadata);

// 2. Retrieve a model
$model = $registry->get('n1-detector', Version::fromString('1.0.0'));

// 3. List all versions
$versions = $registry->getAll('n1-detector');

// 4. Production models
$productionModels = $registry->getProductionModels();

2. A/B Testing Service

Traffic splitting and statistical comparison between model versions.

Value Objects:

ABTestConfig

final readonly class ABTestConfig
{
    public function __construct(
        public string $modelName,
        public Version $versionA,       // Control (Baseline)
        public Version $versionB,       // Treatment (New Version)
        public float $trafficSplitA = 0.5,  // 0.0-1.0
        public string $primaryMetric = 'accuracy',
        public float $minimumImprovement = 0.05,
        public float $significanceLevel = 0.05
    ) {}
}

Factory Methods:

// Standard 50/50 split
$config = ABTestConfig::create(
    modelName: 'n1-detector',
    versionA: Version::fromString('1.0.0'),
    versionB: Version::fromString('1.1.0'),
    trafficSplit: 0.5
);

// Gradual rollout (10% to new version)
$config = ABTestConfig::forGradualRollout(
    modelName: 'n1-detector',
    currentVersion: Version::fromString('1.0.0'),
    newVersion: Version::fromString('1.1.0')
);

// Champion/Challenger (80/20 split)
$config = ABTestConfig::forChallenger(
    modelName: 'n1-detector',
    champion: Version::fromString('1.0.0'),
    challenger: Version::fromString('1.1.0')
);

ABTestResult

final readonly class ABTestResult
{
    public function __construct(
        public ABTestConfig $config,
        public ModelMetadata $metadataA,
        public ModelMetadata $metadataB,
        public array $metricsDifference,
        public string $winner,  // 'A', 'B', or 'tie'
        public bool $isStatisticallySignificant,
        public string $recommendation
    ) {}
}

Service Methods:

// 1. Traffic Routing
$selectedVersion = $abTesting->selectVersion($config);

// 2. Model Comparison
$result = $abTesting->compareModels($config, $metadataA, $metadataB);

// 3. Automated Test
$result = $abTesting->runTest($config);

// 4. Gradual Rollout Plan
$plan = $abTesting->generateRolloutPlan(steps: 5);
// Returns: [1 => 0.2, 2 => 0.4, 3 => 0.6, 4 => 0.8, 5 => 1.0]

// 5. Required Sample Size
$sampleSize = $abTesting->calculateRequiredSampleSize(
    confidenceLevel: 0.95,
    marginOfError: 0.05
);

Usage:

use App\Framework\MachineLearning\ModelManagement\ABTestingService;
use App\Framework\MachineLearning\ModelManagement\ValueObjects\ABTestConfig;

// 1. Configure the test
$config = ABTestConfig::create(
    modelName: 'n1-detector',
    versionA: Version::fromString('1.0.0'),
    versionB: Version::fromString('1.1.0'),
    trafficSplit: 0.5
);

// 2. Run the test
$result = $abTesting->runTest($config);

// 3. Analyze the results
if ($result->shouldDeployVersionB()) {
    echo "Deploy Version B - {$result->recommendation}";
    echo "Improvement: {$result->getPrimaryMetricImprovementPercent()}%";

    // Gradual Rollout
    $plan = $abTesting->generateRolloutPlan(5);
    foreach ($plan as $step => $traffic) {
        // Deploy step-by-step
    }
}

// 4. Metrics summary
$summary = $result->getMetricsSummary();
print_r($summary);

3. Performance Monitor

Real-time accuracy tracking, drift detection, and alerting.

Core Features:

  • Real-time prediction tracking
  • Performance degradation detection
  • Concept drift detection
  • Confusion matrix calculation
  • Performance trend analysis
  • Multi-version comparison

Service Methods:

// 1. Track Prediction
$performanceMonitor->trackPrediction(
    modelName: 'n1-detector',
    version: Version::fromString('1.0.0'),
    prediction: true,
    actual: true,
    confidence: 0.95,
    features: ['query_count' => 5, 'pattern' => 'SELECT']
);

// 2. Current Metrics
$metrics = $performanceMonitor->getCurrentMetrics(
    'n1-detector',
    Version::fromString('1.0.0'),
    timeWindow: Duration::fromHours(24)
);
// Returns: accuracy, precision, recall, f1_score, confusion_matrix, etc.

// 3. Performance Degradation Check
$degradationInfo = $performanceMonitor->getPerformanceDegradationInfo(
    'n1-detector',
    Version::fromString('1.0.0'),
    thresholdPercent: 0.05  // 5% degradation threshold
);

if ($degradationInfo['has_degraded']) {
    // Alert and take action
}

// 4. Concept Drift Detection
$hasDrift = $performanceMonitor->detectConceptDrift(
    'n1-detector',
    Version::fromString('1.0.0'),
    timeWindow: Duration::fromHours(24)
);

// 5. Performance Trend
$trend = $performanceMonitor->getPerformanceTrend(
    'n1-detector',
    Version::fromString('1.0.0'),
    timeWindow: Duration::fromDays(7),
    interval: Duration::fromHours(1)
);

// 6. Version Comparison
$comparison = $performanceMonitor->compareVersions(
    'n1-detector',
    [Version::fromString('1.0.0'), Version::fromString('1.1.0')],
    timeWindow: Duration::fromHours(24)
);

Usage:

use App\Framework\MachineLearning\ModelManagement\ModelPerformanceMonitor;
use App\Framework\Core\ValueObjects\Version;
use App\Framework\Core\ValueObjects\Duration;

// 1. Track predictions (after every prediction)
$performanceMonitor->trackPrediction(
    modelName: 'n1-detector',
    version: $currentVersion,
    prediction: $modelPrediction,
    actual: $groundTruth,  // If known
    confidence: $confidenceScore
);

// 2. Monitor performance (scheduler, every 5 minutes)
$metrics = $performanceMonitor->getCurrentMetrics(
    'n1-detector',
    $currentVersion,
    Duration::fromHours(1)
);

if ($metrics['accuracy'] < 0.85) {
    // Alert: Accuracy drop detected
}

// 3. Degradation check (scheduler, hourly)
$degradation = $performanceMonitor->getPerformanceDegradationInfo(
    'n1-detector',
    $currentVersion
);

if ($degradation['has_degraded']) {
    // Automatic alert sent via AlertingService
    // Consider rollback or retraining
}

// 4. Drift detection (scheduler, daily)
if ($performanceMonitor->detectConceptDrift('n1-detector', $currentVersion)) {
    // Schedule model retraining
}

4. Auto-Tuning Engine

Automatic threshold and hyperparameter optimization.

Core Features:

  • Threshold optimization (Grid Search)
  • Hyperparameter tuning
  • Precision-recall trade-off optimization
  • Adaptive threshold adjustment
  • Performance-cost trade-off

Service Methods:

// 1. Threshold Optimization
$result = $autoTuning->optimizeThreshold(
    modelName: 'n1-detector',
    version: Version::fromString('1.0.0'),
    metricToOptimize: 'f1_score',
    thresholdRange: [0.5, 0.9],
    step: 0.05,
    timeWindow: Duration::fromHours(24)
);
// Returns: optimal_threshold, improvement_percent, recommendation

// 2. Hyperparameter Optimization
$result = $autoTuning->optimizeHyperparameters(
    modelName: 'n1-detector',
    version: Version::fromString('1.0.0'),
    parameterRanges: [
        'window_size' => [50, 150, 25],
        'min_cluster_size' => [2, 5, 1],
    ],
    metricToOptimize: 'f1_score'
);

// 3. Precision-Recall Trade-off
$result = $autoTuning->optimizePrecisionRecallTradeoff(
    modelName: 'n1-detector',
    version: Version::fromString('1.0.0'),
    targetPrecision: 0.95,  // 95% precision target
    thresholdRange: [0.5, 0.99]
);

// 4. Adaptive Adjustment
$result = $autoTuning->adaptiveThresholdAdjustment(
    'n1-detector',
    Version::fromString('1.0.0')
);
// Automatically adjusts based on FP/FN rates

Usage:

use App\Framework\MachineLearning\ModelManagement\AutoTuningEngine;

// 1. Optimize the threshold (weekly via scheduler)
$optimization = $autoTuning->optimizeThreshold(
    modelName: 'n1-detector',
    version: $currentVersion,
    metricToOptimize: 'f1_score',
    thresholdRange: [0.5, 0.9],
    step: 0.05
);

if ($optimization['improvement_percent'] > 5.0) {
    // Apply optimized threshold
    $updatedConfig = array_merge(
        $currentConfig,
        ['threshold' => $optimization['optimal_threshold']]
    );

    $updatedMetadata = $metadata->withConfiguration($updatedConfig);
    $registry->update($updatedMetadata);
}

// 2. Adaptive adjustment (daily via scheduler)
$adaptive = $autoTuning->adaptiveThresholdAdjustment(
    'n1-detector',
    $currentVersion
);

if ($adaptive['recommended_threshold'] !== $adaptive['current_threshold']) {
    // Apply adaptive adjustment
}

// 3. Precision-Recall Optimization (on-demand)
$tradeoff = $autoTuning->optimizePrecisionRecallTradeoff(
    'n1-detector',
    $currentVersion,
    targetPrecision: 0.95
);

// Apply if precision target met with acceptable recall
if ($tradeoff['achieved_precision'] >= 0.95
    && $tradeoff['achieved_recall'] >= 0.80) {
    // Apply optimized threshold
}

DI Container Integration

use App\Framework\MachineLearning\ModelManagement\MLModelManagementInitializer;

// Automatically registered via Initializer attribute
#[Initializer]
final readonly class MLModelManagementInitializer
{
    public function initialize(): void
    {
        // ModelRegistry (Singleton)
        $this->container->singleton(
            ModelRegistry::class,
            fn(Container $c) => new CacheModelRegistry($c->get(Cache::class))
        );

        // ABTestingService
        $this->container->bind(ABTestingService::class, ...);

        // ModelPerformanceMonitor
        $this->container->bind(ModelPerformanceMonitor::class, ...);

        // AutoTuningEngine
        $this->container->bind(AutoTuningEngine::class, ...);
    }
}
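
Once registered, consumers resolve the services from the container. A minimal sketch, assuming the container exposes the same get() method used inside the initializer above:

// Sketch: resolve the ML management services from the DI container
$registry           = $container->get(ModelRegistry::class);
$abTesting          = $container->get(ABTestingService::class);
$performanceMonitor = $container->get(ModelPerformanceMonitor::class);
$autoTuning         = $container->get(AutoTuningEngine::class);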

Complete Workflow Example

// ============================================================================
// Step 1: Register Model Versions
// ============================================================================

$v1 = ModelMetadata::forN1Detector(
    version: Version::fromString('1.0.0'),
    configuration: ['threshold' => 0.7]
)->withPerformanceMetrics([
    'accuracy' => 0.92,
    'f1_score' => 0.885,
]);

$registry->register($v1);

$v2 = ModelMetadata::forN1Detector(
    version: Version::fromString('1.1.0'),
    configuration: ['threshold' => 0.75]
)->withPerformanceMetrics([
    'accuracy' => 0.95,
    'f1_score' => 0.92,
]);

$registry->register($v2);

// ============================================================================
// Step 2: A/B Test New Version
// ============================================================================

$config = ABTestConfig::create(
    modelName: 'n1-detector',
    versionA: Version::fromString('1.0.0'),
    versionB: Version::fromString('1.1.0'),
    trafficSplit: 0.5
);

$abResult = $abTesting->runTest($config);

if ($abResult->shouldDeployVersionB()) {
    // ========================================================================
    // Step 3: Gradual Rollout
    // ========================================================================

    $plan = $abTesting->generateRolloutPlan(steps: 5);

    foreach ($plan as $step => $trafficToB) {
        // Update traffic split
        $stepConfig = new ABTestConfig(
            modelName: 'n1-detector',
            versionA: Version::fromString('1.0.0'),
            versionB: Version::fromString('1.1.0'),
            trafficSplitA: 1.0 - $trafficToB
        );
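        // Apply $stepConfig via the traffic-routing layer (deployment-specific, not shown here)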

        // Wait and monitor (e.g., 1 hour per step)
        sleep(3600);

        // Check metrics
        $metrics = $performanceMonitor->getCurrentMetrics(
            'n1-detector',
            Version::fromString('1.1.0'),
            Duration::fromHours(1)
        );

        if ($metrics['accuracy'] < 0.90) {
            // Rollback!
            break;
        }
    }

    // ========================================================================
    // Step 4: Full Deployment
    // ========================================================================

    $deployed = $v2->withDeployment(
        environment: 'production',
        deployedAt: Timestamp::now()
    );

    $registry->update($deployed);
}

// ============================================================================
// Step 5: Continuous Monitoring
// ============================================================================

// Scheduler Job: Every 5 minutes
$metrics = $performanceMonitor->getCurrentMetrics(
    'n1-detector',
    Version::fromString('1.1.0'),
    Duration::fromHours(1)
);

// Scheduler Job: Every hour
$degradation = $performanceMonitor->getPerformanceDegradationInfo(
    'n1-detector',
    Version::fromString('1.1.0')
);

if ($degradation['has_degraded']) {
    // Automatic alert sent
    // Consider rollback or retraining
}

// ============================================================================
// Step 6: Auto-Tuning
// ============================================================================

// Scheduler Job: Weekly
$optimization = $autoTuning->optimizeThreshold(
    'n1-detector',
    Version::fromString('1.1.0'),
    'f1_score',
    [0.5, 0.9],
    0.05
);

if ($optimization['improvement_percent'] > 5.0) {
    $updatedConfig = ['threshold' => $optimization['optimal_threshold']];
    $updated = $deployed->withConfiguration($updatedConfig);
    $registry->update($updated);
}

Scheduler Integration

use App\Framework\Scheduler\Services\SchedulerService;
use App\Framework\Scheduler\Schedules\IntervalSchedule;
use App\Framework\Scheduler\Schedules\CronSchedule;  // namespace assumed to mirror IntervalSchedule
use App\Framework\Core\ValueObjects\Duration;

// Performance Monitoring (Every 5 minutes)
$scheduler->schedule(
    'ml-performance-monitoring',
    IntervalSchedule::every(Duration::fromMinutes(5)),
    function() use ($performanceMonitor, $registry) {
        $productionModels = $registry->getProductionModels();

        foreach ($productionModels as $model) {
            $metrics = $performanceMonitor->getCurrentMetrics(
                $model->modelName,
                $model->version,
                Duration::fromHours(1)
            );

            // Log metrics for dashboard
        }
    }
);

// Degradation Check (Every hour)
$scheduler->schedule(
    'ml-degradation-check',
    IntervalSchedule::every(Duration::fromHours(1)),
    function() use ($performanceMonitor, $registry) {
        $productionModels = $registry->getProductionModels();

        foreach ($productionModels as $model) {
            $degradation = $performanceMonitor->getPerformanceDegradationInfo(
                $model->modelName,
                $model->version
            );

            if ($degradation['has_degraded']) {
                // Automatic alert sent via AlertingService
            }
        }
    }
);

// Auto-Tuning (Weekly)
$scheduler->schedule(
    'ml-auto-tuning',
    CronSchedule::fromExpression('0 2 * * 0'),  // Sunday 2 AM
    function() use ($autoTuning, $registry) {
        $productionModels = $registry->getProductionModels();

        foreach ($productionModels as $model) {
            $optimization = $autoTuning->optimizeThreshold(
                $model->modelName,
                $model->version,
                'f1_score',
                [0.5, 0.9],
                0.05
            );

            if ($optimization['improvement_percent'] > 3.0) {
                $updated = $model->withConfiguration([
                    'threshold' => $optimization['optimal_threshold']
                ]);
                $registry->update($updated);
            }
        }
    }
);

// Drift Detection (Daily)
$scheduler->schedule(
    'ml-drift-detection',
    CronSchedule::fromExpression('0 3 * * *'),  // Daily 3 AM
    function() use ($performanceMonitor, $registry) {
        $productionModels = $registry->getProductionModels();

        foreach ($productionModels as $model) {
            $hasDrift = $performanceMonitor->detectConceptDrift(
                $model->modelName,
                $model->version
            );

            if ($hasDrift) {
                // Schedule retraining
            }
        }
    }
);

Best Practices

1. Model Versioning

  • Semantic Versioning: MAJOR.MINOR.PATCH for all models
  • Breaking Changes: major version increment for breaking changes
  • Feature Additions: minor version increment for new features
  • Bug Fixes: patch version for threshold adjustments (see the sketch below)
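
A minimal sketch of such a patch-level release that only adjusts the detection threshold (version strings and threshold value are illustrative):

// Sketch: register a patch release for a threshold adjustment (1.1.0 -> 1.1.1)
$patched = ModelMetadata::forN1Detector(
    version: Version::fromString('1.1.1'),
    configuration: ['threshold' => 0.72]  // only the threshold changed
);

$registry->register($patched);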

2. A/B Testing

  • Sample Size: at least 100 predictions per version for statistical significance
  • Gradual Rollout: step-wise increase (10% → 25% → 50% → 75% → 100%)
  • Monitoring: continuous monitoring during the rollout
  • Rollback Plan: automatic rollback on performance degradation (see the sketch below)
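
A minimal sketch of the rollback guard during a gradual rollout, combining the rollout plan with the performance monitor (the 0.90 accuracy floor and version strings are illustrative):

// Sketch: abort the rollout and redeploy the control version on degradation
$plan = $abTesting->generateRolloutPlan(steps: 5);

foreach ($plan as $step => $trafficToB) {
    // ... route $trafficToB of the traffic to version B, then wait one step ...

    $metrics = $performanceMonitor->getCurrentMetrics(
        'n1-detector',
        Version::fromString('1.1.0'),
        Duration::fromHours(1)
    );

    if ($metrics['accuracy'] < 0.90) {
        $control = $registry->get('n1-detector', Version::fromString('1.0.0'));
        $registry->update($control->withDeployment('production', Timestamp::now()));
        break;
    }
}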

3. Performance Monitoring

  • Real-Time Tracking: track every prediction with ground truth (when available)
  • Alert Thresholds: 5% degradation = warning, 10% = critical (see the sketch below)
  • Drift Detection: daily checks for concept drift
  • Retention: keep performance data for 30 days
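
A minimal sketch of mapping degradation severity to alert levels; only the 'has_degraded' key is documented above, the 'degradation_percent' key is an assumption for illustration:

// Sketch: 5% degradation = warning, 10% = critical
$degradation = $performanceMonitor->getPerformanceDegradationInfo(
    'n1-detector',
    $currentVersion
);

if ($degradation['has_degraded']) {
    // 'degradation_percent' is an assumed key; adjust to the actual payload
    $severity = $degradation['degradation_percent'] >= 0.10 ? 'critical' : 'warning';
    // Forward $severity to the AlertingService / on-call channel
}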

4. Auto-Tuning

  • Grid Search: weekly threshold optimization
  • Adaptive Adjustment: daily, based on FP/FN rates
  • Minimum Improvement: at least 3% improvement before changing a threshold
  • Validation: always test on a separate validation set

5. Production Deployment

  • Baseline: always capture baseline metrics before deployment
  • Canary Deployment: gradual rollout starting at 10%
  • Monitoring Window: at least 24h of monitoring after full deployment
  • Rollback: automatic rollback on >5% accuracy drop

Performance Characteristics

  • ModelRegistry: O(1) Lookup, 7-day Cache TTL
  • A/B Testing: <50ms per traffic routing decision
  • Performance Monitor: ~1ms per prediction tracking
  • Auto-Tuning: ~10s for threshold optimization (grid search 0.05 steps)
  • Storage: 30-day retention for performance data

Troubleshooting

Problem: High False Positive Rate

Solution:

$adaptive = $autoTuning->adaptiveThresholdAdjustment(
    'n1-detector',
    $currentVersion
);
// Automatically suggests threshold increase

Problem: Performance Degradation

Solution:

// 1. Check degradation info
$degradation = $performanceMonitor->getPerformanceDegradationInfo(...);

// 2. Rollback to previous version
$previousVersion = Version::fromString('1.0.0');
$previous = $registry->get('n1-detector', $previousVersion);
$redeployed = $previous->withDeployment('production', Timestamp::now());
$registry->update($redeployed);

// 3. Schedule retraining

Problem: A/B Test Inconclusive

Solution:

// Increase sample size
$requiredSize = $abTesting->calculateRequiredSampleSize(
    confidenceLevel: 0.95,
    marginOfError: 0.03  // Reduce margin of error
);

// Continue test until sample size reached

Integration with Existing ML Systems

N+1 Detection

$metadata = ModelMetadata::forN1Detector(
    version: Version::fromString('1.0.0'),
    configuration: $n1Detector->getConfiguration()
);

$registry->register($metadata);

// Track predictions
foreach ($detections as $detection) {
    $performanceMonitor->trackPrediction(
        modelName: 'n1-detector',
        version: $metadata->version,
        prediction: $detection->isN1Pattern,
        actual: $groundTruth,  // If available
        confidence: $detection->confidence
    );
}

WAF Behavioral Analysis

$metadata = ModelMetadata::forWafBehavioral(
    version: Version::fromString('2.0.0'),
    configuration: $wafBehavioral->getConfiguration()
);

$registry->register($metadata);
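
Predictions from the behavioral analyzer can then be tracked the same way as for the N+1 detector. A minimal sketch; the model name string and the $analysis fields are assumptions, not the actual WAF API:

// Sketch: track each behavioral-analysis decision (field names are illustrative)
$performanceMonitor->trackPrediction(
    modelName: 'waf-behavioral',
    version: $metadata->version,
    prediction: $analysis->isAnomalous,
    actual: $confirmedAttack,   // ground truth, if available
    confidence: $analysis->score
);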

Queue Job Anomaly Detection

$metadata = ModelMetadata::forQueueAnomaly(
    version: Version::fromString('1.2.0'),
    configuration: $queueAnomaly->getConfiguration()
);

$registry->register($metadata);

Summary

The ML Model Management System provides:

  • Centralized Model Registry with version management
  • A/B Testing with statistical significance checks
  • Real-Time Performance Monitoring with drift detection
  • Automatic Threshold Optimization via grid search
  • Production-Ready with cache-based persistence
  • Framework-Compliant with value objects and readonly classes
  • Fully Integrated with the Scheduler and Queue systems
  • Scalable with 30-day performance data retention

The system is fully integrated into the custom PHP framework and follows all framework patterns.