
Error Handling System - Audit Report

Date: 2025-10-12
Status: Audit Phase Complete
Next Phase: Unified Architecture Design

Executive Summary

The framework has 4 separate error-handling modules with overlapping responsibilities and redundant functionality. Consolidation is needed in order to:

  • Improve the developer experience (one consistent API)
  • Increase maintainability (single source of truth)
  • Preserve functionality (integrate all existing features)
  • Optimize performance (eliminate redundant operations)

1. Current Architecture Overview

Module Structure

src/Framework/
├── Exception/               # ✅ CORE - Base exception infrastructure
│   ├── FrameworkException.php (280 lines) - Base exception with rich context
│   ├── ErrorCode.php (599 lines) - Systematic error categorization
│   ├── ExceptionContext.php (124 lines) - Domain context container
│   └── Http/, Security/, Authentication/ - Specific exceptions
│
├── ErrorHandling/           # ⚠️ ORCHESTRATION - Main error handling
│   ├── ErrorHandler.php (367 lines) - set_exception_handler, HTTP responses
│   ├── ErrorLogger.php - Structured logging
│   ├── SecurityEventHandler.php - Security exception handling
│   └── ErrorHandlerContext.php - Combined context (Exception + Request + System)
│
├── ErrorAggregation/        # 📊 ANALYTICS - Pattern detection
│   ├── ErrorAggregator.php (339 lines) - Pattern analysis & alerting
│   ├── ErrorEvent.php (290 lines) - Single error event
│   ├── ErrorPattern.php (300 lines) - Pattern of similar errors
│   ├── Storage/DatabaseErrorStorage.php - Persistence (2 tables)
│   └── Commands/ - Cleanup and management commands
│
├── ErrorBoundaries/         # 🛡️ RESILIENCE - Circuit breaker pattern
│   ├── ErrorBoundary.php (442 lines) - Graceful degradation
│   ├── CircuitBreaker/ - Circuit breaker implementation
│   ├── RetryStrategy.php - Retry patterns
│   └── BoundaryResult.php - Result objects
│
└── ErrorReporting/          # 📝 REPORTING - Analytics & storage
    ├── ErrorReporter.php (279 lines) - Report creation & storage
    ├── ErrorReport.php - Report value object
    ├── Storage/ErrorReportStorage.php - Persistence
    └── ErrorAnalyticsEngine.php - Velocity & anomaly detection

Database Schema

ErrorAggregation Tables (2):

  • error_events - Individual error occurrences
  • error_patterns - Aggregated error patterns

ErrorReporting Tables (1):

  • error_reports - Error reports with analytics

Total: 3 database tables for error storage

2. Core Components Analysis

2.1 Exception/ - Foundation Layer

FrameworkException.php (280 lines)

Strengths:

  • Rich context via ExceptionContext
  • Fluent API (withContext(), withData(), withErrorCode())
  • Multiple factory methods (simple(), create(), forOperation(), fromContext())
  • Retry logic support (withRetryAfter(), isRecoverable())
  • Serialization (toArray())
  • Automatic data sanitization (passwords, tokens, secrets)

Weaknesses:

  • ⚠️ No integration with ErrorAggregation
  • ⚠️ No integration with ErrorReporting
  • ⚠️ No automatic pattern detection
  • ⚠️ No circuit breaker integration

ErrorCode.php (599 lines)

Strengths:

  • 100+ systematic error codes (categories: SYS, DB, AUTH, VAL, HTTP, SEC, etc.)
  • Category extraction (getCategory())
  • Human-readable descriptions (getDescription())
  • Recovery hints (getRecoveryHint())
  • Recoverability flags (isRecoverable())
  • Retry timing (getRetryAfterSeconds())

Weaknesses:

  • ⚠️ No ErrorSeverity integration
  • ⚠️ No AlertUrgency mapping

ExceptionContext.php (124 lines)

Strengths:

  • Immutable readonly class
  • Fluent API for context building
  • Automatic sanitization of sensitive data
  • Clean separation: operation, component, data, debug, metadata
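
The immutability and sanitization behaviour listed above can be illustrated with a minimal sketch. `ExceptionContextSketch`, its constructor shape, the `[REDACTED]` placeholder and the list of sensitive key fragments are assumptions made for illustration; the real `ExceptionContext` may be structured differently.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for ExceptionContext; the real class may differ.
final class ExceptionContextSketch
{
    // Assumed key fragments that mark a value as sensitive.
    private const SENSITIVE = ['password', 'token', 'secret'];

    private function __construct(
        public readonly string $operation,
        public readonly string $component,
        public readonly array $data = [],
    ) {}

    public static function forOperation(string $operation, string $component): self
    {
        return new self($operation, $component);
    }

    // Fluent and immutable: returns a new instance with sanitized data merged in.
    public function withData(array $data): self
    {
        return new self(
            $this->operation,
            $this->component,
            array_merge($this->data, self::sanitize($data)),
        );
    }

    private static function sanitize(array $data): array
    {
        foreach ($data as $key => $value) {
            if (self::isSensitive((string) $key)) {
                $data[$key] = '[REDACTED]';
            } elseif (is_array($value)) {
                $data[$key] = self::sanitize($value);
            }
        }
        return $data;
    }

    private static function isSensitive(string $key): bool
    {
        foreach (self::SENSITIVE as $needle) {
            if (stripos($key, $needle) !== false) {
                return true;
            }
        }
        return false;
    }
}

$base = ExceptionContextSketch::forOperation('user.login', 'AuthService');
$context = $base->withData(['email' => 'a@example.com', 'password' => 'hunter2']);
```

Because `withData()` returns a new object, `$base` keeps its empty data array while `$context` carries the redacted copy.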

Weaknesses:

  • ⚠️ No Request/System context (available only in ErrorHandlerContext)

2.2 ErrorHandling/ - Orchestration Layer ⚠️

ErrorHandler.php (367 lines)

Strengths:

  • Global error handler registration (set_exception_handler, set_error_handler, register_shutdown_function)
  • HTTP response creation (API vs HTML)
  • HTTP status code mapping (401, 403, 404, 413, 429, 400, 500)
  • Special handling: ValidationException, SecurityException
  • ErrorHandlerContext creation (combines Exception + Request + System context)
  • Error level determination

Weaknesses:

  • ⚠️ No integration with ErrorAggregation (no pattern detection)
  • ⚠️ No integration with ErrorBoundaries (no circuit breaker)
  • ⚠️ Redundant with ErrorReporter functionality

ErrorHandlerContext.php

Strengths:

  • Combines ExceptionContext + RequestContext + SystemContext
  • Used by ErrorAggregation for ErrorEvent creation

Problem:

  • ⚠️ Separate context type from ExceptionContext (unnecessary fragmentation)

2.3 ErrorAggregation/ - Analytics Layer 📊

ErrorAggregator.php (339 lines)

Strengths:

  • Pattern detection via fingerprinting (SHA-256 hash of service/component/operation/code/message)
  • ErrorEvent creation from ErrorHandlerContext
  • ErrorPattern management (occurrenceCount, firstOccurrence, lastOccurrence, affectedUsers, affectedIps)
  • Alert queuing based on pattern urgency
  • Cache integration (pattern cache with a 1h TTL)
  • Batch processing support
  • Cleanup strategy (Retention: DEBUG=7d, INFO=14d, WARNING=30d, ERROR=90d, CRITICAL=180d)
  • Statistics & Analytics (getStatistics, getErrorTrends, getTopPatterns)
  • Pattern acknowledgement & resolution workflow
  • Health monitoring

Weaknesses:

  • ⚠️ Not automatically integrated with ErrorHandler
  • ⚠️ Requires manual processError() call
  • ⚠️ Separate database schema from ErrorReporting

ErrorEvent.php (290 lines)

Strengths:

  • Rich event data (service, component, operation, errorCode, severity, context, metadata)
  • Created from ErrorHandlerContext
  • Fingerprint generation for grouping
  • Alert triggering logic (CRITICAL/ERROR always alert, security events always alert)
  • AlertUrgency mapping (URGENT, HIGH, MEDIUM, LOW)
  • Error message normalization (removes file paths, IDs, timestamps, ULIDs, UUIDs)
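
Fingerprinting and message normalization work together: volatile fragments are stripped first, then the stable fields are hashed. The sketch below shows one way this could look. The function names, the placeholder tokens, the regular expressions and the `|` separator are assumptions, not the framework's actual implementation.

```php
<?php
declare(strict_types=1);

// Replace volatile fragments so that similar messages collapse to one form.
// Patterns are applied in order; the regexes are illustrative assumptions.
function normalizeMessage(string $message): string
{
    $rules = [
        // UUIDs
        '/\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b/i' => '{uuid}',
        // ULIDs (26 Crockford base32 characters)
        '/\b[0-9A-HJKMNP-TV-Z]{26}\b/' => '{ulid}',
        // Timestamps such as 2025-10-12 08:15:00
        '/\b\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}\b/' => '{timestamp}',
        // Unix-style file paths with at least two segments
        '#(?:/[\w.\-]+){2,}#' => '{path}',
        // Remaining numeric IDs
        '/\b\d+\b/' => '{id}',
    ];

    return preg_replace(array_keys($rules), array_values($rules), $message);
}

// SHA-256 over the fields the report names: service, component, operation, code, message.
function fingerprint(
    string $service,
    string $component,
    string $operation,
    string $code,
    string $message,
): string {
    return hash('sha256', implode('|', [
        $service, $component, $operation, $code, normalizeMessage($message),
    ]));
}

$first = fingerprint('api', 'UserRepository', 'find', 'DB001',
    'User 42 not found in /var/www/app/User.php at 2025-10-12 08:15:00');
$second = fingerprint('api', 'UserRepository', 'find', 'DB001',
    'User 97 not found in /srv/app/User.php at 2025-10-13 09:00:01');
```

Both messages normalize to the same text, so `$first` and `$second` are identical and the two events would fall into the same pattern.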

ErrorPattern.php (300 lines)

Strengths:

  • Immutable readonly class
  • Pattern lifecycle (fromErrorEvent, withNewOccurrence, acknowledge, resolve)
  • Critical pattern detection (frequency >10/min, affectedUsers >50, CRITICAL with 3+ occurrences, ERROR with 100+ occurrences)
  • Dynamic alert thresholds (CRITICAL=1, ERROR=5, WARNING=20, INFO=100, DEBUG=500)
  • Frequency calculation (errors per minute)
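
The thresholds above translate into a small amount of logic. In this sketch the numeric values come from the report, while the class and method names (`PatternSketch`, `Severity`, `shouldAlert()`) and the frequency formula (occurrences divided by elapsed minutes, with a one-minute floor) are assumptions.

```php
<?php
declare(strict_types=1);

enum Severity
{
    case CRITICAL;
    case ERROR;
    case WARNING;
    case INFO;
    case DEBUG;

    // Occurrences required before an alert is raised (values from the report).
    public function alertThreshold(): int
    {
        return match ($this) {
            self::CRITICAL => 1,
            self::ERROR => 5,
            self::WARNING => 20,
            self::INFO => 100,
            self::DEBUG => 500,
        };
    }
}

// Hypothetical, reduced version of ErrorPattern.
final class PatternSketch
{
    public function __construct(
        public readonly Severity $severity,
        public readonly int $occurrenceCount,
        public readonly int $affectedUsers,
        public readonly DateTimeImmutable $firstOccurrence,
        public readonly DateTimeImmutable $lastOccurrence,
    ) {}

    // Assumption: errors per minute over the observed span, minimum span one minute.
    public function frequencyPerMinute(): float
    {
        $seconds = $this->lastOccurrence->getTimestamp() - $this->firstOccurrence->getTimestamp();
        $minutes = max(1.0, $seconds / 60.0);

        return $this->occurrenceCount / $minutes;
    }

    public function isCritical(): bool
    {
        return $this->frequencyPerMinute() > 10
            || $this->affectedUsers > 50
            || ($this->severity === Severity::CRITICAL && $this->occurrenceCount >= 3)
            || ($this->severity === Severity::ERROR && $this->occurrenceCount >= 100);
    }

    public function shouldAlert(): bool
    {
        return $this->occurrenceCount >= $this->severity->alertThreshold();
    }
}

// 30 warnings within two minutes: 15 errors per minute.
$burst = new PatternSketch(Severity::WARNING, 30, 2,
    new DateTimeImmutable('@0'), new DateTimeImmutable('@120'));

// 4 errors spread over an hour.
$quiet = new PatternSketch(Severity::ERROR, 4, 1,
    new DateTimeImmutable('@0'), new DateTimeImmutable('@3600'));
```

`$burst` is critical purely because of its frequency, even though its severity is only WARNING; `$quiet` stays below both the critical and the alert threshold.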

AlertUrgency.php

Enum: URGENT, HIGH, MEDIUM, LOW

ErrorSeverity.php

Enum: CRITICAL (180d retention), ERROR (90d), WARNING (30d), INFO (14d), DEBUG (7d)

2.4 ErrorBoundaries/ - Resilience Layer 🛡️

ErrorBoundary.php (442 lines)

Strengths:

  • Multiple execution patterns:
    • execute($operation, $fallback) - Main pattern with fallback
    • executeOptional($operation, $fallback) - Returns null on failure
    • executeWithDefault($operation, $defaultValue) - Returns default on failure
    • executeForResult($operation) - Returns BoundaryResult object
    • executeParallel($operations) - Parallel execution with individual boundaries
    • executeWithCircuitBreaker($operation, $fallback) - Circuit breaker pattern
    • executeWithTimeout($operation, $fallback, $timeout) - Timeout protection
    • executeBulk($items, $processor) - Bulk processing with partial failure tolerance
  • Retry strategies: FIXED, LINEAR, EXPONENTIAL, EXPONENTIAL_JITTER
  • Non-retryable error detection (VAL, SEC, AUTH errors)
  • Circuit breaker integration (BoundaryCircuitBreakerManager)
  • Event system integration (BoundaryExecutionSucceeded, BoundaryExecutionFailed, BoundaryFallbackExecuted, BoundaryTimeoutOccurred)
  • Configurable via BoundaryConfig (maxRetries, baseDelay, maxDelay, circuitBreakerEnabled, maxBulkErrorRate)
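
The four retry strategies differ only in how the delay grows between attempts. A minimal sketch, assuming a doubling factor for the exponential variants and "full jitter" (a uniform random delay between zero and the capped value) for EXPONENTIAL_JITTER. `RetryStrategySketch`, `delayMs()` and `retry()` are hypothetical names; the parameter names mirror the `BoundaryConfig` options listed above.

```php
<?php
declare(strict_types=1);

enum RetryStrategySketch
{
    case FIXED;
    case LINEAR;
    case EXPONENTIAL;
    case EXPONENTIAL_JITTER;

    // Delay in milliseconds before retry number $attempt (1-based), capped at $maxDelayMs.
    public function delayMs(int $attempt, int $baseDelayMs, int $maxDelayMs): int
    {
        $exponent = max(0, $attempt - 1);
        $delay = match ($this) {
            self::FIXED => $baseDelayMs,
            self::LINEAR => $baseDelayMs * $attempt,
            self::EXPONENTIAL, self::EXPONENTIAL_JITTER => $baseDelayMs * (2 ** $exponent),
        };
        $delay = (int) min($delay, $maxDelayMs);

        if ($this === self::EXPONENTIAL_JITTER) {
            // Spread retries out so that many clients do not retry in lockstep.
            return random_int(0, $delay);
        }

        return $delay;
    }
}

// Runs $operation, retrying up to $maxRetries times before rethrowing the last error.
function retry(
    callable $operation,
    RetryStrategySketch $strategy,
    int $maxRetries,
    int $baseDelayMs = 100,
    int $maxDelayMs = 5000,
): mixed {
    $attempt = 0;
    while (true) {
        try {
            return $operation();
        } catch (Throwable $e) {
            if (++$attempt > $maxRetries) {
                throw $e;
            }
            usleep($strategy->delayMs($attempt, $baseDelayMs, $maxDelayMs) * 1000);
        }
    }
}

$calls = 0;
$result = retry(
    function () use (&$calls): string {
        if (++$calls < 3) {
            throw new RuntimeException('transient failure');
        }
        return 'ok';
    },
    RetryStrategySketch::EXPONENTIAL,
    maxRetries: 3,
    baseDelayMs: 1,
    maxDelayMs: 5,
);
```

The sketch retries every `Throwable`; a real boundary would first skip the non-retryable categories named above (VAL, SEC, AUTH).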

Weaknesses:

  • ⚠️ Not integrated with FrameworkException
  • ⚠️ Separate exception types (BoundaryTimeoutException, BoundaryFailedException)
  • ⚠️ No automatic ErrorAggregation integration

BoundaryResult.php

Strengths:

  • Result pattern for error handling without exceptions
  • success($value), failure($exception) factory methods
  • isSuccess(), isFailure(), getValue(), getError() methods
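
A result object of this kind is small enough to sketch in full. The method names follow the list above; the class name `BoundaryResultSketch`, the behaviour of `getValue()` on a failed result (throwing a `LogicException`) and the free function `executeForResult()` are assumptions.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for BoundaryResult.
final class BoundaryResultSketch
{
    private function __construct(
        private readonly mixed $value,
        private readonly ?Throwable $error,
    ) {}

    public static function success(mixed $value): self
    {
        return new self($value, null);
    }

    public static function failure(Throwable $error): self
    {
        return new self(null, $error);
    }

    public function isSuccess(): bool
    {
        return $this->error === null;
    }

    public function isFailure(): bool
    {
        return $this->error !== null;
    }

    // Assumption: asking a failed result for its value is a programming error.
    public function getValue(): mixed
    {
        if ($this->error !== null) {
            throw new LogicException('No value available: the operation failed.', 0, $this->error);
        }
        return $this->value;
    }

    public function getError(): ?Throwable
    {
        return $this->error;
    }
}

// Wraps an operation so that callers branch on the result instead of catching.
function executeForResult(callable $operation): BoundaryResultSketch
{
    try {
        return BoundaryResultSketch::success($operation());
    } catch (Throwable $e) {
        return BoundaryResultSketch::failure($e);
    }
}

$ok = executeForResult(fn (): int => 21 * 2);
$failed = executeForResult(function (): never {
    throw new RuntimeException('cache unavailable');
});
```

Callers can then write `if ($failed->isFailure()) { ... }` instead of wrapping each call in its own try/catch.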

2.5 ErrorReporting/ - Reporting Layer 📝

ErrorReporter.php (279 lines)

Strengths:

  • Report creation from Throwable or manual
  • Filter system (skip reporting based on filters)
  • Processor system (enrich reports)
  • Async processing via Queue
  • Batch reporting support
  • Contextual reporters (RequestContextualReporter, UserContextualReporter)
  • Statistics & cleanup
  • ErrorReportCriteria for searching

Weaknesses:

  • ⚠️ Redundant with ErrorAggregation functionality
  • ⚠️ Separate database schema
  • ⚠️ Not integrated with FrameworkException automatic reporting

ErrorReport.php

Strengths:

  • Rich report data (id, level, message, context, exception, stackTrace, environment, request, user)
  • Created from Throwable or manually
  • Serialization support

ErrorAnalyticsEngine.php

Strengths:

  • Velocity tracking (ErrorVelocity: eventsPerMinute, trendDirection)
  • Anomaly detection (ErrorAnomaly: deviationFromBaseline, isAnomaly)
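
The report does not say how `deviationFromBaseline` is computed. One common approach, shown here purely as an assumption, is a z-score: the distance between the current error rate and the mean of earlier samples, measured in standard deviations. `AnomalySketch`, `detectAnomaly()` and the default threshold of 3.0 are hypothetical.

```php
<?php
declare(strict_types=1);

// Hypothetical value object mirroring the two ErrorAnomaly fields named above.
final class AnomalySketch
{
    public function __construct(
        public readonly float $deviationFromBaseline,
        public readonly bool $isAnomaly,
    ) {}
}

/**
 * Assumed z-score check against earlier events-per-minute samples.
 *
 * @param list<float> $baseline
 */
function detectAnomaly(array $baseline, float $currentPerMinute, float $threshold = 3.0): AnomalySketch
{
    $n = count($baseline);
    if ($n < 2) {
        // Not enough history to estimate a spread.
        return new AnomalySketch(0.0, false);
    }

    $mean = array_sum($baseline) / $n;
    $sumSquares = 0.0;
    foreach ($baseline as $sample) {
        $sumSquares += ($sample - $mean) ** 2;
    }
    $stdDev = sqrt($sumSquares / ($n - 1)); // sample standard deviation

    $diff = $currentPerMinute - $mean;
    if ($stdDev > 0.0) {
        $deviation = $diff / $stdDev;
    } else {
        // Flat baseline: any change counts as an unbounded deviation.
        $deviation = $diff == 0.0 ? 0.0 : ($diff > 0 ? INF : -INF);
    }

    return new AnomalySketch($deviation, abs($deviation) > $threshold);
}

$history = [4.0, 5.0, 6.0, 5.0, 4.0, 6.0];
$spike = detectAnomaly($history, 20.0);
$normal = detectAnomaly($history, 5.5);
```

With this history the mean is 5 events per minute, so a rate of 20 is flagged while 5.5 is not.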

3. Overlap Analysis

Functionality Overlap Matrix

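| Feature              | Exception | ErrorHandling | ErrorAggregation | ErrorBoundaries | ErrorReporting |
|----------------------|-----------|---------------|------------------|-----------------|----------------|
| Exception Creation   | Primary   | -             | -                | -               | -              |
| Error Context        | Primary   | Extended      | -                | -               | -              |
| Error Logging        | -         | Primary       | Duplicate        | -               | Duplicate      |
| HTTP Response        | -         | Primary       | -                | -               | -              |
| Pattern Detection    | -         | -             | Primary          | -               | -              |
| Alerting             | -         | -             | Primary          | -               | -              |
| Database Storage     | -         | -             | Primary          | -               | Duplicate      |
| Statistics           | -         | -             | Primary          | -               | Duplicate      |
| Circuit Breaker      | -         | -             | -                | Primary         | -              |
| Retry Logic          | Partial   | -             | -                | Primary         | -              |
| Graceful Degradation | -         | -             | -                | Primary         | -              |
| Analytics            | -         | -             | Primary          | -               | Duplicate      |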

Redundancy Identification

🔴 HIGH REDUNDANCY:

  1. Error Storage:

    • ErrorAggregation: error_events, error_patterns tables
    • ErrorReporting: error_reports table
    • Solution: Unified storage strategy
  2. Error Logging:

    • ErrorHandler: ErrorLogger
    • ErrorAggregation: Storage via ErrorAggregator
    • ErrorReporting: Report storage
    • Solution: Single logging pathway
  3. Statistics & Analytics:

    • ErrorAggregation: getStatistics(), getErrorTrends(), getTopPatterns()
    • ErrorReporting: getStatistics(), ErrorAnalyticsEngine
    • Solution: Unified analytics engine

🟡 MEDIUM REDUNDANCY:

  1. Context Management:

    • ExceptionContext (Exception/)
    • ErrorHandlerContext (ErrorHandling/)
    • Solution: Extend ExceptionContext instead of separate type
  2. Retry Logic:

    • FrameworkException: withRetryAfter(), isRecoverable()
    • ErrorBoundary: RetryStrategy with multiple patterns
    • Solution: Integrate retry strategies into FrameworkException

🟢 LOW REDUNDANCY (Different concerns):

  1. Pattern Detection vs Error Reporting:
    • ErrorAggregation: Pattern detection for alerting
    • ErrorReporting: Individual error reports
    • Keep both: Different use cases

4. Integration Issues

Current Integration Problems

Problem 1: ErrorHandler doesn't use ErrorAggregation

  • ErrorHandler creates responses but doesn't trigger pattern detection
  • Manual $errorAggregator->processError($context) call needed
  • Impact: Patterns not automatically detected

Problem 2: FrameworkException doesn't trigger ErrorReporting

  • Exceptions created but not automatically reported
  • Manual $errorReporter->reportThrowable($e) call needed
  • Impact: Missing error reports in analytics

Problem 3: ErrorBoundaries use separate exception types

  • BoundaryTimeoutException, BoundaryFailedException not FrameworkException
  • Don't benefit from ErrorCode, ExceptionContext, etc.
  • Impact: Inconsistent exception handling

Problem 4: Fragmented context

  • ExceptionContext in Exception/
  • ErrorHandlerContext in ErrorHandling/
  • Developers confused about which to use
  • Impact: Poor developer experience

Problem 5: No circuit breaker in FrameworkException

  • ErrorBoundary has circuit breaker
  • FrameworkException has retry hints
  • Not integrated
  • Impact: Resilience features not unified

5. Strengths to Preserve

Must-Keep Features

From Exception/:

  • FrameworkException as base class
  • ErrorCode enum with 100+ codes
  • ExceptionContext fluent API
  • Factory methods (simple, create, forOperation, fromContext)
  • Automatic data sanitization
  • Serialization support

From ErrorHandling/:

  • Global error handler registration
  • HTTP response creation (API vs HTML)
  • HTTP status code mapping
  • ValidationException special handling
  • SecurityException special handling

From ErrorAggregation/:

  • Pattern detection via fingerprinting
  • ErrorEvent & ErrorPattern value objects
  • Alert queuing system
  • Severity-based retention policies
  • Critical pattern detection
  • Pattern acknowledgement workflow
  • Statistics & trend analysis

From ErrorBoundaries/:

  • Multiple execution patterns (execute, executeOptional, executeWithDefault, etc.)
  • Retry strategies (FIXED, LINEAR, EXPONENTIAL, EXPONENTIAL_JITTER)
  • Circuit breaker pattern
  • Timeout protection
  • Bulk processing with partial failure tolerance
  • Event system integration
  • BoundaryResult pattern

From ErrorReporting/:

  • Report filtering system
  • Report processors (enrichment)
  • Async processing via Queue
  • Contextual reporters (Request, User)
  • ErrorVelocity & ErrorAnomaly detection

6. Proposed Consolidation Strategy

High-Level Architecture

src/Framework/Exception/              # UNIFIED MODULE
├── Core/
│   ├── FrameworkException.php        # Base exception (enhanced)
│   ├── ErrorCode.php                 # Error categorization (enhanced)
│   ├── ExceptionContext.php          # Domain context (enhanced with Request/System)
│   └── ErrorSeverity.php             # Moved from ErrorAggregation
│
├── Handling/                         # Error Handler (integrated)
│   ├── ErrorHandler.php              # Global handler (integrated with Aggregation/Reporting)
│   ├── ErrorLogger.php               # Structured logging
│   └── ResponseFactory.php           # HTTP response creation
│
├── Aggregation/                      # Pattern Detection (integrated)
│   ├── ErrorAggregator.php           # Pattern analysis (auto-triggered)
│   ├── ErrorEvent.php                # Single error event
│   ├── ErrorPattern.php              # Pattern of similar errors
│   ├── AlertManager.php              # Alert queuing
│   └── Storage/                      # Database storage
│
├── Boundaries/                       # Resilience (integrated)
│   ├── ErrorBoundary.php             # Graceful degradation
│   ├── CircuitBreaker/               # Circuit breaker pattern
│   ├── RetryStrategy.php             # Retry patterns
│   └── BoundaryResult.php            # Result objects
│
├── Reporting/                        # Analytics (integrated)
│   ├── ErrorReporter.php             # Report creation (auto-triggered)
│   ├── ErrorReport.php               # Report value object
│   ├── AnalyticsEngine.php           # Velocity & anomaly detection
│   └── Storage/                      # Database storage
│
└── Http/                             # HTTP-specific exceptions
    ├── HttpException.php
    ├── NotFoundException.php
    └── ...

Integration Points

1. FrameworkException → ErrorHandler

  • ErrorHandler automatically catches all FrameworkException
  • Creates HTTP responses
  • Already implemented

2. ErrorHandler → ErrorAggregation

  • ErrorHandler automatically calls ErrorAggregator->processError()
  • Pattern detection happens on every error
  • New integration needed

3. ErrorHandler → ErrorReporting

  • ErrorHandler automatically calls ErrorReporter->report()
  • Error reports created for all errors
  • New integration needed
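
Integration points 2 and 3 could share one mechanism: the handler forwards every error to a list of sinks and never lets a failing sink mask the original error. The sketch below assumes a hypothetical `ErrorSink` interface that thin adapters around ErrorAggregator and ErrorReporter would implement; neither the interface nor `IntegratedErrorHandlerSketch` exists in the audited code.

```php
<?php
declare(strict_types=1);

// Hypothetical integration interface; adapters for ErrorAggregator and
// ErrorReporter would implement it.
interface ErrorSink
{
    public function handle(Throwable $e): void;
}

final class IntegratedErrorHandlerSketch
{
    /** @var list<string> Notes about sinks that failed while handling an error. */
    public array $sinkFailures = [];

    /** @param list<ErrorSink> $sinks */
    public function __construct(private readonly array $sinks) {}

    public function handle(Throwable $e): void
    {
        foreach ($this->sinks as $sink) {
            try {
                $sink->handle($e);
            } catch (Throwable $sinkError) {
                // A broken sink must not hide the original error or stop other sinks.
                $this->sinkFailures[] = $sink::class . ': ' . $sinkError->getMessage();
            }
        }
        // The existing HTTP response creation would follow here.
    }

    // Registers the handler globally, as the current ErrorHandler already does.
    public function register(): void
    {
        set_exception_handler($this->handle(...));
    }
}

// Test doubles standing in for the aggregation and reporting adapters.
$recorder = new class implements ErrorSink {
    public array $seen = [];

    public function handle(Throwable $e): void
    {
        $this->seen[] = $e->getMessage();
    }
};
$broken = new class implements ErrorSink {
    public function handle(Throwable $e): void
    {
        throw new RuntimeException('storage down');
    }
};

$handler = new IntegratedErrorHandlerSketch([$broken, $recorder]);
$handler->handle(new RuntimeException('payment failed'));
```

Even though the first sink throws, the second one still receives the error, which is the property the automatic integration needs in order not to become a new failure source.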

4. FrameworkException → ErrorBoundaries

  • Boundary exceptions extend FrameworkException
  • Inherit ErrorCode, ExceptionContext, etc.
  • Refactoring needed

5. ErrorCode → ErrorSeverity

  • ErrorCode knows its severity
  • Used by ErrorAggregation for retention policies
  • Enhancement needed
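
A `getSeverity()` method on the enum would let retention follow directly from the error code. In the sketch below the retention periods are taken from this report; the three enum cases, their string values and their severity assignments are invented for illustration and do not reflect the real ErrorCode enum.

```php
<?php
declare(strict_types=1);

enum ErrorSeveritySketch
{
    case CRITICAL;
    case ERROR;
    case WARNING;
    case INFO;
    case DEBUG;

    // Retention periods as listed in this report.
    public function retentionDays(): int
    {
        return match ($this) {
            self::CRITICAL => 180,
            self::ERROR => 90,
            self::WARNING => 30,
            self::INFO => 14,
            self::DEBUG => 7,
        };
    }
}

// Three made-up codes; the real ErrorCode enum has 100+ cases.
enum ErrorCodeSketch: string
{
    case DB_CONNECTION_FAILED = 'DB001';
    case VAL_INVALID_INPUT = 'VAL001';
    case SEC_UNAUTHORIZED_ACCESS = 'SEC001';

    // Category prefix, e.g. "DB" for "DB001".
    public function getCategory(): string
    {
        return rtrim($this->value, '0123456789');
    }

    // Proposed addition; the assignments below are assumptions.
    public function getSeverity(): ErrorSeveritySketch
    {
        return match ($this) {
            self::DB_CONNECTION_FAILED => ErrorSeveritySketch::CRITICAL,
            self::SEC_UNAUTHORIZED_ACCESS => ErrorSeveritySketch::ERROR,
            self::VAL_INVALID_INPUT => ErrorSeveritySketch::WARNING,
        };
    }
}

$code = ErrorCodeSketch::DB_CONNECTION_FAILED;
$retention = $code->getSeverity()->retentionDays();
```

Because `match` is exhaustive over enum cases, a newly added code without a severity assignment fails loudly with `UnhandledMatchError` the first time it is used, rather than silently receiving a default.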

6. ErrorBoundaries → ErrorAggregation

  • Circuit breaker events trigger pattern detection
  • Fallback executions logged as error patterns
  • New integration needed
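
To make circuit breaker events visible to pattern detection, the breaker needs a hook that fires when it opens. A minimal sketch, assuming a failure-count threshold, a fixed reset timeout and an `onOpen` callback that an aggregation adapter could subscribe to; `CircuitBreakerSketch` and all of its parameters are hypothetical.

```php
<?php
declare(strict_types=1);

final class CircuitBreakerSketch
{
    private int $failures = 0;
    private ?float $openedAt = null;

    public function __construct(
        private readonly int $failureThreshold,
        private readonly float $resetTimeoutSeconds,
        private readonly Closure $onOpen, // e.g. forwards the error to the aggregator
    ) {}

    public function call(callable $operation, callable $fallback): mixed
    {
        if ($this->isOpen()) {
            // Open circuit: skip the operation entirely.
            return $fallback();
        }

        try {
            $result = $operation();
            $this->failures = 0;
            return $result;
        } catch (Throwable $e) {
            if (++$this->failures >= $this->failureThreshold) {
                $this->openedAt = microtime(true);
                ($this->onOpen)($e);
            }
            return $fallback();
        }
    }

    private function isOpen(): bool
    {
        if ($this->openedAt === null) {
            return false;
        }
        if (microtime(true) - $this->openedAt >= $this->resetTimeoutSeconds) {
            // Half-open: allow one trial call; a single failure re-opens the circuit.
            $this->openedAt = null;
            $this->failures = $this->failureThreshold - 1;
            return false;
        }
        return true;
    }
}

$opened = 0;
$breaker = new CircuitBreakerSketch(2, 60.0, function (Throwable $e) use (&$opened): void {
    $opened++;
});

$attempts = 0;
$failing = function () use (&$attempts): never {
    $attempts++;
    throw new RuntimeException('upstream down');
};

$results = [];
for ($i = 0; $i < 3; $i++) {
    $results[] = $breaker->call($failing, fn (): string => 'fallback');
}
```

The third call never reaches the failing operation because the circuit opened after the second failure, and the `onOpen` hook fired exactly once.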

Database Schema Consolidation

Unified Schema:

-- Main error events table (from ErrorAggregation)
error_events (
    id,
    service,
    component,
    operation,
    error_code,
    error_message,
    severity,
    occurred_at,
    context,
    metadata,
    request_id,
    user_id,
    client_ip,
    is_security_event,
    stack_trace,
    user_agent
)

-- Error patterns table (from ErrorAggregation)
error_patterns (
    id,
    fingerprint,
    service,
    component,
    operation,
    error_code,
    normalized_message,
    severity,
    occurrence_count,
    first_occurrence,
    last_occurrence,
    affected_users,
    affected_ips,
    is_active,
    is_acknowledged,
    acknowledged_by,
    acknowledged_at,
    resolution,
    metadata
)

-- Error reports table (from ErrorReporting) - MERGE INTO error_events
-- Instead of separate table, add report-specific columns to error_events

Migration Strategy:

  • Add report-specific columns to error_events
  • Migrate data from error_reports to error_events
  • Drop error_reports table
  • Update ErrorReporter to use unified storage

7. Recommendations

Immediate Actions (Phase 1: Design)

  1. Design unified ExceptionContext

    • Combine ExceptionContext + ErrorHandlerContext
    • Add Request/System context support
    • Maintain backward compatibility
  2. Design ErrorCode → ErrorSeverity mapping

    • Add getSeverity() method to ErrorCode
    • Map all 100+ codes to severities
  3. Design automatic integration

    • ErrorHandler → ErrorAggregation (auto-trigger pattern detection)
    • ErrorHandler → ErrorReporting (auto-create reports)
    • Define integration interfaces
  4. Design ErrorBoundaries integration

    • BoundaryTimeoutException extends FrameworkException
    • BoundaryFailedException extends FrameworkException
    • Inherit all FrameworkException features
  5. Design database schema consolidation

    • Merge error_reports into error_events
    • Add migration scripts
    • Plan data migration

Short-term Actions (Phase 2: Migration Plan)

  1. Create migration roadmap

    • Step-by-step migration plan
    • Backward compatibility strategy
    • Rollback plan
  2. Design testing strategy

    • Unit tests for all unified components
    • Integration tests for error flow
    • Performance benchmarks
  3. Plan deprecation strategy

    • Deprecate old APIs
    • Provide migration guides
    • Timeline for removal

Long-term Actions (Phase 3-5: Implementation)

  1. Implement unified architecture

    • Step-by-step implementation
    • Continuous testing
    • Performance monitoring
  2. Documentation updates

    • New unified error handling guide
    • Migration guide for existing code
    • Best practices documentation

8. Success Metrics

Before Consolidation

  • 4 separate error modules
  • 3 database tables for error storage
  • Manual integration between modules
  • Inconsistent exception types
  • Fragmented context types
  • Redundant logging
  • No automatic pattern detection
  • No automatic error reporting

After Consolidation

  • 1 unified Exception module
  • 2 database tables (events + patterns)
  • Automatic integration
  • All exceptions extend FrameworkException
  • Unified ExceptionContext
  • Single logging pathway
  • Automatic pattern detection
  • Automatic error reporting
  • Better developer experience
  • Improved maintainability

Performance Targets

  • Exception Creation: <1ms (current: ~0.5ms)
  • Pattern Detection: <5ms (current: N/A - manual)
  • Error Reporting: <10ms async (current: N/A - manual)
  • Database Writes: Batch async (no blocking)
  • Memory Overhead: <100KB per exception (current: ~50KB)

9. Risk Assessment

High Risks

Risk 1: Breaking existing error handling code

  • Mitigation: Backward compatibility layer
  • Likelihood: Medium
  • Impact: High

Risk 2: Performance degradation from automatic integration

  • Mitigation: Async processing, batching, caching
  • Likelihood: Low
  • Impact: Medium

Risk 3: Data loss during database migration

  • Mitigation: Comprehensive testing, backup strategy
  • Likelihood: Low
  • Impact: Critical

Medium Risks

Risk 4: Complex migration for existing code

  • Mitigation: Detailed migration guide, support period
  • Likelihood: Medium
  • Impact: Medium

Risk 5: Incomplete pattern detection

  • Mitigation: Thorough testing, monitoring
  • Likelihood: Low
  • Impact: Low

Low Risks

Risk 6: Developer confusion during transition

  • Mitigation: Clear documentation, examples
  • Likelihood: Medium
  • Impact: Low

10. Next Steps

Audit Phase COMPLETE

  • Analyze Exception/
  • Analyze ErrorHandling/
  • Analyze ErrorAggregation/
  • Analyze ErrorBoundaries/
  • Analyze ErrorReporting/
  • Identify overlap and redundancy
  • Document strengths to preserve
  • Create audit report

Design Phase (Next)

  • Design unified ExceptionContext
  • Design ErrorCode → ErrorSeverity mapping
  • Design automatic ErrorHandler integration
  • Design ErrorBoundaries integration
  • Design database schema consolidation
  • Design backward compatibility strategy
  • Create detailed architecture diagrams
  • Define integration interfaces
  • Review and approve design

Migration Plan Phase

  • Create step-by-step migration roadmap
  • Plan backward compatibility layer
  • Design deprecation timeline
  • Create database migration scripts
  • Plan rollback strategies
  • Define testing strategy

Implementation Phase

  • Phase 1: Core unification
  • Phase 2: Integration implementation
  • Phase 3: Database consolidation
  • Phase 4: Testing & validation
  • Phase 5: Documentation & cleanup

11. Conclusion

The audit phase identified clear fragmentation:

  • Exception/ as a solid foundation layer
  • ⚠️ ErrorHandling/ as an orchestration layer without automatic integration
  • ⚠️ ErrorAggregation/ as a standalone analytics layer
  • ⚠️ ErrorBoundaries/ as a separate resilience layer
  • ⚠️ ErrorReporting/ as a redundant reporting layer

Recommendation: Consolidate under Exception/ as a unified module with automatic integration of all components.

Expected effort: 3-5 days, as originally estimated.

Expected impact:

  • Significantly improved developer experience
  • Reduced redundancy
  • Automatic integration
  • Better maintainability
  • Consistent error handling

Status: Ready for Design Phase
Next Document: ERROR-HANDLING-UNIFIED-ARCHITECTURE.md