LiveComponents Concurrent Upload Load Tests
Performance and scalability testing for the LiveComponents chunked upload system under high concurrent load.
Overview
This test suite validates system behavior under four load conditions (summarized in the configuration sketch after this list):
- Light Load: 5 concurrent users, 2 files each (1MB) - Baseline performance
- Moderate Load: 10 concurrent users, 3 files each (2MB) - Typical production load
- Heavy Load: 20 concurrent users, 5 files each (5MB) - Peak traffic simulation
- Stress Test: 50 concurrent users, 2 files each (1MB) - System limits discovery
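These scenarios map onto the LOAD_TEST_CONFIG table used throughout this guide. A sketch of that shape, populated with the values from the sections below (the threshold field names are assumptions; concurrent-upload-load.spec.js is authoritative):

```js
// Illustrative only — threshold field names are assumed; all values come from the
// scenario sections of this document.
const LOAD_TEST_CONFIG = {
  light:    { users: 5,  filesPerUser: 2, fileSizeMB: 1, expectedDuration: 30000,  maxMemoryMB: 200,  maxAvgResponseMs: 1000, minSuccessRate: 95 },
  moderate: { users: 10, filesPerUser: 3, fileSizeMB: 2, expectedDuration: 60000,  maxMemoryMB: 500,  maxAvgResponseMs: 2000, minSuccessRate: 90 },
  heavy:    { users: 20, filesPerUser: 5, fileSizeMB: 5, expectedDuration: 120000, maxMemoryMB: 1024, maxAvgResponseMs: 3000, minSuccessRate: 85 },
  stress:   { users: 50, filesPerUser: 2, fileSizeMB: 1, expectedDuration: 180000, maxMemoryMB: 2048, maxAvgResponseMs: 5000, minSuccessRate: 80 },
};
```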
Quick Start
Prerequisites
# Ensure Playwright is installed
npm install
# Install Chromium browser
npx playwright install chromium
# Ensure development server is running with adequate resources
make up
# For heavy/stress tests, ensure server has sufficient resources:
# - At least 4GB RAM available
# - At least 2 CPU cores
# - Adequate disk I/O capacity
Running Load Tests
# Run all load tests (WARNING: Resource intensive!)
npm run test:load
# Run specific load test
npx playwright test concurrent-upload-load.spec.js --grep "Light Load"
# Run with visible browser (for debugging)
npm run test:load:headed
# Run in headless mode (recommended for CI/CD)
npm run test:load
Load Test Scenarios
1. Light Load Test
Configuration:
- Users: 5 concurrent
- Files per User: 2
- File Size: 1MB each
- Total Data: 10MB
- Expected Duration: <30 seconds
Performance Thresholds:
- Max Duration: 30 seconds
- Max Memory: 200MB
- Max Avg Response Time: 1 second
- Min Success Rate: 95%
Use Case: Baseline performance validation, continuous integration tests
Example Results:
=== Light Load Test Results ===
Total Duration: 18,234ms
Total Uploads: 10
Successful: 10
Failed: 0
Success Rate: 100.00%
Avg Response Time: 823.45ms
Max Response Time: 1,452ms
Avg Memory: 125.34MB
Max Memory: 178.21MB
2. Moderate Load Test
Configuration:
- Users: 10 concurrent
- Files per User: 3
- File Size: 2MB each
- Total Data: 60MB
- Expected Duration: <60 seconds
Performance Thresholds:
- Max Duration: 60 seconds
- Max Memory: 500MB
- Max Avg Response Time: 2 seconds
- Min Success Rate: 90%
Use Case: Typical production load simulation, daily performance monitoring
Example Results:
=== Moderate Load Test Results ===
Total Duration: 47,892ms
Total Uploads: 30
Successful: 28
Failed: 2
Success Rate: 93.33%
Avg Response Time: 1,567.23ms
Max Response Time: 2,891ms
Avg Memory: 342.56MB
Max Memory: 467.89MB
3. Heavy Load Test
Configuration:
- Users: 20 concurrent
- Files per User: 5
- File Size: 5MB each
- Total Data: 500MB
- Expected Duration: <120 seconds
Performance Thresholds:
- Max Duration: 120 seconds
- Max Memory: 1GB (1024MB)
- Max Avg Response Time: 3 seconds
- Min Success Rate: 85%
Use Case: Peak traffic simulation, capacity planning
Example Results:
=== Heavy Load Test Results ===
Total Duration: 102,456ms
Total Uploads: 100
Successful: 87
Failed: 13
Success Rate: 87.00%
Avg Response Time: 2,734.12ms
Max Response Time: 4,567ms
Avg Memory: 723.45MB
Max Memory: 956.78MB
4. Stress Test
Configuration:
- Users: 50 concurrent
- Files per User: 2
- File Size: 1MB each
- Total Data: 100MB
- Expected Duration: <180 seconds
Performance Thresholds:
- Max Duration: 180 seconds
- Max Memory: 2GB (2048MB)
- Max Avg Response Time: 5 seconds
- Min Success Rate: 80%
Use Case: System limits discovery, failure mode analysis
Example Results:
=== Stress Test Results ===
Total Duration: 156,789ms
Total Uploads: 100
Successful: 82
Failed: 18
Success Rate: 82.00%
Avg Response Time: 4,234.56ms
Max Response Time: 7,891ms
Avg Memory: 1,456.78MB
Max Memory: 1,923.45MB
Total Errors: 18
5. Queue Management Test
Tests: Concurrent upload queue handling with proper limits
Validates:
- Maximum concurrent uploads respected (default: 3)
- Queue properly manages waiting uploads
- All uploads eventually complete
- No queue starvation or deadlocks
Configuration:
- Files: 10 files uploaded simultaneously
- Expected Max Concurrent: 3 uploads at any time
Example Output:
Queue States Captured: 18
Max Concurrent Uploads: 3
Final Completed: 10
Queue Properly Managed: ✅
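A minimal sketch of the concurrency assertion, assuming the test page exposes its queue state on a window property (the `window.__uploadQueueState` name and its fields are hypothetical):

```js
const { test, expect } = require('@playwright/test');

test('queue never exceeds the concurrency limit', async ({ page }) => {
  await page.goto('https://localhost/livecomponents/test/upload');

  // ... enqueue 10 files via the upload component here ...

  const concurrencySamples = [];
  for (let i = 0; i < 240; i++) {                              // poll for up to ~60s
    const state = await page.evaluate(() => window.__uploadQueueState ?? null);
    if (state) concurrencySamples.push(state.activeUploads);   // hypothetical field
    if (state && state.completed === 10) break;                // all uploads finished
    await page.waitForTimeout(250);
  }

  expect(concurrencySamples.length).toBeGreaterThan(0);
  expect(Math.max(...concurrencySamples)).toBeLessThanOrEqual(3);
});
```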
6. Resource Cleanup Test
Tests: Memory cleanup after concurrent uploads complete
Validates:
- Memory properly released after uploads
- No memory leaks
- Garbage collection effective
- System returns to baseline state
Measurement Points:
- Baseline Memory: Before any uploads
- After Upload Memory: Immediately after all uploads complete
- After Cleanup Memory: After garbage collection
Expected Behavior:
- Memory after cleanup should be <50% of memory increase during uploads
Example Output:
Memory Usage:
Baseline: 45.23MB
After Uploads: 156.78MB (Δ +111.55MB)
After Cleanup: 52.34MB (Δ +7.11MB from baseline)
Cleanup Effectiveness: 93.6%
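A sketch of how the three measurement points can be captured, using Chromium's non-standard `performance.memory` API (available here because the load tests run in Chromium):

```js
const { test, expect } = require('@playwright/test');

test('memory returns close to baseline after uploads', async ({ page }) => {
  // Chromium-only, non-standard API; acceptable since these tests run in Chromium.
  const heapMB = () =>
    page.evaluate(() => performance.memory.usedJSHeapSize / 1024 / 1024);

  await page.goto('https://localhost/livecomponents/test/upload');
  const baseline = await heapMB();

  // ... run the concurrent uploads and wait for them all to finish ...
  const afterUploads = await heapMB();

  // ... give the page idle time so buffers and sessions can be released ...
  const afterCleanup = await heapMB();

  const increase = afterUploads - baseline;
  const retained = afterCleanup - baseline;
  // Expectation from this guide: retained memory is <50% of the increase.
  expect(retained).toBeLessThan(increase * 0.5);
});
```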
7. Error Recovery Test
Tests: System recovery from failures during concurrent uploads
Validates:
- Automatic retry on failures
- Failed uploads eventually succeed
- No corruption from partial failures
- Graceful degradation under stress
Simulation:
- Every 3rd chunk request fails
- System must retry and complete all uploads
Expected Behavior:
- All uploads complete successfully despite failures
- Retry logic handles failures transparently
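A sketch of the failure injection using Playwright request interception; the chunk endpoint pattern is a placeholder and should be matched to the real upload route:

```js
// Inside a test body: fail every 3rd chunk request to exercise the retry logic.
// '**/upload/chunk*' is a placeholder pattern, not the component's actual endpoint.
let chunkRequests = 0;
await page.route('**/upload/chunk*', (route) => {
  chunkRequests += 1;
  if (chunkRequests % 3 === 0) {
    return route.abort('failed'); // simulate a transient network failure
  }
  return route.continue();
});
// Despite the injected failures, every upload should still reach completion.
```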
8. Throughput Test
Tests: Sustained upload throughput measurement
Configuration:
- Files: 20 files
- File Size: 5MB each
- Total Data: 100MB
Metrics:
- Throughput: Total MB / Total Time (seconds)
- Expected Minimum: >1 MB/s on localhost
Example Output:
Throughput Test Results:
Total Data: 100MB
Total Duration: 67.89s
Throughput: 1.47 MB/s
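Throughput here is simply total megabytes divided by elapsed seconds; a quick check against the example above:

```js
// Throughput = total MB / elapsed seconds.
function throughputMBps(totalBytes, elapsedMs) {
  return (totalBytes / (1024 * 1024)) / (elapsedMs / 1000);
}

// 100 MB uploaded in 67.89 s ≈ 1.47 MB/s
console.log(throughputMBps(100 * 1024 * 1024, 67890).toFixed(2)); // "1.47"
```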
Performance Metrics Explained
Duration Metrics
- Total Duration: Time from test start to all uploads complete
- Avg Response Time: Average time for single upload completion
- Max Response Time: Slowest single upload time
- Min Response Time: Fastest single upload time
Success Metrics
- Total Uploads: Number of attempted uploads
- Successful Uploads: Successfully completed uploads
- Failed Uploads: Uploads that failed even after retries
- Success Rate: Successful / Total (as percentage)
Resource Metrics
- Avg Memory: Average browser memory usage during test
- Max Memory: Peak browser memory usage
- Memory Delta: Difference between baseline and peak
Throughput Metrics
- Throughput (MB/s): Data uploaded per second
- Formula: Total MB / Total Duration (seconds)
- Expected Range: 1-10 MB/s depending on system
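A sketch of how these summary figures can be derived from per-upload results; the result shape (`{ durationMs, ok }`) is an assumption, not the spec file's actual type:

```js
// Aggregate per-upload results into the summary metrics described above.
function summarize(results) {
  const durations = results.map((r) => r.durationMs);
  const successful = results.filter((r) => r.ok).length;
  return {
    totalUploads: results.length,
    successRatePct: (successful / results.length) * 100,
    avgResponseTimeMs: durations.reduce((a, b) => a + b, 0) / durations.length,
    maxResponseTimeMs: Math.max(...durations),
    minResponseTimeMs: Math.min(...durations),
  };
}
```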
Understanding Test Results
Success Criteria
✅ PASS - All thresholds met:
- Duration within expected maximum
- Memory usage within limits
- Success rate above minimum
- Avg response time acceptable
⚠️ WARNING - Some thresholds exceeded:
- Review specific metrics
- Check server resources
- Analyze error patterns
❌ FAIL - Critical thresholds exceeded:
- System unable to handle load
- Investigate bottlenecks
- Scale resources or optimize code
Interpreting Results
High Success Rate (>95%)
- System handling load well
- Retry logic effective
- Infrastructure adequate
Moderate Success Rate (85-95%)
- Occasional failures acceptable
- Monitor error patterns
- May need optimization
Low Success Rate (<85%)
- System struggling under load
- Critical bottlenecks present
- Immediate action required
High Memory Usage (>75% of threshold)
- Potential memory leak
- Inefficient resource management
- Review memory cleanup logic
Slow Response Times (>75% of threshold)
- Server bottleneck
- Network congestion
- Database query optimization needed
Troubleshooting
Tests Timing Out
Symptoms:
- Load tests exceed timeout limits
- Uploads never complete
- Browser hangs or crashes
Solutions:
- Increase Test Timeout:
test('Heavy Load', async ({ browser }) => {
test.setTimeout(180000); // 3 minutes
// ... test code
});
- Reduce Load:
const LOAD_TEST_CONFIG = {
heavy: {
users: 10, // Reduced from 20
filesPerUser: 3, // Reduced from 5
fileSizeMB: 2, // Reduced from 5
expectedDuration: 90000 // Adjusted
}
};
- Check Server Resources:
# Monitor server resources during test
docker stats
# Increase Docker resources if needed
# In Docker Desktop: Settings → Resources
High Failure Rate
Symptoms:
- Success rate below threshold
- Many upload failures
- Timeout errors
Solutions:
- Check Server Logs:
# View PHP error logs
docker logs php
# View Nginx logs
docker logs nginx
- Increase Server Resources:
# Check current limits
docker exec php php -i | grep memory_limit
# Update php.ini or .env
UPLOAD_MAX_FILESIZE=100M
POST_MAX_SIZE=100M
MEMORY_LIMIT=512M
- Optimize Queue Configuration:
UPLOAD_PARALLEL_CHUNKS=5 # Increase concurrent chunks
UPLOAD_CHUNK_SIZE=1048576 # Increase chunk size to 1MB
Memory Issues
Symptoms:
- Browser out of memory errors
- System becomes unresponsive
- Test crashes
Solutions:
- Increase Browser Memory:
const browser = await chromium.launch({
  args: [
    '--js-flags=--max-old-space-size=4096', // raise the browser's V8 heap to ~4GB
    '--disable-dev-shm-usage'               // use /tmp instead of /dev/shm
  ]
});
- Reduce File Sizes:
const LOAD_TEST_CONFIG = {
heavy: {
fileSizeMB: 2, // Reduced from 5MB
}
};
- Run Tests Sequentially:
# Run one test at a time
npx playwright test concurrent-upload-load.spec.js --grep "Light Load"
npx playwright test concurrent-upload-load.spec.js --grep "Moderate Load"
# etc.
Network Errors
Symptoms:
- Connection refused errors
- Request timeouts
- SSL/TLS errors
Solutions:
- Verify Server Running:
# Check server status
docker ps
# Restart if needed
make down && make up
- Check SSL Certificates:
# Navigate to https://localhost in browser
# Accept self-signed certificate if prompted
- Increase Network Timeouts:
await page.goto('https://localhost/livecomponents/test/upload', {
timeout: 60000 // 60 seconds
});
CI/CD Integration
GitHub Actions
```yaml
name: Load Tests

on:
  schedule:
    - cron: '0 2 * * *'  # Daily at 2 AM
  workflow_dispatch:      # Manual trigger

jobs:
  load-tests:
    runs-on: ubuntu-latest
    timeout-minutes: 30

    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-node@v3
        with:
          node-version: '18'

      - name: Install dependencies
        run: npm ci

      - name: Install Playwright
        run: npx playwright install --with-deps chromium

      - name: Start dev server
        run: |
          make up
          sleep 10  # Wait for server to be ready

      - name: Run Light Load Test
        run: npx playwright test concurrent-upload-load.spec.js --grep "Light Load"

      - name: Run Moderate Load Test
        run: npx playwright test concurrent-upload-load.spec.js --grep "Moderate Load"

      - name: Upload test results
        if: always()
        uses: actions/upload-artifact@v3
        with:
          name: load-test-results
          path: test-results/

      - name: Notify on failure
        if: failure()
        uses: actions/github-script@v6
        with:
          script: |
            github.rest.issues.create({
              owner: context.repo.owner,
              repo: context.repo.repo,
              title: 'Load Tests Failed',
              body: 'Load tests failed. Check workflow run for details.'
            });
```
Performance Regression Detection
# Run baseline test and save results
npm run test:load > baseline-results.txt
# After code changes, run again and compare
npm run test:load > current-results.txt
diff baseline-results.txt current-results.txt
# Automated regression check
npm run test:load:regression
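What an automated check behind `test:load:regression` could look like, as a sketch: it assumes each run writes its summary metrics to a JSON file (the file names and keys here are hypothetical):

```js
// regression-check.js — compare the current run's metrics against a saved baseline.
const fs = require('fs');

const baseline = JSON.parse(fs.readFileSync('baseline-metrics.json', 'utf8'));
const current = JSON.parse(fs.readFileSync('current-metrics.json', 'utf8'));

const TOLERANCE = 0.10; // flag anything more than 10% worse than baseline

let failed = false;
for (const key of ['totalDuration', 'avgResponseTime', 'maxMemory']) {
  const delta = (current[key] - baseline[key]) / baseline[key];
  if (delta > TOLERANCE) {
    console.error(`Regression in ${key}: ${(delta * 100).toFixed(1)}% worse than baseline`);
    failed = true;
  }
}
process.exit(failed ? 1 : 0);
```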
Best Practices
1. Test Environment
- Dedicated Server: Use dedicated test server to avoid interference
- Consistent Resources: Same hardware/container specs for reproducibility
- Isolated Network: Minimize network variability
- Clean State: Reset database/cache between test runs
2. Test Execution
- Start Small: Begin with light load, increase gradually
- Monitor Resources: Watch server CPU, memory, disk during tests
- Sequential Heavy Tests: Don't run heavy/stress tests in parallel
- Adequate Timeouts: Set realistic timeouts based on load
3. Result Analysis
- Track Trends: Compare results over time, not single runs
- Statistical Significance: Multiple runs for reliable metrics
- Identify Patterns: Look for consistent failure patterns
- Correlate Metrics: Relate memory spikes to response-time increases
4. Continuous Improvement
- Regular Baselines: Update baselines as system improves
- Performance Budget: Define and enforce performance budgets
- Proactive Monitoring: Catch regressions early
- Capacity Planning: Use load test data for scaling decisions
Performance Optimization Tips
Server-Side
- Increase Worker Processes:
# nginx.conf
worker_processes auto;

events {
    worker_connections 2048;
}
- Optimize PHP-FPM:
; php-fpm pool config
pm = dynamic
pm.max_children = 50
pm.start_servers = 10
pm.min_spare_servers = 5
pm.max_spare_servers = 20
- Enable Caching:
// Cache upload sessions
$this->cache->set(
"upload-session:{$sessionId}",
$sessionData,
ttl: 3600 // 1 hour
);
Client-Side
- Optimize Chunk Size:
// Larger chunks = fewer requests but more memory
const CHUNK_SIZE = 1024 * 1024; // 1MB (default: 512KB)
- Increase Parallel Uploads:
// More parallelism = faster completion but more memory
const MAX_PARALLEL_CHUNKS = 5; // (default: 3)
- Implement Request Pooling:
// Reuse HTTP connections
const keepAlive = true;
const maxSockets = 10;
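Browsers pool connections on their own, so explicit pooling mainly matters when the upload client runs in Node (for example a seeding or benchmarking script) rather than in the page. A sketch under that assumption:

```js
// Node-side only: reuse TCP connections across chunk requests with an https.Agent.
const https = require('https');

const agent = new https.Agent({
  keepAlive: true, // keep sockets open between chunk uploads
  maxSockets: 10,  // cap concurrent sockets per host
});

// Pass the agent to each chunk request (URL below is a placeholder), e.g.:
// https.request('https://localhost/upload/chunk', { agent, method: 'POST' }, callback);
```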
Resources
Support
For issues or questions:
- Review this documentation and troubleshooting section
- Check server logs and resource usage
- Analyze test results for patterns
- Consult LiveComponents upload documentation
- Create GitHub issue with full test output