# LiveComponents Concurrent Upload Load Tests
Performance and scalability testing for the LiveComponents chunked upload system under high concurrent load.
## Overview
This test suite validates system behavior under various load conditions:
- **Light Load**: 5 concurrent users, 2 files each (1MB) - Baseline performance
- **Moderate Load**: 10 concurrent users, 3 files each (2MB) - Typical production load
- **Heavy Load**: 20 concurrent users, 5 files each (5MB) - Peak traffic simulation
- **Stress Test**: 50 concurrent users, 2 files each (1MB) - System limits discovery
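The four scenarios above can be sketched as a single config object. The key names here are illustrative and may differ from the actual `LOAD_TEST_CONFIG` in `concurrent-upload-load.spec.js`:

```javascript
// Illustrative sketch of the scenario table; actual spec-file keys may differ.
const LOAD_TEST_CONFIG = {
  light:    { users: 5,  filesPerUser: 2, fileSizeMB: 1, expectedDuration: 30000 },
  moderate: { users: 10, filesPerUser: 3, fileSizeMB: 2, expectedDuration: 60000 },
  heavy:    { users: 20, filesPerUser: 5, fileSizeMB: 5, expectedDuration: 120000 },
  stress:   { users: 50, filesPerUser: 2, fileSizeMB: 1, expectedDuration: 180000 }
};

// Total data per scenario = users * filesPerUser * fileSizeMB
const totalMB = (s) => s.users * s.filesPerUser * s.fileSizeMB;
console.log(totalMB(LOAD_TEST_CONFIG.heavy)); // 500
```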
## Quick Start
### Prerequisites
```bash
# Ensure Playwright is installed
npm install
# Install Chromium browser
npx playwright install chromium
# Ensure development server is running with adequate resources
make up
# For heavy/stress tests, ensure server has sufficient resources:
# - At least 4GB RAM available
# - At least 2 CPU cores
# - Adequate disk I/O capacity
```
### Running Load Tests
```bash
# Run all load tests (WARNING: Resource intensive!)
npm run test:load
# Run specific load test
npx playwright test concurrent-upload-load.spec.js --grep "Light Load"
# Run with visible browser (for debugging)
npm run test:load:headed
# Headless mode (the default for npm run test:load) is recommended for CI/CD
```
## Load Test Scenarios
### 1. Light Load Test
**Configuration:**
- **Users**: 5 concurrent
- **Files per User**: 2
- **File Size**: 1MB each
- **Total Data**: 10MB
- **Expected Duration**: <30 seconds
**Performance Thresholds:**
- **Max Duration**: 30 seconds
- **Max Memory**: 200MB
- **Max Avg Response Time**: 1 second
- **Min Success Rate**: 95%
**Use Case:** Baseline performance validation, continuous integration tests
**Example Results:**
```
=== Light Load Test Results ===
Total Duration: 18,234ms
Total Uploads: 10
Successful: 10
Failed: 0
Success Rate: 100.00%
Avg Response Time: 823.45ms
Max Response Time: 1,452ms
Avg Memory: 125.34MB
Max Memory: 178.21MB
```
### 2. Moderate Load Test
**Configuration:**
- **Users**: 10 concurrent
- **Files per User**: 3
- **File Size**: 2MB each
- **Total Data**: 60MB
- **Expected Duration**: <60 seconds
**Performance Thresholds:**
- **Max Duration**: 60 seconds
- **Max Memory**: 500MB
- **Max Avg Response Time**: 2 seconds
- **Min Success Rate**: 90%
**Use Case:** Typical production load simulation, daily performance monitoring
**Example Results:**
```
=== Moderate Load Test Results ===
Total Duration: 47,892ms
Total Uploads: 30
Successful: 28
Failed: 2
Success Rate: 93.33%
Avg Response Time: 1,567.23ms
Max Response Time: 2,891ms
Avg Memory: 342.56MB
Max Memory: 467.89MB
```
### 3. Heavy Load Test
**Configuration:**
- **Users**: 20 concurrent
- **Files per User**: 5
- **File Size**: 5MB each
- **Total Data**: 500MB
- **Expected Duration**: <120 seconds
**Performance Thresholds:**
- **Max Duration**: 120 seconds
- **Max Memory**: 1GB (1024MB)
- **Max Avg Response Time**: 3 seconds
- **Min Success Rate**: 85%
**Use Case:** Peak traffic simulation, capacity planning
**Example Results:**
```
=== Heavy Load Test Results ===
Total Duration: 102,456ms
Total Uploads: 100
Successful: 87
Failed: 13
Success Rate: 87.00%
Avg Response Time: 2,734.12ms
Max Response Time: 4,567ms
Avg Memory: 723.45MB
Max Memory: 956.78MB
```
### 4. Stress Test
**Configuration:**
- **Users**: 50 concurrent
- **Files per User**: 2
- **File Size**: 1MB each
- **Total Data**: 100MB
- **Expected Duration**: <180 seconds
**Performance Thresholds:**
- **Max Duration**: 180 seconds
- **Max Memory**: 2GB (2048MB)
- **Max Avg Response Time**: 5 seconds
- **Min Success Rate**: 80%
**Use Case:** System limits discovery, failure mode analysis
**Example Results:**
```
=== Stress Test Results ===
Total Duration: 156,789ms
Total Uploads: 100
Successful: 82
Failed: 18
Success Rate: 82.00%
Avg Response Time: 4,234.56ms
Max Response Time: 7,891ms
Avg Memory: 1,456.78MB
Max Memory: 1,923.45MB
Total Errors: 18
```
### 5. Queue Management Test
**Tests:** Concurrent upload queue handling with proper limits
**Validates:**
- Maximum concurrent uploads respected (default: 3)
- Queue properly manages waiting uploads
- All uploads eventually complete
- No queue starvation or deadlocks
**Configuration:**
- **Files**: 10 files uploaded simultaneously
- **Expected Max Concurrent**: 3 uploads at any time
**Example Output:**
```
Queue States Captured: 18
Max Concurrent Uploads: 3
Final Completed: 10
Queue Properly Managed: ✅
```
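The behavior being validated can be illustrated with a minimal concurrency-limited queue. This is a sketch of the pattern, not the actual upload implementation:

```javascript
// Minimal sketch of a concurrency-limited upload queue (cap = 3).
// Tasks beyond the cap wait; each completion drains the next waiter,
// so no task starves and `active` never exceeds maxConcurrent.
class UploadQueue {
  constructor(maxConcurrent = 3) {
    this.maxConcurrent = maxConcurrent;
    this.active = 0;
    this.waiting = [];
  }

  enqueue(task) {
    return new Promise((resolve, reject) => {
      this.waiting.push({ task, resolve, reject });
      this.drain();
    });
  }

  drain() {
    while (this.active < this.maxConcurrent && this.waiting.length > 0) {
      const { task, resolve, reject } = this.waiting.shift();
      this.active++;
      Promise.resolve()
        .then(task)
        .then(resolve, reject)
        .finally(() => { this.active--; this.drain(); });
    }
  }
}
```

Enqueueing 10 upload tasks against this queue never runs more than 3 at once, matching the expected output above.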
### 6. Resource Cleanup Test
**Tests:** Memory cleanup after concurrent uploads complete
**Validates:**
- Memory properly released after uploads
- No memory leaks
- Garbage collection effective
- System returns to baseline state
**Measurement Points:**
- **Baseline Memory**: Before any uploads
- **After Upload Memory**: Immediately after all uploads complete
- **After Cleanup Memory**: After garbage collection
**Expected Behavior:**
- Memory after cleanup should be <50% of memory increase during uploads
**Example Output:**
```
Memory Usage:
Baseline: 45.23MB
After Uploads: 156.78MB (Δ +111.55MB)
After Cleanup: 52.34MB (Δ +7.11MB from baseline)
Cleanup Effectiveness: 93.6%
```
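The measurement points can be sampled via Chromium's non-standard `performance.memory` API (Chromium-only), and cleanup effectiveness computed from the three readings. A sketch, assuming `page` is a Playwright page:

```javascript
// Sample the browser's JS heap in MB (Chromium-only, non-standard API).
async function sampleHeapMB(page) {
  return page.evaluate(() =>
    performance.memory ? performance.memory.usedJSHeapSize / (1024 * 1024) : null
  );
}

// Cleanup effectiveness: share of the upload-time increase that was released.
function cleanupEffectiveness(baselineMB, afterUploadsMB, afterCleanupMB) {
  const increase = afterUploadsMB - baselineMB;
  const released = afterUploadsMB - afterCleanupMB;
  return (released / increase) * 100;
}
```

The example figures above (45.23 / 156.78 / 52.34 MB) give roughly 93.6% effectiveness.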
### 7. Error Recovery Test
**Tests:** System recovery from failures during concurrent uploads
**Validates:**
- Automatic retry on failures
- Failed uploads eventually succeed
- No corruption from partial failures
- Graceful degradation under stress
**Simulation:**
- Every 3rd chunk request fails
- System must retry and complete all uploads
**Expected Behavior:**
- All uploads complete successfully despite failures
- Retry logic handles failures transparently
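The failure injection can be sketched with Playwright route interception. The route pattern below is an assumption; the real chunk endpoint may differ:

```javascript
// Abort every Nth chunk request to simulate network failures.
// The '**/upload/chunk*' pattern is assumed, not the verified route.
async function installChunkFailureInjection(page, failEvery = 3) {
  let count = 0;
  await page.route('**/upload/chunk*', (route) => {
    count++;
    if (count % failEvery === 0) return route.abort('failed'); // simulated failure
    return route.continue();
  });
}
```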
### 8. Throughput Test
**Tests:** Sustained upload throughput measurement
**Configuration:**
- **Files**: 20 files
- **File Size**: 5MB each
- **Total Data**: 100MB
**Metrics:**
- **Throughput**: Total MB / Total Time (seconds)
- **Expected Minimum**: >1 MB/s on localhost
**Example Output:**
```
Throughput Test Results:
Total Data: 100MB
Total Duration: 67.89s
Throughput: 1.47 MB/s
```
## Performance Metrics Explained
### Duration Metrics
- **Total Duration**: Time from test start to all uploads complete
- **Avg Response Time**: Average time for single upload completion
- **Max Response Time**: Slowest single upload time
- **Min Response Time**: Fastest single upload time
### Success Metrics
- **Total Uploads**: Number of attempted uploads
- **Successful Uploads**: Successfully completed uploads
- **Failed Uploads**: Uploads that failed even after retries
- **Success Rate**: Successful / Total (as percentage)
### Resource Metrics
- **Avg Memory**: Average browser memory usage during test
- **Max Memory**: Peak browser memory usage
- **Memory Delta**: Difference between baseline and peak
### Throughput Metrics
- **Throughput (MB/s)**: Data uploaded per second
- **Formula**: Total MB / Total Duration (seconds)
- **Expected Range**: 1-10 MB/s depending on system
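The formula can be expressed directly; the numbers below reproduce the throughput test example:

```javascript
// Throughput as defined above: total megabytes divided by total seconds.
function throughputMBps(totalMB, durationMs) {
  return totalMB / (durationMs / 1000);
}

console.log(throughputMBps(100, 67890).toFixed(2)); // "1.47"
```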
## Understanding Test Results
### Success Criteria
✅ **PASS** - All thresholds met:
- Duration within expected maximum
- Memory usage within limits
- Success rate above minimum
- Avg response time acceptable
⚠️ **WARNING** - Some thresholds exceeded:
- Review specific metrics
- Check server resources
- Analyze error patterns
❌ **FAIL** - Critical thresholds exceeded:
- System unable to handle load
- Investigate bottlenecks
- Scale resources or optimize code
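One possible mapping of these criteria to code. Treating a missed success rate as the critical (FAIL) threshold, and any other exceeded threshold as a WARNING, is an assumption, not the suite's verified logic:

```javascript
// Classify a test run against its thresholds.
// Assumption: success rate below minimum is critical (FAIL);
// any other exceeded threshold only warns.
function classifyRun(results, thresholds) {
  const exceeded = [];
  if (results.durationMs > thresholds.maxDurationMs) exceeded.push('duration');
  if (results.maxMemoryMB > thresholds.maxMemoryMB) exceeded.push('memory');
  if (results.avgResponseMs > thresholds.maxAvgResponseMs) exceeded.push('responseTime');
  if (results.successRate < thresholds.minSuccessRate) exceeded.push('successRate');

  if (exceeded.length === 0) return 'PASS';
  return exceeded.includes('successRate') ? 'FAIL' : 'WARNING';
}
```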
### Interpreting Results
**High Success Rate (>95%)**
- System handling load well
- Retry logic effective
- Infrastructure adequate
**Moderate Success Rate (85-95%)**
- Occasional failures acceptable
- Monitor error patterns
- May need optimization
**Low Success Rate (<85%)**
- System struggling under load
- Critical bottlenecks present
- Immediate action required
**High Memory Usage (>75% of threshold)**
- Potential memory leak
- Inefficient resource management
- Review memory cleanup logic
**Slow Response Times (>75% of threshold)**
- Server bottleneck
- Network congestion
- Database query optimization needed
## Troubleshooting
### Tests Timing Out
**Symptoms:**
- Load tests exceed timeout limits
- Uploads never complete
- Browser hangs or crashes
**Solutions:**
1. **Increase Test Timeout:**
```javascript
test('Heavy Load', async ({ browser }) => {
  test.setTimeout(180000); // 3 minutes
  // ... test code
});
```
2. **Reduce Load:**
```javascript
const LOAD_TEST_CONFIG = {
  heavy: {
    users: 10,               // Reduced from 20
    filesPerUser: 3,         // Reduced from 5
    fileSizeMB: 2,           // Reduced from 5
    expectedDuration: 90000  // Adjusted
  }
};
```
3. **Check Server Resources:**
```bash
# Monitor server resources during test
docker stats
# Increase Docker resources if needed
# In Docker Desktop: Settings → Resources
```
### High Failure Rate
**Symptoms:**
- Success rate below threshold
- Many upload failures
- Timeout errors
**Solutions:**
1. **Check Server Logs:**
```bash
# View PHP error logs
docker logs php
# View Nginx logs
docker logs nginx
```
2. **Increase Server Resources:**
```bash
# Check current limits
docker exec php php -i | grep memory_limit
# Update php.ini or .env
UPLOAD_MAX_FILESIZE=100M
POST_MAX_SIZE=100M
MEMORY_LIMIT=512M
```
3. **Optimize Queue Configuration:**
```env
# Increase concurrent chunks
UPLOAD_PARALLEL_CHUNKS=5
# Increase chunk size to 1MB
UPLOAD_CHUNK_SIZE=1048576
```
### Memory Issues
**Symptoms:**
- Browser out of memory errors
- System becomes unresponsive
- Test crashes
**Solutions:**
1. **Increase Browser Memory:**
```javascript
const browser = await chromium.launch({
  args: [
    // --max-old-space-size is a V8 flag, so pass it via --js-flags
    '--js-flags=--max-old-space-size=4096', // 4GB V8 heap
    '--disable-dev-shm-usage'               // Use /tmp instead of /dev/shm
  ]
});
```
2. **Reduce File Sizes:**
```javascript
const LOAD_TEST_CONFIG = {
  heavy: {
    fileSizeMB: 2, // Reduced from 5MB
  }
};
```
3. **Run Tests Sequentially:**
```bash
# Run one test at a time
npx playwright test concurrent-upload-load.spec.js --grep "Light Load"
npx playwright test concurrent-upload-load.spec.js --grep "Moderate Load"
# etc.
```
### Network Errors
**Symptoms:**
- Connection refused errors
- Request timeouts
- SSL/TLS errors
**Solutions:**
1. **Verify Server Running:**
```bash
# Check server status
docker ps
# Restart if needed
make down && make up
```
2. **Check SSL Certificates:**
```bash
# Navigate to https://localhost in browser
# Accept self-signed certificate if prompted
```
3. **Increase Network Timeouts:**
```javascript
await page.goto('https://localhost/livecomponents/test/upload', {
  timeout: 60000 // 60 seconds
});
```
## CI/CD Integration
### GitHub Actions
```yaml
name: Load Tests

on:
  schedule:
    - cron: '0 2 * * *'  # Daily at 2 AM
  workflow_dispatch:      # Manual trigger

jobs:
  load-tests:
    runs-on: ubuntu-latest
    timeout-minutes: 30
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-node@v3
        with:
          node-version: '18'
      - name: Install dependencies
        run: npm ci
      - name: Install Playwright
        run: npx playwright install --with-deps chromium
      - name: Start dev server
        run: |
          make up
          sleep 10  # Wait for server to be ready
      - name: Run Light Load Test
        run: npx playwright test concurrent-upload-load.spec.js --grep "Light Load"
      - name: Run Moderate Load Test
        run: npx playwright test concurrent-upload-load.spec.js --grep "Moderate Load"
      - name: Upload test results
        if: always()
        uses: actions/upload-artifact@v3
        with:
          name: load-test-results
          path: test-results/
      - name: Notify on failure
        if: failure()
        uses: actions/github-script@v6
        with:
          script: |
            github.rest.issues.create({
              owner: context.repo.owner,
              repo: context.repo.repo,
              title: 'Load Tests Failed',
              body: 'Load tests failed. Check workflow run for details.'
            });
```
### Performance Regression Detection
```bash
# Run baseline test and save results
npm run test:load > baseline-results.txt
# After code changes, run again and compare
npm run test:load > current-results.txt
diff baseline-results.txt current-results.txt
# Automated regression check
npm run test:load:regression
```
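Diffing raw log output is noisy; a more robust approach is to parse the key metrics and compare against a saved baseline with a tolerance. A hypothetical sketch of what `test:load:regression` could do:

```javascript
// Hypothetical regression check for lower-is-better metrics (durations,
// memory): flag anything more than `tolerance` (default 20%) above baseline.
function findRegressions(baseline, current, tolerance = 0.2) {
  const regressions = [];
  for (const [metric, base] of Object.entries(baseline)) {
    if (current[metric] > base * (1 + tolerance)) {
      regressions.push({ metric, baseline: base, current: current[metric] });
    }
  }
  return regressions;
}
```

Higher-is-better metrics such as success rate need the inverted comparison and are not covered by this sketch.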
## Best Practices
### 1. Test Environment
- **Dedicated Server**: Use dedicated test server to avoid interference
- **Consistent Resources**: Same hardware/container specs for reproducibility
- **Isolated Network**: Minimize network variability
- **Clean State**: Reset database/cache between test runs
### 2. Test Execution
- **Start Small**: Begin with light load, increase gradually
- **Monitor Resources**: Watch server CPU, memory, disk during tests
- **Sequential Heavy Tests**: Don't run heavy/stress tests in parallel
- **Adequate Timeouts**: Set realistic timeouts based on load
### 3. Result Analysis
- **Track Trends**: Compare results over time, not single runs
- **Statistical Significance**: Multiple runs for reliable metrics
- **Identify Patterns**: Look for consistent failure patterns
- **Correlate Metrics**: Memory spikes with response time increases
### 4. Continuous Improvement
- **Regular Baselines**: Update baselines as system improves
- **Performance Budget**: Define and enforce performance budgets
- **Proactive Monitoring**: Catch regressions early
- **Capacity Planning**: Use load test data for scaling decisions
## Performance Optimization Tips
### Server-Side
1. **Increase Worker Processes:**
```nginx
# nginx.conf
worker_processes auto;

# worker_connections belongs inside the events block
events {
    worker_connections 2048;
}
```
2. **Optimize PHP-FPM:**
```ini
; php-fpm pool config
pm = dynamic
pm.max_children = 50
pm.start_servers = 10
pm.min_spare_servers = 5
pm.max_spare_servers = 20
```
3. **Enable Caching:**
```php
// Cache upload sessions
$this->cache->set(
    "upload-session:{$sessionId}",
    $sessionData,
    ttl: 3600 // 1 hour
);
```
### Client-Side
1. **Optimize Chunk Size:**
```javascript
// Larger chunks = fewer requests but more memory
const CHUNK_SIZE = 1024 * 1024; // 1MB (default: 512KB)
```
2. **Increase Parallel Uploads:**
```javascript
// More parallelism = faster completion but more memory
const MAX_PARALLEL_CHUNKS = 5; // (default: 3)
```
3. **Implement Request Pooling:**
```javascript
// Reuse HTTP connections (Node.js clients; browsers pool automatically)
const http = require('http');
const agent = new http.Agent({ keepAlive: true, maxSockets: 10 });
// Pass `agent` in each request's options so uploads share pooled sockets
```
## Resources
- [Playwright Performance Testing](https://playwright.dev/docs/test-advanced#measuring-performance)
- [Load Testing Best Practices](https://martinfowler.com/articles/practical-test-pyramid.html#LoadTesting)
- [System Scalability Patterns](https://docs.microsoft.com/en-us/azure/architecture/patterns/category/performance-scalability)
## Support
For issues or questions:
1. Review this documentation and troubleshooting section
2. Check server logs and resource usage
3. Analyze test results for patterns
4. Consult LiveComponents upload documentation
5. Create GitHub issue with full test output