LiveComponents Concurrent Upload Load Tests
Performance and scalability testing for the LiveComponents chunked upload system under high concurrent load.
Overview
This test suite validates system behavior under four load conditions (summarized in the configuration sketch after this list):
- Light Load: 5 concurrent users, 2 files each (1MB) - Baseline performance
- Moderate Load: 10 concurrent users, 3 files each (2MB) - Typical production load
- Heavy Load: 20 concurrent users, 5 files each (5MB) - Peak traffic simulation
- Stress Test: 50 concurrent users, 2 files each (1MB) - System limits discovery
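These scenarios map onto the LOAD_TEST_CONFIG table used throughout this guide. A sketch of that shape, populated with the values from the sections below (the threshold field names are assumptions; concurrent-upload-load.spec.js is authoritative):

```js
// Illustrative only — threshold field names are assumed; all values come from the
// scenario sections of this document.
const LOAD_TEST_CONFIG = {
  light:    { users: 5,  filesPerUser: 2, fileSizeMB: 1, expectedDuration: 30000,  maxMemoryMB: 200,  maxAvgResponseMs: 1000, minSuccessRate: 95 },
  moderate: { users: 10, filesPerUser: 3, fileSizeMB: 2, expectedDuration: 60000,  maxMemoryMB: 500,  maxAvgResponseMs: 2000, minSuccessRate: 90 },
  heavy:    { users: 20, filesPerUser: 5, fileSizeMB: 5, expectedDuration: 120000, maxMemoryMB: 1024, maxAvgResponseMs: 3000, minSuccessRate: 85 },
  stress:   { users: 50, filesPerUser: 2, fileSizeMB: 1, expectedDuration: 180000, maxMemoryMB: 2048, maxAvgResponseMs: 5000, minSuccessRate: 80 },
};
```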
Quick Start
Prerequisites
# Ensure Playwright is installed
npm install
# Install Chromium browser
npx playwright install chromium
# Ensure development server is running with adequate resources
make up
# For heavy/stress tests, ensure server has sufficient resources:
# - At least 4GB RAM available
# - At least 2 CPU cores
# - Adequate disk I/O capacity
Running Load Tests
# Run all load tests (WARNING: Resource intensive!)
npm run test:load
# Run specific load test
npx playwright test concurrent-upload-load.spec.js --grep "Light Load"
# Run with visible browser (for debugging)
npm run test:load:headed
# Run in headless mode (recommended for CI/CD)
npm run test:load
Load Test Scenarios
1. Light Load Test
Configuration:
- Users: 5 concurrent
- Files per User: 2
- File Size: 1MB each
- Total Data: 10MB
- Expected Duration: <30 seconds
Performance Thresholds:
- Max Duration: 30 seconds
- Max Memory: 200MB
- Max Avg Response Time: 1 second
- Min Success Rate: 95%
Use Case: Baseline performance validation, continuous integration tests
Example Results:
=== Light Load Test Results ===
Total Duration: 18,234ms
Total Uploads: 10
Successful: 10
Failed: 0
Success Rate: 100.00%
Avg Response Time: 823.45ms
Max Response Time: 1,452ms
Avg Memory: 125.34MB
Max Memory: 178.21MB
2. Moderate Load Test
Configuration:
- Users: 10 concurrent
- Files per User: 3
- File Size: 2MB each
- Total Data: 60MB
- Expected Duration: <60 seconds
Performance Thresholds:
- Max Duration: 60 seconds
- Max Memory: 500MB
- Max Avg Response Time: 2 seconds
- Min Success Rate: 90%
Use Case: Typical production load simulation, daily performance monitoring
Example Results:
=== Moderate Load Test Results ===
Total Duration: 47,892ms
Total Uploads: 30
Successful: 28
Failed: 2
Success Rate: 93.33%
Avg Response Time: 1,567.23ms
Max Response Time: 2,891ms
Avg Memory: 342.56MB
Max Memory: 467.89MB
3. Heavy Load Test
Configuration:
- Users: 20 concurrent
- Files per User: 5
- File Size: 5MB each
- Total Data: 500MB
- Expected Duration: <120 seconds
Performance Thresholds:
- Max Duration: 120 seconds
- Max Memory: 1GB (1024MB)
- Max Avg Response Time: 3 seconds
- Min Success Rate: 85%
Use Case: Peak traffic simulation, capacity planning
Example Results:
=== Heavy Load Test Results ===
Total Duration: 102,456ms
Total Uploads: 100
Successful: 87
Failed: 13
Success Rate: 87.00%
Avg Response Time: 2,734.12ms
Max Response Time: 4,567ms
Avg Memory: 723.45MB
Max Memory: 956.78MB
4. Stress Test
Configuration:
- Users: 50 concurrent
- Files per User: 2
- File Size: 1MB each
- Total Data: 100MB
- Expected Duration: <180 seconds
Performance Thresholds:
- Max Duration: 180 seconds
- Max Memory: 2GB (2048MB)
- Max Avg Response Time: 5 seconds
- Min Success Rate: 80%
Use Case: System limits discovery, failure mode analysis
Example Results:
=== Stress Test Results ===
Total Duration: 156,789ms
Total Uploads: 100
Successful: 82
Failed: 18
Success Rate: 82.00%
Avg Response Time: 4,234.56ms
Max Response Time: 7,891ms
Avg Memory: 1,456.78MB
Max Memory: 1,923.45MB
Total Errors: 18
5. Queue Management Test
Tests: Concurrent upload queue handling with proper limits
Validates:
- Maximum concurrent uploads respected (default: 3)
- Queue properly manages waiting uploads
- All uploads eventually complete
- No queue starvation or deadlocks
Configuration:
- Files: 10 files uploaded simultaneously
- Expected Max Concurrent: 3 uploads at any time
Example Output:
Queue States Captured: 18
Max Concurrent Uploads: 3
Final Completed: 10
Queue Properly Managed: ✅
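A minimal sketch of the concurrency assertion, assuming the test page exposes its queue state on a window property (the `window.__uploadQueueState` name and its fields are hypothetical):

```js
const { test, expect } = require('@playwright/test');

test('queue never exceeds the concurrency limit', async ({ page }) => {
  await page.goto('https://localhost/livecomponents/test/upload');

  // ... enqueue 10 files via the upload component here ...

  const concurrencySamples = [];
  for (let i = 0; i < 240; i++) {                              // poll for up to ~60s
    const state = await page.evaluate(() => window.__uploadQueueState ?? null);
    if (state) concurrencySamples.push(state.activeUploads);   // hypothetical field
    if (state && state.completed === 10) break;                // all uploads finished
    await page.waitForTimeout(250);
  }

  expect(concurrencySamples.length).toBeGreaterThan(0);
  expect(Math.max(...concurrencySamples)).toBeLessThanOrEqual(3);
});
```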
6. Resource Cleanup Test
Tests: Memory cleanup after concurrent uploads complete
Validates:
- Memory properly released after uploads
- No memory leaks
- Garbage collection effective
- System returns to baseline state
Measurement Points:
- Baseline Memory: Before any uploads
- After Upload Memory: Immediately after all uploads complete
- After Cleanup Memory: After garbage collection
Expected Behavior:
- Memory after cleanup should be <50% of memory increase during uploads
Example Output:
Memory Usage:
Baseline: 45.23MB
After Uploads: 156.78MB (Δ +111.55MB)
After Cleanup: 52.34MB (Δ +7.11MB from baseline)
Cleanup Effectiveness: 93.6%
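A sketch of how the three measurement points can be captured, using Chromium's non-standard `performance.memory` API (available here because the load tests run in Chromium):

```js
const { test, expect } = require('@playwright/test');

test('memory returns close to baseline after uploads', async ({ page }) => {
  // Chromium-only, non-standard API; acceptable since these tests run in Chromium.
  const heapMB = () =>
    page.evaluate(() => performance.memory.usedJSHeapSize / 1024 / 1024);

  await page.goto('https://localhost/livecomponents/test/upload');
  const baseline = await heapMB();

  // ... run the concurrent uploads and wait for them all to finish ...
  const afterUploads = await heapMB();

  // ... give the page idle time so buffers and sessions can be released ...
  const afterCleanup = await heapMB();

  const increase = afterUploads - baseline;
  const retained = afterCleanup - baseline;
  // Expectation from this guide: retained memory is <50% of the increase.
  expect(retained).toBeLessThan(increase * 0.5);
});
```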
7. Error Recovery Test
Tests: System recovery from failures during concurrent uploads
Validates:
- Automatic retry on failures
- Failed uploads eventually succeed
- No corruption from partial failures
- Graceful degradation under stress
Simulation:
- Every 3rd chunk request fails
- System must retry and complete all uploads
Expected Behavior:
- All uploads complete successfully despite failures
- Retry logic handles failures transparently
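A sketch of the failure injection using Playwright request interception; the chunk endpoint pattern is a placeholder and should be matched to the real upload route:

```js
// Inside a test body: fail every 3rd chunk request to exercise the retry logic.
// '**/upload/chunk*' is a placeholder pattern, not the component's actual endpoint.
let chunkRequests = 0;
await page.route('**/upload/chunk*', (route) => {
  chunkRequests += 1;
  if (chunkRequests % 3 === 0) {
    return route.abort('failed'); // simulate a transient network failure
  }
  return route.continue();
});
// Despite the injected failures, every upload should still reach completion.
```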
8. Throughput Test
Tests: Sustained upload throughput measurement
Configuration:
- Files: 20 files
- File Size: 5MB each
- Total Data: 100MB
Metrics:
- Throughput: Total MB / Total Time (seconds)
- Expected Minimum: >1 MB/s on localhost
Example Output:
Throughput Test Results:
Total Data: 100MB
Total Duration: 67.89s
Throughput: 1.47 MB/s
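Throughput here is simply total megabytes divided by elapsed seconds; a quick check against the example above:

```js
// Throughput = total MB / elapsed seconds.
function throughputMBps(totalBytes, elapsedMs) {
  return (totalBytes / (1024 * 1024)) / (elapsedMs / 1000);
}

// 100 MB uploaded in 67.89 s ≈ 1.47 MB/s
console.log(throughputMBps(100 * 1024 * 1024, 67890).toFixed(2)); // "1.47"
```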
Performance Metrics Explained
Duration Metrics
- Total Duration: Time from test start to all uploads complete
- Avg Response Time: Average time for single upload completion
- Max Response Time: Slowest single upload time
- Min Response Time: Fastest single upload time
Success Metrics
- Total Uploads: Number of attempted uploads
- Successful Uploads: Successfully completed uploads
- Failed Uploads: Uploads that failed even after retries
- Success Rate: Successful / Total (as percentage)
Resource Metrics
- Avg Memory: Average browser memory usage during test
- Max Memory: Peak browser memory usage
- Memory Delta: Difference between baseline and peak
Throughput Metrics
- Throughput (MB/s): Data uploaded per second
- Formula: Total MB / Total Duration (seconds)
- Expected Range: 1-10 MB/s depending on system
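A sketch of how these summary figures can be derived from per-upload results; the result shape (`{ durationMs, ok }`) is an assumption, not the spec file's actual type:

```js
// Aggregate per-upload results into the summary metrics described above.
function summarize(results) {
  const durations = results.map((r) => r.durationMs);
  const successful = results.filter((r) => r.ok).length;
  return {
    totalUploads: results.length,
    successRatePct: (successful / results.length) * 100,
    avgResponseTimeMs: durations.reduce((a, b) => a + b, 0) / durations.length,
    maxResponseTimeMs: Math.max(...durations),
    minResponseTimeMs: Math.min(...durations),
  };
}
```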
Understanding Test Results
Success Criteria
✅ PASS - All thresholds met:
- Duration within expected maximum
- Memory usage within limits
- Success rate above minimum
- Avg response time acceptable
⚠️ WARNING - Some thresholds exceeded:
- Review specific metrics
- Check server resources
- Analyze error patterns
❌ FAIL - Critical thresholds exceeded:
- System unable to handle load
- Investigate bottlenecks
- Scale resources or optimize code
Interpreting Results
High Success Rate (>95%)
- System handling load well
- Retry logic effective
- Infrastructure adequate
Moderate Success Rate (85-95%)
- Occasional failures acceptable
- Monitor error patterns
- May need optimization
Low Success Rate (<85%)
- System struggling under load
- Critical bottlenecks present
- Immediate action required
High Memory Usage (>75% of threshold)
- Potential memory leak
- Inefficient resource management
- Review memory cleanup logic
Slow Response Times (>75% of threshold)
- Server bottleneck
- Network congestion
- Database query optimization needed
Troubleshooting
Tests Timing Out
Symptoms:
- Load tests exceed timeout limits
- Uploads never complete
- Browser hangs or crashes
Solutions:
- Increase Test Timeout:
test('Heavy Load', async ({ browser }) => {
test.setTimeout(180000); // 3 minutes
// ... test code
});
- Reduce Load:
const LOAD_TEST_CONFIG = {
heavy: {
users: 10, // Reduced from 20
filesPerUser: 3, // Reduced from 5
fileSizeMB: 2, // Reduced from 5
expectedDuration: 90000 // Adjusted
}
};
- Check Server Resources:
# Monitor server resources during test
docker stats
# Increase Docker resources if needed
# In Docker Desktop: Settings → Resources
High Failure Rate
Symptoms:
- Success rate below threshold
- Many upload failures
- Timeout errors
Solutions:
- Check Server Logs:
# View PHP error logs
docker logs php
# View Nginx logs
docker logs nginx
- Increase Server Resources:
# Check current limits
docker exec php php -i | grep memory_limit
# Update php.ini or .env
UPLOAD_MAX_FILESIZE=100M
POST_MAX_SIZE=100M
MEMORY_LIMIT=512M
- Optimize Queue Configuration:
UPLOAD_PARALLEL_CHUNKS=5 # Increase concurrent chunks
UPLOAD_CHUNK_SIZE=1048576 # Increase chunk size to 1MB
Memory Issues
Symptoms:
- Browser out of memory errors
- System becomes unresponsive
- Test crashes
Solutions:
- Increase Browser Memory:
const browser = await chromium.launch({
  args: [
    '--js-flags=--max-old-space-size=4096', // raise the browser's V8 heap to ~4GB
    '--disable-dev-shm-usage'               // use /tmp instead of /dev/shm
  ]
});
- Reduce File Sizes:
const LOAD_TEST_CONFIG = {
heavy: {
fileSizeMB: 2, // Reduced from 5MB
}
};
- Run Tests Sequentially:
# Run one test at a time
npx playwright test concurrent-upload-load.spec.js --grep "Light Load"
npx playwright test concurrent-upload-load.spec.js --grep "Moderate Load"
# etc.
Network Errors
Symptoms:
- Connection refused errors
- Request timeouts
- SSL/TLS errors
Solutions:
- Verify Server Running:
# Check server status
docker ps
# Restart if needed
make down && make up
- Check SSL Certificates:
# Navigate to https://localhost in browser
# Accept self-signed certificate if prompted
- Increase Network Timeouts:
await page.goto('https://localhost/livecomponents/test/upload', {
timeout: 60000 // 60 seconds
});
CI/CD Integration
GitHub Actions
```yaml
name: Load Tests

on:
  schedule:
    - cron: '0 2 * * *'  # Daily at 2 AM
  workflow_dispatch:      # Manual trigger

jobs:
  load-tests:
    runs-on: ubuntu-latest
    timeout-minutes: 30

    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-node@v3
        with:
          node-version: '18'

      - name: Install dependencies
        run: npm ci

      - name: Install Playwright
        run: npx playwright install --with-deps chromium

      - name: Start dev server
        run: |
          make up
          sleep 10  # Wait for server to be ready

      - name: Run Light Load Test
        run: npx playwright test concurrent-upload-load.spec.js --grep "Light Load"

      - name: Run Moderate Load Test
        run: npx playwright test concurrent-upload-load.spec.js --grep "Moderate Load"

      - name: Upload test results
        if: always()
        uses: actions/upload-artifact@v3
        with:
          name: load-test-results
          path: test-results/

      - name: Notify on failure
        if: failure()
        uses: actions/github-script@v6
        with:
          script: |
            github.rest.issues.create({
              owner: context.repo.owner,
              repo: context.repo.repo,
              title: 'Load Tests Failed',
              body: 'Load tests failed. Check workflow run for details.'
            });
```
Performance Regression Detection
# Run baseline test and save results
npm run test:load > baseline-results.txt
# After code changes, run again and compare
npm run test:load > current-results.txt
diff baseline-results.txt current-results.txt
# Automated regression check
npm run test:load:regression
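What an automated check behind `test:load:regression` could look like, as a sketch: it assumes each run writes its summary metrics to a JSON file (the file names and keys here are hypothetical):

```js
// regression-check.js — compare the current run's metrics against a saved baseline.
const fs = require('fs');

const baseline = JSON.parse(fs.readFileSync('baseline-metrics.json', 'utf8'));
const current = JSON.parse(fs.readFileSync('current-metrics.json', 'utf8'));

const TOLERANCE = 0.10; // flag anything more than 10% worse than baseline

let failed = false;
for (const key of ['totalDuration', 'avgResponseTime', 'maxMemory']) {
  const delta = (current[key] - baseline[key]) / baseline[key];
  if (delta > TOLERANCE) {
    console.error(`Regression in ${key}: ${(delta * 100).toFixed(1)}% worse than baseline`);
    failed = true;
  }
}
process.exit(failed ? 1 : 0);
```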
Best Practices
1. Test Environment
- Dedicated Server: Use dedicated test server to avoid interference
- Consistent Resources: Same hardware/container specs for reproducibility
- Isolated Network: Minimize network variability
- Clean State: Reset database/cache between test runs
2. Test Execution
- Start Small: Begin with light load, increase gradually
- Monitor Resources: Watch server CPU, memory, disk during tests
- Sequential Heavy Tests: Don't run heavy/stress tests in parallel
- Adequate Timeouts: Set realistic timeouts based on load
3. Result Analysis
- Track Trends: Compare results over time, not single runs
- Statistical Significance: Multiple runs for reliable metrics
- Identify Patterns: Look for consistent failure patterns
- Correlate Metrics: Relate memory spikes to response-time increases
4. Continuous Improvement
- Regular Baselines: Update baselines as system improves
- Performance Budget: Define and enforce performance budgets
- Proactive Monitoring: Catch regressions early
- Capacity Planning: Use load test data for scaling decisions
Performance Optimization Tips
Server-Side
- Increase Worker Processes:
# nginx.conf
worker_processes auto;

events {
    worker_connections 2048;
}
- Optimize PHP-FPM:
; php-fpm pool config
pm = dynamic
pm.max_children = 50
pm.start_servers = 10
pm.min_spare_servers = 5
pm.max_spare_servers = 20
- Enable Caching:
// Cache upload sessions
$this->cache->set(
"upload-session:{$sessionId}",
$sessionData,
ttl: 3600 // 1 hour
);
Client-Side
- Optimize Chunk Size:
// Larger chunks = fewer requests but more memory
const CHUNK_SIZE = 1024 * 1024; // 1MB (default: 512KB)
- Increase Parallel Uploads:
// More parallelism = faster completion but more memory
const MAX_PARALLEL_CHUNKS = 5; // (default: 3)
- Implement Request Pooling:
// Reuse HTTP connections
const keepAlive = true;
const maxSockets = 10;
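Browsers pool connections on their own, so explicit pooling mainly matters when the upload client runs in Node (for example a seeding or benchmarking script) rather than in the page. A sketch under that assumption:

```js
// Node-side only: reuse TCP connections across chunk requests with an https.Agent.
const https = require('https');

const agent = new https.Agent({
  keepAlive: true, // keep sockets open between chunk uploads
  maxSockets: 10,  // cap concurrent sockets per host
});

// Pass the agent to each chunk request (URL below is a placeholder), e.g.:
// https.request('https://localhost/upload/chunk', { agent, method: 'POST' }, callback);
```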
Resources
Support
For issues or questions:
- Review this documentation and troubleshooting section
- Check server logs and resource usage
- Analyze test results for patterns
- Consult LiveComponents upload documentation
- Create GitHub issue with full test output