# LiveComponents Concurrent Upload Load Tests

Performance and scalability testing for the LiveComponents chunked upload system under high concurrent load.

## Overview

This test suite validates system behavior under various load conditions:

- **Light Load**: 5 concurrent users, 2 files each (1MB) - Baseline performance
- **Moderate Load**: 10 concurrent users, 3 files each (2MB) - Typical production load
- **Heavy Load**: 20 concurrent users, 5 files each (5MB) - Peak traffic simulation
- **Stress Test**: 50 concurrent users, 2 files each (1MB) - System limits discovery

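The four load levels can be written down as a small configuration table, which also makes the total upload count and payload per scenario easy to derive. The sketch below is illustrative only: the field names follow the `LOAD_TEST_CONFIG` fragment shown in the troubleshooting section, but the `light`, `moderate`, and `stress` keys and the `scenarioTotals` helper are assumptions, not the contents of the actual spec file.

```javascript
// Hypothetical scenario table mirroring the four load levels listed above.
const LOAD_TEST_CONFIG = {
  light:    { users: 5,  filesPerUser: 2, fileSizeMB: 1 },
  moderate: { users: 10, filesPerUser: 3, fileSizeMB: 2 },
  heavy:    { users: 20, filesPerUser: 5, fileSizeMB: 5 },
  stress:   { users: 50, filesPerUser: 2, fileSizeMB: 1 },
};

// Derive the number of uploads and the total payload for one scenario.
function scenarioTotals({ users, filesPerUser, fileSizeMB }) {
  const totalUploads = users * filesPerUser;
  return { totalUploads, totalDataMB: totalUploads * fileSizeMB };
}

for (const [name, config] of Object.entries(LOAD_TEST_CONFIG)) {
  const { totalUploads, totalDataMB } = scenarioTotals(config);
  console.log(`${name}: ${totalUploads} uploads, ${totalDataMB}MB total`);
}
```

The derived totals (10MB, 60MB, 500MB, 100MB) match the "Total Data" figures given for each scenario.
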
## Quick Start

### Prerequisites

```bash
# Ensure Playwright is installed
npm install

# Install Chromium browser
npx playwright install chromium

# Ensure development server is running with adequate resources
make up

# For heavy/stress tests, ensure server has sufficient resources:
# - At least 4GB RAM available
# - At least 2 CPU cores
# - Adequate disk I/O capacity
```

### Running Load Tests

```bash
# Run all load tests (WARNING: Resource intensive!)
npm run test:load

# Run specific load test
npx playwright test concurrent-upload-load.spec.js --grep "Light Load"

# Run with visible browser (for debugging)
npm run test:load:headed

# Run in headless mode (recommended for CI/CD)
npm run test:load
```

## Load Test Scenarios

### 1. Light Load Test

**Configuration:**
- **Users**: 5 concurrent
- **Files per User**: 2
- **File Size**: 1MB each
- **Total Data**: 10MB
- **Expected Duration**: <30 seconds

**Performance Thresholds:**
- **Max Duration**: 30 seconds
- **Max Memory**: 200MB
- **Max Avg Response Time**: 1 second
- **Min Success Rate**: 95%

**Use Case:** Baseline performance validation, continuous integration tests

**Example Results:**
```
=== Light Load Test Results ===
Total Duration: 18,234ms
Total Uploads: 10
Successful: 10
Failed: 0
Success Rate: 100.00%
Avg Response Time: 823.45ms
Max Response Time: 1,452ms
Avg Memory: 125.34MB
Max Memory: 178.21MB
```

### 2. Moderate Load Test

**Configuration:**
- **Users**: 10 concurrent
- **Files per User**: 3
- **File Size**: 2MB each
- **Total Data**: 60MB
- **Expected Duration**: <60 seconds

**Performance Thresholds:**
- **Max Duration**: 60 seconds
- **Max Memory**: 500MB
- **Max Avg Response Time**: 2 seconds
- **Min Success Rate**: 90%

**Use Case:** Typical production load simulation, daily performance monitoring

**Example Results:**
```
=== Moderate Load Test Results ===
Total Duration: 47,892ms
Total Uploads: 30
Successful: 28
Failed: 2
Success Rate: 93.33%
Avg Response Time: 1,567.23ms
Max Response Time: 2,891ms
Avg Memory: 342.56MB
Max Memory: 467.89MB
```

### 3. Heavy Load Test

**Configuration:**
- **Users**: 20 concurrent
- **Files per User**: 5
- **File Size**: 5MB each
- **Total Data**: 500MB
- **Expected Duration**: <120 seconds

**Performance Thresholds:**
- **Max Duration**: 120 seconds
- **Max Memory**: 1GB (1024MB)
- **Max Avg Response Time**: 3 seconds
- **Min Success Rate**: 85%

**Use Case:** Peak traffic simulation, capacity planning

**Example Results:**
```
=== Heavy Load Test Results ===
Total Duration: 102,456ms
Total Uploads: 100
Successful: 87
Failed: 13
Success Rate: 87.00%
Avg Response Time: 2,734.12ms
Max Response Time: 4,567ms
Avg Memory: 723.45MB
Max Memory: 956.78MB
```

### 4. Stress Test

**Configuration:**
- **Users**: 50 concurrent
- **Files per User**: 2
- **File Size**: 1MB each
- **Total Data**: 100MB
- **Expected Duration**: <180 seconds

**Performance Thresholds:**
- **Max Duration**: 180 seconds
- **Max Memory**: 2GB (2048MB)
- **Max Avg Response Time**: 5 seconds
- **Min Success Rate**: 80%

**Use Case:** System limits discovery, failure mode analysis

**Example Results:**
```
=== Stress Test Results ===
Total Duration: 156,789ms
Total Uploads: 100
Successful: 82
Failed: 18
Success Rate: 82.00%
Avg Response Time: 4,234.56ms
Max Response Time: 7,891ms
Avg Memory: 1,456.78MB
Max Memory: 1,923.45MB
Total Errors: 18
```

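The performance thresholds for the four load scenarios above lend themselves to a single table-driven check. The following is a minimal sketch under stated assumptions: the limit values are copied from the scenario descriptions, while the `THRESHOLDS` object, its field names, and the `checkThresholds` helper are hypothetical and may not match how the spec file asserts them.

```javascript
// Limits copied from the scenario descriptions; the object shape and
// field names are assumptions made for this sketch.
const THRESHOLDS = {
  light:    { maxDurationMs: 30000,  maxMemoryMB: 200,  maxAvgResponseMs: 1000, minSuccessRate: 95 },
  moderate: { maxDurationMs: 60000,  maxMemoryMB: 500,  maxAvgResponseMs: 2000, minSuccessRate: 90 },
  heavy:    { maxDurationMs: 120000, maxMemoryMB: 1024, maxAvgResponseMs: 3000, minSuccessRate: 85 },
  stress:   { maxDurationMs: 180000, maxMemoryMB: 2048, maxAvgResponseMs: 5000, minSuccessRate: 80 },
};

// Returns a list of violated limits; an empty list means the run passed.
function checkThresholds(result, limits) {
  const violations = [];
  if (result.durationMs > limits.maxDurationMs) violations.push('duration');
  if (result.maxMemoryMB > limits.maxMemoryMB) violations.push('memory');
  if (result.avgResponseMs > limits.maxAvgResponseMs) violations.push('avgResponseTime');
  if (result.successRate < limits.minSuccessRate) violations.push('successRate');
  return violations;
}

// Values taken from the Light Load example results.
const lightRun = { durationMs: 18234, maxMemoryMB: 178.21, avgResponseMs: 823.45, successRate: 100 };
console.log(checkThresholds(lightRun, THRESHOLDS.light)); // []
```
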
### 5. Queue Management Test

**Tests:** Concurrent upload queue handling with proper limits

**Validates:**
- Maximum concurrent uploads respected (default: 3)
- Queue properly manages waiting uploads
- All uploads eventually complete
- No queue starvation or deadlocks

**Configuration:**
- **Files**: 10 files uploaded simultaneously
- **Expected Max Concurrent**: 3 uploads at any time

**Example Output:**
```
Queue States Captured: 18
Max Concurrent Uploads: 3
Final Completed: 10
Queue Properly Managed: ✅
```

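To illustrate the property the queue test checks, here is a generic concurrency-limited runner that records the highest number of tasks in flight at once. It is a sketch of the general technique, not the LiveComponents queue itself; `runWithLimit` and the simulated upload tasks are hypothetical names introduced for this example.

```javascript
// Run async tasks with at most `limit` in flight, tracking peak concurrency.
async function runWithLimit(tasks, limit) {
  let active = 0;
  let maxActive = 0;
  let next = 0;
  const results = new Array(tasks.length);

  // Each worker pulls the next pending task until none remain.
  async function worker() {
    while (next < tasks.length) {
      const index = next++;
      active++;
      maxActive = Math.max(maxActive, active);
      try {
        results[index] = await tasks[index]();
      } finally {
        active--;
      }
    }
  }

  const workerCount = Math.min(limit, tasks.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return { results, maxActive };
}

// Ten simulated uploads, each resolving after a short delay.
const simulatedUploads = Array.from({ length: 10 }, (_, i) =>
  () => new Promise((resolve) => setTimeout(() => resolve(`file-${i}`), 5))
);

runWithLimit(simulatedUploads, 3).then(({ results, maxActive }) => {
  console.log(`Max Concurrent Uploads: ${maxActive}`);
  console.log(`Final Completed: ${results.length}`);
});
```

Because every worker only takes a new task after its previous one settles, the number of in-flight tasks can never exceed `limit`, and all tasks eventually run as long as each one settles.
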
### 6. Resource Cleanup Test

**Tests:** Memory cleanup after concurrent uploads complete

**Validates:**
- Memory properly released after uploads
- No memory leaks
- Garbage collection effective
- System returns to baseline state

**Measurement Points:**
- **Baseline Memory**: Before any uploads
- **After Upload Memory**: Immediately after all uploads complete
- **After Cleanup Memory**: After garbage collection

**Expected Behavior:**
- Memory still retained after cleanup (measured above baseline) should be less than 50% of the memory increase observed during uploads

**Example Output:**
```
Memory Usage:
Baseline: 45.23MB
After Uploads: 156.78MB (Δ +111.55MB)
After Cleanup: 52.34MB (Δ +7.11MB from baseline)
Cleanup Effectiveness: 93.6%
```

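The cleanup figures in the example output follow from simple arithmetic on the three measurement points. The sketch below reproduces them; `cleanupReport` is a hypothetical helper, and reading "cleanup effectiveness" as the share of the upload-time increase that was released is an assumption inferred from the example numbers.

```javascript
// Compute memory deltas and cleanup effectiveness from three measurements (in MB).
function cleanupReport(baselineMB, afterUploadMB, afterCleanupMB) {
  const increaseMB = afterUploadMB - baselineMB;   // growth during uploads
  const residualMB = afterCleanupMB - baselineMB;  // growth still retained after cleanup
  const effectiveness = increaseMB > 0 ? (1 - residualMB / increaseMB) * 100 : 100;
  // Pass when less than half of the upload-time increase is still retained.
  const passes = increaseMB <= 0 || residualMB < 0.5 * increaseMB;
  return { increaseMB, residualMB, effectiveness, passes };
}

const report = cleanupReport(45.23, 156.78, 52.34);
console.log(`Increase: ${report.increaseMB.toFixed(2)}MB`);
console.log(`Residual: ${report.residualMB.toFixed(2)}MB`);
console.log(`Cleanup Effectiveness: ${report.effectiveness.toFixed(1)}%`);
```
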
### 7. Error Recovery Test

**Tests:** System recovery from failures during concurrent uploads

**Validates:**
- Automatic retry on failures
- Failed uploads eventually succeed
- No corruption from partial failures
- Graceful degradation under stress

**Simulation:**
- Every 3rd chunk request fails
- System must retry and complete all uploads

**Expected Behavior:**
- All uploads complete successfully despite failures
- Retry logic handles failures transparently

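The failure pattern and the expected recovery can be modelled without a browser. The sketch below simulates a chunk sender that rejects every 3rd request and wraps it in a bounded retry loop. All names (`createFlakySender`, `sendWithRetry`, `uploadAllChunks`) are hypothetical, and the retry policy (up to 3 immediate attempts per chunk) is an assumption; the real test presumably injects failures at the network level, for example with Playwright's `page.route`.

```javascript
// Simulated chunk endpoint: every 3rd request fails, regardless of chunk.
function createFlakySender() {
  let requestCount = 0;
  return async function sendChunk(chunkIndex) {
    requestCount++;
    if (requestCount % 3 === 0) {
      throw new Error(`simulated failure on request ${requestCount}`);
    }
    return { chunkIndex, ok: true };
  };
}

// Retry one chunk up to `maxAttempts` times before giving up.
async function sendWithRetry(send, chunkIndex, maxAttempts = 3) {
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await send(chunkIndex);
    } catch (error) {
      lastError = error;
    }
  }
  throw lastError;
}

// Upload all chunks in order, retrying failed ones transparently.
async function uploadAllChunks(totalChunks, send) {
  const completed = [];
  for (let index = 0; index < totalChunks; index++) {
    completed.push(await sendWithRetry(send, index));
  }
  return completed;
}

uploadAllChunks(10, createFlakySender()).then((completed) => {
  console.log(`Completed chunks: ${completed.length}`);
});
```

With this failure pattern no two consecutive requests fail, so a single retry is always enough and every chunk completes.
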
### 8. Throughput Test

**Tests:** Sustained upload throughput measurement

**Configuration:**
- **Files**: 20 files
- **File Size**: 5MB each
- **Total Data**: 100MB

**Metrics:**
- **Throughput**: Total MB / Total Time (seconds)
- **Expected Minimum**: >1 MB/s on localhost

**Example Output:**
```
Throughput Test Results:
Total Data: 100MB
Total Duration: 67.89s
Throughput: 1.47 MB/s
```

## Performance Metrics Explained

### Duration Metrics

- **Total Duration**: Time from test start to all uploads complete
- **Avg Response Time**: Average time for single upload completion
- **Max Response Time**: Slowest single upload time
- **Min Response Time**: Fastest single upload time

### Success Metrics

- **Total Uploads**: Number of attempted uploads
- **Successful Uploads**: Successfully completed uploads
- **Failed Uploads**: Uploads that failed even after retries
- **Success Rate**: Successful / Total (as percentage)

### Resource Metrics

- **Avg Memory**: Average browser memory usage during test
- **Max Memory**: Peak browser memory usage
- **Memory Delta**: Difference between baseline and peak

### Throughput Metrics

- **Throughput (MB/s)**: Data uploaded per second
- **Formula**: Total MB / Total Duration (seconds)
- **Expected Range**: 1-10 MB/s depending on system

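As a worked example of the duration, success, and throughput definitions above, the sketch below derives them from a list of per-upload records. The record shape (`ok`, `durationMs`, `sizeMB`) and the `summarizeRun` helper are hypothetical, and counting only successful uploads toward throughput is an assumption; the suite may count all attempted data instead.

```javascript
// Summarize one test run from per-upload records.
function summarizeRun(uploads, totalDurationMs) {
  const durations = uploads.map((u) => u.durationMs);
  const successful = uploads.filter((u) => u.ok);
  const sum = (values) => values.reduce((total, v) => total + v, 0);
  // Assumption: only successfully uploaded data counts toward throughput.
  const uploadedMB = sum(successful.map((u) => u.sizeMB));

  return {
    totalUploads: uploads.length,
    successful: successful.length,
    failed: uploads.length - successful.length,
    successRate: uploads.length ? (successful.length / uploads.length) * 100 : 0,
    avgResponseMs: durations.length ? sum(durations) / durations.length : 0,
    maxResponseMs: durations.length ? Math.max(...durations) : 0,
    minResponseMs: durations.length ? Math.min(...durations) : 0,
    throughputMBps: totalDurationMs > 0 ? uploadedMB / (totalDurationMs / 1000) : 0,
  };
}

const sampleUploads = [
  { ok: true,  durationMs: 800,  sizeMB: 1 },
  { ok: true,  durationMs: 1200, sizeMB: 1 },
  { ok: false, durationMs: 1000, sizeMB: 1 },
  { ok: true,  durationMs: 1000, sizeMB: 1 },
];
console.log(summarizeRun(sampleUploads, 2000));
```
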
## Understanding Test Results

### Success Criteria

✅ **PASS** - All thresholds met:
- Duration within expected maximum
- Memory usage within limits
- Success rate above minimum
- Avg response time acceptable

⚠️ **WARNING** - Some thresholds exceeded:
- Review specific metrics
- Check server resources
- Analyze error patterns

❌ **FAIL** - Critical thresholds exceeded:
- System unable to handle load
- Investigate bottlenecks
- Scale resources or optimize code

### Interpreting Results

**High Success Rate (>95%)**
- System handling load well
- Retry logic effective
- Infrastructure adequate

**Moderate Success Rate (85-95%)**
- Occasional failures acceptable
- Monitor error patterns
- May need optimization

**Low Success Rate (<85%)**
- System struggling under load
- Critical bottlenecks present
- Immediate action required

**High Memory Usage (>75% of threshold)**
- Potential memory leak
- Inefficient resource management
- Review memory cleanup logic

**Slow Response Times (>75% of threshold)**
- Server bottleneck
- Network congestion
- Database query optimization needed

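The interpretation bands above can be expressed as two small helpers, which is handy when annotating results in a report. This is only a sketch: `successBand` and `isNearThreshold` are hypothetical names, and treating exactly 95% as "moderate" and exactly 85% as "moderate" is an assumption about how the band edges are meant to be read.

```javascript
// Map a success rate (percentage) to the bands described above.
function successBand(successRate) {
  if (successRate > 95) return 'high';
  if (successRate >= 85) return 'moderate';
  return 'low';
}

// Flag a metric that has used more than 75% of its allowed threshold.
function isNearThreshold(value, threshold) {
  return value > 0.75 * threshold;
}

// Values taken from the Heavy Load example results and thresholds.
console.log(successBand(87));               // moderate
console.log(isNearThreshold(956.78, 1024)); // true
console.log(isNearThreshold(2734.12, 3000)); // true
```

In the Heavy Load example both memory and average response time sit above 75% of their limits, so the run passes its thresholds but would still merit a closer look.
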
## Troubleshooting

### Tests Timing Out

**Symptoms:**
- Load tests exceed timeout limits
- Uploads never complete
- Browser hangs or crashes

**Solutions:**

1. **Increase Test Timeout:**
   ```javascript
   test('Heavy Load', async ({ browser }) => {
     test.setTimeout(180000); // 3 minutes
     // ... test code
   });
   ```

2. **Reduce Load:**
   ```javascript
   const LOAD_TEST_CONFIG = {
     heavy: {
       users: 10, // Reduced from 20
       filesPerUser: 3, // Reduced from 5
       fileSizeMB: 2, // Reduced from 5
       expectedDuration: 90000 // Adjusted
     }
   };
   ```

3. **Check Server Resources:**
   ```bash
   # Monitor server resources during test
   docker stats

   # Increase Docker resources if needed
   # In Docker Desktop: Settings → Resources
   ```

### High Failure Rate

**Symptoms:**
- Success rate below threshold
- Many upload failures
- Timeout errors

**Solutions:**

1. **Check Server Logs:**
   ```bash
   # View PHP error logs
   docker logs php

   # View Nginx logs
   docker logs nginx
   ```

2. **Increase Server Resources:**
   ```bash
   # Check current limits
   docker exec php php -i | grep memory_limit

   # Then raise the limits in .env (or the equivalent php.ini directives)
   UPLOAD_MAX_FILESIZE=100M
   POST_MAX_SIZE=100M
   MEMORY_LIMIT=512M
   ```

3. **Optimize Queue Configuration:**
   ```env
   # Increase concurrent chunks
   UPLOAD_PARALLEL_CHUNKS=5
   # Increase chunk size to 1MB
   UPLOAD_CHUNK_SIZE=1048576
   ```

### Memory Issues

**Symptoms:**
- Browser out of memory errors
- System becomes unresponsive
- Test crashes

**Solutions:**

1. **Increase Browser Memory:**
   ```javascript
   const browser = await chromium.launch({
     args: [
       // V8 flags must be passed to Chromium via --js-flags
       '--js-flags=--max-old-space-size=4096', // 4GB V8 heap for page JavaScript
       '--disable-dev-shm-usage' // Use /tmp instead of /dev/shm
     ]
   });
   ```

2. **Reduce File Sizes:**
   ```javascript
   const LOAD_TEST_CONFIG = {
     heavy: {
       fileSizeMB: 2, // Reduced from 5MB
     }
   };
   ```

3. **Run Tests Sequentially:**
   ```bash
   # Run one test at a time
   npx playwright test concurrent-upload-load.spec.js --grep "Light Load"
   npx playwright test concurrent-upload-load.spec.js --grep "Moderate Load"
   # etc.
   ```

### Network Errors

**Symptoms:**
- Connection refused errors
- Request timeouts
- SSL/TLS errors

**Solutions:**

1. **Verify Server Running:**
   ```bash
   # Check server status
   docker ps

   # Restart if needed
   make down && make up
   ```

2. **Check SSL Certificates:**
   ```bash
   # Navigate to https://localhost in browser
   # Accept self-signed certificate if prompted
   ```

3. **Increase Network Timeouts:**
   ```javascript
   await page.goto('https://localhost/livecomponents/test/upload', {
     timeout: 60000 // 60 seconds
   });
   ```

## CI/CD Integration

### GitHub Actions

```yaml
name: Load Tests

on:
  schedule:
    - cron: '0 2 * * *' # Daily at 2 AM UTC
  workflow_dispatch: # Manual trigger

jobs:
  load-tests:
    runs-on: ubuntu-latest
    timeout-minutes: 30
    permissions:
      contents: read
      issues: write # Needed by the "Notify on failure" step

    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-node@v4
        with:
          node-version: '18'

      - name: Install dependencies
        run: npm ci

      - name: Install Playwright
        run: npx playwright install --with-deps chromium

      - name: Start dev server
        run: |
          make up
          sleep 10 # Wait for server to be ready

      - name: Run Light Load Test
        run: npx playwright test concurrent-upload-load.spec.js --grep "Light Load"

      - name: Run Moderate Load Test
        run: npx playwright test concurrent-upload-load.spec.js --grep "Moderate Load"

      - name: Upload test results
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: load-test-results
          path: test-results/

      - name: Notify on failure
        if: failure()
        uses: actions/github-script@v7
        with:
          script: |
            await github.rest.issues.create({
              owner: context.repo.owner,
              repo: context.repo.repo,
              title: 'Load Tests Failed',
              body: 'Load tests failed. Check workflow run for details.'
            });
```

### Performance Regression Detection

```bash
# Run baseline test and save results
npm run test:load > baseline-results.txt

# After code changes, run again and compare
npm run test:load > current-results.txt
diff baseline-results.txt current-results.txt

# Automated regression check
npm run test:load:regression
```

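A plain `diff` of the raw output will flag every run, since timings never repeat exactly. One alternative is to compare parsed metrics against the baseline with a tolerance. The sketch below shows that idea; it is not the implementation behind `test:load:regression` (which is not documented here), and the metric names, the `findRegressions` helper, and the 10% default tolerance are all assumptions.

```javascript
// Metrics where a larger value is worse, and metrics where a smaller value is worse.
const LOWER_IS_BETTER = ['totalDurationMs', 'avgResponseMs', 'maxMemoryMB'];
const HIGHER_IS_BETTER = ['successRate'];

// Return the names of metrics that got worse by more than `tolerance` (a fraction).
function findRegressions(baseline, current, tolerance = 0.1) {
  const regressions = [];
  for (const metric of LOWER_IS_BETTER) {
    if (current[metric] > baseline[metric] * (1 + tolerance)) regressions.push(metric);
  }
  for (const metric of HIGHER_IS_BETTER) {
    if (current[metric] < baseline[metric] * (1 - tolerance)) regressions.push(metric);
  }
  return regressions;
}

// Baseline taken from the Light Load example; the "current" run is made up.
const baselineRun = { totalDurationMs: 18234, avgResponseMs: 823.45, maxMemoryMB: 178.21, successRate: 100 };
const currentRun = { totalDurationMs: 19000, avgResponseMs: 1100, maxMemoryMB: 180, successRate: 96 };
console.log(findRegressions(baselineRun, currentRun)); // [ 'avgResponseMs' ]
```
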
## Best Practices

### 1. Test Environment

- **Dedicated Server**: Use dedicated test server to avoid interference
- **Consistent Resources**: Same hardware/container specs for reproducibility
- **Isolated Network**: Minimize network variability
- **Clean State**: Reset database/cache between test runs

### 2. Test Execution

- **Start Small**: Begin with light load, increase gradually
- **Monitor Resources**: Watch server CPU, memory, disk during tests
- **Sequential Heavy Tests**: Don't run heavy/stress tests in parallel
- **Adequate Timeouts**: Set realistic timeouts based on load

### 3. Result Analysis

- **Track Trends**: Compare results over time, not single runs
- **Statistical Significance**: Multiple runs for reliable metrics
- **Identify Patterns**: Look for consistent failure patterns
- **Correlate Metrics**: Memory spikes with response time increases

### 4. Continuous Improvement

- **Regular Baselines**: Update baselines as system improves
- **Performance Budget**: Define and enforce performance budgets
- **Proactive Monitoring**: Catch regressions early
- **Capacity Planning**: Use load test data for scaling decisions

## Performance Optimization Tips

### Server-Side

1. **Increase Worker Processes:**
   ```nginx
   # nginx.conf
   worker_processes auto;

   # worker_connections is only valid inside the events block
   events {
       worker_connections 2048;
   }
   ```

2. **Optimize PHP-FPM:**
   ```ini
   ; php-fpm pool config
   pm = dynamic
   pm.max_children = 50
   pm.start_servers = 10
   pm.min_spare_servers = 5
   pm.max_spare_servers = 20
   ```

3. **Enable Caching:**
   ```php
   // Cache upload sessions
   $this->cache->set(
       "upload-session:{$sessionId}",
       $sessionData,
       ttl: 3600 // 1 hour
   );
   ```

### Client-Side

1. **Optimize Chunk Size:**
   ```javascript
   // Larger chunks = fewer requests but more memory
   const CHUNK_SIZE = 1024 * 1024; // 1MB (default: 512KB)
   ```

2. **Increase Parallel Uploads:**
   ```javascript
   // More parallelism = faster completion but more memory
   const MAX_PARALLEL_CHUNKS = 5; // (default: 3)
   ```

3. **Implement Request Pooling:**
   ```javascript
   // Reuse HTTP connections
   const keepAlive = true;
   const maxSockets = 10;
   ```

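The trade-off between chunk size, parallelism, and memory mentioned in the client-side tips can be made concrete with a little arithmetic. The sketch below is a rough model only: `chunkPlan` is a hypothetical helper, and it assumes every in-flight chunk is held fully in memory and that chunks are sent in rounds of `parallelChunks`.

```javascript
// Estimate request count, sequential rounds, and peak buffered bytes for one file.
function chunkPlan(fileSizeBytes, chunkSizeBytes, parallelChunks) {
  const totalChunks = Math.ceil(fileSizeBytes / chunkSizeBytes);
  const rounds = Math.ceil(totalChunks / parallelChunks);
  // Assumption: each in-flight chunk is buffered in full.
  const peakBufferedBytes = Math.min(totalChunks, parallelChunks) * chunkSizeBytes;
  return { totalChunks, rounds, peakBufferedBytes };
}

const MB = 1024 * 1024;

// A 5MB file with the documented defaults (512KB chunks, 3 in parallel)...
console.log(chunkPlan(5 * MB, 512 * 1024, 3));
// ...versus the tuned values from the tips above (1MB chunks, 5 in parallel).
console.log(chunkPlan(5 * MB, 1 * MB, 5));
```

Under this model the tuned settings cut a 5MB file from 10 requests in 4 rounds to 5 requests in 1 round, while peak buffered data grows from 1.5MB to 5MB per file.
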
## Resources

- [Playwright Performance Testing](https://playwright.dev/docs/test-advanced#measuring-performance)
- [Load Testing Best Practices](https://martinfowler.com/articles/practical-test-pyramid.html#LoadTesting)
- [System Scalability Patterns](https://docs.microsoft.com/en-us/azure/architecture/patterns/category/performance-scalability)

## Support

For issues or questions:
1. Review this documentation and troubleshooting section
2. Check server logs and resource usage
3. Analyze test results for patterns
4. Consult LiveComponents upload documentation
5. Create GitHub issue with full test output