Feature: Performance Optimization for Large-Scale Generation
Add performance optimizations for generating large datasets (10k+ records) to make Interface-Forge suitable for big data testing scenarios.
Problem Statement
When generating large numbers of records (10,000+), the current implementation can face memory and performance challenges:
- Memory usage grows linearly with batch size
- No built-in progress tracking for long operations
- CPU-bound operations block the event loop
- No way to process data in chunks
Proposed Features
1. Streaming/Chunking API
```typescript
// Generate data in manageable chunks
const stream = factory.stream({
  chunkSize: 1000,
  total: 100000
});

stream.on('data', async (chunk: T[]) => {
  // Process each chunk (e.g., bulk insert to DB)
  await db.batchInsert(chunk);
});

stream.on('end', () => {
  console.log('Generation complete');
});

stream.on('error', (error) => {
  console.error('Generation failed:', error);
});
```
2. Memory-Efficient Generation
- Implement garbage collection hints between chunks
- Option to generate and immediately persist without holding in memory
- Lazy evaluation for large nested structures
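A generate-and-persist loop that never holds more than one chunk in memory could look like the following sketch; `makeUser` and `persist` are hypothetical stand-ins for a factory's build function and a sink such as a database writer, not existing Interface-Forge API:

```typescript
type User = { id: number; name: string };

// Placeholder builder standing in for a factory's build function.
function makeUser(id: number): User {
  return { id, name: `user-${id}` };
}

async function* generateChunks(total: number, chunkSize: number): AsyncGenerator<User[]> {
  for (let start = 0; start < total; start += chunkSize) {
    const size = Math.min(chunkSize, total - start);
    // Each chunk is built fresh; the previous one is already unreachable
    // and eligible for garbage collection.
    yield Array.from({ length: size }, (_, i) => makeUser(start + i));
  }
}

async function seed(
  total: number,
  chunkSize: number,
  persist: (chunk: User[]) => Promise<void>
): Promise<void> {
  for await (const chunk of generateChunks(total, chunkSize)) {
    await persist(chunk); // persisted immediately, never accumulated
  }
}
```

Because the async generator yields one chunk at a time, peak memory is bounded by `chunkSize` rather than `total`.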
3. Parallel Generation with Worker Threads
```typescript
const factory = new Factory<User>(/* ... */, {
  parallel: {
    enabled: true,
    workers: 4 // Number of worker threads
  }
});

// Utilizes multiple CPU cores
const users = await factory.batchAsync(50000);
```
4. Progress Callbacks
```typescript
const users = await factory.batchAsync(100000, {
  onProgress: (current, total, percentage) => {
    console.log(`Generated ${current}/${total} (${percentage}%)`);
  },
  progressInterval: 1000 // Report every 1000 items
});
```
5. Benchmarking Suite
- Add performance benchmarks to CI
- Track generation speed over time
- Memory usage profiling
- Comparison with other factory libraries
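A throughput benchmark could start as small as the following sketch; `buildRecord` is a placeholder for a factory's build function, and `process.hrtime.bigint()` gives nanosecond-resolution timing:

```typescript
// Placeholder builder standing in for a factory's build function.
function buildRecord(i: number) {
  return { id: i, name: `record-${i}`, createdAt: new Date().toISOString() };
}

// Build `total` records in a tight loop and report throughput.
function benchmark(total: number): { records: number; ms: number; perSecond: number } {
  const start = process.hrtime.bigint();
  for (let i = 0; i < total; i++) {
    buildRecord(i);
  }
  const ms = Number(process.hrtime.bigint() - start) / 1e6;
  return { records: total, ms, perSecond: Math.round(total / (ms / 1000)) };
}
```

Tracking `perSecond` across commits in CI would catch generation-speed regressions over time.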
Implementation Details
- Streaming Implementation
  - Use the Node.js streams API
  - Support backpressure handling
  - Allow custom transform streams
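One possible shape for the streaming piece is a standard object-mode `Readable`, whose `push()` return value gives backpressure for free; `build` here is a placeholder for the factory's per-record builder:

```typescript
import { Readable } from 'node:stream';

// Chunked object-mode Readable. When push() returns false, Node stops
// calling read() until the consumer drains, which is the backpressure
// behavior the issue asks for.
function chunkStream<T>(build: (i: number) => T, total: number, chunkSize: number): Readable {
  let produced = 0;
  return new Readable({
    objectMode: true,
    read() {
      if (produced >= total) {
        this.push(null); // signal end of stream
        return;
      }
      const size = Math.min(chunkSize, total - produced);
      const chunk = Array.from({ length: size }, (_, i) => build(produced + i));
      produced += size;
      this.push(chunk);
    },
  });
}
```

Custom transform streams then compose via ordinary `.pipe()` or `stream.pipeline()`.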
- Memory Management
  - Implement chunk-based generation
  - Clear internal caches between chunks
  - Option to disable caching for large operations
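The cache-clearing idea could be sketched as below; `CachedFactory`, `clearCache`, and `batchInChunks` are hypothetical names for illustration, not existing Interface-Forge API:

```typescript
// Hypothetical factory with an internal cache and an opt-out flag.
class CachedFactory<T> {
  private cache = new Map<number, T>();

  constructor(private build: (i: number) => T, private useCache = true) {}

  get(i: number): T {
    if (!this.useCache) return this.build(i); // caching disabled for large runs
    let value = this.cache.get(i);
    if (value === undefined) {
      value = this.build(i);
      this.cache.set(i, value);
    }
    return value;
  }

  clearCache(): void {
    this.cache.clear();
  }

  get cacheSize(): number {
    return this.cache.size;
  }
}

// Chunk-based generation that bounds the cache to one chunk's worth of entries.
function batchInChunks<T>(
  factory: CachedFactory<T>,
  total: number,
  chunkSize: number,
  sink: (chunk: T[]) => void
): void {
  for (let start = 0; start < total; start += chunkSize) {
    const size = Math.min(chunkSize, total - start);
    sink(Array.from({ length: size }, (_, i) => factory.get(start + i)));
    factory.clearCache(); // memory plateaus instead of growing with total
  }
}
```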
- Worker Thread Support
  - Serialize the factory configuration to workers
  - Distribute work evenly across threads
  - Merge results efficiently
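A minimal sketch of the worker-thread approach using Node's built-in `worker_threads`: ranges are split evenly across workers and merged in order. The worker body is an inlined string with a hard-coded builder purely for illustration; a real implementation would serialize the factory configuration as described above.

```typescript
import { Worker } from 'node:worker_threads';

interface GeneratedUser { id: number; name: string; }

// Worker body as a string (run with eval: true). Each worker builds its
// assigned range and posts the result back to the main thread.
const workerSource = `
  const { parentPort, workerData } = require('node:worker_threads');
  const { start, count } = workerData;
  const records = [];
  for (let i = start; i < start + count; i++) {
    records.push({ id: i, name: 'user-' + i });
  }
  parentPort.postMessage(records);
`;

function generateRange(start: number, count: number): Promise<GeneratedUser[]> {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: { start, count } });
    worker.once('message', resolve);
    worker.once('error', reject);
  });
}

async function parallelBatch(total: number, workers: number): Promise<GeneratedUser[]> {
  const per = Math.ceil(total / workers);
  const ranges = Array.from({ length: workers }, (_, w) => ({
    start: w * per,
    count: Math.min(per, total - w * per),
  })).filter((r) => r.count > 0);
  // Run all workers concurrently; Promise.all preserves range order,
  // so the merged output matches sequential generation.
  const parts = await Promise.all(ranges.map((r) => generateRange(r.start, r.count)));
  return parts.flat();
}
```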
- Progress Tracking
  - Non-blocking progress updates
  - Configurable update frequency
  - ETA calculation
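The ETA calculation can be as simple as the following sketch, which assumes a roughly constant generation rate; `ProgressTracker` is a hypothetical helper name, not existing API:

```typescript
class ProgressTracker {
  private startedAt = Date.now();

  constructor(
    private total: number,
    private interval: number, // report every `interval` items
    private report: (current: number, total: number, percentage: number, etaMs: number) => void,
  ) {}

  // Cheap per-item call: most invocations return immediately, so progress
  // reporting stays non-blocking relative to generation.
  update(current: number): void {
    if (current % this.interval !== 0 && current !== this.total) return;
    const elapsedMs = Math.max(Date.now() - this.startedAt, 1);
    const ratePerMs = current / elapsedMs;
    const etaMs = ratePerMs > 0 ? Math.round((this.total - current) / ratePerMs) : 0;
    const percentage = Math.round((current / this.total) * 100);
    this.report(current, this.total, percentage, etaMs);
  }
}
```

The ETA is simply remaining items divided by the observed average rate; a production version might use a moving average to smooth out uneven per-record costs.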
Performance Goals
- Generate 1M simple records in < 30 seconds
- Memory usage should plateau (not grow linearly)
- Support concurrent generation without blocking
- Maintain type safety throughout
Example Use Cases
```typescript
// Database seeding
await UserFactory.stream({ chunkSize: 5000 })
  .pipe(new DatabaseWriter(db))
  .on('finish', () => console.log('Database seeded'));

// CSV export
const csvStream = ProductFactory.stream({ chunkSize: 1000 })
  .pipe(new CSVTransform())
  .pipe(fs.createWriteStream('products.csv'));

// Real-time generation API
app.get('/generate/:count', async (req, res) => {
  res.setHeader('Content-Type', 'application/x-ndjson');
  factory.stream({
    chunkSize: 100,
    total: Number(req.params.count) // route params arrive as strings
  })
    .pipe(new JSONLinesTransform())
    .pipe(res);
});
```
Testing Requirements
- Benchmark tests for various data sizes
- Memory leak tests
- Worker thread stability tests
- Stream backpressure handling tests
- Progress accuracy tests
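A backpressure test could pair a fast producer with a deliberately slow consumer and assert that every item still arrives; this is a sketch independent of Interface-Forge's eventual API, using only Node's standard stream utilities:

```typescript
import { Readable, Writable } from 'node:stream';
import { pipeline } from 'node:stream/promises';

// Fast producer: pushes one item per read() call.
function makeProducer(total: number): Readable {
  let i = 0;
  return new Readable({
    objectMode: true,
    read() {
      this.push(i < total ? { id: i++ } : null);
    },
  });
}

// Slow consumer: a tiny highWaterMark forces backpressure onto the producer.
function makeSlowConsumer(onItem: (item: object) => void): Writable {
  return new Writable({
    objectMode: true,
    highWaterMark: 2,
    write(item, _encoding, done) {
      onItem(item);
      setTimeout(done, 1); // simulate slow I/O, e.g. a database write
    },
  });
}

async function runBackpressureTest(total: number): Promise<number> {
  let received = 0;
  await pipeline(makeProducer(total), makeSlowConsumer(() => received++));
  return received;
}
```

If the stream implementation ignores backpressure, this pattern surfaces it as unbounded buffering or dropped items rather than a clean count match.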