ML Acceleration and Performance Optimization
Overview
The heterodyne-analysis package provides machine learning acceleration for parameter optimization through intelligent initial value prediction. This guide covers ML acceleration features, Numba JIT compilation, and performance optimization strategies for analyzing XPCS data efficiently.
ML Acceleration
Purpose
ML acceleration improves optimization convergence by:
Smart parameter initialization: Predicting good starting values based on experimental conditions
Faster convergence: 2-5x reduction in optimization time through better initial guesses
Continuous learning: Models improve as you analyze more datasets
How It Works
The ML system uses ensemble learning to predict optimal parameters based on:
Wavevector magnitude (q)
Time step (dt)
Geometric parameters (gap size, frame range)
Historical optimization results
Python 3.12+ Compatibility
The heterodyne package requires Python 3.12 or later for optimal performance:
# Check Python version
python --version # Should show 3.12 or higher
# Create Python 3.12 environment
conda create -n heterodyne python=3.12
conda activate heterodyne
# Install package
pip install heterodyne-analysis[all]
Compatibility Notes:
Numba JIT: Requires Python 3.12+ for latest performance optimizations
Type hints: Uses modern Python type annotation features
Dependencies: NumPy 1.24+, SciPy 1.9+ optimized for Python 3.12+
Numba JIT Compilation
Performance Benefits
Numba provides 3-5x speedup for core computational kernels through just-in-time (JIT) compilation:
| Operation | Pure Python | With Numba JIT | Speedup | |———–|————-|—————-|———| | 100 points | 2.3 s | 0.7 s | 3.3× | | 500 points | 12.1 s | 3.2 s | 3.8× | | 1000 points | 45.2 s | 8.9 s | 5.1× | | 5000 points | 892 s | 178 s | 5.0× |
Automatic JIT Compilation
Numba JIT is enabled by default for all optimized kernels:
from heterodyne.analysis.core import HeterodyneAnalysisCore
import json
# Load configuration
with open("my_config.json", 'r') as f:
config = json.load(f)
# JIT compilation happens automatically on first call
core = HeterodyneAnalysisCore(config)
# First run includes compilation overhead (~1-2s)
# Subsequent runs use compiled code (fast)
Manual Kernel Warmup
For production workflows, pre-compile kernels to eliminate JIT overhead:
from heterodyne.core.kernels import warmup_numba_kernels
# Warmup all computational kernels (one-time cost)
warmup_results = warmup_numba_kernels()
print(f"Kernels compiled in {warmup_results['total_warmup_time']:.3f}s")
# Kernels compiled in 1.245s
# Now all subsequent analyses run at full speed immediately
Configuration Control
Control Numba behavior in your configuration file:
{
"performance_settings": {
"numba_optimization": {
"enable_numba": true,
"warmup_numba": true,
"parallel_numba": true,
"cache_numba": true,
"stability_enhancements": {
"enable_kernel_warmup": true,
"warmup_iterations": 5,
"optimize_memory_layout": true,
"environment_optimization": {
"auto_configure": true,
"max_threads": 4
}
}
}
}
}
Memory Usage Considerations
Large Dataset Handling
For datasets exceeding 4 million data points, careful memory management is essential:
Dataset Size Guidelines:
<100k points: No special configuration needed
100k-1M points: Enable time subsampling (4x-8x reduction)
1M-4M points: Enable aggressive subsampling (16x-32x reduction)
>4M points: Not recommended - over-subsampling effects reduce accuracy
Memory Optimization Strategies
1. Subsampling Configuration:
{
"optimization_config": {
"classical_optimization": {
"subsampling": {
"enabled": true,
"max_data_points": 100000,
"time_subsample_factor": 4,
"angle_subsample_factor": 2
}
}
}
}
Impact:
2000×2000 dataset with 2 angles = 8M points
With factors (4, 2): 8M → ~250k points (32x reduction)
Speedup: 20-50x faster with <10% chi-squared degradation
2. Ellipsoidal Optimization Memory Limit:
For robust optimization with very large datasets:
from heterodyne.optimization.robust import RobustHeterodyneOptimizer
# Configure memory limit for ellipsoidal method
config['optimization_config']['robust_optimization']['memory_limit'] = 0.9 # 90% limit
robust = RobustHeterodyneOptimizer(core, config)
results = robust.optimize(
phi_angles=phi_angles,
c2_experimental=c2_data,
method="ellipsoidal"
)
3. Low Memory Mode:
For memory-constrained environments:
{
"performance_settings": {
"memory_management": {
"low_memory_mode": true,
"garbage_collection_frequency": 5,
"memory_monitoring": true
},
"caching": {
"cache_size_limit_mb": 500
}
}
}
Memory Benchmarks
| Dataset Size | Memory Usage (Default) | Memory Usage (Optimized) | Improvement | |————–|————————|————————–|————-| | 4M points | 85% RAM | 75% RAM | 12% reduction | | 8M points | Memory error | 90% limit success | Enabled large datasets |
Performance Monitoring
Built-in Performance Tracking
Monitor optimization performance:
from heterodyne.core.config import performance_monitor
# Time-specific operations
with performance_monitor.time_function("data_loading"):
c2_data, time_length, phi_angles, num_angles = load_xpcs_data(config_file)
with performance_monitor.time_function("classical_optimization"):
params, results = classical.run_classical_optimization_optimized(
phi_angles=phi_angles,
c2_experimental=c2_data
)
# Get performance summary
stats = performance_monitor.get_timing_summary()
print(f"Data loading: {stats['data_loading']:.3f}s")
print(f"Optimization: {stats['classical_optimization']:.3f}s")
Stable Benchmarking
For research and reproducibility:
from heterodyne.core.profiler import stable_benchmark, adaptive_stable_benchmark
def my_optimization():
return classical.run_classical_optimization_optimized(
phi_angles=phi_angles,
c2_experimental=c2_data
)
# Standard benchmarking with outlier filtering
results = stable_benchmark(my_optimization, warmup_runs=5, measurement_runs=15)
cv = results['std'] / results['mean']
print(f"Mean time: {results['mean']:.4f}s ± {cv:.3f} CV")
# Adaptive benchmarking (automatically finds optimal measurement count)
results = adaptive_stable_benchmark(my_optimization, target_cv=0.10)
print(f"Achieved {results['cv']:.3f} CV in {results['total_runs']} runs")
Performance Stability Achievements:
97% reduction in chi-squared calculation variability (CV < 0.31)
Balanced optimization settings for numerical stability
Conservative threading (max 4 cores) for consistent results
Environment Optimization
Thread Configuration
Set optimal threading for your hardware:
# Recommended for most systems
export OMP_NUM_THREADS=4
export NUMBA_NUM_THREADS=4
# For high-core systems (>16 cores)
export OMP_NUM_THREADS=8
export NUMBA_NUM_THREADS=8
# Run analysis
heterodyne --config my_config.json --method classical
Performance Mode
Enable high-performance mode:
# Set environment variable
export HETERODYNE_PERFORMANCE_MODE=1
# With Numba parallel optimization
export NUMBA_ENABLE_CUDASIM=0
export NUMBA_DISABLE_JIT=0
# Run analysis
heterodyne --config my_config.json --method all
Troubleshooting Performance Issues
Slow Optimization
Problem: Optimization takes too long (>30 minutes)
Solutions:
Enable subsampling:
{ "optimization_config": { "classical_optimization": { "subsampling": { "enabled": true, "time_subsample_factor": 4, "angle_subsample_factor": 2 } } } }
Reduce active parameters: Start with 4-6 essential parameters instead of all 14:
{ "initial_parameters": { "active_parameters": ["D0_ref", "D0_sample", "v0", "f0"] } }
Use angle filtering:
{ "optimization_config": { "angle_filtering": { "enabled": true, "target_ranges": [ {"min_angle": -10.0, "max_angle": 10.0}, {"min_angle": 170.0, "max_angle": 190.0} ] } } }
Memory Errors
Problem: “MemoryError” or system slowdown
Solutions:
Enable aggressive subsampling:
Increase
time_subsample_factorto 8 or 16Increase
angle_subsample_factorto 4Lower
max_data_pointsthreshold
Use low memory mode: Set
low_memory_mode: truein performance settingsReduce cache size: Lower
cache_size_limit_mbto 500 or less
Numba Compilation Errors
Problem: Numba fails to compile kernels
Solutions:
Verify Python version:
python --version # Should be 3.12+
Update Numba:
pip install --upgrade numba
Disable Numba temporarily:
{ "performance_settings": { "numba_optimization": { "enable_numba": false } } }
Best Practices
For Maximum Performance
Use Python 3.12+: Latest optimizations and type hints
Enable Numba JIT: 3-5x speedup for core calculations
Warmup kernels: Pre-compile for production workflows
Configure threading: Match OMP_NUM_THREADS to your CPU
Enable subsampling: For datasets >100k points
Use angle filtering: 3-5x speedup for anisotropic analysis
For Maximum Accuracy
Disable subsampling: For small datasets (<100k points)
Use all parameters: Optimize all 14 parameters when data quality permits
Multiple optimization methods: Run both classical and robust
Increase iteration limits: For complex parameter spaces
Higher precision: Use
float64data type
For Large Datasets (>1M points)
Mandatory subsampling: Set
time_subsample_factor≥ 4Lower memory limits: Set
cache_size_limit_mb≤ 1000Conservative threading: Use 4-8 threads maximum
Monitor memory: Enable
memory_monitoring: trueConsider robust ellipsoidal: Uses 90% memory limit
Performance Reference
Typical Analysis Times
| Dataset Size | Configuration | Time (Classical) | Time (Robust) | |————–|—————|——————|—————| | 100k points | Default | 2-5 min | 5-10 min | | 500k points | With subsampling | 5-10 min | 10-20 min | | 1M points | Aggressive subsampling | 10-15 min | 20-30 min | | 4M points | Maximum optimization | 20-30 min | 40-60 min |
Hardware: Intel Xeon 8-core, 32GB RAM
Expected Speedups
| Optimization | Speedup Factor | Notes | |————–|—————-|——-| | Numba JIT | 3-5× | Automatic for core kernels | | Subsampling (4×) | 16-20× | <10% chi-squared degradation | | Angle filtering | 3-5× | For anisotropic analysis | | Combined | 50-100× | All optimizations enabled |
Next Steps
See Configuration Guide for detailed parameter settings
Review API Reference for programmatic access
Explore Examples for real-world workflows