Performance Optimization
This guide covers performance optimization strategies for the heterodyne package.
Note
NEW: See the comprehensive Performance Guide guide for the latest performance improvements, including recent optimizations that delivered 10-17x speedups in key calculations.
Performance Documentation
Performance Overview
The heterodyne package is designed to handle large datasets efficiently. Key performance considerations:
Memory Management: Efficient handling of large correlation matrices
Computational Optimization: Numba JIT compilation and vectorization
Algorithm Selection: Optimizing Nelder-Mead configuration
Optimization Strategies
1. Angle Filtering
The most effective optimization for speed with minimal accuracy loss:
# Performance improvement: 3-5x speedup
config = {
"analysis_settings": {
"enable_angle_filtering": True,
"angle_filter_ranges": [[-5, 5], [175, 185]]
}
}
Benefits: - 3-5x faster computation - < 1% accuracy loss for most systems - Reduced memory usage - Scales well with dataset size
2. Data Type Optimization
Choose appropriate precision for your needs:
# Memory reduction: ~50%
config = {
"performance_settings": {
"data_type": "float32" # vs float64
}
}
Type |
Memory |
Speed |
Precision |
Use Case |
|---|---|---|---|---|
float32 |
50% less |
10-20% faster |
~7 digits |
Most analyses |
float64 |
Standard |
Standard |
~15 digits |
High precision needed |
3. JIT Compilation
Enable Numba for computational functions:
from numba import jit
@jit(nopython=True, cache=True)
def compute_correlation_fast(tau, params, q):
# JIT-compiled computation
# 5-10x speedup for model functions
pass
4. Parallel Optimization
Configure parallel processing for optimization:
config = {
"performance_settings": {
"num_threads": 4, # Match CPU cores
"enable_jit": True, # Numba JIT compilation
"data_type": "float64", # Precision vs memory tradeoff
"memory_limit_gb": 8 # Memory constraints
},
"optimization_config": {
"classical_optimization": {
"methods": ["Nelder-Mead"],
"method_options": {
"Nelder-Mead": {
"maxiter": 5000,
"xatol": 1e-6,
"fatol": 1e-6
}
}
}
}
}
Optimization Performance Tips:
Use angle filtering: 3-4x speedup for anisotropic analysis
Enable JIT compilation: 3-5x speedup for core computations
Reduce precision: Use float32 for 2x memory reduction
Batch processing: Process multiple samples in parallel
Memory Optimization
1. Memory Estimation
Estimate memory requirements before analysis:
from heterodyne.utils import estimate_memory_usage
memory_gb = estimate_memory_usage(
data_shape=(1000, 500), # Time points x angles
num_angles=360,
analysis_mode="heterodyne",
data_type="float64"
)
print(f"Estimated memory: {memory_gb:.1f} GB")
2. Chunked Processing
For very large datasets:
config = {
"performance_settings": {
"chunked_processing": True,
"chunk_size": 1000, # Process in chunks
"memory_limit_gb": 8 # Set memory limit
}
}
3. Memory Monitoring
Monitor memory usage during analysis:
import psutil
def monitor_memory():
process = psutil.Process()
memory_mb = process.memory_info().rss / 1024**2
print(f"Memory usage: {memory_mb:.1f} MB")
# Use during analysis
analysis.load_experimental_data()
monitor_memory()
result = analysis.optimize_classical()
monitor_memory()
CPU Optimization
1. Thread Configuration
Optimize thread usage:
import os
# Set thread counts
os.environ['OMP_NUM_THREADS'] = '4'
os.environ['NUMBA_NUM_THREADS'] = '4'
config = {
"performance_settings": {
"num_threads": 4 # Match your CPU cores
}
}
2. BLAS/LAPACK Optimization
Use optimized linear algebra libraries:
# Install optimized BLAS
conda install mkl
# or
pip install intel-mkl
3. CPU Profiling
Profile CPU usage to identify bottlenecks:
import cProfile
import pstats
# Profile analysis
profiler = cProfile.Profile()
profiler.enable()
# Run analysis
result = analysis.optimize_classical()
profiler.disable()
stats = pstats.Stats(profiler)
stats.sort_stats('cumulative').print_stats(10)
Algorithm Optimization
1. Optimization Method Selection
Choose appropriate optimization algorithms:
# Fast for simple landscapes
config = {
"optimization_config": {
"classical": {
"method": "Nelder-Mead", # Fast, robust
"max_iterations": 1000
}
}
}
# For complex landscapes
config = {
"optimization_config": {
"classical_optimization": {
"methods": ["Nelder-Mead"], # Derivative-free simplex method
"method_options": {
"Nelder-Mead": {"maxiter": 500}
}
}
}
}
2. Optimization Strategy for Heterodyne Mode
Configure optimization parameters for the 14-parameter heterodyne model:
# Heterodyne Mode (14 parameters)
config = {
"optimization_config": {
"classical_optimization": {
"methods": ["Nelder-Mead"],
"method_options": {
"Nelder-Mead": {
"maxiter": 10000, # More iterations for 14-parameter space
"xatol": 1e-7, # Tighter tolerance
"fatol": 1e-7
}
}
}
},
"performance_settings": {
"num_threads": 8, # More parallelism
"enable_jit": True,
"data_type": "float64" # Higher precision needed
}
}
Performance Benchmarks
Typical Performance Metrics:
Configuration |
Time |
Memory |
Speedup |
Notes |
|---|---|---|---|---|
Basic heterodyne |
60s |
1.0 GB |
1x |
14-parameter baseline |
+ Angle filtering |
15s |
0.6 GB |
4x |
Most effective |
+ Float32 |
13s |
0.3 GB |
4.6x |
Memory efficient |
+ JIT compilation |
10s |
0.3 GB |
6x |
Full optimization |
Profiling Tools
1. Time Profiling
import time
from functools import wraps
def time_it(func):
@wraps(func)
def wrapper(*args, **kwargs):
start = time.time()
result = func(*args, **kwargs)
end = time.time()
print(f"{func.__name__}: {end - start:.2f}s")
return result
return wrapper
@time_it
def optimize_classical(self):
# Timed function
pass
2. Memory Profiling
from memory_profiler import profile
@profile
def analyze_data():
# Memory-profiled function
pass
3. Line Profiling
# Install line_profiler
pip install line_profiler
# Profile specific functions
kernprof -l -v my_script.py
Performance Best Practices
Configuration:
Enable angle filtering for 3-5x speedup
Use float32 unless high precision needed
Set appropriate thread counts (match CPU cores)
Enable JIT compilation for model functions
Optimization:
Start with classical optimization (Nelder-Mead) for reliable convergence
Use appropriate maxiter based on problem complexity (2000-5000)
Set proper tolerance (xatol/fatol: 1e-6 to 1e-7)
Verify convergence by checking optimization success flag
Use good initial guesses from physical constraints or previous results
Enable parallel processing for multi-sample analyses
Memory:
Estimate memory needs before large analyses
Use chunked processing for very large datasets
Monitor memory usage during long runs
Clean up intermediate results when possible
Development:
Profile before optimizing to find real bottlenecks
Test performance changes with realistic datasets
Balance speed vs. accuracy based on requirements
Document performance characteristics of new features
Troubleshooting Performance Issues
Slow Optimization:
Enable angle filtering
Check initial parameter values
Adjust Nelder-Mead optimization parameters
Reduce tolerance if acceptable
High Memory Usage:
Use float32 data type
Enable chunked processing
Reduce dataset size if possible
Check for memory leaks
Poor Convergence:
Increase maximum iterations (maxiter)
Adjust tolerance parameters (xatol, fatol)
Check parameter bounds and constraints
Use better initial values from physical estimates
Try multiple optimization runs with different initial conditions
Consider robust optimization for noisy data
System-Specific Issues:
Check BLAS/LAPACK installation
Verify thread settings
Monitor CPU/memory resources
Consider cluster computing for very large problems