heterodyne.core.io_utils
IO Utilities for Heterodyne Scattering Analysis
Comprehensive I/O utilities for safe data handling in XPCS analysis workflows. Provides robust file operations, intelligent data serialization, and structured result management with extensive error handling and logging.
Key Features:
- Thread-safe directory creation with race condition handling
- Timestamped filename generation with configurable formatting
- Multi-format data serialization (JSON, NumPy, Pickle, Matplotlib)
- Custom JSON serializer for NumPy arrays and complex objects
- Comprehensive error handling with detailed logging
- Structured result saving for analysis workflows
- Frame counting utilities for consistent time_length calculations
Data Formats Supported:
- JSON: Configuration files, analysis results, metadata
- NumPy (.npz): Correlation functions, parameter arrays, numerical data
- Pickle (.pkl): Complex Python objects, model instances (MCMC traces removed)
- Matplotlib: Figures and plots with publication-quality settings
Safety Features:
- Atomic file operations where possible
- Directory creation with appropriate permissions
- Comprehensive exception handling for I/O errors
- Logging of all operations for debugging and audit trails
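The module description does not say which operations are atomic or how that is achieved. One common pattern, shown here as a minimal sketch and not as the package's implementation, writes to a temporary file in the destination directory and then renames it over the target with os.replace, so readers see either the old file or the complete new one. The helper name atomic_write_text is hypothetical.

```python
import json
import os
import tempfile
from pathlib import Path


def atomic_write_text(filepath, text):
    """Hypothetical helper: write text so the target is replaced in one step."""
    path = Path(filepath)
    path.parent.mkdir(parents=True, exist_ok=True)
    # The temporary file lives in the same directory so the final rename
    # stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=str(path.parent), suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(text)
            handle.flush()
            os.fsync(handle.fileno())  # push the bytes to disk before renaming
        os.replace(tmp_name, path)  # rename over the destination
    except BaseException:
        # Remove the partial temporary file, then re-raise the original error.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    return path


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "results" / "summary.json"
    atomic_write_text(target, json.dumps({"chi2": 1.0}))
    atomic_write_text(target, json.dumps({"chi2": 2.0}))  # overwrite in one step
    final_content = json.loads(target.read_text(encoding="utf-8"))
    leftovers = [p.name for p in target.parent.iterdir() if p.name != target.name]
```

A failed write in this sketch leaves the previous file untouched, which is the property "atomic where possible" usually refers to.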
Authors: Wei Chen, Hongrui He
Institution: Argonne National Laboratory
- heterodyne.core.io_utils.ensure_dir(path, permissions=0o755)
Thread-safe recursive directory creation with comprehensive error handling.
Creates directory hierarchies safely, handling race conditions that can occur in multi-process environments (e.g., parallel optimization runs). Uses atomic operations where possible and validates directory creation success.
Features:
- Race condition safety for concurrent directory creation
- Recursive parent directory creation
- Configurable permissions for security control
- Path validation and type checking
- Comprehensive error reporting
- Parameters:
- Returns:
pathlib.Path object for the created or validated directory
- Return type:
Path
- Raises:
OSError – Directory creation failed, the path exists but is not a directory, or permissions are insufficient
Example
>>> ensure_dir("./heterodyne_results/classical/traces")
PosixPath('./heterodyne_results/classical/traces')
>>> ensure_dir("/tmp/analysis", permissions=0o700)  # Owner-only access
PosixPath('/tmp/analysis')
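The race-condition handling described above can be approximated with the standard library alone. The following is a minimal sketch under that assumption, not the package's code; ensure_dir_sketch is a hypothetical name. It relies on Path.mkdir(parents=True, exist_ok=True), which does not fail when another process creates the directory first.

```python
import os
import tempfile
from pathlib import Path


def ensure_dir_sketch(path, permissions=0o755):
    """Hypothetical stand-in for ensure_dir, for illustration only."""
    if not isinstance(path, (str, os.PathLike)):
        raise TypeError(f"path must be str or PathLike, got {type(path).__name__}")
    target = Path(path)
    try:
        # exist_ok=True makes the call idempotent: if a concurrent process
        # wins the race and creates the directory, nothing is raised here.
        target.mkdir(parents=True, exist_ok=True, mode=permissions)
    except FileExistsError as exc:
        # Still raised when the path exists but is not a directory.
        raise OSError(f"{target} exists but is not a directory") from exc
    if not target.is_dir():
        raise OSError(f"failed to create directory {target}")
    return target


with tempfile.TemporaryDirectory() as tmp:
    nested = ensure_dir_sketch(Path(tmp) / "classical" / "traces")
    again = ensure_dir_sketch(nested)  # second call is a no-op
    created_ok = nested.is_dir() and again == nested
```

Note that the mode passed to mkdir is filtered by the process umask and applies only to the final directory; parents created along the way get default permissions.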
- heterodyne.core.io_utils.timestamped_filename(base_name, chi2=None, config=None)
Generate intelligently formatted filenames with timestamps and analysis metadata.
Creates structured filenames that include temporal information and analysis quality metrics, facilitating result organization and identification. Supports configurable timestamp formats and optional inclusion of chi-squared values for quick quality assessment.
Filename Components:
- Base name: User-specified prefix (e.g., 'analysis_results')
- Timestamp: Configurable format for temporal ordering
- Chi-squared: Optional goodness-of-fit indicator
- Config version: Optional configuration identification
Configuration Options:
- timestamp_format: strftime format string (default: "%Y%m%d_%H%M%S")
- include_chi_squared: Boolean flag for chi2 inclusion
- include_config_name: Boolean flag for configuration version
- Parameters:
- Returns:
Structured filename string ready for file operations
- Return type:
Examples
>>> config = {"output_settings": {"file_naming": {
...     "timestamp_format": "%Y%m%d_%H%M%S",
...     "include_chi_squared": True,
...     "include_config_name": True
... }}}
>>> timestamped_filename("classical_results", 1.234e-3, config)
'classical_results_20240315_143022_chi2_0.001234_v5.1'
>>> timestamped_filename("quick_analysis")  # Minimal version
'quick_analysis_20240315_143022'
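How the components might be assembled is sketched below. This is an illustrative reconstruction from the examples above, not the package's implementation: timestamped_filename_sketch and its now parameter (added so the output is reproducible) are hypothetical, the six-decimal chi-squared format is inferred from the example output, and the config-version suffix is left out because the source does not show where the version string is read from.

```python
from datetime import datetime


def timestamped_filename_sketch(base_name, chi2=None, config=None, now=None):
    """Hypothetical reconstruction for illustration only."""
    # Walk the nested config defensively; missing levels fall back to {}.
    naming = ((config or {}).get("output_settings") or {}).get("file_naming") or {}
    fmt = naming.get("timestamp_format", "%Y%m%d_%H%M%S")
    moment = now if now is not None else datetime.now()
    parts = [base_name, moment.strftime(fmt)]
    # Append chi-squared only when a value is given and the flag is enabled.
    if chi2 is not None and naming.get("include_chi_squared", False):
        parts.append(f"chi2_{chi2:.6f}")
    return "_".join(parts)


fixed = datetime(2024, 3, 15, 14, 30, 22)
config = {"output_settings": {"file_naming": {"include_chi_squared": True}}}
with_chi2 = timestamped_filename_sketch("classical_results", 1.234e-3, config, now=fixed)
minimal = timestamped_filename_sketch("quick_analysis", now=fixed)
```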
- heterodyne.core.io_utils.save_json(data, filepath, **kwargs)
Save data as JSON with robust error handling and NumPy support.
Provides safe JSON serialization with automatic handling of scientific computing objects like NumPy arrays and scalars. Uses custom serializer to ensure compatibility with analysis results containing numerical data.
Features:
- Custom NumPy serializer for arrays and scalars
- Automatic directory creation for output path
- UTF-8 encoding for international character support
- Comprehensive error handling with detailed logging
- Configurable JSON formatting options
Default JSON Parameters:
- indent=2: Pretty formatting for readability
- ensure_ascii=False: Support for Unicode characters
- default=_json_serializer: NumPy and object handling
- Parameters:
- Returns:
True if save successful, False if any error occurred
- Return type:
Examples
>>> results = {"parameters": np.array([1.2, 3.4]), "chi2": 1.234e-5}
>>> save_json(results, "analysis_results.json")
True
>>> save_json(data, "compact.json", indent=None, separators=(',', ':'))  # Compact JSON format
True
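The default= hook named in the defaults above is the standard json mechanism for types the encoder does not know. The sketch below shows one way such a hook could look; it is not the package's _json_serializer, and json_default_sketch and FakeArray are hypothetical names. It uses duck typing on tolist(), which NumPy arrays and NumPy scalars both provide, so the example runs without NumPy installed.

```python
import json
from pathlib import Path


def json_default_sketch(obj):
    """Hypothetical fallback for json.dumps(default=...)."""
    # NumPy arrays and scalars expose tolist(), which returns plain
    # Python lists or numbers that the json module can encode.
    if hasattr(obj, "tolist"):
        return obj.tolist()
    if isinstance(obj, complex):
        return {"real": obj.real, "imag": obj.imag}
    if isinstance(obj, Path):
        return str(obj)
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


class FakeArray:
    """Stand-in for numpy.ndarray so the example needs no third-party import."""

    def __init__(self, values):
        self._values = list(values)

    def tolist(self):
        return list(self._values)


payload = {"parameters": FakeArray([1.2, 3.4]), "chi2": 1.234e-5, "out": Path("results")}
text = json.dumps(payload, indent=2, ensure_ascii=False, default=json_default_sketch)
decoded = json.loads(text)
```

The hook is only called for objects the encoder cannot handle itself, so ordinary dicts, lists, strings, and floats pass through unchanged.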
- heterodyne.core.io_utils.save_numpy(data, filepath, compressed=True, **kwargs)
Save NumPy arrays with optimal compression and format selection.
Provides efficient storage of numerical data with automatic format selection based on file extension and compression preferences. Essential for saving correlation functions, parameter arrays, and other numerical results.
Format Selection:
- .npz extension or compressed=True: Uses np.savez_compressed (recommended)
- Other extensions with compressed=False: Uses np.save (uncompressed)
- Automatic directory creation for nested paths
Compression Benefits:
- Reduced file sizes for compressible data (typically 2-10x smaller)
- Potentially faster I/O for large arrays, since less data is transferred
- Standard NumPy format compatibility
- Parameters:
- Returns:
True if save successful, False if error occurred
- Return type:
Examples
>>> correlation_data = np.random.rand(1000, 50, 50)  # Large 3D array
>>> save_numpy(correlation_data, "c2_experimental.npz")  # Compressed format
True
>>> parameters = np.array([1.2, -0.5, 3.4e-3, 0.1])
>>> save_numpy(parameters, "optimized_params.npy", compressed=False)  # Uncompressed for small arrays
True
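The format-selection rule stated above reduces to a single condition. The sketch below encodes only that decision, as read from the description; choose_numpy_writer is a hypothetical helper and returns the name of the NumPy function that would be called, not the function itself.

```python
from pathlib import Path


def choose_numpy_writer(filepath, compressed=True):
    """Hypothetical helper: name the NumPy writer implied by the stated rule."""
    # Rule from the description: a .npz extension or compressed=True selects
    # np.savez_compressed; anything else falls back to np.save.
    if compressed or Path(filepath).suffix == ".npz":
        return "savez_compressed"
    return "save"


default_choice = choose_numpy_writer("c2_experimental.npz")
plain_choice = choose_numpy_writer("optimized_params.npy", compressed=False)
npz_wins = choose_numpy_writer("data.npz", compressed=False)
```

One practical consequence: np.savez_compressed appends .npz to a filename that lacks it, so passing a .npy path with the default compressed=True would, under this rule, produce a file ending in .npy.npz.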
- heterodyne.core.io_utils.save_pickle(data, filepath, protocol=pickle.HIGHEST_PROTOCOL, **kwargs)
Save data using pickle with error handling and logging.
- Parameters:
- Returns:
True if successful, False otherwise
- Return type:
Example
>>> data = {"model": some_complex_object, "parameters": [1, 2, 3]}
>>> save_pickle(data, "model_data.pkl")
True
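The save functions in this module report failure by returning False instead of raising. A minimal sketch of that pattern for pickle follows; it is an assumption about the general shape, not the package's code, and save_pickle_sketch and the logger name are hypothetical.

```python
import logging
import pickle
import tempfile
from pathlib import Path

logger = logging.getLogger("io_sketch")  # hypothetical logger name


def save_pickle_sketch(data, filepath, protocol=pickle.HIGHEST_PROTOCOL):
    """Hypothetical wrapper: return True on success, False on any save error."""
    path = Path(filepath)
    try:
        path.parent.mkdir(parents=True, exist_ok=True)
        with open(path, "wb") as handle:
            pickle.dump(data, handle, protocol=protocol)
    except (OSError, pickle.PicklingError, TypeError, AttributeError) as exc:
        # Log and report failure instead of propagating the exception.
        logger.error("Failed to save pickle %s: %s", path, exc)
        return False
    logger.info("Saved pickle to %s", path)
    return True


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "nested" / "model_data.pkl"
    ok = save_pickle_sketch({"parameters": [1, 2, 3]}, target)
    with open(target, "rb") as handle:
        restored = pickle.load(handle)
    # Lambdas cannot be pickled, so this call exercises the failure path.
    bad = save_pickle_sketch(lambda x: x, Path(tmp) / "bad.pkl")
```

Callers of a bool-returning saver need to check the result; an ignored False means the file is missing or incomplete.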
- heterodyne.core.io_utils.save_fig(figure, filepath, dpi=300, format=None, **kwargs)
Save matplotlib figure with error handling and logging.
- Parameters:
- Returns:
True if successful, False otherwise
- Return type:
Example
>>> import matplotlib.pyplot as plt
>>> fig, ax = plt.subplots()
>>> ax.plot([1, 2, 3], [1, 4, 2])
>>> save_fig(fig, "plot.png", dpi=300, bbox_inches='tight')
True
- heterodyne.core.io_utils.get_output_directory(config=None)
Get the output directory from configuration, creating it if necessary.
- Parameters:
config (dict | None) – Configuration dictionary
- Returns:
Output directory path
- Return type:
Path
- heterodyne.core.io_utils.save_classical_optimization_results(results, method_results=None, config=None, base_name='classical_results')
Save classical optimization results with method-specific organization.
Creates separate files for each optimization method (Nelder-Mead, Gurobi) to prevent overwriting and enable method comparison. Organizes results in a structured directory layout for easy analysis and plotting.
File Organization:
- classical_results_nelder_mead_TIMESTAMP.json
- classical_results_gurobi_TIMESTAMP.json
- classical_results_all_methods_TIMESTAMP.json (combined)
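The naming scheme in the file organization above can be sketched as a small function. This is illustrative only: method_filenames is a hypothetical helper, and the rule that turns a display name such as "Nelder-Mead" into nelder_mead is an assumption inferred from the listed filenames.

```python
import re


def method_filenames(base_name, methods, timestamp):
    """Hypothetical helper: build one JSON filename per method plus a combined one."""
    names = {}
    for method in methods:
        # Lowercase and collapse non-alphanumeric runs to underscores,
        # e.g. "Nelder-Mead" becomes "nelder_mead".
        slug = re.sub(r"[^a-z0-9]+", "_", method.lower()).strip("_")
        names[slug] = f"{base_name}_{slug}_{timestamp}.json"
    names["all_methods"] = f"{base_name}_all_methods_{timestamp}.json"
    return names


files = method_filenames("classical_results", ["Nelder-Mead", "Gurobi"], "20240315_143022")
```

Because each method gets its own filename, a later run of one method cannot overwrite the results of another within the same timestamp.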
- Parameters:
- Returns:
Save status for each method and combined results
- Return type:
- heterodyne.core.io_utils.save_analysis_results(results, config=None, base_name='analysis_results')
Orchestrate comprehensive saving of analysis results in multiple formats.
Enhanced to handle method-specific classical optimization results, preventing overwrites between Nelder-Mead and Gurobi methods. Intelligently saves analysis results using optimal formats for different data types.
Save Strategy:
- JSON: Main results, parameters, metadata (human-readable)
- NumPy (.npz): Correlation data, large numerical arrays (efficient)
- Pickle (.pkl): Complex objects, model instances (complete; MCMC traces removed)
- Method-specific: Individual files for each classical optimization method
File Organization:
- Timestamped base filename for chronological organization
- Format-specific suffixes: .json, _data.npz, _full.pkl
- Classical-only results saved to classical/ subdirectory
- Multi-method results saved to main output directory (MCMC removed)
- Automatic directory creation and organization
- Consistent naming across all output files
- Parameters:
results (Dict) – Complete analysis results dictionary containing:
- Optimization results and parameters
- Correlation data arrays
- Configuration and metadata
config (dict | None) – Configuration for output directory and naming
base_name (str) – Prefix for all output files (default: “analysis_results”)
- Returns:
Save status for each format:
- "json": JSON save status
- "numpy": NumPy array save status (if applicable)
- "pickle": Pickle save status (if applicable)
- method-specific keys for classical optimization
- Return type:
Example
>>> results = {
...     "classical_optimization": {"parameters": [1.2, -0.5, 3.4]},
...     "correlation_data": np.random.rand(100, 50, 50),
...     "best_chi_squared": 1.234e-5
... }
>>> status = save_analysis_results(results, config, "experiment_A")
>>> print(status)
{'json': True, 'numpy': True, 'pickle': True, 'nelder_mead_json': True, 'gurobi_json': True}
- heterodyne.core.io_utils.calculate_time_length(start_frame, end_frame)
Calculate time_length using inclusive frame counting.
This is the canonical formula used throughout the heterodyne package to ensure dimensional consistency between configuration, cached data, and runtime arrays.
Frame Counting Convention:
Config frames are 1-based and inclusive: [start_frame, end_frame]
time_length includes both start and end frames
Formula: time_length = end_frame - start_frame + 1
Examples:
>>> calculate_time_length(1, 100)
100
>>> calculate_time_length(401, 1000)
600
>>> calculate_time_length(1, 1)
1
- Parameters:
start_frame – Starting frame number (1-based, inclusive)
end_frame – Ending frame number (1-based, inclusive)
- Returns:
Number of frames in the range (time_length)
- Return type:
int
- Raises:
ValueError – If start_frame > end_frame
Note
This formula was fixed in v1.0.0 to address a critical bug where the original formula (end_frame - start_frame) caused off-by-one errors, dimensional mismatches, and NaN chi-squared values.
See also
heterodyne/analysis/core.py:240 - Core time_length calculation
heterodyne/data/xpcs_loader.py - Data loading with frame slicing
heterodyne/tests/test_time_length_calculation.py - Regression tests
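The inclusive-count formula and the ValueError condition documented for calculate_time_length fit in a few lines. The sketch below follows that description directly; the function name is suffixed with _sketch to mark it as an illustration, not the package source, and the error message wording is an assumption.

```python
def calculate_time_length_sketch(start_frame, end_frame):
    """Illustrative version of the documented inclusive frame count."""
    if start_frame > end_frame:
        raise ValueError(
            f"start_frame ({start_frame}) must not exceed end_frame ({end_frame})"
        )
    # Both endpoints are counted, hence the +1.
    return end_frame - start_frame + 1


inclusive = calculate_time_length_sketch(401, 1000)
# The pre-fix formula from the note omits the +1 and comes up one frame short.
off_by_one = 1000 - 401
single = calculate_time_length_sketch(1, 1)
```

Here inclusive is 600 while off_by_one is 599, which is the one-frame mismatch between array dimensions that the note describes.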
- heterodyne.core.io_utils.config_frames_to_python_slice(start_frame, end_frame)
Convert config frame range (1-based inclusive) to Python slice indices (0-based).
Config Convention:
start_frame: 1-based, inclusive (e.g., 1 means first frame)
end_frame: 1-based, inclusive (e.g., 100 means include frame 100)
Python Slice Convention:
start: 0-based, inclusive
end: 0-based, exclusive (used as data[start:end])
Examples:
>>> config_frames_to_python_slice(1, 100)
(0, 100)
>>> config_frames_to_python_slice(401, 1000)
(400, 1000)
The resulting slices data[0:100] and data[400:1000] contain 100 and 600 frames respectively, matching time_length = end_frame - start_frame + 1.
- Parameters:
start_frame – Starting frame from config (1-based, inclusive)
end_frame – Ending frame from config (1-based, inclusive)
- Returns:
(python_start, python_end) for use in data[start:end]
- Return type:
tuple[int, int]
Note
The returned indices are designed for Python array slicing where the end index is exclusive. The slice [python_start:python_end] will give exactly time_length = end_frame - start_frame + 1 elements.
See also
convert_c2_to_npz.py:convert_config_frames_to_python() - Data conversion
heterodyne/data/xpcs_loader.py:637 - Frame slicing in data loader
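The conversion between the two conventions shifts only the start index, because a 1-based inclusive end and a 0-based exclusive end are the same number. A short sketch, with a hypothetical function name and a plain list standing in for the frame axis of a data array:

```python
def config_frames_to_python_slice_sketch(start_frame, end_frame):
    """Illustrative version: 1-based inclusive range to 0-based half-open slice."""
    # Subtract 1 from the start to make it 0-based; the inclusive 1-based end
    # already equals the exclusive 0-based end.
    return start_frame - 1, end_frame


# Frame numbers 1..1000 stored at indices 0..999.
frames = list(range(1, 1001))
start, end = config_frames_to_python_slice_sketch(401, 1000)
selected = frames[start:end]
first_frame, last_frame, count = selected[0], selected[-1], len(selected)
```

The slice starts at frame 401, ends at frame 1000, and holds 600 elements, so it agrees with the inclusive time_length formula.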