heterodyne.core.io_utils

IO Utilities for Heterodyne Scattering Analysis

Comprehensive I/O utilities for safe data handling in XPCS analysis workflows. Provides robust file operations, intelligent data serialization, and structured result management with extensive error handling and logging.

Key Features:
- Thread-safe directory creation with race condition handling
- Timestamped filename generation with configurable formatting
- Multi-format data serialization (JSON, NumPy, Pickle, Matplotlib)
- Custom JSON serializer for NumPy arrays and complex objects
- Comprehensive error handling with detailed logging
- Structured result saving for analysis workflows
- Frame counting utilities for consistent time_length calculations

Data Formats Supported:
- JSON: Configuration files, analysis results, metadata
- NumPy (.npz): Correlation functions, parameter arrays, numerical data
- Pickle (.pkl): Complex Python objects, model instances (MCMC traces removed)
- Matplotlib: Figures and plots with publication-quality settings

Safety Features:
- Atomic file operations where possible
- Directory creation with appropriate permissions
- Comprehensive exception handling for I/O errors
- Logging of all operations for debugging and audit trails

Authors: Wei Chen, Hongrui He
Institution: Argonne National Laboratory

heterodyne.core.io_utils.ensure_dir(path, permissions=0o755)

Thread-safe recursive directory creation with comprehensive error handling.

Creates directory hierarchies safely, handling race conditions that can occur in multi-process environments (e.g., parallel optimization runs). Uses atomic operations where possible and validates directory creation success.

Features:
- Race condition safety for concurrent directory creation
- Recursive parent directory creation
- Configurable permissions for security control
- Path validation and type checking
- Comprehensive error reporting

Parameters:

path (str | Path) – Directory path to create (absolute or relative)
permissions (int) – Unix-style permissions (default: 0o755 = rwxr-xr-x)

Returns:

pathlib.Path object for the created or already existing directory

Return type:

Path

Raises:

OSError – Directory creation failed, path exists but isn’t a directory, or permissions issues

Example

>>> ensure_dir("./heterodyne_results/classical/traces")
PosixPath('heterodyne_results/classical/traces')

>>> ensure_dir("/tmp/analysis", permissions=0o700)  # Owner-only access
PosixPath('/tmp/analysis')
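
The race-condition handling described above can be approximated with the standard library alone. The following is a minimal sketch, not the package's actual implementation; `ensure_dir_sketch` is a hypothetical name, and it assumes that relying on `Path.mkdir(parents=True, exist_ok=True)` is an acceptable way to tolerate concurrent creation.

```python
from pathlib import Path


def ensure_dir_sketch(path, permissions=0o755):
    """Create `path` and any missing parents; tolerate concurrent creation.

    Hypothetical stand-in for ensure_dir, written for illustration only.
    """
    if not isinstance(path, (str, Path)):
        raise TypeError(f"path must be str or Path, got {type(path).__name__}")
    target = Path(path)
    try:
        # exist_ok=True turns "another process created it first" into a no-op.
        # Note: `mode` applies to the leaf directory only and is masked by umask.
        target.mkdir(parents=True, exist_ok=True, mode=permissions)
    except FileExistsError as exc:
        # Raised when the path exists but is not a directory.
        raise OSError(f"{target} exists but is not a directory") from exc
    if not target.is_dir():
        raise OSError(f"failed to create directory {target}")
    return target
```

Calling the sketch twice with the same path returns the same `Path` both times, which is the behaviour parallel optimization runs would depend on.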

heterodyne.core.io_utils.timestamped_filename(base_name, chi2=None, config=None)

Generate intelligently formatted filenames with timestamps and analysis metadata.

Creates structured filenames that include temporal information and analysis quality metrics, facilitating result organization and identification. Supports configurable timestamp formats and optional inclusion of chi-squared values for quick quality assessment.

Filename Components:
- Base name: User-specified prefix (e.g., ‘analysis_results’)
- Timestamp: Configurable format for temporal ordering
- Chi-squared: Optional goodness-of-fit indicator
- Config version: Optional configuration identification

Configuration Options:
- timestamp_format: strftime format string (default: “%Y%m%d_%H%M%S”)
- include_chi_squared: Boolean flag for chi2 inclusion
- include_config_name: Boolean flag for configuration version

Parameters:

base_name (str) – Base filename prefix (without extension)
chi2 (float | None) – Chi-squared value for quality indication
config (dict | None) – Configuration with output_settings/file_naming

Returns:

Structured filename string ready for file operations

Return type:

str

Examples

>>> config = {"output_settings": {"file_naming": {
...     "timestamp_format": "%Y%m%d_%H%M%S",
...     "include_chi_squared": True,
...     "include_config_name": True
... }}}
>>> timestamped_filename("classical_results", 1.234e-3, config)
'classical_results_20240315_143022_chi2_0.001234_v5.1'

>>> timestamped_filename("quick_analysis")  # Minimal version
'quick_analysis_20240315_143022'
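
How the filename components fit together can be sketched as follows. This is an illustration under stated assumptions, not the library's code: `timestamped_filename_sketch` and its `now` parameter are hypothetical, the six-decimal chi-squared format is inferred from the example output above, and the config-version suffix is left out because the source does not say where the version string comes from.

```python
from datetime import datetime


def timestamped_filename_sketch(base_name, chi2=None, config=None, now=None):
    """Join base name, timestamp and optional chi-squared with underscores."""
    naming = (config or {}).get("output_settings", {}).get("file_naming", {})
    fmt = naming.get("timestamp_format", "%Y%m%d_%H%M%S")
    # `now` is a hypothetical hook that makes the output reproducible.
    stamp = (now or datetime.now()).strftime(fmt)
    parts = [base_name, stamp]
    # Assumption: chi2 is only appended when the flag is explicitly enabled.
    if chi2 is not None and naming.get("include_chi_squared", False):
        # Assumption: six decimal places, as in the documented example.
        parts.append(f"chi2_{chi2:.6f}")
    return "_".join(parts)


fixed = datetime(2024, 3, 15, 14, 30, 22)
cfg = {"output_settings": {"file_naming": {"include_chi_squared": True}}}
name = timestamped_filename_sketch("classical_results", 1.234e-3, cfg, now=fixed)
# name == "classical_results_20240315_143022_chi2_0.001234"
```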

heterodyne.core.io_utils.save_json(data, filepath, **kwargs)

Save data as JSON with robust error handling and NumPy support.

Provides safe JSON serialization with automatic handling of scientific computing objects like NumPy arrays and scalars. Uses custom serializer to ensure compatibility with analysis results containing numerical data.

Features:
- Custom NumPy serializer for arrays and scalars
- Automatic directory creation for output path
- UTF-8 encoding for international character support
- Comprehensive error handling with detailed logging
- Configurable JSON formatting options

Default JSON Parameters:
- indent=2: Pretty formatting for readability
- ensure_ascii=False: Support for Unicode characters
- default=_json_serializer: NumPy and object handling

Parameters:

data (Any) – Data structure to save (dicts, lists, arrays, etc.)
filepath (str | Path) – Output file path (directories created automatically)
**kwargs – Additional json.dump() arguments (override defaults)

Returns:

True if save successful, False if any error occurred

Return type:

bool

Examples

>>> results = {"parameters": np.array([1.2, 3.4]), "chi2": 1.234e-5}
>>> save_json(results, "analysis_results.json")
True

>>> save_json(data, "compact.json", indent=None, separators=(',', ':'))
True  # Compact JSON format
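
The internals of `_json_serializer` are not shown in this reference, so the following is only a sketch of how a `default=` hook of this kind can work. `json_default_sketch` and `dumps_with_defaults` are hypothetical names. The hook relies on the fact that NumPy arrays and NumPy scalars both provide a `.tolist()` method, so it needs no NumPy import; converting `Path` objects to strings is an extra assumption.

```python
import json
from pathlib import Path


def json_default_sketch(obj):
    """Fallback called by json for objects it cannot encode natively."""
    # np.ndarray and NumPy scalars expose .tolist(), which yields plain
    # Python lists and numbers that json already understands.
    if hasattr(obj, "tolist"):
        return obj.tolist()
    # Assumption: paths are stored as plain strings.
    if isinstance(obj, Path):
        return str(obj)
    raise TypeError(f"{type(obj).__name__} is not JSON serializable")


def dumps_with_defaults(data, **kwargs):
    """Apply the documented defaults; caller kwargs override them."""
    options = {"indent": 2, "ensure_ascii": False, "default": json_default_sketch}
    options.update(kwargs)
    return json.dumps(data, **options)
```

Because caller arguments are merged last, `dumps_with_defaults(data, indent=None, separators=(",", ":"))` produces the compact form shown in the second example above.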

heterodyne.core.io_utils.save_numpy(data, filepath, compressed=True, **kwargs)

Save NumPy arrays with optimal compression and format selection.

Provides efficient storage of numerical data with automatic format selection based on file extension and compression preferences. Essential for saving correlation functions, parameter arrays, and other numerical results.

Format Selection:
- .npz extension or compressed=True: uses np.savez_compressed (recommended)
- Other extensions with compressed=False: uses np.save (uncompressed)
- Automatic directory creation for nested paths

Compression Benefits:
- Reduced file sizes (typically 2-10x smaller, depending on how compressible the data is)
- Potentially faster I/O for large arrays, since less data is transferred
- Standard NumPy format compatibility

Parameters:

data (np.ndarray) – NumPy array to save (any shape/dtype)
filepath (str | Path) – Output file path (.npz recommended)
compressed (bool) – Enable compression (default: True for efficiency)
**kwargs – Additional arguments for np.savez_compressed/np.save

Returns:

True if save successful, False if error occurred

Return type:

bool

Examples

>>> correlation_data = np.random.rand(1000, 50, 50)  # Large 3D array
>>> save_numpy(correlation_data, "c2_experimental.npz")
True  # Compressed .npz format (random data compresses poorly; structured data benefits more)

>>> parameters = np.array([1.2, -0.5, 3.4e-3, 0.1])
>>> save_numpy(parameters, "optimized_params.npy", compressed=False)
True  # Uncompressed for small arrays
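
The format-selection rule stated above can be written out as a small decision function. This is a sketch of the documented rule only; `choose_numpy_writer` is a hypothetical helper that returns the name of the NumPy function that would be used rather than calling it.

```python
from pathlib import Path


def choose_numpy_writer(filepath, compressed=True):
    """Return the name of the NumPy writer the documented rule would pick."""
    suffix = Path(filepath).suffix.lower()
    # Rule from the docs: a .npz extension OR compressed=True selects
    # np.savez_compressed; everything else falls back to np.save.
    if suffix == ".npz" or compressed:
        return "savez_compressed"
    return "save"


choice_npz = choose_numpy_writer("c2_experimental.npz")
choice_npy = choose_numpy_writer("optimized_params.npy", compressed=False)
```

One consequence worth keeping in mind: `np.savez_compressed` appends `.npz` to a filename that does not already end in it, so passing a `.npy` path while leaving `compressed=True` would yield a file such as `params.npy.npz`.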

heterodyne.core.io_utils.save_pickle(data, filepath, protocol=pickle.HIGHEST_PROTOCOL, **kwargs)

Save data using pickle with error handling and logging.

Parameters:

data (Any) – Data to pickle
filepath (str | Path) – Output file path
protocol (int) – Pickle protocol version (default: highest available)
**kwargs – Additional arguments (reserved for future use)

Returns:

True if successful, False otherwise

Return type:

bool

Example

>>> data = {"model": some_complex_object, "parameters": [1, 2, 3]}
>>> save_pickle(data, "model_data.pkl")
True
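
The "return a bool and log instead of raising" pattern shared by the save functions can be sketched as below. `save_pickle_sketch` is a hypothetical name, and creating the parent directory automatically is an assumption carried over from the `save_json` description.

```python
import logging
import pickle
from pathlib import Path

logger = logging.getLogger(__name__)


def save_pickle_sketch(data, filepath, protocol=pickle.HIGHEST_PROTOCOL):
    """Pickle `data` to `filepath`; return True on success, False on error."""
    target = Path(filepath)
    try:
        # Assumption: parent directories are created on demand.
        target.parent.mkdir(parents=True, exist_ok=True)
        with target.open("wb") as handle:
            pickle.dump(data, handle, protocol=protocol)
    except (OSError, pickle.PicklingError, TypeError, AttributeError) as exc:
        # Unpicklable objects surface as PicklingError, TypeError or
        # AttributeError depending on the object and Python version.
        logger.error("Failed to save pickle %s: %s", target, exc)
        return False
    logger.info("Saved pickle to %s", target)
    return True
```

A failed dump can leave a partially written file behind; writing to a temporary file and renaming it with `os.replace` would be one way to avoid that.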

heterodyne.core.io_utils.save_fig(figure, filepath, dpi=300, format=None, **kwargs)

Save matplotlib figure with error handling and logging.

Parameters:

figure (Any) – Matplotlib figure object
filepath (str | Path) – Output file path
dpi (int) – Resolution in dots per inch (default: 300)
format (str | None) – Figure format (inferred from extension if None)
**kwargs – Additional arguments passed to figure.savefig()

Returns:

True if successful, False otherwise

Return type:

bool

Example

>>> import matplotlib.pyplot as plt
>>> fig, ax = plt.subplots()
>>> ax.plot([1, 2, 3], [1, 4, 2])
>>> save_fig(fig, "plot.png", dpi=300, bbox_inches='tight')
True

heterodyne.core.io_utils.get_output_directory(config=None)

Get the output directory from configuration, creating it if necessary.

Parameters:

config (dict | None) – Configuration dictionary

Returns:

Output directory path

Return type:

Path

heterodyne.core.io_utils.save_classical_optimization_results(results, method_results=None, config=None, base_name='classical_results')

Save classical optimization results with method-specific organization.

Creates separate files for each optimization method (Nelder-Mead, Gurobi) to prevent overwriting and enable method comparison. Organizes results in a structured directory layout for easy analysis and plotting.

File Organization:
- classical_results_nelder_mead_TIMESTAMP.json
- classical_results_gurobi_TIMESTAMP.json
- classical_results_all_methods_TIMESTAMP.json (combined)

Parameters:

results (Dict) – Main optimization results
method_results (Dict) – Method-specific results dictionary
config (dict | None) – Configuration for output directory and naming
base_name (str) – Base name for output files

Returns:

Save status for each method and combined results

Return type:

dict[str, bool]
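
The per-method file layout can be illustrated with a short sketch. It is not the package's implementation: `save_per_method_sketch` and `_write_json` are hypothetical names, the explicit `out_dir` argument replaces the config lookup, and the `all_methods_json` status key is an assumption (only the per-method key style appears in this reference).

```python
import json
from datetime import datetime
from pathlib import Path


def _write_json(data, path):
    """Write `data` as JSON; report success as a bool instead of raising."""
    try:
        path.write_text(json.dumps(data, indent=2), encoding="utf-8")
        return True
    except (OSError, TypeError, ValueError):
        return False


def save_per_method_sketch(method_results, out_dir, base_name="classical_results"):
    """Write one JSON file per method plus a combined file."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    status = {}
    for method, result in method_results.items():
        # "Nelder-Mead" -> "nelder_mead" so each method gets its own file.
        slug = method.lower().replace("-", "_").replace(" ", "_")
        path = out / f"{base_name}_{slug}_{stamp}.json"
        status[f"{slug}_json"] = _write_json(result, path)
    combined = out / f"{base_name}_all_methods_{stamp}.json"
    # Assumption: the combined file is reported under "all_methods_json".
    status["all_methods_json"] = _write_json(method_results, combined)
    return status
```

Keeping the method name in the filename is what prevents a Gurobi run from overwriting a Nelder-Mead run saved in the same second.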

heterodyne.core.io_utils.save_analysis_results(results, config=None, base_name='analysis_results')

Orchestrate comprehensive saving of analysis results in multiple formats.

Enhanced to handle method-specific classical optimization results, preventing overwrites between Nelder-Mead and Gurobi methods. Intelligently saves analysis results using optimal formats for different data types.

Save Strategy:
- JSON: Main results, parameters, metadata (human-readable)
- NumPy (.npz): Correlation data, large numerical arrays (efficient)
- Pickle (.pkl): Complex objects, model instances (complete; MCMC traces removed)
- Method-specific: Individual files for each classical optimization method

File Organization:
- Timestamped base filename for chronological organization
- Format-specific suffixes: .json, _data.npz, _full.pkl
- Classical-only results saved to classical/ subdirectory
- Multi-method results saved to main output directory (MCMC removed)
- Automatic directory creation and organization
- Consistent naming across all output files

Parameters:

results (Dict) – Complete analysis results dictionary containing:
- Optimization results and parameters
- Correlation data arrays
- Configuration and metadata
config (dict | None) – Configuration for output directory and naming
base_name (str) – Prefix for all output files (default: “analysis_results”)

Returns:

Save status for each format:

"json": JSON save status
"numpy": NumPy array save status (if applicable)
"pickle": Pickle save status (if applicable)
Method-specific keys for classical optimization

Return type:

dict[str, bool]

Example

>>> results = {
...     "classical_optimization": {"parameters": [1.2, -0.5, 3.4]},
...     "correlation_data": np.random.rand(100, 50, 50),
...     "best_chi_squared": 1.234e-5
... }
>>> status = save_analysis_results(results, config, "experiment_A")
>>> print(status)
{'json': True, 'numpy': True, 'pickle': True, 'nelder_mead_json': True, 'gurobi_json': True}

heterodyne.core.io_utils.calculate_time_length(start_frame, end_frame)

Calculate time_length using inclusive frame counting.

This is the canonical formula used throughout the heterodyne package to ensure dimensional consistency between configuration, cached data, and runtime arrays.

Frame Counting Convention:

Config frames are 1-based and inclusive: [start_frame, end_frame]
time_length includes both start and end frames
Formula: time_length = end_frame - start_frame + 1

Examples:

>>> calculate_time_length(1, 100)
100
>>> calculate_time_length(401, 1000)
600
>>> calculate_time_length(1, 1)
1

Parameters:

start_frame – Starting frame number (1-based, inclusive)
end_frame – Ending frame number (1-based, inclusive)

Returns:

Number of frames in the range (time_length)

Return type:

int

Raises:

ValueError – If start_frame > end_frame

Note

This formula was fixed in v1.0.0 to address a critical bug where the original formula (end_frame - start_frame) caused off-by-one errors, dimensional mismatches, and NaN chi-squared values.
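
The inclusive counting rule is small enough to restate in code. The sketch below uses the hypothetical name `calculate_time_length_sketch` and reproduces only what this reference states: the formula and the `ValueError` condition.

```python
def calculate_time_length_sketch(start_frame, end_frame):
    """Count frames in the 1-based inclusive range [start_frame, end_frame]."""
    if start_frame > end_frame:
        raise ValueError(
            f"start_frame ({start_frame}) must not exceed end_frame ({end_frame})"
        )
    # Both endpoints are included, hence the +1.
    return end_frame - start_frame + 1


# The pre-fix formula, end_frame - start_frame, would report 99 for
# frames 1..100 and therefore disagree with an array of 100 frames.
n_frames = calculate_time_length_sketch(1, 100)
```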

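heterodyne.core.io_utils.config_frames_to_python_slice(start_frame, end_frame)

Convert 1-based inclusive config frame numbers to 0-based Python slice indices.

Config Convention: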

start_frame: 1-based, inclusive (e.g., 1 means first frame)
end_frame: 1-based, inclusive (e.g., 100 means include frame 100)

Python Slice Convention:

start: 0-based, inclusive
end: 0-based, exclusive (used as data[start:end])

Examples:

>>> config_frames_to_python_slice(1, 100)
(0, 100)
>>> config_frames_to_python_slice(401, 1000)
(400, 1000)

This gives slice [0:100] = 100 frames, [400:1000] = 600 frames, matching time_length = end_frame - start_frame + 1.

Parameters:

start_frame – Starting frame from config (1-based, inclusive)
end_frame – Ending frame from config (1-based, inclusive)

Returns:

(python_start, python_end) for use in data[start:end]

Return type:

tuple[int, int]

Note

The returned indices are designed for Python array slicing where the end index is exclusive. The slice [python_start:python_end] will give exactly time_length = end_frame - start_frame + 1 elements.
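
The conversion follows directly from the two conventions above: subtract one from the start to move to 0-based indexing, and leave the end unchanged because an inclusive 1-based end equals an exclusive 0-based end. A minimal sketch, with `config_frames_to_python_slice_sketch` as a hypothetical name and a plain list standing in for frame data:

```python
def config_frames_to_python_slice_sketch(start_frame, end_frame):
    """Map 1-based inclusive config frames to a 0-based half-open slice."""
    # 1-based inclusive start -> 0-based inclusive start: subtract 1.
    # 1-based inclusive end   -> 0-based exclusive end: unchanged.
    return start_frame - 1, end_frame


# Stand-in data: element i holds frame number i + 1 (frames 1..1000).
frames = list(range(1, 1001))
start, end = config_frames_to_python_slice_sketch(401, 1000)
selected = frames[start:end]

# The slice starts at frame 401, ends at frame 1000, and its length
# equals end_frame - start_frame + 1.
slice_length = len(selected)
```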