ADR-003: HDF5 Facade Pattern with Connection Pooling¶
Status¶
Accepted
Context¶
HDF5 file I/O in XPCS Viewer was historically scattered across 12+ modules, each opening files independently with h5py.File. This created several problems:
No connection reuse: Each module opened and closed HDF5 files independently. For interactive analysis workflows where multiple plots read from the same file, this meant repeated open/close cycles.
Inconsistent error handling: Some modules raised exceptions on missing datasets, others returned None, and some silently returned empty arrays.
No schema validation: HDF5 datasets were read as raw NumPy arrays without checking shapes, dtypes, NaN values, or physical constraints (e.g., non-negative delay times).
Implicit data contracts: The structure of HDF5 groups and datasets was documented in comments but never enforced at runtime. Typos in dataset paths caused KeyError at unpredictable points.
No versioning: Schema changes to the HDF5 file format had no migration path. Old files could silently produce wrong results with new code.
The codebase already had a connection pool (fileIO/hdf_reader.py:HDF5ConnectionPool) for basic connection reuse, but it was used directly by only a few modules.
Decision¶
We introduced a facade pattern with two complementary layers:
Schema validators (xpcsviewer/schemas/validators.py): Frozen dataclasses with __post_init__ validation for all shared data structures.
HDF5 Facade (xpcsviewer/io/hdf5_facade.py): A unified entry point for all HDF5 operations that combines connection pooling with schema validation.
Architecture¶
xpcsviewer/schemas/
    validators.py    # QMapSchema, GeometryMetadata, G2Data, PartitionSchema, MaskSchema
    __init__.py      # Public re-exports
xpcsviewer/io/
    hdf5_facade.py   # HDF5Facade: read/write with validation + pooling
    __init__.py      # Public re-exports
Schema Design¶
All schemas are frozen dataclasses (@dataclass(frozen=True)) to enforce immutability after construction. Each schema validates in __post_init__:
| Schema | Validates | Fields |
|---|---|---|
| QMapSchema | Shape consistency, float64 dtype, no NaN, valid units, mask values 0/1 | sqmap, dqmap, phis, units, mask, partition_map |
| GeometryMetadata | Positive det_dist/lambda_/pix_dim, 2-tuple shape, beam center bounds | bcx, bcy, det_dist, lambda_, pix_dim, shape |
| G2Data | Shape consistency, float64 dtype, no NaN in g2/delay_times, non-negative errors, monotonic delay_times | g2, g2_err, delay_times, q_values |
| PartitionSchema | Positive num_pts, integer partition_map, matching list lengths, non-negative num_list | partition_map, num_pts, val_list, num_list, metadata |
| MaskSchema | 2D integer array, values 0/1, shape matches metadata | mask, metadata, version |
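As an illustration of how one row of this table might translate into `__post_init__` checks, here is a minimal sketch for the geometry fields. The class name `GeometrySketch` is hypothetical, and the exact rules (in particular, treating "beam center bounds" as "the beam center lies on the detector") are assumptions, not the checks in validators.py.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class GeometrySketch:
    """Hypothetical stand-in for GeometryMetadata; the checks are assumptions."""

    bcx: float
    bcy: float
    det_dist: float
    lambda_: float
    pix_dim: float
    shape: tuple

    def __post_init__(self):
        # Physical quantities must be finite and strictly positive.
        for name in ("det_dist", "lambda_", "pix_dim"):
            value = getattr(self, name)
            if not math.isfinite(value) or value <= 0:
                raise ValueError(f"{name} must be positive and finite, got {value!r}")
        # The detector shape is a (rows, cols) 2-tuple.
        if not (isinstance(self.shape, tuple) and len(self.shape) == 2):
            raise ValueError(f"shape must be a 2-tuple, got {self.shape!r}")
        rows, cols = self.shape
        # Assumed bound: the beam center lies on the detector. NaN fails
        # both comparisons, so it is rejected here as well.
        if not (0 <= self.bcx <= cols and 0 <= self.bcy <= rows):
            raise ValueError("beam center outside detector bounds")
```

Because the checks run in `__post_init__`, an invalid instance never exists: construction either succeeds with consistent fields or raises `ValueError`.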
Key validation patterns:
Defensive copies on construction: object.__setattr__(self, "sqmap", np.copy(self.sqmap)) prevents external mutation of frozen dataclass arrays.
Immutable collections (BUG-010): Mutable lists inside frozen dataclasses are converted to tuples: object.__setattr__(self, "q_values", tuple(self.q_values)).
dtype coercion in from_dict() (BUG-011, BUG-058): Float32 HDF5 data is coerced to float64 via np.asarray(data, dtype=np.float64).
NaN/Inf rejection (BUG-048): GeometryMetadata.from_dict() explicitly checks for NaN and infinite values in critical fields.
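The patterns above can be combined in a single class. The following is a reduced sketch, not the real G2Data: the class name `G2Sketch`, the dictionary keys, and the convention that delays run along axis 0 of `g2` are all assumptions made for illustration.

```python
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True)
class G2Sketch:
    """Hypothetical, reduced stand-in for G2Data."""

    g2: np.ndarray
    delay_times: np.ndarray
    q_values: tuple

    def __post_init__(self):
        # Defensive copies: later changes to the caller's arrays do not leak in.
        object.__setattr__(self, "g2", np.copy(self.g2))
        object.__setattr__(self, "delay_times", np.copy(self.delay_times))
        # Lists become tuples so the frozen instance holds no mutable collection.
        object.__setattr__(self, "q_values", tuple(self.q_values))
        if np.isnan(self.g2).any() or np.isnan(self.delay_times).any():
            raise ValueError("NaN in g2 or delay_times")
        # Assumed layout: one row of g2 per delay time.
        if self.g2.shape[0] != self.delay_times.shape[0]:
            raise ValueError("g2 and delay_times disagree on the number of delays")

    @classmethod
    def from_dict(cls, data):
        # float32 data read from HDF5 is coerced to float64 before validation.
        return cls(
            g2=np.asarray(data["g2"], dtype=np.float64),
            delay_times=np.asarray(data["delay_times"], dtype=np.float64),
            q_values=data["q_values"],
        )


raw = {
    "g2": np.ones((3, 2), dtype=np.float32),
    "delay_times": np.array([1e-3, 1e-2, 1e-1], dtype=np.float32),
    "q_values": [0.01, 0.02],
}
g2_data = G2Sketch.from_dict(raw)
raw["g2"][0, 0] = 99.0  # does not affect g2_data.g2
```

`object.__setattr__` is needed because a frozen dataclass blocks ordinary attribute assignment, including inside `__post_init__`.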
Facade Design¶
HDF5Facade provides methods for each data type with consistent patterns:
class HDF5Facade:
    def __init__(self, pool=None, validate=True):
        self.pool = pool or _connection_pool  # Global connection pool
        self.validate = validate

    def read_qmap(self, file_path, group="/xpcs/qmap") -> QMapSchema: ...
    def write_mask(self, file_path, mask_schema, group, compression) -> None: ...
    def write_partition(self, file_path, partition_schema, group) -> None: ...
    def read_g2_data(self, file_path, q_idx=None, group="/xpcs/g2") -> G2Data: ...
    def read_geometry_metadata(self, file_path, group="/xpcs/metadata") -> GeometryMetadata: ...
    def get_pool_stats(self) -> dict: ...
    def clear_pool(self) -> None: ...
Each read method:
Opens the file via the connection pool (self.pool.get_connection(file_path, "r")).
Reads raw datasets from the HDF5 group.
Handles backward compatibility (missing optional datasets, bytes vs. string attributes).
Constructs and returns a validated schema object.
Wraps validation errors in HDF5ValidationError for consistent error handling.
The validate=False option (BUG-029) returns raw dictionaries instead of schema objects, bypassing __post_init__ validation for performance-critical paths.
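The read flow and the validate switch can be sketched without h5py by letting nested dictionaries play the role of open files. Everything here is a stand-in: `DictPool`, `QMapSketch`, `FacadeSketch`, the set of accepted units, and the local `HDF5ValidationError` class are hypothetical and only mirror the shape of the steps described above.

```python
from contextlib import contextmanager
from dataclasses import dataclass


class HDF5ValidationError(Exception):
    """Local stand-in for the facade's error type."""


@dataclass(frozen=True)
class QMapSketch:
    """Hypothetical two-field schema."""

    sqmap: tuple
    units: str

    def __post_init__(self):
        if self.units not in ("nm^-1", "A^-1"):  # assumed set of valid units
            raise ValueError(f"invalid units: {self.units!r}")


class DictPool:
    """Stand-in pool: nested dicts play the role of open HDF5 files."""

    def __init__(self, files):
        self._files = files

    @contextmanager
    def get_connection(self, file_path, mode):
        yield self._files[file_path]


class FacadeSketch:
    def __init__(self, pool, validate=True):
        self.pool = pool
        self.validate = validate

    def read_qmap(self, file_path, group="/xpcs/qmap"):
        try:
            # Open through the pool and read the raw datasets.
            with self.pool.get_connection(file_path, "r") as handle:
                node = handle[group]
                sqmap = tuple(node["sqmap"])
                # Backward compatibility: optional dataset, bytes vs. str.
                units = node.get("units", "nm^-1")
                if isinstance(units, bytes):
                    units = units.decode("utf-8")
            raw = {"sqmap": sqmap, "units": units}
            if not self.validate:
                return raw  # raw dict, __post_init__ never runs
            return QMapSketch(**raw)
        except (KeyError, ValueError) as exc:
            # One error type for callers, with the original error chained.
            raise HDF5ValidationError(f"{file_path}{group}: {exc}") from exc
```

With `validate=True` a caller gets either a schema object or an `HDF5ValidationError`; with `validate=False` the same call returns a plain dictionary and skips the checks, so invalid values pass through unnoticed.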
All read/write methods are decorated with @log_timing(threshold_ms=...) for automatic performance monitoring.
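The source shows only the decorator's name and its threshold_ms parameter. The sketch below is one plausible shape for such a decorator, assuming it logs a warning when a call exceeds the threshold; the log level, message format, and default threshold are assumptions.

```python
import functools
import logging
import time

logger = logging.getLogger(__name__)


def log_timing(threshold_ms=100.0):
    """Log a warning when the wrapped call takes at least threshold_ms."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Runs whether the call returned or raised.
                elapsed_ms = (time.perf_counter() - start) * 1000.0
                if elapsed_ms >= threshold_ms:
                    logger.warning("%s took %.1f ms", func.__qualname__, elapsed_ms)

        return wrapper

    return decorator
```

Measuring in a `finally` block means slow calls that end in an exception are logged too, which is often when timing information is most useful.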
Connection Pooling¶
The facade delegates connection management to the existing HDF5ConnectionPool from fileIO/hdf_reader.py. The pool:
Caches open file handles keyed by (file_path, mode).
Provides context manager access via pool.get_connection(path, mode).
Tracks cache hit statistics via pool.get_pool_stats().
Can be cleared via pool.clear_pool() for application shutdown.
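A pool with these four behaviours could look roughly like the sketch below. It is not the HDF5ConnectionPool implementation: the class name `PoolSketch`, the `opener` parameter (standing in for h5py.File, defaulting to the built-in `open`), the lock, and the keys of the statistics dictionary are all assumptions.

```python
import threading
from contextlib import contextmanager


class PoolSketch:
    """Hypothetical pool; `opener` stands in for h5py.File."""

    def __init__(self, opener=open):
        self._opener = opener
        self._handles = {}
        self._lock = threading.Lock()
        self._hits = 0
        self._misses = 0

    @contextmanager
    def get_connection(self, file_path, mode="r"):
        key = (file_path, mode)
        with self._lock:
            handle = self._handles.get(key)
            if handle is None:
                handle = self._opener(file_path, mode)
                self._handles[key] = handle
                self._misses += 1
            else:
                self._hits += 1
        # The handle stays open after the block exits so the next caller reuses it.
        yield handle

    def get_pool_stats(self):
        total = self._hits + self._misses
        return {
            "open_connections": len(self._handles),
            "cache_hits": self._hits,
            "cache_misses": self._misses,
            "hit_ratio": self._hits / total if total else 0.0,
        }

    def clear_pool(self):
        # Close every cached handle, e.g. at application shutdown.
        with self._lock:
            for handle in self._handles.values():
                handle.close()
            self._handles.clear()
```

Two consecutive `with pool.get_connection(path, "r")` blocks on the same path receive the same handle object; the second counts as a cache hit. Keying on the mode as well as the path keeps a read-only handle from being handed to a caller that asked for write access.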
Consequences¶
What became easier¶
Type-safe access: qmap.sqmap instead of qmap_dict["sqmap"] – IDE autocomplete, no KeyError risk.
Fail-fast validation: Shape mismatches, NaN values, and invalid units are caught at the I/O boundary, not deep in analysis code.
Consistent error handling: All HDF5 errors are wrapped in HDF5ValidationError, making error handling uniform across the codebase.
Backward compatibility: from_dict() and to_dict() methods allow gradual migration from legacy dict-passing patterns.
Monitoring: get_pool_stats() exposes cache hit ratios and connection counts for production diagnostics.
Versioning: MaskSchema.version and PartitionSchema.version fields enable future schema migration.
What became more difficult¶
Validation overhead: Schema validation adds ~1 ms per construction. For high-frequency reads in tight loops, validate=False is available.
Frozen dataclass limitations: In-place mutation of arrays is not possible. Operations that modify data must create new schema instances.
Migration effort: Existing code that passes raw dicts must be updated to use schemas. The from_dict()/to_dict() bridge eases this transition.
Unit consistency (BUG-028): The default unit in QMapSchema.from_dict() was changed from "A^-1" to "nm^-1" to match hdf5_facade.py. Legacy files with implicit units may need attention.
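The "create a new instance" requirement pairs naturally with `dataclasses.replace`, which builds a modified copy and re-runs `__post_init__` on it. The sketch below applies this to a unit conversion; the class `QMapUnitsSketch` and the helper `to_inverse_nm` are hypothetical and are not part of validators.py.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class QMapUnitsSketch:
    """Hypothetical two-field schema used only to show the update pattern."""

    q_values: tuple
    units: str = "nm^-1"

    def __post_init__(self):
        if self.units not in ("nm^-1", "A^-1"):
            raise ValueError(f"unsupported units: {self.units!r}")


def to_inverse_nm(schema):
    """Return a schema in nm^-1; 1 A^-1 equals 10 nm^-1."""
    if schema.units == "nm^-1":
        return schema
    # replace() constructs a new instance, so validation runs again.
    return replace(
        schema,
        q_values=tuple(q * 10.0 for q in schema.q_values),
        units="nm^-1",
    )


legacy = QMapUnitsSketch(q_values=(0.5, 1.5), units="A^-1")
converted = to_inverse_nm(legacy)

try:
    legacy.units = "nm^-1"  # direct assignment is blocked on frozen instances
except FrozenInstanceError:
    pass
```

The original `legacy` object is left untouched, so code that still holds a reference to it keeps seeing the values and units it was given.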