# XPCS Viewer Dependency Analysis and Integration Catalog **Analysis Date:** 2026-01-06 (updated 2026-02-18) **Codebase Version:** ``master`` branch **Analyzed Components:** Core modules, backends, fitting, simplemask, file I/O .. note:: As of February 2026, JAX, NumPyro, NLSQ, and ArviZ are **core dependencies** (listed in ``pyproject.toml`` under ``dependencies``). References to "optional" JAX installation in this document are historical. --- ## Executive Summary ### Dependency Health Overview | Metric | Status | Notes | |--------|--------|-------| | Circular Dependencies | ✅ None detected | Clean dependency tree in key modules | | High Fan-In Modules | ⚠️ 3 modules | Logging, backends, conversions | | Tight Coupling | ⚠️ Moderate | XpcsFile tightly coupled to analysis modules | | Backend Abstraction | ✅ Well-isolated | JAX/NumPy switching via clean interface | | Integration Points | ⚠️ 12 identified | HDF5 I/O boundaries need standardization | ### Key Findings 1. **No circular dependencies** in core modules - clean separation of concerns 2. **Backend abstraction layer is well-designed** - minimal coupling to JAX/NumPy 3. **High coupling at I/O boundaries** - HDF5 read/write scattered across modules 4. **SimpleMask integration is loosely coupled** - uses signals for data export 5. **Shared data schemas lack formal definition** - implicit contracts, no validation --- ## 1. Internal Module Dependencies ### 1.1 Dependency Graph ``` xpcsviewer/ ├── backends/ [FAN-IN: 8, FAN-OUT: 0] │ ├── _base.py - Protocol definition │ ├── _jax_backend.py - JAX implementation │ ├── _numpy_backend.py - NumPy fallback │ ├── _conversions.py [FAN-IN: 9] - I/O boundary conversion │ └── _device.py - GPU device management │ ├── fitting/ [FAN-IN: 0, FAN-OUT: 2] │ ├── models.py - NumPyro model definitions │ ├── sampler.py - NUTS sampler orchestration │ ├── nlsq.py - JAX-accelerated NLSQ warm-start │ ├── results.py - FitResult, NLSQResult classes │ └── legacy.py - scipy.optimize compatibility layer │ ├── xpcs_file.py [FAN-IN: 2, FAN-OUT: 3] │ └── xpcs_file/ │ ├── cache.py - LRU data cache │ ├── memory.py - Memory monitoring │ └── fitting.py - Legacy fitting functions │ ├── viewer_kernel.py [FAN-IN: 1, FAN-OUT: 0] │ └── Coordinates module loading and analysis orchestration │ ├── module/ [Analysis modules] │ ├── saxs1d.py [FAN-OUT: 2] - 1D SAXS plotting │ ├── saxs2d.py [FAN-OUT: 2] - 2D SAXS plotting │ ├── g2mod.py [FAN-OUT: 2] - G2 correlation analysis │ ├── twotime.py [FAN-OUT: 4] - Two-time correlation │ ├── tauq.py [FAN-OUT: 3] - Tau-Q analysis │ └── intt.py - Intensity-time analysis │ ├── simplemask/ [Self-contained subsystem] │ ├── simplemask_window.py [FAN-OUT: 3] - QMainWindow + signals │ ├── simplemask_kernel.py [FAN-IN: 2, FAN-OUT: 4] │ ├── area_mask.py - Mask assembly with undo/redo │ ├── qmap.py [FAN-IN: 2] - Q-map computation │ ├── utils.py [FAN-IN: 2] - Partition generation │ └── pyqtgraph_mod.py [FAN-IN: 3] - Custom ROI classes │ ├── fileIO/ │ ├── hdf_reader.py [FAN-IN: 2] - HDF5 connection pooling │ ├── qmap_utils.py - Q-map HDF5 I/O │ └── aps_8idi.py - Beamline-specific key mapping │ └── utils/ ├── logging_config.py [FAN-IN: 24] ⚠️ High coupling └── validation.py - Input validation utilities ``` ### 1.2 High Fan-In Modules (Integration Hotspots) | Module | Dependents | Risk | Recommendation | |--------|------------|------|----------------| | `utils.logging_config` | 24 | Low | Stable utility, acceptable coupling | | `backends._conversions` | 9 | Medium | Critical for I/O boundaries, monitor closely | | `backends` | 8 | Medium | Core abstraction, changes cascade widely | | `constants` | 6 | Low | Configuration values, stable | | `simplemask.pyqtgraph_mod` | 3 | Low | UI components, isolated | **Analysis:** `backends._conversions` is the critical integration point between JAX and I/O systems. Any changes to `ensure_numpy()` affect 9 modules. ### 1.3 High Fan-Out Modules (Brittle Dependencies) | Module | Dependencies | Risk | Recommendation | |--------|--------------|------|----------------| | `module.twotime` | 4 | High | Refactor to reduce backend coupling | | `simplemask.simplemask_kernel` | 4 | Medium | Well-structured, acceptable | | `xpcs_file` | 3 | Medium | Consider breaking into smaller services | **Analysis:** `module.twotime` depends on backends, conversions, scipy replacements, and logging. Consider facade pattern to reduce direct dependencies. --- ## 2. External Service Integrations ### 2.1 HDF5 File I/O #### Integration Points (12 identified) | Module | Operation | Schema | Boundary Type | |--------|-----------|--------|---------------| | `xpcs_file.py` | Read/Write | XPCS data files | **Primary I/O** | | `fileIO/hdf_reader.py` | Read with pooling | Generic HDF5 | **Connection management** | | `fileIO/qmap_utils.py` | Read | `/xpcs/qmap` | **Q-map loader** | | `simplemask/simplemask_kernel.py` | Write | Mask HDF5 | **Export boundary** | | `simplemask/area_mask.py` | Read/Write | Mask HDF5 | **Persistence** | | `module/twotime_utils.py` | Read | Two-time data | **Analysis cache** | | `utils/lazy_loader.py` | Read | Deferred loading | **Memory optimization** | | `utils/validation.py` | Read | Schema validation | **Input validation** | #### HDF5 Schema Contracts (Implicit) **XPCS Data File Schema:** ```text /xpcs/ ├── qmap/ │ ├── mask # int32, shape=(H, W) │ ├── sqmap # float64, shape=(H, W) - Static Q │ ├── dqmap # float64, shape=(H, W) - Dynamic Q │ └── partition_map # int32, shape=(H, W) - Q-bin indices ├── g2/ # G2 correlation results ├── saxs1d/ # 1D SAXS profiles └── metadata/ # Geometry, detector config ``` **SimpleMask Partition Schema:** ```python { "version": "1.0.0", "mask": np.ndarray, # int32, shape=(H, W), values 0/1 "partition_map": np.ndarray, # int32, shape=(H, W), bin indices "num_pts": int, # Number of Q-bins "val_list": list[float], # Q-bin center values "num_list": list[int], # Pixels per bin "metadata": dict # Geometry parameters } ``` **Signal Export Schema (SimpleMask → XPCS Viewer):** ```python # Signal: mask_exported(np.ndarray) mask: np.ndarray[int32] # Shape=(H, W), values 0/1 # Signal: qmap_exported(dict) { "partition_map": np.ndarray[int32], # Shape=(H, W) "num_pts": int, "val_list": list[float], "num_list": list[int] } ``` #### ⚠️ Integration Risks 1. **No schema validation** - HDF5 files loaded without runtime checks 2. **Inconsistent error handling** - Some modules raise, others return None 3. **Mixed use of h5py.File contexts** - Connection pooling vs direct access 4. **No versioning** - Schema changes could break backward compatibility #### 🔧 Recommended Facade Pattern ```python # New: xpcsviewer/io/hdf5_facade.py class HDF5Facade: """Unified HDF5 I/O with schema validation.""" def read_qmap(self, file_path: str) -> QMapSchema: """Read Q-map with schema validation.""" pass def write_mask(self, file_path: str, mask: MaskSchema) -> None: """Write mask with versioning.""" pass def read_xpcs_data(self, file_path: str) -> XPCSSchema: """Read XPCS data with connection pooling.""" pass ``` --- ## 3. Shared Data Schemas ### 3.1 Core Data Structures #### Q-Map Dictionary (Cross-Module Contract) ```python # Produced by: simplemask.qmap.compute_qmap() # Consumed by: xpcs_file.py, viewer_kernel.py, module/saxs*.py QMapDict = { "sqmap": np.ndarray, # float64, shape=(H, W) - Static Q "dqmap": np.ndarray, # float64, shape=(H, W) - Dynamic Q "phis": np.ndarray, # float64, shape=(H, W) - Azimuthal angle "sqmap_unit": str, # "nm^-1" or "A^-1" "dqmap_unit": str, "phis_unit": str, # "rad" or "deg" } ``` **Dependency Graph:** ``` simplemask.qmap.compute_qmap() ↓ xpcs_file.get_cropped_qmap() → viewer_kernel.plot_qmap() ↓ ↓ module.saxs1d.plot() module.saxs2d.plot() ``` **⚠️ Coupling Issue:** No formal validation - consumers assume specific keys exist. #### Geometry Metadata (Cross-Module Contract) ```python # Produced by: fileIO/hdf_reader.py (from HDF5 /xpcs/metadata) # Consumed by: simplemask.qmap, xpcs_file.py GeometryMetadata = { "bcx": float, # Beam center X (column, pixels) "bcy": float, # Beam center Y (row, pixels) "det_dist": float, # Detector distance (mm) "lambda_": float, # Wavelength (Å) "pix_dim": float, # Pixel size (mm) "shape": tuple[int, int] # (height, width) } ``` **Dependency Graph:** ``` fileIO.hdf_reader → xpcs_file.metadata ↓ simplemask.qmap.compute_qmap() ↓ viewer_kernel.plot_qmap() ``` **⚠️ Coupling Issue:** Metadata passed as plain dict without type checking. #### G2 Data Structure ```python # Produced by: xpcs_file.get_g2_data() # Consumed by: module.g2mod, viewer_kernel.plot_g2() G2Data = { "g2": np.ndarray, # shape=(n_q, n_delay) "g2_err": np.ndarray, # shape=(n_q, n_delay) "delay_times": np.ndarray, # shape=(n_delay,) "q_values": list[float], # Length n_q } ``` #### Backend Array Protocol ```python # Produced by: backends.get_backend() # Consumed by: All analysis modules using JAX/NumPy BackendArray = Union[np.ndarray, jax.Array] # I/O Boundary Conversion: ensure_numpy(array: BackendArray) -> np.ndarray ensure_backend_array(array: np.ndarray) -> BackendArray ``` **Critical I/O Boundaries:** 1. **PyQtGraph plotting:** Always requires NumPy (use `ensure_numpy()`) 2. **HDF5 writing:** Always requires NumPy (use `ensure_numpy()`) 3. **Matplotlib:** Always requires NumPy (use `ensure_numpy()`) 4. **User input:** Convert to backend array for computation --- ## 4. Cross-Module Data Flows ### 4.1 Primary Data Flow: XPCS Analysis Pipeline ``` [HDF5 File] ↓ (fileIO.hdf_reader) [XpcsFile] ← Memory cache, fitting cache ↓ (viewer_kernel coordinates) [Analysis Modules] ├→ module.saxs1d → [PyQtGraph plot] ├→ module.saxs2d → [PyQtGraph plot] ├→ module.g2mod → [fitting.fit_single_exp] → [FitResult] └→ module.twotime → [backends._conversions] → [Plot] ↓ [Export to HDF5 / GUI signals] ``` **Backend Conversion Points:** - **Entry:** `xpcs_file.py` reads NumPy from HDF5 - **Computation:** Analysis modules use `backends.get_backend()` for array ops - **Exit:** `ensure_numpy()` before PyQtGraph/Matplotlib/HDF5 ### 4.2 SimpleMask Data Flow ``` [User Detector Image] ↓ [SimpleMaskWindow] ↓ [SimpleMaskKernel] ├→ [MaskAssemble] - Mask editing with undo/redo ├→ [qmap.compute_qmap()] - JAX/NumPy backend └→ [utils.generate_partition()] - JAX JIT-compiled ↓ [Export Signals] ├→ mask_exported(np.ndarray) → [XPCS Viewer applies mask] └→ qmap_exported(dict) → [XPCS Viewer updates partition] ``` **Backend Conversion Points:** - **Entry:** SimpleMaskWindow receives NumPy from user - **Computation:** Q-map uses `backends.get_backend()` with JIT cache - **Exit:** `ensure_numpy()` before emitting signals ### 4.3 Fitting Data Flow ``` [module.g2mod requests fit] ↓ [fitting.fit_single_exp] ├→ [nlsq.nlsq_optimize] - JAX gradient descent warm-start │ └→ backends.value_and_grad() ↓ └→ [sampler.run_single_exp_fit] - NumPyro NUTS └→ backends (JAX required) ↓ [FitResult] - ArviZ-compatible posterior samples ↓ [visualization.plot_posterior_predictive] └→ ensure_numpy() before Matplotlib ``` **Backend Requirements:** - **NLSQ:** Works with NumPy or JAX backend - **NumPyro:** **Requires JAX backend** (raises RuntimeError otherwise) - **Visualization:** Always converts to NumPy for Matplotlib --- ## 5. Integration Points Requiring Attention ### 5.1 Critical Facades Needed #### 1. HDF5 I/O Facade (Priority: HIGH) **Problem:** 12 modules directly use `h5py.File`, inconsistent error handling. **Proposed Solution:** ```python # xpcsviewer/io/hdf5_facade.py class HDF5Facade: def __init__(self, connection_pool: HDF5ConnectionPool): self._pool = connection_pool def read_qmap(self, file_path: str) -> QMapDict: """Read Q-map with schema validation and connection pooling.""" with self._pool.get_connection(file_path) as f: return self._validate_qmap_schema(f['/xpcs/qmap']) def write_partition(self, file_path: str, partition: PartitionDict) -> None: """Write partition with versioning and compression.""" pass ``` **Affected Modules:** - `xpcs_file.py` - Migrate to facade for qmap access - `simplemask/simplemask_kernel.py` - Use facade for mask I/O - `fileIO/qmap_utils.py` - Replace with facade - `module/twotime_utils.py` - Use facade for cache reads **Migration Strategy:** 1. Create facade in parallel (no breaking changes) 2. Add deprecation warnings to direct `h5py.File` usage 3. Migrate module by module with tests 4. Remove direct access after 2 releases #### 2. Backend Array Adapter (Priority: MEDIUM) **Problem:** I/O boundary conversions scattered across 9 modules. **Proposed Solution:** ```python # xpcsviewer/backends/io_adapter.py class BackendIOAdapter: """Automatic backend conversion at I/O boundaries.""" @staticmethod def for_pyqtgraph(array: BackendArray) -> np.ndarray: """Convert backend array for PyQtGraph plotting.""" return ensure_numpy(array) @staticmethod def for_hdf5(array: BackendArray) -> np.ndarray: """Convert backend array for HDF5 writing.""" return ensure_numpy(array) @staticmethod def from_user_input(array: np.ndarray) -> BackendArray: """Convert user input to active backend.""" return ensure_backend_array(array) ``` **Affected Modules:** - `module/saxs1d.py` - PyQtGraph boundary - `module/saxs2d.py` - PyQtGraph boundary - `module/twotime.py` - PyQtGraph + HDF5 boundaries - `simplemask/qmap.py` - Signal export boundary - `simplemask/area_mask.py` - HDF5 boundary #### 3. Data Schema Validators (Priority: HIGH) **Problem:** No runtime validation of shared data structures. **Proposed Solution:** ```python # xpcsviewer/schemas/validators.py from dataclasses import dataclass import numpy as np @dataclass class QMapSchema: sqmap: np.ndarray dqmap: np.ndarray phis: np.ndarray sqmap_unit: str dqmap_unit: str phis_unit: str def __post_init__(self): assert self.sqmap.shape == self.dqmap.shape == self.phis.shape assert self.sqmap_unit in ["nm^-1", "A^-1"] # ... more validation def validate_qmap(data: dict) -> QMapSchema: """Validate Q-map dictionary schema.""" return QMapSchema(**data) ``` **Affected Modules:** - `simplemask.qmap` - Producer validation - `xpcs_file.py` - Consumer validation - `viewer_kernel.py` - Consumer validation ### 5.2 Circular Dependency Risks **Current Status: ✅ No circular dependencies detected** **Potential Future Risks:** 1. **fitting ← → backends:** - `fitting` depends on `backends` for JAX access - Risk: If `backends` adds fitting utilities, creates cycle - **Mitigation:** Keep backends purely computational (no domain logic) 2. **xpcs_file ← → module/*:** - `xpcs_file` loads data, modules analyze - Risk: Modules might cache results in XpcsFile - **Mitigation:** Use separate cache service (already done via `xpcs_file.cache`) 3. **simplemask ← → viewer_kernel:** - Currently decoupled via signals - Risk: Direct method calls for convenience - **Mitigation:** Enforce signal-only communication in code reviews ### 5.3 Tight Coupling Analysis #### High Coupling: XpcsFile ↔ Analysis Modules **Evidence:** - `xpcs_file.py` has 39 public methods, many module-specific - `fit_g2()`, `fit_tauq()`, `get_g2_data()` - tightly coupled to analysis **Recommendation:** ```python # Current (tight coupling): xf = XpcsFile(path) xf.fit_g2(q_idx=0, model='single_exp') # XpcsFile knows about fitting # Proposed (loose coupling via services): xf = XpcsFile(path) g2_data = xf.get_g2_data(q_idx=0) result = fitting.fit_single_exp(g2_data['delay_times'], g2_data['g2']) ``` **Migration Strategy:** 1. Create service layer: `services/g2_analysis.py`, `services/saxs_analysis.py` 2. Move analysis logic from `xpcs_file.py` to services 3. Keep convenience methods in `xpcs_file.py` as thin wrappers (deprecated) 4. Fully migrate after 3 releases --- ## 6. Recommended Architecture Patterns ### 6.1 Adapter Pattern for Backend I/O ```python # xpcsviewer/backends/io_adapter.py class PyQtGraphAdapter: """Adapter for PyQtGraph I/O boundary.""" def __init__(self, backend): self._backend = backend def to_pyqtgraph(self, array): """Convert backend array to PyQtGraph-compatible format.""" return ensure_numpy(array) def from_pyqtgraph(self, array): """Convert PyQtGraph array to backend format.""" return self._backend.array(array) class HDF5Adapter: """Adapter for HDF5 I/O boundary.""" def to_hdf5(self, array): return ensure_numpy(array) def from_hdf5(self, array): backend = get_backend() return backend.array(array) ``` **Usage in module/saxs1d.py:** ```python from xpcsviewer.backends.io_adapter import PyQtGraphAdapter adapter = PyQtGraphAdapter(backend) plot_item.setData(adapter.to_pyqtgraph(x), adapter.to_pyqtgraph(y)) ``` ### 6.2 Repository Pattern for HDF5 Access ```python # xpcsviewer/repositories/xpcs_repository.py class XPCSRepository: """Repository for XPCS data access.""" def __init__(self, facade: HDF5Facade): self._facade = facade def get_qmap(self, file_path: str) -> QMapSchema: return self._facade.read_qmap(file_path) def get_g2_data(self, file_path: str, q_idx: int) -> G2Data: return self._facade.read_g2(file_path, q_idx) def save_fit_result(self, file_path: str, result: FitResult) -> None: self._facade.write_fit_result(file_path, result) ``` **Benefits:** - Single source of truth for data access patterns - Easy to add caching, validation, logging - Testable without actual HDF5 files (mock the facade) ### 6.3 Event-Driven Integration (Already Implemented ✅) **SimpleMask Integration (Good Example):** ```python # simplemask/simplemask_window.py class SimpleMaskWindow(QMainWindow): mask_exported = Signal(np.ndarray) qmap_exported = Signal(dict) def export_to_viewer(self): self.mask_exported.emit(self.kernel.get_mask()) # xpcs_viewer.py def connect_simplemask_signals(self): self.simplemask_window.mask_exported.connect(self.apply_mask) self.simplemask_window.qmap_exported.connect(self.apply_qmap_result) ``` **Why This Works:** - **Loose coupling:** SimpleMask doesn't know about XPCS Viewer internals - **Testable:** Can emit signals in tests without GUI - **Extensible:** Multiple listeners can subscribe to same signal **Recommendation:** Apply this pattern to other module integrations. --- ## 7. Migration Roadmap ### Phase 1: Non-Breaking Additions (Weeks 1-4) **Goal:** Add new patterns in parallel without breaking existing code. 1. **Week 1-2:** - Create `io/hdf5_facade.py` with basic read/write methods - Create `backends/io_adapter.py` with PyQtGraph/HDF5 adapters - Add unit tests for new components 2. **Week 3-4:** - Create schema validators in `schemas/validators.py` - Add `repositories/xpcs_repository.py` - Integration tests for facade + repository **Deliverables:** - ✅ New modules coexist with old code - ✅ 100% test coverage for new code - ✅ Documentation with migration examples ### Phase 2: Gradual Migration (Weeks 5-12) **Goal:** Migrate modules one at a time with feature parity. **Priority Order:** 1. `simplemask/simplemask_kernel.py` - Self-contained, low risk 2. `module/saxs1d.py` - Low complexity, good reference 3. `module/saxs2d.py` - Similar to saxs1d 4. `module/twotime.py` - Higher complexity, more I/O points 5. `xpcs_file.py` - Core module, migrate last **Per-Module Process:** 1. Add facade/adapter usage alongside existing code 2. Run full test suite + integration tests 3. Add deprecation warnings to old code paths 4. Monitor for 2 weeks in production 5. Remove old code if no issues **Deliverables:** - ✅ All analysis modules use facade pattern - ✅ Backward compatibility maintained - ✅ Deprecation warnings in logs ### Phase 3: Cleanup and Optimization (Weeks 13-16) **Goal:** Remove deprecated code, optimize patterns. 1. **Week 13-14:** - Remove deprecated direct `h5py.File` usage - Remove deprecated `ensure_numpy()` calls outside adapters - Update documentation 2. **Week 15-16:** - Performance benchmarks for facade vs. direct access - Optimize connection pooling based on metrics - Final integration tests **Deliverables:** - ✅ Clean dependency graph with facades - ✅ Performance parity or improvement - ✅ Updated architecture documentation --- ## 8. Risk Assessment ### High-Risk Integration Points | Integration Point | Risk Level | Impact | Mitigation | |-------------------|------------|--------|------------| | `backends._conversions` | HIGH | 9 modules depend on it | Extensive tests, version lock | | HDF5 schema changes | HIGH | Breaking changes to file format | Schema versioning, migration tools | | `xpcs_file.py` refactor | MEDIUM | Core module, many dependents | Gradual migration, parallel paths | | NumPyro JAX requirement | MEDIUM | Fitting requires JAX | Clear error messages, fallback to NLSQ | | PyQtGraph array format | LOW | Well-tested, stable API | Use adapters consistently | ### Mitigation Strategies 1. **Schema Versioning:** ```python # Add to all HDF5 writes f.attrs['schema_version'] = '2.0.0' # Migration utilities def migrate_v1_to_v2(file_path): # Upgrade old files automatically pass ``` 2. **Feature Flags:** ```python # Enable new facade gradually USE_HDF5_FACADE = os.environ.get('XPCS_USE_FACADE', 'false').lower() == 'true' if USE_HDF5_FACADE: qmap = facade.read_qmap(path) else: qmap = legacy_qmap_loader(path) ``` 3. **Extensive Integration Tests:** ```python # tests/integration/test_hdf5_facade.py def test_facade_backward_compatibility(): """Ensure facade reads old HDF5 files correctly.""" old_file = "tests/data/legacy_v1.hdf5" result = facade.read_qmap(old_file) assert result.schema_version == "1.0.0" # ... validation ``` --- ## 9. Performance Implications ### Connection Pooling Impact **Current:** `fileIO/hdf_reader.py` has connection pooling (good ✅) **With Facade:** - All HDF5 access goes through single pool - Easier to monitor/tune pool size - **Estimated overhead:** < 1% (facade layer is thin) ### Backend Conversion Overhead **Measured in JAX migration tests:** - `ensure_numpy()` on GPU array: ~0.5ms for 1024x1024 array - `ensure_backend_array()`: ~0.1ms (view creation) **With Adapter Pattern:** - Same conversions, but centralized - Easier to optimize (e.g., caching small arrays) - **Estimated overhead:** 0% (same underlying operations) ### JIT Compilation Benefits **SimpleMask Q-map with JAX:** - First call: ~200ms (compilation) - Subsequent calls: ~5ms (10x faster than NumPy) - Cache hit rate: ~95% in typical usage **Recommendation:** Keep JIT caching in facade layer for best performance. --- ## 10. Conclusion ### Strengths 1. ✅ **No circular dependencies** - Clean modular architecture 2. ✅ **Backend abstraction is well-designed** - JAX/NumPy switching works seamlessly 3. ✅ **SimpleMask integration uses loose coupling** - Signal-based, testable 4. ✅ **Connection pooling exists** - HDF5 access is already optimized ### Weaknesses 1. ⚠️ **No schema validation** - Runtime errors possible from malformed HDF5 2. ⚠️ **High coupling at I/O boundaries** - 9 modules directly use `ensure_numpy()` 3. ⚠️ **Implicit data contracts** - Shared dicts without type checking 4. ⚠️ **XpcsFile is a "god object"** - 39 public methods, knows too much ### Priority Actions **Immediate (Next Sprint):** 1. Create `io/hdf5_facade.py` with schema validation 2. Add `schemas/validators.py` for QMapSchema, GeometryMetadata 3. Document all shared data structures in this file **Short-Term (Next Quarter):** 1. Migrate SimpleMask to use HDF5 facade 2. Create backend I/O adapters 3. Add integration tests for facades **Long-Term (Next Release):** 1. Refactor `xpcs_file.py` into service layer 2. Remove deprecated direct HDF5 access 3. Enforce facade pattern via linting rules --- ## Appendix A: Data Structure Reference ### A.1 Complete QMapDict Schema ```python QMapDict = TypedDict('QMapDict', { # Required fields 'sqmap': np.ndarray, # float64, shape=(H, W), Static Q magnitude 'dqmap': np.ndarray, # float64, shape=(H, W), Dynamic Q magnitude 'phis': np.ndarray, # float64, shape=(H, W), Azimuthal angle # Units (required) 'sqmap_unit': Literal["nm^-1", "A^-1"], 'dqmap_unit': Literal["nm^-1", "A^-1"], 'phis_unit': Literal["rad", "deg"], # Optional fields 'mask': Optional[np.ndarray], # int32, shape=(H, W), 0=masked, 1=valid 'partition_map': Optional[np.ndarray], # int32, Q-bin indices }) ``` ### A.2 Complete GeometryMetadata Schema ```python GeometryMetadata = TypedDict('GeometryMetadata', { # Beam center (pixels, 0-indexed) 'bcx': float, # Column (X) 'bcy': float, # Row (Y) # Detector configuration 'det_dist': float, # mm, detector-to-sample distance 'pix_dim': float, # mm, pixel size 'shape': Tuple[int, int], # (height, width) in pixels # X-ray properties 'lambda_': float, # Angstroms, wavelength # Optional 'det_rotation': Optional[float], # degrees, detector rotation 'incident_angle': Optional[float], # degrees, grazing incidence }) ``` ### A.3 Partition Dictionary Schema ```python PartitionDict = TypedDict('PartitionDict', { 'version': str, # Schema version, e.g., "1.0.0" 'partition_map': np.ndarray, # int32, shape=(H, W), bin indices 'num_pts': int, # Number of Q-bins 'val_list': List[float], # Q-bin center values (length=num_pts) 'num_list': List[int], # Pixels per bin (length=num_pts) 'metadata': GeometryMetadata, # Geometry used for computation # Optional 'mask': Optional[np.ndarray], # Mask used during partitioning 'method': Optional[Literal["linear", "log"]], # Binning method }) ``` --- **Document Version:** 1.0 **Author:** Architecture Review **Next Review Date:** 2026-02-06 (1 month)