ADR-002: Migration from scipy.optimize to NLSQ 0.6.0¶
Status¶
Accepted
Context¶
The XPCS Viewer fitting module originally used scipy.optimize.curve_fit for nonlinear least squares (NLSQ) fitting of G2 correlation functions. This approach had several limitations:
No JAX integration:
scipy.optimize.curve_fitoperates on NumPy arrays only. When the backend is JAX, data must be copied to CPU, fit with scipy, then results copied back – defeating the purpose of GPU acceleration.No native diagnostics: scipy provides only
poptandpcov. Computing R-squared, AIC, BIC, prediction intervals, or model health diagnostics required manual implementation (hundreds of LOC of boilerplate).No automatic stability fixes: Users encountering ill-conditioned problems had to manually adjust bounds, scaling, or initial guesses.
No fallback strategies: When fitting failed, scipy raised an exception with no recovery path.
Duplicated statistics code: The codebase contained ~400 LOC of custom statistical computation that replicated what a proper fitting library should provide natively.
NLSQ 0.6.0 was selected because it provides:
Native
CurveFitResultwith built-in statistical properties (R-squared, AIC, BIC, RMSE, MAE, prediction intervals).Preset configurations (
fast,robust,global,streaming,large) for different use cases.auto_bounds=Truefor automatic bound inference from data.stability='auto'for automatic numerical fixes.fallback=Truefor automatic fallback strategies on failure.compute_diagnostics=TrueforModelHealthReportwith health scores.
Decision¶
We migrated the fitting module to use NLSQ 0.6.0’s native API with a delegation pattern for the NLSQResult class.
Architecture¶
xpcsviewer/fitting/
__init__.py # Public API: fit_single_exp, nlsq_fit, etc.
models.py # NumPyro probabilistic models + JIT-compiled curve functions
nlsq.py # nlsq_optimize(): NLSQ 0.6.0 curve fitting wrapper
sampler.py # NumPyro NUTS sampling with NLSQ warm-start
results.py # NLSQResult (delegation), FitResult (Bayesian), SamplerConfig
visualization.py # Plotting: posterior predictive, NLSQ fit, ArviZ diagnostics
legacy.py # scipy.optimize compatibility layer (deprecated)
Key Design Choices¶
Property delegation pattern:
NLSQResultstores the nativeCurveFitResultobject and delegates statistical property access to it. When the native result isNone(legacy code path or fitting failure), private fallback fields (_r_squared,_adj_r_squared, etc.) are used instead.@property def r_squared(self) -> float: if self.native_result is not None: return self.native_result.r_squared # Delegation return self._r_squared # Legacy fallback
Setter no-ops when delegation is active (BUG-039): Property setters on
NLSQResultlog a warning and do nothing whennative_resultis set. This prevents accidental overwriting of the authoritative native values.Sentinel fallback for failed fits (BUG-003): When NLSQ fitting fails,
nlsq_optimize()returns a sentinelNLSQResultwithis_fallback=True,converged=False, and all metrics set tofloat('nan'). Callers detect failure viamath.isnan(result.r_squared)rather than catching exceptions.NaN defaults for failed fits (BUG-023): All statistical metrics default to
float('nan')rather than0.0, making failed fits distinguishable from poor-but-converged fits.Covariance validation (FR-021): The
validate_pcov()function checks the covariance matrix for infinities, NaN values, and negative diagonal elements, settingpcov_validandpcov_messageaccordingly.Warm-start pipeline: The complete fitting workflow is:
nlsq_optimize()runs NLSQ 0.6.0 to get point estimates.Point estimates initialize NumPyro NUTS sampling as the warm-start.
run_single_exp_fit()insampler.pyorchestrates the full pipeline.BFMI (Bayesian Fraction of Missing Information) is computed for convergence diagnostics per the Technical Guidelines.
JIT-compiled model functions: Curve functions (
single_exp_func,double_exp_func, etc.) inmodels.pyare decorated with@_maybe_jit, which appliesjax.jitwhen JAX is available and is a no-op otherwise.
Statistical Properties (Delegated from CurveFitResult)¶
Property |
Description |
Source |
|---|---|---|
|
Coefficient of determination |
|
|
Adjusted R-squared |
|
|
Root mean squared error |
|
|
Mean absolute error |
|
|
Akaike Information Criterion |
|
|
Bayesian Information Criterion |
|
|
Fit residuals |
|
|
Model predictions |
|
|
Parameter covariance matrix |
|
|
95% CI per parameter |
|
|
ModelHealthReport |
|
|
0-100 health score |
|
|
Identifiability metric |
|
Consequences¶
What became easier¶
~400 LOC reduction: Eliminated custom statistical computation code. All metrics come from the native
CurveFitResult.Robust fitting:
stability='auto'andfallback=Truehandle ill-conditioned problems automatically.Model selection: AIC and BIC are available natively, enabling principled model comparison (single vs. double vs. stretched exponential).
Health diagnostics:
compute_diagnostics=Trueprovides aModelHealthReportwith health scores and identifiability analysis, surfaced directly throughNLSQResult.health_scoreandNLSQResult.condition_number.Prediction intervals:
get_prediction_interval(x_new)delegates to the native result, providing proper prediction intervals rather than the rough+/- 2*RMSEapproximation.
What became more difficult¶
Additional dependency: NLSQ 0.6.0 is now a required dependency for fitting. If NLSQ is not installed, the fitting module raises an
ImportError.Delegation complexity: The property delegation pattern in
NLSQResultrequires careful handling of the dual path (native vs. legacy). New properties must implement both paths.Version coupling: The codebase depends on NLSQ 0.6.0’s specific API surface (
CurveFitResultproperties,ModelHealthReportstructure). Major NLSQ API changes would require updates to the delegation layer.Health check fragility (BUG-040): Determining
is_healthyrequires checking multiple possible API patterns (is_healthyattribute,health_score >= 70, enum value comparison) to handle different NLSQ versions gracefully.