Fitting G2 Correlation Functions

This tutorial covers the full fitting pipeline for XPCS G2 correlation functions: from quick NLSQ point estimates to full Bayesian inference with NumPyro NUTS.

Overview of the Fitting Pipeline

The xpcsviewer fitting module provides two complementary approaches:

  1. NLSQ (Non-Linear Least Squares) – Fast point estimates using nlsq 0.6.0. Returns NLSQResult with R², confidence intervals, and model diagnostics.

  2. Bayesian NUTS (No-U-Turn Sampler) – Full posterior via NumPyro. Uses NLSQ warm-start for initialization. Returns FitResult with posterior samples, HDI intervals, and ArviZ diagnostics.

Physical Models

The G2 autocorrelation for a single relaxation process is:

\[g_2(\tau) = \text{baseline} + \text{contrast} \cdot \exp(-2\tau / \tau_c)\]

where \(\tau_c\) is the characteristic relaxation time, baseline is typically near 1.0, and contrast is the Siegert factor.

For systems with two relaxation modes:

\[g_2(\tau) = \text{baseline} + c_1 \exp(-2\tau / \tau_1) + c_2 \exp(-2\tau / \tau_2)\]

Available model functions in xpcsviewer.fitting.models:

  • single_exp_func()baseline + contrast * exp(-2x/tau)

  • double_exp_func() – Two-component decay

  • stretched_exp_func() – Stretched exponential

  • power_law_func() – Power-law decay

Part 1: NLSQ Fitting

Quick Point Estimates

The nlsq_fit() function provides NLSQ fitting with the native nlsq 0.6.0 API:

import numpy as np
from xpcsviewer.fitting import nlsq_fit
from xpcsviewer.fitting.models import single_exp_func

# Synthetic G2 data
np.random.seed(42)
tau_true = 5.0
baseline_true = 1.0
contrast_true = 0.3
t = np.logspace(-2, 3, 100)
g2_clean = baseline_true + contrast_true * np.exp(-2 * t / tau_true)
g2_noise = g2_clean + np.random.normal(0, 0.005, size=t.shape)
g2_err = np.full_like(t, 0.005)

result = nlsq_fit(
    model_fn=single_exp_func,
    x=t,
    y=g2_noise,
    yerr=g2_err,
    p0={"tau": 1.0, "baseline": 1.0, "contrast": 0.3},
    bounds={
        "tau": (1e-6, 1e6),
        "baseline": (0.5, 2.0),
        "contrast": (0.0, 1.0),
    },
)

print(f"Converged: {result.converged}")
print(f"tau = {result.params['tau']:.3f}")
print(f"baseline = {result.params['baseline']:.4f}")
print(f"contrast = {result.params['contrast']:.4f}")

Examining Fit Quality

NLSQResult provides comprehensive fit quality metrics:

# Statistical metrics (delegated to native nlsq 0.6.0 CurveFitResult)
print(f"R-squared:      {result.r_squared:.6f}")
print(f"Adjusted R-sq:  {result.adj_r_squared:.6f}")
print(f"RMSE:           {result.rmse:.6e}")
print(f"MAE:            {result.mae:.6e}")
print(f"Chi-squared:    {result.chi_squared:.4f}")

# Model selection criteria
print(f"AIC:            {result.aic:.2f}")
print(f"BIC:            {result.bic:.2f}")

# Parameter uncertainties from covariance
for name in result.params:
    std_err = result.get_param_uncertainty(name)
    ci_lo, ci_hi = result.get_confidence_interval(name)
    print(f"  {name}: {result.params[name]:.4f} +/- {std_err:.4f}"
          f"  95% CI: [{ci_lo:.4f}, {ci_hi:.4f}]")

# Full formatted summary
print(result.summary())

Robust Fitting Options

Use presets, stability checks, and fallback strategies for difficult data:

result = nlsq_fit(
    model_fn=single_exp_func,
    x=t, y=g2_noise, yerr=g2_err,
    p0={"tau": 1.0, "baseline": 1.0, "contrast": 0.3},
    bounds={"tau": (1e-6, 1e6), "baseline": (0.5, 2.0), "contrast": (0, 1)},
    workflow='auto_global',    # Multi-start global optimization
    stability="auto",         # Automatic numerical fixes
    fallback=True,            # Fallback strategies for hard problems
    compute_diagnostics=True, # Enable model health report
)

# Check model health
if result.diagnostics is not None:
    print(f"Health score: {result.health_score}/100")
    print(f"Is healthy: {result.is_healthy}")
    print(f"Condition number: {result.condition_number:.2f}")

Prediction Intervals

Generate prediction intervals at new x values:

x_fine = np.logspace(-2, 3, 500)
pi_lower, pi_upper = result.get_prediction_interval(x_fine)

import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(8, 5))
ax.errorbar(t, g2_noise, yerr=g2_err, fmt="o", ms=3, label="Data")
ax.fill_between(x_fine, pi_lower, pi_upper, alpha=0.3, label="95% PI")
ax.set_xscale("log")
ax.set_xlabel("Delay time (s)")
ax.set_ylabel(r"$g_2(\tau)$")
ax.legend()
plt.tight_layout()
plt.show()

Part 2: Bayesian Fitting

The fit_single_exp() function runs the full Bayesian pipeline: NLSQ warm-start followed by NumPyro NUTS sampling.

Running the Sampler

from xpcsviewer.fitting import fit_single_exp

# Requires NumPyro and JAX
result = fit_single_exp(
    x=t,
    y=g2_noise,
    yerr=g2_err,
    # SamplerConfig options
    num_warmup=500,
    num_samples=1000,
    num_chains=4,
    target_accept_prob=0.8,
    random_seed=42,
)

# Posterior summaries
print(f"tau = {result.get_mean('tau'):.3f} +/- {result.get_std('tau'):.3f}")
print(f"baseline = {result.get_mean('baseline'):.4f} +/- {result.get_std('baseline'):.4f}")
print(f"contrast = {result.get_mean('contrast'):.4f} +/- {result.get_std('contrast'):.4f}")

The result includes the NLSQ warm-start values used for initialization:

print(f"NLSQ warm-start: {result.nlsq_init}")

Convergence Diagnostics

Always check diagnostics before trusting posterior estimates:

diag = result.diagnostics

print(f"Converged: {diag.converged}")
print(f"R-hat: {diag.r_hat}")
print(f"ESS (bulk): {diag.ess_bulk}")
print(f"ESS (tail): {diag.ess_tail}")
print(f"Divergences: {diag.divergences}")
print(f"BFMI: {diag.bfmi}")

# Rule of thumb:
# - R-hat < 1.01 for all parameters
# - ESS > 400 for reliable estimates
# - Divergences = 0
# - BFMI > 0.3

Highest Density Intervals

# 94% HDI (default)
hdi_lo, hdi_hi = result.get_hdi("tau")
print(f"tau 94% HDI: [{hdi_lo:.3f}, {hdi_hi:.3f}]")

# Custom probability
hdi_lo, hdi_hi = result.get_hdi("tau", prob=0.89)
print(f"tau 89% HDI: [{hdi_lo:.3f}, {hdi_hi:.3f}]")

ArviZ Integration

The result contains an xarray DataTree object (ArviZ 1.0+ format) for advanced diagnostics:

import arviz as az

# Trace plot
az.plot_trace(result.arviz_data)

# Forest plot
az.plot_forest(result.arviz_data)

# Generate full diagnostic suite
diagnostics = result.generate_diagnostics(
    output_dir="./diagnostics",
    formats=("pdf", "png"),
)

Posterior Predictive Checks

from xpcsviewer.fitting.models import single_exp_func

fig = result.plot_posterior_predictive(
    model=single_exp_func,
    x_data=t,
    y_data=g2_noise,
)
plt.show()

Serialization

Export the full result including reproducibility metadata:

result_dict = result.to_dict()

# Includes:
# - samples: posterior draws
# - nlsq_init: warm-start values
# - diagnostics: R-hat, ESS, BFMI
# - versions: package versions (xpcsviewer, numpyro, jax, arviz)
# - sampler_config: num_warmup, num_samples, etc.
# - data_metadata: n_points, x_range

import json
with open("fit_result.json", "w") as f:
    json.dump(result_dict, f, indent=2)

Part 3: Fitting from XpcsFile

The fit_g2() method integrates fitting directly into the data loading workflow:

from xpcsviewer.xpcs_file import XpcsFile

with XpcsFile("/path/to/data.hdf") as xf:
    fit_summary = xf.fit_g2(
        q_range=(0.01, 0.05),
        bounds=[[0.9, 1e-6, 0.0, 0.5], [1.1, 1e3, 0.5, 1.5]],
        fit_func="single",     # or "double"
        robust_fitting=False,   # Set True for enhanced reliability
        use_parallel=True,      # Parallel fitting across Q bins
    )

    # Access results
    print(f"Fit function: {fit_summary['fit_func']}")
    print(f"Q values: {fit_summary['q_val']}")
    print(f"Fit parameters: {fit_summary['fit_val'].shape}")

    # fit_summary['fit_line'] contains the smooth fit curves
    # fit_summary['fit_x'] contains the x values for the fit curves

Model Comparison

Compare single vs. double exponential using AIC/BIC:

from xpcsviewer.fitting import nlsq_fit
from xpcsviewer.fitting.models import single_exp_func, double_exp_func

# Fit single exponential
result_single = nlsq_fit(
    model_fn=single_exp_func,
    x=t, y=g2_noise, yerr=g2_err,
    p0={"tau": 1.0, "baseline": 1.0, "contrast": 0.3},
    bounds={"tau": (1e-6, 1e6), "baseline": (0.5, 2.0), "contrast": (0, 1)},
)

# Fit double exponential
result_double = nlsq_fit(
    model_fn=double_exp_func,
    x=t, y=g2_noise, yerr=g2_err,
    p0={"tau1": 1.0, "tau2": 50.0, "baseline": 1.0,
        "contrast1": 0.2, "contrast2": 0.1},
    bounds={
        "tau1": (1e-6, 1e4), "tau2": (1e-6, 1e6),
        "baseline": (0.5, 2.0),
        "contrast1": (0, 1), "contrast2": (0, 1),
    },
)

print(f"Single exp -- AIC: {result_single.aic:.2f}, BIC: {result_single.bic:.2f}")
print(f"Double exp -- AIC: {result_double.aic:.2f}, BIC: {result_double.bic:.2f}")
print(f"Prefer: {'single' if result_single.bic < result_double.bic else 'double'}"
      f" (lower BIC)")

Next Steps