Fit PDF with conditional variable
In this example, we show an unusual fit where the total sample is not drawn form a single probability distribution, but each individual sample
In our example, we are drawing samples
This tutorial reproduces a corresponding one from RooFit.
[ ]:
%config InlineBackend.figure_formats = ['svg']
import iminuit
from iminuit.cost import UnbinnedNLL
from iminuit import Minuit
import numpy as np
import numba as nb
import boost_histogram as bh
import matplotlib.pyplot as plt
from scipy.stats import norm
from numba_stats import norm as norm_nb
print("iminuit version", iminuit.__version__)
iminuit version 2.28.0
[ ]:
rng = np.random.default_rng(1)
# conditional variable: each sample is paired with a random y parameter
y = rng.normal(0, 10, size=10000)
y = y[np.abs(y) < 10] # truncate at 10
# location of each gaussian is a function of y
def mu(y, a, b):
return a + b * y
# draw samples from Gaussians whose locations depend on y
truth = {"a": 0, "b": 0.5, "sigma": 1.0}
x = rng.normal(mu(y, truth["a"], truth["b"]), truth["sigma"])
The distribution in
[ ]:
ax_x = bh.axis.Regular(100, -10, 10)
ax_y = bh.axis.Regular(5, -10, 10)
h = bh.Histogram(ax_x, ax_y)
h.fill(x, y)
for i, (a, b) in enumerate(ax_y):
plt.stairs(h.values()[:, i], ax_x.edges, label=f"[{a}, {b})", fill=True, alpha=0.2)
h1 = h[:, sum]
plt.stairs(h1.values(), ax_x.edges, color="k", label="total")
plt.xlabel("x")
plt.ylabel("events")
plt.legend(title="y interval", frameon=False, handlelength=1.2);
Fit with conditional variable
The random distribution of
[ ]:
def model(xy, a, b, sigma):
x, y = xy
mu = a + b * y
# cannot use norm.pdf from numba_stats here, because it is not vectorized in mu
return norm.pdf(x, mu, sigma)
nll = UnbinnedNLL((x, y), model)
m = Minuit(nll, 0.0, 0.0, 2.0)
m.limits["sigma"] = (0, None)
m.migrad()
Migrad | |
---|---|
FCN = 1.93e+04 | Nfcn = 130 |
EDM = 2.25e-06 (Goal: 0.0002) | time = 0.1 sec |
Valid Minimum | Below EDM threshold (goal x 10) |
No parameters at limit | Below call limit |
Hesse ok | Covariance accurate |
Name | Value | Hesse Error | Minos Error- | Minos Error+ | Limit- | Limit+ | Fixed | |
---|---|---|---|---|---|---|---|---|
0 | a | -0.010 | 0.012 | |||||
1 | b | 0.4993 | 0.0022 | |||||
2 | sigma | 0.986 | 0.008 | 0 |
a | b | sigma | |
---|---|---|---|
a | 0.000142 | 0e-6 | 0 |
b | 0e-6 | 4.76e-06 | 0e-6 |
sigma | 0 | 0e-6 | 7.08e-05 |
[ ]:
# construct model representation for comparison with data histogram
a, b, sigma = m.values
# get expected content per bin from cdf, sum over the individual cdfs
v = np.diff(np.sum(norm.cdf(ax_x.edges[:, np.newaxis], mu(y, a, b), sigma), axis=1))
plt.stairs(v, ax_x.edges, label="model", zorder=5, lw=2)
plt.errorbar(ax_x.centers, h1.values(), h1.variances() ** 0.5, fmt="ok", label="data")
plt.xlabel("x")
plt.ylabel("events")
plt.legend(frameon=False);
Fit without conditional variable
We can also ignore the dependence of
On top of that, the calculation is much slower, because building the pdf is more expensive. We parallelise the computation with numba.
[ ]:
nb.config.THREADING_LAYER = "workqueue"
@nb.njit(parallel=True, fastmath=True)
def model(x, a, b, sigma):
mu = a + b * y
total = np.zeros_like(x)
for i in nb.prange(len(mu)):
total += norm_nb.pdf(x, mu[i], sigma)
return total
nll = UnbinnedNLL(x, model)
m2 = Minuit(nll, 0.0, 0.0, 2.0)
m2.limits["sigma"] = (0, None)
m2.migrad()
Migrad | |
---|---|
FCN = -8.774e+04 | Nfcn = 95 |
EDM = 1.33e-06 (Goal: 0.0002) | time = 12.5 sec |
Valid Minimum | Below EDM threshold (goal x 10) |
No parameters at limit | Below call limit |
Hesse ok | Covariance accurate |
Name | Value | Hesse Error | Minos Error- | Minos Error+ | Limit- | Limit+ | Fixed | |
---|---|---|---|---|---|---|---|---|
0 | a | 0.002 | 0.029 | |||||
1 | b | 0.500 | 0.005 | |||||
2 | sigma | 0.98 | 0.04 | 0 |
a | b | sigma | |
---|---|---|---|
a | 0.000839 | 0.004e-3 (0.030) | -0 (-0.027) |
b | 0.004e-3 (0.030) | 2.43e-05 | -0.141e-3 (-0.718) |
sigma | -0 (-0.027) | -0.141e-3 (-0.718) | 0.0016 |
[ ]:
fig, ax = plt.subplots(1, 3, figsize=(8, 2), constrained_layout=True)
for par, axi in zip(m.parameters, ax):
axi.set_title(par)
t = truth[par]
axi.axhline(t, ls="--", color="0.5")
axi.errorbar(["with\n conditional"], m.values[par], m.errors[par], fmt="ok")
axi.errorbar(["without\n conditional"], m2.values[par], m2.errors[par], fmt="or")
axi.set_xlim(-0.5, 1.5)
dt = 2 * m2.errors[par]
axi.set_ylim(t - dt, t + dt)