Cost functions
We give an in-depth guide on how to use the built-in cost functions.
The iminuit package comes with a couple of common cost functions that you can import from iminuit.cost for convenience. Of course, you can write your own cost functions to use with iminuit, but most of a cost function is always the same boilerplate. What really varies is the statistical model which predicts the probability density as a function of the parameter values. This model you still have to provide yourself; the iminuit package does not include machinery to build statistical models (that is out of scope).
Using the built-in cost functions is not only convenient, they also come with extra features:

* Support for fitting weighted histograms.
* Technical tricks that improve numerical stability.
* Optional numba acceleration (if numba is installed).
* Cost functions can be added to fit data sets with shared parameters (a sketch follows this list).
* Data can be temporarily masked.
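Regarding the fourth point: cost functions are combined with the + operator, and parameters with the same name are shared between the summands. A minimal sketch, using the imports introduced in the next cell (the samples and models here are hypothetical):

rng_demo = np.random.default_rng(0)
s1 = rng_demo.normal(0.0, 1.0, size=100)  # hypothetical sample 1
s2 = rng_demo.normal(0.0, 1.0, size=200)  # hypothetical sample 2

def pdf_a(x, mu, sigma):
    return norm.pdf(x, mu, sigma)

def pdf_b(x, mu, sigma):
    return norm.pdf(x, mu, sigma)

# the sum is again a cost function; "mu" and "sigma" are shared,
# so both samples constrain the same parameters
c_sum = cost.UnbinnedNLL(s1, pdf_a) + cost.UnbinnedNLL(s2, pdf_b)
m_shared = Minuit(c_sum, mu=0, sigma=1)
m_shared.limits["sigma"] = (0, None)
m_shared.migrad()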
We demonstrate each cost function on a standard example from high-energy physics, the fit of a peak over a smooth background (here taken to be exponential).
[1]:
%config InlineBackend.figure_formats = ['svg']
from iminuit import cost, Minuit
# faster than scipy.stats functions
from numba_stats import truncnorm, truncexpon, norm, expon
import numpy as np
from matplotlib import pyplot as plt
# accurate numerical derivatives
from jacobi import jacobi
We generate our data by sampling from a Gaussian peak and from an exponential background in the range 0 to 2. The original data are then binned. One can fit either the original or the binned data.
[2]:
xr = (0, 2) # xrange
rng = np.random.default_rng(1)
xdata = rng.normal(1, 0.1, size=1000)
ydata = rng.exponential(size=len(xdata))
xmix = np.append(xdata, ydata)
xmix = xmix[(xr[0] < xmix) & (xmix < xr[1])]
n, xe = np.histogram(xmix, bins=20, range=xr)
cx = 0.5 * (xe[1:] + xe[:-1])
dx = np.diff(xe)
plt.errorbar(cx, n, n**0.5, fmt="ok")
plt.plot(xmix, np.zeros_like(xmix), "|", alpha=0.1);
We also generate some 2D data to demonstrate multivariate fits: a Gaussian along the x-axis and, independently, an exponential along the y-axis. In this case, the distributions are not restricted to a range in x and y.
[3]:
n2, _, ye = np.histogram2d(xdata, ydata, bins=(20, 5), range=(xr, (0, np.max(ydata))))
plt.pcolormesh(xe, ye, n2.T)
plt.scatter(xdata, ydata, marker=".", color="w", s=1);
Maximum-likelihood fits
Maximum-likelihood fits are the state-of-the-art when it comes to fitting models to data. They can be applied to unbinned and binned data (histograms).
Unbinned fits are the easiest to use, because no data binning is needed. They become slow when the sample size is large.
Binned fits require you to appropriately bin the data. The binning has to be fine enough to retain all essential information. Binned fits are much faster when the sample size is large.
Unbinned fit
Unbinned fits are ideal when the data samples are not too large or very high dimensional. There is no need to worry about the appropriate binning of the data. Unbinned fits are inefficient when the samples are very large and can become numerically unstable, too. Binned fits are a better choice then.
The cost function for an unbinned maximum-likelihood fit is really simple: it is the negative sum of the logarithms of the pdf evaluated at each sample point (the factor -1 turns maximization into minimization). You can easily write this yourself, but a naive implementation will suffer from instabilities when the pdf becomes locally zero. Our implementation mitigates these instabilities to some extent.
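To make this concrete, here is a minimal hand-written version for a standalone normal sample (a sketch with hypothetical data, not the mixture fitted below):

data = np.random.default_rng(2).normal(0, 1, size=100)  # hypothetical sample

def naive_nll(mu, sigma):
    # sum of -log(pdf); this naive version can hit log(0) when the pdf
    # vanishes at a data point, the instability mentioned above
    return -np.sum(np.log(norm.pdf(data, mu, sigma)))

naive_nll.errordef = Minuit.LIKELIHOOD  # 0.5, appropriate for negative log-likelihoods

m_naive = Minuit(naive_nll, mu=0, sigma=1)
m_naive.limits["sigma"] = (0, None)
m_naive.migrad()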
To perform the unbinned fit you need to provide the pdf of the model, which must be vectorized (a Numpy ufunc). The pdf must be normalized, which means that the integral over the sample value range must be a constant for any combination of model parameters.
The model pdf in this case is a linear combination of the normal and the exponential pdfs. The parameters are \(z\) (the weight), \(\mu\) and \(\sigma\) of the normal distribution and \(\tau\) of the exponential. The cost function detects the parameter names.
It is important to put appropriate limits on the parameters, so that the problem does not become mathematically undefined:

* \(0 < z < 1\),
* \(\sigma > 0\),
* \(\tau > 0\).

In addition, it can be beneficial to use \(0 < \mu < 2\), but it is not required. We use truncnorm and truncexpon, which are normalized inside the data range (0, 2).
[4]:
def pdf(x, z, mu, sigma, tau):
    return z * truncnorm.pdf(x, *xr, mu, sigma) + (1 - z) * truncexpon.pdf(
        x, *xr, 0.0, tau
    )
c = cost.UnbinnedNLL(xmix, pdf)
m = Minuit(c, z=0.4, mu=1, sigma=0.2, tau=1)
m.limits["z"] = (0, 1)
m.limits["mu"] = (0, 2)
m.limits["sigma", "tau"] = (0, None)
m.migrad()
[4]:
| Migrad | |
| --- | --- |
| FCN = 768.1 | Nfcn = 111 |
| EDM = 3.76e-06 (Goal: 0.0002) | time = 2.0 sec |
| Valid Minimum | Below EDM threshold (goal x 10) |
| No parameters at limit | Below call limit |
| Hesse ok | Covariance accurate |

| | Name | Value | Hesse Error | Minos Error- | Minos Error+ | Limit- | Limit+ | Fixed |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | z | 0.537 | 0.015 | | | 0 | 1 | |
| 1 | mu | 0.996 | 0.004 | | | 0 | 2 | |
| 2 | sigma | 0.1006 | 0.0035 | | | 0 | | |
| 3 | tau | 1.05 | 0.08 | | | 0 | | |

| | z | mu | sigma | tau |
| --- | --- | --- | --- | --- |
| z | 0.000231 | -0.003e-3 (-0.052) | 0.019e-3 (0.353) | -0.24e-3 (-0.206) |
| mu | -0.003e-3 (-0.052) | 1.5e-05 | -0.001e-3 (-0.069) | -0.015e-3 (-0.053) |
| sigma | 0.019e-3 (0.353) | -0.001e-3 (-0.069) | 1.25e-05 | -0.043e-3 (-0.163) |
| tau | -0.24e-3 (-0.206) | -0.015e-3 (-0.053) | -0.043e-3 (-0.163) | 0.00568 |
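As a quick sanity check (not part of the original analysis), we can integrate the fitted pdf numerically over the fit range; the result should be close to 1 for any parameter values:

xg = np.linspace(*xr, 1000)
print(np.trapz(pdf(xg, *m.values), xg))  # should be close to 1.0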
If the gradient of the model is available, it can be passed to the cost function, which then computes its own gradient from it; Minuit can use this to potentially improve the minimization. We use a numerically computed gradient here, obtained from the jacobi library. This is for demonstration purposes only and generally not recommended, since jacobi computes the gradient much more accurately than Minuit requires.
[5]:
def grad(x, *par):
    return jacobi(lambda par: pdf(x, *par), par)[0].T
c = cost.UnbinnedNLL(xmix, pdf, grad=grad)
m = Minuit(c, z=0.4, mu=1, sigma=0.2, tau=1)
m.limits["z"] = (0, 1)
m.limits["mu"] = (0, 2)
m.limits["sigma", "tau"] = (0, None)
m.migrad()
[5]:
| Migrad | |
| --- | --- |
| FCN = 768.1 | Nfcn = 87, Ngrad = 5 |
| EDM = 3.31e-05 (Goal: 0.0002) | time = 0.1 sec |
| Valid Minimum | Below EDM threshold (goal x 10) |
| No parameters at limit | Below call limit |
| Hesse ok | Covariance accurate |

| | Name | Value | Hesse Error | Minos Error- | Minos Error+ | Limit- | Limit+ | Fixed |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | z | 0.537 | 0.015 | | | 0 | 1 | |
| 1 | mu | 0.996 | 0.004 | | | 0 | 2 | |
| 2 | sigma | 0.1006 | 0.0035 | | | 0 | | |
| 3 | tau | 1.05 | 0.08 | | | 0 | | |

| | z | mu | sigma | tau |
| --- | --- | --- | --- | --- |
| z | 0.000231 | -0.003e-3 (-0.052) | 0.019e-3 (0.353) | -0.24e-3 (-0.206) |
| mu | -0.003e-3 (-0.052) | 1.5e-05 | -0.001e-3 (-0.069) | -0.015e-3 (-0.053) |
| sigma | 0.019e-3 (0.353) | -0.001e-3 (-0.069) | 1.25e-05 | -0.043e-3 (-0.163) |
| tau | -0.24e-3 (-0.206) | -0.015e-3 (-0.053) | -0.043e-3 (-0.163) | 0.00569 |
We can also fit a multivariate model to multivariate data. We pass the model as a logpdf this time (with log=True), which works well here because the pdf factorizes.
[6]:
def logpdf(xy, mu, sigma, tau):
    x, y = xy
    return norm.logpdf(x, mu, sigma) + expon.logpdf(y, 0, tau)
c = cost.UnbinnedNLL((xdata, ydata), logpdf, log=True)
m = Minuit(c, mu=1, sigma=2, tau=2)
m.limits["sigma", "tau"] = (0, None)
m.migrad()
[6]:
| Migrad | |
| --- | --- |
| FCN = 147.6 | Nfcn = 134 |
| EDM = 2.1e-06 (Goal: 0.0002) | |
| Valid Minimum | Below EDM threshold (goal x 10) |
| No parameters at limit | Below call limit |
| Hesse ok | Covariance accurate |

| | Name | Value | Hesse Error | Minos Error- | Minos Error+ | Limit- | Limit+ | Fixed |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | mu | 0.9946 | 0.0031 | | | | | |
| 1 | sigma | 0.0986 | 0.0022 | | | 0 | | |
| 2 | tau | 0.972 | 0.031 | | | 0 | | |

| | mu | sigma | tau |
| --- | --- | --- | --- |
| mu | 9.73e-06 | 0e-6 | -0e-6 |
| sigma | 0e-6 | 4.86e-06 | -0e-6 |
| tau | -0e-6 | -0e-6 | 0.000944 |
And we can also use a gradient as before.
[7]:
def grad(xy, *par):
    return jacobi(lambda p: logpdf(xy, *p), par)[0].T
c = cost.UnbinnedNLL((xdata, ydata), logpdf, log=True, grad=grad)
m = Minuit(c, mu=1, sigma=2, tau=2)
m.limits["sigma", "tau"] = (0, None)
m.migrad()
[7]:
| Migrad | |
| --- | --- |
| FCN = 147.6 | Nfcn = 81, Ngrad = 9 |
| EDM = 3.73e-05 (Goal: 0.0002) | |
| Valid Minimum | Below EDM threshold (goal x 10) |
| No parameters at limit | Below call limit |
| Hesse ok | Covariance accurate |

| | Name | Value | Hesse Error | Minos Error- | Minos Error+ | Limit- | Limit+ | Fixed |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | mu | 0.9946 | 0.0031 | | | | | |
| 1 | sigma | 0.0986 | 0.0022 | | | 0 | | |
| 2 | tau | 0.972 | 0.031 | | | 0 | | |

| | mu | sigma | tau |
| --- | --- | --- | --- |
| mu | 9.73e-06 | 0e-6 | -0e-6 |
| sigma | 0e-6 | 4.87e-06 | -0e-6 |
| tau | -0e-6 | -0e-6 | 0.000944 |
Extended unbinned fit
An important variant of the unbinned ML fit is described by Roger Barlow, Nucl. Instrum. Meth. A 297 (1990) 496-506. Use this if both the shape and the integral of the density are of interest. In practice, this is often the case, for example, if you want to estimate a cross-section or yield.

The model in this case has to return the integral of the density and the density itself (which must be vectorized). The parameters are those already discussed in the previous section, plus \(s\) (integral of the signal density) and \(b\) (integral of the background density). The additional limits are:

* \(s > 0\),
* \(b > 0\).
Compared to the previous case, we have one more parameter to fit.
[8]:
def density(x, s, b, mu, sigma, tau):
    return s + b, (
        s * truncnorm.pdf(x, *xr, mu, sigma) + b * truncexpon.pdf(x, *xr, 0, tau)
    )
c = cost.ExtendedUnbinnedNLL(xmix, density)
m = Minuit(c, s=300, b=1500, mu=0, sigma=0.2, tau=2)
m.limits["s", "b", "sigma", "tau"] = (0, None)
m.migrad()
[8]:
| Migrad | |
| --- | --- |
| FCN = -2.388e+04 | Nfcn = 362 |
| EDM = 2.8e-07 (Goal: 0.0002) | |
| Valid Minimum | Below EDM threshold (goal x 10) |
| No parameters at limit | Below call limit |
| Hesse ok | Covariance accurate |

| | Name | Value | Hesse Error | Minos Error- | Minos Error+ | Limit- | Limit+ | Fixed |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | s | 1.01e3 | 0.04e3 | | | 0 | | |
| 1 | b | 872 | 35 | | | 0 | | |
| 2 | mu | 0.996 | 0.004 | | | | | |
| 3 | sigma | 0.1006 | 0.0035 | | | 0 | | |
| 4 | tau | 1.05 | 0.08 | | | 0 | | |

| | s | b | mu | sigma | tau |
| --- | --- | --- | --- | --- | --- |
| s | 1.36e+03 | -0.4e3 (-0.272) | -5.740e-3 (-0.040) | 35.547e-3 (0.272) | -0.440 (-0.158) |
| b | -0.4e3 (-0.272) | 1.22e+03 | 5.739e-3 (0.042) | -35.543e-3 (-0.288) | 0.440 (0.167) |
| mu | -5.740e-3 (-0.040) | 5.739e-3 (0.042) | 1.5e-05 | -0.001e-3 (-0.068) | -0.015e-3 (-0.053) |
| sigma | 35.547e-3 (0.272) | -35.543e-3 (-0.288) | -0.001e-3 (-0.068) | 1.25e-05 | -0.043e-3 (-0.161) |
| tau | -0.440 (-0.158) | 0.440 (0.167) | -0.015e-3 (-0.053) | -0.043e-3 (-0.161) | 0.00568 |
The fitted values and the uncertainty estimates for the shape parameters are identical to the previous fit.
[9]:
m.visualize()
Once again, we fit 2D data, using the log-density mode.
[10]:
def logdensity(xy, n, mu, sigma, tau):
    x, y = xy
    return n, np.log(n) + norm.logpdf(x, mu, sigma) + expon.logpdf(y, 0, tau)
c = cost.ExtendedUnbinnedNLL((xdata, ydata), logdensity, log=True)
m = Minuit(c, n=1, mu=1, sigma=2, tau=2)
m.limits["n", "sigma", "tau"] = (0, None)
m.migrad()
[10]:
| Migrad | |
| --- | --- |
| FCN = -1.167e+04 | Nfcn = 328 |
| EDM = 4.42e-06 (Goal: 0.0002) | |
| Valid Minimum | Below EDM threshold (goal x 10) |
| No parameters at limit | Below call limit |
| Hesse ok | Covariance accurate |

| | Name | Value | Hesse Error | Minos Error- | Minos Error+ | Limit- | Limit+ | Fixed |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | n | 1000 | 32 | | | 0 | | |
| 1 | mu | 0.9946 | 0.0031 | | | | | |
| 2 | sigma | 0.0986 | 0.0022 | | | 0 | | |
| 3 | tau | 0.972 | 0.031 | | | 0 | | |

| | n | mu | sigma | tau |
| --- | --- | --- | --- | --- |
| n | 1e+03 | 0e-6 | 0e-6 | -0 |
| mu | 0e-6 | 9.73e-06 | 0e-6 (0.001) | -0e-6 |
| sigma | 0e-6 | 0e-6 (0.001) | 4.86e-06 | -0e-6 |
| tau | -0 | -0e-6 | -0e-6 | 0.000944 |
Binned fit
Binned fits are computationally more efficient and numerically more stable when samples are large. The caveat is that one has to choose an appropriate binning. The binning should be fine enough that the essential information in the original data is retained. Using large bins does not introduce a bias, but the parameters then have a larger-than-minimal variance.
In this case, 20 bins are fine enough to retain all essential information. Using many bins is safe, since the maximum-likelihood method correctly takes Poisson statistics into account, even if bins have zero entries. Using more bins than necessary just increases the computational cost.
Instead of a pdf, you need to provide a cdf for a binned fit (which must be vectorized).
[11]:
def cdf(xe, z, mu, sigma, tau):
    return z * truncnorm.cdf(xe, *xr, mu, sigma) + (1 - z) * truncexpon.cdf(
        xe, *xr, 0, tau
    )
c = cost.BinnedNLL(n, xe, cdf)
m = Minuit(c, z=0.4, mu=0, sigma=0.2, tau=2)
m.limits["z"] = (0, 1)
m.limits["sigma", "tau"] = (0.01, None)
m.migrad()
[11]:
| Migrad | |
| --- | --- |
| FCN = 15.03 (χ²/ndof = 0.9) | Nfcn = 270 |
| EDM = 5.28e-06 (Goal: 0.0002) | time = 1.3 sec |
| Valid Minimum | Below EDM threshold (goal x 10) |
| No parameters at limit | Below call limit |
| Hesse ok | Covariance accurate |

| | Name | Value | Hesse Error | Minos Error- | Minos Error+ | Limit- | Limit+ | Fixed |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | z | 0.540 | 0.015 | | | 0 | 1 | |
| 1 | mu | 0.995 | 0.004 | | | | | |
| 2 | sigma | 0.100 | 0.004 | | | 0.01 | | |
| 3 | tau | 1.05 | 0.08 | | | 0.01 | | |

| | z | mu | sigma | tau |
| --- | --- | --- | --- | --- |
| z | 0.000235 | -0.004e-3 (-0.067) | 0.020e-3 (0.354) | -0.24e-3 (-0.209) |
| mu | -0.004e-3 (-0.067) | 1.63e-05 | -0.001e-3 (-0.090) | -0.015e-3 (-0.050) |
| sigma | 0.020e-3 (0.354) | -0.001e-3 (-0.090) | 1.43e-05 | -0.045e-3 (-0.160) |
| tau | -0.24e-3 (-0.209) | -0.015e-3 (-0.050) | -0.045e-3 (-0.160) | 0.00564 |
When the data are binned, iminuit also shows the chi-square goodness-of-fit test statistic, which is obtained for free in this case.
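If scipy is available, the test statistic can be turned into a p-value (a quick sketch, not part of the original notebook; Minuit.fval holds the function minimum and Minuit.ndof the degrees of freedom):

from scipy.stats import chi2

# survival function of the chi-square distribution gives the p-value
pvalue = chi2.sf(m.fval, m.ndof)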
Sometimes the cdf is expensive to calculate. In this case, you can approximate it via the cumulative sum of “bin-width times pdf evaluated at the bin center”. This approximation may lead to a bias; using an accurate cdf avoids it.
Here is the same example fitted with an approximate cdf.
[12]:
def pdf(x, z, mu, sigma, tau):
    return z * truncnorm.pdf(x, *xr, mu, sigma) + (1 - z) * truncexpon.pdf(
        x, *xr, 0, tau
    )


def approximate_cdf(xe, z, mu, sigma, tau):
    dx = np.diff(xe)
    cx = xe[:-1] + 0.5 * dx
    p = pdf(cx, z, mu, sigma, tau)
    return np.append([0], np.cumsum(p * dx))
c = cost.BinnedNLL(n, xe, approximate_cdf)
m = Minuit(c, z=0.4, mu=0, sigma=0.2, tau=2)
m.limits["z"] = (0, 1)
m.limits["sigma", "tau"] = (0.01, None)
m.migrad()
[12]:
| Migrad | |
| --- | --- |
| FCN = 15.65 (χ²/ndof = 1.0) | Nfcn = 189 |
| EDM = 1.07e-05 (Goal: 0.0002) | |
| Valid Minimum | Below EDM threshold (goal x 10) |
| No parameters at limit | Below call limit |
| Hesse ok | Covariance accurate |

| | Name | Value | Hesse Error | Minos Error- | Minos Error+ | Limit- | Limit+ | Fixed |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | z | 0.540 | 0.015 | | | 0 | 1 | |
| 1 | mu | 0.995 | 0.004 | | | | | |
| 2 | sigma | 0.104 | 0.004 | | | 0.01 | | |
| 3 | tau | 1.05 | 0.08 | | | 0.01 | | |

| | z | mu | sigma | tau |
| --- | --- | --- | --- | --- |
| z | 0.000235 | -0.004e-3 (-0.067) | 0.020e-3 (0.353) | -0.24e-3 (-0.208) |
| mu | -0.004e-3 (-0.067) | 1.63e-05 | -0.001e-3 (-0.090) | -0.015e-3 (-0.050) |
| sigma | 0.020e-3 (0.353) | -0.001e-3 (-0.090) | 1.31e-05 | -0.043e-3 (-0.159) |
| tau | -0.24e-3 (-0.208) | -0.015e-3 (-0.050) | -0.043e-3 (-0.159) | 0.00568 |
The fitted values and the uncertainty estimates for \(\mu\) and \(\sigma\) are not identical to the unbinned fit, but very close. For practical purposes, the results are equivalent. This shows that the binning is fine enough to retain the essential information in the original data.
Since this approximation is useful in practice, BinnedNLL computes it automatically if you pass the keyword use_pdf="approximate".
[13]:
c = cost.BinnedNLL(n, xe, pdf, use_pdf="approximate")
m = Minuit(c, z=0.4, mu=0, sigma=0.2, tau=2)
m.limits["z"] = (0, 1)
m.limits["sigma", "tau"] = (0.01, None)
m.migrad()
[13]:
| Migrad | |
| --- | --- |
| FCN = 15.65 (χ²/ndof = 1.0) | Nfcn = 189 |
| EDM = 1.06e-05 (Goal: 0.0002) | |
| Valid Minimum | Below EDM threshold (goal x 10) |
| No parameters at limit | Below call limit |
| Hesse ok | Covariance accurate |

| | Name | Value | Hesse Error | Minos Error- | Minos Error+ | Limit- | Limit+ | Fixed |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | z | 0.540 | 0.015 | | | 0 | 1 | |
| 1 | mu | 0.995 | 0.004 | | | | | |
| 2 | sigma | 0.104 | 0.004 | | | 0.01 | | |
| 3 | tau | 1.05 | 0.08 | | | 0.01 | | |

| | z | mu | sigma | tau |
| --- | --- | --- | --- | --- |
| z | 0.000235 | -0.004e-3 (-0.067) | 0.020e-3 (0.353) | -0.24e-3 (-0.208) |
| mu | -0.004e-3 (-0.067) | 1.63e-05 | -0.001e-3 (-0.090) | -0.015e-3 (-0.050) |
| sigma | 0.020e-3 (0.353) | -0.001e-3 (-0.090) | 1.31e-05 | -0.043e-3 (-0.159) |
| tau | -0.24e-3 (-0.208) | -0.015e-3 (-0.050) | -0.043e-3 (-0.159) | 0.00568 |
Another option is to compute the cdf numerically with use_pdf="numerical", but this tends to be expensive and is only supported for 1D histograms.
[14]:
c = cost.BinnedNLL(n, xe, pdf, use_pdf="numerical")
m = Minuit(c, z=0.4, mu=0, sigma=0.2, tau=2)
m.limits["z"] = (0, 1)
m.limits["sigma", "tau"] = (0.01, None)
m.migrad()
[14]:
| Migrad | |
| --- | --- |
| FCN = 15.03 (χ²/ndof = 0.9) | Nfcn = 270 |
| EDM = 5.28e-06 (Goal: 0.0002) | time = 1.4 sec |
| Valid Minimum | Below EDM threshold (goal x 10) |
| No parameters at limit | Below call limit |
| Hesse ok | Covariance accurate |

| | Name | Value | Hesse Error | Minos Error- | Minos Error+ | Limit- | Limit+ | Fixed |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | z | 0.540 | 0.015 | | | 0 | 1 | |
| 1 | mu | 0.995 | 0.004 | | | | | |
| 2 | sigma | 0.100 | 0.004 | | | 0.01 | | |
| 3 | tau | 1.05 | 0.08 | | | 0.01 | | |

| | z | mu | sigma | tau |
| --- | --- | --- | --- | --- |
| z | 0.000235 | -0.004e-3 (-0.067) | 0.020e-3 (0.354) | -0.24e-3 (-0.209) |
| mu | -0.004e-3 (-0.067) | 1.63e-05 | -0.001e-3 (-0.090) | -0.015e-3 (-0.050) |
| sigma | 0.020e-3 (0.354) | -0.001e-3 (-0.090) | 1.43e-05 | -0.045e-3 (-0.160) |
| tau | -0.24e-3 (-0.209) | -0.015e-3 (-0.050) | -0.045e-3 (-0.160) | 0.00564 |
Fitting a multidimensional histogram is equally easy. Since the pdfs in this example factorize, the cdf of the 2D model is the product of the cdfs along each axis.
[15]:
def cdf(xe_ye, mu, sigma, tau):
    xe, ye = xe_ye
    return norm.cdf(xe, mu, sigma) * expon.cdf(ye, 0, tau)
c = cost.BinnedNLL(n2, (xe, ye), cdf)
m = Minuit(c, mu=0.1, sigma=0.2, tau=2)
m.limits["sigma", "tau"] = (0, None)
m.migrad()
[15]:
| Migrad | |
| --- | --- |
| FCN = 25.22 (χ²/ndof = 0.3) | Nfcn = 154 |
| EDM = 4.1e-07 (Goal: 0.0002) | |
| Valid Minimum | Below EDM threshold (goal x 10) |
| No parameters at limit | Below call limit |
| Hesse ok | Covariance accurate |

| | Name | Value | Hesse Error | Minos Error- | Minos Error+ | Limit- | Limit+ | Fixed |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | mu | 0.9932 | 0.0032 | | | | | |
| 1 | sigma | 0.0984 | 0.0024 | | | 0 | | |
| 2 | tau | 0.940 | 0.033 | | | 0 | | |

| | mu | sigma | tau |
| --- | --- | --- | --- |
| mu | 1.05e-05 | 0e-6 | -0 |
| sigma | 0e-6 | 5.68e-06 | -0e-6 |
| tau | -0 | -0e-6 | 0.0011 |
The automatically provided visualization for multidimensional data sets is often not very pretty, but it still helps to judge whether the fit is reasonable. There is no obvious way to draw higher-dimensional data with error bars in comparison to a model, so the automatic visualization shows all data bins as a single sequence. You can override the default visualization by calling Minuit.visualize with your own plotting function, by assigning a plot function to the BinnedNLL cost function (monkey patching), or by deriving your own class from BinnedNLL.
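Here is a minimal sketch of the first option; the projection logic is our own, and we assume the callable receives the current parameter values as a sequence:

def plot_x_projection(args):
    # hypothetical custom view: compare the x-projection of the 2D
    # histogram with the fitted model
    mu, sigma, tau = args
    nx = n2.sum(axis=1)  # data counts summed over the y bins
    plt.errorbar(cx, nx, nx**0.5, fmt="ok")
    plt.stairs(np.diff(norm.cdf(xe, mu, sigma)) * np.sum(nx), xe)

m.visualize(plot_x_projection)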
Extended binned maximum-likelihood fit
As in the unbinned case, the extended binned maximum-likelihood fit should be used when the amplitudes of the pdfs are also of interest.
Instead of a density, you need to provide the integrated density in this case (which must be vectorized). Unlike in the unbinned case, there is no need to separately return the total integral of the density. The parameters are the same as in the unbinned extended fit.
[16]:
def integral(xe, s, b, mu, sigma, tau):
    return s * truncnorm.cdf(xe, *xr, mu, sigma) + b * truncexpon.cdf(xe, *xr, 0, tau)
c = cost.ExtendedBinnedNLL(n, xe, integral)
m = Minuit(c, s=300, b=1500, mu=0, sigma=0.2, tau=2)
m.limits["s", "b", "sigma", "tau"] = (0, None)
m.migrad()
[16]:
| Migrad | |
| --- | --- |
| FCN = 15.03 (χ²/ndof = 1.0) | Nfcn = 680 |
| EDM = 1.36e-07 (Goal: 0.0002) | time = 0.8 sec |
| Valid Minimum | Below EDM threshold (goal x 10) |
| No parameters at limit | Below call limit |
| Hesse ok | Covariance accurate |

| | Name | Value | Hesse Error | Minos Error- | Minos Error+ | Limit- | Limit+ | Fixed |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | s | 1.02e3 | 0.04e3 | | | 0 | | |
| 1 | b | 867 | 35 | | | 0 | | |
| 2 | mu | 0.995 | 0.004 | | | | | |
| 3 | sigma | 0.100 | 0.004 | | | 0 | | |
| 4 | tau | 1.05 | 0.08 | | | 0 | | |

| | s | b | mu | sigma | tau |
| --- | --- | --- | --- | --- | --- |
| s | 1.38e+03 | -0.4e3 (-0.280) | -7.764e-3 (-0.052) | 38.617e-3 (0.275) | -0.452 (-0.162) |
| b | -0.4e3 (-0.280) | 1.23e+03 | 7.764e-3 (0.055) | -38.616e-3 (-0.291) | 0.452 (0.172) |
| mu | -7.764e-3 (-0.052) | 7.764e-3 (0.055) | 1.63e-05 | -0.001e-3 (-0.090) | -0.015e-3 (-0.050) |
| sigma | 38.617e-3 (0.275) | -38.616e-3 (-0.291) | -0.001e-3 (-0.090) | 1.43e-05 | -0.045e-3 (-0.160) |
| tau | -0.452 (-0.162) | 0.452 (0.172) | -0.015e-3 (-0.050) | -0.045e-3 (-0.160) | 0.00564 |
Again, we can also fit multivariate data.
[17]:
def integral(xe_ye, n, mu, sigma, tau):
    xe, ye = xe_ye
    return n * norm.cdf(xe, mu, sigma) * expon.cdf(ye, 0, tau)
c = cost.ExtendedBinnedNLL(n2, (xe, ye), integral)
m = Minuit(c, n=1500, mu=0.1, sigma=0.2, tau=2)
m.limits["n", "sigma", "tau"] = (0, None)
m.migrad()
[17]:
| Migrad | |
| --- | --- |
| FCN = 24.64 (χ²/ndof = 0.3) | Nfcn = 183 |
| EDM = 3.43e-06 (Goal: 0.0002) | |
| Valid Minimum | Below EDM threshold (goal x 10) |
| No parameters at limit | Below call limit |
| Hesse ok | Covariance accurate |

| | Name | Value | Hesse Error | Minos Error- | Minos Error+ | Limit- | Limit+ | Fixed |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | n | 1.000e3 | 0.029e3 | | | 0 | | |
| 1 | mu | 0.9932 | 0.0032 | | | | | |
| 2 | sigma | 0.0984 | 0.0023 | | | 0 | | |
| 3 | tau | 0.943 | 0.034 | | | 0 | | |

| | n | mu | sigma | tau |
| --- | --- | --- | --- | --- |
| n | 870 | -5.542e-3 (-0.059) | -2.984e-3 (-0.044) | 0.0297 |
| mu | -5.542e-3 (-0.059) | 1.02e-05 | -0e-6 (-0.015) | 0.001e-3 (0.011) |
| sigma | -2.984e-3 (-0.044) | -0e-6 (-0.015) | 5.24e-06 | -0e-6 |
| tau | 0.0297 | 0.001e-3 (0.011) | -0e-6 | 0.00113 |
Temporary masking
In complicated binned fits with peak and background, it is sometimes useful to fit in several stages. One typically starts by masking the signal region, to fit only the background region.
The cost functions have a mask attribute to that end. We demonstrate the use of the mask with an extended binned fit.
[18]:
def integral(xe, s, b, mu, sigma, tau):
    return s * truncnorm.cdf(xe, *xr, mu, sigma) + b * truncexpon.cdf(xe, *xr, 0, tau)
c = cost.ExtendedBinnedNLL(n, xe, integral)
# we set the signal amplitude to zero and fix all signal parameters
m = Minuit(c, s=0, b=1500, mu=1, sigma=0.2, tau=2)
m.limits["s", "b", "sigma", "tau"] = (0, None)
m.fixed["s", "mu", "sigma"] = True
# we temporarily mask out the signal
c.mask = (cx < 0.5) | (1.5 < cx)
m.migrad()
[18]:
| Migrad | |
| --- | --- |
| FCN = 6.623 (χ²/ndof = 0.8) | Nfcn = 55 |
| EDM = 3.75e-07 (Goal: 0.0002) | |
| Valid Minimum | Below EDM threshold (goal x 10) |
| No parameters at limit | Below call limit |
| Hesse ok | Covariance accurate |

| | Name | Value | Hesse Error | Minos Error- | Minos Error+ | Limit- | Limit+ | Fixed |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | s | 0.0 | 0.1 | | | 0 | | yes |
| 1 | b | 870 | 40 | | | 0 | | |
| 2 | mu | 1.00 | 0.01 | | | | | yes |
| 3 | sigma | 0.200 | 0.002 | | | 0 | | yes |
| 4 | tau | 1.02 | 0.08 | | | 0 | | |

| | s | b | mu | sigma | tau |
| --- | --- | --- | --- | --- | --- |
| s | 0 | 0 | 0 | 0 | 0.000 |
| b | 0 | 1.71e+03 | 0 | 0 | 0.950 (0.289) |
| mu | 0 | 0 | 0 | 0 | 0.000 |
| sigma | 0 | 0 | 0 | 0 | 0.000 |
| tau | 0.000 | 0.950 (0.289) | 0.000 | 0.000 | 0.00632 |
We plot the intermediate result. Points which have been masked out are shown with open markers.
[19]:
for ma, co in ((c.mask, "k"), (~c.mask, "w")):
    plt.errorbar(cx[ma], n[ma], n[ma] ** 0.5, fmt="o", color=co, mec="k", ecolor="k")
plt.stairs(
    np.diff(integral(xe, *[p.value for p in m.init_params])), xe, ls=":", label="init"
)
plt.stairs(np.diff(integral(xe, *m.values)), xe, label="fit")
plt.legend();
Now we fix the background and fit only the signal parameters.
[20]:
c.mask = None # remove mask
m.fixed = False # release all parameters
m.fixed["b"] = True # fix background amplitude
m.values["s"] = 100 # do not start at the limit
m.migrad()
[20]:
| Migrad | |
| --- | --- |
| FCN = 15.03 (χ²/ndof = 0.9) | Nfcn = 252 |
| EDM = 6.92e-09 (Goal: 0.0002) | |
| Valid Minimum | Below EDM threshold (goal x 10) |
| No parameters at limit | Below call limit |
| Hesse ok | Covariance accurate |

| | Name | Value | Hesse Error | Minos Error- | Minos Error+ | Limit- | Limit+ | Fixed |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | s | 1.017e3 | 0.035e3 | | | 0 | | |
| 1 | b | 870 | 40 | | | 0 | | yes |
| 2 | mu | 0.995 | 0.004 | | | | | |
| 3 | sigma | 0.100 | 0.004 | | | 0 | | |
| 4 | tau | 1.05 | 0.07 | | | 0 | | |

| | s | b | mu | sigma | tau |
| --- | --- | --- | --- | --- | --- |
| s | 1.2e+03 | 0 | -4.088e-3 (-0.030) | 27.177e-3 (0.218) | -0.485 (-0.196) |
| b | 0 | 0 | 0 | 0 | 0.000 |
| mu | -4.088e-3 (-0.030) | 0 | 1.55e-05 | -0.001e-3 (-0.059) | -0.013e-3 (-0.047) |
| sigma | 27.177e-3 (0.218) | 0 | -0.001e-3 (-0.059) | 1.3e-05 | -0.032e-3 (-0.126) |
| tau | -0.485 (-0.196) | 0.000 | -0.013e-3 (-0.047) | -0.032e-3 (-0.126) | 0.00509 |
Finally, we release all parameters and fit again to get the correct uncertainty estimates.
[21]:
m.fixed = None
m.migrad()
[21]:
| Migrad | |
| --- | --- |
| FCN = 15.03 (χ²/ndof = 1.0) | Nfcn = 323 |
| EDM = 3.22e-07 (Goal: 0.0002) | |
| Valid Minimum | Below EDM threshold (goal x 10) |
| No parameters at limit | Below call limit |
| Hesse ok | Covariance accurate |

| | Name | Value | Hesse Error | Minos Error- | Minos Error+ | Limit- | Limit+ | Fixed |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | s | 1.02e3 | 0.04e3 | | | 0 | | |
| 1 | b | 867 | 35 | | | 0 | | |
| 2 | mu | 0.995 | 0.004 | | | | | |
| 3 | sigma | 0.100 | 0.004 | | | 0 | | |
| 4 | tau | 1.05 | 0.08 | | | 0 | | |

| | s | b | mu | sigma | tau |
| --- | --- | --- | --- | --- | --- |
| s | 1.38e+03 | -0.4e3 (-0.280) | -7.764e-3 (-0.052) | 38.617e-3 (0.275) | -0.452 (-0.162) |
| b | -0.4e3 (-0.280) | 1.23e+03 | 7.764e-3 (0.055) | -38.616e-3 (-0.291) | 0.452 (0.172) |
| mu | -7.764e-3 (-0.052) | 7.764e-3 (0.055) | 1.63e-05 | -0.001e-3 (-0.090) | -0.015e-3 (-0.050) |
| sigma | 38.617e-3 (0.275) | -38.616e-3 (-0.291) | -0.001e-3 (-0.090) | 1.43e-05 | -0.045e-3 (-0.160) |
| tau | -0.452 (-0.162) | 0.452 (0.172) | -0.015e-3 (-0.050) | -0.045e-3 (-0.160) | 0.00564 |
We get the same result as before. Since this was an easy problem, we did not need these extra steps, but the technique can be helpful when fitting many histograms automatically, where one cannot adjust each fit manually.
Weighted histograms
The cost functions for binned data also support weighted histograms. Just pass an array with the shape (n, 2) instead of (n,) as the first argument, where the first number of each pair is the sum of weights and the second is the sum of weights squared (an estimate of the variance of that bin value).
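For illustration, here is a minimal sketch of how such an array can be built from a weighted sample (the sample, weights, and model below are hypothetical):

w_x = rng.normal(1, 0.1, size=1000)  # hypothetical weighted sample
w = rng.uniform(0.5, 1.5, size=len(w_x))  # hypothetical weights
w1, _ = np.histogram(w_x, bins=20, range=xr, weights=w)  # sum of weights
w2, _ = np.histogram(w_x, bins=20, range=xr, weights=w**2)  # sum of weights squared

def wcdf(xe, mu, sigma):
    return truncnorm.cdf(xe, *xr, mu, sigma)

c_w = cost.BinnedNLL(np.stack((w1, w2), axis=-1), xe, wcdf)
m_w = Minuit(c_w, mu=1, sigma=0.1)
m_w.limits["sigma"] = (0, None)
m_w.migrad()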
Least-squares fits
A cost function for a general weighted least-squares fit (aka chi-square fit) is also included. In statistics, this is called non-linear regression.
In this case, you need to provide a model that predicts the y-values as a function of the x-values and the parameters. The fit needs estimates of the y-errors; if those are wrong, the fit may be biased. If your data have errors on the x-values as well, check out the tutorial about automatic differentiation, which includes an application of that to such fits.
[22]:
def model(x, a, b):
    return a + b * x**2
rng = np.random.default_rng(4)
truth = 1, 2
x = np.linspace(0, 1, 20)
yt = model(x, *truth)
ye = 0.4 * x**5 + 0.1
y = rng.normal(yt, ye)
plt.plot(x, yt, ls="--", label="truth")
plt.errorbar(x, y, ye, fmt="ok", label="data")
plt.legend(loc="upper left");
[23]:
c = cost.LeastSquares(x, y, ye, model)
m1 = Minuit(c, a=0, b=0)
m1.migrad()
[23]:
| Migrad | |
| --- | --- |
| FCN = 25.29 (χ²/ndof = 1.4) | Nfcn = 29 |
| EDM = 2.27e-22 (Goal: 0.0002) | |
| Valid Minimum | Below EDM threshold (goal x 10) |
| No parameters at limit | Below call limit |
| Hesse ok | Covariance accurate |

| | Name | Value | Hesse Error | Minos Error- | Minos Error+ | Limit- | Limit+ | Fixed |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | a | 0.99 | 0.04 | | | | | |
| 1 | b | 2.04 | 0.15 | | | | | |

| | a | b |
| --- | --- | --- |
| a | 0.00139 | -0.0037 (-0.658) |
| b | -0.0037 (-0.658) | 0.0226 |
We can also plot the standard visualization manually and add further graphs to the figure.
[24]:
m1.visualize()
plt.plot(c.x, model(c.x, *truth), ls="--", label="truth");
We can also fit a multivariate model; in this case, a plane in 2D.
[25]:
def model2(x_y, a, bx, by):
    x, y = x_y
    return a + bx * x + by * y
# generate a regular grid in x and y
x = np.linspace(-1, 1, 10)
y = np.linspace(-1, 1, 10)
X, Y = np.meshgrid(x, y)
x = X.flatten()
y = Y.flatten()
# model truth
Z = model2((x, y), 1, 2, 3)
# add some noise
rng = np.random.default_rng(1)
Zerr = 1
Z = rng.normal(Z, Zerr)
plt.scatter(x, y, c=Z)
plt.colorbar();
[26]:
c2 = cost.LeastSquares((x, y), Z, Zerr, model2)
m2 = Minuit(c2, 0, 0, 0)
m2.migrad()
[26]:
| Migrad | |
| --- | --- |
| FCN = 71.61 (χ²/ndof = 0.7) | Nfcn = 34 |
| EDM = 5.66e-16 (Goal: 0.0002) | |
| Valid Minimum | Below EDM threshold (goal x 10) |
| No parameters at limit | Below call limit |
| Hesse ok | Covariance accurate |

| | Name | Value | Hesse Error | Minos Error- | Minos Error+ | Limit- | Limit+ | Fixed |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | a | 0.93 | 0.10 | | | | | |
| 1 | bx | 1.87 | 0.16 | | | | | |
| 2 | by | 2.93 | 0.16 | | | | | |

| | a | bx | by |
| --- | --- | --- | --- |
| a | 0.01 | 0.000 | 0.000 |
| bx | 0.000 | 0.0245 | 0.000 |
| by | 0.000 | 0.000 | 0.0245 |
Multivariate fits are difficult to check by eye, so here we use color to indicate the function value.
To guarantee that the plot of the function and the plot of the data use the same color scale, we use the same normalization for pyplot.pcolormesh and pyplot.scatter.
[27]:
xm = np.linspace(-1, 1, 100)
ym = np.linspace(-1, 1, 100)
Xm, Ym = np.meshgrid(xm, ym)
xm = Xm.flatten()
ym = Ym.flatten()
qm = plt.pcolormesh(Xm, Ym, model2((xm, ym), *m2.values).reshape(Xm.shape))
plt.scatter(c2.x[0], c2.x[1], c=c2.y, edgecolors="w", norm=qm.norm)
plt.colorbar();
Robust least-squares
The built-in least-squares function also supports robust fitting with alternative loss functions. See the documentation of iminuit.cost.LeastSquares for details. Users can also pass their own loss functions (a sketch follows the list). The built-in loss functions are:

* linear (default): gives ordinary weighted least-squares
* soft_l1: quadratic loss for small deviations (\(\ll 1\sigma\)), linear loss for large deviations (\(\gg 1\sigma\)), and smooth interpolation in between
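Here is a sketch of a user-defined loss; the Cauchy form is our choice, and we assume, per the LeastSquares documentation, that the callable receives the squared standardized deviations:

def cauchy_loss(z_sqr):
    # log(1 + z^2) grows even more slowly than soft_l1 for large deviations
    return np.log1p(z_sqr)

c_cauchy = cost.LeastSquares(c.x, c.y, c.yerror, model, loss=cauchy_loss)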
Let’s create one outlier and see what happens with ordinary loss.
[28]:
c.y[3] = 3 # generate an outlier
m3 = Minuit(c, a=0, b=0)
m3.migrad()
[28]:
| Migrad | |
| --- | --- |
| FCN = 364.9 (χ²/ndof = 20.3) | Nfcn = 29 |
| EDM = 9.85e-22 (Goal: 0.0002) | |
| Valid Minimum | Below EDM threshold (goal x 10) |
| No parameters at limit | Below call limit |
| Hesse ok | Covariance accurate |

| | Name | Value | Hesse Error | Minos Error- | Minos Error+ | Limit- | Limit+ | Fixed |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | a | 1.23 | 0.04 | | | | | |
| 1 | b | 1.45 | 0.15 | | | | | |

| | a | b |
| --- | --- | --- |
| a | 0.00139 | -0.0037 (-0.658) |
| b | -0.0037 (-0.658) | 0.0226 |
[29]:
m3.visualize()
plt.plot(c.x, model(c.x, 1, 2), ls="--", label="truth");
The result is distorted by the outlier. Note that the errors did not increase! The size of the errors computed by Minuit does not include mismodelling.
We can repair this by switching to the “soft_l1” loss.
[30]:
c.loss = "soft_l1"
m3.migrad()
[30]:
| Migrad | |
| --- | --- |
| FCN = 54.09 (χ²/ndof = 3.0) | Nfcn = 69 |
| EDM = 4.31e-06 (Goal: 0.0002) | time = 0.6 sec |
| Valid Minimum | Below EDM threshold (goal x 10) |
| No parameters at limit | Below call limit |
| Hesse ok | Covariance accurate |

| | Name | Value | Hesse Error | Minos Error- | Minos Error+ | Limit- | Limit+ | Fixed |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | a | 1.00 | 0.05 | | | | | |
| 1 | b | 2.04 | 0.23 | | | | | |

| | a | b |
| --- | --- | --- |
| a | 0.00285 | -0.0086 (-0.707) |
| b | -0.0086 (-0.707) | 0.0524 |
[31]:
m3.visualize()
plt.plot(c.x, model(c.x, *truth), ls="--", label="truth");
The result is almost identical to the previous case without an outlier.
Robust fitting is very useful if the data are contaminated with small amounts of outliers. It comes at a price, however: the uncertainties are in general larger, and the errors computed by Minuit are not correct anymore.
Calculating the parameter uncertainties properly in this case requires a so-called sandwich estimator, which is currently not implemented. As an alternative, one can use the bootstrap to compute parameter uncertainties. We use the resample library to do this.
[32]:
from resample.bootstrap import variance as bvar
def fit(x, y, ye):
    c = cost.LeastSquares(x, y, ye, model, loss="soft_l1")
    m = Minuit(c, a=0, b=0)
    m.migrad()
    return m.values

berr = bvar(fit, c.x, c.y, c.yerror, size=1000, random_state=1) ** 0.5

fig, ax = plt.subplots(1, 2, figsize=(9, 4))
for i, axi in enumerate(ax):
    axi.errorbar(0, m1.values[i], m1.errors[i], fmt="o")
    axi.errorbar(1, m3.values[i], m3.errors[i], fmt="o")
    axi.errorbar(2, m3.values[i], berr[i], fmt="o")
    axi.set_xticks(np.arange(3), ("no outlier", "Minuit, soft_l1", "bootstrap"))
In this case, Minuit’s estimate is similar to the bootstrap estimate, but that is not generally true when the “soft_l1” loss is used.
Robust fits are very powerful when the outliers cannot be removed by other means. If one can identify outliers by other means, it is better to remove them. We manually remove the point (using the mask attribute) and switch back to ordinary loss.
[33]:
c.mask = np.arange(len(c.x)) != 3
c.loss = "linear"
m4 = Minuit(c, a=0, b=0)
m4.migrad()
[33]:
| Migrad | |
| --- | --- |
| FCN = 24.67 (χ²/ndof = 1.5) | Nfcn = 29 |
| EDM = 1.37e-22 (Goal: 0.0002) | |
| Valid Minimum | Below EDM threshold (goal x 10) |
| No parameters at limit | Below call limit |
| Hesse ok | Covariance accurate |

| | Name | Value | Hesse Error | Minos Error- | Minos Error+ | Limit- | Limit+ | Fixed |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | a | 0.98 | 0.04 | | | | | |
| 1 | b | 2.07 | 0.15 | | | | | |

| | a | b |
| --- | --- | --- |
| a | 0.00158 | -0.0041 (-0.678) |
| b | -0.0041 (-0.678) | 0.0238 |
Now the uncertainties are essentially the same as before adding the outlier.
[34]:
fig, ax = plt.subplots(1, 2, figsize=(9, 4))
for i, axi in enumerate(ax):
    axi.errorbar(0, m1.values[i], m1.errors[i], fmt="o")
    axi.errorbar(1, m3.values[i], m3.errors[i], fmt="o")
    axi.errorbar(2, m3.values[i], berr[i], fmt="o")
    axi.errorbar(3, m4.values[i], m4.errors[i], fmt="o")
    axi.set_xticks(
        np.arange(4), ("no outlier", "Minuit, soft_l1", "bootstrap", "outlier removed")
    )