Template fits

In applications we are interested in separating a signal component from background components, we often fit parameteric models to data. Sometimes constructing a parametric model for some component is difficult. In that case, one fits a template instead which may be obtained from simulation or from a calibration sample in which a pure component can be isolated.

The challenge then is to propagate the uncertainty of the template into the result. The template is now also estimated from a sample (be it simulated or a calibration sample), and the uncertainty associated to that can be substantial. We investigate different approaches for template fits, including the Barlow-Beeston and Barlow-Beeston-lite methods.

Note: This work has been published: H. Dembinski, A. Abdelmotteleb, Eur.Phys.J.C 82 (2022) 11, 1043

[ ]:

%config InlineBackend.figure_formats = ['svg']
from iminuit import Minuit
from iminuit.cost import poisson_chi2, Template
import numpy as np
from scipy.stats import norm, truncexpon
from scipy.optimize import minimize
import matplotlib.pyplot as plt

As a toy example, we generate a mixture of two components: a normally distributed signal and exponentially distributed background.

[ ]:

def generate(rng, nmc, truth, bins):
    xe = np.linspace(0, 2, bins + 1)
    b = np.diff(truncexpon(1, 0, 2).cdf(xe))
    s = np.diff(norm(1, 0.1).cdf(xe))
    n = rng.poisson(b * truth[0]) + rng.poisson(s * truth[1])
    t = np.array([rng.poisson(b * nmc), rng.poisson(s * nmc)])
    return xe, n, t


rng = np.random.default_rng(1)
truth = 750, 250
xe, n, t = generate(rng, 500, truth, 15)

Data is visualized on the left-hand side. The templates are shown on the right-hand side. To show the effect of uncertainties in the template, this example intentially uses templates with poor statistical resolution.

[ ]:

fig, ax = plt.subplots(1, 2, figsize=(10, 4), sharex=True)
ax[0].stairs(n, xe, fill=True, color="k", alpha=0.5, label="data")
for i, ti in enumerate(t):
    ax[1].stairs(ti, xe, fill=True, alpha=0.5, label=f"template {i}")

../_images/notebooks_template_fits_5_0.svg

Bootstrapping template uncertainties

Bootstrapping is a general purpose technique to include uncertainties backed up by bootstrap theory, so it can be applied to this problem. We perform a standard fit and pretend that the templates have no uncertainties. Then, we repeat this fit many times with templates that are fluctuated around the actual values assuming a Poisson distribution.

Here is the cost function.

[ ]:

def cost(yields):
    mu = 0
    for y, c in zip(yields, t):
        mu += y * c / np.sum(c)
    r = poisson_chi2(n, mu)
    return r


cost.errordef = Minuit.LEAST_SQUARES
cost.ndata = np.prod(n.shape)

starts = np.ones(2)
m = Minuit(cost, starts)
m.limits = (0, None)
m.migrad()
m.hesse()

Migrad
FCN = -991.9 (χ²/ndof = -76.3)	Nfcn = 119
EDM = 0.000125 (Goal: 0.0002)
Valid Minimum	Below EDM threshold (goal x 10)
No parameters at limit	Below call limit
Hesse ok	Covariance accurate

	Name	Value	Hesse Error	Minos Error-	Minos Error+	Limit-	Limit+	Fixed
0	x0	782	31			0
1	x1	201	20			0

	x0	x1
x0	985	-0.2e3 (-0.322)
x1	-0.2e3 (-0.322)	404

The uncertainties reported by the fit correspond to the uncertainty in the data, but not the uncertainty in the templates. The chi2/ndof is also very large, since the uncertainties in the template are not considered in the fit.

We bootstrap the templates 1000 times and compute the covariance of the fitted results.

[ ]:

b = 100
rng = np.random.default_rng(1)
pars = []
for ib in range(b):
    ti = rng.poisson(t)

    def cost(yields):
        mu = 0
        for y, c in zip(yields, ti):
            mu += y * c / np.sum(c)
        r = poisson_chi2(n, mu)
        return r

    mi = Minuit(cost, m.values[:])
    mi.errordef = Minuit.LEAST_SQUARES
    mi.limits = (0, None)
    mi.strategy = 0
    mi.migrad()
    assert mi.valid
    pars.append(mi.values[:])

cov2 = np.cov(np.transpose(pars), ddof=1)

We print the uncertainties from the different stages and the correlation between the two yields.

To obtain the full error, we add the independent covariance matrices from the original fit and the bootstrap.

[ ]:

cov1 = m.covariance

for title, cov in zip(("fit", "bootstrap", "fit+bootstrap"), (cov1, cov2, cov1 + cov2)):
    print(title)
    for label, p, e in zip(("b", "s"), m.values, np.diag(cov) ** 0.5):
        print(f"  {label} {p:.0f} +- {e:.0f}")
    print(f"  correlation {cov[0, 1] / np.prod(np.diag(cov)) ** 0.5:.2f}")

fit
  b 782 +- 31
  s 201 +- 20
  correlation -0.32
bootstrap
  b 782 +- 18
  s 201 +- 18
  correlation -1.00
fit+bootstrap
  b 782 +- 36
  s 201 +- 27
  correlation -0.53

The bootstrapped template errors are much larger than the fit errors in this case, since the sample used to generate the templates is much smaller than the data sample.

The bootstrapped errors for both yields are nearly equal (they become exactly equal if the template sample is large) and the correlation is close to -1 (and becomes exactly -1 in large samples). This is expected, since the data sample is fixed in each iteration. Under these conditions, a change in the templates can only increase the yield of one component at an equal loss for the other component.

Template fit with nuisance parameters

As described in Barlow and Beeston, Comput.Phys.Commun. 77 (1993) 219-228, the correct treatment from first principles is to write down the likelihood function for this case, in which the observed values and unknown parameters are clearly stated. The insight is that the true contents of the bins for the templates are unknown and we need to introduce a nuisance parameter for each bin entry in the template. The combined likelihood for the problem is then combines the estimation of the template yields with the estimation of unknown templates.

This problem can be handled straight-forwardly with Minuit, but it leads to the introduction of a large number of nuisance parameters, one for each entry in each template. We again write a cost function for this case (here a class for convenience).

As a technical detail, it is necessary to increase the call limit in Migrad for the fit to fully converge, since the limit set by Minuit’s default heuristic is too tight for this application.

[ ]:

class BB:
    def __init__(self, xe, n, t):
        self.xe = xe
        self.data = n, t

    def _pred(self, par):
        bins = len(self.xe) - 1
        yields = par[:2]
        nuisances = np.array(par[2:])
        b = nuisances[:bins]
        s = nuisances[bins:]
        mu = 0
        for y, c in zip(yields, (b, s)):
            mu += y * np.array(c) / np.sum(c)
        return mu, b, s

    def __call__(self, par):
        n, t = self.data
        mu, b, s = self._pred(par)
        r = poisson_chi2(n, mu) + poisson_chi2(t[0], b) + poisson_chi2(t[1], s)
        return r

    @property
    def ndata(self):
        n, t = self.data
        return np.prod(n.shape) + np.prod(t.shape)

    def visualize(self, args):
        n, t = self.data
        ne = n**0.5
        xe = self.xe
        cx = 0.5 * (xe[1:] + xe[:-1])
        plt.errorbar(cx, n, ne, fmt="ok")
        mu = 0
        mu_var = 0
        for y, c in zip(args[:2], t):
            f = 1 / np.sum(c)
            mu += c * y * f
            mu_var += c * (f * y) ** 2
        mu_err = mu_var**0.5
        plt.stairs(mu + mu_err, xe, baseline=mu - mu_err, fill=True, color="C0")
        # plt.stairs(mu, xe, color="C0")


m1 = Minuit(BB(xe, n, t), np.concatenate([truth, t[0], t[1]]))
m1.limits = (0, None)
m1.migrad(ncall=100000)
m1.hesse()

Migrad
FCN = -2138 (χ²/ndof = -164.5)	Nfcn = 1859
EDM = 1.87e-05 (Goal: 0.0002)
Valid Minimum	Below EDM threshold (goal x 10)
SOME parameters at limit	Below call limit
Hesse ok	Covariance accurate

	Name	Value	Hesse Error	Minos Error-	Minos Error+	Limit-	Limit+	Fixed
0	x0	780	40			0
1	x1	199	26			0
2	x2	47	5			0
3	x3	50	5			0
4	x4	48	5			0
5	x5	43	4			0
6	x6	43	4			0
7	x7	39	4			0
8	x8	42	5			0
9	x9	29	5			0
10	x10	33	5			0
11	x11	28	4			0
12	x12	24.6	3.3			0
13	x13	21.5	3.0			0
14	x14	22.7	3.1			0
15	x15	23.1	3.2			0
16	x16	21.9	3.1			0
17	x17	0.0	0.4			0
18	x18	0.0	0.4			0
19	x19	0.0	0.4			0
20	x20	0.0	0.4			0
21	x21	1.0	0.9			0
22	x22	7.8	2.7			0
23	x23	109	10			0
24	x24	255	16			0
25	x25	111	10			0
26	x26	13	4			0
27	x27	0.0	0.4			0
28	x28	0.0	0.4			0
29	x29	0.0	0.4			0
30	x30	0.0	0.4			0
31	x31	0.0	0.4			0

	x0	x1	x2	x3	x4	x5	x6	x7	x8	x9	x10	x11	x12	x13	x14	x15	x16	x17	x18	x19	x20	x21	x22	x23	x24	x25	x26	x27	x28	x29	x30	x31
x0	1.26e+03	-0.5e3 (-0.516)	-17 (-0.104)	-18 (-0.106)	-18 (-0.105)	-16 (-0.100)	-15 (-0.096)	-10 (-0.065)	40 (0.213)	60 (0.331)	37 (0.221)	-2 (-0.012)	-9 (-0.078)	-8 (-0.073)	-8 (-0.075)	-8 (-0.075)	-8 (-0.074)	-0	-0	-0	-0	-0.5 (-0.013)	-3 (-0.028)	-0.03e3 (-0.070)	0.05e3 (0.085)	-0.01e3 (-0.034)	-6 (-0.050)	-0	-0	-0	-0	-0
x1	-0.5e3 (-0.516)	675	17 (0.142)	18 (0.145)	18 (0.143)	16 (0.137)	15 (0.131)	10 (0.088)	-40 (-0.292)	-60 (-0.453)	-37 (-0.302)	2 (0.016)	9 (0.106)	8 (0.100)	8 (0.102)	8 (0.103)	8 (0.100)	0	0	0	0	0.5 (0.018)	3 (0.039)	0.03e3 (0.095)	-0.05e3 (-0.116)	0.01e3 (0.047)	6 (0.068)	0	0	0	0	0
x2	-17 (-0.104)	17 (0.142)	22	3 (0.150)	3 (0.148)	3 (0.141)	3 (0.140)	3 (0.126)	1 (0.036)	-1 (-0.024)	0 (0.022)	2 (0.093)	2 (0.110)	1 (0.103)	2 (0.106)	2 (0.107)	1 (0.104)	-0	0	0	0	0 (0.004)	0 (0.008)	1 (0.019)	-2 (-0.023)	0 (0.009)	0 (0.014)	0	0	0	0	0
x3	-18 (-0.106)	18 (0.145)	3 (0.150)	23.3	3 (0.152)	3 (0.144)	3 (0.143)	3 (0.128)	1 (0.037)	-1 (-0.025)	1 (0.022)	2 (0.095)	2 (0.112)	2 (0.105)	2 (0.108)	2 (0.109)	2 (0.106)	0	-0	0	0	0.0 (0.004)	0 (0.008)	1 (0.020)	-2 (-0.024)	0 (0.010)	0 (0.014)	0	0	0	0	0
x4	-18 (-0.105)	18 (0.143)	3 (0.148)	3 (0.152)	22.7	3 (0.143)	3 (0.141)	3 (0.127)	1 (0.037)	-1 (-0.024)	0 (0.022)	2 (0.094)	2 (0.111)	2 (0.104)	2 (0.107)	2 (0.108)	2 (0.105)	0	0	-0	0	0.0 (0.004)	0 (0.008)	1 (0.019)	-2 (-0.024)	0 (0.010)	0 (0.014)	0	0	0	0	0
x5	-16 (-0.100)	16 (0.137)	3 (0.141)	3 (0.144)	3 (0.143)	19.9	3 (0.135)	2 (0.121)	1 (0.035)	-1 (-0.023)	0 (0.021)	1 (0.090)	2 (0.106)	1 (0.099)	1 (0.102)	1 (0.103)	1 (0.100)	0	0	0	-0	0.0 (0.004)	0 (0.008)	1 (0.018)	-2 (-0.022)	0 (0.009)	0 (0.013)	0	0	0	0	0
x6	-15 (-0.096)	15 (0.131)	3 (0.140)	3 (0.143)	3 (0.141)	3 (0.135)	19.9	2 (0.120)	1 (0.036)	-0 (-0.021)	0 (0.022)	1 (0.089)	2 (0.105)	1 (0.098)	1 (0.101)	1 (0.102)	1 (0.099)	0	0	0	0	-0.1 (-0.031)	0 (0.007)	1 (0.019)	-1 (-0.021)	0 (0.010)	0 (0.013)	0	0	0	0	0
x7	-10 (-0.065)	10 (0.088)	3 (0.126)	3 (0.128)	3 (0.127)	2 (0.121)	2 (0.120)	18.1	1 (0.042)	-0 (-0.006)	1 (0.029)	1 (0.082)	1 (0.094)	1 (0.088)	1 (0.090)	1 (0.091)	1 (0.089)	0	0	0	0	0.0 (0.003)	-1 (-0.086)	1 (0.019)	-1 (-0.008)	1 (0.012)	0 (0.012)	0	0	0	0	0
x8	40 (0.213)	-40 (-0.292)	1 (0.036)	1 (0.037)	1 (0.037)	1 (0.035)	1 (0.036)	1 (0.042)	27.6	4 (0.137)	3 (0.113)	1 (0.048)	0 (0.027)	0 (0.025)	0 (0.026)	0 (0.026)	0 (0.026)	-0	0	0	0	0.0 (0.001)	0 (0.006)	-12 (-0.217)	9 (0.109)	2 (0.046)	0 (0.004)	-0	-0	0	0	0
x9	60 (0.331)	-60 (-0.453)	-1 (-0.024)	-1 (-0.025)	-1 (-0.024)	-1 (-0.023)	-0 (-0.021)	-0 (-0.006)	4 (0.137)	26.2	3 (0.138)	0 (0.017)	-0 (-0.018)	-0 (-0.017)	-0 (-0.017)	-0 (-0.017)	-0 (-0.017)	-0	0	-0	-0	-0.0 (-0.000)	0 (0.004)	2 (0.036)	-5 (-0.060)	3 (0.056)	-0 (-0.002)	-0	-0	-0	0	0
x10	37 (0.221)	-37 (-0.302)	0 (0.022)	1 (0.022)	0 (0.022)	0 (0.021)	0 (0.022)	1 (0.029)	3 (0.113)	3 (0.138)	22.2	1 (0.038)	0 (0.016)	0 (0.015)	0 (0.016)	0 (0.016)	0 (0.015)	-0	0	0	0	0.0 (0.001)	0 (0.005)	2 (0.032)	8 (0.110)	-10 (-0.204)	0 (0.003)	-0	-0	-0	0	0
x11	-2 (-0.012)	2 (0.016)	2 (0.093)	2 (0.095)	2 (0.094)	1 (0.090)	1 (0.089)	1 (0.082)	1 (0.048)	0 (0.017)	1 (0.038)	13	1 (0.070)	1 (0.066)	1 (0.067)	1 (0.068)	1 (0.066)	0	0	0	0	0 (0.002)	0 (0.006)	1 (0.019)	1 (0.011)	1 (0.016)	-2 (-0.150)	0	0	0	0	0
x12	-9 (-0.078)	9 (0.106)	2 (0.110)	2 (0.112)	2 (0.111)	2 (0.106)	2 (0.105)	1 (0.094)	0 (0.027)	-0 (-0.018)	0 (0.016)	1 (0.070)	10.7	1 (0.077)	1 (0.079)	1 (0.080)	1 (0.078)	0	0	0	0	0.0 (0.003)	0 (0.006)	0 (0.014)	-1 (-0.017)	0 (0.007)	0 (0.010)	-0	0	0	0	0
x13	-8 (-0.073)	8 (0.100)	1 (0.103)	2 (0.105)	2 (0.104)	1 (0.099)	1 (0.098)	1 (0.088)	0 (0.025)	-0 (-0.017)	0 (0.015)	1 (0.066)	1 (0.077)	9.19	1 (0.074)	1 (0.075)	1 (0.073)	0	0	0	0	0.0 (0.003)	0 (0.005)	0 (0.013)	-1 (-0.016)	0 (0.007)	0 (0.010)	0	-0	0	0	0
x14	-8 (-0.075)	8 (0.102)	2 (0.106)	2 (0.108)	2 (0.107)	1 (0.102)	1 (0.101)	1 (0.090)	0 (0.026)	-0 (-0.017)	0 (0.016)	1 (0.067)	1 (0.079)	1 (0.074)	9.75	1 (0.077)	1 (0.075)	0	0	0	0	0.0 (0.003)	0 (0.006)	0 (0.014)	-1 (-0.017)	0 (0.007)	0 (0.010)	0	0	-0	0	0
x15	-8 (-0.075)	8 (0.103)	2 (0.107)	2 (0.109)	2 (0.108)	1 (0.103)	1 (0.102)	1 (0.091)	0 (0.026)	-0 (-0.017)	0 (0.016)	1 (0.068)	1 (0.080)	1 (0.075)	1 (0.077)	9.93	1 (0.076)	0	0	0	0	0.0 (0.003)	0 (0.006)	0 (0.014)	-1 (-0.017)	0 (0.007)	0 (0.010)	0	0	0	-0	0
x16	-8 (-0.074)	8 (0.100)	1 (0.104)	2 (0.106)	2 (0.105)	1 (0.100)	1 (0.099)	1 (0.089)	0 (0.026)	-0 (-0.017)	0 (0.015)	1 (0.066)	1 (0.078)	1 (0.073)	1 (0.075)	1 (0.076)	9.37	0	0	0	0	0.0 (0.003)	0 (0.006)	0 (0.014)	-1 (-0.017)	0 (0.007)	0 (0.010)	0	0	0	0	-0
x17	-0	0	-0	0	0	0	0	0	-0	-0	-0	0	0	0	0	0	0	0	0	0	0	0	0	0	-0	-0	0	0	0	0	0	0
x18	-0	0	0	-0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
x19	-0	0	0	0	-0	0	0	0	0	-0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	-0	0	0	0	0	0	0	0
x20	-0	0	0	0	0	-0	0	0	0	-0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	-0	0	0	0	0	0	0	0
x21	-0.5 (-0.013)	0.5 (0.018)	0 (0.004)	0.0 (0.004)	0.0 (0.004)	0.0 (0.004)	-0.1 (-0.031)	0.0 (0.003)	0.0 (0.001)	-0.0 (-0.000)	0.0 (0.001)	0 (0.002)	0.0 (0.003)	0.0 (0.003)	0.0 (0.003)	0.0 (0.003)	0.0 (0.003)	0	0	0	0	0.99	0.0	0.0 (0.001)	-0.0 (-0.000)	0.0 (0.000)	0.0	0	0	0	0	0
x22	-3 (-0.028)	3 (0.039)	0 (0.008)	0 (0.008)	0 (0.008)	0 (0.008)	0 (0.007)	-1 (-0.086)	0 (0.006)	0 (0.004)	0 (0.005)	0 (0.006)	0 (0.006)	0 (0.005)	0 (0.006)	0 (0.006)	0 (0.006)	0	0	0	0	0.0	7.53	0 (0.002)	0 (0.003)	0 (0.002)	0 (0.001)	0	0	0	0	0
x23	-0.03e3 (-0.070)	0.03e3 (0.095)	1 (0.019)	1 (0.020)	1 (0.019)	1 (0.018)	1 (0.019)	1 (0.019)	-12 (-0.217)	2 (0.036)	2 (0.032)	1 (0.019)	0 (0.014)	0 (0.013)	0 (0.014)	0 (0.014)	0 (0.014)	0	0	0	0	0.0 (0.001)	0 (0.002)	103	0 (0.028)	0 (0.013)	0 (0.002)	0	0	0	0	0
x24	0.05e3 (0.085)	-0.05e3 (-0.116)	-2 (-0.023)	-2 (-0.024)	-2 (-0.024)	-2 (-0.022)	-1 (-0.021)	-1 (-0.008)	9 (0.109)	-5 (-0.060)	8 (0.110)	1 (0.011)	-1 (-0.017)	-1 (-0.016)	-1 (-0.017)	-1 (-0.017)	-1 (-0.017)	-0	0	-0	-0	-0.0 (-0.000)	0 (0.003)	0 (0.028)	244	0.01e3 (0.045)	-0 (-0.002)	-0	-0	-0	-0	0
x25	-0.01e3 (-0.034)	0.01e3 (0.047)	0 (0.009)	0 (0.010)	0 (0.010)	0 (0.009)	0 (0.010)	1 (0.012)	2 (0.046)	3 (0.056)	-10 (-0.204)	1 (0.016)	0 (0.007)	0 (0.007)	0 (0.007)	0 (0.007)	0 (0.007)	-0	0	0	0	0.0 (0.000)	0 (0.002)	0 (0.013)	0.01e3 (0.045)	103	0 (0.001)	-0	-0	-0	0	0
x26	-6 (-0.050)	6 (0.068)	0 (0.014)	0 (0.014)	0 (0.014)	0 (0.013)	0 (0.013)	0 (0.012)	0 (0.004)	-0 (-0.002)	0 (0.003)	-2 (-0.150)	0 (0.010)	0 (0.010)	0 (0.010)	0 (0.010)	0 (0.010)	0	0	0	0	0.0	0 (0.001)	0 (0.002)	-0 (-0.002)	0 (0.001)	13.2	0	0	0	0	0
x27	-0	0	0	0	0	0	0	0	-0	-0	-0	0	-0	0	0	0	0	0	0	0	0	0	0	0	-0	-0	0	0	0	0	0	0
x28	-0	0	0	0	0	0	0	0	-0	-0	-0	0	0	-0	0	0	0	0	0	0	0	0	0	0	-0	-0	0	0	0	0	0	0
x29	-0	0	0	0	0	0	0	0	0	-0	-0	0	0	0	-0	0	0	0	0	0	0	0	0	0	-0	-0	0	0	0	0	0	0
x30	-0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	-0	0	0	0	0	0	0	0	0	-0	0	0	0	0	0	0	0
x31	-0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	-0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0

The result of this fit is comparable to the bootstrap method for this example, but the chi2/ndof is now reasonable and the uncertainties are correct without further work. This method should perform better than the bootstrap method, if the count per bin in the templates is small.

Another advantage is of this technique is that one can profile over the likelihood to obtain a 2D confidence regions, which is not possible with the bootstrap technique.

[ ]:

m1.draw_mncontour("x0", "x1");

../_images/notebooks_template_fits_16_0.svg

Before moving on, we briefly explore a possible refinement of the previous method, which is to hide the nuisance parameters from Minuit with a nested fit. It turns out that this technique is not an improvement, but it is useful to show that explicitly.

The idea is to construct an outer cost function, which only has the yields as parameters. Inside the outer cost function, the best nuisance parameters are found for the current yields with an inner cost function. Technically, this is achieved by calling a minimizer on the inner cost function at every call to the outer cost function.

Technical detail: It is important here to adjust Minuit’s expectation of how accurate the cost function is computed. Usually, Minuit performs its internal calculations under the assumption that the cost function is accurate to machine precision. This is usually not the case when a minimizer is used internally to optimize the inner function. We perform the internal minimization with SciPy, which allows us to set the tolerance. We set it here to 1e-8, which is sufficient for this problem and saves a bit of time on the internal minimisation. We then instruct Minuit to expect only this precision.

[ ]:

precision = 1e-8


def cost(yields):
    bins = len(n)

    def inner(nuisance):
        b = nuisance[:bins]
        s = nuisance[bins:]
        mu = 0
        for y, c in zip(yields, (b, s)):
            mu += y * c / np.sum(c)
        r = poisson_chi2(n, mu) + poisson_chi2(t[0], b) + poisson_chi2(t[1], s)
        return r

    bounds = np.zeros((2 * bins, 2))
    bounds[:, 1] = np.inf
    r = minimize(inner, np.ravel(t), bounds=bounds, tol=precision)
    assert r.success
    return r.fun


cost.errordef = Minuit.LEAST_SQUARES
cost.ndata = np.prod(n.shape)

m2 = Minuit(cost, truth)
m2.precision = precision
m2.limits = (0, None)
m2.migrad()
m2.hesse()

Migrad
FCN = -2140 (χ²/ndof = -164.6)	Nfcn = 55
EDM = 3.28e-05 (Goal: 0.0002)
Valid Minimum	Below EDM threshold (goal x 10)
No parameters at limit	Below call limit
Hesse ok	Covariance accurate

	Name	Value	Hesse Error	Minos Error-	Minos Error+	Limit-	Limit+	Fixed
0	x0	784	35			0
1	x1	199	26			0

	x0	x1
x0	1.22e+03	-0.5e3 (-0.506)
x1	-0.5e3 (-0.506)	665

We obtain the exact same result as expected, but the runtime is much longer (more than a factor 10), which disfavors this technique compared to the straight-forward fit. The minimization is not as efficient, because Minuit cannot exploit correlations between the internal and the external parameters that allow it to converge it faster when it sees all parameters at once.

Template fits

The implementation described by Barlow and Beeston, Comput.Phys.Commun. 77 (1993) 219-228 solves the problem similarly to the nested fit described above, but the solution to the inner problem is found with an efficient algorithm.

The Barlow-Beeston approach still requires numerically solving a non-linear equation per bin. Several authors tried to replace the stop with approximations.

Conway, PHYSTAT 2011, https://arxiv.org/abs/1103.0354 uses an approximation in which the optimal nuisance parameters can be found by bin-by-bin by solving a quadratic equation which has only one allowed solution.
C.A. Argüelles, A. Schneider, T. Yuan, JHEP 06 (2019) 030 use a Bayesian treatment in which the uncertainty from the finite size of the simulation sample is modeled with a prior (which is itself a posterior conditioned on the simulation result). A closed formula for the likelihood can be derived.
H. Dembinski, A. Abdelmotteleb, Eur.Phys.J.C 82 (2022) 11, 1043 derived an approximation similar to Conway, but which treats data and simulation symmetrically.

All three methods are implemented in the built-in Template cost function (see documentation for details).

[ ]:

c = Template(n, xe, t, method="jsc")  # Conway
m3 = Minuit(c, *truth)
m3.limits = (0, None)
m3.migrad()
m3.hesse()

Migrad
FCN = 8.447 (χ²/ndof = 0.6)	Nfcn = 46
EDM = 7.7e-06 (Goal: 0.0002)
Valid Minimum	Below EDM threshold (goal x 10)
No parameters at limit	Below call limit
Hesse ok	Covariance accurate

	Name	Value	Hesse Error	Minos Error-	Minos Error+	Limit-	Limit+	Fixed
0	x0	790	50			0
1	x1	197	28			0

	x0	x1
x0	2.56e+03	-0.5e3 (-0.358)
x1	-0.5e3 (-0.358)	764

[ ]:

c = Template(n, xe, t, method="asy")  # Argüelles, Schneider, Yuan
m4 = Minuit(c, *truth)
m4.limits = (0, None)
m4.migrad()
m4.hesse()

Migrad
FCN = 54.95	Nfcn = 46
EDM = 5.98e-07 (Goal: 0.0001)
Valid Minimum	Below EDM threshold (goal x 10)
No parameters at limit	Below call limit
Hesse ok	Covariance accurate

	Name	Value	Hesse Error	Minos Error-	Minos Error+	Limit-	Limit+	Fixed
0	x0	760	50			0
1	x1	203	27			0

	x0	x1
x0	2.25e+03	-0.4e3 (-0.341)
x1	-0.4e3 (-0.341)	739

[ ]:

c = Template(n, xe, t, method="da")  # Dembinski, Abdelmotteleb
m5 = Minuit(c, *truth)
m5.limits = (0, None)
m5.migrad()
m5.hesse()

Migrad
FCN = 8.503 (χ²/ndof = 0.7)	Nfcn = 45
EDM = 9.17e-06 (Goal: 0.0002)
Valid Minimum	Below EDM threshold (goal x 10)
No parameters at limit	Below call limit
Hesse ok	Covariance accurate

	Name	Value	Hesse Error	Minos Error-	Minos Error+	Limit-	Limit+	Fixed
0	x0	780	50			0
1	x1	199	28			0

	x0	x1
x0	2.45e+03	-0.5e3 (-0.351)
x1	-0.5e3 (-0.351)	756

[ ]:

for title, m in zip(("full fit", "T(JSC)", "T(ASY)", "T(DA)"), (m1, m3, m4, m5)):
    print(title)
    cov = m.covariance
    for label, p, e in zip(("x0", "x1"), m.values, np.diag(cov) ** 0.5):
        print(f"  {label} {p:.0f} +- {e:.0f}")
    print(f"  correlation {cov[0, 1] / (cov[0, 0] * cov[1, 1]) ** 0.5:.2f}")

full fit
  x0 784 +- 35
  x1 199 +- 26
  correlation -0.52
T(JSC)
  x0 793 +- 51
  x1 197 +- 28
  correlation -0.36
T(ASY)
  x0 760 +- 47
  x1 203 +- 27
  correlation -0.34
T(DA)
  x0 784 +- 49
  x1 199 +- 27
  correlation -0.35

The yields found by the approximate Template methods (T) differ from those found with the exact Barlow-Beeston likelihood (BB) method, because the likelihoods are different. In this particular case, the uncertainty for the signal estimated by the Template methods is larger, but this not the case in general.

The difference shows up in particular in the 68 % confidence regions.

[ ]:

c1 = m1.mncontour("x0", "x1")
c3 = m3.mncontour("x0", "x1")
c4 = m4.mncontour("x0", "x1")
c5 = m5.mncontour("x0", "x1")
plt.plot(c1[:, 0], c1[:, 1], label="BB")
plt.plot(c3[:, 0], c3[:, 1], label="T(JSC)")
plt.plot(c4[:, 0], c4[:, 1], label="T(ASY)")
plt.plot(c5[:, 0], c4[:, 1], label="T(DA)")
plt.xlabel("background yield")
plt.ylabel("signal yield")
plt.legend();

../_images/notebooks_template_fits_26_0.svg

In this particular example, the BB method produces the smallest uncertainty for the background yield. The uncertainty for the signal yield is similar for all methods.

In general, the approximate methods perform well, the T(DA) method in particular is very close to the performance of the exact BB likelihood. The T(DA) method also has the advantage that it can be used with weighted data and/or weighted simulated samples.

For an in-depth comparison of the four methods, see H. Dembinski, A. Abdelmotteleb, Eur.Phys.J.C 82 (2022) 11, 1043.