Introduction#

Measurements in High Energy Physics (HEP) rely on determining the compatibility of observed collision events with theoretical predictions. The relationship between them is often formalised in a statistical model f(x|ϕ) describing the probability of data x given model parameters ϕ. Given observed data, the likelihood L(ϕ) then serves as the basis to test hypotheses on the parameters ϕ. For measurements based on binned data (histograms), the HistFactory family of statistical models has been widely used both in Standard Model measurements [intro-4] and in searches for new physics [intro-5]. This package presents a declarative, plain-text format for describing HistFactory-based likelihoods, intended for reinterpretation and long-term preservation in analysis data repositories such as HEPData [intro-3].

HistFactory#

Statistical models described using HistFactory [intro-2] center around the simultaneous measurement of disjoint binned distributions (channels) observed as event counts n. For each channel, the overall expected event rate [1] is the sum over a number of physics processes (samples). The sample rates may be subject to parametrised variations, both to express the effect of free parameters η [2] and to account for systematic uncertainties as a function of constrained parameters χ. The degree to which the latter can cause a deviation of the expected event rates from the nominal rates is limited by constraint terms. In a frequentist framework these constraint terms can be viewed as auxiliary measurements with additional global observable data a, which paired with the channel data n completes the observation x=(n,a). In addition to the partition of the full parameter set into free and constrained parameters ϕ=(η,χ), a separate partition ϕ=(ψ,θ) will be useful in the context of hypothesis testing, where a subset of the parameters are declared parameters of interest ψ and the remaining ones as nuisance parameters θ.

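$$
f(x \mid \phi) = f(x \mid \overbrace{\eta}^{\text{free}}, \underbrace{\chi}_{\text{constrained}}) = f(x \mid \overbrace{\psi}^{\text{parameters of interest}}, \underbrace{\theta}_{\text{nuisance parameters}}) \tag{1}
$$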

Thus, the overall structure of a HistFactory probability model is a product of the analysis-specific model term describing the measurements of the channels and the analysis-independent set of constraint terms:

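$$
f(n, a \mid \eta, \chi) = \underbrace{\prod_{c \in \text{channels}} \prod_{b \in \text{bins}_c} \mathrm{Pois}\left(n_{cb} \mid \nu_{cb}(\eta, \chi)\right)}_{\text{simultaneous measurement of multiple channels}} \; \underbrace{\prod_{\chi \in \boldsymbol{\chi}} c_\chi(a_\chi \mid \chi)}_{\text{constraint terms for auxiliary measurements}}, \tag{2}
$$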

where, within a certain integrated luminosity, we observe $n_{cb}$ events given the expected rate of events $\nu_{cb}(\eta, \chi)$ as a function of the unconstrained parameters η and the constrained parameters χ. Each constrained parameter χ has a corresponding one-dimensional constraint term $c_\chi(a_\chi \mid \chi)$, in which the auxiliary data $a_\chi$ constrain that parameter. The event rates $\nu_{cb}$ are defined as
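The product structure of equation (2) can be sketched as a sum of log-terms. This is a minimal illustration using only the Python standard library, not the package's implementation: the function names, the `expected_rates` callable, the toy channel `"SR"`, and the choice of a unit-width Gaussian for every constraint term are assumptions made for the example.

```python
import math

def poisson_logpmf(n, rate):
    """log Pois(n | rate) for an integer count n >= 0 and rate > 0."""
    return n * math.log(rate) - rate - math.lgamma(n + 1)

def gaussian_logpdf(a, mean, sigma):
    """log Gaus(a | mean, sigma)."""
    z = (a - mean) / sigma
    return -0.5 * z * z - math.log(sigma * math.sqrt(2.0 * math.pi))

def log_model(n, a, expected_rates, eta, chi):
    """log f(n, a | eta, chi) with the structure of equation (2).

    n: dict mapping channel name -> list of observed counts per bin
    a: dict mapping constrained-parameter name -> auxiliary datum
    expected_rates: hypothetical callable (eta, chi) -> dict of per-bin rates
    """
    nu = expected_rates(eta, chi)
    # Main term: one Poisson factor per channel and bin.
    main = sum(
        poisson_logpmf(n_cb, nu_cb)
        for c in n
        for n_cb, nu_cb in zip(n[c], nu[c])
    )
    # Constraint terms: assumed here to be unit-width Gaussians for all chi.
    constraints = sum(gaussian_logpdf(a[name], chi[name], 1.0) for name in chi)
    return main + constraints

def toy_rates(eta, chi):
    # Hypothetical two-bin channel: signal scaled by the free parameter mu,
    # background scaled by an assumed 10% normalisation effect per unit alpha.
    signal, background = [5.0, 10.0], [50.0, 60.0]
    scale = 1.0 + 0.1 * chi["alpha"]
    return {"SR": [eta["mu"] * s + scale * b for s, b in zip(signal, background)]}

n = {"SR": [53, 65]}
a = {"alpha": 0.0}
value = log_model(n, a, toy_rates, {"mu": 1.0}, {"alpha": 0.0})
```

Working in log space keeps the product over many bins numerically stable. Moving `alpha` away from its auxiliary measurement lowers `value` through the constraint term even before the Poisson terms are considered.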

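$$
\nu_{cb}(\phi) = \sum_{s \in \text{samples}} \nu_{scb}(\eta, \chi) = \sum_{s \in \text{samples}} \underbrace{\left(\prod_{\kappa \in \boldsymbol{\kappa}} \kappa_{scb}(\eta, \chi)\right)}_{\text{multiplicative modifiers}} \left(\nu^0_{scb}(\eta, \chi) + \underbrace{\sum_{\Delta \in \boldsymbol{\Delta}} \Delta_{scb}(\eta, \chi)}_{\text{additive modifiers}}\right). \tag{3}
$$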

The total rates are the sum over the sample rates $\nu_{scb}$, each determined from a nominal rate $\nu^0_{scb}$ and a set of multiplicative and additive rate modifiers, denoted $\kappa(\phi)$ and $\Delta(\phi)$ respectively. Each modifier is a function of (usually a single) model parameter. Starting from constant nominal rates, one can derive the per-bin event rate by applying all rate modifications of each sample as shown in (3).
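The rate calculation in (3) can be written as a short loop. The following is a sketch under stated assumptions: the dictionary layout of a sample (`nominal`, `multiplicative`, `additive`), the modifier call signature `(params, bin_index)`, and the toy numbers are all hypothetical and chosen only to mirror the formula.

```python
def expected_rate(samples, params):
    """Per-bin rate nu_cb of one channel, following equation (3).

    Each sample contributes (product of kappa) * (nominal + sum of Delta).
    """
    n_bins = len(samples[0]["nominal"])
    total = [0.0] * n_bins
    for sample in samples:
        for b in range(n_bins):
            factor = 1.0
            for kappa in sample["multiplicative"]:
                factor *= kappa(params, b)
            shift = sum(delta(params, b) for delta in sample["additive"])
            total[b] += factor * (sample["nominal"][b] + shift)
    return total

# Toy channel: a signal scaled by a free factor mu, and a background with
# one additive shape shift that is linear in alpha (an assumed form).
samples = [
    {
        "nominal": [5.0, 10.0],
        "multiplicative": [lambda p, b: p["mu"]],
        "additive": [],
    },
    {
        "nominal": [50.0, 60.0],
        "multiplicative": [],
        "additive": [lambda p, b: p["alpha"] * [2.0, -3.0][b]],
    },
]

rates = expected_rate(samples, {"mu": 2.0, "alpha": 1.0})
print(rates)  # [62.0, 77.0]
```

Note that the additive shifts are applied to the nominal rate before the multiplicative factors, exactly as the bracketing in (3) prescribes.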

As summarised in Modifiers and Constraints, rate modifications are defined in HistFactory for bin b, sample s, channel c. Each modifier is represented by a parameter ϕ ∈ {γ, α, λ, μ}. By convention, bin-wise parameters are denoted with γ and interpolation parameters with α. The luminosity λ and scale factors μ affect all bins equally. For constrained modifiers, the implied constraint term is given together with the input data required to construct it. $\sigma_b$ corresponds to the relative uncertainty of the event rate, whereas $\delta_b$ is the event rate uncertainty of the sample relative to the total event rate $\nu_b = \sum_s \nu^0_{sb}$.

Modifiers implementing uncertainties are paired with a corresponding default constraint term on the parameter limiting the rate modification. The available modifiers may affect only the total number of expected events of a sample within a given channel, i.e. only change its normalisation, while holding the distribution of events across the bins of a channel, i.e. its “shape”, invariant. Alternatively, modifiers may change the sample shapes. Here HistFactory supports correlated and uncorrelated bin-by-bin shape modifications. In the former, a single nuisance parameter affects the expected sample rates within the bins of a given channel, while the latter introduces one nuisance parameter for each bin, each with its own constraint term. For the correlated shape and normalisation uncertainties, HistFactory makes use of interpolating functions, $f_p$ and $g_p$, constructed from a small number of evaluations of the expected rate at fixed values of the parameter α [3]. For the remaining modifiers, the parameter directly affects the rate.
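The text does not fix the functional form of $f_p$ and $g_p$. As an illustration only, the sketch below assumes one common pairing: a piecewise-linear additive shift and a piecewise-exponential multiplicative factor, each built from evaluations at α = −1 and α = +1. The function names and the example numbers are hypothetical.

```python
def f_p_linear(alpha, delta_down, delta_up):
    """Additive shift, piecewise-linear in alpha (assumed form).

    Passes through (-1, delta_down), (0, 0) and (+1, delta_up), where the
    deltas are differences of the varied rates from the nominal rate.
    """
    if alpha >= 0:
        return alpha * delta_up
    return -alpha * delta_down

def g_p_exponential(alpha, kappa_down, kappa_up):
    """Multiplicative factor, piecewise-exponential in alpha (assumed form).

    Equals kappa_down at alpha = -1, 1 at alpha = 0, kappa_up at alpha = +1,
    and stays positive for every alpha as long as both kappas are positive.
    """
    if alpha >= 0:
        return kappa_up ** alpha
    return kappa_down ** (-alpha)

# Example: a bin whose rate shifts by +4 / -2 events at alpha = +1 / -1,
# and a sample whose normalisation changes by +10% / -10%.
half_sigma_shift = f_p_linear(0.5, -2.0, 4.0)
two_sigma_factor = g_p_exponential(2.0, 0.9, 1.1)
```

The exponential form is a natural choice for normalisation factors because it cannot drive the expected rate negative, whereas a linear form for additive shifts reproduces the ±1 variations exactly.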

Modifiers and Constraints#

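| Description | Modification | Constraint Term $c_\chi$ | Input |
|---|---|---|---|
| Uncorrelated Shape | $\kappa_{scb}(\gamma_b) = \gamma_b$ | $\prod_b \mathrm{Pois}(r_b = \sigma_b^{-2} \mid \rho_b = \sigma_b^{-2}\gamma_b)$ | $\sigma_b$ |
| Correlated Shape | $\Delta_{scb}(\alpha) = f_p(\alpha \mid \Delta_{scb,\alpha=-1}, \Delta_{scb,\alpha=1})$ | $\mathrm{Gaus}(a = 0 \mid \alpha, \sigma = 1)$ | $\Delta_{scb,\alpha=\pm 1}$ |
| Normalisation Unc. | $\kappa_{scb}(\alpha) = g_p(\alpha \mid \kappa_{scb,\alpha=-1}, \kappa_{scb,\alpha=1})$ | $\mathrm{Gaus}(a = 0 \mid \alpha, \sigma = 1)$ | $\kappa_{scb,\alpha=\pm 1}$ |
| MC Stat. Uncertainty | $\kappa_{scb}(\gamma_b) = \gamma_b$ | $\prod_b \mathrm{Gaus}(a_{\gamma_b} = 1 \mid \gamma_b, \delta_b)$ | $\delta_b^2 = \sum_s \delta_{sb}^2$ |
| Luminosity | $\kappa_{scb}(\lambda) = \lambda$ | $\mathrm{Gaus}(l = \lambda_0 \mid \lambda, \sigma_\lambda)$ | $\lambda_0, \sigma_\lambda$ |
| Normalisation | $\kappa_{scb}(\mu_b) = \mu_b$ | | |
| Data-driven Shape | $\kappa_{scb}(\gamma_b) = \gamma_b$ | | |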

Given the likelihood L(ϕ), constructed from observed data in all channels and the implied auxiliary data, measurements in the form of point and interval estimates can be defined. The majority of the parameters are nuisance parameters, i.e. parameters that are not the main target of the measurement but are necessary to correctly model the data. A small subset of the unconstrained parameters may be declared as parameters of interest, for which measurements and hypothesis tests are performed, e.g. using profile likelihood methods [intro-1]. The Symbol Notation table provides a summary of all the notation introduced in this documentation.
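For orientation, the profile likelihood ratio used in [intro-1] can be written in the notation of this section. The block below is a restatement for illustration, with ψ playing the role of the parameter of interest and θ that of the nuisance parameters; the symbol $t_\psi$ for the test statistic is adapted from that reference.

```latex
\lambda(\psi) = \frac{L\left(\psi, \hat{\hat{\theta}}(\psi)\right)}{L\left(\hat{\psi}, \hat{\theta}\right)},
\qquad
t_\psi = -2 \ln \lambda(\psi)
```

Here $\hat{\hat{\theta}}(\psi)$ maximises the likelihood for a fixed value of ψ, while $(\hat{\psi}, \hat{\theta})$ is the unconditional maximum, so that $0 \le \lambda(\psi) \le 1$ and larger values of $t_\psi$ indicate worse agreement between the data and the hypothesised ψ.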

Symbol Notation#

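| Symbol | Name |
|---|---|
| $f(x \mid \phi)$ | model |
| $L(\phi)$ | likelihood |
| $x = \{n, a\}$ | full dataset (including auxiliary data) |
| $n$ | channel data (or event counts) |
| $a$ | auxiliary data |
| $\nu(\phi)$ | calculated event rates |
| $\phi = \{\eta, \chi\} = \{\psi, \theta\}$ | all parameters |
| $\eta$ | free parameters |
| $\chi$ | constrained parameters |
| $\psi$ | parameters of interest |
| $\theta$ | nuisance parameters |
| $\kappa(\phi)$ | multiplicative rate modifier |
| $\Delta(\phi)$ | additive rate modifier |
| $c_\chi(a_\chi \mid \chi)$ | constraint term for constrained parameter χ |
| $\sigma_\chi$ | relative uncertainty in the constrained parameter |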

Declarative Formats#

While flexible enough to describe a wide range of LHC measurements, the design of the HistFactory specification is sufficiently simple to admit a declarative format that fully encodes the statistical model of the analysis. This format defines the channels, all associated samples, their parameterised rate modifiers and implied constraint terms as well as the measurements. Additionally, the format represents the mathematical model, leaving the implementation of the likelihood minimisation to be analysis-dependent and/or language-dependent. Originally XML was chosen as a specification language to define the structure of the model while introducing a dependence on ROOT to encode the nominal rates and required input data of the constraint terms [intro-2]. Using this specification, a model can be constructed and evaluated within the RooFit framework.

This package introduces an updated form of the specification based on the ubiquitous plain-text JSON format and its schema language, JSON Schema. Described in more detail in Likelihood Specification, this schema fully specifies both the structure and the necessary constrained data in a single document and is thus implementation-independent.
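To give a feel for what such a single-document specification might look like, the sketch below builds a small model as a Python dictionary and serialises it to JSON. The key names (`channels`, `samples`, `modifiers`, `observations`, `measurements`, `version`), the modifier type strings, and all numbers are assumptions made for illustration; the Likelihood Specification section and the JSON Schema remain the authoritative reference.

```python
import json

# Illustrative specification only: key names and values are assumed here.
spec = {
    "channels": [
        {
            "name": "singlechannel",
            "samples": [
                {
                    "name": "signal",
                    "data": [5.0, 10.0],  # nominal rates per bin
                    "modifiers": [
                        # free normalisation factor (unconstrained)
                        {"name": "mu", "type": "normfactor", "data": None}
                    ],
                },
                {
                    "name": "background",
                    "data": [50.0, 60.0],
                    "modifiers": [
                        # uncorrelated per-bin shape uncertainty with its input data
                        {"name": "uncorr_bkguncrt", "type": "shapesys", "data": [5.0, 12.0]}
                    ],
                },
            ],
        }
    ],
    "observations": [{"name": "singlechannel", "data": [53.0, 65.0]}],
    "measurements": [
        {"name": "measurement", "config": {"poi": "mu", "parameters": []}}
    ],
    "version": "1.0.0",
}

# Serialise to plain text and read it back without loss.
document = json.dumps(spec, indent=2)
restored = json.loads(document)
```

Because nominal rates, modifier inputs, observed data, and the choice of parameter of interest all live in one plain-text document, no binary files are needed to reconstruct the model.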

Additional Material#

Footnotes#

Bibliography#

[intro-1]

Glen Cowan, Kyle Cranmer, Eilam Gross, and Ofer Vitells. Asymptotic formulae for likelihood-based tests of new physics. Eur. Phys. J. C, 71:1554, 2011. arXiv:1007.1727, doi:10.1140/epjc/s10052-011-1554-0.

[intro-2]

Kyle Cranmer, George Lewis, Lorenzo Moneta, Akira Shibata, and Wouter Verkerke. HistFactory: A tool for creating statistical models for use with RooFit and RooStats. Technical Report CERN-OPEN-2012-016, New York U., New York, Jan 2012. URL: https://cds.cern.ch/record/1456844.

[intro-3]

Eamonn Maguire, Lukas Heinrich, and Graeme Watt. HEPData: a repository for high energy physics data. J. Phys. Conf. Ser., 898(10):102006, 2017. arXiv:1704.05473, doi:10.1088/1742-6596/898/10/102006.

[intro-4]

ATLAS Collaboration. Measurements of Higgs boson production and couplings in diboson final states with the ATLAS detector at the LHC. Phys. Lett. B, 726:88, 2013. arXiv:1307.1427, doi:10.1016/j.physletb.2014.05.011.

[intro-5]

ATLAS Collaboration. Search for supersymmetry in final states with missing transverse momentum and multiple b-jets in proton–proton collisions at √s = 13 TeV with the ATLAS detector. ATLAS-CONF-2018-041, 2018. URL: https://cds.cern.ch/record/2632347.