# Histogramming With Bayesian Blocks¶

The Bayesian Block algorithm, originally developed for applications in astronomy, can be used to improve the binning of histograms in high energy physics (HEP). The visual improvement can be dramatic, and more importantly, this algorithm produces histograms that accurately represent the underlying distribution while being robust to statistical fluctuations. The key concept behind Bayesian Blocks is that variable-width bins are determined for a given distribution, such that the data within each bin is consistent with a uniform distribution across the range of that bin. This reduces the appearance of statistical fluctuations while still capturing the form of the underlying distribution.

For more information on the algorithm and implementation, see:

For code documentation, see: http://scikit-hep.org/api/modeling.html

## Using Bayesian Blocks Binning¶

Bayesian Blocks binning options are available as part of
**Scikit-HEP**’s histogramming packages. Below is a simple example:

```
In [1]:
```

```
import numpy as np
from skhep.visual import MplPlotter as skh_plt
%matplotlib inline
plt.rcParams['figure.figsize'] = (8,6)
np.random.seed(1001)
data = np.random.laplace(size=10000)
```

```
In [2]:
```

```
_ = skh_plt.hist(data, bins=1000, scale='binwidth', label='Fine Binning')
_ = skh_plt.hist(data, bins='blocks', scale='binwidth', label='Bayesian Blocks', histtype='step')
plt.legend(loc=2)
plt.show()
```

To call Bayesian Blocks within the `MplPlotter.hist`

plotting
function, simply set `bins='blocks'`

. For appropriate visualization,
one should typically also use `scale='binwidth'`

. This divides each
bin by its width, which is important for capturing the overall shape of
the underlying distribution. Without using this argument, the histogram
will look jagged (a consequnce of using variable-width binning).

```
In [3]:
```

```
fig, axes = plt.subplots(ncols=2, figsize=(12,5))
_ = skh_plt.hist(data, bins='blocks', label='Bayesian Blocks', histtype='step', ax=axes[0])
axes[0].set_title('Unscaled')
_ = skh_plt.hist(data, bins='blocks', scale='binwidth', label='Bayesian Blocks', histtype='step', ax=axes[1])
axes[1].set_title('Scaled by bin width')
plt.show()
```

The user has control over an additional parameter to determine how many
bins are generated by the Bayesian Blocks algorithm. The `p0`

parameter (valid between 0 and 1) determines how strictly the algorithm
determines bin edges. A small `p0`

will be more robust to statistical
fluctuations in the data, but could be overly coarse. Conversely, a
large `p0`

will result in a finer binning, but could isolate spurious
fluctuations in the data.

```
In [4]:
```

```
fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(12,12))
_ = skh_plt.hist(data, bins='blocks', scale='binwidth', histtype='step', ax=axes[0][0], p0=1e-50)
axes[0][0].set_title('p0=1e-50')
_ = skh_plt.hist(data, bins='blocks', scale='binwidth', histtype='step', ax=axes[0][1], p0=1e-5)
axes[0][1].set_title('p0=1e-5')
_ = skh_plt.hist(data, bins='blocks', scale='binwidth', histtype='step', ax=axes[1][0], p0=1e-3)
axes[1][0].set_title('p0=1e-3')
_ = skh_plt.hist(data, bins='blocks', scale='binwidth', histtype='step', ax=axes[1][1], p0=0.5)
axes[1][1].set_title('p0=0.5')
fig.suptitle('Varying the p0 parameter')
plt.show()
```

The optimal value of `p0`

differs, depending on the number of data
points and the nature of the underlying distribution. It typically must
be determined empirically, but in general the value of `p0`

should be
inversely proportional the size of the input dataset.

## Comparison with Other Binning Methods¶

Because Bayesian Blocks determines variable-width binning, the algorithm can provide a more optimal set of bins for a given distribution, especially if that distribution varies greatly in density. Below are some examples of Bayesian Blocks and other popular binning methods.

**A rapidly falling distribution:**

**An asymmetric, peaked distribution:**

**Two peaks of different widths:**

*Brian Pollack, 2018*