Bootstrap and Jackknife comparison

In this notebook we compare the bootstrap with the jackknife. Bootstrap resampling is generally superior to the jackknife, but the jackknife is deterministic, which can be an advantage, and it can exactly remove biases of order 1/N from an estimator. The bootstrap has no comparably simple bias estimator.

We consider as estimators the arithmetic mean and the naive variance \(\hat V = \langle x^2 \rangle - \langle x \rangle^2\) of a sample of inputs. We use resample to compute the variances of these two estimators and their biases. This can be done elegantly by defining a single function fn that returns both estimates.

The exact bias is known for both estimators. It is zero for the mean, because the mean is a linear function of the sample. For \(\hat V\), the bias-corrected estimate is \(\frac{N}{N-1} \hat V\), and thus the bias is \(-\frac{1}{N-1} \hat V\).
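This follows from \(E[\langle x^2 \rangle] = \mu^2 + \sigma^2\) and \(E[\langle x \rangle^2] = \mu^2 + \sigma^2 / N\), which give \(E[\hat V] = \frac{N-1}{N} \sigma^2\); the bias \(E[\hat V] - \sigma^2 = -\sigma^2 / N\) becomes \(-\frac{1}{N-1} \hat V\) when \(\sigma^2\) is replaced by its unbiased estimate \(\frac{N}{N-1} \hat V\).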

[10]:
from resample import jackknife as j, bootstrap as b
import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(size=5)


def fn(d):
    return np.mean(d), np.var(d, ddof=0)  # we return the biased variance


print("estimates           ", np.round(fn(data), 3))
print("std.dev. (jackknife)", np.round(j.variance(fn, data) ** 0.5, 3))
print("std.dev. (bootstrap)", np.round(b.variance(fn, data, random_state=1) ** 0.5, 3))
print("bias (jackknife)    ", np.round(j.bias(fn, data), 3))
print("bias (exact)        ", np.round((0, -1 / (len(data) - 1) * np.var(data, ddof=0)), 3))
estimates            [0.22  0.636]
std.dev. (jackknife) [0.399 0.539]
std.dev. (bootstrap) [0.345 0.36 ]
bias (jackknife)     [ 0.    -0.159]
bias (exact)         [ 0.    -0.159]

The standard deviations computed by bootstrap and jackknife differ noticeably for this small sample: by roughly 15 % for the mean and more than 30 % for the variance. This difference shrinks for larger data sets.
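One way to check this is to rerun the comparison with more data points. A minimal sketch (the sample size of 100 is an arbitrary choice; fn, rng, j, and b are reused from the cell above):

[ ]:
# Same comparison as above, but with 100 data points instead of 5.
data_large = rng.normal(size=100)

print("std.dev. (jackknife)", np.round(j.variance(fn, data_large) ** 0.5, 3))
print("std.dev. (bootstrap)", np.round(b.variance(fn, data_large, random_state=1) ** 0.5, 3))

The two standard deviation estimates should now agree much more closely.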

The jackknife finds the correct bias for both estimators, as expected: the bias of \(\hat V\) is exactly of order \(1/N\), which the jackknife removes exactly.
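For reference, the jackknife bias estimate can be reproduced by hand with the standard leave-one-out formula \(\widehat{\mathrm{bias}} = (N-1)\,(\bar\theta_{(\cdot)} - \hat\theta)\), where \(\bar\theta_{(\cdot)}\) is the average of the estimates with one observation left out. A minimal sketch in plain NumPy (the resample internals may differ in detail):

[ ]:
def jackknife_bias_by_hand(fn, sample):
    # (N - 1) * (mean of leave-one-out estimates - full-sample estimate)
    n = len(sample)
    theta_full = np.asarray(fn(sample))
    theta_loo = np.array([fn(np.delete(sample, i)) for i in range(n)])
    return (n - 1) * (theta_loo.mean(axis=0) - theta_full)


print("bias (by hand)", np.round(jackknife_bias_by_hand(fn, data), 3))

This should reproduce the values printed by j.bias above.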