Testing Awkward Array

Awkward Array represents nested, variable-length, and mixed-type data, so its valid arrays span a large combinatorial space of layouts. Hand-written test data covers only a small part of that space, and failures often occur on input shapes that no hand-written test case includes. This makes Awkward Array hard to test thoroughly.

This page is for Awkward Array contributors and for readers assessing the project. If you want to use the strategies in your own tests, start with Getting Started.

How hypothesis-awkward tests Awkward Array

This package addresses that problem with property-based testing: running one test against many automatically generated inputs instead of a fixed list. Its main strategy, constructors.arrays(), generates nearly fully general Awkward Arrays, including virtual arrays (arrays whose buffers are not yet materialized). See Getting Started for what it produces.

These strategies are integrated into Awkward Array's continuous integration (CI). The first property-based tests were added in #3887 and now run on every change. Two kinds of properties are checked so far:

Round-trip. Converting an array to buffers with ak.to_buffers() and back with ak.from_buffers() reconstructs an equal array (#3887).

```python
from hypothesis import given

import awkward as ak
import hypothesis_awkward.strategies as st_ak


@given(a=st_ak.constructors.arrays())
def test_roundtrip(a: ak.Array) -> None:
    # Serialize to (form, length, container), then rebuild from those pieces.
    sent = ak.to_buffers(a)
    returned = ak.from_buffers(*sent)
    assert ak.array_equal(a, returned, equal_nan=True)
```

Equality. ak.array_equal() is reflexive (an array equals itself) and symmetric (if a equals b, then b equals a) (#3891).

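```python
from hypothesis import given

import awkward as ak
import hypothesis_awkward.strategies as st_ak


@given(a=st_ak.constructors.arrays())
def test_reflexivity(a: ak.Array) -> None:
    # Every array must equal itself.
    assert ak.array_equal(a, a, equal_nan=True)


@given(a1=st_ak.constructors.arrays(), a2=st_ak.constructors.arrays())
def test_symmetry(a1: ak.Array, a2: ak.Array) -> None:
    # Comparing in either order must give the same answer.
    forward = ak.array_equal(a1, a2, equal_nan=True)
    backward = ak.array_equal(a2, a1, equal_nan=True)
    assert forward == backward
```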

equal_nan=True treats two NaN values as equal, which the generated floating-point arrays require. When a property fails, Hypothesis shrinks the input to a minimal failing array, which is why the reports below reduce to small, reproducible cases.

The goal is to cover all testable properties of Awkward Array, including operations, slicing, reducers, and kernels.

Bugs found

These tests have found bugs in both Awkward Array and Hypothesis.

Awkward Array

#3888 (fixed) — ak.array_equal() raised an error on virtual arrays and returned the wrong result for empty unions.
#3921 (fixed) — ak.array_equal() returned the wrong result for datetimes and timedeltas containing NaT (not-a-time).
#3962 (fixed) — ak.almost_equal(), which backs ak.array_equal(), compared record-array fields incorrectly.
#4126 (open) — IndexedOptionArray.to_ByteMaskedArray raises a TypeError when its content is an empty EmptyArray.

The test suite also reproduces or accounts for two known Awkward Array issues. A test in test_from_buffers.py reproduces an ak.from_buffers() bug with virtual buffers and a RegularArray(size=0) inside a BitMaskedArray. The bug raised an AssertionError on Awkward Array v2.9.0 and was fixed by #3889; the test is marked xfail for that version. A NumPy property test in test_numpy_arrays.py accounts for #3690 (open): ak.to_numpy() does not support structured arrays whose fields are not one-dimensional.

Hypothesis

#4708 (fixed) — an AssertionError in Shrinker.explain() for unstable span labels. Fixed in #4717 and released in Hypothesis 6.152.4.

Outlook

Broad, automatically generated test inputs raise confidence that a change is correct across the full range of valid arrays, not only the cases a developer wrote by hand. This supports iterative development and test-driven development (TDD) with artificial-intelligence (AI) coding assistants. The plan is to extend coverage to all testable properties of Awkward Array.