Util

Utility functions for property-based test assertions, etc.

Functions:

Name Description
any_nan_nat_in_awkward_array

True if Awkward Array contains any NaN or NaT values, else False.

any_nan_in_awkward_array

True if Awkward Array contains any NaN values, else False.

any_nat_in_awkward_array

True if Awkward Array contains any NaT values, else False.

get_contents

Return the direct sub-contents of an ak.contents.Content node.

is_leaf

Return True if an ak.contents.Content is a leaf.

is_string_or_bytestring_leaf

Return True if an ak.contents.Content is string or bytestring.

is_string_leaf

Return True if an ak.contents.Content is string.

is_bytestring_leaf

Return True if an ak.contents.Content is bytestring.

iter_contents

Iterate over all contents in an Awkward Array layout.

iter_leaf_contents

Iterate over all leaf contents in an Awkward Array layout.

iter_numpy_arrays

Iterate over all NumPy arrays in an Awkward Array layout.

leaf_size

Count total leaf elements in an Awkward Array layout.

content_size

Count total scalars stored in an Awkward Array layout.

content_own_size

Count the scalars owned directly by a Content node.

any_nan_nat_in_numpy_array

True if NumPy array contains any NaN or NaT values, else False.

any_nan_in_numpy_array

True if NumPy array contains any NaN values, else False.

any_nat_in_numpy_array

True if NumPy array contains any NaT values, else False.

simple_dtypes_in

Return simple dtypes contained in a (compound) dtype d.

simple_dtype_kinds_in

Return the kind character codes (dtype.kind) of simple dtypes contained in a (compound) dtype d.

n_scalars_in

Return the number of scalar values contained in a value of dtype d.

safe_compare

Return value if not None, else an object true for all comparisons.

safe_max

The largest item in vals that is not None.

safe_min

The smallest item in vals that is not None.

CountdownDrawer

Create a draw function with a shared element budget.

any_nan_nat_in_awkward_array

any_nan_nat_in_awkward_array(a: Array | Content) -> bool

True if Awkward Array contains any NaN or NaT values, else False.

Parameters:

Name Type Description Default
a Array | Content

An Awkward Array.

required

Returns:

Type Description
bool

True if a contains any NaN or NaT values, else False.

Examples:

>>> a = ak.Array([1.0, 2.0, np.nan])
>>> any_nan_nat_in_awkward_array(a)
True
>>> a = ak.Array([1.0, 2.0, 3.0])
>>> any_nan_nat_in_awkward_array(a)
False
>>> a = ak.Array([{'x': 1.0, 'y': np.nan}, {'x': 2.0, 'y': 3.0}])
>>> any_nan_nat_in_awkward_array(a)
True

any_nan_in_awkward_array

any_nan_in_awkward_array(a: Array | Content) -> bool

True if Awkward Array contains any NaN values, else False.

Parameters:

Name Type Description Default
a Array | Content

An Awkward Array.

required

Returns:

Type Description
bool

True if a contains any NaN values, else False.

Examples:

>>> a = ak.Array([1.0, 2.0, np.nan])
>>> any_nan_in_awkward_array(a)
True
>>> a = ak.Array([1.0, 2.0, 3.0])
>>> any_nan_in_awkward_array(a)
False
>>> a = ak.Array([{'x': 1.0, 'y': np.nan}, {'x': 2.0, 'y': 3.0}])
>>> any_nan_in_awkward_array(a)
True

any_nat_in_awkward_array

any_nat_in_awkward_array(a: Array | Content) -> bool

True if Awkward Array contains any NaT values, else False.

Parameters:

Name Type Description Default
a Array | Content

An Awkward Array.

required

Returns:

Type Description
bool

True if a contains any NaT values, else False.

Examples:

>>> a = ak.Array(np.array(['2020-01-01', 'NaT'], dtype='datetime64[D]'))
>>> any_nat_in_awkward_array(a)
True
>>> a = ak.Array(np.array(['2020-01-01', '2020-01-02'], dtype='datetime64[D]'))
>>> any_nat_in_awkward_array(a)
False

get_contents

get_contents(
    c: Content,
    /,
    *,
    string_as_leaf: bool = True,
    bytestring_as_leaf: bool = True,
) -> tuple[Content, ...]

Return the direct sub-contents of an ak.contents.Content node.

Dispatch is performed with functools.singledispatch so support for a new Content subclass can be added by calling get_contents.register without modifying this function.

Parameters:

Name Type Description Default
c Content

An Awkward Content node. ak.Array is not accepted — unwrap with a.layout first.

required
string_as_leaf bool

If True (default), treat string ListOffsetArray/ListArray/RegularArray nodes as leaves: return () rather than (c.content,).

True
bytestring_as_leaf bool

Same as string_as_leaf for bytestring nodes.

True

Returns:

Type Description
tuple[Content, ...]

The direct sub-contents, in natural order (field order for RecordArray, member order for UnionArray). Empty for NumpyArray, EmptyArray, and list types configured as leaves.

Examples:

>>> import numpy as np
>>> from awkward.contents import NumpyArray, RegularArray
>>> c = NumpyArray(np.array([1, 2, 3]))
>>> get_contents(c)
()
>>> c = RegularArray(NumpyArray(np.array([1, 2, 3, 4])), size=2)
>>> subs = get_contents(c)
>>> len(subs) == 1 and subs[0] is c.content
True
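The registration mechanism described above can be illustrated with a minimal standalone sketch. It does not use Awkward Array or this module. The node classes `Leaf`, `Wrapper`, and `Pair` and the function `sub_contents` are hypothetical names chosen for illustration, and the fallback returning `()` is an assumption of the sketch, not a statement about how get_contents treats unknown types.

```python
from dataclasses import dataclass
from functools import singledispatch


@dataclass
class Leaf:
    """Hypothetical leaf node with no sub-contents."""
    values: list


@dataclass
class Wrapper:
    """Hypothetical wrapper node holding exactly one child."""
    content: object


@singledispatch
def sub_contents(node) -> tuple:
    """Fallback for unregistered types (an assumption of this sketch)."""
    return ()


@sub_contents.register(Wrapper)
def _(node) -> tuple:
    # A wrapper exposes its single child.
    return (node.content,)


# A node type defined later is supported by registering a handler,
# without editing sub_contents itself.
@dataclass
class Pair:
    """Hypothetical node with two children."""
    left: object
    right: object


@sub_contents.register(Pair)
def _(node) -> tuple:
    return (node.left, node.right)


leaf = Leaf([1, 2, 3])
wrapped = Wrapper(leaf)
pair = Pair(leaf, wrapped)
```

singledispatch selects the handler from the type of the first positional argument only, which fits a signature where the node is positional-only and the flags are keyword-only.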

is_leaf

is_leaf(
    c: Content,
    /,
    *,
    string_as_leaf: bool = True,
    bytestring_as_leaf: bool = True,
) -> bool

Return True if an ak.contents.Content is a leaf.

NumpyArray and EmptyArray are always leaves. String and bytestring list nodes are leaves only when the respective flag is set. Wrappers (RecordArray, UnionArray, option/masked types, non-string list types) are never leaves. Unknown types fall back to False.

Dispatch is performed with functools.singledispatch so support for a new Content subclass can be added by calling is_leaf.register without modifying this function.

Parameters:

Name Type Description Default
c Content

An Awkward Content node.

required
string_as_leaf bool

If True (default), treat string ListOffsetArray/ListArray/RegularArray nodes as leaves.

True
bytestring_as_leaf bool

Same as string_as_leaf for bytestring nodes.

True

Returns:

Type Description
bool

True if c is a leaf under the given flags, else False.

Examples:

>>> import numpy as np
>>> from awkward.contents import NumpyArray, RegularArray
>>> is_leaf(NumpyArray(np.array([1, 2, 3])))
True

A non-string RegularArray is not a leaf:

>>> c = RegularArray(NumpyArray(np.array([1, 2, 3, 4])), size=2)
>>> is_leaf(c)
False

is_string_or_bytestring_leaf

is_string_or_bytestring_leaf(
    c: Content,
    string_as_leaf: bool = True,
    bytestring_as_leaf: bool = True,
) -> bool

Return True if an ak.contents.Content is string or bytestring.

Parameters:

Name Type Description Default
c Content

An Awkward Content node.

required
string_as_leaf bool

If True (default), treat string content as a leaf.

True
bytestring_as_leaf bool

If True (default), treat bytestring content as a leaf.

True

Returns:

Type Description
bool

True if the content is a string or bytestring leaf.

is_string_leaf

is_string_leaf(c: Content) -> bool

Return True if an ak.contents.Content is string.

Parameters:

Name Type Description Default
c Content

An Awkward Content node.

required

Returns:

Type Description
bool

True if the content has __array__ parameter 'string'.

is_bytestring_leaf

is_bytestring_leaf(c: Content) -> bool

Return True if an ak.contents.Content is bytestring.

Parameters:

Name Type Description Default
c Content

An Awkward Content node.

required

Returns:

Type Description
bool

True if the content has __array__ parameter 'bytestring'.

iter_contents

iter_contents(
    a: Array | Content,
    /,
    *,
    string_as_leaf: bool = True,
    bytestring_as_leaf: bool = True,
) -> Iterator[Content]

Iterate over all contents in an Awkward Array layout.

Parameters:

Name Type Description Default
a Array | Content

An Awkward Array or Content.

required
string_as_leaf bool

If True (default), treat string ListOffsetArray/ListArray/RegularArray nodes as leaves: do not descend into the inner NumpyArray(uint8).

True
bytestring_as_leaf bool

If True (default), treat bytestring nodes as leaves.

True

Yields:

Type Description
Content

Each content node in the layout.

iter_leaf_contents

iter_leaf_contents(
    a: Array | Content,
    /,
    *,
    string_as_leaf: bool = True,
    bytestring_as_leaf: bool = True,
) -> Iterator[Content]

Iterate over all leaf contents in an Awkward Array layout.

Parameters:

Name Type Description Default
a Array | Content

An Awkward Array or Content.

required
string_as_leaf bool

If True (default), treat string ListOffsetArray/ListArray/RegularArray nodes as leaves.

True
bytestring_as_leaf bool

If True (default), treat bytestring nodes as leaves.

True

Yields:

Type Description
Content

Each leaf content in the layout.

iter_numpy_arrays

iter_numpy_arrays(
    a: Array | Content,
    /,
    *,
    exclude_string: bool = True,
    exclude_bytestring: bool = True,
) -> Iterator[np.ndarray]

Iterate over all NumPy arrays in an Awkward Array layout.

Parameters:

Name Type Description Default
a Array | Content

An Awkward Array or Content.

required
exclude_string bool

If True (default), exclude the inner uint8 data of string nodes.

True
exclude_bytestring bool

If True (default), exclude the inner uint8 data of bytestring nodes.

True

Yields:

Type Description
ndarray

Each underlying NumPy array in the layout.

Examples:

>>> a = ak.Array([[1.0, 2.0], [3.0]])
>>> list(iter_numpy_arrays(a))
[array([1., 2., 3.])]
>>> a = ak.Array([{'x': 1, 'y': 2.0}, {'x': 3, 'y': 4.0}])
>>> sorted([arr.dtype for arr in iter_numpy_arrays(a)], key=str)
[dtype('float64'), dtype('int64')]

leaf_size

leaf_size(a: Array | Content) -> int

Count total leaf elements in an Awkward Array layout.

Each NumpyArray element counts as one. Each string and bytestring (not character or byte) counts as one. EmptyArray counts as zero.

Parameters:

Name Type Description Default
a Array | Content

An Awkward Array or Content.

required

Returns:

Type Description
int

Total number of leaf elements.

Examples:

>>> a = ak.Array([1, 2, 3])
>>> leaf_size(a)
3
>>> a = ak.Array([[1, 2], [3]])
>>> leaf_size(a)
3
>>> a = ak.Array(['hello', 'world'])
>>> leaf_size(a)
2

content_size

content_size(a: Array | Content) -> int

Count total scalars stored in an Awkward Array layout.

Counts data elements, offset/index buffer elements, and metadata values (RegularArray.size, RecordArray field names).

Parameters:

Name Type Description Default
a Array | Content

An Awkward Array or Content.

required

Returns:

Type Description
int

Total number of scalars stored in the content tree.

Examples:

A flat array has content_size equal to its length:

>>> a = ak.Array([1, 2, 3])
>>> content_size(a)
3

A variable-length list array counts offsets (n+1) plus child data:

>>> a = ak.Array([[1, 2], [3]])
>>> content_size(a)  # 3 offsets + 3 data = 6
6

A string array counts offsets (n+1) plus UTF-8 bytes:

>>> a = ak.Array(['hello', 'world'])
>>> content_size(a)  # 3 offsets + 10 bytes = 13
13
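The arithmetic in these examples can be reproduced on a toy tree. The sketch below is not the library's implementation: `ToyData`, `ToyList`, `own_size`, `children`, and `total_size` are hypothetical stand-ins that model only an offsets buffer and a flat data buffer.

```python
from dataclasses import dataclass


@dataclass
class ToyData:
    """Hypothetical stand-in for a flat data buffer."""
    data: list


@dataclass
class ToyList:
    """Hypothetical stand-in for a variable-length list node."""
    offsets: list
    content: ToyData


def own_size(node) -> int:
    # Scalars stored on the node itself, without its children.
    if isinstance(node, ToyList):
        return len(node.offsets)
    return len(node.data)


def children(node) -> tuple:
    # Direct sub-nodes; a data buffer has none.
    return (node.content,) if isinstance(node, ToyList) else ()


def total_size(node) -> int:
    # Own scalars plus the totals of all sub-nodes.
    return own_size(node) + sum(total_size(c) for c in children(node))


# [[1, 2], [3]]: offsets [0, 2, 3] over data [1, 2, 3]
nested = ToyList(offsets=[0, 2, 3], content=ToyData([1, 2, 3]))

# ['hello', 'world']: offsets [0, 5, 10] over ten UTF-8 bytes
words = ToyList(offsets=[0, 5, 10], content=ToyData(list(b"helloworld")))
```

With these inputs `total_size(nested)` is 6 and `total_size(words)` is 13, matching the counts shown above.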

content_own_size

content_own_size(c: Content) -> int

Count the scalars owned directly by a Content node.

Counts the node's own buffer elements (offsets, starts/stops, mask, tags, index, numeric data) and metadata values (RegularArray.size, RecordArray field names, BitMaskedArray/ByteMaskedArray flags). Does not recurse into sub-contents — that is handled by content_size via get_contents.

Dispatch is performed with functools.singledispatch so support for a new Content subclass can be added by calling content_own_size.register without modifying this function.

Parameters:

Name Type Description Default
c Content

An Awkward Content node.

required

Returns:

Type Description
int

Number of scalars stored directly on c, excluding sub-contents.

Examples:

>>> import numpy as np
>>> from awkward.contents import NumpyArray, RegularArray
>>> c = NumpyArray(np.array([1, 2, 3]))
>>> content_own_size(c)
3

The size metadata of a RegularArray counts as one; the inner NumpyArray data is not included here:

>>> c = RegularArray(NumpyArray(np.array([1, 2, 3, 4])), size=2)
>>> content_own_size(c)
1

any_nan_nat_in_numpy_array

any_nan_nat_in_numpy_array(n: ndarray) -> bool

True if NumPy array contains any NaN or NaT values, else False.

Parameters:

Name Type Description Default
n ndarray

A NumPy array.

required

Returns:

Type Description
bool

True if n contains any NaN or NaT values, else False.

Examples:

>>> n = np.array([1.0, 2.0, np.nan])
>>> any_nan_nat_in_numpy_array(n)
True
>>> n = np.array([1.0, 2.0, 3.0])
>>> any_nan_nat_in_numpy_array(n)
False
>>> n = np.array(
...     [(1, np.datetime64('2020-01-01')), (2, np.datetime64('NaT'))],
...     dtype=[('a', 'i4'), ('b', 'M8[D]')],
... )
>>> any_nan_nat_in_numpy_array(n)
True
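As the last example shows, the check also covers structured arrays. One way such a check could be written is sketched below with plain NumPy. The name `has_nan_or_nat` is hypothetical, and the field-by-field recursion is an assumption about a reasonable approach, not the library's code.

```python
import numpy as np


def has_nan_or_nat(n: np.ndarray) -> bool:
    """Return True if n holds any NaN or NaT, looking inside structured fields."""
    if n.dtype.names:
        # Structured dtype: check each field as its own array.
        return any(has_nan_or_nat(n[name]) for name in n.dtype.names)
    if n.dtype.kind in "fc":
        # Floating-point and complex values can hold NaN.
        return bool(np.isnan(n).any())
    if n.dtype.kind in "mM":
        # timedelta64 and datetime64 values can hold NaT.
        return bool(np.isnat(n).any())
    # Integers, booleans, strings, and so on cannot hold NaN or NaT.
    return False


records = np.array(
    [(1, np.datetime64("2020-01-01")), (2, np.datetime64("NaT"))],
    dtype=[("a", "i4"), ("b", "M8[D]")],
)
```

Branching on `dtype.kind` avoids calling `np.isnan` on dtypes that do not support it, such as datetime64 or strings.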

any_nan_in_numpy_array

any_nan_in_numpy_array(n: ndarray) -> bool

True if NumPy array contains any NaN values, else False.

Parameters:

Name Type Description Default
n ndarray

A NumPy array.

required

Returns:

Type Description
bool

True if n contains any NaN values, else False.

Examples:

>>> n = np.array([1.0, 2.0, np.nan])
>>> any_nan_in_numpy_array(n)
True
>>> n = np.array([1.0, 2.0, 3.0])
>>> any_nan_in_numpy_array(n)
False
>>> n = np.array([(1, 2.0), (2, np.nan)], dtype=[('a', 'i4'), ('b', 'f8')])
>>> any_nan_in_numpy_array(n)
True

any_nat_in_numpy_array

any_nat_in_numpy_array(n: ndarray) -> bool

True if NumPy array contains any NaT values, else False.

Parameters:

Name Type Description Default
n ndarray

A NumPy array.

required

Returns:

Type Description
bool

True if n contains any NaT values, else False.

Examples:

>>> n = np.array(['2020-01-01', 'NaT'], dtype='datetime64[D]')
>>> any_nat_in_numpy_array(n)
True
>>> n = np.array(['2020-01-01', '2020-01-02'], dtype='datetime64[D]')
>>> any_nat_in_numpy_array(n)
False
>>> n = np.array(
...     [(1, np.datetime64('2020-01-01')), (2, np.datetime64('NaT'))],
...     dtype=[('a', 'i4'), ('b', 'M8[D]')],
... )
>>> any_nat_in_numpy_array(n)
True

simple_dtypes_in

simple_dtypes_in(d: dtype) -> set[np.dtype]

Return simple dtypes contained in a (compound) dtype d.

Parameters:

Name Type Description Default
d dtype

A NumPy dtype. It can be a sub-array or structured dtype as well as a simple dtype.

required

Returns:

Type Description
set of np.dtype

Simple dtypes contained in d.

Examples:

>>> simple_dtypes_in(np.dtype('int32'))
{dtype('int32')}
>>> sorted(simple_dtypes_in(np.dtype([('f0', 'i4'), ('f1', 'f8')])))
[dtype('int32'), dtype('float64')]

simple_dtype_kinds_in

simple_dtype_kinds_in(d: dtype) -> set[str]

Return the kind character codes (dtype.kind) of simple dtypes contained in a (compound) dtype d.

Parameters:

Name Type Description Default
d dtype

A NumPy dtype. It can be a sub-array or structured dtype as well as a simple dtype.

required

Returns:

Type Description
set of str

Kind character codes (dtype.kind) of simple dtypes contained in d.

Examples:

>>> simple_dtype_kinds_in(np.dtype('int32'))
{'i'}
>>> sorted(simple_dtype_kinds_in(np.dtype([('f0', 'i4'), ('f1', 'f8')])))
['f', 'i']

n_scalars_in

n_scalars_in(d: dtype) -> int

Return the number of scalar values contained in a value of dtype d.

Parameters:

Name Type Description Default
d dtype

A NumPy dtype. It can be a sub-array or structured dtype as well as a simple dtype.

required

Returns:

Type Description
int

The number of scalar values contained in a value of dtype d.

Examples:

>>> n_scalars_in(np.dtype('int32'))
1
>>> n_scalars_in(np.dtype(('int32', (3, 4))))
12
>>> n_scalars_in(np.dtype([('f0', 'i4'), ('f1', ('f8', (2,)))]))
3
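The counts in these examples follow from walking the dtype recursively. The sketch below shows one way to do that with NumPy's `dtype.subdtype` and `dtype.fields` attributes. The name `count_scalars` is hypothetical and the code is an illustration, not the library's implementation.

```python
import math

import numpy as np


def count_scalars(d: np.dtype) -> int:
    """Count the scalar values held by one value of dtype d."""
    if d.subdtype is not None:
        # Sub-array dtype: number of elements times scalars per element.
        base, shape = d.subdtype
        return math.prod(shape) * count_scalars(base)
    if d.names is not None:
        # Structured dtype: sum over the fields.
        return sum(count_scalars(d.fields[name][0]) for name in d.names)
    # Simple dtype: a single scalar.
    return 1


simple = count_scalars(np.dtype("int32"))
sub_array = count_scalars(np.dtype(("int32", (3, 4))))
structured = count_scalars(np.dtype([("f0", "i4"), ("f1", ("f8", (2,)))]))
```

The structured case gives 3 because `f0` contributes one scalar and `f1`, a sub-array of shape `(2,)`, contributes two.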

safe_compare

safe_compare(value: T | None) -> T | GreaterAndLessThanAny

Return value if not None, else an object true for all comparisons.

This function helps you concisely write assertions that compare values that may be None.

Parameters:

Name Type Description Default
value T | None

A value or None.

required

Examples:

Suppose you have min_ and max_ that may be None

>>> import random
>>> min_ = random.choice([None, 1])
>>> max_ = random.choice([None, 3])

and val that should be in the range [min_, max_]:

>>> val = 2

Without this function, you need to check if min_ and max_ are None.

>>> if min_ is not None:
...     assert min_ <= val
>>> if max_ is not None:
...     assert val <= max_

This function lets you write the same assertion in one line:

>>> assert safe_compare(min_) <= val <= safe_compare(max_)
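A minimal sketch of how such a placeholder object could work is shown below. `AlwaysTrue` and `or_always_true` are hypothetical names, and the sketch assumes that only the four ordering operators need to succeed. The actual GreaterAndLessThanAny class may differ.

```python
class AlwaysTrue:
    """Hypothetical object whose ordering comparisons always return True."""

    def __lt__(self, other: object) -> bool:
        return True

    # Reuse the same method for <=, >, and >=.
    __le__ = __gt__ = __ge__ = __lt__


def or_always_true(value):
    """Return value unchanged, or an AlwaysTrue placeholder if it is None."""
    return AlwaysTrue() if value is None else value


val = 2

# A missing bound never fails the chained comparison.
unbounded = or_always_true(None) <= val <= or_always_true(None)

# A present bound is compared as usual.
in_range = or_always_true(1) <= val <= or_always_true(3)
out_of_range = or_always_true(5) <= val <= or_always_true(None)
```

In `val <= AlwaysTrue()`, the int comparison returns `NotImplemented`, so Python falls back to the reflected `AlwaysTrue.__ge__`, which is why the placeholder works on either side of the operator.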

safe_max

safe_max(
    vals: Iterable[T], default: Optional[T] = None
) -> Optional[T]

The largest item in vals that is not None.

Parameters:

Name Type Description Default
vals Iterable[T]

An iterable of values.

required
default Optional[T]

The value to return if vals is empty or all items are None.

None

Examples:

>>> safe_max([None, 1, 2, None])
2

It returns None if vals is empty or all items in vals are None.

>>> print(safe_max([None, None]))
None
>>> print(safe_max([]))
None

If default is given, it returns default instead of None.

>>> safe_max([None, None], default=-1)
-1
>>> safe_max([], default=-1)
-1
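The behaviour shown in these examples can be approximated with the built-in `max` and its `default` argument. The function name `max_ignoring_none` is hypothetical and the sketch is not necessarily how safe_max is implemented.

```python
from typing import Iterable, Optional, TypeVar

T = TypeVar("T")


def max_ignoring_none(
    vals: Iterable[Optional[T]], default: Optional[T] = None
) -> Optional[T]:
    """Return the largest non-None item, or default if there is none."""
    # Filter out None lazily; max() falls back to default on an empty input.
    return max((v for v in vals if v is not None), default=default)


largest = max_ignoring_none([None, 1, 2, None])
all_none = max_ignoring_none([None, None])
empty_with_default = max_ignoring_none([], default=-1)
```

The same pattern with `min` in place of `max` gives the corresponding smallest-item helper.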

safe_min

safe_min(
    vals: Iterable[T], default: Optional[T] = None
) -> Optional[T]

The smallest item in vals that is not None.

Parameters:

Name Type Description Default
vals Iterable[T]

An iterable of values.

required
default Optional[T]

The value to return if vals is empty or all items are None.

None

Examples:

>>> safe_min([None, 1, 2, None])
1

It returns None if vals is empty or all items in vals are None.

>>> print(safe_min([None, None]))
None
>>> print(safe_min([]))
None

If default is given, it returns default instead of None.

>>> safe_min([None, None], default=-1)
-1
>>> safe_min([], default=-1)
-1

CountdownDrawer

CountdownDrawer(
    draw: DrawFn,
    st_: _StWithMinMaxSize[_T],
    min_size_each: int = 0,
    max_size_each: int | None = None,
    min_size_total: int = 0,
    max_size_total: int = 10,
    max_draws: int = 100,
) -> Callable[[], _T | None]

Create a draw function with a shared element budget.

Each call draws from st_ and adds the length of the result to a running total. The returned function yields None once the budget is exhausted, the remaining budget is too small to satisfy min_size_each, or the draw limit is reached.

Parameters:

Name Type Description Default
draw DrawFn

The Hypothesis draw function.

required
st_ _StWithMinMaxSize[_T]

A callable that accepts min_size and max_size keyword arguments and returns a strategy.

required
min_size_each int

Minimum number of elements in each draw.

0
max_size_each int | None

Maximum number of elements in each draw. If None, only max_size_total limits the size.

None
min_size_total int

Minimum total elements across all draws.

0
max_size_total int

Total element budget shared across all draws.

10
max_draws int

Maximum number of non-None draws.

100
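The shared-budget idea can be sketched without Hypothesis. In the sketch below, `make_budgeted_drawer` and `random_list` are hypothetical names, a seeded `random.Random` stands in for the Hypothesis draw function, and `min_size_total` is left out. It illustrates the bookkeeping described above and is not the library's implementation.

```python
import random
from typing import Callable, Optional


def make_budgeted_drawer(
    make_list: Callable[[int, int], list],
    min_size_each: int = 0,
    max_size_each: Optional[int] = None,
    max_size_total: int = 10,
    max_draws: int = 100,
) -> Callable[[], Optional[list]]:
    """Return a function that draws lists until a shared budget runs out."""
    total = 0  # elements drawn so far
    draws = 0  # successful draws so far

    def draw_next() -> Optional[list]:
        nonlocal total, draws
        remaining = max_size_total - total
        # Stop when the budget is spent, too small for one draw,
        # or the draw limit is reached.
        if draws >= max_draws or remaining <= 0 or remaining < min_size_each:
            return None
        # Cap each draw by both the per-draw limit and what is left.
        upper = remaining if max_size_each is None else min(remaining, max_size_each)
        result = make_list(min_size_each, upper)
        total += len(result)
        draws += 1
        return result

    return draw_next


rng = random.Random(0)


def random_list(min_size: int, max_size: int) -> list:
    # Stand-in for drawing from a strategy built with min_size and max_size.
    return [rng.random() for _ in range(rng.randint(min_size, max_size))]


draw_next = make_budgeted_drawer(random_list, min_size_each=1, max_size_total=10)
chunks = []
while (chunk := draw_next()) is not None:
    chunks.append(chunk)
```

Because every draw is capped by the remaining budget, the combined length of `chunks` never exceeds `max_size_total`, however the individual sizes fall.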