Util

Utility functions for property-based test assertions, etc.

Functions:

Name	Description
`any_nan_nat_in_awkward_array`	`True` if Awkward Array contains any `NaN` or `NaT` values, else `False`.
`any_nan_in_awkward_array`	`True` if Awkward Array contains any `NaN` values, else `False`.
`any_nat_in_awkward_array`	`True` if Awkward Array contains any `NaT` values, else `False`.
`get_contents`	Return the direct inner contents of the given content.
`is_leaf`	Return `True` if an `ak.contents.Content` is a leaf.
`is_string_or_bytestring_leaf`	Return `True` if an `ak.contents.Content` is string or bytestring.
`is_string_leaf`	Return `True` if an `ak.contents.Content` is string.
`is_bytestring_leaf`	Return `True` if an `ak.contents.Content` is bytestring.
`iter_contents`	Iterate over all contents in an Awkward Array layout.
`iter_leaf_contents`	Iterate over all leaf contents in an Awkward Array layout.
`iter_numpy_arrays`	Iterate over all NumPy arrays in an Awkward Array layout.
`leaf_size`	Count total leaf elements in an Awkward Array layout.
`content_size`	Count total scalars stored in an Awkward Array layout.
`content_own_size`	Count the scalars owned directly by a `Content` node.
`any_nan_nat_in_numpy_array`	`True` if NumPy array contains any `NaN` or `NaT` values, else `False`.
`any_nan_in_numpy_array`	`True` if NumPy array contains any `NaN` values, else `False`.
`any_nat_in_numpy_array`	`True` if NumPy array contains any `NaT` values, else `False`.
`simple_dtypes_in`	Return simple dtypes contained in a (compound) dtype `d`.
`simple_dtype_kinds_in`	Return the kind codes (as in `dtype.kind`) of simple dtypes contained in a (compound) dtype `d`.
`n_scalars_in`	Return the number of scalar values contained in a value of dtype `d`.
`safe_compare`	Return `value` if it is not `None`, else an object that compares as true in every comparison.
`safe_max`	The largest item in `vals` that is not `None`.
`safe_min`	The smallest item in `vals` that is not `None`.
`CountdownDrawer`	Create a draw function with a shared element budget.

any_nan_nat_in_awkward_array

any_nan_nat_in_awkward_array(a: Array | Content) -> bool

True if Awkward Array contains any NaN or NaT values, else False.

Parameters:

Name	Type	Description	Default
`a`	`Array \| Content`	An Awkward Array.	required

Returns:

Type	Description
`bool`	`True` if `a` contains any `NaN` or `NaT` values, else `False`.

Examples:

>>> a = ak.Array([1.0, 2.0, np.nan])
>>> any_nan_nat_in_awkward_array(a)
True

>>> a = ak.Array([1.0, 2.0, 3.0])
>>> any_nan_nat_in_awkward_array(a)
False

>>> a = ak.Array([{'x': 1.0, 'y': np.nan}, {'x': 2.0, 'y': 3.0}])
>>> any_nan_nat_in_awkward_array(a)
True

any_nan_in_awkward_array

any_nan_in_awkward_array(a: Array | Content) -> bool

True if Awkward Array contains any NaN values, else False.

Parameters:

Name	Type	Description	Default
`a`	`Array \| Content`	An Awkward Array.	required

Returns:

Type	Description
`bool`	`True` if `a` contains any `NaN` values, else `False`.

Examples:

>>> a = ak.Array([1.0, 2.0, np.nan])
>>> any_nan_in_awkward_array(a)
True

>>> a = ak.Array([1.0, 2.0, 3.0])
>>> any_nan_in_awkward_array(a)
False

>>> a = ak.Array([{'x': 1.0, 'y': np.nan}, {'x': 2.0, 'y': 3.0}])
>>> any_nan_in_awkward_array(a)
True

any_nat_in_awkward_array

any_nat_in_awkward_array(a: Array | Content) -> bool

True if Awkward Array contains any NaT values, else False.

Parameters:

Name	Type	Description	Default
`a`	`Array \| Content`	An Awkward Array.	required

Returns:

Type	Description
`bool`	`True` if `a` contains any `NaT` values, else `False`.

Examples:

>>> a = ak.Array(np.array(['2020-01-01', 'NaT'], dtype='datetime64[D]'))
>>> any_nat_in_awkward_array(a)
True

>>> a = ak.Array(np.array(['2020-01-01', '2020-01-02'], dtype='datetime64[D]'))
>>> any_nat_in_awkward_array(a)
False

get_contents

get_contents(
    c: Content,
    /,
    *,
    string_as_leaf: bool = True,
    bytestring_as_leaf: bool = True,
) -> tuple[Content, ...]

Return the direct inner contents of the given content.

This function receives an instance of a subclass of Content and returns a tuple of its direct inner contents based on the optional arguments.

This function is a functools.singledispatch generic function. Support for a new Content subclass can be added with get_contents.register().

Parameters:

Name	Type	Description	Default
`c`	`Content`	An instance of a subclass of `Content`.	required
`string_as_leaf`	`bool`	Whether to consider a string list as a leaf content. See Examples below.	`True`
`bytestring_as_leaf`	`bool`	Whether to consider a bytestring list as a leaf content. See Examples below.	`True`

Returns:

Type	Description
`tuple[Content, ...]`	The direct inner contents, or an empty tuple if none.

Examples:

EmptyArray / NumpyArray: These are leaf contents and have no inner contents, so get_contents() returns an empty tuple ():

>>> from awkward.contents import EmptyArray, NumpyArray
>>> c = EmptyArray()
>>> get_contents(c)
()
>>> c = NumpyArray([1, 2, 3])
>>> get_contents(c)
()

RegularArray / ListArray / ListOffsetArray: These have one inner content. get_contents() returns a one-element tuple containing it:

>>> from awkward.contents import RegularArray, ListArray, ListOffsetArray
>>> i = ak.from_iter([1, 2, 3, 4, 5, 6], highlevel=False)
>>> c = RegularArray(i, size=2)
>>> get_contents(c) == (i,)
True
>>> start = ak.index.Index64([0, 3, 3])
>>> stop = ak.index.Index64([3, 3, 5])
>>> c = ListArray(start, stop, i)
>>> get_contents(c) == (i,)
True
>>> offsets = ak.index.Index64([0, 3, 3, 5])
>>> c = ListOffsetArray(offsets, i)
>>> get_contents(c) == (i,)
True

Strings and bytestrings: An array of strings (or bytestrings) is a ListOffsetArray, ListArray, or RegularArray with an inner NumpyArray. By default, however, get_contents() treats such an array as a leaf content and returns an empty tuple (). With string_as_leaf=False (or bytestring_as_leaf=False), it returns a one-element tuple containing the underlying NumpyArray:

>>> c = ak.from_iter(['abc', 'de'], highlevel=False)
>>> get_contents(c)
()
>>> get_contents(c, string_as_leaf=False) == (c.content,)
True
>>> c = ak.from_iter([b'abc', b'de'], highlevel=False)
>>> get_contents(c)
()
>>> get_contents(c, bytestring_as_leaf=False) == (c.content,)
True

RecordArray / UnionArray: These have multiple inner contents. get_contents() returns a tuple of all of them:

>>> from awkward.contents import RecordArray, UnionArray
>>> c = ak.zip({"x": [1, 2, 3], "y": [4.0, 5.0, 6.0]}, highlevel=False)
>>> isinstance(c, RecordArray)
True
>>> get_contents(c) == tuple(c.contents)
True
>>> c = ak.from_iter([0.0, [1, 2], 'three', 4.4, [5]], highlevel=False)
>>> isinstance(c, UnionArray)
True
>>> get_contents(c) == tuple(c.contents)
True

IndexedOptionArray / ByteMaskedArray / BitMaskedArray / UnmaskedArray: These have one inner content. get_contents() returns a one-element tuple containing it:

>>> from awkward.contents import (
...     IndexedOptionArray,
...     ByteMaskedArray,
...     BitMaskedArray,
...     UnmaskedArray,
... )
>>> i = ak.from_iter([1, 2, 3, 4, 5, 6], highlevel=False)
>>> index = ak.index.Index64([0, -1, 2, -1, 4, 5])
>>> c = IndexedOptionArray(index, i)
>>> get_contents(c) == (i,)
True
>>> mask = ak.index.Index8([1, 0, 1, 0, 1, 1])
>>> c = ByteMaskedArray(mask, i, valid_when=True)
>>> get_contents(c) == (i,)
True
>>> bitmask = ak.index.IndexU8(np.array([0b10101100], dtype=np.uint8))
>>> c = BitMaskedArray(bitmask, i, valid_when=True, length=6, lsb_order=False)
>>> get_contents(c) == (i,)
True
>>> c = UnmaskedArray(i)
>>> get_contents(c) == (i,)
True
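
The ByteMaskedArray and BitMaskedArray examples above appear to encode the same validity pattern. As a rough check, assuming that lsb_order=False means each mask byte is read from its most significant bit first, the byte 0b10101100 can be unpacked in plain Python. The helper unpack_bits is hypothetical and is not part of Awkward Array or of this module:

```python
from typing import List


def unpack_bits(byte: int, length: int, lsb_order: bool = False) -> List[int]:
    """Return the first `length` bits of `byte` as a list of 0s and 1s."""
    if lsb_order:
        # Least significant bit first.
        return [(byte >> k) & 1 for k in range(length)]
    # Most significant bit first (the assumed meaning of lsb_order=False).
    return [(byte >> (7 - k)) & 1 for k in range(length)]


bits = unpack_bits(0b10101100, length=6)
print(bits)  # [1, 0, 1, 0, 1, 1]
```

Under that assumption, the unpacked bits match the byte mask [1, 0, 1, 0, 1, 1] passed to ByteMaskedArray.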

is_leaf

is_leaf(
    c: Content,
    /,
    *,
    string_as_leaf: bool = True,
    bytestring_as_leaf: bool = True,
) -> bool

Return True if an ak.contents.Content is a leaf.

NumpyArray and EmptyArray are always leaves. String and bytestring list nodes are leaves only when the respective flag is set. Wrappers (RecordArray, UnionArray, option/masked types, non-string list types) are never leaves. Unknown types fall back to False.

Dispatch is performed with functools.singledispatch so support for a new Content subclass can be added by calling is_leaf.register without modifying this function.

Parameters:

Name	Type	Description	Default
`c`	`Content`	An Awkward `Content` node.	required
`string_as_leaf`	`bool`	If `True` (default), treat string `ListOffsetArray`/ `ListArray`/`RegularArray` nodes as leaves.	`True`
`bytestring_as_leaf`	`bool`	Same as `string_as_leaf` for bytestring nodes.	`True`

Returns:

Type	Description
`bool`	`True` if `c` is a leaf under the given flags, else `False`.

Examples:

>>> import numpy as np
>>> from awkward.contents import NumpyArray, RegularArray
>>> is_leaf(NumpyArray(np.array([1, 2, 3])))
True

A non-string RegularArray is not a leaf:

>>> c = RegularArray(NumpyArray(np.array([1, 2, 3, 4])), size=2)
>>> is_leaf(c)
False
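
The fallback and registration behaviour described above can be illustrated with functools.singledispatch alone. This is a minimal sketch, not the module's implementation: the classes ToyNumbers, ToyList, and ToyCustom and the function is_leaf_sketch are hypothetical stand-ins for Content subclasses and is_leaf.

```python
from functools import singledispatch


class ToyNumbers:
    """Stand-in for a leaf node type."""


class ToyList:
    """Stand-in for a wrapper node type."""


class ToyCustom:
    """Stand-in for a new node type added later."""


@singledispatch
def is_leaf_sketch(c, /, *, string_as_leaf=True):
    # Fallback: types without a registered handler are not leaves.
    return False


@is_leaf_sketch.register(ToyNumbers)
def _(c, /, *, string_as_leaf=True):
    return True


@is_leaf_sketch.register(ToyList)
def _(c, /, *, string_as_leaf=True):
    return False


# Before registration, the unknown type uses the fallback.
before = is_leaf_sketch(ToyCustom())


# Registering a handler extends the function without editing it.
@is_leaf_sketch.register(ToyCustom)
def _(c, /, *, string_as_leaf=True):
    return True


after = is_leaf_sketch(ToyCustom())
print(before, after)  # False True
```

Dispatch happens on the type of the first positional argument only, which is why the flags are keyword-only in this sketch.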

is_string_or_bytestring_leaf

is_string_or_bytestring_leaf(
    c: Content,
    string_as_leaf: bool = True,
    bytestring_as_leaf: bool = True,
) -> bool

Return True if an ak.contents.Content is string or bytestring.

Parameters:

Name	Type	Description	Default
`c`	`Content`	An Awkward `Content` node.	required
`string_as_leaf`	`bool`	If `True` (default), treat string content as a leaf.	`True`
`bytestring_as_leaf`	`bool`	If `True` (default), treat bytestring content as a leaf.	`True`

Returns:

Type	Description
`bool`	`True` if the content is a string or bytestring leaf.

is_string_leaf

is_string_leaf(c: Content) -> bool

Return True if an ak.contents.Content is string.

Parameters:

Name	Type	Description	Default
`c`	`Content`	An Awkward `Content` node.	required

Returns:

Type	Description
`bool`	`True` if the content has `__array__` parameter `'string'`.

is_bytestring_leaf

is_bytestring_leaf(c: Content) -> bool

Return True if an ak.contents.Content is bytestring.

Parameters:

Name	Type	Description	Default
`c`	`Content`	An Awkward `Content` node.	required

Returns:

Type	Description
`bool`	`True` if the content has `__array__` parameter `'bytestring'`.

iter_contents

iter_contents(
    a: Array | Content,
    /,
    *,
    string_as_leaf: bool = True,
    bytestring_as_leaf: bool = True,
) -> Iterator[Content]

Iterate over all contents in an Awkward Array layout.

Parameters:

Name	Type	Description	Default
`a`	`Array \| Content`	An Awkward Array or Content.	required
`string_as_leaf`	`bool`	If `True` (default), treat string `ListOffsetArray`/`ListArray`/ `RegularArray` nodes as leaves — do not descend into the inner `NumpyArray(uint8)`.	`True`
`bytestring_as_leaf`	`bool`	If `True` (default), treat bytestring nodes as leaves.	`True`

Yields:

Type	Description
`Content`	Each content node in the layout.

iter_leaf_contents

iter_leaf_contents(
    a: Array | Content,
    /,
    *,
    string_as_leaf: bool = True,
    bytestring_as_leaf: bool = True,
) -> Iterator[Content]

Iterate over all leaf contents in an Awkward Array layout.

Parameters:

Name	Type	Description	Default
`a`	`Array \| Content`	An Awkward Array or Content.	required
`string_as_leaf`	`bool`	If `True` (default), treat string `ListOffsetArray`/`ListArray`/ `RegularArray` nodes as leaves.	`True`
`bytestring_as_leaf`	`bool`	If `True` (default), treat bytestring nodes as leaves.	`True`

Yields:

Type	Description
`Content`	Each leaf content in the layout.
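
The relationship between get_contents, iter_contents, and iter_leaf_contents can be sketched on a toy tree. Everything here is hypothetical: ToyNode and the toy_* functions are stand-ins, and the depth-first, parent-first order is an assumption, since the traversal order is not stated in this reference.

```python
from dataclasses import dataclass
from typing import Iterator, Tuple


@dataclass(frozen=True)
class ToyNode:
    name: str
    children: Tuple["ToyNode", ...] = ()
    is_string: bool = False


def toy_get_contents(node: ToyNode, *, string_as_leaf: bool = True) -> Tuple[ToyNode, ...]:
    # A string node hides its inner character data when treated as a leaf.
    if node.is_string and string_as_leaf:
        return ()
    return node.children


def toy_iter_contents(node: ToyNode, *, string_as_leaf: bool = True) -> Iterator[ToyNode]:
    # Yield the node itself, then recurse into its direct inner contents.
    yield node
    for child in toy_get_contents(node, string_as_leaf=string_as_leaf):
        yield from toy_iter_contents(child, string_as_leaf=string_as_leaf)


def toy_iter_leaf_contents(node: ToyNode, *, string_as_leaf: bool = True) -> Iterator[ToyNode]:
    # A leaf is a node with no inner contents under the given flag.
    for n in toy_iter_contents(node, string_as_leaf=string_as_leaf):
        if not toy_get_contents(n, string_as_leaf=string_as_leaf):
            yield n


chars = ToyNode("uint8 chars")
strings = ToyNode("string list", (chars,), is_string=True)
numbers = ToyNode("numbers")
record = ToyNode("record", (numbers, strings))

all_names = [n.name for n in toy_iter_contents(record)]
leaf_names = [n.name for n in toy_iter_leaf_contents(record)]
deep_leaf_names = [n.name for n in toy_iter_leaf_contents(record, string_as_leaf=False)]
print(all_names)        # ['record', 'numbers', 'string list']
print(leaf_names)       # ['numbers', 'string list']
print(deep_leaf_names)  # ['numbers', 'uint8 chars']
```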

iter_numpy_arrays

iter_numpy_arrays(
    a: Array | Content,
    /,
    *,
    exclude_string: bool = True,
    exclude_bytestring: bool = True,
) -> Iterator[np.ndarray]

Iterate over all NumPy arrays in an Awkward Array layout.

Parameters:

Name	Type	Description	Default
`a`	`Array \| Content`	An Awkward Array or Content.	required
`exclude_string`	`bool`	If `True` (default), exclude the inner `uint8` data of string nodes.	`True`
`exclude_bytestring`	`bool`	If `True` (default), exclude the inner `uint8` data of bytestring nodes.	`True`

Yields:

Type	Description
`ndarray`	Each underlying NumPy array in the layout.

Examples:

>>> a = ak.Array([[1.0, 2.0], [3.0]])
>>> list(iter_numpy_arrays(a))
[array([1., 2., 3.])]

>>> a = ak.Array([{'x': 1, 'y': 2.0}, {'x': 3, 'y': 4.0}])
>>> sorted([arr.dtype for arr in iter_numpy_arrays(a)], key=str)
[dtype('float64'), dtype('int64')]

leaf_size

leaf_size(a: Array | Content) -> int

Count total leaf elements in an Awkward Array layout.

Each NumpyArray element counts as one. Each string and bytestring (not character or byte) counts as one. EmptyArray counts as zero.

Parameters:

Name	Type	Description	Default
`a`	`Array \| Content`	An Awkward Array or Content.	required

Returns:

Type	Description
`int`	Total number of leaf elements.

Examples:

>>> a = ak.Array([1, 2, 3])
>>> leaf_size(a)
3

>>> a = ak.Array([[1, 2], [3]])
>>> leaf_size(a)
3

>>> a = ak.Array(['hello', 'world'])
>>> leaf_size(a)
2

content_size

content_size(a: Array | Content) -> int

Count total scalars stored in an Awkward Array layout.

Counts data elements, offset/index buffer elements, and metadata values (RegularArray.size, RecordArray field names).

Parameters:

Name	Type	Description	Default
`a`	`Array \| Content`	An Awkward Array or Content.	required

Returns:

Type	Description
`int`	Total number of scalars stored in the content tree.

Examples:

A flat array has content_size equal to its length:

>>> a = ak.Array([1, 2, 3])
>>> content_size(a)
3

A variable-length list array counts offsets (n+1) plus child data:

>>> a = ak.Array([[1, 2], [3]])
>>> content_size(a)  # 3 offsets + 3 data = 6
6

A string array counts offsets (n+1) plus UTF-8 bytes:

>>> a = ak.Array(['hello', 'world'])
>>> content_size(a)  # 3 offsets + 10 bytes = 13
13
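
The counts in the two list examples above can be reproduced by hand. The sketch below only redoes the arithmetic in plain Python, assuming a layout with one offsets buffer of n + 1 entries over a flat data buffer; it does not call content_size, and the helper name list_scalar_count is hypothetical.

```python
def list_scalar_count(n_lists: int, n_child_scalars: int) -> int:
    """Offsets (n + 1 entries) plus the scalars stored in the child."""
    return (n_lists + 1) + n_child_scalars


# [[1, 2], [3]]: 3 offsets + 3 numbers
nested = [[1, 2], [3]]
nested_count = list_scalar_count(len(nested), sum(len(x) for x in nested))

# ['hello', 'world']: 3 offsets + 10 UTF-8 bytes
strings = ["hello", "world"]
n_bytes = sum(len(s.encode("utf-8")) for s in strings)
string_count = list_scalar_count(len(strings), n_bytes)

print(nested_count, string_count)  # 6 13
```

Non-ASCII strings would contribute more than one byte per character, so the byte count rather than the character count is what matters here.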

content_own_size

content_own_size(c: Content) -> int

Count the scalars owned directly by a Content node.

Counts the node's own buffer elements (offsets, starts/stops, mask, tags, index, numeric data) and metadata values (RegularArray.size, RecordArray field names, BitMaskedArray/ByteMaskedArray flags). Does not recurse into sub-contents — that is handled by content_size via get_contents.

Dispatch is performed with functools.singledispatch so support for a new Content subclass can be added by calling content_own_size.register without modifying this function.

Parameters:

Name	Type	Description	Default
`c`	`Content`	An Awkward `Content` node.	required

Returns:

Type	Description
`int`	Number of scalars stored directly on `c`, excluding sub-contents.

Examples:

>>> import numpy as np
>>> from awkward.contents import NumpyArray, RegularArray
>>> c = NumpyArray(np.array([1, 2, 3]))
>>> content_own_size(c)
3

The size metadata of a RegularArray counts as one; the inner NumpyArray data is not included here:

>>> c = RegularArray(NumpyArray(np.array([1, 2, 3, 4])), size=2)
>>> content_own_size(c)
1

any_nan_nat_in_numpy_array

any_nan_nat_in_numpy_array(n: ndarray) -> bool

True if NumPy array contains any NaN or NaT values, else False.

Parameters:

Name	Type	Description	Default
`n`	`ndarray`	A NumPy array.	required

Returns:

Type	Description
`bool`	`True` if `n` contains any `NaN` or `NaT` values, else `False`.

Examples:

>>> n = np.array([1.0, 2.0, np.nan])
>>> any_nan_nat_in_numpy_array(n)
True

>>> n = np.array([1.0, 2.0, 3.0])
>>> any_nan_nat_in_numpy_array(n)
False

>>> n = np.array(
...     [(1, np.datetime64('2020-01-01')), (2, np.datetime64('NaT'))],
...     dtype=[('a', 'i4'), ('b', 'M8[D]')],
... )
>>> any_nan_nat_in_numpy_array(n)
True
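
One way such a check could handle structured arrays like the one above is to recurse into each field and pick the test by dtype kind. This is a sketch under that assumption, not the module's implementation; any_nan_nat_sketch is a hypothetical name, and it relies only on np.isnan, np.isnat, and dtype.kind.

```python
import numpy as np


def any_nan_nat_sketch(n: np.ndarray) -> bool:
    """True if `n` holds a NaN (float/complex) or NaT (datetime/timedelta)."""
    if n.dtype.names is not None:
        # Structured dtype: check each field separately.
        return any(any_nan_nat_sketch(n[name]) for name in n.dtype.names)
    kind = n.dtype.kind
    if kind in "fc":  # floating point or complex
        return bool(np.isnan(n).any())
    if kind in "mM":  # timedelta64 or datetime64
        return bool(np.isnat(n).any())
    return False  # other kinds (integers, booleans, ...) cannot hold NaN or NaT


structured = np.array(
    [(1, np.datetime64("2020-01-01")), (2, np.datetime64("NaT"))],
    dtype=[("a", "i4"), ("b", "M8[D]")],
)
print(any_nan_nat_sketch(structured))  # True
```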

any_nan_in_numpy_array

any_nan_in_numpy_array(n: ndarray) -> bool

True if NumPy array contains any NaN values, else False.

Parameters:

Name	Type	Description	Default
`n`	`ndarray`	A NumPy array.	required

Returns:

Type	Description
`bool`	`True` if `n` contains any `NaN` values, else `False`.

Examples:

>>> n = np.array([1.0, 2.0, np.nan])
>>> any_nan_in_numpy_array(n)
True

>>> n = np.array([1.0, 2.0, 3.0])
>>> any_nan_in_numpy_array(n)
False

>>> n = np.array([(1, 2.0), (2, np.nan)], dtype=[('a', 'i4'), ('b', 'f8')])
>>> any_nan_in_numpy_array(n)
True

any_nat_in_numpy_array

any_nat_in_numpy_array(n: ndarray) -> bool

True if NumPy array contains any NaT values, else False.

Parameters:

Name	Type	Description	Default
`n`	`ndarray`	A NumPy array.	required

Returns:

Type	Description
`bool`	`True` if `n` contains any `NaT` values, else `False`.

Examples:

>>> n = np.array(['2020-01-01', 'NaT'], dtype='datetime64[D]')
>>> any_nat_in_numpy_array(n)
True

>>> n = np.array(['2020-01-01', '2020-01-02'], dtype='datetime64[D]')
>>> any_nat_in_numpy_array(n)
False

>>> n = np.array(
...     [(1, np.datetime64('2020-01-01')), (2, np.datetime64('NaT'))],
...     dtype=[('a', 'i4'), ('b', 'M8[D]')],
... )
>>> any_nat_in_numpy_array(n)
True

simple_dtypes_in

simple_dtypes_in(d: dtype) -> set[np.dtype]

Return simple dtypes contained in a (compound) dtype d.

Parameters:

Name	Type	Description	Default
`d`	`dtype`	A NumPy dtype. It can be a sub-array or structured dtype as well as a simple dtype.	required

Returns:

Type	Description
`set of np.dtype`	Simple dtypes contained in `d`.

Examples:

>>> simple_dtypes_in(np.dtype('int32'))
{dtype('int32')}

>>> sorted(simple_dtypes_in(np.dtype([('f0', 'i4'), ('f1', 'f8')])))
[dtype('int32'), dtype('float64')]

simple_dtype_kinds_in

simple_dtype_kinds_in(d: dtype) -> set[str]

Return the kind codes (as in dtype.kind) of simple dtypes contained in a (compound) dtype d.

Parameters:

Name	Type	Description	Default
`d`	`dtype`	A NumPy dtype. It can be a sub-array or structured dtype as well as a simple dtype.	required

Returns:

Type	Description
`set of str`	Kind codes of simple dtypes contained in `d`.

Examples:

>>> simple_dtype_kinds_in(np.dtype('int32'))
{'i'}

>>> sorted(simple_dtype_kinds_in(np.dtype([('f0', 'i4'), ('f1', 'f8')])))
['f', 'i']
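
To make "simple dtypes contained in a compound dtype" concrete, the sketch below walks a dtype through NumPy's dtype.subdtype, dtype.names, and dtype.fields attributes. It is an illustration under the assumption that sub-array and structured dtypes are unwrapped recursively; simple_dtypes_sketch is a hypothetical name, not this module's function.

```python
import numpy as np


def simple_dtypes_sketch(d: np.dtype) -> set:
    """Collect the non-compound dtypes reachable from `d`."""
    if d.subdtype is not None:
        # Sub-array dtype: (base dtype, shape).
        base, _shape = d.subdtype
        return simple_dtypes_sketch(base)
    if d.names is not None:
        # Structured dtype: fields map each name to (dtype, offset, ...).
        found = set()
        for name in d.names:
            found |= simple_dtypes_sketch(d.fields[name][0])
        return found
    return {d}


compound = np.dtype([("f0", "i4"), ("f1", ("f8", (2,)))])
simple = simple_dtypes_sketch(compound)
kinds = sorted(s.kind for s in simple)
print(kinds)  # ['f', 'i']
```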

n_scalars_in

n_scalars_in(d: dtype) -> int

Return the number of scalar values contained in a value of dtype d.

Parameters:

Name	Type	Description	Default
`d`	`dtype`	A NumPy dtype. It can be a sub-array or structured dtype as well as a simple dtype.	required

Returns:

Type	Description
`int`	The number of scalar values contained in a value of dtype `d`.

Examples:

>>> n_scalars_in(np.dtype('int32'))
1

>>> n_scalars_in(np.dtype(('int32', (3, 4))))
12

>>> n_scalars_in(np.dtype([('f0', 'i4'), ('f1', ('f8', (2,)))]))
3
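
The counts above follow from multiplying out sub-array shapes and summing over structured fields. A minimal sketch of that arithmetic, assuming this recursive reading of the dtype (n_scalars_sketch is a hypothetical name, not this module's function):

```python
import math

import numpy as np


def n_scalars_sketch(d: np.dtype) -> int:
    """Count scalar slots in one value of dtype `d`."""
    if d.subdtype is not None:
        # Sub-array dtype: product of the shape times the count of the base.
        base, shape = d.subdtype
        return math.prod(shape) * n_scalars_sketch(base)
    if d.names is not None:
        # Structured dtype: sum over all fields.
        return sum(n_scalars_sketch(d.fields[name][0]) for name in d.names)
    return 1


sub_array_count = n_scalars_sketch(np.dtype(("int32", (3, 4))))  # 3 * 4
record_count = n_scalars_sketch(np.dtype([("f0", "i4"), ("f1", ("f8", (2,)))]))  # 1 + 2
print(sub_array_count, record_count)  # 12 3
```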

safe_compare

safe_compare(value: T | None) -> T | GreaterAndLessThanAny

Return value if it is not None, else an object that compares as true in every comparison.

This function helps you concisely write assertions that compare values that may be None.

Parameters:

Name	Type	Description	Default
`value`	`T \| None`	A value or `None`.	required

Examples:

Suppose you have min_ and max_ that may be None

>>> import random
>>> min_ = random.choice([None, 1])
>>> max_ = random.choice([None, 3])

and val that should be in the range [min_, max_]:

>>> val = 2

Without this function, you need to check whether min_ and max_ are None before each comparison:

>>> if min_ is not None:
...     assert min_ <= val

>>> if max_ is not None:
...     assert val <= max_

This function lets you write the same assertion in one line:

>>> assert safe_compare(min_) <= val <= safe_compare(max_)
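
To see why the chained comparison works when a bound is None, here is a minimal sketch of an object whose ordering comparisons always succeed. AlwaysTrue and safe_compare_sketch are hypothetical names, and the sketch covers only <, <=, >, and >=; how the real GreaterAndLessThanAny treats other operators is not stated here.

```python
class AlwaysTrue:
    """Every ordering comparison with this object evaluates to True."""

    def __lt__(self, other):
        return True

    # Reuse the same method for the remaining ordering operators.
    __le__ = __gt__ = __ge__ = __lt__


def safe_compare_sketch(value):
    # Substitute the always-true object only when the bound is missing.
    return AlwaysTrue() if value is None else value


val = 2
# Both bounds missing: each half of the chain is True.
both_missing = safe_compare_sketch(None) <= val <= safe_compare_sketch(None)
# A real bound still constrains the value.
lower_too_high = safe_compare_sketch(3) <= val <= safe_compare_sketch(None)
print(both_missing, lower_too_high)  # True False
```

In `val <= AlwaysTrue()`, the int comparison returns NotImplemented, so Python falls back to the reflected `__ge__` on the right-hand operand, which returns True.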

safe_max

safe_max(
    vals: Iterable[T], default: Optional[T] = None
) -> Optional[T]

The largest item in vals that is not None.

Parameters:

Name	Type	Description	Default
`vals`	`Iterable[T]`	An iterable of values.	required
`default`	`Optional[T]`	The value to return if `vals` is empty or all items are `None`.	`None`

Examples:

>>> safe_max([None, 1, 2, None])
2

It returns None if vals is empty or all items in vals are None.

>>> print(safe_max([None, None]))
None

>>> print(safe_max([]))
None

If default is given, it returns default instead of None.

>>> safe_max([None, None], default=-1)
-1

>>> safe_max([], default=-1)
-1

safe_min

safe_min(
    vals: Iterable[T], default: Optional[T] = None
) -> Optional[T]

The smallest item in vals that is not None.

Parameters:

Name	Type	Description	Default
`vals`	`Iterable[T]`	An iterable of values.	required
`default`	`Optional[T]`	The value to return if `vals` is empty or all items are `None`.	`None`

Examples:

>>> safe_min([None, 1, 2, None])
1

It returns None if vals is empty or all items in vals are None.

>>> print(safe_min([None, None]))
None

>>> print(safe_min([]))
None

If default is given, it returns default instead of None.

>>> safe_min([None, None], default=-1)
-1

>>> safe_min([], default=-1)
-1
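
safe_min and safe_max mirror each other, and their documented behaviour can be approximated with the built-in min and max plus a filter. The sketch below is one possible formulation, not the module's code; the *_sketch names are hypothetical. It also shows a plausible use: combining optional bounds from several sources into the tightest range.

```python
from typing import Iterable, Optional, TypeVar

T = TypeVar("T")


def safe_min_sketch(vals: Iterable[Optional[T]], default: Optional[T] = None) -> Optional[T]:
    # Drop None items; `default` covers the empty and all-None cases.
    return min((v for v in vals if v is not None), default=default)


def safe_max_sketch(vals: Iterable[Optional[T]], default: Optional[T] = None) -> Optional[T]:
    return max((v for v in vals if v is not None), default=default)


# Tightest range from optional bounds: largest lower bound, smallest upper bound.
lower_bounds = [None, 0, 2]
upper_bounds = [10, None, 7]
low = safe_max_sketch(lower_bounds)
high = safe_min_sketch(upper_bounds)
print(low, high)  # 2 7
```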

CountdownDrawer

CountdownDrawer(
    draw: DrawFn,
    st_: _StWithMinMaxSize[_T],
    min_size_each: int = 0,
    max_size_each: int | None = None,
    min_size_total: int = 0,
    max_size_total: int = 10,
    max_draws: int = 100,
) -> Callable[[], _T | None]

Create a draw function with a shared element budget.

Each call draws from st_ and adds the length of the result to a running total. The returned function yields None once the budget is exhausted or too small to satisfy min_size_each, or once the draw limit is reached.

Parameters:

Name	Type	Description	Default
`draw`	`DrawFn`	The Hypothesis draw function.	required
`st_`	`_StWithMinMaxSize[_T]`	A callable that accepts `min_size` and `max_size` keyword arguments and returns a strategy.	required
`min_size_each`	`int`	Minimum number of elements in each draw.	`0`
`max_size_each`	`int \| None`	Maximum number of elements in each draw. If `None`, only `max_size_total` limits the size.	`None`
`min_size_total`	`int`	Minimum total elements across all draws.	`0`
`max_size_total`	`int`	Total element budget shared across all draws.	`10`
`max_draws`	`int`	Maximum number of non-None draws.	`100`
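
The shared-budget idea can be sketched without Hypothesis. In the hypothetical code below, countdown_sketch and make_list are stand-ins: make_list plays the role of st_ (it takes min_size and max_size), the random module replaces the Hypothesis draw function, and min_size_total is left out because its exact effect is not described here.

```python
import random
from typing import Callable, List, Optional


def countdown_sketch(
    make: Callable[[int, int], List[float]],
    min_size_each: int = 0,
    max_size_each: Optional[int] = None,
    max_size_total: int = 10,
    max_draws: int = 100,
) -> Callable[[], Optional[List[float]]]:
    total = 0  # elements handed out so far
    draws = 0  # number of non-None results so far

    def draw_next() -> Optional[List[float]]:
        nonlocal total, draws
        remaining = max_size_total - total
        # Stop when the budget is spent, too small for one draw,
        # or the draw limit has been reached.
        if draws >= max_draws or remaining <= 0 or remaining < min_size_each:
            return None
        max_size = remaining if max_size_each is None else min(remaining, max_size_each)
        result = make(min_size_each, max_size)
        total += len(result)
        draws += 1
        return result

    return draw_next


rng = random.Random(0)


def make_list(min_size: int, max_size: int) -> List[float]:
    return [rng.random() for _ in range(rng.randint(min_size, max_size))]


draw_next = countdown_sketch(make_list, min_size_each=1, max_size_each=4, max_size_total=10)
chunks = []
while (chunk := draw_next()) is not None:
    chunks.append(chunk)
print(sum(len(c) for c in chunks))  # never more than 10
```

Capping each draw at the remaining budget is what keeps the running total from exceeding max_size_total in this sketch.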