Constructors

The main strategy arrays().

The function arrays() is the main strategy of this package. It generates Awkward Arrays with multiple options to control the layout, data types, missing values, masks, and other array attributes.

Functions:

Name    Description
arrays  Strategy for Awkward Arrays.

arrays

arrays(
    draw: DrawFn,
    *,
    dtypes: SearchStrategy[dtype] | None = None,
    max_size: int = 50,
    allow_nan: bool = True,
    allow_numpy: bool = True,
    allow_empty: bool = True,
    allow_string: bool = True,
    allow_bytestring: bool = True,
    allow_regular: bool = True,
    allow_list_offset: bool = True,
    allow_list: bool = True,
    allow_record: bool = True,
    allow_union: bool = True,
    allow_indexed_option: bool = True,
    allow_byte_masked: bool = True,
    allow_bit_masked: bool = True,
    allow_unmasked: bool = True,
    max_leaf_size: int | None = None,
    max_depth: int | None = None,
    max_length: int | None = None,
    allow_virtual: bool = True,
) -> ak.Array

Strategy for Awkward Arrays.

This is the main strategy in this package. It is under development. The aim is to generate fully general Awkward Arrays, with many options to control layout, data types, missing values, masks, and other array attributes.

In constructing arrays, this strategy follows the Awkward Array User Guide section "Direct constructors". It constructs layouts and wraps them in an ak.Array. The layouts are instances of subclasses of ak.contents.Content.

By default, when called with no arguments, arrays() generates the most general arrays currently implemented, subject to a finite maximum size. Arguments can be provided to exclude certain layouts or data types, or to constrain values and sizes.

The current implementation generates arrays with the following layouts:

  • EmptyArray
  • NumpyArray
  • RegularArray
  • ListArray
  • ListOffsetArray
  • Strings
  • Bytestrings
  • RecordArray
  • IndexedOptionArray
  • ByteMaskedArray
  • BitMaskedArray
  • UnmaskedArray
  • UnionArray

Each of these can be excluded separately with the corresponding allow_* argument. For example, allow_union=False excludes UnionArray.

max_size is the main argument for constraining the array size. It counts most of the scalar values in the layout, including data elements, offsets, indices, field names, and parameters. The array size can also be constrained with max_leaf_size, max_depth, and max_length.

By default, arrays() randomly makes some of the generated arrays virtual by lazifying their buffers. Set allow_virtual to False to disable virtual arrays.

Parameters:

Name Type Description Default
dtypes SearchStrategy[dtype] | None

A strategy for NumPy scalar dtypes used in NumpyArray. If None, the default strategy that generates any scalar dtype supported by Awkward Array is used. Does not affect string or bytestring content.

None
max_size int

Upper bound on the number of scalars in the generated content. Counts data elements, offsets, indices, field names, and parameters.

50
allow_nan bool

No NaN/NaT values are generated in NumpyArray if False.

True
allow_numpy bool

No NumpyArray is generated if False.

True
allow_empty bool

No EmptyArray is generated if False. EmptyArray has Awkward type unknown and carries no data. Unlike NumpyArray, it is unaffected by dtypes and allow_nan.

True
allow_string bool

No string content is generated if False. A string is represented as a ListOffsetArray wrapping a NumpyArray(uint8). Each character (uint8) and each offset in the ListOffsetArray counts toward max_size. For max_leaf_size, each string (not each character) counts as a single leaf element. The ListOffsetArray that forms a string does not count toward max_depth. Unaffected by dtypes and allow_nan.

True
allow_bytestring bool

No bytestring content is generated if False. A bytestring is represented as a ListOffsetArray wrapping a NumpyArray(uint8). Each byte (uint8) and each offset in the ListOffsetArray counts toward max_size. For max_leaf_size, each bytestring (not each byte) counts as a single leaf element. The ListOffsetArray that forms a bytestring does not count toward max_depth. Unaffected by dtypes and allow_nan.

True
allow_regular bool

No RegularArray is generated if False.

True
allow_list_offset bool

No ListOffsetArray is generated if False.

True
allow_list bool

No ListArray is generated if False.

True
allow_record bool

No RecordArray is generated if False.

True
allow_union bool

No UnionArray is generated if False.

True
allow_indexed_option bool

No IndexedOptionArray is generated if False.

True
allow_byte_masked bool

No ByteMaskedArray is generated if False.

True
allow_bit_masked bool

No BitMaskedArray is generated if False.

True
allow_unmasked bool

No UnmaskedArray is generated if False.

True
max_leaf_size int | None

Maximum total number of leaf elements in the generated content. Each numerical value, including complex and datetime, counts as one. Each string and bytestring (not character or byte) counts as one. No constraint when None (the default).

None
max_depth int | None

Maximum nesting depth. Each RegularArray, ListOffsetArray, ListArray, RecordArray, and UnionArray layer adds one level, excluding those that form string or bytestring content. No constraint when None (the default).

None
max_length int | None

Maximum len() of the generated array. No constraint when None (the default).

None
allow_virtual bool

No virtual arrays are generated if False.

True

Examples:

>>> arrays().example()
<Array ... type='...'>