Developer Guide¶
Repository Layout¶
ragged/
├── src/ragged/
│   ├── __init__.py                        # public API / __all__
│   ├── _spec_array_object.py              # array class + helpers
│   ├── _spec_creation_functions.py        # zeros, ones, arange, …
│   ├── _spec_elementwise_functions.py     # sqrt, add, sin, …
│   ├── _spec_manipulation_functions.py    # reshape, roll, stack, …
│   ├── _spec_linear_algebra_functions.py  # matmul, tensordot, vecdot
│   ├── _spec_statistical_functions.py     # sum, mean, std, …
│   ├── _spec_searching_functions.py       # argmax, nonzero, where, …
│   ├── _spec_sorting_functions.py         # sort, argsort
│   ├── _spec_set_functions.py             # unique_*
│   ├── _spec_indexing_functions.py        # take
│   ├── _spec_data_type_functions.py       # astype, can_cast, …
│   ├── _spec_constants.py                 # e, pi, inf, nan, newaxis
│   ├── _typing.py                         # type aliases (Shape, Dtype, …)
│   ├── _import.py                         # lazy cupy import helper
│   ├── _helper_functions.py               # shared internal utilities
│   └── io/
│       ├── __init__.py
│       └── cf.py                          # CF Conventions I/O
├── docs/
│   ├── conf.py                            # Sphinx config (MyST + autodoc)
│   ├── index.md
│   ├── user_guide.rst
│   └── dev_guide.rst
├── tests/
│   ├── conftest.py
│   ├── test_spec_*.py                     # spec-driven test suites
│   └── test_*.py                          # feature-specific suites
├── pyproject.toml
└── noxfile.py
Each _spec_*.py module corresponds to one section of the
Array API specification. Module
names mirror the spec URL slugs intentionally so grep-based cross-referencing
is easy.
The array Class¶
The array class (lower-case, matching the Array API convention) lives in
_spec_array_object.py.
Instance attributes¶
Every array instance carries exactly four private attributes:
| Attribute | Type | Description |
|---|---|---|
| _impl | | The underlying data buffer. Almost always an ak.Array. |
| _shape | | Cached shape. Computed once by _shape_dtype. |
| _dtype | | Cached dtype. Derived from the leaf NumpyArray. |
| _device | | String identifier for the compute backend. |
These are not part of the public API. Read them only inside _spec_*.py
modules; external code should use the .shape, .dtype, .device
properties.
_new class method¶
@classmethod
def _new(cls, impl, shape, dtype, device) -> array:
A fast constructor that bypasses __init__ validation. Used in hot paths
such as __iter__ where shape and dtype are already known. Do not call it
from user-facing code.
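One common way to build such a constructor is to allocate the instance with cls.__new__ and assign the cached attributes directly. The sketch below uses a hypothetical stand-in class, MiniArray, with a deliberately simple __init__; only the four attribute names are taken from this guide, and it should not be read as the actual ragged.array implementation.

```python
from __future__ import annotations


class MiniArray:
    """Hypothetical stand-in for ragged.array (illustration only)."""

    def __init__(self, data: list, dtype: str = "int64", device: str = "cpu") -> None:
        # Validating path: inspect the input and derive the cached shape.
        if not isinstance(data, list):
            raise TypeError("data must be a list")
        self._impl = data
        self._shape = (len(data),)
        self._dtype = dtype
        self._device = device

    @classmethod
    def _new(cls, impl, shape, dtype, device) -> MiniArray:
        # Fast path: allocate without running __init__, then fill in
        # attributes the caller already knows to be consistent.
        out = cls.__new__(cls)
        out._impl = impl
        out._shape = shape
        out._dtype = dtype
        out._device = device
        return out


checked = MiniArray([1, 2, 3])
fast = MiniArray._new([4, 5, 6], (3,), checked._dtype, checked._device)
```

Because nothing is validated on this path, the caller is responsible for passing a shape and dtype that really match impl.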
Layout Types and _shape_dtype¶
Internally, _impl is an ak.Array whose layout is one of a small
set of Awkward Array content types:
| Layout class | Meaning |
|---|---|
| NumpyArray | 1-D (or packed N-D) contiguous numeric data. |
| RegularArray | Fixed inner dimension. Produced by … |
| ListOffsetArray | Variable-length rows (the common case for user-constructed arrays). May be truly ragged (different row lengths → …) |
_shape_dtype(layout) walks the layout tree once to extract shape and
dtype:
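# simplified sketch
from awkward.contents import ListArray, ListOffsetArray, RegularArray

def _shape_dtype(layout):
    shape = (len(layout),)
    node = layout
    # Descend through list-type nodes, recording one dimension per level.
    while isinstance(node, (ListOffsetArray, RegularArray, ListArray)):
        # Only RegularArray has a fixed size; variable-length levels report None.
        shape += (node.size if isinstance(node, RegularArray) else None,)
        node = node.content
    # node is now the NumpyArray leaf; append any packed trailing dimensions
    return shape + node.data.shape[1:], node.data.dtype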
Key rule: call _shape_dtype only when the layout actually changes.
After ak.values_astype (dtype cast only), shape is unchanged — update
_dtype directly without re-traversing the layout.
The Box / Unbox Pattern¶
Every function that consumes or produces ragged.array objects uses two
module-level helpers to move between the public type and its ak.Array
implementation:
_unbox(*inputs)
    Extract ._impl from each input array. Raises TypeError on mixed array subclasses or device mismatches.

    (impl,) = _unbox(x)
    left_impl, right_impl = _unbox(a, b)

_box(cls, output, *, dtype=None, device=None)
    Wrap an ak.Array result back into a ragged.array (or subclass). Calls _shape_dtype to populate _shape and _dtype.

    return _box(type(x), some_ak_array)
Always use type(x) (not array) as the first argument to _box so
that subclasses round-trip correctly.
Writing a New Function¶
The following checklist applies to any new Array API function or extension.
1. Choose the right module: pick the _spec_*.py file whose name matches the spec section the function belongs to.
2. Signature: match the Array API signature exactly (keyword-only arguments, / positional-only markers):

   def my_func(x: array, /, *, axis: int | None = None) -> array:

3. Unbox inputs:

   (impl,) = _unbox(x)

4. Fast path for uniform arrays: wrap the numpy equivalent in contextlib.suppress(TypeError, ValueError) and try ak.to_numpy:

   with contextlib.suppress(TypeError, ValueError):
       np_arr = ak.to_numpy(impl)
       return _box(type(x), ak.from_numpy(np.my_func(np_arr)))

   ak.to_numpy succeeds for NumpyArray, RegularArray, and ListOffsetArray with uniform row lengths. It raises TypeError or ValueError for genuinely ragged arrays.
5. Ragged / general path: implement using Awkward Array primitives (ak.flatten, ak.unflatten, ak.num, etc.) where possible. Use a tolist() / list-based fallback only as a last resort for complex shapes that have no efficient Awkward equivalent.
6. Box the result:

   return _box(type(x), result_ak)

7. Export: add the function name to __init__.py's __all__ list and the relevant import block.
8. Docstring: include the Array API URL:

   """
   Short description.

   https://data-apis.org/array-api/latest/API_specification/generated/array_api.my_func.html
   """

9. Tests: add a tests/test_<feature>.py file (see Testing Conventions).
Testing Conventions¶
Structure¶
Tests are grouped by feature in tests/test_<feature>.py. Within each
file, group related cases into classes:
class TestMyFunc1D:
    def test_basic(self): ...
    def test_dtype_preserved(self): ...

class TestMyFunc2DRagged:
    def test_integer_index(self): ...
Helper¶
Every test file should define a local factory to avoid repeating
ragged.array(...):
def _make(data, dtype=None) -> ragged.array:
    return ragged.array(data, dtype=dtype)
Coverage checklist¶
For each new function, cover:
- 1-D uniform input
- 2-D uniform input (created from np.ndarray)
- 2-D ragged input (created from Python lists)
- dtype preservation (np.float32 should stay np.float32)
- result type (isinstance(result, ragged.array))
- error cases (wrong shape, wrong dtype, unsupported key type, …)
- copy / isolation (mutations via __setitem__ or .at do not affect the original)
Running tests¶
pip install -e ".[test]"
pytest tests/
With coverage:
pytest tests/ --cov=ragged --cov-report=term-missing
The full test matrix (multiple Python / NumPy versions) is run via nox:
nox
Performance Patterns¶
The following patterns are used consistently throughout the codebase. New code should follow them.
Single try/except for fast-path detection¶
Determine whether an array is uniform by probing ak.to_numpy once, in a
single try/except. Do not wrap value unwrapping in a separate
try/except; instead, branch on the result of the single probe:
try:
    arr_np = ak.to_numpy(self._impl)
except (TypeError, ValueError):
    arr_np = None

if arr_np is not None:
    # fast path: unwrap value as numpy
    val = ak.to_numpy(value._impl) if isinstance(value, array) else value
    ...
else:
    # slow path: unwrap value as list
    val = value._impl.tolist() if isinstance(value, array) else value
    ...
Do not use isinstance(layout, NumpyArray | RegularArray) as the sole
fast-path gate — it misses ListOffsetArray arrays with incidentally
uniform rows (common when the user constructs from Python lists).
Avoid full tolist() in ragged paths¶
Prefer iterating over ak.Array sub-blocks and calling ak.to_numpy
per sub-block over calling ak.to_list() on the whole array.
ak.to_list allocates a Python object for every scalar; sub-block
ak.to_numpy stays in C for uniform chunks.
# Preferred
def _process(a: ak.Array, b: ak.Array) -> Any:
    try:
        return np_func(ak.to_numpy(a), ak.to_numpy(b))
    except (TypeError, ValueError):
        pass
    return [
        _process(
            ai if isinstance(ai, ak.Array) else ak.Array(ai),
            bi if isinstance(bi, ak.Array) else ak.Array(bi),
        )
        for ai, bi in zip(a, b, strict=False)
    ]
O(D) layout walks for nested structures¶
When restoring a nested structure from a flat array (e.g. after
ak.flatten(axis=None)), collect counts at each nesting level with a
single top-down walk rather than calling ak.num(impl, axis=depth)
from the root for each depth:
# O(D): peel one level at a time
level_counts: list[np.ndarray] = []
cur = impl
for _ in range(ndim - 1):
    level_counts.append(ak.to_numpy(ak.num(cur, axis=1)))
    cur = ak.flatten(cur, axis=1)

# rebuild the nesting innermost-first from the processed flat data
result = flat_rolled
for counts in reversed(level_counts):
    result = ak.unflatten(result, counts)
Shape is invariant under ak.values_astype¶
After a dtype cast via ak.values_astype, shape does not change. Update
_dtype directly instead of re-running _shape_dtype:
self._impl = ak.values_astype(self._impl, new_dtype)
self._dtype = new_dtype # shape unchanged — no _shape_dtype call needed
Zero-copy dummies for broadcast helpers¶
When a dummy array is needed only to drive ak.broadcast_arrays (its
values are discarded), use a zero-copy broadcast view instead of allocating
a full array:
dummy = ak.from_numpy(np.broadcast_to(np.zeros((), dtype=np.int8), target_shape))
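The NumPy half of that expression can be inspected on its own to see why it avoids a full allocation. The target_shape value below is an arbitrary example.

```python
import numpy as np

target_shape = (1000, 1000)
dummy_np = np.broadcast_to(np.zeros((), dtype=np.int8), target_shape)

# Every index maps to the same single element: all strides are zero.
strides = dummy_np.strides
# broadcast_to returns a read-only view, so the shared element cannot be
# modified by accident.
writeable = dummy_np.flags.writeable
```

The view reports the full target shape, but only one int8 element backs it.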
_apply_inplace and in-place operators¶
_apply_inplace copies _impl, _shape, _dtype, and _device
directly from the already-computed result — it does not call _shape_dtype
again. This is safe because all in-place operators (__iadd__, etc.) are
elementwise and therefore shape-preserving.
Awkward Array Gotchas¶
ak.from_numpy on N-D arrays¶
ak.from_numpy on a 2-D (or higher) NumPy array produces a
NumpyArray layout, not a ListOffsetArray. The resulting
ragged.array will have a concrete integer for the inner dimension
(e.g. shape == (3, 4)).
However, the helper _ak_from_numpy (defined in
_spec_manipulation_functions.py) calls ak.from_regular(..., axis=None)
afterwards to convert every regular dimension to variable-length. Use it
when the ragged convention (shape[-1] == None) is required:
from ._spec_manipulation_functions import _ak_from_numpy
impl = _ak_from_numpy(np_result)
ak.flatten(axis=1) is O(1) for ListOffsetArray¶
Peeling one nesting level with ak.flatten(impl, axis=1) returns the
content buffer of the outer ListOffsetArray — it does not copy data.
This makes iterative level-peeling (as in the roll axis=None path)
effectively O(1) per level.
ak.to_numpy on uniform ListOffsetArray¶
ak.to_numpy succeeds on a ListOffsetArray whose rows all have the
same length — it is not restricted to NumpyArray or RegularArray
layouts. This is why the fast-path probe uses try/except rather than a
layout isinstance check.