Developer Guide

Repository Layout

ragged/
├── src/ragged/
│   ├── __init__.py                       # public API / __all__
│   ├── _spec_array_object.py             # array class + helpers
│   ├── _spec_creation_functions.py       # zeros, ones, arange, …
│   ├── _spec_elementwise_functions.py    # sqrt, add, sin, …
│   ├── _spec_manipulation_functions.py   # reshape, roll, stack, …
│   ├── _spec_linear_algebra_functions.py # matmul, tensordot, vecdot
│   ├── _spec_statistical_functions.py    # sum, mean, std, …
│   ├── _spec_searching_functions.py      # argmax, nonzero, where, …
│   ├── _spec_sorting_functions.py        # sort, argsort
│   ├── _spec_set_functions.py            # unique_*
│   ├── _spec_indexing_functions.py       # take
│   ├── _spec_data_type_functions.py      # astype, can_cast, …
│   ├── _spec_constants.py                # e, pi, inf, nan, newaxis
│   ├── _typing.py                        # type aliases (Shape, Dtype, …)
│   ├── _import.py                        # lazy cupy import helper
│   ├── _helper_functions.py              # shared internal utilities
│   └── io/
│       ├── __init__.py
│       └── cf.py                         # CF Conventions I/O
├── docs/
│   ├── conf.py                           # Sphinx config (MyST + autodoc)
│   ├── index.md
│   ├── user_guide.rst
│   └── dev_guide.rst
├── tests/
│   ├── conftest.py
│   ├── test_spec_*.py                    # spec-driven test suites
│   └── test_*.py                         # feature-specific suites
├── pyproject.toml
└── noxfile.py

Each _spec_*.py module corresponds to one section of the Array API specification. Module names mirror the spec URL slugs intentionally so grep-based cross-referencing is easy.
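Because module names follow the spec slugs, the matching spec page can be derived mechanically. The sketch below assumes the `_spec_<slug>.py` naming shown in the tree and the section URL pattern of the published Array API pages. `spec_url_for_module` is a hypothetical helper for illustration, not part of ragged.

```python
from pathlib import PurePosixPath

# Base of the published Array API pages (same host as the docstring URLs in this guide).
SPEC_BASE = "https://data-apis.org/array-api/latest/API_specification"


def spec_url_for_module(path: str) -> str:
    """Map e.g. 'src/ragged/_spec_sorting_functions.py' to its spec section URL.

    Hypothetical helper, shown only to illustrate the naming convention.
    """
    stem = PurePosixPath(path).stem        # '_spec_sorting_functions'
    prefix = "_spec_"
    if not stem.startswith(prefix):
        raise ValueError(f"not a spec module: {path!r}")
    slug = stem[len(prefix):]              # 'sorting_functions'
    return f"{SPEC_BASE}/{slug}.html"


print(spec_url_for_module("src/ragged/_spec_sorting_functions.py"))
```

Modules without the `_spec_` prefix (for example `_typing.py`) have no spec counterpart, so the helper rejects them rather than guessing.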

The array Class

The array class (lower-case, matching the Array API convention) lives in _spec_array_object.py.

Instance attributes

Every array instance carries exactly four private attributes:

Attribute  Type                                     Description
---------  ---------------------------------------  -----------
_impl      ak.Array | numpy.ndarray | cupy.ndarray  The underlying data buffer. Almost always ak.Array; scalar (0-D) arrays may hold a raw numpy.ndarray.
_shape     tuple[int | None, ...]                   Cached shape. Computed once by _shape_dtype and kept in sync manually after any mutation.
_dtype     numpy.dtype                              Cached dtype. Derived from the leaf NumpyArray inside the layout.
_device    "cpu" | "cuda"                           String identifier for the compute backend.

These are not part of the public API. Read them only inside _spec_*.py modules; external code should use the .shape, .dtype, .device properties.

_new class method

@classmethod
def _new(cls, impl, shape, dtype, device) -> array:

A fast constructor that bypasses __init__ validation. Used in hot paths such as __iter__ where shape and dtype are already known. Do not call it from user-facing code.
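The source does not show the body of `_new`. One common way to write such a constructor is to allocate with `cls.__new__(cls)` and assign the private attributes directly. The sketch below assumes that approach; `ArraySketch` is a hypothetical stand-in class, not ragged.array.

```python
from __future__ import annotations

from typing import Any


class ArraySketch:
    """Hypothetical stand-in for ragged.array; only the constructor pattern is shown."""

    def __init__(self, data: list[Any]) -> None:
        # Slow, validating path: inspect the data to derive the metadata.
        if not isinstance(data, list):
            raise TypeError("data must be a list")
        self._impl = data
        self._shape = (len(data),)
        self._dtype = "int64"    # placeholder; the real class derives this from the layout
        self._device = "cpu"

    @classmethod
    def _new(cls, impl: Any, shape: tuple, dtype: Any, device: str) -> ArraySketch:
        # Fast path: allocate without running __init__, then set the four
        # private attributes from values the caller already knows.
        out = cls.__new__(cls)
        out._impl = impl
        out._shape = shape
        out._dtype = dtype
        out._device = device
        return out

    @property
    def shape(self) -> tuple:
        return self._shape


fast = ArraySketch._new([1, 2, 3], (3,), "int64", "cpu")
print(fast.shape)
```

Using `cls` rather than a hard-coded class name means a subclass calling `_new` gets an instance of itself. Because no validation runs, the caller is responsible for passing consistent values.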

Layout Types and _shape_dtype

Internally, _impl is an ak.Array whose layout is one of a small set of Awkward Array content types:

Layout class     Meaning for ragged.array
---------------  ------------------------
NumpyArray       1-D (or packed N-D) contiguous numeric data. shape has no None entries. ak.to_numpy always succeeds.
RegularArray     Fixed inner dimension. Produced by ak.from_numpy(..., regular=True) on an N-D array. shape has no None entries.
ListOffsetArray  Variable-length rows (the common case for user-constructed arrays). May be truly ragged (different row lengths → shape[i] == None) or incidentally uniform (all rows same length but still ListOffsetArray). ak.to_numpy succeeds if and only if all rows are the same length.

_shape_dtype(layout) walks the layout tree once to extract shape and dtype:

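# simplified sketch of the real function
from awkward.contents import ListArray, ListOffsetArray, RegularArray

def _shape_dtype(layout):
    shape = (len(layout),)
    node = layout
    # descend through list-type nodes, adding one dimension per level
    while isinstance(node, (ListOffsetArray, RegularArray, ListArray)):
        # only RegularArray has a fixed size; other list types are variable-length
        shape += (node.size if isinstance(node, RegularArray) else None,)
        node = node.content
    # node is now the leaf NumpyArray; its data may itself be N-D
    return shape + node.data.shape[1:], node.data.dtype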

Key rule: call _shape_dtype only when the layout actually changes. After ak.values_astype (dtype cast only), shape is unchanged — update _dtype directly without re-traversing the layout.

The Box / Unbox Pattern

Every function that consumes or produces ragged.array objects uses two module-level helpers to move between the public type and its ak.Array implementation:

_unbox(*inputs)

Extract ._impl from each input array. Raises TypeError on mixed array subclasses or device mismatches.

(impl,) = _unbox(x)
left_impl, right_impl = _unbox(a, b)

_box(cls, output, *, dtype=None, device=None)

Wrap an ak.Array result back into a ragged.array (or subclass). Calls _shape_dtype to populate _shape and _dtype.

return _box(type(x), some_ak_array)

Always use type(x) (not array) as the first argument to _box so that subclasses round-trip correctly.

Writing a New Function

The following checklist applies to any new Array API function or extension.

  1. Choose the right module — pick the _spec_*.py file whose name matches the spec section the function belongs to.

  2. Signature — match the Array API signature exactly (keyword-only arguments, / positional-only markers):

    def my_func(x: array, /, *, axis: int | None = None) -> array:
    
  3. Unbox inputs:

    (impl,) = _unbox(x)
    
  4. Fast path for uniform arrays — wrap the numpy equivalent in contextlib.suppress(TypeError, ValueError) and try ak.to_numpy:

    with contextlib.suppress(TypeError, ValueError):
        np_arr = ak.to_numpy(impl)
        return _box(type(x), ak.from_numpy(np.my_func(np_arr)))
    

    ak.to_numpy succeeds for NumpyArray, RegularArray, and ListOffsetArray with uniform row lengths. It raises TypeError or ValueError for genuinely ragged arrays.

  5. Ragged / general path — implement using Awkward Array primitives (ak.flatten, ak.unflatten, ak.num, etc.) where possible. Use tolist() / list-based fallback only as a last resort for complex shapes that have no efficient awkward equivalent.

  6. Box the result:

    return _box(type(x), result_ak)
    
  7. Export — add the function name to __init__.py’s __all__ list and the relevant import block.

  8. Docstring — include the Array API URL:

    """
    Short description.
    
    https://data-apis.org/array-api/latest/API_specification/generated/array_api.my_func.html
    """
    
  9. Tests — add a tests/test_<feature>.py file (see Testing Conventions).

Testing Conventions

Structure

Tests are grouped by feature in tests/test_<feature>.py. Within each file, group related cases into classes:

class TestMyFunc1D:
    def test_basic(self): ...
    def test_dtype_preserved(self): ...

class TestMyFunc2DRagged:
    def test_integer_index(self): ...

Helper

Every test file should define a local factory to avoid repeating ragged.array(...):

def _make(data, dtype=None) -> ragged.array:
    return ragged.array(data, dtype=dtype)

Coverage checklist

For each new function, cover:

  • 1-D uniform input

  • 2-D uniform input (created from np.ndarray)

  • 2-D ragged input (created from Python lists)

  • dtype preservation (np.float32 should stay np.float32)

  • result type (isinstance(result, ragged.array))

  • error cases (wrong shape, wrong dtype, unsupported key type, …)

  • copy / isolation (mutations via __setitem__ or .at do not affect the original)

Running tests

pip install -e ".[test]"
pytest tests/

With coverage:

pytest tests/ --cov=ragged --cov-report=term-missing

The full test matrix (multiple Python / NumPy versions) is run via nox:

nox

Performance Patterns

The following patterns are used consistently throughout the codebase. New code should follow them.

Single try/except for fast-path detection

Determine whether an array is uniform by probing ak.to_numpy once, in a single try/except. Do not wrap value unwrapping in a separate try/except; instead, branch on the result of the single probe:

try:
    arr_np = ak.to_numpy(self._impl)
except (TypeError, ValueError):
    arr_np = None

if arr_np is not None:
    # fast path — unwrap value as numpy
    val = ak.to_numpy(value._impl) if isinstance(value, array) else value
    ...
else:
    # slow path — unwrap value as list
    val = value._impl.tolist() if isinstance(value, array) else value
    ...

Do not use isinstance(layout, NumpyArray | RegularArray) as the sole fast-path gate — it misses ListOffsetArray arrays with incidentally uniform rows (common when the user constructs from Python lists).

Avoid full tolist() in ragged paths

Prefer iterating over ak.Array sub-blocks and calling ak.to_numpy per sub-block over calling ak.to_list() on the whole array. ak.to_list allocates a Python object for every scalar; sub-block ak.to_numpy stays in C for uniform chunks.

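# Preferred
from typing import Any

import awkward as ak

def _process(a: ak.Array, b: ak.Array) -> Any:
    # np_func is a placeholder for the NumPy routine being wrapped
    try:
        # uniform block: handled in a single vectorised call
        return np_func(ak.to_numpy(a), ak.to_numpy(b))
    except (TypeError, ValueError):
        pass
    # ragged block: recurse into each pair of sub-blocks
    return [_process(
        ai if isinstance(ai, ak.Array) else ak.Array(ai),
        bi if isinstance(bi, ak.Array) else ak.Array(bi),
    ) for ai, bi in zip(a, b, strict=False)]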

O(D) layout walks for nested structures

When restoring a nested structure from a flat array (e.g. after ak.flatten(axis=None)), collect counts at each nesting level with a single top-down walk rather than calling ak.num(impl, axis=depth) from the root for each depth:

# O(D) — peel one level at a time
level_counts: list[np.ndarray] = []
cur = impl
for _ in range(ndim - 1):
    level_counts.append(ak.to_numpy(ak.num(cur, axis=1)))
    cur = ak.flatten(cur, axis=1)

result = flat_rolled
for counts in reversed(level_counts):
    result = ak.unflatten(result, counts)

Shape is invariant under ak.values_astype

After a dtype cast via ak.values_astype, shape does not change. Update _dtype directly instead of re-running _shape_dtype:

self._impl = ak.values_astype(self._impl, new_dtype)
self._dtype = new_dtype   # shape unchanged — no _shape_dtype call needed

Zero-copy dummies for broadcast helpers

When a dummy array is needed only to drive ak.broadcast_arrays (its values are discarded), use a zero-copy broadcast view instead of allocating a full array:

dummy = ak.from_numpy(np.broadcast_to(np.zeros((), dtype=np.int8), target_shape))
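The NumPy half of that expression can be inspected on its own to see why it is cheap. This sketch covers only `np.broadcast_to`; `target_shape` is an illustrative value.

```python
import numpy as np

target_shape = (1000, 1000)            # illustrative; any shape works
scalar = np.zeros((), dtype=np.int8)   # the only real allocation: one byte

dummy_np = np.broadcast_to(scalar, target_shape)

# Every stride is zero, so all one million positions alias the same byte.
print(dummy_np.shape, dummy_np.strides, np.shares_memory(dummy_np, scalar))

# The view is read-only, which is fine for a dummy whose values are discarded.
print(dummy_np.flags.writeable)
```

By contrast, `np.zeros(target_shape, dtype=np.int8)` would allocate one byte per element.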

_apply_inplace and in-place operators

_apply_inplace copies _impl, _shape, _dtype, and _device directly from the already-computed result — it does not call _shape_dtype again. This is safe because all in-place operators (__iadd__, etc.) are elementwise and therefore shape-preserving.
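The behaviour described above can be modelled with a toy class. `ArraySketch` is a hypothetical stand-in that stores a plain list; only the attribute-copying pattern is meant to carry over, and the body of the real `_apply_inplace` is not shown in this guide.

```python
from __future__ import annotations


class ArraySketch:
    """Hypothetical stand-in for ragged.array; _impl is a plain list here."""

    def __init__(self, impl: list[float]) -> None:
        self._impl = impl
        self._shape = (len(impl),)    # stands in for a _shape_dtype call
        self._dtype = "float64"
        self._device = "cpu"

    def __add__(self, other: ArraySketch) -> ArraySketch:
        return ArraySketch([a + b for a, b in zip(self._impl, other._impl)])

    def _apply_inplace(self, result: ArraySketch) -> ArraySketch:
        # Adopt the already-computed metadata; nothing is re-derived from _impl.
        self._impl = result._impl
        self._shape = result._shape
        self._dtype = result._dtype
        self._device = result._device
        return self

    def __iadd__(self, other: ArraySketch) -> ArraySketch:
        return self._apply_inplace(self + other)


x = ArraySketch([1.0, 2.0])
alias = x
x += ArraySketch([10.0, 20.0])
print(alias is x, x._impl, x._shape)
```

Returning `self` from `__iadd__` keeps every existing reference pointing at the updated object, which is what distinguishes the in-place operator from `x = x + other`.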

Awkward Array Gotchas

ak.from_numpy on N-D arrays

ak.from_numpy on a 2-D (or higher) NumPy array produces a NumpyArray layout, not a ListOffsetArray. The resulting ragged.array will have a concrete integer for the inner dimension (e.g. shape == (3, 4)).

However, the helper _ak_from_numpy (defined in _spec_manipulation_functions.py) calls ak.from_regular(..., axis=None) afterwards to convert every regular dimension to variable-length. Use it when the ragged convention (shape[-1] == None) is required:

from ._spec_manipulation_functions import _ak_from_numpy
impl = _ak_from_numpy(np_result)

ak.flatten(axis=1) is O(1) for ListOffsetArray

Peeling one nesting level with ak.flatten(impl, axis=1) returns the content buffer of the outer ListOffsetArray; it does not copy data. This makes the flatten step of iterative level-peeling (as in the roll axis=None path) effectively O(1) per level. The ak.num call that accompanies it still reads that level's offsets.

ak.to_numpy on uniform ListOffsetArray

ak.to_numpy succeeds on a ListOffsetArray whose rows all have the same length — it is not restricted to NumpyArray or RegularArray layouts. This is why the fast-path probe uses try/except rather than a layout isinstance check.