Description of datasets (skhep.dataset)

Subpackage for the description of datasets.

Module for NumpyDataset

The NumpyDataset class is the implementation of the Dataset abstract base class for the [NumPy] package.

Note: using this class requires NumPy to be installed.

References

[NumPy] http://www.numpy.org/
class skhep.dataset.numpydataset.NumpyDataset(data, provenance=None, **options)
copy()

Get a copy of the NumpyDataset.

datashape

Every dataset has a datashape, which describes its data types in a unified way.

dicttoarray()

Convert a dictionary into a structured array. Under Python 3, byte keys are decoded into strings.
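
As a rough illustration of what such a conversion involves, the sketch below builds a NumPy structured array from a dictionary of equal-length columns and decodes byte keys along the way. The helper name dict_to_structured_array and the sample columns are hypothetical, and this is not the skhep implementation.

```python
import numpy as np


def dict_to_structured_array(columns):
    """Hypothetical helper: build a structured array from equal-length columns."""
    # Decode byte keys into str so that field names are plain strings.
    decoded = {
        (key.decode() if isinstance(key, bytes) else key): np.asarray(values)
        for key, values in columns.items()
    }
    # One field per dictionary key, using the dtype NumPy infers for each column.
    dtype = [(name, col.dtype) for name, col in decoded.items()]
    nrows = len(next(iter(decoded.values())))
    out = np.empty(nrows, dtype=dtype)
    for name, col in decoded.items():
        out[name] = col
    return out


arr = dict_to_structured_array({b"pt": [10.0, 20.0, 30.0], "charge": [1, -1, 1]})
```

Each dictionary key becomes one named field, so the result can be indexed by name, for example arr["pt"].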

static from_file(files, **options)

Load a dataset from a file or collection of files.

Recognizes zipped Numpy (.npz) format.

Parameters:
  • files – a string file name (glob pattern allowed), an iterable of string file names, or an iterable of files open for reading in binary mode.
  • options – columns: a set of columns to select from the files.
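
For readers unfamiliar with the zipped NumPy format, the following sketch writes a small .npz archive with plain NumPy and reads back only a chosen set of columns. The file name, the column names and the variable wanted are assumptions made for illustration; the code shows the underlying format, not how from_file is implemented.

```python
import os
import tempfile

import numpy as np

# Write a small .npz archive with two named arrays into a temporary directory.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "events.npz")
np.savez(path, pt=np.array([10.0, 20.0]), eta=np.array([0.5, -1.2]))

# Read the archive back, keeping only the requested columns.
wanted = {"pt"}
with np.load(path) as archive:
    selected = {name: archive[name] for name in archive.files if name in wanted}
```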
immutable

If True, this dataset cannot be modified in place, only transformed.

Opposite of mutable.

keys()

Get the list of keys in the NumpyDataset (same as ‘variables’).

nentries

Get the number of entries in the NumpyDataset (same as ‘nevents’).

nevents

Get the number of events in the NumpyDataset.

persistent

If True, this dataset exists in a form that survives the Python session, such as a file or database.

If mutable, changes in this dataset are reflected in that persistent form.

Opposite of transient.

select(selection=None)

Apply a selection to the NumpyDataset.

Parameters: selection (Selection or str)
Returns: a new NumpyDataset holding the entries that pass the selection.
Return type: NumpyDataset
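
In plain NumPy terms, a selection can be pictured as a boolean mask applied to a structured array, which yields a new array and leaves the input as it was. The sketch below assumes made-up columns pt and eta and a made-up cut; how skhep interprets a Selection object or a string is not covered here.

```python
import numpy as np

events = np.array(
    [(10.0, 0.5), (20.0, -1.2), (30.0, 2.1)],
    dtype=[("pt", "f8"), ("eta", "f8")],
)

# Boolean mask: True for every entry that passes the cut.
mask = events["pt"] > 15.0

# Boolean indexing returns a new array; the original is not modified.
selected = events[mask]
```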
to_file(base, **options)

Save this dataset to a file or collection of files.

Parameters:
  • base (str or iterable of str) – string file name or iterable of string file names.
  • options – none.
to_tree(treename, **options)

Copy this dataset into a new ROOTDataset, without sharing any underlying data.

Parameters:
  • treename (str) – name of the ROOT TTree to be created.
  • options – none.
Returns: ROOTDataset holding the new ROOT TTree.
Return type: ROOTDataset
variables

Get the list of variables in the NumpyDataset, i.e. the content of ‘numpy.dtype.names’ of the stored NumPy array.
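
The NumPy attribute referred to above can be inspected directly on any structured array. A minimal sketch, with illustrative field names that are not part of skhep:

```python
import numpy as np

# A structured array with two named fields and three entries.
events = np.zeros(3, dtype=[("pt", "f8"), ("eta", "f8")])

names = events.dtype.names  # tuple of field names, in declaration order
nentries = len(events)      # number of rows in the array
```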

class skhep.dataset.numpydataset.SkhepNumpyArray
copy()

Get a copy of the SkhepNumpyArray.

name

Return the name of the variable inside the SkhepNumpyArray.

provenance

Return the provenance of the SkhepNumpyArray.

Module for ROOTDataset

The ROOTDataset class is the implementation of the Dataset abstract base class for the [ROOT] package.

Note: using this class requires ROOT to be installed.

References

[ROOT] https://root.cern.ch/
class skhep.dataset.rootdataset.ROOTDataset(data, provenance=None)
datashape

Every dataset has a datashape, which describes its data types in a unified way.

static from_file(files, **options)

Load a dataset from a file or collection of files.

Parameters:
  • files – string file name, glob pattern, or iterable of string file names.
  • options – treename (str): name of the TTree object in the collection of files. Optional only if all input files contain a single TTree, with the same name.
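
An argument that may be a single name, a glob pattern or an iterable of names is typically normalised into a flat list before any file is opened. The sketch below shows one way to do that with the standard library; the helper expand_files and the file names are hypothetical, and skhep may handle this differently.

```python
import glob
import os
import tempfile


def expand_files(files):
    """Hypothetical helper: turn a name, glob pattern or iterable of names into a list."""
    if isinstance(files, str):
        files = [files]
    expanded = []
    for pattern in files:
        matches = sorted(glob.glob(pattern))
        # Keep names without a match so that a later open() reports the missing file.
        expanded.extend(matches if matches else [pattern])
    return expanded


# Create two empty placeholder files and expand a glob pattern over them.
tmpdir = tempfile.mkdtemp()
for name in ("a.root", "b.root"):
    open(os.path.join(tmpdir, name), "wb").close()

found = [os.path.basename(p) for p in expand_files(os.path.join(tmpdir, "*.root"))]
```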
immutable

If True, this dataset cannot be modified in place, only transformed.

Opposite of mutable.

persistent

If True, this dataset exists in a form that survives the Python session, such as a file or database.

If mutable, changes in this dataset are reflected in that persistent form.

Opposite of transient.

to_array(**options)

Copy this dataset into a new NumpyDataset, without sharing any underlying data.

A change in the NumpyDataset leaves the original untouched.

Parameters:
  • options – see the options of root_numpy.tree2array.
Returns: NumpyDataset holding the new NumPy structured array.
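
The "no shared data" guarantee described above matches the behaviour of an explicit NumPy copy. The sketch below uses plain NumPy only, with invented field names, to show that modifying the copy leaves the source array unchanged; it does not call ROOT or root_numpy.

```python
import numpy as np

original = np.array(
    [(10.0, 1), (20.0, -1)],
    dtype=[("pt", "f8"), ("charge", "i4")],
)

# copy() allocates a new buffer, so the two arrays are independent.
duplicate = original.copy()
duplicate["pt"][0] = 99.0
```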

to_file(base, **options)

Save this dataset to a file or collection of files.

Parameters:
  • base (str or iterable of str) – string file name or iterable of string file names.
  • options – mode: ROOT’s mode in which the TFile is to be opened. Default: ‘update’.