Dealing with Multiple LHE Files
You will often have many LHE files that were generated with the same (or very similar) settings, and you will want to combine them into one “sample” that can be analyzed with a single set of analysis code. A quick way to do this is to merge them into an intermediate Parquet file, a format that awkward can read and write.
import awkward as ak
# Use an example LHE file from package scikit-hep-testdata
from skhep_testdata import data_path
import pylhe
lhe_file = data_path("pylhe-drell-yan-ll-lhe.gz")
# Our input files will simply be multiple copies of the same file for the sake of this example,
# but you can imagine doing the same process below with actually different LHE files
list_of_input_files = [lhe_file for _ in range(3)]
# get arrays for each file
unmerged_arrays = [
    pylhe.to_awkward(pylhe.read_lhe_with_attributes(f)) for f in list_of_input_files
]
# merge arrays into single mega-array
array = ak.concatenate(unmerged_arrays)
# store merged array into cache parquet file
ak.to_parquet(array, "merged.parquet")
# any analysis code below can retrieve the array using ak.from_parquet("merged.parquet")
<pyarrow._parquet.FileMetaData object at 0x7f2fecc9bec0>
created_by: parquet-cpp-arrow version 12.0.0
num_columns: 19
num_rows: 30000
num_row_groups: 1
format_version: 2.6
serialized_size: 0
Now all of the analysis code can read the merged file, which only needs to be regenerated when more files are added to the sample or the source LHE files change.
ak.from_parquet("merged.parquet")
[{eventinfo: {nparticles: 4, pid: 1, ...}, particles: [...]},
 {eventinfo: {nparticles: 5, pid: 1, ...}, particles: [...]},
 {eventinfo: {nparticles: 5, pid: 1, ...}, particles: [...]},
 {eventinfo: {nparticles: 4, pid: 1, ...}, particles: [...]},
 {eventinfo: {nparticles: 4, pid: 1, ...}, particles: [...]},
 {eventinfo: {nparticles: 4, pid: 1, ...}, particles: [...]},
 {eventinfo: {nparticles: 5, pid: 1, ...}, particles: [...]},
 {eventinfo: {nparticles: 4, pid: 1, ...}, particles: [...]},
 {eventinfo: {nparticles: 5, pid: 1, ...}, particles: [...]},
 {eventinfo: {nparticles: 4, pid: 1, ...}, particles: [...]},
 ...,
 {eventinfo: {nparticles: 4, pid: 1, ...}, particles: [...]},
 {eventinfo: {nparticles: 4, pid: 1, ...}, particles: [...]},
 {eventinfo: {nparticles: 4, pid: 1, ...}, particles: [...]},
 {eventinfo: {nparticles: 4, pid: 1, ...}, particles: [...]},
 {eventinfo: {nparticles: 5, pid: 1, ...}, particles: [...]},
 {eventinfo: {nparticles: 4, pid: 1, ...}, particles: [...]},
 {eventinfo: {nparticles: 4, pid: 1, ...}, particles: [...]},
 {eventinfo: {nparticles: 4, pid: 1, ...}, particles: [...]},
 {eventinfo: {nparticles: 5, pid: 1, ...}, particles: [...]}]
-------------------------------------------------------------
type: 30000 * Event[
    eventinfo: EventInfo[
        nparticles: float64,
        pid: float64,
        weight: float64,
        scale: float64,
        aqed: float64,
        aqcd: float64
    ],
    particles: var * Particle[
        vector: Momentum4D[
            x: float64,
            y: float64,
            z: float64,
            t: float64
        ],
        id: float64,
        status: float64,
        mother1: float64,
        mother2: float64,
        color1: float64,
        color2: float64,
        m: float64,
        lifetime: float64,
        spin: float64
    ]
]
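The regeneration rule mentioned above (rebuild the merged file only when the list of inputs or the inputs themselves change) could be automated along the following lines. This is only a sketch using the Python standard library: the helper names `input_fingerprint`, `cache_is_stale`, and `record_inputs`, as well as the `.inputs.json` sidecar file, are hypothetical and are not part of pylhe or awkward. It assumes that a change in a file's size or modification time is a good enough signal that the file changed.

```python
import hashlib
import json
import os


def input_fingerprint(paths):
    """Hash the path, size, and modification time of every input file.

    The order of `paths` is kept, since it determines the order of the
    events in the merged array.
    """
    entries = []
    for path in paths:
        stat = os.stat(path)
        entries.append([os.path.abspath(path), stat.st_size, stat.st_mtime_ns])
    return hashlib.sha256(json.dumps(entries).encode()).hexdigest()


def _manifest_path(cache_path):
    # Hypothetical sidecar file that remembers which inputs built the cache
    return cache_path + ".inputs.json"


def cache_is_stale(cache_path, paths):
    """Return True if the cache is missing or was built from different inputs."""
    manifest = _manifest_path(cache_path)
    if not (os.path.exists(cache_path) and os.path.exists(manifest)):
        return True
    with open(manifest) as f:
        recorded = json.load(f)
    return recorded.get("fingerprint") != input_fingerprint(paths)


def record_inputs(cache_path, paths):
    """Store the fingerprint of the inputs next to the cache file."""
    with open(_manifest_path(cache_path), "w") as f:
        json.dump({"fingerprint": input_fingerprint(paths)}, f)
```

With these helpers, the merging step shown earlier would run only when `cache_is_stale("merged.parquet", list_of_input_files)` returns True, followed by a call to `record_inputs("merged.parquet", list_of_input_files)` after `ak.to_parquet` has written the file. In every other case the analysis code can go straight to `ak.from_parquet("merged.parquet")`.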