{
"cells": [
{
"cell_type": "markdown",
"id": "d003df19-502d-42b2-94d6-fa9d909452ff",
"metadata": {},
"source": [
"# PyLHE version 1.0.0 demo"
]
},
{
"cell_type": "markdown",
"id": "5f29ceda",
"metadata": {},
"source": [
"## The LHE File Format\n",
"\n",
"- standardized format to describe events generated in high-energy physics simulations\n",
"- widely used in the context of Monte Carlo event generators\n",
"- designed to facilitate the exchange of event information between different software packages\n",
"\n",
"### Key Features\n",
"\n",
"- **XML-Based with Fortran mixin**: both human-readable and machine-readable.\n",
"- **Event Structure**: contains a series of events, each described by particles and their properties.\n",
"\n",
"### Basic Structure\n",
"\n",
"An LHE file typically consists of the following main components:\n",
"\n",
"1. **Header**: Contains metadata, version and generator information.\n",
"2. **Initialization Block**: Describes the initial state of the event, including the incoming beams.\n",
"3. **Event Blocks**: Each event is described in its own block, detailing the initial, intermediate, and final state particles.\n",
"4. **Weights Block**: Optional block that provides additional information about the event weights.\n",
"\n",
"```xml\n",
"\n",
"\n",
"\n",
"beam1id beam2id beam1energy beam2energy pdfg1 pdfg2 pdfs1 pdfs2 idweight nproc\n",
"crosssection crosssectionerror crosssectionmaximum pid\n",
"...\n",
"\n",
"\n",
"nparticles pid weight scale aqed aqcd\n",
"id status mother1 mother2 color1 color2 px py pz E m lifetime spin\n",
"...\n",
"\n",
"...\n",
"\n",
"```\n"
]
},
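{
"cell_type": "markdown",
"id": "a7e1c0d2",
"metadata": {},
"source": [
"The initialization block can be read with nothing but the standard library. The cell below is a minimal sketch, not how `pylhe` itself parses files: it assumes the usual `<init>` tag name and reuses the numbers from the example string used later in this notebook. All variable names are illustrative."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a7e1c0d3",
"metadata": {},
"outputs": [],
"source": [
"import xml.etree.ElementTree as ET\n",
"\n",
"# Toy document; the numbers are reused from the example string later in this notebook.\n",
"doc = \"\"\"<LesHouchesEvents version=\"1.0\">\n",
"<init>\n",
"11 -11 100.0 100.0 0 0 0 0 3 1\n",
"3.783590 0.001676 0.001569 0\n",
"</init>\n",
"</LesHouchesEvents>\"\"\"\n",
"\n",
"root = ET.fromstring(doc)\n",
"init_lines = root.find(\"init\").text.strip().splitlines()\n",
"\n",
"# First line: beam IDs, beam energies, PDF info, weighting strategy, number of processes.\n",
"beam = init_lines[0].split()\n",
"beam_ids = (int(beam[0]), int(beam[1]))\n",
"beam_energies = (float(beam[2]), float(beam[3]))\n",
"nproc = int(beam[9])\n",
"\n",
"# Each following line describes one process: cross section, error, maximum, process ID.\n",
"xsec, xsec_err, xsec_max, proc_id = init_lines[1].split()\n",
"print(beam_ids, beam_energies, nproc, float(xsec))"
]
},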
{
"cell_type": "markdown",
"id": "9df3cfc4",
"metadata": {},
"source": [
"## New since version 1.0.0\n",
"\n",
"- Strict typing checks with MyPy\n",
"- Larger test suite\n",
"- Sphinx documentation at https://pylhe.readthedocs.io/\n",
"\n",
" "
]
},
{
"cell_type": "markdown",
"id": "0c7c4959-7553-4e45-a855-0180c33c3f97",
"metadata": {},
"source": [
"### IO access via `@classmethod`s `LHEFile.fromstring/fromfile`instead of old standalone functions"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "c804dc49",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"LHEFile(init=LHEInit(initInfo=LHEInitInfo(beamA=11, beamB=-11, energyA=100.0, energyB=100.0, PDFgroupA=0, PDFgroupB=0, PDFsetA=0, PDFsetB=0, weightingStrategy=3, numProcesses=1), procInfo=[LHEProcInfo(xSection=3.78359, error=0.001676, unitWeight=0.001569, procId=0)], weightgroup={}, LHEVersion='1.0'), events=._generator at 0x7f1eb5031fc0>)"
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from pylhe import LHEFile\n",
"\n",
"mylhe = \"\"\"\n",
"\n",
"\n",
"\n",
"11 -11 100.0 100.0 0 0 0 0 3 1\n",
"3.783590 0.001676 0.001569 0\n",
"\n",
"\n",
"6 0 3.783590e-06 200 0.007849 0.1075\n",
"11 -1 0 0 0 0 0 0 100.0 100.0 0 0 9.0\n",
"-11 -1 0 0 0 0 0 0 -100.0 100.0 0 0 9.0\n",
"22 2 1 2 0 0 0 0 0 200 200 0 9.0\n",
"2 1 3 0 0 0 48.253308 67.445271 -54.164510 99.050697 0 0 9.0\n",
"21 1 3 0 0 0 -1.190913 -12.743630 5.613176 13.975913 0 0 9.0\n",
"-2 1 3 0 0 0 -47.062395 -54.701640 48.551333 86.973389 0 0 9.0\n",
"\n",
"\n",
"\"\"\"\n",
"lhef = LHEFile.fromstring(mylhe)\n",
"lhef"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "2187339b",
"metadata": {},
"outputs": [
{
"data": {
"image/svg+xml": [
"\n",
"\n",
"\n",
"\n",
"\n"
],
"text/plain": [
"LHEEvent(eventinfo=LHEEventInfo(nparticles=6, pid=0, weight=3.78359e-06, scale=200.0, aqed=0.007849, aqcd=0.1075), particles=[LHEParticle(id=11, status=-1, mother1=0, mother2=0, color1=0, color2=0, px=0.0, py=0.0, pz=100.0, e=100.0, m=0.0, lifetime=0.0, spin=9.0), LHEParticle(id=-11, status=-1, mother1=0, mother2=0, color1=0, color2=0, px=0.0, py=0.0, pz=-100.0, e=100.0, m=0.0, lifetime=0.0, spin=9.0), LHEParticle(id=22, status=2, mother1=1, mother2=2, color1=0, color2=0, px=0.0, py=0.0, pz=0.0, e=200.0, m=200.0, lifetime=0.0, spin=9.0), LHEParticle(id=2, status=1, mother1=3, mother2=0, color1=0, color2=0, px=48.253308, py=67.445271, pz=-54.16451, e=99.050697, m=0.0, lifetime=0.0, spin=9.0), LHEParticle(id=21, status=1, mother1=3, mother2=0, color1=0, color2=0, px=-1.190913, py=-12.74363, pz=5.613176, e=13.975913, m=0.0, lifetime=0.0, spin=9.0), LHEParticle(id=-2, status=1, mother1=3, mother2=0, color1=0, color2=0, px=-47.062395, py=-54.70164, pz=48.551333, e=86.973389, m=0.0, lifetime=0.0, spin=9.0)], weights={}, attributes={}, optional=[], _graph=)"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"theevent = next(lhef.events)\n",
"theevent"
]
},
{
"cell_type": "markdown",
"id": "9935a24a-3d4c-4c12-93b9-00ebfdaf74ae",
"metadata": {},
"source": [
"### Structured dataclasses instead of deprecated dicts\n",
"\n",
"old dict way"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "799b1cc0-1bb6-4e84-94ad-ab79827d6b4c",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/tmp/ipykernel_26523/816196594.py:1: DeprecationWarning: Access by `object[\"init\"]` is deprecated and will be removed in a future version. Use `object.init` instead.\n",
" lhef[\"init\"][\"initInfo\"][\"beamA\"]\n",
"/tmp/ipykernel_26523/816196594.py:1: DeprecationWarning: Access by `lheinit[\"initInfo\"]` is deprecated and will be removed in a future version. Use `lheinit.initInfo` instead.\n",
" lhef[\"init\"][\"initInfo\"][\"beamA\"]\n",
"/tmp/ipykernel_26523/816196594.py:1: DeprecationWarning: Access by `object[\"beamA\"]` is deprecated and will be removed in a future version. Use `object.beamA` instead.\n",
" lhef[\"init\"][\"initInfo\"][\"beamA\"]\n"
]
},
{
"data": {
"text/plain": [
"11"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"lhef[\"init\"][\"initInfo\"][\"beamA\"]"
]
},
{
"cell_type": "markdown",
"id": "28a9626a-583d-43ed-9cb6-a6425523ef7a",
"metadata": {},
"source": [
"becomes simpler"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "4ebd041d",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"11"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"lhef.init.initInfo.beamA"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "7cca7914",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[LHEParticle(id=11, status=-1, mother1=0, mother2=0, color1=0, color2=0, px=0.0, py=0.0, pz=100.0, e=100.0, m=0.0, lifetime=0.0, spin=9.0),\n",
" LHEParticle(id=-11, status=-1, mother1=0, mother2=0, color1=0, color2=0, px=0.0, py=0.0, pz=-100.0, e=100.0, m=0.0, lifetime=0.0, spin=9.0),\n",
" LHEParticle(id=22, status=2, mother1=1, mother2=2, color1=0, color2=0, px=0.0, py=0.0, pz=0.0, e=200.0, m=200.0, lifetime=0.0, spin=9.0),\n",
" LHEParticle(id=2, status=1, mother1=3, mother2=0, color1=0, color2=0, px=48.253308, py=67.445271, pz=-54.16451, e=99.050697, m=0.0, lifetime=0.0, spin=9.0),\n",
" LHEParticle(id=21, status=1, mother1=3, mother2=0, color1=0, color2=0, px=-1.190913, py=-12.74363, pz=5.613176, e=13.975913, m=0.0, lifetime=0.0, spin=9.0),\n",
" LHEParticle(id=-2, status=1, mother1=3, mother2=0, color1=0, color2=0, px=-47.062395, py=-54.70164, pz=48.551333, e=86.973389, m=0.0, lifetime=0.0, spin=9.0)]"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"theevent.particles"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "27cf662b",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"11"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"theevent.particles[0].id"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "d4e6f640",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[LHEParticle(id=22, status=2, mother1=1, mother2=2, color1=0, color2=0, px=0.0, py=0.0, pz=0.0, e=200.0, m=200.0, lifetime=0.0, spin=9.0)]"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"theevent.particles[-1].mothers()"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "215c69f6",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
" 11 -11 1.0000000e+02 1.0000000e+02 0 0 0 0 3 1\n",
" 3.7835900e+00 1.6760000e-03 1.5690000e-03 0\n",
"\n",
"\n"
]
}
],
"source": [
"print(lhef.tolhe())"
]
},
{
"cell_type": "markdown",
"id": "b3151c22-4292-4324-8a1c-c32c06c00997",
"metadata": {},
"source": [
 ⚠️ **No Events">
"> ⚠️ **No events?!** We already consumed the events `yield`ed by the generator. The generator approach is great for large files or event generation streams, but what if we just want to modify some or all of the events in a list-like fashion? Passing `generator=False` keeps the events available for repeated access."
]
},
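{
"cell_type": "markdown",
"id": "b4f2d9a0",
"metadata": {},
"source": [
"Why the events disappeared can be reproduced without `pylhe`. The cell below is a minimal sketch with a hypothetical stand-in generator (`make_events` is not part of any library): once a generator has been iterated, it has nothing left to give."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b4f2d9a1",
"metadata": {},
"outputs": [],
"source": [
"def make_events():\n",
"    # Stand-in for an event stream; the strings are placeholders.\n",
"    yield \"event 1\"\n",
"    yield \"event 2\"\n",
"\n",
"\n",
"events = make_events()\n",
"first = next(events)  # consumes \"event 1\"\n",
"rest = list(events)  # consumes everything that is left\n",
"leftover = list(events)  # the generator is exhausted now\n",
"print(first, rest, leftover)"
]
},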
{
"cell_type": "code",
"execution_count": 9,
"id": "e08924b5-63d8-45bf-a5c2-5d8815c0b912",
"metadata": {},
"outputs": [],
"source": [
"lhef = LHEFile.fromstring(mylhe, generator=False)"
]
},
{
"cell_type": "markdown",
"id": "d32a09b8-35e2-48ee-87fc-0a8ef05f80f4",
"metadata": {},
"source": [
"### New output/write to file `LHEFile.tofile/tolhe`"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "0dab87e5-e04c-48b1-a412-daef147450c3",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
" 11 -11 1.0000000e+02 1.0000000e+02 0 0 0 0 3 1\n",
" 3.7835900e+00 1.6760000e-03 1.5690000e-03 0\n",
"\n",
"\n",
" 6 0 3.7835900000e-06 2.0000000000e+02 7.8490000000e-03 1.0750000000e-01\n",
" 11 -1 0 0 0 0 0.00000000e+00 0.00000000e+00 1.00000000e+02 1.00000000e+02 0.00000000e+00 0.0000e+00 9.0000e+00\n",
" -11 -1 0 0 0 0 0.00000000e+00 0.00000000e+00 -1.00000000e+02 1.00000000e+02 0.00000000e+00 0.0000e+00 9.0000e+00\n",
" 22 2 1 2 0 0 0.00000000e+00 0.00000000e+00 0.00000000e+00 2.00000000e+02 2.00000000e+02 0.0000e+00 9.0000e+00\n",
" 2 1 3 0 0 0 4.82533080e+01 6.74452710e+01 -5.41645100e+01 9.90506970e+01 0.00000000e+00 0.0000e+00 9.0000e+00\n",
" 21 1 3 0 0 0 -1.19091300e+00 -1.27436300e+01 5.61317600e+00 1.39759130e+01 0.00000000e+00 0.0000e+00 9.0000e+00\n",
" -2 1 3 0 0 0 -4.70623950e+01 -5.47016400e+01 4.85513330e+01 8.69733890e+01 0.00000000e+00 0.0000e+00 9.0000e+00\n",
"\n",
"\n"
]
}
],
"source": [
"lhef.tofile(\"myevents.lhe.gz\")\n",
"print(lhef.tolhe())"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "ad521850-bcd8-41d2-8cbd-d950fcde1ae9",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\n",
" 11 -11 1.0000000e+02 1.0000000e+02 0 0 0 0 3 1\n",
" 3.7835900e+00 1.6760000e-03 1.5690000e-03 0\n",
"\n",
"\n",
" 6 0 3.7835900000e-06 2.0000000000e+02 7.8490000000e-03 1.0750000000e-01\n",
" 11 -1 0 0 0 0 0.00000000e+00 0.00000000e+00 1.00000000e+02 1.00000000e+02 0.00000000e+00 0.0000e+00 9.0000e+00\n",
" -11 -1 0 0 0 0 0.00000000e+00 0.00000000e+00 -1.00000000e+02 1.00000000e+02 0.00000000e+00 0.0000e+00 9.0000e+00\n",
" 22 2 1 2 0 0 0.00000000e+00 0.00000000e+00 0.00000000e+00 2.00000000e+02 2.00000000e+02 0.0000e+00 9.0000e+00\n",
" 2 1 3 0 0 0 4.82533080e+01 6.74452710e+01 -5.41645100e+01 9.90506970e+01 0.00000000e+00 0.0000e+00 9.0000e+00\n",
" 21 1 3 0 0 0 -1.19091300e+00 -1.27436300e+01 5.61317600e+00 1.39759130e+01 0.00000000e+00 0.0000e+00 9.0000e+00\n",
" -2 1 3 0 0 0 -4.70623950e+01 -5.47016400e+01 4.85513330e+01 8.69733890e+01 0.00000000e+00 0.0000e+00 9.0000e+00\n",
"\n",
""
]
}
],
"source": [
"!zcat myevents.lhe.gz"
]
},
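{
"cell_type": "markdown",
"id": "c9a3e6b0",
"metadata": {},
"source": [
"`zcat` is a shell tool and may be missing on some systems. As a sketch of a portable alternative, Python's standard `gzip` module can read the file back, assuming the cell above wrote a gzip-compressed `myevents.lhe.gz` (which the `zcat` call suggests):"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c9a3e6b1",
"metadata": {},
"outputs": [],
"source": [
"import gzip\n",
"\n",
"# Assumes the previous cells already wrote \"myevents.lhe.gz\".\n",
"with gzip.open(\"myevents.lhe.gz\", \"rt\") as f:\n",
"    print(f.read())"
]
},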
{
"cell_type": "markdown",
"id": "fcaa0fa1",
"metadata": {},
"source": [
"## dict -> dataclass: Parse don't validate! [1]\n",
"\n",
"- Don’t just check if data is valid instead transform it into a well-typed structure.\n",
"- Make invalid states unrepresentable through type-safe parsing.\n",
"\n",
"[1] https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-validate/\n"
]
},
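{
"cell_type": "markdown",
"id": "d5b8f1c0",
"metadata": {},
"source": [
"A minimal sketch of the idea, using a hypothetical trimmed-down `Particle` record and `parse_particle` helper (neither is part of `pylhe`). The column order follows the particle line shown in the format overview: `id status mother1 mother2 color1 color2 px py pz E m lifetime spin`. Parsing either returns a fully typed object or raises, so there is no half-valid state to check later."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d5b8f1c1",
"metadata": {},
"outputs": [],
"source": [
"from dataclasses import dataclass\n",
"\n",
"\n",
"@dataclass(frozen=True)\n",
"class Particle:\n",
"    \"\"\"Hypothetical, trimmed-down particle record (not the pylhe class).\"\"\"\n",
"\n",
"    id: int\n",
"    status: int\n",
"    px: float\n",
"    py: float\n",
"    pz: float\n",
"    e: float\n",
"\n",
"\n",
"def parse_particle(line: str) -> Particle:\n",
"    \"\"\"Turn one whitespace-separated particle line into a typed record.\"\"\"\n",
"    tokens = line.split()\n",
"    if len(tokens) != 13:\n",
"        raise ValueError(f\"expected 13 columns, got {len(tokens)}\")\n",
"    # int()/float() raise ValueError on malformed input, so a Particle\n",
"    # can only exist if every field parsed successfully.\n",
"    return Particle(\n",
"        id=int(tokens[0]),\n",
"        status=int(tokens[1]),\n",
"        px=float(tokens[6]),\n",
"        py=float(tokens[7]),\n",
"        pz=float(tokens[8]),\n",
"        e=float(tokens[9]),\n",
"    )\n",
"\n",
"\n",
"p = parse_particle(\"11 -1 0 0 0 0 0 0 100.0 100.0 0 0 9.0\")\n",
"print(p.id, p.pz, p.e)"
]
},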
{
"cell_type": "markdown",
"id": "cbc3e8ff",
"metadata": {},
"source": [
"### Dataclasses and strict typing instead of loose dicts\n",
"\n",
"The old layout:\n",
"- inherits from `dict`\n",
"- weird `fieldnames`\n",
"- `args` and `kwargs`\n",
"- ugly hypothetical `dict` typing"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "2ba8716d",
"metadata": {},
"outputs": [],
"source": [
"class LHEInit_old(dict):\n",
" \"\"\"Store the block as dict.\"\"\"\n",
"\n",
" # weightgroup : dict[str, dict[str, dict[str, Union[dict[str, str], str, int]]]]\n",
" fieldnames = [\"initInfo\", \"procInfo\", \"weightgroup\", \"LHEVersion\"]\n",
"\n",
" def __init__(self, *args, **kwargs):\n",
" super().__init__(*args, **kwargs)"
]
},
{
"cell_type": "markdown",
"id": "7d49eb1e",
"metadata": {},
"source": [
"The new layout below:\n",
"- members are strictly typed now\n",
"- properly type hint => better LLM/coding agent integration\n",
"- autocompletion works better in Jupyter Notebooks and IDEs\n",
"- data is guaranteed to exist in correct format unlike dicts\n",
"- MyPy found bugs in the old code while doing this (e.g. missing `None` checks)\n",
"- members have documentation strings\n",
"- data is printed nicely without extra effort"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "9b5e3e98",
"metadata": {},
"outputs": [],
"source": [
"from dataclasses import dataclass\n",
"\n",
"from pylhe import LHEInitInfo, LHEProcInfo, LHEWeightGroup\n",
"\n",
"\n",
"@dataclass\n",
"class LHEInit_new:\n",
" \"\"\"Store the block as a dataclass.\"\"\"\n",
"\n",
" initInfo: LHEInitInfo\n",
" \"\"\"Init information\"\"\"\n",
" procInfo: list[LHEProcInfo]\n",
" \"\"\"Process information\"\"\"\n",
" weightgroup: dict[str, LHEWeightGroup]\n",
" \"\"\"Weight group information\"\"\"\n",
" LHEVersion: str\n",
" \"\"\"LHE version\"\"\""
]
},
{
"cell_type": "markdown",
"id": "73b0208a",
"metadata": {},
"source": [
"### Compatibility with previous dict versions\n",
"\n",
"We do not want to break all old `pylhe` downstream consumer code."
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "5a3606a8",
"metadata": {},
"outputs": [],
"source": [
"from abc import ABC\n",
"from collections.abc import MutableMapping\n",
"from dataclasses import asdict, dataclass, fields\n",
"from typing import Any\n",
"\n",
"\n",
"@dataclass\n",
"class DictCompatibility(MutableMapping[str, Any], ABC):\n",
" \"\"\"\n",
" Mixin for dataclasses to behave like mutable dictionaries.\n",
" \"\"\"\n",
"\n",
" def __getitem__(self, key: str) -> Any:\n",
" return getattr(self, key)\n",
"\n",
" def __setitem__(self, key: str, value: Any) -> None:\n",
" setattr(self, key, value)\n",
"\n",
" def __delitem__(self, key: str) -> None:\n",
" err = f\"Cannot delete field {key!r} from dataclass instance\"\n",
" raise TypeError(err)\n",
"\n",
" def __iter__(self) -> Any:\n",
" return iter(asdict(self))\n",
"\n",
" def __len__(self) -> int:\n",
" return len(asdict(self))\n",
"\n",
" @property\n",
" def fieldnames(self) -> list[str]:\n",
" return [f.name for f in fields(self)]\n",
"\n",
"\n",
"@dataclass\n",
"class LHEInit_current(DictCompatibility): ..."
]
},
{
"cell_type": "markdown",
"id": "0c20f9ec",
"metadata": {},
"source": [
"- Achieves backwards compatibility where warnings can be added to nudge users to the new API\n",
"- Deprecated dict-like access via `object['key']`"
]
},
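{
"cell_type": "markdown",
"id": "e2c7a4f0",
"metadata": {},
"source": [
"How such a warning can be attached is sketched below. `WarnOnDictAccess` and `Beam` are hypothetical names for illustration, and the actual `pylhe` implementation may differ. The message imitates the warnings printed earlier in this notebook."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e2c7a4f1",
"metadata": {},
"outputs": [],
"source": [
"import warnings\n",
"from dataclasses import dataclass\n",
"\n",
"\n",
"class WarnOnDictAccess:\n",
"    \"\"\"Hypothetical mixin: dict-style reads work but emit a warning.\"\"\"\n",
"\n",
"    def __getitem__(self, key: str):\n",
"        warnings.warn(\n",
"            f'Access by `object[\"{key}\"]` is deprecated. Use `object.{key}` instead.',\n",
"            DeprecationWarning,\n",
"            stacklevel=2,  # point the warning at the caller's line\n",
"        )\n",
"        return getattr(self, key)\n",
"\n",
"\n",
"@dataclass\n",
"class Beam(WarnOnDictAccess):\n",
"    beamA: int\n",
"    beamB: int\n",
"\n",
"\n",
"beam = Beam(beamA=11, beamB=-11)\n",
"with warnings.catch_warnings(record=True) as caught:\n",
"    warnings.simplefilter(\"always\")  # make sure the warning is not filtered out\n",
"    old_style = beam[\"beamA\"]\n",
"\n",
"print(old_style == beam.beamA, caught[0].category.__name__)"
]
},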
{
"cell_type": "markdown",
"id": "135f4c9c-c24e-4e0f-aac1-7e4bb9fdcfd8",
"metadata": {},
"source": [
"### Next steps\n",
"\n",
"- release 1.0.0\n",
"- publish JOSS paper"
]
},
{
"cell_type": "markdown",
"id": "de6ae27d-8f82-47bf-b05e-6ac43d4b0e39",
"metadata": {},
"source": [
"## Further interactive examples to explore\n",
"\n",
"- [Analyze LHE file and plot `hist`-ograms →](01_zpeak.ipynb)\n",
"- [Filter LHE events based on kinematic cuts →](02_filter_events_example.ipynb)\n",
"- [Simple Monte Carlo LHE event generator →](03_write_monte_carlo_example.ipynb)\n",
"- [Conversion/interface to `awkward` arrays →](91_awkward_example.ipynb)\n",
"- [Parallel processing of LHE files →](92_multiple_files.ipynb)\n",
"- [Parquet Cache →](93_parquet_cache.ipynb)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.13.5"
}
},
"nbformat": 4,
"nbformat_minor": 5
}