Commit
[pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
pre-commit-ci[bot] committed Mar 12, 2024
1 parent 0de4b64 commit 4f8182c
Showing 2 changed files with 30 additions and 31 deletions.
1 change: 0 additions & 1 deletion projects/cms-alpaka.yml
@@ -50,4 +50,3 @@ contacts:
email: [email protected]
- name: Volodymyr Bezguba
email: [email protected]

60 changes: 30 additions & 30 deletions projects/uproot-awkwardforth-refactor.yml
@@ -20,39 +20,39 @@ commitment:
program:
- IRIS-HEP fellow
shortdescription: >
-Keeping the functionality of Uproot's accelerated reading through AwkwardForth, but
-making it more maintainable by removing mutable state/coding it in a functional style.
+Keeping the functionality of Uproot's accelerated reading through AwkwardForth, but
+making it more maintainable by removing mutable state/coding it in a functional style.
description: >
-Uproot is a Python library for reading and writing ROOT files (the most common file
-format in particle physics). While it is relatively fast at reading "columnar" data,
-either arrays of numbers or arrays of numbers that are grouped into variable-length
-lists, any other data type requires iteration, which is a performance limitation in
-the Python language. ("for" loops in Python are 100's of times slower than in compiled
-languages.) To improve this situation, we introduced a domain-specific language (DSL)
-called AwkwardForth, in which loops are much faster to execute than they are in Python
-(factors of 100's again). This language was created in 2021 (https://arxiv.org/abs/2102.13516)
-and added to Uproot in 2022 (https://arxiv.org/abs/2303.02202). In the end, an example
-data structure (std::vector<std::vector<float>>) could be read 400× faster with
-AwkwardForth than with Python. Users of Uproot don't have to opt in or change their
-code, it just runs faster.
+Uproot is a Python library for reading and writing ROOT files (the most common file
+format in particle physics). While it is relatively fast at reading "columnar" data,
+either arrays of numbers or arrays of numbers that are grouped into variable-length
+lists, any other data type requires iteration, which is a performance limitation in
+the Python language. ("for" loops in Python are 100's of times slower than in compiled
+languages.) To improve this situation, we introduced a domain-specific language (DSL)
+called AwkwardForth, in which loops are much faster to execute than they are in Python
+(factors of 100's again). This language was created in 2021 (https://arxiv.org/abs/2102.13516)
+and added to Uproot in 2022 (https://arxiv.org/abs/2303.02202). In the end, an example
+data structure (std::vector<std::vector<float>>) could be read 400× faster with
+AwkwardForth than with Python. Users of Uproot don't have to opt in or change their
+code, it just runs faster.
-That would be the end of the story, except that the AwkwardForth-generating code in
-Uproot has been very hard to maintain. In part, it's because it's doing something
-complicated: generating code that runs later or generating code that generates code
-that runs later. But it is also more complicated than it needs to be, with Python
-objects that change their own attributes in arbitrary ways as information about what
-AwkwardForth needs to be generated accumulates. The code would be much easier to read
-and reason about if it were stateless or append-only (see: functional programming),
-and it easily could be. This project would be to restructure the AwkwardForth-generating
-code in a functional style, to "remove the moving parts."
+That would be the end of the story, except that the AwkwardForth-generating code in
+Uproot has been very hard to maintain. In part, it's because it's doing something
+complicated: generating code that runs later or generating code that generates code
+that runs later. But it is also more complicated than it needs to be, with Python
+objects that change their own attributes in arbitrary ways as information about what
+AwkwardForth needs to be generated accumulates. The code would be much easier to read
+and reason about if it were stateless or append-only (see: functional programming),
+and it easily could be. This project would be to restructure the AwkwardForth-generating
+code in a functional style, to "remove the moving parts."
-To be clear, the project will not require you to understand the AwkwardForth that is
-being generated (though that's not a bad thing), and it will not require you to figure
-out how to generate the right AwkwardForth for a given data type. This part of the problem
-has been solved and there are many unit tests that can check correctness, to allow you to
-do test-driven development. The project is about software engineering: how to structure
-code so that it can be read and understood, while keeping the problem-solving aspect
-unchanged.
+To be clear, the project will not require you to understand the AwkwardForth that is
+being generated (though that's not a bad thing), and it will not require you to figure
+out how to generate the right AwkwardForth for a given data type. This part of the problem
+has been solved and there are many unit tests that can check correctness, to allow you to
+do test-driven development. The project is about software engineering: how to structure
+code so that it can be read and understood, while keeping the problem-solving aspect
+unchanged.
contacts:
- name: Ioana Ifrim
email: [email protected]