-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Rust-based OpenQASM 2 converter #9784
Conversation
This is a vendored version of qiskit-qasm2 (https://pypi.org/project/qiskit-qasm2), with this initial commit being equivalent (barring some naming / documentation / testing conversions to match Qiskit's style) to version 0.5.3 of that package. This adds a new translation layer from OpenQASM 2 to Qiskit, which is around an order of magnitude faster than the existing version in Python, while being more type safe (in terms of disallowing invalid OpenQASM 2 programs rather than attempting to construction `QuantumCircuit`s that are not correct) and more extensible. The core logic is a hand-written lexer and parser combination written in Rust, which emits a bytecode stream across the PyO3 boundary to a small Python interpreter loop. The main bulk of the parsing logic is a simple LL(1) recursive-descent algorithm, which delegates to more specific recursive Pratt-based algorithm for handling classical expressions. Many of the design decisions made (including why the lexer is written by hand) are because the project originally started life as a way for me to learn about implementations of the different parts of a parser stack; this is the principal reason there are very few external crates used. There are a few inefficiencies in this implementation, for example: - the string interner in the lexer allocates twice for each stored string (but zero times for a lookup). It may be possible to completely eliminate allocations when parsing a string (or a file if it's read into memory as a whole), but realistically there's only a fairly small number of different tokens seen in most OpenQASM 2 programs, so it shouldn't be too big a deal. - the hand-off from Rust to Python transfers small objects frequently. It might be more efficient to have a secondary buffered iterator in Python space, transferring more bytecode instructions at a time and letting Python resolve them. This form could also be made asynchronous, since for the most part, the Rust components only need to acquire the CPython GIL at the API boundary. - there are too many points within the lexer that can return a failure result that needs unwrapping at every site. Since there are no tokens that can span multiple lines, it should be possible to refactor so that almost all of the byte-getter and -peeker routines cannot return error statuses, at the cost of the main lexer loop becoming responsible for advancing the line buffer, and moving the non-ASCII error handling into each token constructor. I'll probably keep playing with some of those in the `qiskit-qasm2` package itself when I have free time, but at some point I needed to draw the line and vendor the package. It's still ~10x faster than the existing one: In [1]: import qiskit.qasm2 ...: prog = """ ...: OPENQASM 2.0; ...: include "qelib1.inc"; ...: qreg q[2]; ...: """ ...: prog += "rz(pi * 2) q[0];\ncx q[0], q[1];\n"*100_000 ...: %timeit qiskit.qasm2.loads(prog) ...: %timeit qiskit.QuantumCircuit.from_qasm_str(prog) 2.26 s ± 39.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) 22.5 s ± 106 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) `cx`-heavy programs like this one are actually the ones that the new parser is (comparatively) slowest on, because the construction time of `CXGate` is higher than most gates, and this dominates the execution time for the Rust-based parser.
Thank you for opening a new pull request. Before your PR can be merged it will first need to pass continuous integration tests and be reviewed. Sometimes the review process can be slow, so please be patient. While you're waiting, please feel free to review other open PRs. While only a subset of people are authorized to approve pull requests for merging, everyone is encouraged to review open pull requests. Doing reviews helps reduce the burden on the core team and helps make the project's code better for everyone. One or more of the the following people are requested to review this:
|
Pull Request Test Coverage Report for Build 4679866338
💛 - Coveralls |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exciting speedup!
Many of the design decisions made (including why the lexer is written by hand) are because the project originally started life as a way for me to learn about implementations of the different parts of a parser stack; this is the principal reason there are very few external crates used.
What are your thoughts on migrating to using one of the excellent Rust parsing crates? https://lib.rs/parsing. That will result in much less code for us to maintain, along with benefitting from the often-years of improvements those teams have made to their implementation. (Ack it will take some work to migrate to)
--
Also, I think this may benefit from some unit tests written in Rust. Those will be executed in CI thanks to #9593.
You do already have solid tests in Python, so this question isn't blocking. But, for example, unit tests of the parser itself (if you don't switch to a library) will help make sure we're handling all edge cases correctly. And unit tests also give more clear feedback of what specifically broke than having to work backwards from a Python test failure.
What do you think?
Honestly, I'm not particularly interested in putting in any further work to migrate this to a parsing library, because a) I spent long enough in work time playing with this anyway and b) parsing isn't the bottleneck - that's Python-space construction. OpenQASM 2 is an old language that doesn't need any further features supporting (the current Python-space implementation of the parser has effectively not changed in 5 years). I'm not sure I see the benefit of adding any unit tests actually written in Rust. The Python tests are unit tests of the public API; there's no part of the Rust code that's public, and they already hit (as best as I could think of them) all the edge cases of parsing because the inputs are just OpenQASM 2 programs. I think adding tests that are specifically of private functions in order to get more accurate diagnostics is probably a false economy: I personally had no trouble fixing bugs in the implementation when I was writing the package using these tests, whereas having tests of non-public-API functionality means that the tests have to change when the implementation changes. For example, it should be possible to migrate to a parsing library without changing a single one of my tests of the public API, but private tests of the Rust code would have to be entirely rewritten. |
Oh, and more about lexing/parsing libraries: now that I've written one, I've no need to write a lexer by hand again - lexer generators are much better - but I'd still write the parser by hand. You can have much better error messages and error recovery with a handwritten parser - it's why the majority of actual language compilers (as opposed to simple DSL parsers, where a generator library is more appropriate) go that route. You realistically need to use a parser generator if you're going to do LALR(1) parsing because it'd be mad tedious to build the stack and tables by hand and you'd be very liable to get it wrong, but LL(1) rescursive-descent-based parsing like this is very straightforwards to write by hand, is easier to emit good error messages from, and really, I think lexing is more likely to be a performance problem, especially for such a simple language as OpenQASM 2. |
The version of Sphinx that we're constrained to use in the docs build can't handle the `Unpack` operator, so as a temporary measure we can just relax the type hint a little.
The docs failure appears to be something specific to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent quality code!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other reviewers: this file looks good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other reviewers: this file looks good, outside of some small suggestions. I closely reviewed it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other reviewers: this file looks good. I closely reviewed it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other reviewers: I did not closely review this, only the PyO3 things which look good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have looked at this file and it looks good. I only had a few clarification questions.
There is of course a very extensive error handling and there is an extensive set of tests in test/python/qasm2
, I have not made completely sure that every allowed behavior or an error is covered by some test.
|
||
// We define the exception in Python space so it can inherit from QiskitError; it's easier to do | ||
// that from Python and wrap rather than also needing to import QiskitError to Rust to wrap. | ||
import_exception!(qiskit.qasm2.exceptions, QASM2ParseError); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You define 3 exceptions in Python land but only import 1 here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's expected - the Rust code only needs to emit the concrete QASM2ParseError
, never QASM2ExportError
. QASM2Error
is more structural than a concrete class, but it's in place as the base class for a follow-up PR where I'll alias the existing QasmError
to QASM2Error
as part of the complete removal of the old qiskit.qasm
namespace. I actually only meant to define two exceptions in Python space in this PR - the QASM2ExportError
one doesn't need to be in this PR, I just missed it when rebasing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reading it again, the comment is kind of an artifact of how I vendored the package - over on https://github.com/jakelishman/qiskit-qasm2, I just have a single exception defined in Rust space, so when I wrote the comment after fixing things up into Qiskit structure, I had that historical background in mind.
This should have no impact on runtime or on memory usage, since each of the new types has the same bit width and alignment as the `usize` values they replace.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving the PyO3/Rusty parts of this PR. Didn't closely look at Python unit tests or Rust parser/lexer
/// Define a simple newtype that just has a single non-public `usize` field, has a `new` | ||
/// constructor, and implements `Copy` and `IntoPy`. The first argument is the name of the type, | ||
/// the second is whether to also define addition to make offsetting the newtype easier. | ||
macro_rules! newtype_id { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other reviewers: this looks good
docs/apidocs/qasm2.rst
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
trivial review, done.
docs/apidocs/terra.rst
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
trivial review, done.
The previous system was quite confusing, and required all accesses to the global symbol table to know that the `Gate` symbol could be present but overridable. This led to confusing logic, various bugs and unnecessary constraints, such as it previously being (erroneously) possible to provide re-definitions for any "built-in" gate. Instead, we keep a separate store of instructions that may be redefined. This allows the logic to be centralised to only to the place responsible for performing those overrides, and remains accessible for error-message builders to query in order to provide better diagnostics.
Co-authored-by: Alexander Ivrii <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did a quick scan through the code to see if anything stood out to me, I'm relying mostly on the detailed review from all the other code-reviewers. I left a couple of inline comments, nothing major the only thing I think would be good to fix is the pyo3 version.
} | ||
} | ||
|
||
/// Apply a "scientific calculator" built-in function to an [expression][Expr]. If the operand |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was always amused by this phrasing in the openqasm2 paper, especially since the scientific calculator I got when I went to college was programmable and had numerical integration support.
Co-authored-by: Matthew Treinish <[email protected]>
For a hashset of only 6 elements that is only checked once, there's not really any point to pull in an extra dependency or use a hash set at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for the update
Since Qiskitgh-9784 added a new `qiskit.qasm2` package and suitable exceptions for both import and export of OpenQASM 2 programs, this commit now unifies the exception handling through the rest of the library to use these exceptions. To avoid breaks in compatibility, the old `qiskit.qasm.QasmError` is now a re-export of `qiskit.qasm2.QASM2Error`. This is the base error of both the exporter and importers, despite the old documentation of `QasmError` claiming to be specifically for parser errors. In practice, the exporter also emitted `QasmError`, so having that be `QASM2Error` allows people catching that error to get the same behaviour in these new forms. The actual exporter and importer are updated to emit the precise subclasses.
* Add Rust-based OpenQASM 2 converter This is a vendored version of qiskit-qasm2 (https://pypi.org/project/qiskit-qasm2), with this initial commit being equivalent (barring some naming / documentation / testing conversions to match Qiskit's style) to version 0.5.3 of that package. This adds a new translation layer from OpenQASM 2 to Qiskit, which is around an order of magnitude faster than the existing version in Python, while being more type safe (in terms of disallowing invalid OpenQASM 2 programs rather than attempting to construction `QuantumCircuit`s that are not correct) and more extensible. The core logic is a hand-written lexer and parser combination written in Rust, which emits a bytecode stream across the PyO3 boundary to a small Python interpreter loop. The main bulk of the parsing logic is a simple LL(1) recursive-descent algorithm, which delegates to more specific recursive Pratt-based algorithm for handling classical expressions. Many of the design decisions made (including why the lexer is written by hand) are because the project originally started life as a way for me to learn about implementations of the different parts of a parser stack; this is the principal reason there are very few external crates used. There are a few inefficiencies in this implementation, for example: - the string interner in the lexer allocates twice for each stored string (but zero times for a lookup). It may be possible to completely eliminate allocations when parsing a string (or a file if it's read into memory as a whole), but realistically there's only a fairly small number of different tokens seen in most OpenQASM 2 programs, so it shouldn't be too big a deal. - the hand-off from Rust to Python transfers small objects frequently. It might be more efficient to have a secondary buffered iterator in Python space, transferring more bytecode instructions at a time and letting Python resolve them. This form could also be made asynchronous, since for the most part, the Rust components only need to acquire the CPython GIL at the API boundary. - there are too many points within the lexer that can return a failure result that needs unwrapping at every site. Since there are no tokens that can span multiple lines, it should be possible to refactor so that almost all of the byte-getter and -peeker routines cannot return error statuses, at the cost of the main lexer loop becoming responsible for advancing the line buffer, and moving the non-ASCII error handling into each token constructor. I'll probably keep playing with some of those in the `qiskit-qasm2` package itself when I have free time, but at some point I needed to draw the line and vendor the package. It's still ~10x faster than the existing one: In [1]: import qiskit.qasm2 ...: prog = """ ...: OPENQASM 2.0; ...: include "qelib1.inc"; ...: qreg q[2]; ...: """ ...: prog += "rz(pi * 2) q[0];\ncx q[0], q[1];\n"*100_000 ...: %timeit qiskit.qasm2.loads(prog) ...: %timeit qiskit.QuantumCircuit.from_qasm_str(prog) 2.26 s ± 39.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) 22.5 s ± 106 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) `cx`-heavy programs like this one are actually the ones that the new parser is (comparatively) slowest on, because the construction time of `CXGate` is higher than most gates, and this dominates the execution time for the Rust-based parser. * Work around docs failure on Sphinx 5.3, Python 3.9 The version of Sphinx that we're constrained to use in the docs build can't handle the `Unpack` operator, so as a temporary measure we can just relax the type hint a little. * Remove unused import * Tweak documentation * More specific PyO3 usage * Use PathBuf directly for paths * Format * Freeze dataclass * Use type-safe id types This should have no impact on runtime or on memory usage, since each of the new types has the same bit width and alignment as the `usize` values they replace. * Documentation tweaks * Fix comments in lexer * Fix lexing version number with separating comments * Add test of pathological formatting * Fixup release note * Fix handling of u0 gate * Credit reviewers Co-authored-by: Luciano Bello <[email protected]> Co-authored-by: Kevin Hartman <[email protected]> Co-authored-by: Eric Arellano <[email protected]> * Add test of invalid gate-body statements * Refactor custom built-in gate definitions The previous system was quite confusing, and required all accesses to the global symbol table to know that the `Gate` symbol could be present but overridable. This led to confusing logic, various bugs and unnecessary constraints, such as it previously being (erroneously) possible to provide re-definitions for any "built-in" gate. Instead, we keep a separate store of instructions that may be redefined. This allows the logic to be centralised to only to the place responsible for performing those overrides, and remains accessible for error-message builders to query in order to provide better diagnostics. * Credit Sasha Co-authored-by: Alexander Ivrii <[email protected]> * Credit Matthew Co-authored-by: Matthew Treinish <[email protected]> * Remove dependency on `lazy_static` For a hashset of only 6 elements that is only checked once, there's not really any point to pull in an extra dependency or use a hash set at all. * Update PyO3 version --------- Co-authored-by: Luciano Bello <[email protected]> Co-authored-by: Kevin Hartman <[email protected]> Co-authored-by: Eric Arellano <[email protected]> Co-authored-by: Alexander Ivrii <[email protected]> Co-authored-by: Matthew Treinish <[email protected]>
* Add Rust-based OpenQASM 2 converter This is a vendored version of qiskit-qasm2 (https://pypi.org/project/qiskit-qasm2), with this initial commit being equivalent (barring some naming / documentation / testing conversions to match Qiskit's style) to version 0.5.3 of that package. This adds a new translation layer from OpenQASM 2 to Qiskit, which is around an order of magnitude faster than the existing version in Python, while being more type safe (in terms of disallowing invalid OpenQASM 2 programs rather than attempting to construction `QuantumCircuit`s that are not correct) and more extensible. The core logic is a hand-written lexer and parser combination written in Rust, which emits a bytecode stream across the PyO3 boundary to a small Python interpreter loop. The main bulk of the parsing logic is a simple LL(1) recursive-descent algorithm, which delegates to more specific recursive Pratt-based algorithm for handling classical expressions. Many of the design decisions made (including why the lexer is written by hand) are because the project originally started life as a way for me to learn about implementations of the different parts of a parser stack; this is the principal reason there are very few external crates used. There are a few inefficiencies in this implementation, for example: - the string interner in the lexer allocates twice for each stored string (but zero times for a lookup). It may be possible to completely eliminate allocations when parsing a string (or a file if it's read into memory as a whole), but realistically there's only a fairly small number of different tokens seen in most OpenQASM 2 programs, so it shouldn't be too big a deal. - the hand-off from Rust to Python transfers small objects frequently. It might be more efficient to have a secondary buffered iterator in Python space, transferring more bytecode instructions at a time and letting Python resolve them. This form could also be made asynchronous, since for the most part, the Rust components only need to acquire the CPython GIL at the API boundary. - there are too many points within the lexer that can return a failure result that needs unwrapping at every site. Since there are no tokens that can span multiple lines, it should be possible to refactor so that almost all of the byte-getter and -peeker routines cannot return error statuses, at the cost of the main lexer loop becoming responsible for advancing the line buffer, and moving the non-ASCII error handling into each token constructor. I'll probably keep playing with some of those in the `qiskit-qasm2` package itself when I have free time, but at some point I needed to draw the line and vendor the package. It's still ~10x faster than the existing one: In [1]: import qiskit.qasm2 ...: prog = """ ...: OPENQASM 2.0; ...: include "qelib1.inc"; ...: qreg q[2]; ...: """ ...: prog += "rz(pi * 2) q[0];\ncx q[0], q[1];\n"*100_000 ...: %timeit qiskit.qasm2.loads(prog) ...: %timeit qiskit.QuantumCircuit.from_qasm_str(prog) 2.26 s ± 39.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) 22.5 s ± 106 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) `cx`-heavy programs like this one are actually the ones that the new parser is (comparatively) slowest on, because the construction time of `CXGate` is higher than most gates, and this dominates the execution time for the Rust-based parser. * Work around docs failure on Sphinx 5.3, Python 3.9 The version of Sphinx that we're constrained to use in the docs build can't handle the `Unpack` operator, so as a temporary measure we can just relax the type hint a little. * Remove unused import * Tweak documentation * More specific PyO3 usage * Use PathBuf directly for paths * Format * Freeze dataclass * Use type-safe id types This should have no impact on runtime or on memory usage, since each of the new types has the same bit width and alignment as the `usize` values they replace. * Documentation tweaks * Fix comments in lexer * Fix lexing version number with separating comments * Add test of pathological formatting * Fixup release note * Fix handling of u0 gate * Credit reviewers Co-authored-by: Luciano Bello <[email protected]> Co-authored-by: Kevin Hartman <[email protected]> Co-authored-by: Eric Arellano <[email protected]> * Add test of invalid gate-body statements * Refactor custom built-in gate definitions The previous system was quite confusing, and required all accesses to the global symbol table to know that the `Gate` symbol could be present but overridable. This led to confusing logic, various bugs and unnecessary constraints, such as it previously being (erroneously) possible to provide re-definitions for any "built-in" gate. Instead, we keep a separate store of instructions that may be redefined. This allows the logic to be centralised to only to the place responsible for performing those overrides, and remains accessible for error-message builders to query in order to provide better diagnostics. * Credit Sasha Co-authored-by: Alexander Ivrii <[email protected]> * Credit Matthew Co-authored-by: Matthew Treinish <[email protected]> * Remove dependency on `lazy_static` For a hashset of only 6 elements that is only checked once, there's not really any point to pull in an extra dependency or use a hash set at all. * Update PyO3 version --------- Co-authored-by: Luciano Bello <[email protected]> Co-authored-by: Kevin Hartman <[email protected]> Co-authored-by: Eric Arellano <[email protected]> Co-authored-by: Alexander Ivrii <[email protected]> Co-authored-by: Matthew Treinish <[email protected]>
Since Qiskitgh-9784 added a new `qiskit.qasm2` package and suitable exceptions for both import and export of OpenQASM 2 programs, this commit now unifies the exception handling through the rest of the library to use these exceptions. To avoid breaks in compatibility, the old `qiskit.qasm.QasmError` is now a re-export of `qiskit.qasm2.QASM2Error`. This is the base error of both the exporter and importers, despite the old documentation of `QasmError` claiming to be specifically for parser errors. In practice, the exporter also emitted `QasmError`, so having that be `QASM2Error` allows people catching that error to get the same behaviour in these new forms. The actual exporter and importer are updated to emit the precise subclasses.
Since Qiskitgh-9784 added a new `qiskit.qasm2` package and suitable exceptions for both import and export of OpenQASM 2 programs, this commit now unifies the exception handling through the rest of the library to use these exceptions. To avoid breaks in compatibility, the old `qiskit.qasm.QasmError` is now a re-export of `qiskit.qasm2.QASM2Error`. This is the base error of both the exporter and importers, despite the old documentation of `QasmError` claiming to be specifically for parser errors. In practice, the exporter also emitted `QasmError`, so having that be `QASM2Error` allows people catching that error to get the same behaviour in these new forms. The actual exporter and importer are updated to emit the precise subclasses.
* Unify OpenQASM 2 exceptions Since gh-9784 added a new `qiskit.qasm2` package and suitable exceptions for both import and export of OpenQASM 2 programs, this commit now unifies the exception handling through the rest of the library to use these exceptions. To avoid breaks in compatibility, the old `qiskit.qasm.QasmError` is now a re-export of `qiskit.qasm2.QASM2Error`. This is the base error of both the exporter and importers, despite the old documentation of `QasmError` claiming to be specifically for parser errors. In practice, the exporter also emitted `QasmError`, so having that be `QASM2Error` allows people catching that error to get the same behaviour in these new forms. The actual exporter and importer are updated to emit the precise subclasses. * Add newly-imported gates to exclude The slight change in import order means that two new `Gate` class bodies got executed during the imports for this particular part of the test suite. These gates were internal to the `qasm2.load` functionality and should be exempt from the automated definition-equivalence testing.
Summary
This is a vendored version of qiskit-qasm2, with this initial commit being equivalent (barring some naming / documentation / testing conversions to match Qiskit's style) to version 0.5.3 of that package.
This adds a new translation layer from OpenQASM 2 to Qiskit, which is around an order of magnitude faster than the existing version in Python, while being more type safe (in terms of disallowing invalid OpenQASM 2 programs rather than attempting to construct
QuantumCircuit
s that are not correct) and more extensible.When using its full set of compatibility options, this can be a drop-in replacement for
QuantumCircuit.from_qasm_str
andQuantumCircuit.from_qasm_file
, which will allow us to deprecate and remove all the code inqiskit/qasm
(except for the Pygments stuff which we can replace withopenqasm-pygments
, which I'd even forgotten I wrote ages ago). I haven't done that yet, though - in an approximate attempt to keep the diff down (+8,200, -11 lol), I can begin those deprecations in a follow-up. We can also choose to replace the internals ofQuantumCircuit.from_qasm_str
and_file
immediately, if we prefer, to realise the speed-ups for everyone (again, follow-up).Details and comments
The core logic is a hand-written lexer and parser combination written in Rust, which emits a bytecode stream across the PyO3 boundary to a small Python interpreter loop. The main bulk of the parsing logic is a simple LL(1) recursive-descent algorithm, which delegates to more specific recursive Pratt-based algorithm for handling classical expressions.
Many of the design decisions made (including why the lexer is written by hand) are because the project originally started life as a way for me to learn about implementations of the different parts of a parser stack; this is the principal reason there are very few external crates used. There are a few inefficiencies in this implementation, for example:
the string interner in the lexer allocates twice for each stored string (but zero times for a lookup). It may be possible to completely eliminate allocations when parsing a string (or a file if it's read into memory as a whole), but realistically there's only a fairly small number of different tokens seen in most OpenQASM 2 programs, so it shouldn't be too big a deal.
the hand-off from Rust to Python transfers small objects frequently. It might be more efficient to have a secondary buffered iterator in Python space, transferring more bytecode instructions at a time and letting Python resolve them. This form could also be made asynchronous, since for the most part, the Rust components only need to acquire the CPython GIL at the API boundary.
there are too many points within the lexer that can return a failure result that needs unwrapping at every site. Since there are no tokens that can span multiple lines, it should be possible to refactor so that almost all of the byte-getter and -peeker routines cannot return error statuses, at the cost of the main lexer loop becoming responsible for advancing the line buffer, and moving the non-ASCII error handling into each token constructor.
I'll probably keep playing with some of those in the
qiskit-qasm2
package itself when I have free time, but at some point I needed to draw the line and vendor the package. It's still ~10x faster than the existing one:cx
-heavy programs like this one are actually the ones that the new parser is (comparatively) slowest on, because the construction time ofCXGate
is higher than most gates, and this dominates the execution time for the Rust-based parser.