Add Rust-based OpenQASM 2 converter (Qiskit#9784)
* Add Rust-based OpenQASM 2 converter

This is a vendored version of qiskit-qasm2
(https://pypi.org/project/qiskit-qasm2), with this initial commit being
equivalent (barring some naming / documentation / testing conversions to
match Qiskit's style) to version 0.5.3 of that package.

This adds a new translation layer from OpenQASM 2 to Qiskit, which is
around an order of magnitude faster than the existing version in Python,
while being more type safe (in terms of disallowing invalid OpenQASM 2
programs rather than attempting to construct `QuantumCircuit`s that
are not correct) and more extensible.

The core logic is a hand-written lexer and parser combination written in
Rust, which emits a bytecode stream across the PyO3 boundary to a small
Python interpreter loop.  The main bulk of the parsing logic is a simple
LL(1) recursive-descent algorithm, which delegates to a more specific
recursive Pratt-based algorithm for handling classical expressions.
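
As a rough illustration of the Pratt-based part, here is a minimal sketch in
Python (not the Rust code in this commit) that evaluates an arithmetic
expression such as `pi * 2` directly.  The names `ExprParser` and `evaluate`,
the binding-power values and the function table are all assumptions made for
the example; the real parser emits bytecode rather than evaluating in place.

```python
import math
import operator
import re

# Illustrative (left, right) binding powers; a higher value binds tighter.
# `^` has left > right, which makes it right-associative.
INFIX = {"+": (1, 2), "-": (1, 2), "*": (3, 4), "/": (3, 4), "^": (8, 7)}
APPLY = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
    "^": operator.pow,
}
PREFIX_POWER = 5  # unary minus binds tighter than `*` but looser than `^`
FUNCTIONS = {"sin": math.sin, "cos": math.cos, "exp": math.exp, "sqrt": math.sqrt}
NUMBER = r"\d+\.?\d*|\.\d+"


class ExprParser:
    def __init__(self, text):
        # Numbers, identifiers, or any other single non-space character.
        self.tokens = re.findall(NUMBER + r"|[A-Za-z_]\w*|\S", text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def next(self):
        token = self.peek()
        if token is None:
            raise ValueError("unexpected end of expression")
        self.pos += 1
        return token

    def expect(self, wanted):
        found = self.next()
        if found != wanted:
            raise ValueError(f"expected {wanted!r}, found {found!r}")

    def parse(self, min_power=0):
        # Prefix position: an atom, a parenthesised group, or a unary minus.
        token = self.next()
        if token == "(":
            lhs = self.parse(0)
            self.expect(")")
        elif token == "-":
            lhs = -self.parse(PREFIX_POWER)
        elif token == "pi":
            lhs = math.pi
        elif token in FUNCTIONS:
            self.expect("(")
            lhs = FUNCTIONS[token](self.parse(0))
            self.expect(")")
        elif re.fullmatch(NUMBER, token):
            lhs = float(token)
        else:
            raise ValueError(f"unexpected token {token!r}")
        # Infix position: keep consuming operators that bind at least as
        # tightly as the caller requires.
        while True:
            op = self.peek()
            if op not in INFIX:
                break
            left, right = INFIX[op]
            if left < min_power:
                break
            self.next()
            lhs = APPLY[op](lhs, self.parse(right))
        return lhs


def evaluate(text):
    parser = ExprParser(text)
    value = parser.parse(0)
    if parser.peek() is not None:
        raise ValueError(f"unexpected trailing token {parser.peek()!r}")
    return value
```

With these binding powers, `evaluate("pi * 2")` gives `2 * math.pi`, and
`evaluate("-2 ^ 2")` groups as `-(2 ^ 2)`.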

Many of the design decisions made (including why the lexer is written by
hand) are because the project originally started life as a way for me to
learn about implementations of the different parts of a parser stack;
this is the principal reason there are very few external crates used.
There are a few inefficiencies in this implementation, for example:

- the string interner in the lexer allocates twice for each stored
  string (but zero times for a lookup).  It may be possible to
  completely eliminate allocations when parsing a string (or a file if
  it's read into memory as a whole), but realistically there's only a
  fairly small number of different tokens seen in most OpenQASM 2
  programs, so it shouldn't be too big a deal.

- the hand-off from Rust to Python transfers small objects frequently.
  It might be more efficient to have a secondary buffered iterator in
  Python space, transferring more bytecode instructions at a time and
  letting Python resolve them.  This form could also be made
  asynchronous, since for the most part, the Rust components only need
  to acquire the CPython GIL at the API boundary.

- there are too many points within the lexer that can return a failure
  result that needs unwrapping at every site.  Since there are no tokens
  that can span multiple lines, it should be possible to refactor so
  that almost all of the byte-getter and -peeker routines cannot return
  error statuses, at the cost of the main lexer loop becoming
  responsible for advancing the line buffer, and moving the non-ASCII
  error handling into each token constructor.
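
For readers unfamiliar with the first point, a string interner can be
sketched as below.  This is a Python illustration with hypothetical names
(`Interner`, `intern`, `lookup`), not the Rust code: each new string is
recorded in two containers (which is where the "twice per stored string"
cost comes from in an owning language like Rust), while looking up an
already-seen string only touches the map.

```python
class Interner:
    """Map each distinct string to a small integer id, and back again."""

    def __init__(self):
        self._ids = {}      # string -> id, used when interning
        self._strings = []  # id -> string, used when resolving an id

    def intern(self, text):
        existing = self._ids.get(text)
        if existing is not None:
            # Already seen: nothing new is stored.
            return existing
        new_id = len(self._strings)
        # First sighting: the string is recorded in both containers.
        self._ids[text] = new_id
        self._strings.append(text)
        return new_id

    def lookup(self, string_id):
        return self._strings[string_id]

    def __len__(self):
        return len(self._strings)
```

Tokens can then carry the integer id instead of the text, so comparing two
identifiers is an integer comparison.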

I'll probably keep playing with some of those in the `qiskit-qasm2`
package itself when I have free time, but at some point I needed to draw
the line and vendor the package.  It's still ~10x faster than the
existing one:

    In [1]: import qiskit.qasm2
       ...: prog = """
       ...:     OPENQASM 2.0;
       ...:     include "qelib1.inc";
       ...:     qreg q[2];
       ...: """
       ...: prog += "rz(pi * 2) q[0];\ncx q[0], q[1];\n"*100_000
       ...: %timeit qiskit.qasm2.loads(prog)
       ...: %timeit qiskit.QuantumCircuit.from_qasm_str(prog)
    2.26 s ± 39.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
    22.5 s ± 106 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

`cx`-heavy programs like this one are actually the ones that the new
parser is (comparatively) slowest on, because the construction time of
`CXGate` is higher than most gates, and this dominates the execution
time for the Rust-based parser.

* Work around docs failure on Sphinx 5.3, Python 3.9

The version of Sphinx that we're constrained to use in the docs build
can't handle the `Unpack` operator, so as a temporary measure we can
just relax the type hint a little.

* Remove unused import

* Tweak documentation

* More specific PyO3 usage

* Use PathBuf directly for paths

* Format

* Freeze dataclass

* Use type-safe id types

This should have no impact on runtime or on memory usage, since each of
the new types has the same bit width and alignment as the `usize` values
they replace.
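
The idea is the usual "newtype" pattern.  The closest Python analogue is
`typing.NewType`, sketched here with hypothetical names (`GateId`,
`QubitId`, `qubit_label`); like the Rust wrappers described above, the
distinct types exist only for the type checker and the runtime value is
still a plain integer.

```python
from typing import NewType

# Hypothetical id types: distinct for the type checker, plain ints at runtime.
GateId = NewType("GateId", int)
QubitId = NewType("QubitId", int)


def qubit_label(labels: list[str], qubit: QubitId) -> str:
    """Resolve a qubit id to its label; a type checker rejects a GateId here."""
    return labels[qubit]


labels = ["q[0]", "q[1]"]
first = QubitId(0)
cx = GateId(1)
label = qubit_label(labels, first)  # fine
# qubit_label(labels, cx) would run, but a checker such as mypy flags it.
```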

* Documentation tweaks

* Fix comments in lexer

* Fix lexing version number with separating comments

* Add test of pathological formatting

* Fixup release note

* Fix handling of u0 gate

* Credit reviewers

Co-authored-by: Luciano Bello <[email protected]>
Co-authored-by: Kevin Hartman <[email protected]>
Co-authored-by: Eric Arellano <[email protected]>

* Add test of invalid gate-body statements

* Refactor custom built-in gate definitions

The previous system was quite confusing, and required all accesses to
the global symbol table to know that the `Gate` symbol could be present
but overridable.  This led to confusing logic, various bugs and
unnecessary constraints, such as it previously being (erroneously)
possible to provide re-definitions for any "built-in" gate.

Instead, we keep a separate store of instructions that may be redefined.
This allows the logic to be centralised to only the place responsible
for performing those overrides, and remains accessible for error-message
builders to query in order to provide better diagnostics.
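
One way to picture the new arrangement is the Python sketch below.  It is
an illustration only: `GateTable`, its methods and the rule that a
definition must match the expected qubit count are assumptions for the
example, not the Rust data structures in this commit.

```python
class GateTable:
    """Gate symbols, plus a separate store of names that may be redefined."""

    def __init__(self, builtins, overridable):
        # Every callable gate: name -> number of qubits.
        self._gates = dict(builtins)
        self._gates.update(overridable)
        # Separate store: only these names may receive one later definition.
        self._overridable = dict(overridable)

    def num_qubits(self, name):
        # Ordinary lookups never need to know about overriding.
        return self._gates[name]

    def can_override(self, name):
        # Exposed so that error-message builders can give better diagnostics.
        return name in self._overridable

    def define(self, name, num_qubits):
        # The override logic lives here and nowhere else.
        if name in self._overridable:
            expected = self._overridable.pop(name)
            if expected != num_qubits:
                raise ValueError(
                    f"'{name}' takes {expected} qubit(s), not {num_qubits}"
                )
            return
        if name in self._gates:
            raise ValueError(f"'{name}' is already defined")
        self._gates[name] = num_qubits
```

Here a name leaves the overridable store once it has been defined, so a
second definition is reported as a duplicate like any other gate.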

* Credit Sasha

Co-authored-by: Alexander Ivrii <[email protected]>

* Credit Matthew

Co-authored-by: Matthew Treinish <[email protected]>

* Remove dependency on `lazy_static`

For a set of only 6 elements that is only checked once, there's not
really any point in pulling in an extra dependency, or in using a hash
set at all.

* Update PyO3 version

---------

Co-authored-by: Luciano Bello <[email protected]>
Co-authored-by: Kevin Hartman <[email protected]>
Co-authored-by: Eric Arellano <[email protected]>
Co-authored-by: Alexander Ivrii <[email protected]>
Co-authored-by: Matthew Treinish <[email protected]>
6 people authored and giacomoRanieri committed Apr 16, 2023
1 parent 8ca3ebd commit 1a6d26e
Showing 24 changed files with 8,352 additions and 11 deletions.
12 changes: 10 additions & 2 deletions Cargo.lock


16 changes: 16 additions & 0 deletions crates/qasm2/Cargo.toml
@@ -0,0 +1,16 @@
[package]
name = "qiskit-qasm2"
# The following options can be inherited with (e.g.) `version.workspace = true` once we hit Rust
# 1.64. Until then, keep in sync with the root `Cargo.toml`.
version = "0.24.0"
edition = "2021"
rust-version = "1.61"
license = "Apache-2.0"

[lib]
name = "qiskit_qasm2"
crate-type = ["cdylib"]

[dependencies]
hashbrown = "0.13.2"
pyo3 = { version = "0.18.2", features = ["extension-module"] }
6 changes: 6 additions & 0 deletions crates/qasm2/README.md
@@ -0,0 +1,6 @@
# `qiskit._qasm2`

This crate is the bulk of the OpenQASM 2 parser. Since OpenQASM 2 is a simple language, it doesn't
bother with an AST construction step, but produces a simple linear bytecode stream to pass to a
small Python interpreter (in `qiskit.qasm2`). This started off life as a vendored version of [the
package `qiskit-qasm2`](https://pypi.org/project/qiskit-qasm2).
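
To make the "linear bytecode stream" idea concrete, here is a minimal
Python sketch of such an interpreter loop.  The opcode names, the
`Instruction` layout and the output format are all assumptions for
illustration; the actual interpreter in `qiskit.qasm2` builds a
`QuantumCircuit` and handles many more opcodes.

```python
from dataclasses import dataclass
from enum import Enum, auto


class OpCode(Enum):
    # Hypothetical opcodes; the real set is larger.
    DECLARE_QREG = auto()
    GATE = auto()


@dataclass(frozen=True)
class Instruction:
    opcode: OpCode
    operands: tuple


def interpret(bytecode):
    """Consume instructions in order; no syntax tree is ever built."""
    qubits = []      # flat list of qubit labels, indexed by qubit id
    operations = []  # (gate name, parameters, qubit labels)
    for instruction in bytecode:
        if instruction.opcode is OpCode.DECLARE_QREG:
            name, size = instruction.operands
            qubits.extend(f"{name}[{i}]" for i in range(size))
        elif instruction.opcode is OpCode.GATE:
            gate, params, targets = instruction.operands
            operations.append((gate, params, tuple(qubits[i] for i in targets)))
        else:
            raise ValueError(f"unknown opcode {instruction.opcode}")
    return operations


# A stream a parser might emit for:  qreg q[2]; cx q[0], q[1];
stream = [
    Instruction(OpCode.DECLARE_QREG, ("q", 2)),
    Instruction(OpCode.GATE, ("cx", (), (0, 1))),
]
result = interpret(stream)
```

Because every instruction refers to qubits by index, the interpreter needs
no lookahead and can process the stream as it arrives.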