
Add demo "How to Quantum Just-In-Time Compile Grover's Algorithm with Catalyst" #1219

Merged: 43 commits into master, Nov 7, 2024

Conversation

joeycarter
Contributor

Title: How to Quantum Just-In-Time Compile Grover's Algorithm with Catalyst

Summary: This demo uses the existing Grover's Algorithm tutorial to describe how to just-in-time (JIT) compile a quantum circuit using Catalyst. It also includes runtime benchmarks to demonstrate the performance improvements that JIT compiling with Catalyst offers.

Relevant references: L. K. Grover (1996) "A fast quantum mechanical algorithm for database search"

Possible Drawbacks: None

Related GitHub Issues: None

[sc-72939]


If you are writing a demonstration, please answer these questions to facilitate the marketing process.

  • GOALS — Why are we working on this now?

    • Promote Catalyst by demonstrating the performance improvements it offers by QJIT compiling a relatively simple quantum circuit.
  • AUDIENCE — Who is this for?

    • Users of PennyLane looking to compile and optimize their circuits for better performance.
  • KEYWORDS — What words should be included in the marketing post?

    • Grover's algorithm
    • Catalyst
    • QJIT
  • Which of the following types of documentation is most similar to your file?

  • Tutorial
  • Demo
  • How-to


👋 Hey, looks like you've updated some demos!

🐘 Don't forget to update the dateOfLastModification in the associated metadata files so your changes are reflected in Glass Onion (search and recommendations).

Please hide this comment once the field(s) are updated. Thanks!

@joeycarter joeycarter requested a review from a team September 18, 2024 19:48
@rmoyard rmoyard self-requested a review September 18, 2024 19:52
@rauletorresc left a comment

I would use the lightning.qubit device by default in the baseline algorithm explanation instead of default.qubit, for the same reason explained in the demo: Catalyst does not support the latter at the moment. I would also remove it from the benchmark and show only the results with lightning.qubit. Finally, I would add a small note stating which devices Catalyst does not support. What do you think?

@joeycarter
Contributor Author

> I would use the lightning.qubit device by default in the baseline algorithm explanation instead of default.qubit, for the same reason explained in the demo: Catalyst does not support the latter at the moment. I would also remove it from the benchmark and show only the results with lightning.qubit. Finally, I would add a small note stating which devices Catalyst does not support. What do you think?

My reasoning for using default.qubit as the baseline device was that the spirit of this demo was to take an existing PennyLane circuit, in this case the one from https://pennylane.ai/qml/demos/tutorial_grovers_algorithm/, and make it work with Catalyst. Switching the device from default.qubit to lightning.qubit is one of the required steps, and so including it in this demo makes it explicit what needs to be done for the circuit to work with Catalyst. My preference would be to leave it in.

There's an argument for leaving default.qubit out of the benchmarks, since it consumes the most CPU time to execute. I included it to drive home each incremental performance improvement a user gets as they make the necessary modifications to their circuit. I'll go with popular opinion on whether to include or exclude it.

I would prefer not to list the devices that Catalyst does not support and instead refer to the documentation. If the list of supported devices changes over time, it would be better not to have to keep the demo in sync with it.

Contributor

@rmoyard left a comment

Great demo, can you double check the build failure? After that we can check that it renders properly.

@joeycarter
Contributor Author

> Great demo, can you double check the build failure? After that we can check that it renders properly.

Thanks @rmoyard! The build failure was due to the path to the preview image thumbnail not existing. I've put in the original Grover's algorithm preview image as a placeholder, just to check the build and rendering, which we can replace later on.


github-actions bot commented Sep 19, 2024

Thank you for opening this pull request.

You can find the built site at this link.

Deployment Info:

  • Pull Request ID: 1219
  • Deployment SHA: 795f940483b86d805d8d2d016a00904d4869f7cc
    (The Deployment SHA refers to the latest commit hash the docs were built from)

Note: It may take several minutes for updates to this pull request to be reflected on the deployed site.

@rmoyard rmoyard self-requested a review September 19, 2024 17:54
Contributor

@rmoyard left a comment

Great first demo 💯

Contributor

@dime10 left a comment

Nice work @joeycarter, that's a cool demo! :)

The benchmarking is done quite nicely, I like the comparison of the different stages (include the first load!).

Member

@josh146 left a comment

Wow, really nicely written how-to guide @joeycarter! Super clear and very well written.

I've left some suggestions throughout.

@joeycarter
Contributor Author

I've refactored the timeit benchmarking; since I'm now running the compiled circuit earlier in the script to print out the results, we have to create a new qjit object of the circuit to get the runtime of the first call. I'm doing so with the setup input argument of timeit.

Interestingly, when I run this locally I no longer see a difference between the first-call runtime and subsequent-call runtimes. It's possible that when timeit was executing these commands in its own namespace, the compiled circuit ended up in some funky state that led to the small runtime overhead on the first call.

I'll see if I get the same result in the deployed version. If so, I'll remove the paragraph on the caching overhead, since this is likely the wrong interpretation of the larger runtime in the first call of the compiled circuit.
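The `timeit` pattern described above — using the `setup` argument to build a fresh compiled object so the timed statement is always the *first* call — can be sketched with a toy stand-in for the circuit. This is an illustrative mock, not the demo's actual code: `fake_qjit` just simulates a callable with a one-time warm-up cost.

```python
import timeit

# Stand-in for qjit-compiling a circuit: returns a callable whose first
# invocation pays a one-time "warm-up" cost (mimicking lazy initialization).
def fake_qjit():
    state = {"warm": False}

    def compiled_circuit():
        if not state["warm"]:
            state["warm"] = True
            return sum(i * i for i in range(50_000))  # expensive first call
        return sum(i * i for i in range(1_000))       # cheap steady state

    return compiled_circuit

# First-call runtime: rebuild a fresh object in `setup` so the timed
# statement is the first call of a brand-new compiled circuit.
t_first = timeit.timeit(
    "compiled_circuit()",
    setup="compiled_circuit = fake_qjit()",
    globals=globals(),
    number=1,
)

# Subsequent-call runtime: warm the object up in `setup`, then time repeats.
t_rest = timeit.timeit(
    "compiled_circuit()",
    setup="compiled_circuit = fake_qjit(); compiled_circuit()",
    globals=globals(),
    number=100,
) / 100

print(f"first call:      {t_first:.6f} s")
print(f"subsequent call: {t_rest:.6f} s")
```

Because `setup` runs once per timing pass and is excluded from the measured time, this cleanly separates the first-call cost from the steady-state cost.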

@dime10
Contributor

dime10 commented Sep 25, 2024

> Interestingly, when I run this locally I no longer see a difference between the first-call runtime and subsequent-call runtimes. It's possible that when timeit was executing these commands in its own namespace, the compiled circuit ended up in some funky state that led to the small runtime overhead on the first call.

If the first call overhead is primarily spent on loading shared libraries, then creating a new QJIT object may not recreate that state accurately. This is just a wild guess, but it could be that a lot of the shared libs (besides the program one, since that one should be new) that are needed to execute a catalyst program are still loaded in the Python process.

If you comment out the first call earlier in the demo, does it revert it to the old results?

@joeycarter
Contributor Author

> If the first call overhead is primarily spent on loading shared libraries, then creating a new QJIT object may not recreate that state accurately. This is just a wild guess, but it could be that a lot of the shared libs (besides the program one, since that one should be new) that are needed to execute a catalyst program are still loaded in the Python process.
>
> If you comment out the first call earlier in the demo, does it revert it to the old results?

Ah, yes if I comment out the first call then it reverts to the old results, with the first QJIT call taking significantly longer than subsequent calls. On my machine I get

Native (default.qubit) runtime: (13.31 +/- nan) s
Native (lightning.qubit) runtime: (5.359 +/- 0.023) s
QJIT compilation runtime: (0.4419 +/- nan) s
QJIT (first call) runtime: (0.007347 +/- nan) s
QJIT (subsequent calls) runtime: (0.001503 +/- 0.00021) s

So the overhead appears to come from calling *any* qjit-compiled circuit for the first time in the process, rather than from the first call of each new QJIT object. Loading shared libraries sounds like a plausible explanation; I'll run a profiler on this script to see where the runtime hotspot is.
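For reference, numbers of the form `(mean +/- std) s` like those above can be produced with `timeit.repeat` plus the standard-library `statistics` module. This is a generic sketch, not the demo's actual benchmarking code; note that a single-repeat measurement has no spread, which is why lone measurements show up as `nan`:

```python
import math
import statistics
import timeit

def benchmark(stmt, setup="pass", number=10, repeat=5):
    """Time `stmt`, returning (mean, std) of the per-call runtime in seconds."""
    # repeat() returns one total time per repeat; divide by `number`
    # to get the average time of a single call within that repeat.
    totals = timeit.repeat(stmt, setup=setup, number=number, repeat=repeat)
    per_call = [t / number for t in totals]
    mean = statistics.mean(per_call)
    # With a single repeat there is no spread to report, hence "nan".
    std = statistics.stdev(per_call) if repeat > 1 else float("nan")
    return mean, std

mean, std = benchmark("sum(range(1000))")
print(f"runtime: ({mean:.3g} +/- {std:.2g}) s")
```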

@joeycarter
Contributor Author

I profiled the first and second calls to the qjit-compiled circuit and in fact the hotspot is in the call to jnp.asarray() in CompiledFunction._exec().

If we trace down the function call stack from jnp.asarray, we get to a jax function dispatch.py:390(_device_put_sharding_impl). In the first call, this takes ~0.0086 s, and in the second only ~0.00005 s. Tracing down one step further in the stack, in the second (fast) call, the most time is spent in dispatch.py:331(_put_x), but in the first call, the most time is spent in pxla.py:1669(_get_default_device) (~0.008 s). Tracing down even further in the slow call, a lot of time is spent in XLA, e.g. in xla_bridge.py:737(_discover_and_register_pjrt_plugins) (~0.0047 s) and in xla_client.py:67(make_cpu_client) (~0.0024 s). These functions are not called in the second qjit-object call.

That's a lot of text so here's a visualization! First, the first (slow) call:

Screenshot from 2024-09-25 15-14-39

and the second (fast) call:

Screenshot from 2024-09-25 15-14-52

If I'm understanding all of this correctly, the overhead isn't in Catalyst per se, but in loading a bunch of JAX and XLA things the first time we call jnp.asarray().

In fact, if I throw in a jnp.asarray([0.]) before the first qjit-object call, I can get the first and subsequent calls to be roughly on par with one another! (The first call is still a bit slower, perhaps because of some smaller overheads elsewhere).

This is getting far into the nitty-gritty details of the performance of Catalyst and JAX, and well beyond the scope of this demo, I think. I propose we not show the difference between the first call and the subsequent calls in the benchmarks, since I think it's a reasonable assumption that any user who cares about that level of performance will already be using JAX arrays in their programs and not notice any difference in the first and second calls to an AOT-compiled circuit. @josh146, @dime10, how does that sound?
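For anyone wanting to reproduce this kind of hotspot analysis, the profiling itself needs only the standard library: profile the first and second calls separately with `cProfile` and sort by cumulative time. The function below is a placeholder for the qjit-compiled circuit, where `jnp.asarray()` and the one-time JAX/XLA setup would appear in the real trace:

```python
import cProfile
import io
import pstats

def compiled_circuit():
    # Placeholder workload standing in for the qjit-compiled circuit call.
    return sum(i * i for i in range(10_000))

def profile_call(fn, label):
    """Profile a single call to `fn` and return the stats report as text."""
    profiler = cProfile.Profile()
    profiler.enable()
    fn()
    profiler.disable()
    # Sort by cumulative time: this is how hotspots like
    # dispatch.py / pxla.py / xla_bridge.py were identified above.
    out = io.StringIO()
    pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
    print(f"--- {label} ---")
    print(out.getvalue())
    return out.getvalue()

first = profile_call(compiled_circuit, "first call")
second = profile_call(compiled_circuit, "second call")
```

Comparing the two reports side by side (or visualizing them, e.g. with snakeviz) makes one-time initialization costs stand out immediately.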

@josh146
Member

josh146 commented Sep 25, 2024

@joeycarter yep that sounds good to me! On another note, we discovered that the decomposition pathway for lightning+noqjit for GroverOperator is dynamic, and uses a magic number to change strategy at <13 qubits:

https://github.com/PennyLaneAI/pennylane-lightning/blob/master/pennylane_lightning/lightning_qubit/lightning_qubit.py#L177-L178

This is likely not optimal, as these magic numbers are not always the best choice and can lead to slowdowns near the boundary (e.g., 10-12 qubits). I'd be curious to rerun the demo at 13 or 14 qubits (where lightning and Catalyst should use the same decomposition strategy) to see what the outcome is.

@dime10
Contributor

dime10 commented Sep 25, 2024

Thanks @joeycarter, nice digging! So the overhead is not related to Catalyst; happy to leave it at that and not show 1st vs 2nd call 👍

@joeycarter
Contributor Author

> Thanks @joeycarter, nice digging! So the overhead is not related to Catalyst; happy to leave it at that and not show 1st vs 2nd call 👍

Thanks @dime10! Sounds good.

@joeycarter joeycarter force-pushed the joeycarter/qjit-grovers-algo-with-catalyst branch from 96e156a to 51b6c4a Compare November 6, 2024 19:30
@joeycarter joeycarter changed the base branch from dev to master November 6, 2024 19:31
@joeycarter
Contributor Author

Heads up, I've rebased this demo onto the master branch.

@joeycarter joeycarter merged commit 22a06cd into master Nov 7, 2024
10 checks passed
@joeycarter joeycarter deleted the joeycarter/qjit-grovers-algo-with-catalyst branch November 7, 2024 19:11
7 participants