Lazy pulse qobj loading for large backend #8885

Merged

nkanazawa1989
Contributor

Summary

Loading PulseDefaults for a backend with 100+ qubits is very slow because it immediately converts all pulse qobjs into Schedules the first time it is called. This causes a significant performance regression in transpile because all preset passes require the defaults object, i.e. InstructionScheduleMap, to invoke the pulse gate pass for user calibrations.

This PR converts the pulse qobjs on an as-needed basis to speed this up.

Fix #7914

Blocked by #8839

Details and comments

This PR allows InstructionScheduleMap to hold unconverted pulse qobjs and to invoke the conversion logic only when the InstructionScheduleMap.get method is called for a particular calibration.

This is mainly done in commit 64a3712.

image
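A minimal sketch of the as-needed conversion described above, written in plain Python without Qiskit. `LazyScheduleMap`, `add_raw`, `fake_converter`, and the `conversions` counter are hypothetical names chosen for illustration; they are not the classes added in this PR.

```python
from typing import Any, Callable, Dict, Tuple

Key = Tuple[str, Tuple[int, ...]]


class LazyScheduleMap:
    """Hypothetical stand-in for a schedule map that defers conversion."""

    def __init__(self, converter: Callable[[Any], Any]):
        self._converter = converter
        self._raw: Dict[Key, Any] = {}        # unconverted payloads
        self._converted: Dict[Key, Any] = {}  # cache of converted entries
        self.conversions = 0                  # counter, for illustration only

    def add_raw(self, instruction: str, qubits: Tuple[int, ...], payload: Any) -> None:
        # Store the payload as-is; no conversion happens at load time.
        self._raw[(instruction, qubits)] = payload

    def get(self, instruction: str, qubits: Tuple[int, ...]) -> Any:
        key = (instruction, qubits)
        if key not in self._converted:
            # Convert only the entry that was requested, then cache it.
            self._converted[key] = self._converter(self._raw[key])
            self.conversions += 1
        return self._converted[key]


def fake_converter(payload):
    # Placeholder for the expensive qobj-to-schedule conversion.
    return [("play", item) for item in payload]


inst_map = LazyScheduleMap(fake_converter)
for q in range(100):
    inst_map.add_raw("x", (q,), ["pulse_%d" % q])

schedule = inst_map.get("x", (3,))
inst_map.get("x", (3,))  # second call is served from the cache
```

Loading 100 entries costs no conversions here; only the single entry that is actually requested gets converted, which is the behaviour the summary is after.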

Based on the profile, I also removed the use of deep copy in commit 0e1b5f2, which gives roughly a 3x speedup. (The deep copy had been added because the Qobj data is a nested dict and we wanted to avoid mutating it.)

image
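To illustrate the deep-copy point: a converter that pops keys from its input has to copy the input first to keep the caller's data intact, whereas a converter that only reads from it can skip the copy. The sketch below uses a made-up nested dict and hypothetical function names; it is not the code from the commit.

```python
import copy


def convert_with_deepcopy(instruction: dict) -> tuple:
    # Mutating style: keys are popped, so the input must be copied first.
    data = copy.deepcopy(instruction)
    name = data.pop("name")
    t0 = data.pop("t0", 0)
    return name, t0, data  # remaining keys are treated as parameters


def convert_read_only(instruction: dict) -> tuple:
    # Read-only style: build a new shallow dict of the remaining keys.
    params = {k: v for k, v in instruction.items() if k not in ("name", "t0")}
    return instruction["name"], instruction.get("t0", 0), params


source = {"name": "parametric_pulse", "t0": 16, "parameters": {"amp": 0.1, "sigma": 40}}
snapshot = copy.deepcopy(source)

result_a = convert_with_deepcopy(source)
result_b = convert_read_only(source)
```

Note that the read-only version shares nested values with the source dict, so it is only safe as long as later code does not mutate them in place.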

This is the output of the airspeed velocity benchmarks:

Benchmarks that have improved:

       before           after         ratio
     [e9736b41]       [0e1b5f2e]
     <fix/pulse_qobj_converter>       <upgrade/lazy_instmap_qobj_conversion>
-        774±10ms       21.0±0.5ms     0.03  pulse.load_pulse_defaults.PulseDefaultsBench.time_building_defaults(1000)
-        88.1±2ms      2.09±0.08ms     0.02  pulse.load_pulse_defaults.PulseDefaultsBench.time_building_defaults(100)
-      14.8±0.6ms          292±6μs     0.02  pulse.load_pulse_defaults.PulseDefaultsBench.time_building_defaults(10)
-      6.66±0.3ms          112±3μs     0.02  pulse.load_pulse_defaults.PulseDefaultsBench.time_building_defaults(0)

Benchmarks that have stayed the same:

       before           after         ratio
     [e9736b41]       [0e1b5f2e]
     <fix/pulse_qobj_converter>       <upgrade/lazy_instmap_qobj_conversion>
         17.6±1ms       18.4±0.6ms     1.05  pulse.load_pulse_defaults.CircuitSchedulingBench.time_scheduling_circuits(3)
         74.8±2ms         73.0±3ms     0.98  pulse.load_pulse_defaults.CircuitSchedulingBench.time_scheduling_circuits(15)

Benchmarks that have got worse:

       before           after         ratio
     [e9736b41]       [0e1b5f2e]
     <fix/pulse_qobj_converter>       <upgrade/lazy_instmap_qobj_conversion>
+      6.74±0.3ms       9.98±0.3ms     1.48  pulse.load_pulse_defaults.CircuitSchedulingBench.time_scheduling_circuits(1)
+      12.6±0.7ms         16.0±2ms     1.27  pulse.load_pulse_defaults.CircuitSchedulingBench.time_scheduling_circuits(2)

As you can see, we achieved a roughly 50x speedup with these two commits. Note that the scheduling benchmark suffers a performance regression as a side effect of the lazy conversion (because it internally calls .get to obtain each schedule). I guess the same problem will occur in the V2 converter #8759, but the converter should be able to avoid the performance regression by implementing a similar lazy conversion mechanism. Regarding the circuit scheduler, in principle we should improve the mechanism used to build a Schedule, as proposed in #8029.
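For reference, the speedup factors can be recomputed from the mean timings in the "improved" table. This is only arithmetic on the reported numbers (error bars ignored); the dictionary keys are the benchmark parameters as printed by asv.

```python
# Mean timings in milliseconds, copied from the "improved" table: (before, after).
timings_ms = {
    1000: (774.0, 21.0),
    100: (88.1, 2.09),
    10: (14.8, 0.292),
    0: (6.66, 0.112),
}

speedups = {n: before / after for n, (before, after) in timings_ms.items()}
for n, factor in speedups.items():
    print(f"time_building_defaults({n}): {factor:.1f}x")
```

The factors fall between about 37x and 59x, consistent with the "roughly 50x" figure quoted above.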

@nkanazawa1989 nkanazawa1989 added the on hold Can not fix yet label Oct 12, 2022
@qiskit-bot
Collaborator

Thank you for opening a new pull request.

Before your PR can be merged it will first need to pass continuous integration tests and be reviewed. Sometimes the review process can be slow, so please be patient.

While you're waiting, please feel free to review other open PRs. While only a subset of people are authorized to approve pull requests for merging, everyone is encouraged to review open pull requests. Doing reviews helps reduce the burden on the core team and helps make the project's code better for everyone.

One or more of the following people are requested to review this:

@taalexander
Contributor

While I cannot comment on the code right now, I can say that as a user this is hugely appreciated 🎉

@mtreinish
Member

I guess the same problem will occur in the V2 converter #8759, but the converter should be able to avoid the performance regression by implementing the similar lazy conversion mechanism.

All the IBM provider (except for the one with the q) backends will have this same problem too since they all pull in the pulse schedules from defaults for InstructionProperties during the Target generation

@coveralls

coveralls commented Oct 12, 2022

Pull Request Test Coverage Report for Build 3908064617

  • 212 of 228 (92.98%) changed or added relevant lines in 10 files are covered.
  • 4 unchanged lines in 2 files lost coverage.
  • Overall coverage decreased (-0.02%) to 84.818%

Changes Missing Coverage               Covered Lines   Changed/Added Lines   %
qiskit/qobj/pulse_qobj.py              17              18                    94.44%
qiskit/pulse/calibration_entries.py    113             128                   88.28%

Files with Coverage Reduction          New Missed Lines   %
qiskit/circuit/parameterexpression.py  1                  88.21%
src/vf2_layout.rs                      3                  94.74%

Totals
Change from base Build 3907627832: -0.02%
Covered Lines: 65562
Relevant Lines: 77297

💛 - Coveralls

@nkanazawa1989
Contributor Author

All the IBM provider (except for the one with the q) backends will have this same problem too since they all pull in the pulse schedules from defaults for InstructionProperties during the Target generation

Do they have their own mechanism for Target construction, or do they just use code in terra? If they rely on terra code, I can update InstructionProperties in a follow-up or in this PR.

@mtreinish
Member

It's a copy and paste of the same basic code that's in the BackendV2Converter, which was all based on a rough draft that I wrote ~8 months ago for taking the IBM JSON response payloads and creating a Target from the dictionaries:

@nkanazawa1989
Contributor Author

Added a new commit 2690308 to support lazy conversion for V2. This commit allows InstructionProperties to take a PulseQobjDef instance in addition to Schedule and ScheduleBlock. The conversion logic is then invoked when InstructionProperties.calibration is accessed.
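The convert-on-access behaviour described for the calibration attribute can be sketched with a Python property. `RawCalibration` and `LazyProperties` below are hypothetical stand-ins, not the actual PulseQobjDef or InstructionProperties classes, and the converter is a placeholder.

```python
from typing import Any, Callable, Optional


class RawCalibration:
    """Hypothetical wrapper around an unconverted payload."""

    def __init__(self, payload: Any, converter: Callable[[Any], Any]):
        self.payload = payload
        self.converter = converter


class LazyProperties:
    """Hypothetical analogue of a properties object with a lazy calibration."""

    def __init__(self, calibration: Any = None):
        self._calibration = calibration

    @property
    def calibration(self) -> Optional[Any]:
        if isinstance(self._calibration, RawCalibration):
            # First access: convert and replace the raw wrapper with the result.
            raw = self._calibration
            self._calibration = raw.converter(raw.payload)
        return self._calibration


calls = []


def convert(payload):
    # Placeholder conversion that records how often it runs.
    calls.append(payload)
    return tuple(payload)


props = LazyProperties(RawCalibration([1, 2, 3], convert))
calls_before_access = len(calls)
first = props.calibration
second = props.calibration
```

Objects that are already converted pass straight through the property, so callers that supply a ready-made schedule see no change in behaviour.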

Member

@mtreinish mtreinish left a comment

I'm a big fan of this; it'll fix most of the overhead for circuits that aren't using pulse and avoid the need to parse the entire payload, which can be super expensive. Thanks for doing this! Looking through the code as a first pass, nothing major stands out to me, but I'll do a more thorough pass after #8839 merges and this is rebased.

The one thing I think would be good to validate is that compilation across a multiprocessing boundary works as expected. I don't see anything super concerning on that front, but since we're moving towards lazy loading the defaults, having tests that do this across a process boundary seems like a good thing to add. We have this class:

https://github.com/Qiskit/qiskit-terra/blob/main/test/python/compiler/test_transpiler.py#L1817

It would be good to add some tests there that require using the pulse calibrations from the defaults file for something in transpile() with multiple circuits, along with PassManager.run() with multiple circuits too, since they have different parallel dispatch methods (although the current transpile() one will go away in 0.25.0 and just use PassManager.run() internally).

@nkanazawa1989 nkanazawa1989 added mod: pulse Related to the Pulse module and removed on hold Can not fix yet labels Jan 7, 2023
@nkanazawa1989 nkanazawa1989 force-pushed the upgrade/lazy_instmap_qobj_conversion branch from a46ba04 to bd3b0ec Compare January 7, 2023 06:50
@nkanazawa1989
Contributor Author

Thanks Matthew. I rebased the branch, added a new test file for the calibration entry classes, and updated the transpile test with a lazy loading test case. Because we don't have any preset pass that uses the backend calibration (we have the RZX calibration builder, but it is not in the preset passes), I wrote a fake pass for testing.

@mtreinish mtreinish added this to the 0.23.0 milestone Jan 9, 2023
Member

@mtreinish mtreinish left a comment

Overall this LGTM. I like the new structure, and the definition of the interface for a calibration entry in an instruction schedule map is a good addition. Thanks for adding the additional test coverage with the parallel context; I didn't expect any issue with it, but I think it's good to have.

I had a question inline and a spot where it looks like an unrelated change (not blocking). The only thing missing is maybe some release notes. We're changing a bit of the InstructionScheduleMap interface, particularly around the types supported. You might also want a feature note for the lazy loading, because the behavior will be a bit different (although equivalent) and it provides a nice speedup, which would be good to talk about in the release documentation (also maybe talking about the new calibration entry interface, although that might not really be intended to be public, so we can skip it).

Comment on lines +286 to +293

  del self._map[instruction][qubits]
  if not self._map[instruction]:
-     self._map.pop(instruction)
+     del self._map[instruction]

  self._qubit_instructions[qubits].remove(instruction)
  if not self._qubit_instructions[qubits]:
-     self._qubit_instructions.pop(qubits)
+     del self._qubit_instructions[qubits]
Member

This seems like a bit of an unrelated change; it's better to keep the diff limited to the logical change of the PR. The logic looks the same to me, so it's not anything worth blocking over.

Contributor Author

@nkanazawa1989 nkanazawa1989 Jan 12, 2023

Yeah, strictly speaking this is unrelated to the main purpose, but I realized the previous logic is slower according to this Stack Overflow answer, so I made this change to improve the overall performance of the inst map.
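A small sketch of the two removal styles being discussed, using a made-up nested dict and hypothetical helper names. Both leave the mapping in the same state; the timing at the end is only a rough local measurement and will vary by machine and Python version, so no particular result is claimed here.

```python
import timeit


def remove_with_pop(mapping, instruction, qubits):
    # Removal via dict.pop, which also returns the removed value.
    mapping[instruction].pop(qubits)
    if not mapping[instruction]:
        mapping.pop(instruction)


def remove_with_del(mapping, instruction, qubits):
    # Removal via the del statement, which discards the value.
    del mapping[instruction][qubits]
    if not mapping[instruction]:
        del mapping[instruction]


def build():
    return {"x": {(0,): "sched_x0"}, "cx": {(0, 1): "sched_cx01", (1, 0): "sched_cx10"}}


a, b = build(), build()
remove_with_pop(a, "x", (0,))
remove_with_del(b, "x", (0,))

# Rough timing of the two styles on a throwaway dict.
t_pop = timeit.timeit("d = {1: 2}; d.pop(1)", number=100_000)
t_del = timeit.timeit("d = {1: 2}; del d[1]", number=100_000)
print(f"pop: {t_pop:.4f}s  del: {t_del:.4f}s")
```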

@nkanazawa1989 nkanazawa1989 force-pushed the upgrade/lazy_instmap_qobj_conversion branch from 41217fd to 880a252 Compare January 12, 2023 03:05
mtreinish previously approved these changes Jan 12, 2023
Member

@mtreinish mtreinish left a comment

LGTM, thanks for the fast update.

@mtreinish
Member

Oops, before we tag this as automerge, did you want to add a release note?

@nkanazawa1989
Contributor Author

Thanks for catching the missing release note. I had forgotten to write the note. Now it's added.

@mtreinish mtreinish added automerge Changelog: New Feature Include in the "Added" section of the changelog labels Jan 12, 2023
@mergify mergify bot merged commit ea984fd into Qiskit:main Jan 13, 2023
ElePT pushed a commit to ElePT/qiskit-ibm-provider that referenced this pull request Oct 4, 2023
* Update instmap to support lazy qobj conversion

* Avoid deepcopy and mutating the source dict

* Support V2 lazy load

* Update Gate.from_dict

* Move CalibrationEntry and CalibrationPublisher to dedicated file for future deprecation of instruction schedule map.

* revert typehint change

* add test for cal entries

* add test for parallel transpile and fix converter

* add API for lazy get

* add release note
ElePT pushed a commit to ElePT/qiskit-ibm-runtime that referenced this pull request Oct 10, 2023
ElePT pushed a commit to ElePT/qiskit that referenced this pull request Oct 12, 2023
ElePT pushed a commit to ElePT/qiskit-ibm-runtime that referenced this pull request Dec 8, 2023
@nkanazawa1989 nkanazawa1989 deleted the upgrade/lazy_instmap_qobj_conversion branch December 11, 2023 16:23
Labels
Changelog: New Feature Include in the "Added" section of the changelog mod: pulse Related to the Pulse module priority: high
Development

Successfully merging this pull request may close these issues.

Some test modules (at least one) take a long time to import
5 participants