Big overhead in parallel transpile for large target systems #7741
I did a quick profile of a modified version of your script, just running
Just thinking out loud: we might need to look at whether we can use shared memory between the processes to avoid the serialization overhead, because this problem is only going to get worse as the size of devices increases. Maybe we can leverage https://docs.python.org/3/library/multiprocessing.shared_memory.html#multiprocessing.shared_memory.SharedMemory, although it's new enough that I haven't tried it before (and it is only available starting in Python 3.8).
This commit uses the Python shared memory library to reduce the overhead of launching parallel processes as part of transpile. As the size of the backends grows, the payload we serialize and copy between worker processes also grows. When we're running a lot of small circuits at once on a big backend, we can easily spend far more time dealing with IO overhead than running the transpilation. Using shared memory reduces the overhead to serializing and copying the payload once, after which each worker process just needs to deserialize it. While this doesn't remove all the overhead, it should reduce the impact somewhat. Fixes Qiskit#7741
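The idea in the commit message above can be sketched with only the standard library. This is a minimal illustration, not the code from the commit: the helper names `publish`, `worker`, and `run_all` are hypothetical, the dictionary payload is a stand-in for a backend object, and for simplicity the payload is deserialized once per task rather than once per worker process.

```python
import pickle
from multiprocessing import Pool, shared_memory


def publish(payload):
    """Pickle the payload once and copy the bytes into a new shared memory block."""
    data = pickle.dumps(payload)
    shm = shared_memory.SharedMemory(create=True, size=len(data))
    shm.buf[: len(data)] = data
    # The block may be rounded up to a page size, so return the real length too.
    return shm, len(data)


def worker(args):
    """Attach to the block by name, rebuild the payload, then do the per-task work."""
    shm_name, size, task = args
    shm = shared_memory.SharedMemory(name=shm_name)
    try:
        payload = pickle.loads(bytes(shm.buf[:size]))
    finally:
        shm.close()
    # Stand-in for the real work that would use the large payload.
    return payload["scale"] * task


def run_all(payload, tasks, processes=2):
    """Send only (name, size, task) to each worker instead of the payload itself."""
    shm, size = publish(payload)
    try:
        with Pool(processes) as pool:
            return pool.map(worker, [(shm.name, size, task) for task in tasks])
    finally:
        shm.close()
        shm.unlink()
```

Calling `run_all({"scale": 3, "big": list(range(100_000))}, [1, 2, 3])` from under an `if __name__ == "__main__":` guard would pickle the large payload once in the parent, while each task message carries only a short name, a length, and the task itself.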
While I was investigating adding the same approach from #7789 to the parallel dispatch in
import time
from qiskit.transpiler.preset_passmanagers import generate_preset_pass_manager
from qiskit.providers.fake_provider import FakeWashington
from qiskit.circuit.quantumcircuit import QuantumCircuit
from mthree.circuits import balanced_cal_strings, balanced_cal_circuits
backend = FakeWashington()
b_strs = balanced_cal_strings(127)
cal_circs = balanced_cal_circuits(b_strs, list(range(127)), 127)
# generate level 0 pass manager for backend
pm = generate_preset_pass_manager(0, backend)
start = time.perf_counter()
pm.run(cal_circs)
stop = time.perf_counter()
print(stop - start)
(this will only work on that is running in parallel and taking ~3-4 seconds on Python 3.7 (and ~1.6 seconds on Python 3.10) for me. This is making me think that to close this issue we might need to revisit the list inputs for arguments on
Ahh, this is interesting and seems to be the correct direction going forward.
This is only partially addressed by the recently merged PR; the full performance isn't there yet.
Right now, supporting argument broadcasting with list inputs for various transpiler options on the transpile() function causes a significant performance overhead, primarily due to how we have to handle the multiple arguments across a parallel dispatch boundary. It also significantly increases the code complexity of the function to support more than one input for each argument (except circuits). The utility of this type of argument handling is quite limited, since a similar result can be achieved with a for loop, which would likely be simpler for users to reason about. Weighing all these factors, the best path forward is to just remove this functionality. This commit starts the process of removing this feature by marking it as deprecated. Once the deprecation cycle is complete we can greatly simplify the code in transpile and primarily replace it with a call to generate_preset_pass_manager() and passmanager.run() (the only thing I think we'll have to handle out of band is faulty qubits defined in a BackendProperties for BackendV1). Related to Qiskit#7741
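The for loop mentioned above might look like the following sketch. The helper `transpile_each` is hypothetical and not part of Qiskit. It takes the transpile callable as a parameter so the example has no dependencies, and it assumes that callable accepts an `optimization_level` keyword, as `qiskit.transpile` does.

```python
def transpile_each(transpile_fn, circuits, optimization_levels):
    """Apply one optimization level per circuit with an explicit loop.

    `transpile_fn` is expected to behave like qiskit.transpile for a single
    circuit: transpile_fn(circuit, optimization_level=level).
    """
    if len(circuits) != len(optimization_levels):
        raise ValueError("need exactly one optimization level per circuit")
    return [
        transpile_fn(circuit, optimization_level=level)
        for circuit, level in zip(circuits, optimization_levels)
    ]
```

With Qiskit installed, passing `qiskit.transpile` as `transpile_fn` would give each circuit its own setting without relying on list broadcasting inside transpile() itself.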
Just to write out the current plan here for completeness: After 0.24.0 when the
This commit updates the transpile() function to no longer support broadcast of lists of arguments. This functionality was deprecated in the 0.23.0 release. As part of this removal, the internals of the transpile() function are simplified so we no longer need to handle broadcasting, building preset pass managers, parallel dispatch, etc., as this functionality (without broadcasting) already exists through the transpiler API. Besides greatly simplifying the transpile() code and using more of the public APIs that exist in the qiskit.transpiler module, this commit should also fix the overhead we have around parallel execution due to the complexity of supporting broadcasting. This overhead was partially addressed before in Qiskit#7789, which leveraged shared memory to minimize the serialization time necessary for IPC, but by using `PassManager.run()` internally all of that overhead is now removed, as the initial fork will have all the necessary context in each process from the start.
Three seemingly unrelated changes made here were necessary to support our current transpile() API without building custom pass manager construction.
The first is the handling of layout from intlist. The current Layout class is dependent on a circuit because it maps Qubit objects to a physical qubit index. Ideally the layout structure would just map virtual indices to physical indices (see Qiskit#8060 for a similar issue; it's also worth noting this is how the internal NLayout and QPY represent layout), but because of the existing API the construction of a Layout is dependent on a circuit. For the initial_layout argument, when running with multiple circuits, the SetLayout pass now supports taking in an int list and will construct a Layout object at run time; this avoids the need to broadcast the layout construction for supported input types that need the circuit to look up the Qubit objects. This effectively defers the Layout object creation for initial_layout to run time so it can be built as a function of the circuit, as the API demands.
The second is that the FakeBackend class used in some tests was constructing invalid backends in some cases. This wasn't caught in the previous structure because the backends were not actually being parsed by transpile(), which masked the issue. This commit fixes the invalid backend construction, which was causing PassManagerConfig.from_backend() to fail.
The third is a new _skip_target private argument to generate_preset_pass_manager() and PassManagerConfig. This was necessary to recreate the behavior of transpile() when a user provides a BackendV2 and either `basis_gates` or `coupling_map` arguments. In general the internals of the transpiler treat a target as higher priority because it has more complete and restrictive constraints than the basis_gates/coupling map objects. However, for transpile(), if a BackendV2 is passed in for backend paired with coupling_map and/or basis_gates, the expected workflow is that the basis_gates and coupling_map arguments take priority and override the equivalent attributes from the backend. To facilitate this we need to block pulling the target from the backend. This should only be needed for a short period of time, as when Qiskit#9256 is implemented we'll just build a single target from the arguments as needed. Fixes Qiskit#7741
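The point about the initial fork already carrying the necessary context can be illustrated with a small standard-library sketch. This is not how `PassManager.run()` is implemented. `run_forked` is a hypothetical helper, the dictionary is a stand-in for a pass manager or backend, and one child process is started per task purely to keep the example short. It assumes a POSIX system where the `fork` start method is available.

```python
import multiprocessing as mp


def run_forked(context, tasks):
    """Run one child per task; children inherit `context` through fork()."""
    # POSIX only: get_context raises ValueError where fork is unavailable.
    ctx = mp.get_context("fork")
    results = ctx.SimpleQueue()

    def child(index, task):
        # `context` is read from memory inherited at fork time, so the large
        # object is never pickled; only the small (index, value) pair is sent back.
        results.put((index, context["scale"] * task))

    procs = [ctx.Process(target=child, args=(i, t)) for i, t in enumerate(tasks)]
    for proc in procs:
        proc.start()
    # Drain the queue before joining so no child blocks on a full pipe.
    collected = dict(results.get() for _ in procs)
    for proc in procs:
        proc.join()
    return [collected[i] for i in range(len(tasks))]
```

Because the large object exists in the parent before the children are created, nothing but the small per-task arguments and results crosses the process boundary, which is the property the commit message relies on.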
Environment
0.19.2
What is happening?
There is a large overhead when doing parallel transpilation for circuits targeting larger systems. In the case below they are targeting the 127Q system, but the circuits themselves consist of only x gates and are already the correct width. Thus transpilation should just be a pass-through. However, in parallel mode (quad-core machine) it takes 330 sec to transpile. If I turn off parallel transpile it takes 3 sec.
How can we reproduce the issue?
Run above
What should happen?
Parallel should not give 100x overhead.
Any suggestions?
I am guessing this is copying overhead from forking when the systems are large.
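One rough way to check a guess like this is to time a pickle round trip of the object that gets sent with every task and multiply by the number of tasks. The sketch below is only an estimate under that assumption (one full copy per task). `serialization_cost` is a hypothetical helper, and any picklable object can stand in for the backend.

```python
import pickle
import time


def serialization_cost(payload, num_tasks):
    """Estimate the cost of shipping `payload` to a worker once per task."""
    start = time.perf_counter()
    blob = pickle.dumps(payload)
    pickle.loads(blob)
    round_trip = time.perf_counter() - start
    return {
        "bytes_per_task": len(blob),
        "total_bytes": len(blob) * num_tasks,
        # Serial estimate; real dispatch adds pipe and scheduling costs on top.
        "estimated_seconds": round_trip * num_tasks,
    }
```

If the estimated seconds for the real backend object and circuit count come out comparable to the gap between the parallel and serial timings, that would support the copying explanation.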