Quantum Memory Management for Cirq #6040
Comments
I'm interested in helping out on this project if possible. |
This project reminds me of #4100 and related PRs, which allow simulators to perform implicit qubit (de-)allocation when operating on definite qubits. It seems that the primary features being proposed here are:
If I understand correctly, none of this is meant to affect existing, definite-qubit circuits - only new circuits which make use of allocated qubits will be affected. Is this an accurate summary of the proposal? (Apologies if this is just restating what you've already said - writing it out mostly helped me make sense of it myself.) |
That is absolutely correct! The goal is to provide a framework for users to easily construct circuits composed of composite operations (aka gadgets) that can use additional ancillas "under the hood" (i.e. as part of their decompose protocol). |
I'm not quite clear: do the allocations occur at runtime, or is there a transformer that takes a circuit definition with ancilla-using ops and transforms it into a fully-defined circuit that you can run? Knee-jerk reaction is that the latter would be preferable, since the former could lead to space errors halfway through simulation, whereas with the latter you know exactly how many qubits you need before simulating. The former may be useful if we have nondeterministic circuits (in which case routing the ancillas in advance may be impossible), but we don't currently have that. Aside, it may be useful to have a PC memory requirement calculator for simulation, since we should be able to determine the maximum entanglement for a circuit prior to running it. One interesting use case would be the deferred measurements transformer. Would it be possible to rewrite this in terms of ancillas? Hmm, actually maybe not. The qubits added in that algorithm end up getting measured at the end, so aren't true ancillas, so never mind. |
I'm an idiot. I spent six months working on adding nondeterministic circuits to cirq.... In particular, Alternatively you could specify that a dynamic qubit allocation within a subcircuit has to be deallocated within that same subcircuit, avoiding the OOM potential. That would then also become deterministic (in terms of circuit layout), as each repetition could reuse the same ancilla, even if the number of repetitions is nondeterministic. So it could be done at circuit-building time rather than runtime. So yeah, I think focusing on the use case where the tooling transforms a circuit with ancillas to a fully-defined circuit is the more applicable one for Cirq. Runtime allocation is more useful if you really want to move toward Turing-completeness with the classical circuit model, like the Q# approach. But right now Cirq doesn't have enough classical logic primitives to make that worthwhile. |
@daxfohl It's indeed the latter, with the intention to implement different strategies that optimize for different properties (e.g. circuit width, depth, ...). As for nondeterministic circuits: if these circuits/subcircuits guarantee that the qubits will be clean (returned to their initial state) at the end, then they can be created using Correction: the type that will always be guaranteed to be clean is |
Potential use case: part of the reason we have ...then again, that would be a pretty big change for a relatively minor improvement. Certainly not a high-priority task to pursue, if indeed we decided to pursue it at all. |
I am concerned that the QubitManager is a global state. Global state is generally bad practice, as it is tricky to make statements or guarantees about such objects. Other code unrelated to you can change it. Using a context manager is much better, but then this has its own gotchas if two separate pieces of code I also think that kwargs should be passed to the QubitManager so that you can potentially adjust behavior based on custom parameters. (Most obviously, you may want to label the ancilla qubits in some way). While resource estimation might not care about this 'temporary' state, I would be hesitant to say that no users care about how this works. If nothing else, it would be good for debugging if you get some weird ancillae that you were not expecting and want to know where they came from. Other than these two concerns, I like it. It's exciting to see this! |
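To make the global-state versus context-manager trade-off concrete, here is a minimal sketch in plain Python of how a context manager could select which manager a module-level `qalloc` delegates to. Everything in it (`CountingQubitManager`, `use_manager`, the string labels) is a hypothetical stand-in for illustration, not Cirq's API or the prototype discussed in this thread.

```python
import contextlib
import contextvars
from typing import Iterator, List


class CountingQubitManager:
    """Toy manager (hypothetical): hands out fresh string labels with a prefix."""

    def __init__(self, prefix: str) -> None:
        self.prefix = prefix
        self._next_id = 0

    def qalloc(self, n: int) -> List[str]:
        labels = [f"{self.prefix}_{self._next_id + i}" for i in range(n)]
        self._next_id += n
        return labels


# The "current" manager lives in a context variable rather than a plain global,
# so nested scopes restore the previous manager when they exit.
_ACTIVE_MANAGER: contextvars.ContextVar = contextvars.ContextVar("active_manager")


@contextlib.contextmanager
def use_manager(manager: CountingQubitManager) -> Iterator[CountingQubitManager]:
    token = _ACTIVE_MANAGER.set(manager)
    try:
        yield manager
    finally:
        _ACTIVE_MANAGER.reset(token)


def qalloc(n: int) -> List[str]:
    """Delegates to whichever manager is active; raises LookupError if none is."""
    return _ACTIVE_MANAGER.get().qalloc(n)


with use_manager(CountingQubitManager("outer")):
    a = qalloc(1)
    with use_manager(CountingQubitManager("inner")):
        b = qalloc(2)
    c = qalloc(1)
```

The nested scope shows the gotcha raised above: code inside the inner block silently draws from a different manager than its caller, which is exactly the kind of action at a distance that passing the manager explicitly avoids.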
+1 to the above. I think a good number (all?) of cirq protocols could allow kwargs for easier customization. If |
@daxfohl I updated the original text with some more context around how simulators would be affected by this change. Can you please take a look and share your thoughts? |
So you mean doing the allocations at runtime rather than compile time? Should be doable. First have to undo #5748, and re-add the Then in This would generically give you the ability to do the allocations at runtime in all simulators that support kron and factor. (Density Matrix and State Vector do; I couldn't figure out a The hard part may be getting the |
I figured I'd try it out. daxfohl@548e61f Here we make an ancilla'd X gate by X'ing a new ancilla, CX'ing from the ancilla to the target, then un-X'ing the ancilla to get it back to zero. Seems to work equivalently to X gate for state vector and density matrix simulators. |
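The ancilla'd X gate described above can be checked by hand with small matrices. The sketch below uses plain Python lists rather than a simulator, and assumes the qubit order (ancilla, target) with the ancilla starting in |0>; the helper names are made up for this illustration and are not taken from the linked commit.

```python
from typing import List

Matrix = List[List[int]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of two nested-list matrices."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def kron(a: Matrix, b: Matrix) -> Matrix:
    """Kronecker product; the first factor acts on the more significant qubit."""
    return [
        [a[i][j] * b[k][l] for j in range(len(a[0])) for l in range(len(b[0]))]
        for i in range(len(a))
        for k in range(len(b))
    ]


I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
# CX with the ancilla (first qubit) as control and the target (second qubit) as target.
CX = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]

X_ANC = kron(X, I2)  # X on the ancilla only

# Circuit order: X on ancilla, then CX, then X on ancilla again.
U = matmul(X_ANC, matmul(CX, X_ANC))

# Restrict to the subspace where the ancilla starts and ends in |0>.
reduced = [row[:2] for row in U[:2]]
# Amplitude that would leave the ancilla in |1> when it started in |0>.
leak = [row[2:] for row in U[2:]] if False else [row[:2] for row in U[2:]]
```

Here `reduced` equals the X matrix and `leak` is all zeros, so from the target's point of view the three-gate sequence is just an X gate and the ancilla is returned clean.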
@daxfohl I think it should be possible to use the new proposed
But I'm not sure if this is what you had in mind when you say "Would it be possible to rewrite this in terms of ancillas? ". In general,
@95-martin-orion Could you elaborate more? The newly added ancilla / placeholder qubit types are still valid cirq qubits (they derive from |
I'm not sure where I was going. Both things kind of create qubits out of thin air, so it seemed maybe there was some overlap that could be wheedled out. But I don't think so. Ancillas have to be freed at the end of the operation. "MeasurementQids" from deferred measurements have to stay around until the end of the circuit. So it's not quite the same thing. |
[Disclaimer: the only benefit of this is to (potentially) make the code cleaner. Net value added is likely small.] Some of the details are lost to history, but I have a couple of docs on the original To take this one step further: we could have all |
Oh, I know what I was thinking. Related to the But if we had the ability to add qubits during the simulation, Update: POC of basic functionality f7de78d.
|
@dstrain115 I agree that it would have been nicer if we didn't need to maintain a global state and could pass around qubit manager everywhere. I thought hard about potential ways of passing around qubit manager and I concluded that it's best to go with a global state given the tradeoffs. Let me provide some more context as follows. We need
|
It's possible to use |
Okay, I have a working prototype that does not require a global state of FYI @daxfohl @dstrain115, please see tanujkhattar@f05f7d7 and let me know if you have any initial feedback. In terms of next steps, I'll mark the issue as accepted and create subtasks assuming we add a new decompose protocol and don't maintain a global state. If we discover any further issues with using a new decompose protocol, we can always fall back to the original global state approach. |
The problem recognized above is In addition to supporting a qubit manager there are other related considerations.
I think instead of This would work the same as in #6040 (comment) but will allow us to support different cost models or experiment with novel decompositions. The model should be supplied in the constructor and can be either an
prototype implementation
from enum import Enum
from typing import Sequence

import cirq

# Note: cirq.GateModels, DecomposerFunc and cirq.SupportedCostModels are names
# proposed in this sketch; they are not existing Cirq APIs.

class SupportedCostModels(Enum):
    T_COUNT: int = 0
    TOFFOLI_COUNT: int = 1
    # ... etc

class SomeGate:
    class Models(cirq.GateModels):
        DEFAULT: int = 0
        ARXIV_123: int = 1
        ARXIV_456: int = 2

        @classmethod
        def select(cls, model=None, cost_model=None) -> DecomposerFunc[[Sequence[cirq.Qid], cirq.QubitManager], cirq.OP_TREE]:
            """Returns a decomposer matching either the requested gate model or one that optimizes the given cost_model."""

    def __init__(self, model_func=None):
        # The decomposition model is supplied in the constructor (may be None).
        self.model_func = model_func

    def _decompose_with_options_(self, qubits, qubit_manager: cirq.QubitManager, cost_model: cirq.SupportedCostModels = cirq.SupportedCostModels.T_COUNT, from_model: bool = True):
        if from_model and self.model_func is not None:
            # A supplied model is preferred.
            yield from self.model_func(qubits, qubit_manager)
            return
        # Otherwise yield from the model that optimizes the given cost target.
        yield from self.Models.select(cost_model=cost_model)(qubits, qubit_manager)
|
Largely agree with @NoureldinYosri above. "Why we decompose", is something I still have a hard time with after two years on the project, and I know end users ask about this too. I think Cirq team needs to take the opportunity to step back and come up with a holistic answer there. One extra note, should GateSets perhaps be involved? I'd think you want to decompose to a gate set somehow. Or maybe decomposition should be external to the gate itself, since there are always multiple ways to decompose a gate. I don't have an answer, but I think there needs to be something more unified rather than yet another monkey patch regarding decomposition. |
This is why I'm introducing these ways of instantiating a gate
So, for example, if a decomposition targeting a specific gateset was introduced in a paper, say ARXIV_XXX, then either we have it and the user can simply create an instance as
|
I have explored this in the past and I think this would be a huge breaking change for Cirq IMO. Qiskit supports doing decompositions external to the gate itself by maintaining an EquivalenceLibrary, but the reason it works is that they have a very restricted set of types that can be "parameters" to a gate/operation. And for each For us, a gate can be any class that can accept any arbitrary parameters (including other parameterized gates; eg: for ControlledGate). And, decomposition of gate can depend upon arbitrary conditions on these parameters. Therefore, capturing this complexity externally for all gates and for all ranges of parameters for those gates would mean that we expose an API like the Note that even if we make such an The idea of the decompose protocol is to provide a way to define the action of a composite gate in terms of other "simpler" gates, by potentially allocating new ancilla qubits as part of defining this action. For the purpose of this issue, I don't think we need to generalize this even more to the point where we can pick an "optimal" decomposition depending on the use case -- this is something that can be achieved by passing an I propose that we open a separate issue to brainstorm the generalization of decompose protocol and discuss "How to pick an optimal decomposition for a gate". For the sake of this issue, I'd like to keep the discussion focussed on "How can I define the action of a composite gate in terms of other simpler / known gates; by potentially using additional ancilla qubits?". What do you guys think? @NoureldinYosri @daxfohl |
@tanujkhattar
Thus as @daxfohl mentioned we probably should use this as an opportunity to rethink the decompose protocol to make our lives easier for the goals dependent on it:
|
I don't have any additional thoughts on decompose, I think you two have covered it well. @tanujkhattar I just looked at your proposed solution and it does bring up an interesting question. For the unitary protocol, there's really no way to identify which qubits are ancillas in order to remove them afterward. But they probably should be extracted. It's straightforward to do the linalg if you know which axes to extract. But identifying those axes, no idea. I also don't know the theory, whether gates with ancillas are guaranteed to be unitary in the non-ancilla qubits. Seems like they should be, but if that's the case then why are they necessary? |
See my sample implementation in #6101. In general, for operations we know which qubits they act on before calling |
It works, but I can't help but feel this is something that should be done at a lower level. Otherwise there's going to be logic removing ancillas scattered around lots of different places in the codebase. It may be hard to change the low level code in a backwards compatible way though. Perhaps defer to 2.0 and introduce some breaking changes that allow doing things more cleanly. I think there's a reasonable argument to be made that strict adherence to backwards compatibility has slowed down dev velocity and made the user experience suboptimal. Planning some breaking changes for a 2.0 cleanup release may be a good kickstart to the product. |
Can you give an example of what do you mean by "lower level" ? |
Borrowing qubits is entirely fine in the ideal world of unitary operations. On NISQ devices all operations are actually noisy quantum channels. For example, CNOT12, Z2, CNOT12, Z2 implements the Z1 gate and restores the second qubit to whatever state it was in before it was borrowed for the operation. However, when we replace the four gates with their noisy equivalents this is no longer true. That said, this feature can be useful to make it easier and more automated to make effective use of a quantum processor. I'd just keep in mind that this is an idealization. Therefore, the user should be made aware of the decisions made by memory management and should be able to affect them and override them. IOW, memory management for NISQ devices should be very much unlike the memory management on a classical computer which hides details from the programmer. For example, there is a trade-off between borrowing otherwise inactive qubits and applying dynamical decoupling to them. This should be something the programmer gets to decide.
See this comment for some ideas on how to test this. There I assumed we may want to keep decompositions that don't return auxiliary qubits to the initial state, but I agree with you that restricting to just those that do (and hence applying the tests proposed in the linked comment to all |
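The borrowing identity quoted above (CNOT12, Z2, CNOT12, Z2 acting as Z1 for any state of the second qubit) can be verified directly in the ideal, noiseless case. The following is a small plain-Python check with the qubit order (q1, q2); the helper functions are written for this illustration only, and the noisy-channel case is not modeled.

```python
from typing import List

Matrix = List[List[int]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of two nested-list matrices."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def kron(a: Matrix, b: Matrix) -> Matrix:
    """Kronecker product; the first factor acts on q1, the second on q2."""
    return [
        [a[i][j] * b[k][l] for j in range(len(a[0])) for l in range(len(b[0]))]
        for i in range(len(a))
        for k in range(len(b))
    ]


I2 = [[1, 0], [0, 1]]
Z = [[1, 0], [0, -1]]
CNOT12 = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]  # control q1, target q2

Z2 = kron(I2, Z)  # Z on the borrowed qubit q2
Z1 = kron(Z, I2)  # Z on q1, identity on q2

# Gates are applied left to right in the circuit, so the unitary is the
# product in reverse order: Z2 * CNOT12 * Z2 * CNOT12.
borrowed_z1 = matmul(Z2, matmul(CNOT12, matmul(Z2, CNOT12)))
```

Since `borrowed_z1` equals `Z1` as a full 4x4 matrix, the second qubit is untouched whatever state it was in, which is what makes it "borrowable" rather than requiring a clean ancilla.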
@tanujkhattar I don't have an example of "lower level", just thinking that the ancilla handling needs to be somewhere such that you don't have to explicitly deal with it in all the various protocols that work on operations and gates. But it shouldn't be hidden away completely; there are situations where you might need to explicitly handle the ancillas. Judging by @viathor's comment, those situations may be more common than not in actual quantum research. Anyway that's all I was trying to convey. |
I'm really interested in helping with the QubitManager interface if it still needs someone to work on it. Questions:
|
hey @shef4 the qubit manager has a couple of tasks remaining for it
the first task is easiest and doesn't require quantum knowledge. the second and the third also don't require quantum knowledge but they require knowledge of graphs and algorithms. |
Thanks for explaining! I'll start working on the first task so I get a quick start. the online qubit management sounds very cool though. From what I understood it reminds me of concurrent resource allocation but instead of threads & processing time, it's qubit memory & the duration of allocation.
if so I might look into trying a topological sort on the current free qubits beforehand so the manager doesn't have to wait when it asks for a free qubit. The main issue I see is that it would need to be re-sorted when 1 or more qubits are deallocated. This might be O(N*log(M)). I'll start researching other graph algorithms that might be useful/similar to the main problem. |
@shef4 no, deadlocks are not applicable here since there is no interdependency between qubits and allocating qubits happens sequentially. there are multiple valid mappings to graph problems. the problem can be stated as interval coloring (the qubits are colors), clique cover (qubits are assigned per clique), and even as a scheduling problem. I don't really understand your mapping so I can't judge. |
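As one illustration of the interval-coloring view, the sketch below treats each allocation as a half-open lifetime `(alloc_step, free_step)` and assigns qubit indices greedily in order of start time, which is the standard greedy algorithm for coloring intervals. The function name and the input format are assumptions made for this example; this is not an existing or proposed Cirq qubit manager.

```python
import heapq
from typing import List, Tuple


def assign_qubits(lifetimes: List[Tuple[int, int]]) -> List[int]:
    """Assigns a qubit index to each request.

    lifetimes[i] = (alloc_step, free_step), half-open: the qubit is busy on
    steps alloc_step .. free_step - 1. Requests whose lifetimes overlap get
    different indices.
    """
    order = sorted(range(len(lifetimes)), key=lambda i: lifetimes[i])
    free_ids: List[int] = []          # min-heap of qubit ids ready for reuse
    busy: List[Tuple[int, int]] = []  # min-heap of (free_step, qubit id)
    assignment = [-1] * len(lifetimes)
    next_id = 0
    for i in order:
        start, end = lifetimes[i]
        # Release every qubit whose lifetime ended at or before this start.
        while busy and busy[0][0] <= start:
            _, qid = heapq.heappop(busy)
            heapq.heappush(free_ids, qid)
        if free_ids:
            qid = heapq.heappop(free_ids)
        else:
            qid = next_id
            next_id += 1
        assignment[i] = qid
        heapq.heappush(busy, (end, qid))
    return assignment


requests = [(0, 3), (1, 2), (2, 5), (3, 4)]
assignment = assign_qubits(requests)
num_qubits = max(assignment) + 1
```

For these four requests the greedy pass needs only two qubits, matching the largest number of lifetimes that overlap at any step, where always allocating a fresh qubit would use four.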
Thanks for the clarification, I see what you're saying and the problem statements are super helpful |
Can I take on the second task? |
@ishmum123 you are free to propose a new qubit manager and whether it gets merged depends on its performance and efficiency. items 2 and 3 are less tasks and more goals that we want to achieve. |
We have added an interface for This issue is now complete and can be closed. |
RFC
Background
In Cirq, we apply gates on qubits to get operations, which can then be inserted into a container (Circuit) to construct a (implicit) compute graph. This pattern assumes that it’s straightforward for the users to identify the qubits on which a gate should act to yield operations.
However, as we gradually move beyond the NISQ era, this assumption faces a number of challenges listed below:
AND gate as in https://arxiv.org/abs/1805.03662), it often becomes very tedious for users to correctly keep track of all the ancillas needed in the system.
The goal of this design discussion is to propose a mechanism to enable Cirq users to overcome the above-mentioned challenges and more easily construct large circuits.
Key Observations
Qubit allocations can introduce new implicit dependencies between operations
Borrowing qubits cannot be supported during circuit construction
Cirq fails if a gate decomposes into an OP_TREE with operations on newly allocated ancilla qubits.
_num_qubits_.
_decompose_ protocol to achieve the desired effect without exposing all the gory details.
Design Details
Using the key observations above, the proposed high level design is as follows:
Circuit construction
- qalloc, qborrow and qfree methods which delegate calls to a QubitManager. The choice of which qubit manager to delegate to can be controlled by a context manager.
- The default prescription would be to use a SimpleQubitManager + post-processing transformers, but the framework is flexible enough to directly support other heuristic qubit allocation strategies at the time of circuit construction itself.
- SimpleQubitManager that always allocates a new qubit for each qalloc / qborrow call. This ensures that no implicit data dependencies are introduced at the time of circuit construction due to qubit allocation. In other words, the simple qubit manager uses an allocation strategy that maximizes qubit width and minimizes circuit depth.
- In order to be able to easily identify which qubits were allocated by the SimpleQubitManager, we introduce two new internal qubit types - CleanQubit and BorrowableQubit.
- CleanQubit / BorrowableQubits, and perform a smarter qubit allocation. At this step, different post-processing transformers can use different qubit width / circuit depth tradeoff strategies and also inspect the compute graph to automatically support borrowing qubits.
Circuit Simulations
Suppose we construct a circuit that contains an operation defined using its _decompose_ protocol and that uses additional ancillas invoked using qalloc / qfree. The simulators would raise an error right now if we try to simulate such an operation because memory for a simulation is allocated in advance by inspecting cirq.num_qubits(op), and as a consequence the simulators expect an operation to decompose into an OP_TREE acting only on a subset of op.qubits.
To mitigate this issue, one easy option is to recursively decompose the input circuit till we get to a point where the resulting circuit is a "trivial" circuit, i.e. has no more implicit allocations / deallocations inside the decompose protocol of operations. We can introduce additional protocols to keep track of which operation is "trivial" or not so we only decompose the operations that have a hidden allocation.
Another option that comes to mind, though I'm not fully sure we can implement it efficiently in the existing Cirq simulator infrastructure, is to come up with a way of computing the effect of a "non-trivial" operation (one that has hidden allocations) only on the qubits that it expects as input (for example, by computing a reduced density matrix). This should be possible because there's an implicit promise that the allocated ancillas should be cleaned up when the user calls qfree(q) and therefore an outer simulator should not have to care about them. cc @daxfohl I'm curious to hear your thoughts on this.
Example code snippets
Simple Qubit Manager with post-process transformer
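The original snippet for this heading did not survive extraction. As a stand-in, here is a minimal plain-Python sketch of the idea described in the design details: a manager that always hands out fresh placeholder qubits, plus the first step a post-processing pass would need, namely reading each placeholder's lifetime off the operation list. The class and function names (`ToySimpleQubitManager`, `placeholder_lifetimes`) and the tuple-based "operations" are assumptions for illustration, not Cirq's implementation.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple, Union


@dataclass(frozen=True)
class CleanQubit:
    """Placeholder for an ancilla that starts (and must end) in |0>."""
    id: int


@dataclass(frozen=True)
class BorrowableQubit:
    """Placeholder for an ancilla that may start in an arbitrary state."""
    id: int


Placeholder = Union[CleanQubit, BorrowableQubit]


class ToySimpleQubitManager:
    """Always allocates a brand-new placeholder; freed qubits are never reused."""

    def __init__(self) -> None:
        self._clean_id = 0
        self._borrow_id = 0

    def qalloc(self, n: int) -> List[CleanQubit]:
        out = [CleanQubit(self._clean_id + i) for i in range(n)]
        self._clean_id += n
        return out

    def qborrow(self, n: int) -> List[BorrowableQubit]:
        out = [BorrowableQubit(self._borrow_id + i) for i in range(n)]
        self._borrow_id += n
        return out

    def qfree(self, qubits: Sequence[Placeholder]) -> None:
        # Nothing to do: reuse is deferred to a post-processing pass.
        pass


def placeholder_lifetimes(ops: Sequence[Tuple[object, ...]]) -> Dict[Placeholder, Tuple[int, int]]:
    """Maps each placeholder to (first step used, last step used + 1)."""
    lifetimes: Dict[Placeholder, Tuple[int, int]] = {}
    for step, op in enumerate(ops):
        for q in op:
            if isinstance(q, (CleanQubit, BorrowableQubit)):
                first, _ = lifetimes.get(q, (step, step))
                lifetimes[q] = (first, step + 1)
    return lifetimes


qm = ToySimpleQubitManager()
(a,) = qm.qalloc(1)
(b,) = qm.qalloc(1)
# Each tuple lists the qubits one operation acts on, in circuit order.
ops = [("q0", a), (a, "q1"), ("q1", b), (b,)]
lifetimes = placeholder_lifetimes(ops)
```

Because `a` and `b` have disjoint lifetimes here, a later pass is free to map both placeholders onto the same system qubit, trading the extra width for an added dependency between the operations that use them.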
Greedy Qubit Manager
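The original snippet for this heading is also missing. The sketch below is a toy, plain-Python model of a greedy manager that reuses freed qubits at allocation time, with a `maximize_reuse` flag standing in for the "configurable parameter" mentioned in the subtask list. The reuse policy shown (most recently freed versus least recently freed) is an assumption chosen for this illustration and should not be read as the behavior of any specific Cirq class.

```python
from collections import deque
from typing import Deque, List, Sequence, Set


class ToyGreedyQubitManager:
    """Reuses freed qubits before creating new ones (hypothetical sketch)."""

    def __init__(self, prefix: str, maximize_reuse: bool = False) -> None:
        self.prefix = prefix
        self.maximize_reuse = maximize_reuse
        self._free: Deque[str] = deque()
        self._in_use: Set[str] = set()
        self.size = 0  # total number of distinct qubits ever created

    def qalloc(self, n: int) -> List[str]:
        out = []
        for _ in range(n):
            if self._free:
                # maximize_reuse=True: take the most recently freed qubit (LIFO),
                # concentrating work on few qubits.
                # maximize_reuse=False: take the least recently freed one (FIFO),
                # spreading work across the pool.
                q = self._free.pop() if self.maximize_reuse else self._free.popleft()
            else:
                q = f"{self.prefix}_{self.size}"
                self.size += 1
            self._in_use.add(q)
            out.append(q)
        return out

    def qfree(self, qubits: Sequence[str]) -> None:
        for q in qubits:
            if q not in self._in_use:
                raise ValueError(f"{q} was not allocated by this manager")
            self._in_use.remove(q)
            self._free.append(q)


def third_allocation(maximize_reuse: bool) -> str:
    """Allocates two qubits, frees them in order, then allocates one more."""
    qm = ToyGreedyQubitManager("anc", maximize_reuse=maximize_reuse)
    first, second = qm.qalloc(2)
    qm.qfree([first])
    qm.qfree([second])
    return qm.qalloc(1)[0]


lifo_choice = third_allocation(maximize_reuse=True)
fifo_choice = third_allocation(maximize_reuse=False)
```

Either way the manager never grows past two qubits in this scenario; the flag only changes which freed qubit is handed back, and therefore which earlier operations the new one is forced to wait for.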
Conclusion
I'd love to hear thoughts of other maintainers and contributors, specifically @daxfohl @dstrain115 @dabacon @maffoo. Once there is consensus, I'll create smaller subtasks which can be taken up by 20%ers and other community contributors.
@NoureldinYosri and @mpharrigan helped with this exploration and have given a round of feedback offline.
Update - 3rd April 2023
See the comments for more discussions on maintaining a global state for qalloc / qborrow vs passing around the qubit_manager as part of the decompose protocol to cirq.Gate and cirq.Operation. Right now, we have a working prototype without maintaining a global state. Here, we list down all the subtasks for implementing this roadmap item, many of which can be worked upon in parallel.
- QubitManager interface and a SimpleQubitManager implementation, which always allocates new CleanQubits / BorrowableQubits for each qalloc / qborrow request.
- _decompose_with_qubit_manager_ protocol for both cirq.Gate and cirq.Operation.
- cirq.decompose and its variants to first try _decompose_with_qubit_manager_ and fall back on _decompose_.
- cirq.unitary(val) to compute a reduced unitary when val's decomposition can allocate new qubits (#6101)
- GreedyQubitManager that maximizes/minimizes qubit reuse based on a configurable parameter.
- CleanQubit / BorrowableQubit in a circuit to system qubits using a qubit manager.
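The "try the new protocol first, then fall back" subtask can be sketched as a small dispatch function. This is a plain-Python illustration under stated assumptions: the method name `_decompose_with_qubit_manager_` is taken from the list above, but its signature here, the `StubQubitManager`, and the tuple-based operations are hypothetical, and the protocol as implemented in Cirq may differ.

```python
from typing import Any, List, Sequence, Tuple


class StubQubitManager:
    """Hypothetical minimal manager: fresh string labels, no reuse."""

    def __init__(self) -> None:
        self._count = 0

    def qalloc(self, n: int) -> List[str]:
        out = [f"anc_{self._count + i}" for i in range(n)]
        self._count += n
        return out

    def qfree(self, qubits: Sequence[str]) -> None:
        pass


def decompose_once(val: Any, qubit_manager: StubQubitManager) -> List[Tuple[str, ...]]:
    """Prefers the manager-aware method and falls back on plain _decompose_."""
    candidates = (
        ("_decompose_with_qubit_manager_", (qubit_manager,)),
        ("_decompose_", ()),
    )
    for name, args in candidates:
        method = getattr(val, name, None)
        if method is None:
            continue
        result = method(*args)
        # None / NotImplemented mean "no decomposition from this method".
        if result is not None and result is not NotImplemented:
            return list(result)
    raise TypeError(f"{val!r} has no decomposition")


class AncillaOp:
    """Decomposes using one ancilla obtained from the manager."""

    def _decompose_with_qubit_manager_(self, qubit_manager: StubQubitManager):
        (anc,) = qubit_manager.qalloc(1)
        ops = [("X", anc), ("CX", anc, "q0"), ("X", anc)]
        qubit_manager.qfree([anc])
        return ops


class LegacyOp:
    """Only implements the existing protocol and never allocates."""

    def _decompose_(self):
        return [("H", "q0")]


manager = StubQubitManager()
new_ops = decompose_once(AncillaOp(), manager)
legacy_ops = decompose_once(LegacyOp(), manager)
```

Passing the manager as an argument, rather than looking it up globally, is what lets existing definite-qubit gates such as `LegacyOp` keep working unchanged while ancilla-using gates opt in.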