Tensor Simulation protocol 2: keep indices factorized #1070
Conversation
This is ready for review. I split it into two commits -- one is the changes to the protocol; the other is moving all the bloqs over to the new protocol.
@tanujkhattar can you take a look when you have a chance
Some high-level comments about the approach:
- Have you contrasted this with the strategy where we try to flatten till the point we hit the first node that has its `my_tensors` defined?
- One use case for defining `my_tensors` on bloqs that also have a decomposition is verification of the decomposition.
Directly implementing the decomposition is probably a higher bar. The way I imagine the workflow is that for many bloqs, people would first implement `my_tensors`.
Left a round of comments and questions.
@@ -125,28 +121,6 @@ def dtype(self):
    def signature(self):
        return Signature([Register("a", self.a_dtype), Register("b", self.b_dtype)])

    def add_my_tensors(
Note: Following my high-level comment, I think there's value in having `my_tensors` implemented on bloqs that have a decomposition defined, especially for verification of the decomposition. It should be easy (or at least possible) for a user to get a "correct" expected matrix for an addition circuit and verify that the decomposition is correct -- for example, if we were to go and change the decomposition tomorrow and needed to verify that the decomposition is still correct.
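The kind of verification test being described might look like the following sketch. The `Add` import path and constructor are assumptions for illustration, and it assumes `tensor_contract()` uses the bloq's own `my_tensors` as the reference (the behavior under discussion):

```python
import numpy as np
from qualtran import QUInt
from qualtran.bloqs.arithmetic import Add  # illustrative import path

def test_add_decomposition_is_correct():
    bloq = Add(QUInt(3))
    # Reference unitary from the bloq's explicitly-specified tensors.
    expected = bloq.tensor_contract()
    # Unitary obtained by contracting the decomposition's tensor network.
    actual = bloq.decompose_bloq().tensor_contract()
    # If someone changes the decomposition tomorrow, this catches regressions.
    np.testing.assert_allclose(actual, expected, atol=1e-8)
```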
wouldn't that be better in a unit test?
Maybe, but I don't see a need to remove it from bloqs that already have it defined.
Have you benchmarked what happens if you specify the tensors explicitly in `my_tensors` and stop decomposing once a bloq has its tensors defined, vs. if you flatten out everything? Now that we have an index for every qubit (unlike the situation before, where we had larger indices for soquets and a lot of splits and joins happening), I don't see why flattening out to the lowest level should be faster than stopping at a point where you already have the tensor data available.
If it's not a significant speedup in performance, I think keeping consistency with the classical simulation workflow would be very nice.
The example of simulating the addition of two 32-bit integers wouldn't work.
I agree that with most other protocols we keep things as high-level as possible. It's always better to let Python handle the addition of two numbers than to emulate the CPU ourselves. The tensor simulation protocol is fundamentally different due to its inherent exponential scaling: a single explicit tensor for a 64-qubit adder would have 2^128 entries.
return [
    qtn.Tensor(data=eye, inds=[(outgoing, i), (incoming[i], 0)], tags=[str(self)])
    for i in range(self.dtype.num_qubits)
]
When implementing the `my_tensors` method, a pattern like this looks super confusing. The indices now have two types of indexing going on -- indexing into a numpy array of `self.reg.shape` to get the corresponding `Connection` object, and then an implicit indexing into an implicit array of size `self.bitsize` by specifying a tuple `(cxn, i)`.
I think this calls for adding a new type (equivalent to a qubit) so that the `ConnectionT` for a register `reg` would now be a numpy array of shape `reg.shape + (reg.bitsize,)`. This has the following advantages:
- It gets rid of implicit vs. explicit indexing. The implicit indexing is confusing at best.
- The concept aligns with the concept of a qubit -- we just say that each entry in the incoming and outgoing dictionary corresponds to a qubit.
- This kind of "full shape" is already used as a concept in the cirq-style API of `decompose_from_registers`, so it would unify the tensor contraction API nicely with that API.
Another alternative would be to keep `ConnectionT` a numpy array of shape `reg.shape` but override the `__getitem__` method on the `Connection` object so `cxn[i]` returns a new object which is essentially a dataclass / tuple storing `(cxn, i)` and can be used as a tensor index (like `TensorIndex(cxn, i)`); see the sketch below.
In general, it would be much nicer if the call sites looked like `inds=[outgoing[i], incoming[i][0]]` instead of what it is right now.
What do you think?
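A minimal sketch of that second alternative, assuming a hypothetical `TensorIndex` type and a `__getitem__` override on `Connection` (neither exists in Qualtran today):

```python
from dataclasses import dataclass
from typing import Any

@dataclass(frozen=True)
class TensorIndex:
    """Hypothetical: one qubit's worth of a Connection, usable as a tensor index.

    Frozen dataclasses are hashable, like the `(cxn, i)` tuples they'd replace.
    """
    cxn: Any  # the Connection object
    bit: int  # which qubit of the connection's register

class Connection:
    # ... existing fields elided ...

    def __getitem__(self, bit: int) -> 'TensorIndex':
        # `cxn[i]` names the i-th qubit of this connection's register, so
        # call sites read `inds=[outgoing[i], incoming[i][0]]`.
        return TensorIndex(self, bit)
```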
There are three levels of indexing: by register name, optionally within a "shaped" register, and then the sub-register bit-indexing.
I don't think it's worth adding an extra dimension to all the arrays where the last column is just the numbers 0..n.
`(incoming['ctrl'], 0)` is not meaningfully more confusing than `incoming['ctrl'][0]`, where indeed the latter syntax just maps to `TensorIndex(incoming['ctrl'], 0)` anyways.
This could be an interesting thing to explore. It would be an easy change by passing a predicate to the flattening call, with the following considerations:
For classical-action-style bloqs, I agree. It's one of my priorities to implement the tensors-from-classical-action helper. There are some bloqs that are naturally "tensor first", like leaf bloqs or maybe state prep. Those you can write tensors for. For bloqs higher up in the call graph, I'm not sure there are many cases where it's more natural to express the tensor than the decomposition.
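For reference, the core of such a tensors-from-classical-action helper is small. A sketch, assuming the bloq's classical action is a reversible (bijective) function on n-bit integers; the helper name is hypothetical:

```python
import numpy as np

def unitary_from_classical_action(f, n_bits: int) -> np.ndarray:
    """Hypothetical helper: the permutation unitary of a reversible
    classical function `f` on n-bit integers."""
    dim = 2**n_bits
    u = np.zeros((dim, dim))
    for x in range(dim):
        u[f(x), x] = 1.0  # basis state |x> maps to |f(x)>
    return u

# Example: a 3-bit increment (mod 8) is a bijection, so `u` is unitary.
u = unitary_from_classical_action(lambda x: (x + 1) % 8, n_bits=3)
assert np.allclose(u @ u.T, np.eye(8))
```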
From a user's perspective, if the default behavior we choose is to decompose only till the point where a bloq has its tensors defined, then a user can do:
> `bloq.tensor_contract()`: use the tensor data at the highest level possible
> `bloq.as_composite_bloq().flatten().tensor_contract()`: use the tensor data at the lowest level possible
Both workflows are well supported with a single line of code. If we choose the default to go as deep as possible, the bloq authors need to:
With these considerations in mind, I'd suggest it's better to change the default behavior and support both workflows.
A good unit test has the computed quantity validated against a reference value that we're pretty sure is correct. Once again, this means that we'd need non-leaf bloqs where the tensor data is somehow a natural expression of the bloq. This is the case for the classical-action bloqs, but really -- the test should just use the classical action tests! As part of this PR, I went through every bloq that had defined tensors. There were two cases where the bloq had both tensors and a decomposition (excepting some that will be leaf bloqs, #873).
There are no tests of the form …
Wouldn't the …
Following my comment above, it's better to support two things than one. Please consider making this change.
d850862 restores the flattening logic with a boolean flag. By default, we'll flatten as far as possible. This can be changed if there's ever an example where this is not desired.
if any(reg.side != Side.THRU for reg in signature):
    raise ValueError(
        f"CirqGateAsBloq.my_tensors is only supported for "
        f"gates with thru registers. Found {gate}."
    )
This restriction was not present in the original `_add_my_tensors_from_gate` method, so it would be nice to not add it as part of this large-scale refactoring. The `CirqGateAsBloq`-specific errors should be raised in the implementation of `CirqGateAsBloq`, not in the implementation of `_my_tensors_from_gate`.
The original implementation works by first assuming that all registers are THRU registers and then extracting tensor data from the correct subspace whenever a register is not a THRU register. This is done by `tensor_data_from_unitary_and_signature` -- the method is still implemented but not used anymore here.
Can you please revert the restricting changes to this method and make sure the more general behavior is maintained? This method is called not just from `CirqGateAsBloq` but also from `GateWithRegisters`, which may have directional registers.
this should really have a test if it's behavior we're relying on. I'll revert the changes, though
LGTM % one final comment about reverting the behavior of `my_tensors_from_gate` to support directional registers as was done previously.
Fixes #793
The problem
In Qualtran, we try to keep everything at as high a level of abstraction as possible and recurse "one layer at a time" as necessary. This is better for most of the protocols (resource counting, classical simulation, diagrams). When doing a full quantum/tensor simulation, we need everything to be as factored as possible for optimal performance. This is because the tensor contraction engine can find good ways of contracting the small tensors that can potentially avoid making any huge intermediate tensors.
The problem is with splits and joins. When we flatten a high-level circuit into its constituent parts, there are a large number of these bookkeeping operations. The cost protocols and to-cirq protocols can ignore them or handle them elegantly. But the tensor simulation engine uses a third-party package, quimb, which would require special support for these fancy identity operations. Currently (prior to this PR), we mapped each soquet to a quimb index. This makes a lot of conceptual sense, and indeed quimb can support indices of any size (i.e. bitsize > 1 wires). But something like a `Split` bloq would have to have one index of size `2**n`, which would completely block any intelligent tensor contraction ordering whenever one was encountered.
Rejected (for now) solution
One solution would have been to flatten everything into a circuit and run some sort of graph-rewriting algorithm to fuse or otherwise minimize split/join pairs. This might be desirable for other reasons, but would have been complicated to get working right.
The solution in this PR
The solution in this PR is to take a NISQ-style, per-qubit, factorized-indices description of our tensor networks. Each tensor network index used to be a `Soquet`. Now it is a `(Connection, int)` pair, where the integer `j` says which qubit of the `reg.dtype.num_qubits`-sized register this ndarray dimension is. I've taken the liberty of identifying quimb indices with `Connection` instead of `Soquet`, which fixes a bit of a conceptual mistake from the original implementation.
The new `Split` tensor method should be elucidating:
Where it was once one `Tensor` with 1 large index coming in and n one-bit indices going out, it is now n 1-bit identity matrices. If your `Bloq` only has `QBit()` inputs and outputs, the extra index will always be `0`, e.g. `(cxn, 0)`.
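For concreteness, a simplified sketch of what the per-qubit `Split` tensors look like under the new scheme (following the pattern quoted earlier in the review; exact details in the PR may differ):

```python
import numpy as np
import quimb.tensor as qtn

def my_tensors(self, incoming, outgoing):
    # `incoming['reg']` is the single Connection carrying the n-bit register;
    # `outgoing['reg'][i]` is the Connection for the i-th 1-bit output wire.
    # Qubit i of the input connects to output wire i via a 2x2 identity.
    return [
        qtn.Tensor(
            data=np.eye(2),
            inds=[(outgoing['reg'][i], 0), (incoming['reg'], i)],
            tags=[str(self)],
        )
        for i in range(self.dtype.num_qubits)
    ]
```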
Flattening
To get good tensor performance, you ought to flatten your circuit. `Bloq.tensor_contract()` will now flatten the circuit, turn it into a tensor network, and contract it. It used to contract each subbloq recursively, which is not performant. There used to be a function called `flatten_for_tensor_contraction` that would use the correct predicate to flatten to the correct level; no one really knew about it or used it. After #1061, the behavior is to flatten as much as possible. All the bloqs in the flattened representation must have explicit tensors.
Notable consequence: if your bloq has both a decomposition and a `my_tensors` method, the `my_tensors` method will not be used by default. You'd have to use the `qualtran.simulation.tensor` functions yourself if you don't want to flatten all the way before embarking on tensor contraction.
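For example, to use subbloq tensors at one level of decomposition instead of flattening, something like the following sketch should work (the exact signatures of the `qualtran.simulation.tensor` functions are assumptions here):

```python
from qualtran.simulation.tensor import cbloq_to_quimb

def contract_one_level(bloq):
    # Decompose one level and build a quimb network from each subbloq's
    # `my_tensors`, rather than flattening all the way down.
    cbloq = bloq.decompose_bloq()
    tn = cbloq_to_quimb(cbloq)
    return tn.contract()
```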
The general advice is to not implement `my_tensors` and rather define a decomposition if you want to simulate your bloq. Leaf bloqs need to implement `my_tensors`, but the rate at which they get added to the library is likely slow.
Done and todo
- Rename `cbloq_to_quimb_2` back to `cbloq_to_quimb`