
Tensor Simulation protocol 2: keep indices factorized #1070

Merged
merged 15 commits into quantumlib:main on Jul 2, 2024

Conversation

@mpharrigan (Collaborator) commented Jun 13, 2024

Fixes #793

The problem

In Qualtran, we try to keep everything at as high a level of abstraction as possible and recurse "one layer at a time" only as necessary. This is better for most of the protocols (resource counting, classical simulation, diagrams). For a full quantum/tensor simulation, however, we need everything to be as factored as possible for good performance: the tensor contraction engine can then find good ways of contracting the small tensors that avoid materializing any huge intermediate tensors.

The problem is with splits and joins. When we flatten a high-level circuit into its constituent parts, a large number of these bookkeeping operations appear. The cost and to-cirq protocols can ignore them or handle them elegantly, but the tensor simulation engine uses a third-party package, quimb, which would require special support for these fancy identity operations. Prior to this PR, we mapped each soquet to a quimb index. This makes a lot of conceptual sense, and indeed quimb can support indices of any size (i.e. bitsize > 1 wires). But a Split bloq would then have one index of size 2**n, which would completely block any intelligent tensor contraction ordering wherever one was encountered.
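To make the size argument concrete, here is a minimal quimb sketch (not from the PR; the string index names like 'in0' are illustrative, since Qualtran uses richer index objects) contrasting a Split-like tensor with one fused 2**n-sized index against n independent 2x2 identities:

    import numpy as np
    import quimb.tensor as qtn

    n = 8
    # Old style: one tensor with a single fused index of size 2**n. Any
    # contraction path through it must touch a 2**n-sized dimension.
    fused = qtn.Tensor(
        data=np.eye(2**n).reshape((2**n,) + (2,) * n),
        inds=['in'] + [f'out{i}' for i in range(n)],
    )

    # New style: n small identity tensors, one per qubit, which the
    # contraction-path optimizer can route around freely.
    factored = [
        qtn.Tensor(data=np.eye(2), inds=[f'in{i}', f'out{i}'])
        for i in range(n)
    ]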

Rejected (for now) solution

One solution would have been to flatten everything into a circuit and run some sort of graph-rewriting algorithm to fuse or otherwise minimize split/join pairs. This might be desirable for other reasons, but it would have been complicated to get working right.

The solution in this PR

The solution in this PR is to adopt a NISQ-style, per-qubit, factorized-indices description of our tensor networks. Each tensor network index used to be a Soquet. Now it is a (Connection, int) pair, where the integer j says which qubit of the reg.dtype.num_qubits-sized register this ndarray dimension corresponds to.

I've taken the liberty of identifying quimb indices with Connection instead of Soquet, which fixes a bit of a conceptual mistake from the original implementation.

The new Split tensor method should be elucidating:

        eye = np.eye(2)
        incoming = incoming['reg']
        outgoing = cast(NDArray, outgoing['reg'])
        # One 2x2 identity per qubit: bit i of the single incoming register
        # index is wired to bit 0 of the i-th one-qubit outgoing index.
        return [
            qtn.Tensor(data=eye, inds=[(outgoing[i], 0), (incoming, i)], tags=[str(self)])
            for i in range(self.dtype.num_qubits)
        ]

Where it was once one Tensor with one large index coming in and n one-bit indices going out, it is now n one-qubit identity matrices.

If your Bloq only has QBit() inputs and outputs, the extra index will always be 0, e.g.

        return [
            qtn.Tensor(
                # A single-qubit bloq: the bit index in each (connection, bit)
                # pair is always 0.
                data=_PAULIX, inds=[(outgoing['q'], 0), (incoming['q'], 0)], tags=[str(self)]
            )
        ]

Flattening

To get good tensor performance, you ought to flatten your circuit. Bloq.tensor_contract() will now flatten the circuit, turn it into a tensor network, and contract it.

It used to contract each subbloq recursively, which is not performant. There used to be a function called flatten_for_tensor_contraction that would use the correct predicate to flatten to the correct level, but no one really knew about it or used it. After #1061, the behavior is to flatten as much as possible. All the bloqs in the flattened representation must have explicit tensors.

Notable consequence: If your bloq has both a decomposition and a my_tensors method, the my_tensors method will not be used by default. You'd have to use the qualtran.simulation.tensor functions yourself if you don't want to flatten all the way before embarking on tensor contraction.
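As a rough sketch of that manual route (the helper names cbloq_to_quimb and quimb_to_dense come up later in this thread; the exact signatures here are my assumption, not verified against the PR):

    from qualtran.simulation.tensor import cbloq_to_quimb, quimb_to_dense

    # Contract using each subbloq's own tensors, without full flattening
    # (signatures assumed for illustration).
    cbloq = bloq.as_composite_bloq()
    tn = cbloq_to_quimb(cbloq)
    dense = quimb_to_dense(tn, cbloq.signature)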

The general advice is to not implement my_tensors and instead define a decomposition if you want to simulate your bloq. Leaf bloqs need to implement my_tensors, but new leaf bloqs are likely to be added to the library only slowly.

Done and todo

mpharrigan force-pushed the 2024-06/simmy branch 2 times, most recently from 0bf245a to df07276, on June 19, 2024
mpharrigan marked this pull request as ready for review on June 19, 2024
@mpharrigan (Collaborator, Author)

This is ready for review. I split it into two commits -- one is changes to the protocol; the other is moving all the bloqs to the new protocol.

tanujkhattar self-assigned this on Jun 24, 2024
@mpharrigan (Collaborator, Author)

@tanujkhattar can you take a look when you have a chance?

@tanujkhattar (Collaborator) commented Jun 26, 2024

Some high level comments about the approach

> Notable consequence: If your bloq has both a decomposition and a my_tensors method, the my_tensors method will not be used by default. You'd have to use the qualtran.simulation.tensor functions yourself if you don't want to flatten all the way before embarking on tensor contraction.

Have you contrasted this with the strategy where we flatten until we hit the first node that has my_tensors defined? Is there clear evidence that flattening everything is better? Should we open an issue to track this comparison for the future?

> The general advice is to not implement my_tensors and instead define a decomposition if you want to simulate your bloq. Leaf bloqs need to implement my_tensors, but new leaf bloqs are likely to be added to the library only slowly.

One use case for defining my_tensors is to verify the correctness of the decomposition -- do something like bloq.tensor_contract() == bloq.decompose_bloq().tensor_contract(). This is similar to how we use bloq.call_classically() == cbloq.call_classically() to verify a decomposition via classical actions. The fact that these two statements imply different things for two similar protocols (classical and quantum simulation) would be confusing at best. Is there any way to achieve this kind of verification with the new strategy? Should we add a predicate / flag to bloq.tensor_contract to control the flattening behavior, so that by default we flatten only until we hit a bloq that has my_tensors defined, and users can specify bloq.tensor_contract(flatten=True) OR call bloq.as_composite_bloq().flatten().tensor_contract() (which is still fairly straightforward and a single line) to get the "flatten before contraction" behavior?
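Concretely, the check described above could be written as follows (a minimal sketch built from the expressions quoted in this thread; the tolerance choice is mine):

    import numpy as np

    # Verify a decomposition against the bloq's own tensors, assuming
    # `bloq` defines both my_tensors and a decomposition.
    np.testing.assert_allclose(
        bloq.tensor_contract(),
        bloq.decompose_bloq().tensor_contract(),
        atol=1e-8,
    )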

> The general advice is to not implement my_tensors and instead define a decomposition if you want to simulate your bloq.

Directly implementing the decomposition is probably a higher bar. The way I imagine the workflow is that for many bloqs, people would first implement my_tensors during initial verification and testing and then implement a decomposition if that becomes a bottleneck; especially for a lot of the classical-action-style bloqs like arithmetic, data loading, etc., where the action and tensor data are significantly easier to specify.

@tanujkhattar (Collaborator) left a comment

Left a round of comments and questions.

qualtran/_infra/composite_bloq.py (resolved)
qualtran/_infra/composite_bloq.py (outdated, resolved)
qualtran/_infra/controlled.py (outdated, resolved)
qualtran/simulation/tensor/_dense.py (resolved)
@@ -125,28 +121,6 @@ def dtype(self):
def signature(self):
return Signature([Register("a", self.a_dtype), Register("b", self.b_dtype)])

def add_my_tensors(
@tanujkhattar (Collaborator)
Note: Following my high-level comment, I think there's value in having my_tensors implemented on bloqs that have a decomposition defined, especially for verification of the decomposition. It should be easy (or at least possible) for a user to get a "correct" expected matrix for an addition circuit and verify that the decomposition is correct -- for example, if we were to change the decomposition tomorrow and needed to verify that it is still correct.

@mpharrigan (Collaborator, Author)
Wouldn't that be better in a unit test?

@tanujkhattar (Collaborator)
Maybe, but I don't see a need to remove it from bloqs that already have it defined.

Have you benchmarked what happens if you specify the tensors explicitly in my_tensors and stop decomposing once a bloq has its tensor defined, vs. if you flatten out everything? Now that we have an index for every qubit (unlike the situation before, where we had larger indices for soquets and a lot of splits and joins happening), I don't see why flattening out to the lowest level should be faster than stopping at a point where you already have the tensor data available.

If it's not a significant speedup in performance, I think keeping consistency with the classical simulation workflow would be very nice.

@mpharrigan (Collaborator, Author)
The example of simulating the addition of two 32-bit integers wouldn't work.

I agree that with most other protocols we keep things as high-level as possible. It's always better to let Python handle the addition of two numbers than to emulate the CPU ourselves. The tensor simulation protocol is fundamentally different due to its inherent exponential scaling: a monolithic Add on two 32-bit registers acts on 64 qubits, so its dense unitary would have 2^64 x 2^64 = 2^128 entries.

qualtran/bloqs/basic_gates/global_phase.py (outdated, resolved)
qualtran/bloqs/basic_gates/su2_rotation.py (outdated, resolved)
Comment on lines +92 to +95
return [
qtn.Tensor(data=eye, inds=[(outgoing, i), (incoming[i], 0)], tags=[str(self)])
for i in range(self.dtype.num_qubits)
]
@tanujkhattar (Collaborator)
When implementing the my_tensors method, a pattern like this looks super confusing. There are now two types of indexing going on: indexing into a numpy array of shape self.reg.shape to get the corresponding Connection object, and then an implicit indexing into an implicit array of size self.bitsize by specifying a tuple (cxn, i).

I think this calls for adding a new type (equivalent to a qubit) so that the ConnectionT for a register reg would now be a numpy array of shape reg.shape + (reg.bitsize,). This has the following advantages:

  • It gets rid of the implicit vs. explicit indexing split. The implicit indexing is confusing at best.
  • The concept aligns with the concept of a qubit - we just say that each entry in the incoming and outgoing dictionaries corresponds to a qubit.
  • This kind of full shape is already used as a concept in the cirq-style API of decompose_from_registers, so it would unify the tensor contraction API nicely with that API.

Another alternative would be to keep ConnectionT a numpy array of shape reg.shape but override the __getitem__ method on the Connection object so that cxn[i] returns a new object which is essentially a dataclass / tuple storing (cxn, i) and can be used as a tensor index (like TensorIndex(cxn, i)).

In general, it would be much nicer if the call sites looked like inds=[outgoing[i], incoming[i][0]] instead of what they are right now; see the sketch below.
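A minimal sketch of that alternative (TensorIndex is hypothetical here, not a Qualtran type; Connection is only stubbed for self-containment):

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Connection:
        """Stub standing in for qualtran's Connection."""

        def __getitem__(self, i: int) -> 'TensorIndex':
            # cxn[i] denotes the i-th qubit carried by this connection.
            return TensorIndex(self, i)

    @dataclass(frozen=True)
    class TensorIndex:
        # Frozen dataclass, so instances are hashable and usable as
        # tensor-network indices.
        cxn: Connection
        bit: int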

What do you think?

@mpharrigan (Collaborator, Author)
There are three levels of indexing: by register name, optionally within a "shaped" register, and then the sub-register bit-indexing.

I don't think it's worth adding an extra dimension to all the arrays where the last column is just the numbers 0..n.

(incoming['ctrl'], 0) is not meaningfully more confusing than incoming['ctrl'][0], where the latter syntax just maps to TensorIndex(incoming['ctrl'], 0) anyway.

qualtran/cirq_interop/_cirq_to_bloq.py (resolved)
@mpharrigan (Collaborator, Author)

> Have you contrasted this with the strategy where we flatten until we hit the first node that has my_tensors defined? Is there clear evidence that flattening everything is better? Should we open an issue to track this comparison for the future?

This could be an interesting thing to explore. It would be an easy change, made by passing a predicate to the flattening call, with the following considerations:

  • It's kind of hard to figure out whether a bloq overrides my_tensors. We could do the thing where we check if the method's qualname starts with "Bloq." or not; see the sketch after this list.
  • "Better" is a loaded term. For arbitrary tensor network topologies, it's better to keep things factorized in the sense that one can be smarter about contraction ordering. But if you have a low number of qubits and a deep circuit within a decomposition, you can very easily spend more time building the tensor network and finding a contraction ordering than you would have by just keeping things denser to begin with.

@mpharrigan (Collaborator, Author)

> Directly implementing the decomposition is probably a higher bar. The way I imagine the workflow is that for many bloqs, people would first implement my_tensors during initial verification and testing and then implement a decomposition if that becomes a bottleneck; especially for a lot of the classical-action-style bloqs like arithmetic, data loading, etc., where the action and tensor data are significantly easier to specify.

For classical-action-style bloqs I agree. It's one of my priorities to implement the tensors-from-classical-action helper.
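Such a helper might look roughly like this (a sketch under my own assumptions; the name, signature, and restriction to bijective functions are mine, not the planned Qualtran API):

    import numpy as np

    def unitary_from_classical_action(f, n: int) -> np.ndarray:
        """Permutation matrix U with U[f(x), x] = 1 for a bijection f on n-bit ints."""
        u = np.zeros((2**n, 2**n))
        for x in range(2**n):
            u[f(x), x] = 1.0
        return u

    # e.g. a 3-bit modular increment, analogous to a simple arithmetic bloq:
    u = unitary_from_classical_action(lambda x: (x + 1) % 8, 3)
    assert np.allclose(u @ u.T, np.eye(8))  # permutation matrices are unitary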

There are some bloqs that are naturally "tensor first", like leaf bloqs or maybe state prep. Those you can write tensors for. For bloqs higher up in the call graph, I'm not sure there are many cases where it's more natural to express the tensor than the decomposition.

@tanujkhattar (Collaborator)

> This could be an interesting thing to explore. It would be an easy change, made by passing a predicate to the flattening call, with the following considerations.

From a user's perspective, if the default behavior we choose is to decompose only to the point where a bloq has its tensors defined, then a user can do:

> bloq.tensor_contract() : Use the tensor data at the highest level possible
> bloq.as_composite_bloq().flatten().tensor_contract() : Use the tensor data at the lowest level possible

Both workflows are well supported with a single line of code. If we choose the default to go as deep as possible, bloq authors need to:

  1. Write a custom predicate, which is potentially error-prone for a user to get right, to determine "go as low as possible".
  2. Use the cbloq_to_quimb and quimb_to_dense APIs, which are non-standard and harder to discover.

With these considerations in mind, I suggest it's better to change the default behavior and support both workflows.

@mpharrigan (Collaborator, Author)

> One use case for defining my_tensors is to verify the correctness of the decomposition -- do something like bloq.tensor_contract() == bloq.decompose_bloq().tensor_contract(). This is similar to how we use bloq.call_classically() == cbloq.call_classically() to verify a decomposition via classical actions. The fact that these two statements imply different things for two similar protocols (classical and quantum simulation) would be confusing at best. Is there any way to achieve this kind of verification with the new strategy?

A good unit test validates the computed quantity against a reference value that we're pretty sure is correct. Once again, this means we'd need non-leaf bloqs where the tensor data is somehow a natural expression of the bloq. This is the case for the classical-action bloqs, but really -- the test should just use the classical action tests!

As part of this PR, I went through every bloq that had defined tensors. There were two cases where the bloq had both tensors and a decomposition (excepting some that will be leaf bloqs, #873):

  • Add: classical action; should just use that for testing. The existing tensors are wrong for negative values. The unfactorized form prevents efficient contraction of high-bit, low-depth circuits.
  • IntState: classical action; really just a test of the int_to_bits function. The unfactorized form really prevents efficient contraction since this is a bunch of kets (see the sketch below).
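For concreteness, a factorized IntState is just one single-qubit ket per bit, roughly like this (string index names are illustrative; Qualtran uses (Connection, int) pairs instead):

    import numpy as np
    import quimb.tensor as qtn

    val, n = 5, 3
    bits = [(val >> (n - 1 - i)) & 1 for i in range(n)]  # big-endian bits of val
    # One |0> or |1> ket per qubit, instead of a single 2**n-sized ket.
    tensors = [
        qtn.Tensor(data=np.eye(2)[b], inds=[f'reg{i}'])
        for i, b in enumerate(bits)
    ]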

There are no tests of the form bloq.tensor_contract() == bloq.flatten().tensor_contract() or cirq equivalent.

@tanujkhattar (Collaborator)

> IntState: classical action; really just a test of the int_to_bits function. The unfactorized form really prevents efficient contraction since this is a bunch of kets.

Wouldn't the my_tensors implementation of IntState now just return a list of tensors instead of one big tensor? Isn't the unfactorized vs. factorized problem here solved by changing the way we represent the indices, rather than by removing the implementation of my_tensors and flattening first?

> Both workflows are well supported with a single line of code. ... With these considerations in mind, I suggest it's better to change the default behavior and support both workflows.

Following my comment above, it's better to support two things than one. Please consider making this change.

@mpharrigan (Collaborator, Author)

d850862 restores the flattening logic with a boolean flag. By default, we'll flatten as far as possible. This can be changed if there's ever an example where this is not desired

Comment on lines 212 to 216
if any(reg.side != Side.THRU for reg in signature):
raise ValueError(
f"CirqGateAsBloq.my_tensors is only supported for "
f"gates with thru registers. Found {gate}."
)
@tanujkhattar (Collaborator)
This restriction was not present in the original _add_my_tensors_from_gate method, so it would be nice not to add it as part of this large-scale refactoring. The CirqGateAsBloq-specific errors should be raised in the implementation of CirqGateAsBloq, not in the implementation of _my_tensors_from_gate.

The original implementation works by first assuming that all registers are THRU registers and then extracting tensor data from the correct subspace whenever a register is not a THRU register. This is done by tensor_data_from_unitary_and_signature -- the method is still implemented but no longer used here.

Can you please revert the restricting changes to this method and make sure the more general behavior is maintained? This method is called not just from CirqGateAsBloq but also from GateWithRegisters, which may have directional registers.

@mpharrigan (Collaborator, Author)
This should really have a test if it's behavior we're relying on. I'll revert the changes, though.

@tanujkhattar (Collaborator) left a comment

LGTM % one final comment about reverting the behavior of my_tensors_from_gate and supporting directional registers as was done previously.

tanujkhattar enabled auto-merge (squash) on July 2, 2024
tanujkhattar merged commit 9514139 into quantumlib:main on Jul 2, 2024
7 checks passed
Successfully merging this pull request may close these issues:

Quimb tensor contraction is significantly slower than Cirq's state vector simulation