
Tensor Simulation protocol 2: keep indices factorized #1070

Merged
merged 15 commits into quantumlib:main on Jul 2, 2024

Conversation

@mpharrigan (Collaborator) commented Jun 13, 2024

Fixes #793

The problem

In Qualtran, we try to keep everything at as high a level of abstraction as possible and recurse "one layer at a time" only as necessary. This is better for most of the protocols (resource counting, classical simulation, diagrams). For a full quantum/tensor simulation, however, we need everything to be as factored as possible for good performance: the tensor contraction engine can then find good ways of contracting the small tensors that avoid materializing any huge intermediate tensors.

The problem is with splits and joins. When we flatten a high-level circuit into its constituent parts, a large number of these bookkeeping operations appear. The cost and to-cirq protocols can ignore them or handle them elegantly, but the tensor simulation engine uses a third-party package, quimb, which would require special support for these fancy identity operations. Prior to this PR, we mapped each soquet to a quimb index. This makes a lot of conceptual sense, and indeed quimb can support indices of any size (i.e. bitsize > 1 wires). But a Split bloq would then have one index of size 2**n, which would completely block any intelligent tensor contraction ordering wherever one was encountered.
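To make the size argument concrete, here is a minimal quimb sketch (not from the PR; the string index names like 'in0' are illustrative, since Qualtran uses richer index objects) contrasting a Split-like tensor with one fused 2**n-sized index against n independent 2x2 identities:

    import numpy as np
    import quimb.tensor as qtn

    n = 8
    # Old style: one tensor with a single fused index of size 2**n. Any
    # contraction path through it must touch a 2**n-sized dimension.
    fused = qtn.Tensor(
        data=np.eye(2**n).reshape((2**n,) + (2,) * n),
        inds=['in'] + [f'out{i}' for i in range(n)],
    )

    # New style: n small identity tensors, one per qubit, which the
    # contraction-path optimizer can route around freely.
    factored = [
        qtn.Tensor(data=np.eye(2), inds=[f'in{i}', f'out{i}'])
        for i in range(n)
    ]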

Rejected (for now) solution

One solution would have been to flatten everything into a circuit and run some sort of graph-rewriting algorithm to fuse or otherwise minimize split/join pairs. This might be desirable for other reasons, but it would have been complicated to get working right.

The solution in this PR

The solution in this PR is to adopt a NISQ-style, per-qubit, factorized-indices description of our tensor networks. Each tensor network index used to be a Soquet. Now it is a (Connection, int) pair, where the integer j says which qubit of the reg.dtype.num_qubits-sized register this ndarray dimension corresponds to.

I've taken the liberty of identifying quimb indices with Connection instead of Soquet, which fixes a bit of a conceptual mistake from the original implementation.

The new Split tensor method should be elucidating:

        eye = np.eye(2)
        incoming = incoming['reg']
        outgoing = cast(NDArray, outgoing['reg'])
        # One 2x2 identity per qubit: bit i of the single incoming register
        # index is wired to bit 0 of the i-th one-qubit outgoing index.
        return [
            qtn.Tensor(data=eye, inds=[(outgoing[i], 0), (incoming, i)], tags=[str(self)])
            for i in range(self.dtype.num_qubits)
        ]

Where it was once one Tensor with one large index coming in and n one-bit indices going out, it is now n one-qubit identity matrices.

If your Bloq only has QBit() inputs and outputs, the extra index will always be 0, e.g.

        return [
            qtn.Tensor(
                # A single-qubit bloq: the bit index in each (connection, bit)
                # pair is always 0.
                data=_PAULIX, inds=[(outgoing['q'], 0), (incoming['q'], 0)], tags=[str(self)]
            )
        ]

Flattening

To get good tensor performance, you ought to flatten your circuit. Bloq.tensor_contract() will now flatten the circuit, turn it into a tensor network, and contract it.

It used to contract each subbloq recursively, which is not performant. There used to be a function called flatten_for_tensor_contraction that would use the correct predicate to flatten to the correct level, but no one really knew about it or used it. After #1061, the behavior is to flatten as much as possible. All the bloqs in the flattened representation must have explicit tensors.

Notable consequence: If your bloq has both a decomposition and a my_tensors method, the my_tensors method will not be used by default. You'd have to use the qualtran.simulation.tensor functions yourself if you don't want to flatten all the way before embarking on tensor contraction.
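As a rough sketch of that manual route (the helper names cbloq_to_quimb and quimb_to_dense come up later in this thread; the exact signatures here are my assumption, not verified against the PR):

    from qualtran.simulation.tensor import cbloq_to_quimb, quimb_to_dense

    # Contract using each subbloq's own tensors, without full flattening
    # (signatures assumed for illustration).
    cbloq = bloq.as_composite_bloq()
    tn = cbloq_to_quimb(cbloq)
    dense = quimb_to_dense(tn, cbloq.signature)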

The general advice is to not implement my_tensors and instead define a decomposition if you want to simulate your bloq. Leaf bloqs need to implement my_tensors, but new leaf bloqs are likely to be added to the library only slowly.

Done and todo

mpharrigan force-pushed the 2024-06/simmy branch 2 times, most recently from 0bf245a to df07276, on June 19, 2024
mpharrigan marked this pull request as ready for review on June 19, 2024
@mpharrigan (Collaborator, Author)

This is ready for review. I split it into two commits -- one is changes to the protocol; the other is moving all the bloqs to the new protocol.

tanujkhattar self-assigned this on Jun 24, 2024
@mpharrigan (Collaborator, Author)

@tanujkhattar can you take a look when you have a chance?

@tanujkhattar (Collaborator) commented Jun 26, 2024

Some high level comments about the approach

> Notable consequence: If your bloq has both a decomposition and a my_tensors method, the my_tensors method will not be used by default. You'd have to use the qualtran.simulation.tensor functions yourself if you don't want to flatten all the way before embarking on tensor contraction.

Have you contrasted this with the strategy where we flatten until we hit the first node that has my_tensors defined? Is there clear evidence that flattening everything is better? Should we open an issue to track this comparison for the future?

> The general advice is to not implement my_tensors and instead define a decomposition if you want to simulate your bloq. Leaf bloqs need to implement my_tensors, but new leaf bloqs are likely to be added to the library only slowly.

One use case for defining my_tensors is to verify the correctness of the decomposition -- do something like bloq.tensor_contract() == bloq.decompose_bloq().tensor_contract(). This is similar to how we use bloq.call_classically() == cbloq.call_classically() to verify a decomposition via classical actions. The fact that these two statements imply different things for two similar protocols (classical and quantum simulation) would be confusing at best. Is there any way to achieve this kind of verification with the new strategy? Should we add a predicate / flag to bloq.tensor_contract to control the flattening behavior, so that by default we flatten only until we hit a bloq that has my_tensors defined, and users can specify bloq.tensor_contract(flatten=True) OR call bloq.as_composite_bloq().flatten().tensor_contract() (which is still fairly straightforward and a single line) to get the "flatten before contraction" behavior?
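Concretely, the check described above could be written as follows (a minimal sketch built from the expressions quoted in this thread; the tolerance choice is mine):

    import numpy as np

    # Verify a decomposition against the bloq's own tensors, assuming
    # `bloq` defines both my_tensors and a decomposition.
    np.testing.assert_allclose(
        bloq.tensor_contract(),
        bloq.decompose_bloq().tensor_contract(),
        atol=1e-8,
    )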

> The general advice is to not implement my_tensors and instead define a decomposition if you want to simulate your bloq.

Directly implementing the decomposition is probably a higher bar. The way I imagine the workflow is that for many bloqs, people would first implement my_tensors during initial verification and testing and then implement a decomposition if that becomes a bottleneck; especially for a lot of the classical-action-style bloqs like arithmetic, data loading, etc., where the action and tensor data are significantly easier to specify.

@tanujkhattar (Collaborator) left a comment

Left a round of comments and questions.

qualtran/_infra/composite_bloq.py (resolved)
qualtran/_infra/composite_bloq.py (outdated, resolved)
qualtran/_infra/controlled.py (outdated, resolved)
qualtran/simulation/tensor/_dense.py (resolved)
@@ -125,28 +121,6 @@ def dtype(self):
def signature(self):
return Signature([Register("a", self.a_dtype), Register("b", self.b_dtype)])

def add_my_tensors(
@tanujkhattar (Collaborator)
Note: Following my high-level comment, I think there's value in having my_tensors implemented on bloqs that have a decomposition defined, especially for verification of the decomposition. It should be easy (or at least possible) for a user to get a "correct" expected matrix for an addition circuit and verify that the decomposition is correct -- for example, if we were to change the decomposition tomorrow and needed to verify that it is still correct.

@mpharrigan (Collaborator, Author)
Wouldn't that be better in a unit test?

@tanujkhattar (Collaborator)
Maybe, but I don't see a need to remove it from bloqs that already have it defined.

Have you benchmarked what happens if you specify the tensors explicitly in my_tensors and stop decomposing once a bloq has its tensor defined, vs. if you flatten out everything? Now that we have an index for every qubit (unlike the situation before, where we had larger indices for soquets and a lot of splits and joins happening), I don't see why flattening out to the lowest level should be faster than stopping at a point where you already have the tensor data available.

If it's not a significant speedup in performance, I think keeping consistency with the classical simulation workflow would be very nice.

@mpharrigan (Collaborator, Author)
The example of simulating the addition of two 32-bit integers wouldn't work.

I agree that with most other protocols we keep things as high-level as possible. It's always better to let Python handle the addition of two numbers than to emulate the CPU ourselves. The tensor simulation protocol is fundamentally different due to its inherent exponential scaling: a monolithic Add on two 32-bit registers acts on 64 qubits, so its dense unitary would have 2^64 x 2^64 = 2^128 entries.

qualtran/bloqs/basic_gates/global_phase.py (outdated, resolved)
qualtran/bloqs/basic_gates/su2_rotation.py (outdated, resolved)
Comment on lines +92 to +95
return [
qtn.Tensor(data=eye, inds=[(outgoing, i), (incoming[i], 0)], tags=[str(self)])
for i in range(self.dtype.num_qubits)
]
@tanujkhattar (Collaborator)
When implementing the my_tensors method, a pattern like this looks super confusing. There are now two types of indexing going on: indexing into a numpy array of shape self.reg.shape to get the corresponding Connection object, and then an implicit indexing into an implicit array of size self.bitsize by specifying a tuple (cxn, i).

I think this calls for adding a new type (equivalent to a qubit) so that the ConnectionT for a register reg would now be a numpy array of shape reg.shape + (reg.bitsize,). This has the following advantages:

  • It gets rid of the implicit vs. explicit indexing split. The implicit indexing is confusing at best.
  • The concept aligns with the concept of a qubit - we just say that each entry in the incoming and outgoing dictionaries corresponds to a qubit.
  • This kind of full shape is already used as a concept in the cirq-style API of decompose_from_registers, so it would unify the tensor contraction API nicely with that API.

Another alternative would be to keep ConnectionT a numpy array of shape reg.shape but override the __getitem__ method on the Connection object so that cxn[i] returns a new object which is essentially a dataclass / tuple storing (cxn, i) and can be used as a tensor index (like TensorIndex(cxn, i)).

In general, it would be much nicer if the call sites looked like inds=[outgoing[i], incoming[i][0]] instead of what they are right now; see the sketch below.
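A minimal sketch of that alternative (TensorIndex is hypothetical here, not a Qualtran type; Connection is only stubbed for self-containment):

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Connection:
        """Stub standing in for qualtran's Connection."""

        def __getitem__(self, i: int) -> 'TensorIndex':
            # cxn[i] denotes the i-th qubit carried by this connection.
            return TensorIndex(self, i)

    @dataclass(frozen=True)
    class TensorIndex:
        # Frozen dataclass, so instances are hashable and usable as
        # tensor-network indices.
        cxn: Connection
        bit: int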

What do you think?

@mpharrigan (Collaborator, Author)
There are three levels of indexing: by register name, optionally within a "shaped" register, and then the sub-register bit-indexing.

I don't think it's worth adding an extra dimension to all the arrays where the last column is just the numbers 0..n.

(incoming['ctrl'], 0) is not meaningfully more confusing than incoming['ctrl'][0], where the latter syntax just maps to TensorIndex(incoming['ctrl'], 0) anyway.

qualtran/cirq_interop/_cirq_to_bloq.py (resolved)
@mpharrigan (Collaborator, Author)

> Have you contrasted this with the strategy where we flatten until we hit the first node that has my_tensors defined? Is there clear evidence that flattening everything is better? Should we open an issue to track this comparison for the future?

This could be an interesting thing to explore. It would be an easy change, made by passing a predicate to the flattening call, with the following considerations:

  • It's kind of hard to figure out whether a bloq overrides my_tensors. We could do the thing where we check if the method's qualname starts with "Bloq." or not; see the sketch after this list.
  • "Better" is a loaded term. For arbitrary tensor network topologies, it's better to keep things factorized in the sense that one can be smarter about contraction ordering. But if you have a low number of qubits and a deep circuit within a decomposition, you can very easily spend more time building the tensor network and finding a contraction ordering than you would have by just keeping things denser to begin with.

@mpharrigan (Collaborator, Author)

> Directly implementing the decomposition is probably a higher bar. The way I imagine the workflow is that for many bloqs, people would first implement my_tensors during initial verification and testing and then implement a decomposition if that becomes a bottleneck; especially for a lot of the classical-action-style bloqs like arithmetic, data loading, etc., where the action and tensor data are significantly easier to specify.

For classical-action-style bloqs I agree. It's one of my priorities to implement the tensors-from-classical-action helper.
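Such a helper might look roughly like this (a sketch under my own assumptions; the name, signature, and restriction to bijective functions are mine, not the planned Qualtran API):

    import numpy as np

    def unitary_from_classical_action(f, n: int) -> np.ndarray:
        """Permutation matrix U with U[f(x), x] = 1 for a bijection f on n-bit ints."""
        u = np.zeros((2**n, 2**n))
        for x in range(2**n):
            u[f(x), x] = 1.0
        return u

    # e.g. a 3-bit modular increment, analogous to a simple arithmetic bloq:
    u = unitary_from_classical_action(lambda x: (x + 1) % 8, 3)
    assert np.allclose(u @ u.T, np.eye(8))  # permutation matrices are unitary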

There are some bloqs that are naturally "tensor first", like leaf bloqs or maybe state prep. Those you can write tensors for. For bloqs higher up in the call graph, I'm not sure there are many cases where it's more natural to express the tensor than the decomposition.

@tanujkhattar (Collaborator)

> This could be an interesting thing to explore. It would be an easy change, made by passing a predicate to the flattening call, with the following considerations.

From a user's perspective, if the default behavior we choose is to decompose only to the point where a bloq has its tensors defined, then a user can do:

> bloq.tensor_contract() : Use the tensor data at the highest level possible
> bloq.as_composite_bloq().flatten().tensor_contract() : Use the tensor data at the lowest level possible

Both workflows are well supported with a single line of code. If we choose the default to go as deep as possible, bloq authors need to:

  1. Write a custom predicate, which is potentially error-prone for a user to get right, to determine "go as low as possible".
  2. Use the cbloq_to_quimb and quimb_to_dense APIs, which are non-standard and harder to discover.

With these considerations in mind, I suggest it's better to change the default behavior and support both workflows.

@mpharrigan (Collaborator, Author)

> One use case for defining my_tensors is to verify the correctness of the decomposition -- do something like bloq.tensor_contract() == bloq.decompose_bloq().tensor_contract(). This is similar to how we use bloq.call_classically() == cbloq.call_classically() to verify a decomposition via classical actions. The fact that these two statements imply different things for two similar protocols (classical and quantum simulation) would be confusing at best. Is there any way to achieve this kind of verification with the new strategy?

A good unit test validates the computed quantity against a reference value that we're pretty sure is correct. Once again, this means we'd need non-leaf bloqs where the tensor data is somehow a natural expression of the bloq. This is the case for the classical-action bloqs, but really -- the test should just use the classical action tests!

As part of this PR, I went through every bloq that had defined tensors. There were two cases where the bloq had both tensors and a decomposition (excepting some that will be leaf bloqs, #873):

  • Add: classical action; should just use that for testing. The existing tensors are wrong for negative values. The unfactorized form prevents efficient contraction of high-bit, low-depth circuits.
  • IntState: classical action; really just a test of the int_to_bits function. The unfactorized form really prevents efficient contraction since this is a bunch of kets (see the sketch below).
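For concreteness, a factorized IntState is just one single-qubit ket per bit, roughly like this (string index names are illustrative; Qualtran uses (Connection, int) pairs instead):

    import numpy as np
    import quimb.tensor as qtn

    val, n = 5, 3
    bits = [(val >> (n - 1 - i)) & 1 for i in range(n)]  # big-endian bits of val
    # One |0> or |1> ket per qubit, instead of a single 2**n-sized ket.
    tensors = [
        qtn.Tensor(data=np.eye(2)[b], inds=[f'reg{i}'])
        for i, b in enumerate(bits)
    ]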

There are no tests of the form bloq.tensor_contract() == bloq.flatten().tensor_contract() or cirq equivalent.

@tanujkhattar (Collaborator)

> IntState: classical action; really just a test of the int_to_bits function. The unfactorized form really prevents efficient contraction since this is a bunch of kets.

Wouldn't the my_tensors implementation of IntState now just return a list of tensors instead of one big tensor? Isn't the unfactorized vs. factorized problem here solved by changing the way we represent the indices, rather than by removing the implementation of my_tensors and flattening first?

> Both workflows are well supported with a single line of code. ... With these considerations in mind, I suggest it's better to change the default behavior and support both workflows.

Following my comment above, it's better to support two things than one. Please consider making this change.

@mpharrigan (Collaborator, Author)

d850862 restores the flattening logic with a boolean flag. By default, we'll flatten as far as possible. This can be changed if there's ever an example where this is not desired

Comment on lines 212 to 216
if any(reg.side != Side.THRU for reg in signature):
raise ValueError(
f"CirqGateAsBloq.my_tensors is only supported for "
f"gates with thru registers. Found {gate}."
)
@tanujkhattar (Collaborator)
This restriction was not present in the original _add_my_tensors_from_gate method, so it would be nice not to add it as part of this large-scale refactoring. The CirqGateAsBloq-specific errors should be raised in the implementation of CirqGateAsBloq, not in the implementation of _my_tensors_from_gate.

The original implementation works by first assuming that all registers are THRU registers and then extracting tensor data from the correct subspace whenever a register is not a THRU register. This is done by tensor_data_from_unitary_and_signature -- the method is still implemented but no longer used here.

Can you please revert the restricting changes to this method and make sure the more general behavior is maintained? This method is called not just from CirqGateAsBloq but also from GateWithRegisters, which may have directional registers.

@mpharrigan (Collaborator, Author)
This should really have a test if it's behavior we're relying on. I'll revert the changes, though.

@tanujkhattar (Collaborator) left a comment

LGTM % one final comment about reverting the behavior of my_tensors_from_gate and supporting directional registers as was done previously.

tanujkhattar enabled auto-merge (squash) on July 2, 2024
tanujkhattar merged commit 9514139 into quantumlib:main on Jul 2, 2024
7 checks passed
Successfully merging this pull request may close these issues:

Quimb tensor contraction is significantly slower than Cirq's state vector simulation