
[Costs] Framework #913

Merged · 15 commits · May 15, 2024

Conversation

mpharrigan (Collaborator)

  • base class CostKey, which can be overridden to define a new type of "cost".
  • functions for getting a specified cost for a specified bloq. These explicitly manage a cache of values and static overrides.
  • an extensible method on bloq for statically specifying costs. New cost keys can be added without needing to modify existing bloqs' cost methods.
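
A minimal sketch of how these three pieces might fit together, assuming illustrative names and signatures rather than the merged Qualtran API:

```python
from typing import Any, Callable, Dict, Optional


class CostKey:
    """Base class: subclass to define a new type of cost (illustrative sketch)."""

    def compute(self, bloq: 'Bloq', get_callee_cost: Callable[['Bloq'], Any]) -> Any:
        raise NotImplementedError


class Bloq:
    def my_static_costs(self, cost_key: CostKey) -> Any:
        # Default: no static declaration. The framework checks for the
        # NotImplemented sentinel rather than probing with getattr.
        return NotImplemented


def get_cost_value(bloq: Bloq, cost_key: CostKey, costs_cache: Optional[Dict] = None) -> Any:
    """Look up a cost: cache first, then static override, then compute recursively."""
    if costs_cache is None:
        costs_cache = {}
    if bloq in costs_cache:
        return costs_cache[bloq]
    val = bloq.my_static_costs(cost_key)
    if val is NotImplemented:
        val = cost_key.compute(
            bloq, lambda callee: get_cost_value(callee, cost_key, costs_cache)
        )
    costs_cache[bloq] = val
    return val
```

A new cost key only has to implement compute; existing bloqs are untouched unless they choose to override my_static_costs.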

Part of #899

Actual implementations of costs to follow.

For something like the existing t-complexity protocol: the recursive computation would be wrapped in a CostKey implementation, and the bespoke _t_complexity_ override would move to my_static_costs. We use the NotImplemented return value instead of using getattr to check whether the method itself exists.
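
As a hedged sketch of that migration (class names and the t=4 value are illustrative, not Qualtran's actual gates):

```python
from dataclasses import dataclass


@dataclass
class TComplexity:
    """Toy stand-in for a t-complexity container (illustrative)."""
    t: int = 0


class TComplexityKey:
    """Hypothetical CostKey wrapping the recursive t-complexity computation."""


class OldStyleGate:
    # Old pattern: a bespoke dunder that callers had to probe for via getattr.
    def _t_complexity_(self):
        return TComplexity(t=4)


class NewStyleGate:
    # New pattern: the method always exists, and the framework checks for the
    # NotImplemented sentinel instead of using getattr.
    def my_static_costs(self, cost_key):
        if isinstance(cost_key, TComplexityKey):
            return TComplexity(t=4)
        return NotImplemented
```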

@mpharrigan mpharrigan requested a review from tanujkhattar April 30, 2024 22:08
@mpharrigan mpharrigan mentioned this pull request Apr 30, 2024
@mpharrigan (Collaborator, Author)

I can't reproduce the failure locally. Is this a Python 3.10 vs. 3.11 thing?

@mpharrigan (Collaborator, Author)

@tanujkhattar ptal

@tanujkhattar (Collaborator) left a comment

LGTM! Looks pretty good!

# will be cached.
def _get_cost_val_internal(callee: 'Bloq'):
return _get_cost_value(callee, cost_key, costs_cache=costs_cache, generalizer=generalizer)

Collaborator

Maybe add a docstring for part (b)? It's a little awkward as a reader to see part (a) and then nothing else.

Collaborator Author

whoops done

Comment on lines +228 to +230
This function can be used to annotate a call graph diagram with multiple costs
for each bloq. Specifically, the return value of this function can be used as the
`bloq_data` argument to `GraphvizCallGraph`.
Collaborator

My only concern here is that this function assumes an implicit "call graph" where the leaf nodes are the ones which have my_static_costs implemented. Does it make sense to change the API to accept either (a) an nx.DiGraph returned by calling bloq.call_graph() or (b) *bloqs instead of bloq. So that one can query costs for a call graph where intermediate nodes may have explicit annotations on them but there's still value in visualizing the costs of their callees.

Collaborator Author

I see: you want to force inclusion of bloqs that would not be traversed in the actual computation of the cost. This can always be added as a new function (the body is a simple wrapper over the other functions), but I'd be wary of situations where you'd want to show callee costs that don't actually factor into their parents' (callers') computation.
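
A sketch of what that "simple wrapper" could look like. Here `get_cost(bloq, key, cache)` stands in for the framework's cost-lookup function, and all names are illustrative:

```python
def query_costs(bloqs, cost_keys, get_cost):
    """Hypothetical wrapper: force inclusion of specific bloqs (e.g. intermediate
    call-graph nodes) by querying every cost key for every listed bloq, sharing
    one cache across lookups. The result is a table suitable for annotating a
    call-graph diagram."""
    costs_cache: dict = {}
    return {
        bloq: {key: get_cost(bloq, key, costs_cache) for key in cost_keys}
        for bloq in bloqs
    }
```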

can optionally provide a value, which will be preferred over a computed cost.

Static costs can be provided if the particular cost cannot be easily computed or
as a performance optimization.
Collaborator

One issue here is that the my_static_costs method does not have access to the costs cache. So, if it were to do any non-trivial computation in which it needs the costs of its callees, it wouldn't be able to pass the cache on to the get_cost_value method.

We faced this issue with the TComplexity protocol, and that's why we ended up using a global cache. In this framework, maybe we should accept a cache dictionary as another argument to this method, since the call site already has it?

Collaborator Author

Ah, the intent here is for my_static_costs to be used rarely, and only for truly static costs that you can simply declare. Consider it the generic analog of the _t_complexity_ override, where you're really only allowed to fill in numbers.

The cache is available to the compute method, which is analogous to every strategy in the t-complexity protocol except t_complexity._from_explicit_annotation.

I'll update some docstrings to clarify the expected use of my_static_costs.

@tanujkhattar (Collaborator) · May 9, 2024

I understand that, but I think there's value in supporting use cases where my_static_costs needs to do something more complex than specifying just numbers.

An example is a cost key for circuit depth. The compute method would correspond to calling bloq.decompose_bloq() and then (as one strategy) finding the longest path in a weighted DAG, where the weights are given by the depth of each subbloq. This takes time at least O(N) if the decomposition has N nodes (unlike costs such as T-complexity, which scale as O(M), where M is the number of unique callees and often M << N).

Now, if we stick to the strategy where my_static_costs specifies just numbers and compute works as described above, the depth computation would be prohibitively slow for a wide variety of bloqs whose decompositions have a large number of nodes. Concrete examples are bloqs based on unary iteration and arithmetic circuits such as adders.

In all these examples, as bloq authors we know the circuit structure well and can easily annotate my_static_costs. But it's not enough to specify "a number": we need my_static_costs to be an expression in terms of the costs of its callees. For unary-iteration-based bloqs, my_static_costs would be something like c * N + sum(get_bloq_cost(bloq_for_nth_operation(n)) for n in range(self.selection_range)), i.e. a constant for the unary iteration tree plus the depth of each controlled operation. For arithmetic circuits, we'd likely want to specify costs as a function of get_cost(Toffoli), which users can pin down by providing the cost dictionary: a Toffoli has multiple decompositions optimizing depth vs. T-counts, and users can choose which decomposition should be used by providing a depth for Toffoli.

TL;DR: I think it's restrictive to assume that my_static_costs will only ever specify truly static costs, and there's value in making it extensible to support the use case where it recursively needs and uses the costs of its callees.
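
The unary-iteration expression in the comment above can be sketched as follows; the constant `c`, `bloq_for_nth_operation`, and `get_bloq_cost` are all illustrative stand-ins for names used in the discussion, not a real API:

```python
def unary_iteration_depth(selection_range, bloq_for_nth_operation, get_bloq_cost, c=4):
    """Hypothetical my_static_costs-style expression for the depth of a
    unary-iteration bloq: a constant cost per selection index for the
    iteration tree, plus the depth of each controlled operation, looked up
    through the cost framework."""
    return c * selection_range + sum(
        get_bloq_cost(bloq_for_nth_operation(n)) for n in range(selection_range)
    )
```

The point of the example: evaluating this expression touches only the callees, never a full decomposition of the unary-iteration bloq itself.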

Collaborator Author

The idea is to group any of this logic in the compute method, which can dispatch on the type of bloq (analogous to how we provide costs for leaf bloqs in the existing t_complexity protocol).

We should also aim to reduce the number of bloqs for which an O(N) operation is considered prohibitively costly. In your example, you still need to iterate over self.selection_range bloqs. I guess the presumption is that the constant term absorbs a number of bit-twiddling subbloqs that meaningfully exceeds selection_range; but we should probably drive our efforts toward encoding the "simple" structure of unary iteration in the bloq decomposition hierarchy.

@tanujkhattar (Collaborator) · May 9, 2024

the idea is to group any of this logic in the compute method, which can dispatch based on the type of bloq (analogous to how we provide costs for leaf bloqs in the existing t_complexity protocol).

Yeah, I think that would just not be feasible for the cost types that need to look at the decomposed composite bloq instead of just the bloq counts. I've raised this concern before, and I provided a very concrete example above.

In your example, you still need to iterate over self.selection_range bloqs. I guess the presumption is that the constant term absorbs a number of bit-twiddling subbloqs that meaningfully exceeds the selection_range

For this specific example, the presumption is that building the composite bloq can be much slower than iterating over the callees, even if both take O(N) time. #609 (comment) was an example of how much of a difference this can make.

We should also aim to reduce the number of bloqs where doing an O(N) operation is considered prohibitively costly

Well, that's a very optimistic goal, and I don't think it aligns well with the rest of Qualtran's philosophy. One of the reasons we have the concept of a call graph is that we expect cases where constructing a composite bloq in O(N) time is prohibitively more expensive than just specifying a list of callees and their bloq counts.

The same logic applies to costs that depend upon the structure of the decomposed composite bloq, like depth. There will be cases where computing the composite bloq and then processing it to figure out the cost would be prohibitively more expensive than writing an expression for computing the cost in terms of the callees. For costs like T-complexity, this "expression" is simple and can be generalized and put as part of the compute method but for costs like depth this "expression" needs to live inside the bloq. That place can be my_static_costs (or we can rename it to be more descriptive for this use case). But as the design stands right now, we don't support this flow at all where users can write the expression for computing a cost for a bloq (in terms of its callees) as part of the bloq itself.

For some more concrete numbers, here are the timings to construct a composite bloq for 1000- and 5000-bit adders. You can see this is already getting very expensive, and these are sizes we encounter often.

[screenshot: timings for constructing the composite bloq of 1000- and 5000-bit adders]

So now, if a user were to compute the depth cost of an adder, they would either not override my_static_costs, and we'd pay the (huge) time penalty of calling adder.decompose_bloq(), OR they would need to encode an expression that hardcodes the depth of subbloqs like AND gates, which isn't ideal and is similar to what we were doing in T-complexity and got rid of by introducing the call-graph hierarchy. The ideal solution here is to allow users to specify things like adder_depth = c1 * cost(And()) + c2 * cost(And().adjoint()) + c3 * cost(CNOT()), where c1, c2, and c3 are constants known to the bloq authors because they know the structure of the decomposition. A function similar to my_static_costs could be used for such annotations.
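
A sketch of such an annotation. The structure constants below are assumptions for illustration, not the depth of any real adder decomposition, and `cost` stands in for a (possibly user-supplied) lookup from gate name to depth:

```python
def adder_depth(n_bits, cost):
    """Hypothetical static depth expression for an n-bit adder, in the spirit
    of adder_depth = c1 * cost(And()) + c2 * cost(And().adjoint())
    + c3 * cost(CNOT()). The bloq author supplies the constants; the user
    supplies the leaf depths (e.g. choosing a Toffoli/And decomposition that
    optimizes depth vs. T-count)."""
    c1, c2, c3 = n_bits - 1, n_bits - 1, 2 * n_bits  # assumed structure constants
    return c1 * cost('And') + c2 * cost('And_adjoint') + c3 * cost('CNOT')
```

Evaluating this is O(1) in the number of gates, sidestepping the expensive decompose_bloq() call entirely.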

Collaborator Author

You can put that logic in the compute method:

def compute(bloq):
  if isinstance(bloq, Add):
    return c1 * cost(And())

  return sum(depth(b) for b in bloq.decompose())

Remember: the extensibility axis is meant to support adding more costs. This is particularly acute for circuit depth, which isn't the most important cost on certain architectures.
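
A runnable version of the dispatch-based compute sketched above, with toy bloq classes (all names and constants are illustrative, and summing sub-bloq depths stands in for the real longest-path computation):

```python
class Add:
    """Toy stand-in for an adder bloq (illustrative)."""

class Leaf:
    """Toy leaf bloq with a known depth (illustrative)."""
    def __init__(self, depth):
        self.depth = depth

def compute_depth(bloq, decompose):
    # Special-case bloqs that are slow to decompose by dispatching on type
    # inside compute; the 3x constant here is an assumption.
    if isinstance(bloq, Leaf):
        return bloq.depth
    if isinstance(bloq, Add):
        return 3 * compute_depth(Leaf(2), decompose)
    # Generic fallback: decompose and combine sub-bloq depths.
    return sum(compute_depth(b, decompose) for b in decompose(bloq))
```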

Collaborator

You can put that logic in the compute method.

This means that every time a bloq author writes a bloq that is slow to decompose for large instances, they would have to update the compute method of the cost key. This does not scale (unless we assume all bloqs are authored by us and we maintain a central repository of these if/else overloads in the cost key's compute method).

Remember: the extensibility axis is to support adding more costs. This is particularly acute for circuit depth, which isn't the most important cost for certain architectures

The argument you have given above completely ignores the ease of extensibility for more costs. Since the PR is already merged, I'll open a new issue to track this, so that when we implement a cost like depth, which depends upon the structure of the decomposition, we can revisit this discussion.

@@ -0,0 +1,32 @@
# Copyright 2023 Google LLC
Collaborator

Update 2023 to 2024 everywhere.

Collaborator Author

Gotta update my PyCharm setting that puts these in automatically :)
