Alloc vs BroadcastTo vs Second #367

Closed
ricardoV94 opened this issue Jul 3, 2023 · 3 comments · Fixed by #381
Comments

@ricardoV94 (Member)

Description

Alloc provides the same functionality as BroadcastTo, and it seems to be the one that PyTensor rewrites introduce by default in graphs like this:

import pytensor
import pytensor.tensor as pt

x = pt.scalar("x")
out = x + [5, 5, 5]
fn = pytensor.function([x], out)
pytensor.dprint(fn, print_type=True)
Alloc [id A] <Vector(float64, shape=(3,))> 2
 ├─ Add [id B] <Vector(float64, shape=(1,))> 1
 │  ├─ [5.] [id C] <Vector(float64, shape=(1,))>
 │  └─ ExpandDims{axis=0} [id D] <Vector(float64, shape=(1,))> 0
 │     └─ x [id E] <Scalar(float64, shape=())>
 └─ 3 [id F] <Scalar(int64, shape=())>

This is usually introduced by this helper:

def broadcast_like(value, template, fgraph, dtype=None):
    """
    Return a Variable with the same shape and dtype as the template,
    filled by broadcasting value through it. `value` will be cast as
    necessary.
    """
    value = as_tensor_variable(value)
    if value.type.is_super(template.type):
        return value
    if template not in fgraph.variables:
        raise NotImplementedError(
            "broadcast_like currently requires the "
            "template Variable to be in the fgraph already"
        )
    if dtype is None:
        dtype = template.dtype
    value = cast(value, dtype)
    if value.type.is_super(template.type):
        return value
    if hasattr(fgraph, "shape_feature"):
        new_shape = fgraph.shape_feature.shape_of[template]
    else:
        new_shape = template.shape
    rval = alloc(value, *new_shape)
    assert rval.type.dtype == dtype
    return rval
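
For reference, a minimal sketch of what the helper reduces to at the user level, using pt.alloc directly (with a fixed shape of 3 standing in for the template's shape):

import pytensor
import pytensor.tensor as pt

x = pt.scalar("x")
# Roughly what broadcast_like builds: an Alloc of the value with the template's shape
out = pt.alloc(x, 3)
fn = pytensor.function([x], out)
print(fn(2.0))  # [2. 2. 2.]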

It doesn't make sense to have two operators for the same functionality, so we should decide which one to support.

BroadcastTo was added in Aesara in aesara-devs/aesara#145.
The original issue mentions the alloc-vs-view question: aesara-devs/aesara#36, but it seems that could easily be achieved by a single Op by manipulating the view flag.
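
For illustration, a minimal sketch (not the actual Alloc/BroadcastTo implementation) of how a single Op could cover both behaviours just by toggling its view flag; the BroadcastView class and its as_view flag are hypothetical names:

import numpy as np
import pytensor
import pytensor.tensor as pt
from pytensor.graph.basic import Apply
from pytensor.graph.op import Op
from pytensor.tensor.type import TensorType


class BroadcastView(Op):
    """Hypothetical single broadcasting Op: `as_view` decides whether the
    output is a view of the input (BroadcastTo-like) or a fresh buffer
    (Alloc-like)."""

    __props__ = ("as_view",)

    def __init__(self, as_view=False):
        self.as_view = as_view
        if as_view:
            # Declare that output 0 is a view of input 0
            self.view_map = {0: [0]}

    def make_node(self, x, *shape):
        x = pt.as_tensor_variable(x)
        shape = [pt.as_tensor_variable(s) for s in shape]
        out = TensorType(dtype=x.dtype, shape=(None,) * len(shape))()
        return Apply(self, [x, *shape], [out])

    def perform(self, node, inputs, output_storage):
        x, *shape = inputs
        res = np.broadcast_to(x, tuple(int(s) for s in shape))
        if not self.as_view:
            # Alloc-like behaviour: return a writable, fully allocated copy
            res = np.ascontiguousarray(res)
        output_storage[0][0] = res


x = pt.vector("x")
out = BroadcastView(as_view=False)(x, 4, 3)
fn = pytensor.function([x], out)
print(fn([1.0, 2.0, 3.0]))  # a (4, 3) array repeating x along the new axis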

@ricardoV94 (Member, Author)

Actually, this may touch on a more general question of when to allow Ops to be views vs. requiring new allocations for their outputs. This also showed up in #344.

I guess this depends on other inplace operations. For instance, if you have a set_subtensor operation downstream, you might as well allocate the output in a new array from the get-go.
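
A small sketch of that scenario, assuming the standard pt.alloc and pt.set_subtensor helpers:

import pytensor
import pytensor.tensor as pt

x = pt.scalar("x")
# Broadcast x to a length-5 vector, then overwrite one entry downstream.
# set_subtensor can work inplace on an owned buffer, so allocating the
# broadcasted result upfront (Alloc) costs nothing extra here, whereas a
# read-only broadcasted view would first have to be copied.
bcast = pt.alloc(x, 5)
out = pt.set_subtensor(bcast[0], 0.0)
fn = pytensor.function([x], out)
print(fn(2.0))  # [0. 2. 2. 2. 2.]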

ricardoV94 changed the title from "Alloc vs BroadcastTo" to "Alloc vs BroadcastTo vs Second" on Jul 4, 2023
@ricardoV94 (Member, Author) commented Jul 4, 2023

There's also Second (aliased to Fill), which is a hackish way of doing broadcasting via an "Elemwise" Operation, so that it can be present in gradient graphs (as those must all be defined in terms of Scalar operations).

def zeros_like(self, dtype=None):
    # The second is needed for Elemwise ops to work right
    if dtype is None:
        dtype = str(self.type.dtype)
    return second(self, ScalarConstant(get_scalar_type(dtype), 0))

class Imag(UnaryScalarOp):
    nfunc_spec = ("imag", 1, 1)

    def impl(self, x):
        return np.imag(x)

    def grad(self, inputs, gout):
        (x,) = inputs
        (gz,) = gout
        if x.type in complex_types:
            return [complex(0, gz)]
        elif x.type in float_types:
            return [second(x, 0)]
        else:
            return [x.zeros_like(dtype=config.floatX)]
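
For illustration, a minimal sketch at the tensor level, using pt.fill (i.e. the Fill/Second alias discussed above) to broadcast a scalar to a vector's shape:

import pytensor
import pytensor.tensor as pt

t = pt.vector("t")
x = pt.scalar("x")
# fill/second returns x broadcast to t's shape, but expressed as an Elemwise,
# so it can appear inside gradient graphs built from Scalar operations
out = pt.fill(t, x)
pytensor.dprint(out, print_type=True)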

It seems that there is a rough organization in the rewrites, where Second is used during canonicalization and then removed during specialization.

@register_specialize
@register_stabilize
@node_rewriter([fill])
def local_fill_to_alloc(fgraph, node):
    r"""Remove `fill`\s or replace them with `Alloc`\s.

    `Alloc`\s are preferable because they replace explicit tensor dependencies
    with their dependencies on those tensors' shapes, and sometimes those
    shapes can be computed without needing to compute the tensors themselves.
    """
    ...
    # here, we are past the point of canonicalization, so we don't want
    # to put in un-necessary fills.
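
One way to see that organization, assuming pt.fill as the user-level entry point: compile a graph containing a fill and print it after the rewrites have run.

import pytensor
import pytensor.tensor as pt

t = pt.vector("t")
x = pt.scalar("x")
out = pt.fill(t, x)
fn = pytensor.function([t, x], out)
# After the specialization rewrites, the Fill/Second should no longer appear:
# it is replaced by an Alloc (or dropped when the shapes already match)
pytensor.dprint(fn, print_type=True)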

It would be useful to understand why these were defined as the "canonical" forms. Maybe it is easier to merge multiple equivalent broadcasts this way than if they were represented as Alloc?

I am pretty sure we don't need 3 separate Ops to do the same thing here :)

@ricardoV94 (Member, Author) commented Jul 10, 2023

BroadcastTo might be the only Op that returns a non-writeable output by default. It necessitated the addition of tag.indestructible to prevent other Ops from trying to write to it in place (aesara-devs/aesara#368).

Otherwise, I imagine we would need the following:

  1. Every inplace Op would need to check for zero strides in non-length-1 dimensions (see the NumPy sketch after this list)? Do they have to do this already anyway?
  2. Only introduce the inplace version after all other inplace Ops are in, after checking that nothing is trying to destroy its output?
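
For reference, the same behaviour in plain NumPy, where broadcast_to likewise produces a read-only, zero-strided view:

import numpy as np

x = np.arange(3.0)
b = np.broadcast_to(x, (4, 3))
print(b.strides)          # (0, 8): zero stride along the broadcast axis
print(b.flags.writeable)  # False
# b[0, 0] = 1.0 would raise "ValueError: assignment destination is read-only"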

We could simply remove BroadcastTo and continue having Alloc always produce a fully allocated output.

More discussion in #361 (comment)
