Alloc vs BroadcastTo vs Second #367

Closed
ricardoV94 opened this issue Jul 3, 2023 · 3 comments · Fixed by #381
Comments

@ricardoV94 (Member)

Description

Alloc provides the same functionality as BroadcastTo, and it seems to be the one that PyTensor rewrites introduce by default in graphs like this:

import pytensor
import pytensor.tensor as pt

x = pt.scalar("x")
out = x + [5, 5, 5]
fn = pytensor.function([x], out)
pytensor.dprint(fn, print_type=True)
Alloc [id A] <Vector(float64, shape=(3,))> 2
 ├─ Add [id B] <Vector(float64, shape=(1,))> 1
 │  ├─ [5.] [id C] <Vector(float64, shape=(1,))>
 │  └─ ExpandDims{axis=0} [id D] <Vector(float64, shape=(1,))> 0
 │     └─ x [id E] <Scalar(float64, shape=())>
 └─ 3 [id F] <Scalar(int64, shape=())>

This is usually introduced by this helper:

def broadcast_like(value, template, fgraph, dtype=None):
    """
    Return a Variable with the same shape and dtype as the template,
    filled by broadcasting value through it. `value` will be cast as
    necessary.
    """
    value = as_tensor_variable(value)
    if value.type.is_super(template.type):
        return value
    if template not in fgraph.variables:
        raise NotImplementedError(
            "broadcast_like currently requires the "
            "template Variable to be in the fgraph already"
        )
    if dtype is None:
        dtype = template.dtype
    value = cast(value, dtype)
    if value.type.is_super(template.type):
        return value
    if hasattr(fgraph, "shape_feature"):
        new_shape = fgraph.shape_feature.shape_of[template]
    else:
        new_shape = template.shape
    rval = alloc(value, *new_shape)
    assert rval.type.dtype == dtype
    return rval
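
For reference, a minimal sketch of what the helper reduces to at the user level, using pt.alloc directly (with a fixed shape of 3 standing in for the template's shape):

import pytensor
import pytensor.tensor as pt

x = pt.scalar("x")
# Roughly what broadcast_like builds: an Alloc of the value with the template's shape
out = pt.alloc(x, 3)
fn = pytensor.function([x], out)
print(fn(2.0))  # [2. 2. 2.]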

It doesn't make sense to have two operators for the same functionality, so we should decide which one to support.

BroadcastTo was added in Aesara in aesara-devs/aesara#145.
The original issue mentions the alloc-vs-view question: aesara-devs/aesara#36, but it seems that could easily be achieved by a single Op by manipulating the view flag.
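
For illustration, a minimal sketch (not the actual Alloc/BroadcastTo implementation) of how a single Op could cover both behaviours just by toggling its view flag; the BroadcastView class and its as_view flag are hypothetical names:

import numpy as np
import pytensor
import pytensor.tensor as pt
from pytensor.graph.basic import Apply
from pytensor.graph.op import Op
from pytensor.tensor.type import TensorType


class BroadcastView(Op):
    """Hypothetical single broadcasting Op: `as_view` decides whether the
    output is a view of the input (BroadcastTo-like) or a fresh buffer
    (Alloc-like)."""

    __props__ = ("as_view",)

    def __init__(self, as_view=False):
        self.as_view = as_view
        if as_view:
            # Declare that output 0 is a view of input 0
            self.view_map = {0: [0]}

    def make_node(self, x, *shape):
        x = pt.as_tensor_variable(x)
        shape = [pt.as_tensor_variable(s) for s in shape]
        out = TensorType(dtype=x.dtype, shape=(None,) * len(shape))()
        return Apply(self, [x, *shape], [out])

    def perform(self, node, inputs, output_storage):
        x, *shape = inputs
        res = np.broadcast_to(x, tuple(int(s) for s in shape))
        if not self.as_view:
            # Alloc-like behaviour: return a writable, fully allocated copy
            res = np.ascontiguousarray(res)
        output_storage[0][0] = res


x = pt.vector("x")
out = BroadcastView(as_view=False)(x, 4, 3)
fn = pytensor.function([x], out)
print(fn([1.0, 2.0, 3.0]))  # a (4, 3) array repeating x along the new axis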

@ricardoV94 (Member, Author)

Actually, this may touch on a more general question of when to allow Ops to be views vs. requiring new allocations for their outputs. This also showed up in #344.

I guess this depends on other inplace operations. For instance, if you have a set_subtensor operation downstream, you might as well allocate the output in a new array from the get-go.
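
A small sketch of that scenario, assuming the standard pt.alloc and pt.set_subtensor helpers:

import pytensor
import pytensor.tensor as pt

x = pt.scalar("x")
# Broadcast x to a length-5 vector, then overwrite one entry downstream.
# set_subtensor can work inplace on an owned buffer, so allocating the
# broadcasted result upfront (Alloc) costs nothing extra here, whereas a
# read-only broadcasted view would first have to be copied.
bcast = pt.alloc(x, 5)
out = pt.set_subtensor(bcast[0], 0.0)
fn = pytensor.function([x], out)
print(fn(2.0))  # [0. 2. 2. 2. 2.]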

ricardoV94 changed the title from "Alloc vs BroadcastTo" to "Alloc vs BroadcastTo vs Second" on Jul 4, 2023
@ricardoV94 (Member, Author) commented Jul 4, 2023

There's also Second (aliased to Fill), which is a hackish way of doing broadcasting via an "Elemwise" Operation, so that it can be present in gradient graphs (as those must all be defined in terms of Scalar operations).

def zeros_like(self, dtype=None):
    # The second is needed for Elemwise ops to work right
    if dtype is None:
        dtype = str(self.type.dtype)
    return second(self, ScalarConstant(get_scalar_type(dtype), 0))

class Imag(UnaryScalarOp):
    nfunc_spec = ("imag", 1, 1)

    def impl(self, x):
        return np.imag(x)

    def grad(self, inputs, gout):
        (x,) = inputs
        (gz,) = gout
        if x.type in complex_types:
            return [complex(0, gz)]
        elif x.type in float_types:
            return [second(x, 0)]
        else:
            return [x.zeros_like(dtype=config.floatX)]
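
For illustration, a minimal sketch at the tensor level, using pt.fill (i.e. the Fill/Second alias discussed above) to broadcast a scalar to a vector's shape:

import pytensor
import pytensor.tensor as pt

t = pt.vector("t")
x = pt.scalar("x")
# fill/second returns x broadcast to t's shape, but expressed as an Elemwise,
# so it can appear inside gradient graphs built from Scalar operations
out = pt.fill(t, x)
pytensor.dprint(out, print_type=True)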

It seems that there is a rough organization in the rewrites, where Second is used during canonicalization and then removed during specialization.

@register_specialize
@register_stabilize
@node_rewriter([fill])
def local_fill_to_alloc(fgraph, node):
    r"""Remove `fill`\s or replace them with `Alloc`\s.

    `Alloc`\s are preferable because they replace explicit tensor dependencies
    with their dependencies on those tensors' shapes, and sometimes those
    shapes can be computed without needing to compute the tensors themselves.
    """
    ...
    # here, we are past the point of canonicalization, so we don't want
    # to put in un-necessary fills.
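
One way to see that organization, assuming pt.fill as the user-level entry point: compile a graph containing a fill and print it after the rewrites have run.

import pytensor
import pytensor.tensor as pt

t = pt.vector("t")
x = pt.scalar("x")
out = pt.fill(t, x)
fn = pytensor.function([t, x], out)
# After the specialization rewrites, the Fill/Second should no longer appear:
# it is replaced by an Alloc (or dropped when the shapes already match)
pytensor.dprint(fn, print_type=True)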

It would be useful to understand why these were defined as the "canonical" forms. Maybe it is easier to merge multiple equivalent broadcasts this way than if they were represented as Alloc?

I am pretty sure we don't need 3 separate Ops to do the same thing here :)

@ricardoV94 (Member, Author) commented Jul 10, 2023

BroadcastTo might be the only Op that returns a non-writeable output by default. It necessitated the addition of tag.indestructible to prevent other Ops from trying to write to it in place (aesara-devs/aesara#368).

Otherwise, I imagine we would need the following:

  1. Every inplace Op would need to check for zero strides in non-length-1 dimensions (see the NumPy sketch after this list)? Do they have to do this already anyway?
  2. Only introduce the inplace version after all other inplace Ops are in, after checking that nothing is trying to destroy its output?
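
For reference, the same behaviour in plain NumPy, where broadcast_to likewise produces a read-only, zero-strided view:

import numpy as np

x = np.arange(3.0)
b = np.broadcast_to(x, (4, 3))
print(b.strides)          # (0, 8): zero stride along the broadcast axis
print(b.flags.writeable)  # False
# b[0, 0] = 1.0 would raise "ValueError: assignment destination is read-only"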

We could simply remove BroadcastTo and continue having Alloc always produce a fully allocated output.

More discussion in #361 (comment)
