
Support torch.compile #89

Open
alihassanijr opened this issue Jan 14, 2024 · 7 comments · May be fixed by #170
Labels
enhancement New feature or request

Comments

@alihassanijr
Member

alihassanijr commented Jan 14, 2024

NATTEN ops break graphs because they are not registered as native torch ops.

Since PyTorch 2.0, the correct way to do this has been to register C++ ops through the torch library API rather than as torch extensions.
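
For reference, a hedged sketch of what registration through torch/library.h (TORCH_LIBRARY) looks like, as opposed to a pybind11-based extension; the namespace, op name, and signature below are illustrative only, not NATTEN's actual interface:

```cpp
// Hypothetical sketch: registering an op via TORCH_LIBRARY from torch/library.h
// instead of a pybind11-based torch extension. The "natten" namespace and the
// "na2d_forward" name/signature here are illustrative, not NATTEN's actual API.
#include <ATen/ATen.h>
#include <torch/library.h>

at::Tensor na2d_forward(const at::Tensor& query, const at::Tensor& key) {
  // A real implementation would dispatch to the actual kernels here.
  return at::empty_like(query);
}

// Declare the schema; the op becomes callable as torch.ops.natten.na2d_forward.
TORCH_LIBRARY(natten, m) {
  m.def("na2d_forward(Tensor query, Tensor key) -> Tensor");
}

// Bind the C++ symbol for a specific dispatch key (CPU shown here).
TORCH_LIBRARY_IMPL(natten, CPU, m) {
  m.impl("na2d_forward", na2d_forward);
}
```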

Notes

It's still unclear whether the different op registration will avoid splitting graphs, and on my first attempt I'm finding that the torch API for registering custom ops is probably incomplete.

For example, defining signatures with int arguments works fine, as it should; native torch ops certainly use int args as well.
But when we try to bind a C++ symbol to that signature, we hit a static assertion failure that explicitly rejects int/int32_t while allowing int8_t and int64_t.
If we force our custom ops to take int64_t instead, we hit an error when calling these ops from Python saying there's no conversion between int and int64_t.

References

@zou3519

zou3519 commented Apr 11, 2024

I'm on the PyTorch team working on custom ops. "int" in the schema string corresponds to a C++ function that accepts int64_t.

If we force our custom ops to take in int64_t instead, then we'll hit an error when we try to call these ops from python saying there's no conversion between int and int64_t.

We've never seen this error before. Do you have a branch that demonstrates what you've tried?
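
A minimal sketch of that mapping, with a hypothetical op (from Python it would be reachable as torch.ops.demo.add_offset):

```cpp
// Sketch of the mapping described above: "int" in the schema string binds to
// an int64_t parameter on the C++ side. The op name is hypothetical.
#include <ATen/ATen.h>
#include <torch/library.h>

at::Tensor add_offset(const at::Tensor& input, int64_t offset) {
  // The "int offset" schema argument arrives here as int64_t, not int/int32_t.
  return input + offset;
}

TORCH_LIBRARY(demo, m) {
  m.def("add_offset(Tensor input, int offset) -> Tensor");
  m.impl("add_offset", add_offset);
}
```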

@alihassanijr
Member Author

Thank you for your feedback @zou3519.
So the comment above is a bit outdated. As of last month, NATTEN's op interfaces have changed their signatures, so we're now taking in Tensors, integer tuples, and a boolean tuple.

If I recall correctly, registering would also fail on std::tuples; I did find an equivalent for ints at the time, but I didn't find anything for a std::tuple of booleans.

I'll try and start a new branch today and replicate the issue.


Unrelated question: since our last update, NATTEN also comes with an "auto-tuner" that can optionally be enabled by the user. Some of our kernels have multiple possible configurations that differ in performance, and the autotuner benchmarks each new problem size (keyed by tensor shape and dtype) with all of the different configs, then picks and caches the fastest one.

Right now this happens through torch and inside the graph, so I'm guessing this particular feature might not play nicely with graph mode and torch.compile, but I'm not sure about that. If this turns out to be an issue, would it be better if all of this happened at the C++ level, beyond the point where we use the torch API, so that torch is completely unaware of it?
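
For illustration, a rough sketch of that benchmark-and-cache scheme at the C++ level; the pick_config helper and KernelConfig type are hypothetical stand-ins, not NATTEN's autotuner, and timing real GPU kernels would also require device synchronization:

```cpp
// Hypothetical sketch of the caching scheme described above: each new problem
// size (shape + dtype) is benchmarked once against all candidate configs, and
// the fastest config is cached and reused. Not NATTEN's actual autotuner.
#include <ATen/ATen.h>
#include <chrono>
#include <functional>
#include <limits>
#include <map>
#include <utility>
#include <vector>

using KernelConfig = int;  // stand-in for a real tiling/config descriptor
using ProblemKey = std::pair<std::vector<int64_t>, at::ScalarType>;

static std::map<ProblemKey, KernelConfig> best_config_cache;

KernelConfig pick_config(const at::Tensor& input,
                         const std::vector<KernelConfig>& candidates,
                         const std::function<void(KernelConfig)>& launch) {
  ProblemKey key{input.sizes().vec(), input.scalar_type()};
  auto cached = best_config_cache.find(key);
  if (cached != best_config_cache.end()) {
    return cached->second;  // problem size seen before: reuse the cached winner
  }
  KernelConfig best = candidates.front();
  double best_time = std::numeric_limits<double>::infinity();
  for (KernelConfig cfg : candidates) {
    auto start = std::chrono::steady_clock::now();
    launch(cfg);  // run the kernel once with this configuration
    double elapsed = std::chrono::duration<double>(
                         std::chrono::steady_clock::now() - start)
                         .count();
    if (elapsed < best_time) {
      best_time = elapsed;
      best = cfg;
    }
  }
  best_config_cache[key] = best;
  return best;
}
```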

@zou3519

zou3519 commented Apr 11, 2024

If I recall correctly, registering would also fail on std::tuples, and I did find an equivalent for ints at the time, but I didn't find anything for std::tuple of boolean types.

int[] arguments map to c10::ArrayRef<int64_t>; int[] outputs map to std::vector<int64_t>.
bool[] arguments map to c10::ArrayRef<bool>; bool[] outputs map to std::vector<bool>.
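
Applied to a signature like the one described earlier (Tensors, integer tuples, and a boolean tuple), a hedged sketch of the binding; the op name and argument list are illustrative only:

```cpp
// Sketch of binding list-typed schema arguments: "int[]" arrives as
// c10::ArrayRef<int64_t> and "bool[]" as c10::ArrayRef<bool>. The namespace,
// op name, and argument list are hypothetical, not NATTEN's actual schema.
#include <ATen/ATen.h>
#include <torch/library.h>

at::Tensor na2d_qk(
    const at::Tensor& query,
    const at::Tensor& key,
    c10::ArrayRef<int64_t> kernel_size,  // schema: int[] kernel_size
    c10::ArrayRef<int64_t> dilation,     // schema: int[] dilation
    c10::ArrayRef<bool> is_causal) {     // schema: bool[] is_causal
  TORCH_CHECK(kernel_size.size() == dilation.size(),
              "kernel_size and dilation must have the same length");
  return at::empty_like(query);
}

TORCH_LIBRARY(natten_demo, m) {
  m.def("na2d_qk(Tensor query, Tensor key, int[] kernel_size, "
        "int[] dilation, bool[] is_causal) -> Tensor");
  m.impl("na2d_qk", na2d_qk);
}
```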

Right now it's happening through torch and inside the graph, so I'm guessing this particular feature might not play nicely with graph mode, and torch.compile, but I'm not sure about that. If this will be an issue, would it be better if all of this happened at the C++ level and beyond the point where we use torch API, so that torch is completely unaware?

It's hard to say; torch.compile has gotten better at handling code in Python. You should try torch.compile over that first, and if it doesn't work, then move it to the C++ level.

@alihassanijr
Member Author

Oh right, I just opened up the GDoc, and I'm seeing it's all in Python now. That's fantastic!
I'll give it a try right away!

@alihassanijr
Member Author

@zou3519 I think I might have mislabeled this issue.

torch.compile was already functional without any changes on our end, as I recall it was back when torch 2.0 was released.

The real issue is that the graph breaks on custom ops, and I seem to have misunderstood that pytorch/pytorch#97295 solves that issue (at least according to pytorch/pytorch#101191).

I am confused by one thing though; if I compile with fullgraph=True, and register the op natively, dynamo fails with this error:

torch._dynamo.exc.Unsupported: autograd.Function with body that accepts non-Tensors as input. Got: <class 'tuple'>

Is there some other way we're supposed to pass int/bool tuples/arrays to autograd functions at the Python level?
I tried googling around a bit, but pytorch/pytorch#114777 was all I found, and that just seems to add support for enum types.

By the way, I'm sorry to leave this issue behind for three weeks; I had already changed so much in the API that it was difficult to branch out my changes and push them, so I waited until everything got merged.
But anyway, I think I should close this issue and open a new one with a more accurate description until there's a solution to custom ops breaking the graph.

@zou3519

zou3519 commented May 21, 2024

Sorry I was out of town for a bit.

The real issue is that the graph breaks on custom ops

Do you have a repro for this?

I am confused by one thing though; if I compile with fullgraph=True, and register the op natively, dynamo fails with this error:

This one should be fixed on the latest nightlies

@alihassanijr
Member Author

No worries, I appreciate the help.

Do you have a repro for this?

Yes, here you go:

https://gist.github.com/alihassanijr/0b1b0277771a72b5eedad4db9e552876

Please let me know if you need anything else.

This one should be fixed on the latest nightlies

Fantastic. I'll start registering the ops and push out a new release once the fix makes it into the next torch release.

@alihassanijr alihassanijr linked a pull request Oct 7, 2024 that will close this issue