
State of pytorch infra? #229

Open · baszalmstra opened this issue Apr 4, 2024 · 11 comments
Labels: help wanted, question

Comments

@baszalmstra
Member

Hey dear maintainers and contributors.

Conda is used quite a bit in the ML ecosystem. It's a great option because installing pytorch for your particular system should be just `conda install pytorch`, which would install a pytorch version targeting your version of CUDA, ROCm, or CPU architecture. I love this!
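For example, the default install and an explicit CUDA pin might look like this (the `cuda-version` pinning style below is my assumption of how you'd select a specific CUDA build):

```bash
# Default install: the solver picks a CPU or CUDA build based on the
# platform and the __cuda virtual package it detects at install time.
conda install -c conda-forge pytorch

# Sketch: explicitly requesting a build for a particular CUDA release
# via the cuda-version metapackage (assumed pinning style).
conda install -c conda-forge pytorch "cuda-version=12.*"
```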

However, there are some issues that I have been facing.

I see people switching to pip or using the pytorch channel instead, both of which introduce their own problems.

It looks like a number of these issues are related to infrastructure problems. I would love to contribute to improving this, but I'm not entirely sure where to start, so I'm opening this issue to start a conversation and get in contact with the people who do know.

@baszalmstra added the question label Apr 4, 2024
@baszalmstra changed the title from "State of pytorch" to "State of pytorch infra?" Apr 4, 2024
@hmaarrfk
Contributor

hmaarrfk commented Apr 4, 2024

Which one of the challenges would you like to tackle? In my experience it is going to be nearly impossible to tackle them all at once.

Focus on one area of "need" and work toward it.

If you can get the builds to run on Azure, that is really easy to maintain: we click merge.

I might have dragged my feet on a merge request:
#225 (comment)

I know I'm being "selfish", but co-installability of pytorch and tensorflow is important to me. So I felt like, between the two:

  1. Most updated versions
  2. co-installable tensorflow + pytorch

I chose #2. I could be swayed (I have my own channels that I maintain for this reason), but addressing the "tensorflow problem" is also related to this:
conda-forge/tensorflow-feedstock#378
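
For what it's worth, a quick way to check co-installability without touching an existing environment is a dry-run solve; a minimal sketch, assuming both packages come from conda-forge:

```bash
# Sketch: --dry-run only solves the environment and installs nothing,
# so a successful solve means the two packages are co-installable.
conda create -n ml-test -c conda-forge pytorch tensorflow --dry-run
```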

@baszalmstra
Member Author

Personally, I think not having Windows builds is a big reason not to use pytorch from conda-forge. Even if the version is not completely up to date, not having a version available at all is worse. ;)

However, I have been building pytorch (-cpu) with rattler-build on Windows, and it requires a lot of resources; I think that is also the reason the effort in #134 was halted?
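For reference, my local builds are roughly the following; the recipe path reflects my own checkout layout, so treat it as an assumption:

```bash
# Sketch: building the recipe locally for Windows with rattler-build.
rattler-build build --recipe recipe/recipe.yaml --target-platform win-64
```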

But I would be happy to revive that PR if things have changed in the meantime?

@hmaarrfk
Contributor

hmaarrfk commented Apr 4, 2024

> Even if the version is not completely up to date, not having a version available at all is worse. ;)

I'm not sure that is true. Our migration infrastructure requires all packages to be up to date for all platforms.

So if somebody contributes a Windows package one day and then gets pulled away due to other priorities, pytorch updates are effectively halted.

rattler likely isn't the cause of the slowdown.

- The usage of git to clone the large repo is slow.
- The multiple GPU architectures are really problematic.
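
For the clone itself, a shallow clone with shallow submodules would likely help; a rough sketch:

```bash
# Sketch: fetch only the tip of pytorch and each of its submodules,
# rather than the full history.
git clone --depth 1 --recurse-submodules --shallow-submodules \
    https://github.com/pytorch/pytorch.git
```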

> But I would be happy to revive that PR if things have changed in the meantime?

Please do! Let's see how far things go. Typically, getting maintainers in sync with contributors (just in terms of time to review) can kill efforts like this.

@baszalmstra
Member Author

> I'm not sure that is true. Our migration infrastructure requires all packages to be up to date for all platforms.

No, sorry, I meant that not having the entire feedstock on the latest version is less of a problem than not having a Windows build at all. It's a justification for looking at the Windows build before looking at bumping to the latest version. :)

> rattler likely isn't the cause of the slowdown.

It isn't, but rattler-build makes it easier to iterate on the recipe and build scripts. :)

> Please do!

👍

@baszalmstra
Member Author

> The usage of git to clone the large repo is slow.

I noticed that pytorch publishes the source (including submodules) as a .tar.gz with every release. Has anyone tried using that instead of git? I did notice it includes symlinks, which might be an issue on Windows.
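
Something along these lines; the version number and exact URL pattern here are from memory, so treat them as assumptions:

```bash
# Sketch: fetch the release tarball (which bundles the submodules)
# instead of cloning. Version and URL pattern are assumptions.
VERSION=2.2.2
curl -L -o "pytorch-v${VERSION}.tar.gz" \
    "https://github.com/pytorch/pytorch/releases/download/v${VERSION}/pytorch-v${VERSION}.tar.gz"
tar -xzf "pytorch-v${VERSION}.tar.gz"
```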

@hmaarrfk
Contributor

hmaarrfk commented Apr 5, 2024

> I noticed that pytorch publishes the source (including submodules)

This might be new.

This would be an appreciated change independent of Windows, as currently about 20-25 minutes are spent cloning the repo.

PR welcome.

@hmaarrfk added the help wanted label Sep 26, 2024
@roaldarbol

roaldarbol commented Nov 22, 2024

Sorry to jump in! I'm trying to figure out how using the conda-forge pytorch differs from the (soon deprecated) PyTorch channel and from a simple pip installation, especially as an end user.

> installing pytorch for your particular system should be just `conda install pytorch`, which would install a pytorch version targeting your version of CUDA, ROCm, or CPU architecture.

@baszalmstra opened with this statement; is that the case? That one's system should get automatically detected, irrespective of CUDA presence and version? If so, that's amazing. I've been asked this question (https://gitlab.com/polavieja_lab/idtrackerai/-/issues/88#note_2222650555), since the "Start Locally" PyTorch page has long been the recommended starting point for making sure users install the right dependencies for their setup, so specifying the setup used to be a user action. If one relies on the conda-forge package, does the user still need to know their setup, or is it figured out automatically by the preferred installation tool (conda/mamba/pixi)?

Additionally, since there are pytorch, pytorch-cpu, and pytorch-gpu: what's the difference? Can one safely use pytorch and trust it to pick the appropriate build (CPU or GPU), or does that need specification?

Currently, users of such projects are asked to install pytorch separately, exactly because of the inability to know the user's setup. If, however, this is figured out automatically, would that also mean that it is possible to instead add conda-forge pytorch to a recipe and be certain that the user gets the correct installation?

I hope this makes sense and is the appropriate place to ask; otherwise I'm happy to be re-routed. :-)

@rgommers

> @baszalmstra opened with this statement; is that the case? That one's system should get automatically detected, irrespective of CUDA presence and version?

Yes, that should be the case.

> Additionally, since there are pytorch, pytorch-cpu, and pytorch-gpu: what's the difference? Can one safely use pytorch and trust it to pick the appropriate build (CPU or GPU), or does that need specification?

pytorch will try to do the "right thing", which is to find the CUDA version you need on a CUDA-capable machine. I've found pytorch-cpu and pytorch-gpu useful to be more explicit, and you need those meta-packages when you want something other than the default. E.g., often you want pytorch-cpu for testing, because it is O(200 MB) installed, while the CUDA variant is O(7 GB).
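
Concretely, something like:

```bash
# Explicit CPU-only install, handy for CI and testing:
conda install -c conda-forge pytorch-cpu

# Explicit CUDA install via the metapackage:
conda install -c conda-forge pytorch-gpu

# Quick check of which variant ended up installed:
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```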

Some conda-forge-specific PyTorch docs on this would be quite nice; I'm just not sure what a good place to put them is. There are a couple of very brief mentions aimed at maintainers at https://conda-forge.org/docs/maintainer/knowledge_base/, but I don't think there is a good place in https://conda-forge.org/docs/user/ for user-focused, package-specific documentation.

@rgommers

I would say that the conda-forge situation is significantly better than the PyPI situation for dependency management, since you can actually express a dependency on pytorch: when the user installs a package that depends on PyTorch, they get the right version. On PyPI, the dependency cannot be correctly expressed, since you always get the torch version hosted on PyPI itself (currently for CUDA 12.4).
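
As a sketch of what that looks like from the downstream side (the project name is hypothetical), a package or environment just declares pytorch and lets the solver do the rest:

```bash
# Sketch: an environment.yml for a hypothetical downstream project
# that depends on pytorch; the solver picks the matching CPU or CUDA
# variant on the user's machine.
cat > environment.yml <<'EOF'
name: myproject        # hypothetical project name
channels:
  - conda-forge
dependencies:
  - python
  - pytorch
EOF
conda env create -f environment.yml
```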

@roaldarbol

Thanks a lot @rgommers, that's already immensely helpful! I agree that some user-facing docs would be really helpful; let me know if there's anything I can do to help on that side (I'm currently snowed under wrapping up a PhD thesis, but coming out on the other side in a few months...).

@hmaarrfk
Contributor

We have some small ones here:

https://conda-forge.org/docs/user/tipsandtricks/#installing-cuda-enabled-packages-like-tensorflow-and-pytorch

That could be expanded in whatever direction you see best.
