
State of pytorch infra? #229

Open · baszalmstra opened this issue Apr 4, 2024 · 11 comments
Labels: help wanted, question

Comments

@baszalmstra
Member

Hey dear maintainers and contributors.

Conda is used quite a bit in the ML ecosystem. It's a great option because installing pytorch for your particular system should be just `conda install pytorch`, which would install a pytorch version targeting your version of CUDA, ROCm, or CPU architecture. I love this!
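For example, the default install and an explicit CUDA pin might look like this (the `cuda-version` pinning style below is my assumption of how you'd select a specific CUDA build):

```bash
# Default install: the solver picks a CPU or CUDA build based on the
# platform and the __cuda virtual package it detects at install time.
conda install -c conda-forge pytorch

# Sketch: explicitly requesting a build for a particular CUDA release
# via the cuda-version metapackage (assumed pinning style).
conda install -c conda-forge pytorch "cuda-version=12.*"
```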

However, there are some issues that I have been facing.

I see people switching to pip or using the pytorch channel instead, both of which introduce their own problems.

It looks like a number of these issues are related to infrastructure problems. I would love to contribute to improving this, but I'm not entirely sure where to start, so I'm opening this issue to start a conversation and get in contact with the people who do know.

@baszalmstra added the question label Apr 4, 2024
@baszalmstra changed the title from "State of pytorch" to "State of pytorch infra?" Apr 4, 2024
@hmaarrfk
Contributor

hmaarrfk commented Apr 4, 2024

Which one of the challenges would you like to tackle? In my experience it is going to be nearly impossible to tackle them all at once.

Focus on one area of "need" and work toward it.

If you can get the builds to run on Azure, that is really easy to maintain: we click merge.

I might have dragged my feet on a merge request:
#225 (comment)

I know I'm being "selfish", but co-installability of pytorch and tensorflow is important to me. So I felt like, between the two:

  1. Most updated versions
  2. co-installable tensorflow + pytorch

I chose #2. I could be swayed (I have my own channels that I maintain for this reason), but addressing the "tensorflow problem" is also related to this:
conda-forge/tensorflow-feedstock#378
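
For what it's worth, a quick way to check co-installability without touching an existing environment is a dry-run solve; a minimal sketch, assuming both packages come from conda-forge:

```bash
# Sketch: --dry-run only solves the environment and installs nothing,
# so a successful solve means the two packages are co-installable.
conda create -n ml-test -c conda-forge pytorch tensorflow --dry-run
```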

@baszalmstra
Member Author

Personally, I think not having Windows builds is a big reason not to use pytorch from conda-forge. Even if the version is not completely up to date, not having a version available at all is worse. ;)

However, I have been building pytorch (-cpu) with rattler-build on Windows, and it requires a lot of resources; I think that is also the reason the effort in #134 was halted?
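For reference, my local builds are roughly the following; the recipe path reflects my own checkout layout, so treat it as an assumption:

```bash
# Sketch: building the recipe locally for Windows with rattler-build.
rattler-build build --recipe recipe/recipe.yaml --target-platform win-64
```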

But I would be happy to revive that PR if things have changed in the meantime?

@hmaarrfk
Contributor

hmaarrfk commented Apr 4, 2024

> Even if the version is not completely up to date, not having a version available at all is worse. ;)

I'm not sure that is true. Our migration infrastructure requires all packages to be up to date for all platforms.

So if somebody contributes a Windows package one day and then gets pulled away due to other priorities, pytorch updates are effectively halted.

rattler likely isn't the cause of the slowdown.

- The usage of git to clone the large repo is slow.
- The multiple GPU architectures are really problematic.
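
For the clone itself, a shallow clone with shallow submodules would likely help; a rough sketch:

```bash
# Sketch: fetch only the tip of pytorch and each of its submodules,
# rather than the full history.
git clone --depth 1 --recurse-submodules --shallow-submodules \
    https://github.com/pytorch/pytorch.git
```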

> But I would be happy to revive that PR if things have changed in the meantime?

Please do! Let's see how far things go. Typically, getting maintainers in sync with contributors (just in terms of time to review) can kill efforts like this.

@baszalmstra
Member Author

> I'm not sure that is true. Our migration infrastructure requires all packages to be up to date for all platforms.

No, sorry, I meant that not having the entire feedstock on the latest version is less of a problem than not having a Windows build at all. It's a justification for looking at the Windows build before looking at bumping to the latest version. :)

> rattler likely isn't the cause of the slowdown.

It isn't, but rattler-build makes it easier to iterate on the recipe and build scripts. :)

> Please do!

👍

@baszalmstra
Member Author

> The usage of git to clone the large repo is slow.

I noticed that pytorch publishes the source (including submodules) as a .tar.gz with every release. Has anyone tried using that instead of git? I did notice it includes symlinks, which might be an issue on Windows.
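
Something along these lines; the version number and exact URL pattern here are from memory, so treat them as assumptions:

```bash
# Sketch: fetch the release tarball (which bundles the submodules)
# instead of cloning. Version and URL pattern are assumptions.
VERSION=2.2.2
curl -L -o "pytorch-v${VERSION}.tar.gz" \
    "https://github.com/pytorch/pytorch/releases/download/v${VERSION}/pytorch-v${VERSION}.tar.gz"
tar -xzf "pytorch-v${VERSION}.tar.gz"
```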

@hmaarrfk
Contributor

hmaarrfk commented Apr 5, 2024

> I noticed that pytorch publishes the source (including submodules)

This might be new.

This would be an appreciated change independent of Windows, as currently about 20-25 minutes are spent cloning the repo.

PR welcome.

@hmaarrfk added the help wanted label Sep 26, 2024
@roaldarbol

roaldarbol commented Nov 22, 2024

Sorry to jump in! I'm trying to figure out how using the conda-forge pytorch differs from the (soon deprecated) PyTorch channel and from a simple pip installation, especially as an end user.

> installing pytorch for your particular system should be just `conda install pytorch`, which would install a pytorch version targeting your version of CUDA, ROCm, or CPU architecture.

@baszalmstra opened with this statement; is that the case? That one's system should get automatically detected, irrespective of CUDA presence and version? If so, that's amazing. I've been asked this question (https://gitlab.com/polavieja_lab/idtrackerai/-/issues/88#note_2222650555), since the "Start Locally" PyTorch page has long been the recommended starting point for making sure users install the right dependencies for their setup, so specifying the setup used to be a user action. If one relies on the conda-forge package, does the user still need to know their setup, or is it figured out automatically by the preferred installation tool (conda/mamba/pixi)?

Additionally, since there are pytorch, pytorch-cpu, and pytorch-gpu: what's the difference? Can one safely use pytorch and trust it to pick the appropriate build (CPU or GPU), or does that need specification?

Currently, users of such projects are asked to install pytorch separately, exactly because of the inability to know the user's setup. If, however, this is figured out automatically, would that also mean that it is possible to instead add conda-forge pytorch to a recipe and be certain that the user gets the correct installation?

I hope this makes sense and is the appropriate place to ask; otherwise I'm happy to be re-routed. :-)

@rgommers

> @baszalmstra opened with this statement; is that the case? That one's system should get automatically detected, irrespective of CUDA presence and version?

Yes, that should be the case.

> Additionally, since there are pytorch, pytorch-cpu, and pytorch-gpu: what's the difference? Can one safely use pytorch and trust it to pick the appropriate build (CPU or GPU), or does that need specification?

pytorch will try to do the "right thing", which is to find the CUDA version you need on a CUDA-capable machine. I've found pytorch-cpu and pytorch-gpu useful to be more explicit, and you need those meta-packages when you want something other than the default. E.g., often you want pytorch-cpu for testing, because it is O(200 MB) installed, while the CUDA variant is O(7 GB).
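
Concretely, something like:

```bash
# Explicit CPU-only install, handy for CI and testing:
conda install -c conda-forge pytorch-cpu

# Explicit CUDA install via the metapackage:
conda install -c conda-forge pytorch-gpu

# Quick check of which variant ended up installed:
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```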

Some conda-forge-specific PyTorch docs on this would be quite nice; I'm just not sure what a good place to put them is. There are a couple of very brief mentions aimed at maintainers at https://conda-forge.org/docs/maintainer/knowledge_base/, but I don't think there is a good place in https://conda-forge.org/docs/user/ for user-focused, package-specific documentation.

@rgommers

I would say that the conda-forge situation is significantly better than the PyPI situation for dependency management, since you can actually express a dependency on pytorch: when the user installs a package that depends on PyTorch, they get the right version. On PyPI, the dependency cannot be correctly expressed, since you always get the torch version hosted on PyPI itself (currently for CUDA 12.4).
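
As a sketch of what that looks like from the downstream side (the project name is hypothetical), a package or environment just declares pytorch and lets the solver do the rest:

```bash
# Sketch: an environment.yml for a hypothetical downstream project
# that depends on pytorch; the solver picks the matching CPU or CUDA
# variant on the user's machine.
cat > environment.yml <<'EOF'
name: myproject        # hypothetical project name
channels:
  - conda-forge
dependencies:
  - python
  - pytorch
EOF
conda env create -f environment.yml
```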

@roaldarbol

Thanks a lot @rgommers, that's already immensely helpful! I agree that some user-facing docs would be really helpful; let me know if there's anything I can do to help on that side (I'm currently snowed under wrapping up a PhD thesis, but coming out on the other side in a few months...).

@hmaarrfk
Contributor

We have some small ones here:

https://conda-forge.org/docs/user/tipsandtricks/#installing-cuda-enabled-packages-like-tensorflow-and-pytorch

That could be expanded in whatever direction you see best.
