PyTorch 2.5.0 #277

Open
wants to merge 16 commits into
base: main

Conversation

@jeongseok-meta jeongseok-meta commented Oct 17, 2024

Checklist

  • Used a personal fork of the feedstock to propose changes
  • Bumped the build number (if the version is unchanged)
  • Reset the build number to 0 (if the version changed)
  • Re-rendered with the latest conda-smithy (Use the phrase @conda-forge-admin, please rerender in a comment in this PR for automated rerendering)
  • Ensured the license file is being packaged.

@jeongseok-meta jeongseok-meta marked this pull request as ready for review October 17, 2024 22:04
@jeongseok-meta jeongseok-meta marked this pull request as draft October 17, 2024 22:04
@jeongseok-meta
Author

@conda-forge-admin, please rerender

@conda-forge-admin
Contributor

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe/meta.yaml) and found it was in an excellent condition.

@conda-forge-admin
Contributor

Hi! This is the friendly automated conda-forge-webservice.

I tried to rerender for you, but it looks like there was nothing to do.

This message was generated by GitHub actions workflow run https://github.com/conda-forge/conda-forge-webservices/actions/runs/11393752191.

@hmaarrfk
Contributor

You might need to update my patch that helps find numpy; my suggestion doesn't seem to have gone through upstream.

@jeongseok-meta
Author

Submitted pytorch/pytorch#138287 for the nvtx patch

@jeongseok-meta
Author

Woohoo, it passed the cmake configuration stage and is now in building stage. By the way, I don't have permissions to run several CI jobs. Could I get those permissions or would someone like to take over this PR or create a new one?

@hmaarrfk
Contributor

Just use Azure for now; that CI gets clogged anyway, and then you wait forever ;)

@hmaarrfk
Contributor

if you can get to the 6 hour timeout, we can switch back to the larger runner

@jeongseok-meta
Author

@conda-forge-admin, please rerender

@@ -20,6 +20,6 @@ github_actions:
 os_version:
   linux_64: cos7
 provider:
-  linux_64: github_actions
+  linux_64: azure
Contributor

there is a 1 minute timeout stated at the top of the file too.

@jeongseok-meta
Author

@conda-forge-admin, please rerender

@hmaarrfk
Contributor

There are other things you have to patch out.

torch 2.5.0.post100 has requirement sympy==1.13.1; python_version >= "3.9", but you have sympy 1.13.3.

recipe/meta.yaml Outdated
@@ -303,6 +301,7 @@ outputs:
 requires:
 - {{ compiler('c') }}
 - {{ compiler('cxx') }}
+- boto3 # [osx and x86]
Contributor

The `osx and x86` condition seems dubious. It's either always required (and should be in the run section) or only required for testing (and should be enabled for all architectures).

Author

Hmm, it's not in the root requirements.txt but in some CI requirements. Let me remove the condition for now.
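
For readers unfamiliar with the `# [osx and x86]` selector discussed in this thread, here is a minimal sketch of how such a selector narrows a requirement to a subset of platforms. It assumes selectors are evaluated as Python boolean expressions over per-platform flags, as the conda-build documentation describes. The `PLATFORMS` table and both function names are hypothetical and cover only a handful of the variables conda-build actually defines.

```python
# Hypothetical, heavily reduced platform namespaces for illustration only.
PLATFORMS = {
    "linux-64": {"linux": True, "osx": False, "x86": True, "arm64": False},
    "osx-64": {"linux": False, "osx": True, "x86": True, "arm64": False},
    "osx-arm64": {"linux": False, "osx": True, "x86": False, "arm64": True},
}


def selector_applies(selector: str, platform: str) -> bool:
    """Evaluate a selector expression such as 'osx and x86' for one platform."""
    namespace = dict(PLATFORMS[platform])
    # The selector text is treated as a Python boolean expression.
    # Builtins are disabled so only the platform flags are visible.
    return bool(eval(selector, {"__builtins__": {}}, namespace))


def platforms_matching(selector: str) -> list[str]:
    """List the platforms on which a line carrying this selector is kept."""
    return [name for name in PLATFORMS if selector_applies(selector, name)]


if __name__ == "__main__":
    print(platforms_matching("osx and x86"))
    print(platforms_matching("osx"))
```

Under these assumptions the `osx and x86` selector keeps the `boto3` line only on `osx-64`, which is why the reviewer asks whether the dependency is really specific to that platform.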

recipe/meta.yaml Outdated
@@ -277,7 +277,7 @@ outputs:
 # other requirements
 - python
 - typing_extensions >=4.8.0
-- sympy >=1.13.1
+- sympy 1.13.1
Contributor

sorry for the rapid fire:
is there a known bug that makes sympy 1.13.3 incompatible with pytorch 2.4 or 2.5? otherwise i would rather unpin.

This kind of pinning is great for tight distributions, but it goes against the sharing and interoperability of conda-forge.

Contributor

unpin = add a patch to make the pip check more forgiving.

Author

It seems there were test failures on Windows and Mac with 1.13.2: pytorch/pytorch#133235

Author

Not sure if the issue has been resolved in version 1.13.2. Let's patch to allow version 1.13.2 (or > 1.13.2) and see if the failures disappear (I'm not yet sure how to test it, though).

Contributor

@jeongseok-meta it may be appropriate to add `!=1.13.2` and cite the PR you have as a reference.

@atalman are you able to point us to the exact tests that were failing so we can test with 1.13.3?
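
To make the options in this thread concrete, the sketch below compares the exact pin `==1.13.1` with the proposed `>=1.13.1,!=1.13.2` against the sympy versions mentioned here. It is a deliberate simplification: it handles only plain dotted numeric versions and six operators, whereas real tools such as pip follow the full PEP 440 rules. The function names are hypothetical.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Turn '1.13.3' into (1, 13, 3); only plain numeric versions are supported."""
    return tuple(int(part) for part in text.split("."))


def satisfies(version: str, constraints: str) -> bool:
    """Check a version against comma-separated constraints like '>=1.13.1,!=1.13.2'."""
    candidate = parse_version(version)
    for constraint in constraints.split(","):
        constraint = constraint.strip()
        # Two-character operators are listed first so '>=' is not read as '>'.
        for op in ("==", "!=", ">=", "<=", ">", "<"):
            if constraint.startswith(op):
                target = parse_version(constraint[len(op):].strip())
                break
        else:
            raise ValueError(f"unsupported constraint: {constraint!r}")
        outcome = {
            "==": candidate == target,
            "!=": candidate != target,
            ">=": candidate >= target,
            "<=": candidate <= target,
            ">": candidate > target,
            "<": candidate < target,
        }[op]
        if not outcome:
            return False
    return True


if __name__ == "__main__":
    for spec in ("==1.13.1", ">=1.13.1,!=1.13.2"):
        for version in ("1.13.1", "1.13.2", "1.13.3"):
            print(f"sympy {version} vs {spec!r}: {satisfies(version, spec)}")
```

With the exact pin, sympy 1.13.3 is rejected, which matches the `pip check` complaint quoted earlier. The relaxed form accepts 1.13.3 while still excluding 1.13.2.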

@jeongseok-meta jeongseok-meta force-pushed the pytorch_250 branch 2 times, most recently from 213cb75 to 1b40828 Compare October 18, 2024 16:53
pytorchmergebot pushed a commit to pytorch/pytorch that referenced this pull request Oct 20, 2024
`sympy` was pinned to version 1.13.1 due to test failures with version 1.13.2 on Windows and Mac, as reported in #133235. Now that a newer version, 1.13.3, has been released, this PR aims to verify whether the test failure has been resolved and to allow building with newer versions for packaging purposes (e.g., conda-forge/pytorch-cpu-feedstock#277 (comment)).
Pull Request resolved: #138338
Approved by: https://github.com/Skylion007, https://github.com/malfet

Co-authored-by: Nikita Shulga <[email protected]>
@jeongseok-meta
Author

@conda-forge-admin, please rerender

@hmaarrfk
Contributor

Thank you @jeongseok-meta this is amazing.

it may be time to re-enable the ci-runners and i can make a dummy commit ;)

@jeongseok-meta
Author

I finally have access to a powerful Linux machine and can do local testing. So far, the linux_64_* builds work fine, but I encountered other issues with linux_aarch64_*. Let me try to fix them and share the results (I might need help).

@jslee02

jslee02 commented Oct 25, 2024

(sorry for mixing different accounts)

Here are the build logs: log_files.zip

It seems the build fails in distutils, and I'm not certain how to resolve it at the moment:

error
  [3887/7514] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/UfuncCUDA_add.cu.o
  [3888/7514] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ReduceNormKernel.cu.o
  [3889/7514] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/LogcumsumexpKernel.cu.o
  [3890/7514] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/sparse/cuda/SparseSemiStructuredLinear.cu.o
  ninja: build stopped: subcommand failed.
  error: subprocess-exited-with-error

  python setup.py bdist_wheel did not run successfully.
  │ exit code: 1
  ╰─> See above for output.

  note: This error originates from a subprocess, and is likely not a problem with pip.
  full command: /home/conda/feedstock_root/build_artifacts/libtorch_1729889670574/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_pl/bin/python -u -c '
  exec(compile('"'"''"'"''"'"'
  # This is <pip-setuptools-caller> -- a caller that pip uses to run setup.py
  #
  # - It imports setuptools before invoking setup.py, to enable projects that directly
  #   import from `distutils.core` to work with newer packaging standards.
  # - It provides a clear error message when setuptools is not installed.
  # - It sets `sys.argv[0]` to the underlying `setup.py`, when invoking `setup.py` so
  #   setuptools doesn'"'"'t think the script is `-c`. This avoids the following warning:
  #     manifest_maker: standard file '"'"'-c'"'"' not found".
  # - It generates a shim setup.py, for handling setup.cfg-only projects.
  import os, sys, tokenize

  try:
      import setuptools
  except ImportError as error:
      print(
          "ERROR: Can not execute `setup.py` since setuptools is not available in "
          "the build environment.",
          file=sys.stderr,
      )
      sys.exit(1)

  __file__ = %r
  sys.argv[0] = __file__

  if os.path.exists(__file__):
      filename = __file__
      with tokenize.open(__file__) as f:
          setup_py_code = f.read()
  else:
      filename = "<auto-generated setuptools caller>"
      setup_py_code = "from setuptools import setup; setup()"

  exec(compile(setup_py_code, filename, "exec"))
  '"'"''"'"''"'"' % ('"'"'/home/conda/feedstock_root/build_artifacts/libtorch_1729889670574/work/setup.py'"'"',), "<pip-setuptools-caller>", "exec"))' bdist_wheel -d /tmp/pip-wheel-ri5p1vkt
  cwd: /home/conda/feedstock_root/build_artifacts/libtorch_1729889670574/work/
  ERROR: Failed building wheel for torch
  Running command python setup.py clean
  Building wheel torch-2.5.0.post300
  usage: setup.py [global_opts] cmd1 [cmd1_opts] [cmd2 [cmd2_opts] ...]
     or: setup.py --help [cmd1 cmd2 ...]
     or: setup.py --help-commands
     or: setup.py cmd --help

  error: option --all not recognized
  error: subprocess-exited-with-error

  python setup.py clean did not run successfully.
  │ exit code: 1
  ╰─> See above for output.

  note: This error originates from a subprocess, and is likely not a problem with pip.
  full command: /home/conda/feedstock_root/build_artifacts/libtorch_1729889670574/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_pl/bin/python -u -c '
  exec(compile('"'"''"'"''"'"'
  # This is <pip-setuptools-caller> -- a caller that pip uses to run setup.py
  #
  # - It imports setuptools before invoking setup.py, to enable projects that directly
  #   import from `distutils.core` to work with newer packaging standards.
  # - It provides a clear error message when setuptools is not installed.
  # - It sets `sys.argv[0]` to the underlying `setup.py`, when invoking `setup.py` so
  #   setuptools doesn'"'"'t think the script is `-c`. This avoids the following warning:
  #     manifest_maker: standard file '"'"'-c'"'"' not found".
  # - It generates a shim setup.py, for handling setup.cfg-only projects.
  import os, sys, tokenize

  try:
      import setuptools
  except ImportError as error:
      print(
          "ERROR: Can not execute `setup.py` since setuptools is not available in "
          "the build environment.",
          file=sys.stderr,
      )
      sys.exit(1)

  __file__ = %r
  sys.argv[0] = __file__

  if os.path.exists(__file__):
      filename = __file__
      with tokenize.open(__file__) as f:
          setup_py_code = f.read()
  else:
      filename = "<auto-generated setuptools caller>"
      setup_py_code = "from setuptools import setup; setup()"

  exec(compile(setup_py_code, filename, "exec"))
  '"'"''"'"''"'"' % ('"'"'/home/conda/feedstock_root/build_artifacts/libtorch_1729889670574/work/setup.py'"'"',), "<pip-setuptools-caller>", "exec"))' clean --all
  cwd: /home/conda/feedstock_root/build_artifacts/libtorch_1729889670574/work
  ERROR: Failed cleaning build dir for torch
ERROR: Failed to build one or more wheels
Ignoring indexes: https://pypi.org/simple
Created temporary directory: /tmp/pip-build-tracker-mdhef1tw
Initialized build tracking at /tmp/pip-build-tracker-mdhef1tw
Created build tracker: /tmp/pip-build-tracker-mdhef1tw
Entered build tracker: /tmp/pip-build-tracker-mdhef1tw
Created temporary directory: /tmp/pip-wheel-hvfsatz6
Created temporary directory: /tmp/pip-ephem-wheel-cache-kyva823a
Processing $SRC_DIR
  Added file://$SRC_DIR to build tracker '/tmp/pip-build-tracker-mdhef1tw'
  Running setup.py (path:$SRC_DIR/setup.py) egg_info for package from file://$SRC_DIR
  Created temporary directory: /tmp/pip-pip-egg-info-ynorohhn
  Preparing metadata (setup.py): started
  Preparing metadata (setup.py): finished with status 'done'
  Source in $SRC_DIR has version 2.5.0.post300, which satisfies requirement torch==2.5.0.post300 from file://$SRC_DIR
  Removed torch==2.5.0.post300 from file://$SRC_DIR from build tracker '/tmp/pip-build-tracker-mdhef1tw'
Created temporary directory: /tmp/pip-unpack-y4841kgh
Created temporary directory: /tmp/pip-unpack-1ypvsrms
Building wheels for collected packages: torch
  Created temporary directory: /tmp/pip-wheel-ri5p1vkt
  Building wheel for torch (setup.py): started
  Destination directory: /tmp/pip-wheel-ri5p1vkt
  Building wheel for torch (setup.py): finished with status 'error'
  Running setup.py clean for torch
Failed to build torch
Exception information:
Traceback (most recent call last):
  File "${PREFIX}/lib/python3.12/site-packages/pip/_internal/cli/base_command.py", line 105, in _run_wrapper
    status = _inner_run()
             ^^^^^^^^^^^^
  File "${PREFIX}/lib/python3.12/site-packages/pip/_internal/cli/base_command.py", line 96, in _inner_run
    return self.run(options, args)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "${PREFIX}/lib/python3.12/site-packages/pip/_internal/cli/req_command.py", line 67, in wrapper
    return func(self, options, args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "${PREFIX}/lib/python3.12/site-packages/pip/_internal/commands/wheel.py", line 180, in run
    raise CommandError("Failed to build one or more wheels")
pip._internal.exceptions.CommandError: Failed to build one or more wheels
Removed build tracker: '/tmp/pip-build-tracker-mdhef1tw'
+ [[ libtorch == \l\i\b\t\o\r\c\h ]]
+ mkdir -p /home/conda/feedstock_root/build_artifacts/libtorch_1729889670574/work/dist
+ pushd /home/conda/feedstock_root/build_artifacts/libtorch_1729889670574/work/dist
~/feedstock_root/build_artifacts/libtorch_1729889670574/work/dist ~/feedstock_root/build_artifacts/libtorch_1729889670574/work
+ wheel unpack '../torch-*.whl'
Bad wheel filename 'torch-*.whl'
Traceback (most recent call last):
  File "/opt/conda/lib/python3.12/site-packages/conda_build/build.py", line 2555, in build
    utils.check_call_env(
  File "/opt/conda/lib/python3.12/site-packages/conda_build/utils.py", line 404, in check_call_env
    return _func_defaulting_env_to_os_environ("call", *popenargs, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.12/site-packages/conda_build/utils.py", line 380, in _func_defaulting_env_to_os_environ
    raise subprocess.CalledProcessError(proc.returncode, _args)
subprocess.CalledProcessError: Command '['/bin/bash', '-o', 'errexit', '/home/conda/feedstock_root/build_artifacts/libtorch_1729889670574/work/conda_build.sh']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/conda/bin/conda-build", line 11, in <module>
    sys.exit(execute())
             ^^^^^^^^^
  File "/opt/conda/lib/python3.12/site-packages/conda_build/cli/main_build.py", line 589, in execute
    api.build(
  File "/opt/conda/lib/python3.12/site-packages/conda_build/api.py", line 209, in build
    return build_tree(
           ^^^^^^^^^^^
  File "/opt/conda/lib/python3.12/site-packages/conda_build/build.py", line 3655, in build_tree
    packages_from_this = build(
                         ^^^^^^
  File "/opt/conda/lib/python3.12/site-packages/conda_build/build.py", line 2563, in build
    raise BuildScriptException(str(exc), caused_by=exc) from exc
conda_build.exceptions.BuildScriptException: Command '['/bin/bash', '-o', 'errexit', '/home/conda/feedstock_root/build_artifacts/libtorch_1729889670574/work/conda_build.sh']' returned non-zero exit status 1.
valid configs are {'linux_64_blas_implgenericc_compiler_version13cuda_compilerNonecuda_compiler_versionNonecxx_compiler_version13', 'osx_64_blas_implgenericnumpy2.0python3.10.____cpython', 'osx_arm64_numpy2.0python3.9.____cpython', 'osx_arm64_numpy2.0python3.11.____cpython', 'linux_aarch64_c_compiler_version13c_stdlib_version2.17cuda_compilerNonecuda_compiler_versionNonecxx_compiler_version13', 'osx_64_blas_implmklnumpy2python3.13.____cp313', 'osx_64_blas_implgenericnumpy2python3.13.____cp313', 'osx_64_blas_implmklnumpy2.0python3.9.____cpython', 'osx_64_blas_implmklnumpy2.0python3.12.____cpython', 'osx_arm64_numpy2.0python3.12.____cpython', 'osx_64_blas_implgenericnumpy2.0python3.12.____cpython', 'osx_64_blas_implmklnumpy2.0python3.10.____cpython', 'osx_arm64_numpy2.0python3.10.____cpython', 'linux_64_blas_implmklc_compiler_version13cuda_compilerNonecuda_compiler_versionNonecxx_compiler_version13', 'osx_arm64_numpy2python3.13.____cp313', 'linux_64_blas_implmklc_compiler_version12cuda_compilercuda-nvcccuda_compiler_version12.0cxx_compiler_version12', 'linux_64_blas_implgenericc_compiler_version11cuda_compilernvcccuda_compiler_version11.8cxx_compiler_version11', 'osx_64_blas_implgenericnumpy2.0python3.9.____cpython', 'osx_64_blas_implmklnumpy2.0python3.11.____cpython', 'osx_64_blas_implgenericnumpy2.0python3.11.____cpython', 'linux_64_blas_implgenericc_compiler_version12cuda_compilercuda-nvcccuda_compiler_version12.0cxx_compiler_version12', 'linux_aarch64_c_compiler_version12c_stdlib_version2.28cuda_compilercuda-nvcccuda_compiler_version12.0cxx_compiler_version12', 'linux_64_blas_implmklc_compiler_version11cuda_compilernvcccuda_compiler_version11.8cxx_compiler_version11'}
Using linux_64_blas_implmklc_compiler_version11cuda_compilernvcccuda_compiler_version11.8cxx_compiler_version11 configuration
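
A note on reading the log above: the `Bad wheel filename 'torch-*.whl'` line appears to be a secondary symptom. Because the wheel build had already failed, nothing matched the pattern, and the literal text `torch-*.whl` seems to have reached `wheel unpack`. The sketch below is not the feedstock's build script. It is a hypothetical Python helper, `find_single_wheel`, showing one way to fail early with a clearer message when no wheel was produced.

```python
from pathlib import Path


def find_single_wheel(directory: str, pattern: str = "torch-*.whl") -> Path:
    """Return the one wheel matching `pattern`, or raise a descriptive error."""
    matches = sorted(Path(directory).glob(pattern))
    if not matches:
        # An empty match list means the earlier build step produced no wheel.
        raise FileNotFoundError(
            f"no file matching {pattern!r} in {directory}; "
            "check the wheel build step for the original error"
        )
    if len(matches) > 1:
        names = ", ".join(path.name for path in matches)
        raise RuntimeError(f"expected exactly one wheel, found: {names}")
    return matches[0]


if __name__ == "__main__":
    try:
        print(find_single_wheel("."))
    except (FileNotFoundError, RuntimeError) as error:
        print(f"error: {error}")
```

The point of the guard is only to surface the first failure (here, the ninja build stopping) rather than a confusing follow-on error.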

@hmaarrfk
Contributor

It seems CUDA 11.8 isn't working on linux_64 either. I'm not too interested in it in 2024, so we can likely drop it.

If you want to drop it, please say so in the first post so we can get others' feedback.

The only

linux_64_blas_implmklc_compiler_version11cuda_compilernvcccuda_compiler_version11.8cxx_compiler_version11-log.txt

For linuxaarch it seems like you are missing qemu-user-static (ubuntu package name) on your host machine. That isn't documented so well.

/python3.11: cannot execute binary file: Exec format error

but compilation seems to be working!!!!

We assume you can "emulate" aarch which requires a host package (outside of docker) like

@jeongseok-meta
Author

Sounds good! Made a comment: #270 (comment)

> For linuxaarch it seems like you are missing qemu-user-static (ubuntu package name) on your host machine. That isn't documented so well.

Rebuilding with the package installed... ⏳ Thanks!
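
For context on the `Exec format error` discussed above: running aarch64 binaries on an x86_64 host relies on the kernel's binfmt_misc mechanism handing them to an emulator. The sketch below is one way to check, before starting a long build, whether such a handler is registered. It assumes the handler is exposed as `/proc/sys/fs/binfmt_misc/qemu-aarch64`, the entry name commonly created by the qemu-user-static package. That name and the `emulation_registered` helper are assumptions, not something stated in this thread.

```python
from pathlib import Path


def emulation_registered(
    arch: str = "aarch64",
    binfmt_dir: str = "/proc/sys/fs/binfmt_misc",
) -> bool:
    """Return True if a qemu handler for `arch` is registered and enabled."""
    # Assumed entry name; other setups may register the handler differently.
    entry = Path(binfmt_dir) / f"qemu-{arch}"
    try:
        first_line = entry.read_text().splitlines()[0]
    except (OSError, IndexError):
        # Missing entry, unreadable file, or empty file: treat as not registered.
        return False
    # A binfmt_misc entry reports "enabled" or "disabled" on its first line.
    return first_line.strip() == "enabled"


if __name__ == "__main__":
    if emulation_registered():
        print("aarch64 emulation appears to be available")
    else:
        print("no enabled qemu-aarch64 handler found; install qemu-user-static on the host")
```

Because the handler is registered with the host kernel, this check has to pass on the host, not just inside the build container.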

@hmaarrfk
Contributor

Don’t you want ppc as well? You might as well try. You are on a roll

@jslee02

jslee02 commented Oct 26, 2024

New build log with qemu-user-static: log_files.zip

Only one failed: linux_64_blas_implmklc_compiler_version11cuda_compilernvcccuda_compiler_version11.8cxx_compiler_version11-log.txt

> Don’t you want ppc as well? You might as well try. You are on a roll

Yes, but I'd like to keep this change minimal and address the rest in follow-up PRs.

@jeongseok-meta
Author

I am not sure if we want to drop 11.8 + mkl (11.8 + generic works though) or fix it in this PR. Please let me know how you would like to proceed!
