Segfault when importing sklearn as well, osx-arm64 #260

Open
1 task done
danpetry opened this issue Sep 9, 2024 · 16 comments
Labels
bug Something isn't working help wanted Extra attention is needed

Comments

@danpetry

danpetry commented Sep 9, 2024

Solution to issue cannot be found in the documentation.

  • I checked the documentation.

Issue

This issue affects the conda-forge version too.

The fix is to export PACKAGE_TYPE="conda" in the recipe's build script. The upstream change that introduced the bug makes the build copy an OpenMP runtime (libomp.dylib) into the site-packages directory and link against that copy whenever this environment variable is not set.
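One way to check for this symptom, assuming the crash comes from two different libomp copies (the conda environment's llvm-openmp and the copy bundled under torch/lib) being loaded into the same process, is to ask threadpoolctl (a scikit-learn dependency) which OpenMP runtimes are live. `threadpoolctl.threadpool_info()` is the real threadpoolctl API; the helper names below are hypothetical and this is only a diagnostic sketch.

```python
import os
from typing import Iterable, Mapping


def distinct_openmp_runtimes(infos: Iterable[Mapping]) -> list[str]:
    """Return the distinct OpenMP library files in threadpoolctl-style records.

    Each record is expected to look like an entry of
    threadpoolctl.threadpool_info(): a dict with at least the keys
    "user_api" and "filepath".
    """
    seen: dict[str, str] = {}
    for info in infos:
        if info.get("user_api") != "openmp":
            continue
        path = info.get("filepath")
        if not path:
            continue
        # Resolve symlinks so two names for the same file count only once.
        seen.setdefault(os.path.realpath(path), path)
    return sorted(seen)


def check_current_process() -> list[str]:
    """Hypothetical helper: inspect OpenMP runtimes in the running interpreter.

    scikit-learn and torch are imported first so that their native libraries
    are loaded; more than one path in the result suggests two OpenMP
    runtimes are live in the same process.
    """
    import sklearn  # noqa: F401  imported only to load its native libraries
    import torch  # noqa: F401  imported only to load libtorch and its libomp
    from threadpoolctl import threadpool_info

    return distinct_openmp_runtimes(threadpool_info())
```

In an affected environment, `print(check_current_process())` would be expected to list both the environment's `lib/libomp.dylib` and `site-packages/torch/lib/libomp.dylib`; that expectation is an assumption based on the `otool -L` output in this thread, not a recorded result.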

Some more info:
Output of otool -L on the libraries, showing what they link against:

pytorch v2.2:

/Users/dpetry/miniconda3/envs/test_torch_torchonly/lib/python3.12/site-packages/torch/lib/libtorch_cpu.dylib:
    ...
	@rpath/libomp.dylib (compatibility version 5.0.0, current version 5.0.0)
	...

pytorch v2.3:

/Users/dpetry/miniconda3/envs/test_torch_torchonly/lib/python3.12/site-packages/torch/lib/libomp.dylib:
	@rpath/libomp.dylib (compatibility version 5.0.0, current version 5.0.0)
	...
	
/Users/dpetry/miniconda3/envs/test_torch_torchonly/lib/python3.12/site-packages/torch/lib/libtorch_cpu.dylib:
	...
	@loader_path/libomp.dylib (compatibility version 5.0.0, current version 5.0.0)
	...
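The key difference between the two outputs above is the install name used for libomp: `@rpath/libomp.dylib` (resolved through the library's rpath, which in a conda build typically points into the environment's own lib directory) versus `@loader_path/libomp.dylib` (resolved relative to libtorch_cpu.dylib itself, i.e. the copy bundled in torch/lib). A small sketch that classifies `otool -L` output this way; the function names are hypothetical and the parsing assumes otool's usual layout of a header line ending in ":" followed by indented dependency lines.

```python
import re

# One dependency line of `otool -L`, e.g.
#   "\t@rpath/libomp.dylib (compatibility version 5.0.0, current version 5.0.0)"
_DEP_LINE = re.compile(r"^\s+(\S+)\s+\(compatibility version")


def libomp_references(otool_output: str) -> dict[str, list[str]]:
    """Map each inspected library to the libomp install names it lists.

    Header lines start at column 0 and end with ":"; the indented lines
    after a header are that file's dependencies (for a dylib, the first
    one is its own install name).
    """
    result: dict[str, list[str]] = {}
    current = None
    for line in otool_output.splitlines():
        if line and not line[0].isspace() and line.rstrip().endswith(":"):
            current = line.rstrip()[:-1]
            result[current] = []
            continue
        match = _DEP_LINE.match(line)
        if current is not None and match and "libomp" in match.group(1):
            result[current].append(match.group(1))
    return result


def uses_bundled_libomp(otool_output: str) -> bool:
    """True if any inspected library loads libomp relative to its own location."""
    return any(
        name.startswith("@loader_path/")
        for names in libomp_references(otool_output).values()
        for name in names
    )
```

Fed the v2.2 output above, `uses_bundled_libomp` returns False; fed the v2.3 output, it returns True because libtorch_cpu.dylib references `@loader_path/libomp.dylib`.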

Installed packages

n/a

Environment info

n/a
@danpetry danpetry added the bug Something isn't working label Sep 9, 2024
@hmaarrfk
Contributor

hmaarrfk commented Sep 9, 2024

The referenced issue (pytorch/pytorch#132372 (comment)) points to a user reproducing the bug with Anaconda's default channel. We do not use packages from the Anaconda default channel.

If you can reproduce it using only the conda-forge channel, for example with:

conda create --name pt python=3.11 pytorch scikit-learn --channel conda-forge --override-channels

then we can start to investigate. Please provide all the outputs requested in the issue template; they are critical for troubleshooting.

@hmaarrfk
Contributor

hmaarrfk commented Sep 9, 2024

I also worked on a similar issue two months ago (#244), so getting the full output from you would really help with troubleshooting.

@danpetry
Author

danpetry commented Sep 12, 2024

Yes, I can reproduce it with the conda-forge channel. I will provide more info shortly.

@danpetry
Author

Here's the full output. It affects your v2.3.0 but not v2.4.0. In any case, I thought it would be useful to give you a heads-up about the PACKAGE_TYPE="conda" environment variable that upstream PyTorch uses, since their build scripts seem to assume it is set when building conda packages.

(py312-torch) dpetry@Daniels-MacBook-Pro ~/Sandbox/aggregate (master) $ conda create --name pt python=3.11 pytorch scikit-learn --channel conda-forge --override-channels
Channels:
 - conda-forge
Platform: osx-arm64
Collecting package metadata (repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /Users/dpetry/miniconda3/envs/pt

  added / updated specs:
    - python=3.11
    - pytorch
    - scikit-learn


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    filelock-3.16.0            |     pyhd8ed1ab_0          17 KB  conda-forge
    fsspec-2024.9.0            |     pyhff2d567_0         131 KB  conda-forge
    gmpy2-2.1.5                |  py311hb5ce3a2_2         144 KB  conda-forge
    libtorch-2.4.0             |cpu_generic_h15ee98c_1        27.2 MB  conda-forge
    markupsafe-2.1.5           |  py311h460d6c5_1          26 KB  conda-forge
    mpc-1.3.1                  |       h8f1351a_1         102 KB  conda-forge
    mpfr-4.2.1                 |       hb693164_3         337 KB  conda-forge
    numpy-2.1.1                |  py311h6de8079_0         6.7 MB  conda-forge
    python-3.11.10             |h739c21a_0_cpython        13.9 MB  conda-forge
    python_abi-3.11            |          5_cp311           6 KB  conda-forge
    pytorch-2.4.0              |cpu_generic_py311h8ecd042_1        23.9 MB  conda-forge
    scikit-learn-1.5.2         |  py311h9e23f0f_1         9.3 MB  conda-forge
    scipy-1.14.1               |  py311h2929bc6_0        14.7 MB  conda-forge
    ------------------------------------------------------------
                                           Total:        96.4 MB

The following NEW packages will be INSTALLED:

  bzip2              conda-forge/osx-arm64::bzip2-1.0.8-h99b78c6_7 
  ca-certificates    conda-forge/osx-arm64::ca-certificates-2024.8.30-hf0a4a13_0 
  filelock           conda-forge/noarch::filelock-3.16.0-pyhd8ed1ab_0 
  fsspec             conda-forge/noarch::fsspec-2024.9.0-pyhff2d567_0 
  gmp                conda-forge/osx-arm64::gmp-6.3.0-h7bae524_2 
  gmpy2              conda-forge/osx-arm64::gmpy2-2.1.5-py311hb5ce3a2_2 
  jinja2             conda-forge/noarch::jinja2-3.1.4-pyhd8ed1ab_0 
  joblib             conda-forge/noarch::joblib-1.4.2-pyhd8ed1ab_0 
  libabseil          conda-forge/osx-arm64::libabseil-20240116.2-cxx17_h00cdb27_1 
  libblas            conda-forge/osx-arm64::libblas-3.9.0-23_osxarm64_openblas 
  libcblas           conda-forge/osx-arm64::libcblas-3.9.0-23_osxarm64_openblas 
  libcxx             conda-forge/osx-arm64::libcxx-18.1.8-h3ed4263_7 
  libexpat           conda-forge/osx-arm64::libexpat-2.6.3-hf9b8971_0 
  libffi             conda-forge/osx-arm64::libffi-3.4.2-h3422bc3_5 
  libgfortran        conda-forge/osx-arm64::libgfortran-5.0.0-13_2_0_hd922786_3 
  libgfortran5       conda-forge/osx-arm64::libgfortran5-13.2.0-hf226fd6_3 
  liblapack          conda-forge/osx-arm64::liblapack-3.9.0-23_osxarm64_openblas 
  libopenblas        conda-forge/osx-arm64::libopenblas-0.3.27-openmp_h517c56d_1 
  libprotobuf        conda-forge/osx-arm64::libprotobuf-4.25.3-hbfab5d5_0 
  libsqlite          conda-forge/osx-arm64::libsqlite-3.46.1-hc14010f_0 
  libtorch           conda-forge/osx-arm64::libtorch-2.4.0-cpu_generic_h15ee98c_1 
  libuv              conda-forge/osx-arm64::libuv-1.48.0-h93a5062_0 
  libzlib            conda-forge/osx-arm64::libzlib-1.3.1-hfb2fe0b_1 
  llvm-openmp        conda-forge/osx-arm64::llvm-openmp-18.1.8-hde57baf_1 
  markupsafe         conda-forge/osx-arm64::markupsafe-2.1.5-py311h460d6c5_1 
  mpc                conda-forge/osx-arm64::mpc-1.3.1-h8f1351a_1 
  mpfr               conda-forge/osx-arm64::mpfr-4.2.1-hb693164_3 
  mpmath             conda-forge/noarch::mpmath-1.3.0-pyhd8ed1ab_0 
  ncurses            conda-forge/osx-arm64::ncurses-6.5-h7bae524_1 
  networkx           conda-forge/noarch::networkx-3.3-pyhd8ed1ab_1 
  nomkl              conda-forge/noarch::nomkl-1.0-h5ca1d4c_0 
  numpy              conda-forge/osx-arm64::numpy-2.1.1-py311h6de8079_0 
  openssl            conda-forge/osx-arm64::openssl-3.3.2-h8359307_0 
  pip                conda-forge/noarch::pip-24.2-pyh8b19718_1 
  python             conda-forge/osx-arm64::python-3.11.10-h739c21a_0_cpython 
  python_abi         conda-forge/osx-arm64::python_abi-3.11-5_cp311 
  pytorch            conda-forge/osx-arm64::pytorch-2.4.0-cpu_generic_py311h8ecd042_1 
  readline           conda-forge/osx-arm64::readline-8.2-h92ec313_1 
  scikit-learn       conda-forge/osx-arm64::scikit-learn-1.5.2-py311h9e23f0f_1 
  scipy              conda-forge/osx-arm64::scipy-1.14.1-py311h2929bc6_0 
  setuptools         conda-forge/noarch::setuptools-73.0.1-pyhd8ed1ab_0 
  sleef              conda-forge/osx-arm64::sleef-3.6.1-h7783ee8_3 
  sympy              conda-forge/noarch::sympy-1.13.2-pypyh2585a3b_103 
  threadpoolctl      conda-forge/noarch::threadpoolctl-3.5.0-pyhc1e730c_0 
  tk                 conda-forge/osx-arm64::tk-8.6.13-h5083fa2_1 
  typing_extensions  conda-forge/noarch::typing_extensions-4.12.2-pyha770c72_0 
  tzdata             conda-forge/noarch::tzdata-2024a-h8827d51_1 
  wheel              conda-forge/noarch::wheel-0.44.0-pyhd8ed1ab_0 
  xz                 conda-forge/osx-arm64::xz-5.2.6-h57fd34a_0 


Proceed ([y]/n)? y


Downloading and Extracting Packages:

Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use
#
#     $ conda activate pt
#
# To deactivate an active environment, use
#
#     $ conda deactivate

(py312-torch) dpetry@Daniels-MacBook-Pro ~/Sandbox/aggregate (master) $ conda activate pt
(pt) dpetry@Daniels-MacBook-Pro ~/Sandbox/aggregate (master) $ python -c "import sklearn;import torch;import numpy;torch.tensor(numpy.zeros((33000,)))"
(pt) dpetry@Daniels-MacBook-Pro ~/Sandbox/aggregate (master) $ conda install pytorch=2.3.0 -c conda-forge --override-channels
Channels:
 - conda-forge
Platform: osx-arm64
Collecting package metadata (repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /Users/dpetry/miniconda3/envs/pt

  added / updated specs:
    - pytorch=2.3.0


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    libtorch-2.3.0             |cpu_generic_hac4f340_1        27.0 MB  conda-forge
    pytorch-2.3.0              |cpu_generic_py311h82099cb_1        50.7 MB  conda-forge
    ------------------------------------------------------------
                                           Total:        77.7 MB

The following packages will be DOWNGRADED:

  libtorch                     2.4.0-cpu_generic_h15ee98c_1 --> 2.3.0-cpu_generic_hac4f340_1 
  pytorch                 2.4.0-cpu_generic_py311h8ecd042_1 --> 2.3.0-cpu_generic_py311h82099cb_1 


Proceed ([y]/n)? y


Downloading and Extracting Packages:

Preparing transaction: done
Verifying transaction: done
Executing transaction: done
(pt) dpetry@Daniels-MacBook-Pro ~/Sandbox/aggregate (master) $ python -c "import sklearn;import torch;import numpy;torch.tensor(numpy.zeros((33000,)))"
Segmentation fault: 11
(pt) dpetry@Daniels-MacBook-Pro ~/Sandbox/aggregate (master) $ 
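To rerun the reproducer above repeatedly (for example, across several builds) without the crash taking down the calling script, one option is to run it in a child interpreter and inspect the exit status. On POSIX systems `subprocess` reports a child killed by a signal as a negative return code, so a segfault shows up as `-signal.SIGSEGV`. The helper name `run_snippet` is hypothetical; the reproducer string is the one-liner from the transcript.

```python
import signal
import subprocess
import sys

# The reproducer from the transcript above.
REPRO = "import sklearn;import torch;import numpy;torch.tensor(numpy.zeros((33000,)))"


def run_snippet(code: str, python: str = sys.executable, timeout: float = 300) -> str:
    """Run `code` in a fresh interpreter and classify the outcome.

    Returns "ok", "segfault", or "error (exit N)". POSIX only: a child
    killed by a signal has a negative returncode equal to -signum.
    """
    proc = subprocess.run(
        [python, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    if proc.returncode == 0:
        return "ok"
    if proc.returncode == -signal.SIGSEGV:
        return "segfault"
    return f"error (exit {proc.returncode})"
```

Run with the environment's interpreter, `run_snippet(REPRO)` would be expected to return "segfault" with pytorch 2.3.0 and "ok" with 2.4.0, matching the transcript.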

@hmaarrfk
Contributor

Thank you for the detailed installation steps and reproduction.

So from your experiments I can see that:

  • PyTorch 2.4.0: no issue
  • PyTorch 2.3.0: segfault

Can you check whether

python -c "import numpy; import torch; torch.zeros((1024, 1024), dtype=torch.uint8)"

reproduces the crash with pytorch 2.3.0 build 0?

I think I hit this issue in #243.

We fixed it in 2.3.1 build 1 (#244).

I'm honestly not sure how far back to mark the pytorch M1 builds as broken.

If you do the investigation, using only the conda-forge channel, and show your work, I can try to merge a PR that you open at
https://github.com/conda-forge/admin-requests

Unfortunately I do not have an osx-arm64 machine, so it isn't easy for me to test.
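A sketch of how such an investigation could be scripted, assuming one throwaway environment per pytorch version/build number, created from conda-forge only and then tested with the reproducers quoted in this thread via `conda run`. The helper names, environment names and candidate list are hypothetical; the `conda create` flags mirror the command used earlier in this thread, and `name=version=build` is conda's match-spec form, with the glob `*_N` selecting build number N.

```python
import shlex

# Reproducers quoted in this thread: the transcript's one-liner and the
# "build 0" check suggested earlier in this comment.
REPROS = (
    "import sklearn;import torch;import numpy;torch.tensor(numpy.zeros((33000,)))",
    "import numpy; import torch; torch.zeros((1024, 1024), dtype=torch.uint8)",
)


def create_cmd(env: str, version: str, build_number: int) -> list[str]:
    """`conda create` for one pytorch version/build number, conda-forge only."""
    return [
        "conda", "create", "--yes", "--name", env,
        "--channel", "conda-forge", "--override-channels",
        "python=3.11", "scikit-learn",
        # name=version=build match spec; "*_N" matches build strings ending in _N.
        f"pytorch={version}=*_{build_number}",
    ]


def repro_cmd(env: str, code: str) -> list[str]:
    """Run one reproducer inside an existing environment."""
    return ["conda", "run", "--name", env, "python", "-c", code]


def plan(candidates: list[tuple[str, int]]) -> list[str]:
    """Shell commands: one create plus one run per reproducer, per candidate."""
    lines = []
    for version, build in candidates:
        env = f"pt-{version}-b{build}"
        lines.append(shlex.join(create_cmd(env, version, build)))
        lines.extend(shlex.join(repro_cmd(env, code)) for code in REPROS)
    return lines
```

For example, `print("\n".join(plan([("2.3.0", 0), ("2.3.0", 1)])))` prints the commands to paste into a shell on an osx-arm64 machine. The sketch does not check that each build actually exists on conda-forge; `conda create` will fail for builds that were never published.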

@danpetry
Author

OK, cool. I can do this if you feel it would be useful to your users. I maintain the pytorch recipe at Anaconda, so this isn't work that benefits me directly; I just thought making you aware of it might be helpful. But I'm happy to do it if you feel it's valuable. If it's low-priority and it's just making more work for you then no worries. Let me know :)

@danpetry
Author

danpetry commented Sep 13, 2024

And in general, let me know to what extent collaboration would be welcome

@hmaarrfk
Contributor

> If it's low-priority and it's just making more work for you then no worries.

It's not low priority so much as "I can't really test it".

> But I'm happy to do it if you feel it's valuable.

It is, but it might not be worth your time.

> And in general, let me know to what extent collaboration would be welcome

Generally speaking, I miss the days when the recipes were similar between conda-forge and Anaconda; sharing ideas was much simpler. However, with things diverging, I'm not sure to what extent that is possible. Where is your pytorch recipe stored today?

My general understanding was that collaboration was still happening on the python recipe (python-feedstock), which introduced some small compatibility shims:
https://github.com/conda-forge/python-feedstock/blob/main/recipe/meta.yaml#L67

I'm open to adding such shims. We have been having trouble with the aarch64 builds and that has been weighing me down.

Help there would be greatly appreciated! See #256

@danpetry
Author

Our recipe is here. It diverges a fair amount, but there is some overlap too (I want to pull across your approach of building libtorch only once).
It would be great to collaborate on v2.5.0 of pytorch, and I can have a look at the aarch64 issue if I get time before then.

@hmaarrfk hmaarrfk added the help wanted Extra attention is needed label Sep 26, 2024
@danpetry
Author

@hmaarrfk, what was the aarch64 issue you were having, specifically? I couldn't really tell from a glance over the issue. Are you still having it?

@danpetry
Author

Was it this one: the "PyTorch was compiled without NumPy support" error when running on Linux aarch64 + CUDA (on NVIDIA GH200) using the conda-forge build of PyTorch 2.4.0?

@hmaarrfk
Contributor

See #244 and #243

@danpetry
Author

danpetry commented Oct 25, 2024

Thanks. We heard that PyTorch is dropping its conda package builds. How is support for Windows and Triton at the moment? As I understand it, you don't have either. We're keen to help avoid losing users from the conda ecosystem, so please let me know what the status is there. I've also seen the open PR for v2.5.0, so I will have a look and see if I can debug anything.

@hmaarrfk
Contributor

You can find a list of issues in #273.

I've tagged them all. It's all volunteer-led, so comment on the appropriate one, or make a PR if you want to see it improve.

So help where you can. To companies that are interested in Windows support, I simply suggest volunteering to build things. It goes a long way.

@danpetry
Author

Oh, so the conda-forge CI can't handle the PyTorch build for Windows? Hmm, OK.

@danpetry
Author

Good to know
