
magma: unbreak for cudaPackages_12 #283640

Merged 1 commit into NixOS:master on Jan 25, 2024

Conversation

SomeoneSerge
Contributor

#269639 seems to have broken magma (and torch, and what not); #281656 (comment) suggests a hot fix and it seems to work. CC @samuela @dmayle


Things done

  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandboxing enabled in nix.conf? (See Nix manual)
    • sandbox = relaxed
    • sandbox = true
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 24.05 Release Notes (or backporting 23.05 and 23.11 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
  • Fits CONTRIBUTING.md.


Hotfix based on the suggestion from NixOS#281656 (comment)
@SomeoneSerge SomeoneSerge added the 6.topic: cuda (Parallel computing platform and API) label Jan 25, 2024
Comment on lines +145 to +146

```nix
] ++ lists.optionals (cudaPackages.cudaAtLeast "12.0.0") [
  (lib.cmakeBool "USE_FORTRAN" false)
```
Member

any reason not to do

```nix
(lib.cmakeBool "USE_FORTRAN" !(cudaPackages.cudaAtLeast "12.0.0"))
```

or something similar?
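As written, that snippet doesn't quite parse as Nix: a unary `!` expression can't appear bare in function-argument position, so the unconditional variant needs an extra pair of parentheses. A minimal sketch:

```nix
cmakeFlags = [
  # Unconditional variant: Fortran stays enabled for CUDA < 12 and is
  # disabled for CUDA >= 12. The parentheses around the `!` expression
  # are required for it to parse as a function argument.
  (lib.cmakeBool "USE_FORTRAN" (!cudaPackages.cudaAtLeast "12.0.0"))
];
```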

SomeoneSerge (Contributor, Author)

Didn't test how the option affects CUDA 11 (it's somewhat late in the night...); this way we only specify it for CUDA 12.

@ofborg ofborg bot requested a review from ConnorBaker January 25, 2024 02:15
@ofborg ofborg bot added the 10.rebuild-darwin: 0 (This PR does not cause any packages to rebuild on Darwin) and 10.rebuild-linux: 11-100 labels Jan 25, 2024
@ConnorBaker
Contributor

Is there a downside to taking the approach of specifying the Fortran name-mangling convention?

Also, is there a downside to disabling Fortran for Magma?
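For illustration, a minimal sketch of what specifying the convention might look like, assuming magma's CMake consults a `FORTRAN_CONVENTION` cache variable when Fortran is disabled (the variable name is an assumption here and should be verified against magma's CMakeLists.txt):

```nix
cmakeFlags = [
  # Keep Fortran off, but pin the name-mangling scheme the BLAS/LAPACK
  # interfaces expect; "-DADD_" (append an underscore) is the usual
  # gfortran convention. FORTRAN_CONVENTION is assumed, not confirmed.
  (lib.cmakeBool "USE_FORTRAN" false)
  (lib.cmakeFeature "FORTRAN_CONVENTION" "-DADD_")
];
```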

@ConnorBaker
Contributor

Well, nix-cuda-test built for both CUDA 11 and 12; I'm running the small ViT model to see whether there's any sort of horrible regression caused by disabling Fortran.

@ConnorBaker
Contributor

No crazy differences, so I think it's good to go as a hot fix. Let's get it merged!

$ ./pr-283640-cuda-11/bin/nix-cuda-test
Seed set to 42
Using bfloat16 Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
Missing logger folder: /home/connorbaker/nix-cuda-test/lightning_logs
Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to data/cifar-10-python.tar.gz
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 170498071/170498071 [00:02<00:00, 78881307.89it/s]
Extracting data/cifar-10-python.tar.gz to data
Files already downloaded and verified
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

  | Name      | Type             | Params
-----------------------------------------------
0 | criterion | CrossEntropyLoss | 0     
1 | model     | ViT              | 86.3 M
-----------------------------------------------
86.3 M    Trainable params
0         Non-trainable params
86.3 M    Total params
345.317   Total estimated model params size (MB)
Epoch 9: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 781/781 [01:28<00:00,  8.80it/s, v_num=0, train_loss=2.350, val_loss=2.330]
`Trainer.fit` stopped: `max_epochs=10` reached.
Epoch 9: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 781/781 [01:30<00:00,  8.62it/s, v_num=0, train_loss=2.350, val_loss=2.330]
$ ./pr-283640-cuda-12/bin/nix-cuda-test
Seed set to 42
Using bfloat16 Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
Files already downloaded and verified
Files already downloaded and verified
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

  | Name      | Type             | Params
-----------------------------------------------
0 | criterion | CrossEntropyLoss | 0     
1 | model     | ViT              | 86.3 M
-----------------------------------------------
86.3 M    Trainable params
0         Non-trainable params
86.3 M    Total params
345.317   Total estimated model params size (MB)
Epoch 9: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 781/781 [01:26<00:00,  9.04it/s, v_num=1, train_loss=2.350, val_loss=2.330]
`Trainer.fit` stopped: `max_epochs=10` reached.
Epoch 9: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 781/781 [01:28<00:00,  8.86it/s, v_num=1, train_loss=2.350, val_loss=2.330]

@ConnorBaker ConnorBaker merged commit eea875a into NixOS:master Jan 25, 2024
26 checks passed
@delroth delroth added the 12.approvals: 2 (This PR was reviewed and approved by two reputable people) label Jan 25, 2024
@samuela (Member) commented Jan 25, 2024

@ConnorBaker AFAIK the effective linear algebra backend is selected via the torch.backends.cuda.preferred_linalg_library function in PyTorch. Options are "cusolver" and "magma", with the default being a heuristic choice made by PyTorch. So if cuSOLVER is available in your environment, it's possible that magma is not used at all.

Also, IIUC, magma/cuSOLVER are only utilized for PyTorch functions related to matrix inversion, QR/Cholesky/LU decomposition, and SVD. A ViT training loop will not include any of these AFAIK.

tl;dr: I'm a big fan of running benchmarks, but I'm concerned that we may not have benchmarked quite the right thing in this case.

@ConnorBaker
Contributor

Ah, that's a good point. I also like benchmarks, especially when they measure the thing they're supposed to be benchmarking :/

I need to look into adding Magma's test cases as a passthru or something and exposing them separately.

In the meantime, I'm going to take the approach of specifying name-mangling as part of a general update to the magma package.
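Picking up samuela's point above, a hedged sketch of what such a passthru test could look like: it forces the MAGMA backend via torch.backends.cuda.preferred_linalg_library before exercising a decomposition. The runCommand wiring and attribute names are illustrative only, and it assumes a builder that actually exposes a CUDA GPU, which the default Nix sandbox does not:

```nix
passthru.tests.torch-linalg = runCommand "magma-torch-linalg"
  {
    nativeBuildInputs = [ (python3.withPackages (ps: [ ps.torch ])) ];
    # Hypothetical: restrict to builders that expose a CUDA GPU.
    requiredSystemFeatures = [ "cuda" ];
  }
  ''
  python3 -c '
  import torch
  # Force MAGMA so the test exercises it rather than cuSOLVER.
  torch.backends.cuda.preferred_linalg_library("magma")
  a = torch.randn(64, 64, device="cuda")
  torch.linalg.qr(a)  # QR is dispatched to the selected linalg backend
  '
  touch $out
  '';
```

Running the operations samuela listed (matrix inversion, QR/Cholesky/LU decomposition, SVD) against both backends would make a regression in the magma path directly visible.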

@samuela (Member) commented Jan 25, 2024

Yeah, sorry to be the parade-rainer here! But it's not a big deal ultimately... working magma > broken magma
