fix: Segfault fix for Benchmarks #2432

Merged
narendasan merged 2 commits into pytorch:main on Nov 3, 2023

Collaborator

@gs-olive gs-olive commented Nov 2, 2023

Description

  • Segfault fix for benchmarking on Docker container with CUDNN 8.8
  • Likely caused by a version mismatch: Torch 2.1.0 is based on CUDNN 8.9, while the container is built with CUDNN 8.8
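
A quick way to spot this kind of mismatch is to compare the cuDNN version Torch reports against the one the container was built with. The sketch below is illustrative only and is not part of this PR: the helper names are hypothetical, the container version is supplied by hand, and the integer encodings are assumptions based on how cuDNN packs its version number (8.x as major * 1000 + minor * 100 + patch, 9.x as major * 10000 + minor * 100 + patch).

```python
from typing import Tuple


def decode_cudnn_version(v: int) -> Tuple[int, int, int]:
    """Decode an integer such as the one returned by torch.backends.cudnn.version().

    Hypothetical helper; the encodings below are assumptions, see the lead-in.
    """
    if v >= 10000:
        # Assumed cuDNN 9+ encoding: major * 10000 + minor * 100 + patch
        return v // 10000, (v % 10000) // 100, v % 100
    # Assumed cuDNN 8.x encoding: major * 1000 + minor * 100 + patch
    return v // 1000, (v % 1000) // 100, v % 100


def same_major_minor(container_version: int, torch_version: int) -> bool:
    """Return True when both versions share the same major.minor pair."""
    return decode_cudnn_version(container_version)[:2] == decode_cudnn_version(torch_version)[:2]


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None  # Torch not installed; nothing to report

    if torch is not None and torch.backends.cudnn.is_available():
        reported = torch.backends.cudnn.version()
        print("cuDNN reported by Torch:", decode_cudnn_version(reported))
        # 8800 stands in for the container's CUDNN 8.8 build (illustrative value)
        print("Matches container build:", same_major_minor(8800, reported))
```

A `False` result here would only flag the mismatch; it does not by itself confirm that the mismatch is the cause of the segfault.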

Type of change

  • Bug fix (non-breaking change which fixes an issue)

Checklist:

  • [ x ] My code follows the style guidelines of this project (You can use the linters)
  • [ x ] I have performed a self-review of my own code
  • [ x ] I have commented my code, particularly in hard-to-understand areas and hacks
  • [ x ] I have made corresponding changes to the documentation
  • [ - ] I have added tests to verify my fix or my feature
    • Tested manually in Docker
  • [ x ] New and existing unit tests pass locally with my changes
  • [ x ] I have added the relevant labels to my PR so that relevant reviewers are notified

@@ -527,7 +528,6 @@ def recordStats(backend, timings, precision, batch_size=1, compile_time_s=None):
)
args = arg_parser.parse_args()

-cudnn.benchmark = True
Collaborator Author

@narendasan - this line causes a segfault at inference time when the Docker container is built with CUDNN 8.8 while Torch 2.1.0 uses the CUDNN 8.9 Python distributions. With the line removed, inference works as expected.

Do you think it would be necessary/important to upgrade the build stack to CUDNN 8.9 for the upcoming release?
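
For reference, one alternative to deleting the line outright would be to make cuDNN autotuning opt-in from the command line. This is only a sketch of that idea, not what this PR does: the `--cudnn-benchmark` flag and both function names are hypothetical and do not exist in the benchmark script.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a minimal parser with a hypothetical opt-in flag."""
    parser = argparse.ArgumentParser(description="Benchmark runner (sketch)")
    # Hypothetical flag; not part of the PR or the original script.
    parser.add_argument(
        "--cudnn-benchmark",
        action="store_true",
        help="Enable cuDNN autotuning (off by default)",
    )
    return parser


def configure_cudnn(enable: bool) -> bool:
    """Apply the setting if Torch is importable and return the value applied.

    Returns False when Torch is not installed.
    """
    try:
        import torch
    except ImportError:
        return False
    torch.backends.cudnn.benchmark = enable
    return enable


if __name__ == "__main__":
    args = build_parser().parse_args()
    applied = configure_cudnn(args.cudnn_benchmark)
    print("cudnn.benchmark set to:", applied)
```

With this shape the default run avoids the setting that triggered the segfault, while users on a matching CUDNN stack could still turn autotuning on explicitly.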

@github-actions bot added the "component: build system" label (Issues re: Build system) Nov 2, 2023
Collaborator

@narendasan narendasan left a comment

LGTM

@narendasan narendasan merged commit 19a11c2 into pytorch:main Nov 3, 2023
17 of 19 checks passed
3 participants