This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

train_imagenet.py fails with float16 + alexnet #16239

@zhreshold @szha
Thanks for looking into this. It looks like a real issue that appears after upgrading to CUDA 10.1.

I wonder whether this is related to mx.symbol.LRN. When I removed the LRN layers from AlexNet, the error disappeared. FYI.
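For anyone who wants to repeat that experiment, here is a minimal sketch of the first AlexNet stage with the LRN layer made optional. The names alexnet_stage1 and use_lrn are hypothetical and do not come from train_imagenet.py, and the layer hyperparameters are the usual AlexNet values, assumed rather than copied from the example script. The sym argument is expected to be the mxnet.symbol module; it is passed in so the function does not need MXNet at import time.

```python
def alexnet_stage1(sym, data, use_lrn=True):
    """Build conv -> relu -> (optional LRN) -> max pool.

    sym  : expected to be the `mxnet.symbol` module (assumption of this sketch)
    data : input symbol, e.g. sym.Variable("data")
    use_lrn : hypothetical flag; set to False to drop the LRN layer
    """
    conv = sym.Convolution(data=data, kernel=(11, 11), stride=(4, 4),
                           num_filter=96, name="conv1")
    act = sym.Activation(data=conv, act_type="relu", name="relu1")
    if use_lrn:
        # The layer suspected above; skipped entirely when use_lrn is False.
        act = sym.LRN(data=act, alpha=0.0001, beta=0.75, knorm=2, nsize=5,
                      name="lrn1")
    return sym.Pooling(data=act, kernel=(3, 3), stride=(2, 2),
                       pool_type="max", name="pool1")
```

With MXNet installed, this would be called as alexnet_stage1(mx.sym, mx.sym.Variable("data"), use_lrn=False) after import mxnet as mx. Toggling the flag while keeping float16 is one way to check whether the failure follows the LRN layer.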

Replies: 7 comments

Answer selected by szha

This discussion was converted from issue #16239 on September 05, 2020 19:32.