First of all, thank you for sharing the PyTorch implementation, it's wonderful.
I've been going over the code and found this line:
binary_weights = binary_weights_no_grad.detach() - cliped_weights.detach() + cliped_weights
in birealnet.py, and I was wondering what its purpose is.
My best guess is that it merely allows gradients to flow without actually changing the values of the binary weights, but some clarification would be wonderful!
The purpose is that the "forward" value is the binarized weight (binary_weights_no_grad), while the value used for obtaining the gradient (the "backward" value) is the clamped weight (cliped_weights).
The same trick is used in binary_activation: a binarized value for the forward pass, and the differentiable approximation for the backward pass.
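A minimal sketch of how that line behaves (variable names and the scaling-factor computation are simplified assumptions, not copied from birealnet.py): subtracting the detached clamp and adding it back leaves the forward value equal to the binarized weight, while the backward pass sees only the clamp.

```python
import torch

# Real-valued latent weights (fixed values so the gradient check is deterministic).
w = torch.tensor([0.5, -2.0, 1.5, -0.3], requires_grad=True)

cliped_weights = torch.clamp(w, -1.0, 1.0)       # "backward" value: clamp has gradient 1 inside [-1, 1]
scale = w.abs().mean().detach()                  # hypothetical scaling factor, detached
binary_weights_no_grad = scale * torch.sign(w)   # "forward" value: sign(w) * scale, no useful gradient

# The straight-through line from birealnet.py:
binary_weights = binary_weights_no_grad.detach() - cliped_weights.detach() + cliped_weights

# Forward: the detached terms cancel cliped_weights exactly,
# so the value is the binarized weight.
assert torch.allclose(binary_weights.detach(), binary_weights_no_grad.detach())

# Backward: gradients flow only through the undetached cliped_weights,
# i.e. d(binary_weights)/dw is 1 where |w| < 1 and 0 where |w| > 1.
binary_weights.sum().backward()
print(w.grad)  # tensor([1., 0., 0., 1.])
```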
binary_weights_no_grad is a floating-point tensor, sign(w) * scale. After training is done, how can it be converted to a binary weight? I naively tried just using sign(w), without positive results: after applying sign(w) to the trained weights, the network no longer worked.
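One plausible explanation, sketched below under the assumption of a per-output-channel scale (I have not verified this against the repo's export path): sign(w) alone discards the learned scale, so every weight's magnitude changes. Storing the 1-bit signs together with the float scale, and multiplying them back at inference, reproduces the trained forward value exactly.

```python
import torch

# Hypothetical conv weights: 8 output channels of shape 3x3x3.
w = torch.randn(8, 3, 3, 3)

# Assumed per-output-channel scaling factor (mean of |w| over each filter).
scale = w.abs().mean(dim=(1, 2, 3), keepdim=True)

# What the network actually trained with as its forward weight:
binary_forward = scale * torch.sign(w)

# A "truly binary" export: 1-bit signs plus one float per channel.
signs = torch.sign(w)                 # values in {-1, 0, +1}; storable as bits
reconstructed = signs * scale         # applied again at inference time

# The reconstruction matches the trained forward weights exactly,
# whereas plain sign(w) has the right signs but the wrong magnitudes.
assert torch.allclose(reconstructed, binary_forward)
assert not torch.allclose(signs, binary_forward)
```

So the weights can still be stored in binary form; the per-channel scale just has to travel with them (or be folded into the following BatchNorm), rather than being dropped.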