
Why not do log_softmax("arch_param") in the graph? #1

Open
jxgu1016 opened this issue Aug 17, 2020 · 4 comments

@jxgu1016 commented Aug 17, 2020

In train_search.py, I noticed that you apply log_softmax() outside of the graph. Why? Why not just use a raw parameter alpha and apply log_softmax() in each forward step?

@AberHu (Owner) commented Aug 17, 2020

Hi, thanks for your attention to our repo.

Originally, for convenience, we define the variable "log_alphas" as the log probability distribution over operations. After each architecture optimization step, this definition is violated, because the gradient update does not keep the values normalized. Following ProxylessNAS (Sec. 3.2.1) and DenseNAS (A.3), we apply log_softmax() outside of the graph to rescale the updated values.
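
For readers following along, here is a minimal sketch of this out-of-graph rescaling, assuming a single log_alphas parameter and an Adam optimizer for the architecture step (names and setup are illustrative, not taken from train_search.py):

    import torch
    import torch.nn.functional as F

    # log_alphas is meant to hold log-probabilities over candidate operations
    num_ops = 4  # hypothetical number of candidate ops
    log_alphas = torch.nn.Parameter(F.log_softmax(torch.zeros(num_ops), dim=-1))
    arch_optimizer = torch.optim.Adam([log_alphas], lr=3e-4)

    # ... compute the architecture loss and call backward() ...
    arch_optimizer.step()

    # After the gradient update, log_alphas is no longer a normalized log
    # distribution, so rescale it outside the graph (no gradient tracked here):
    with torch.no_grad():
        log_alphas.copy_(F.log_softmax(log_alphas, dim=-1))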

I think it is also fine to just use a raw parameter alpha and apply log_softmax() in each forward step. I will run an experiment on this. Thanks a lot.
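
That alternative would look roughly like this (a sketch with hypothetical module and parameter names, not the repo's actual mixed-op implementation):

    import torch
    import torch.nn.functional as F

    class MixedOpSketch(torch.nn.Module):
        def __init__(self, num_ops=4):
            super().__init__()
            # raw, unconstrained architecture parameter
            self.alpha = torch.nn.Parameter(1e-3 * torch.randn(num_ops))

        def forward(self, op_outputs):
            # normalize inside the graph on every forward pass, so alpha
            # never needs any out-of-graph rescaling after optimizer steps
            weights = F.log_softmax(self.alpha, dim=-1).exp()
            return sum(w * out for w, out in zip(weights, op_outputs))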

@jxgu1016 (Author)

Please keep me updated on your progress.

@touchdreamer

    # Copy weights for one candidate op from the searched model's state_dict.
    # inverted_bottleneck and depth_conv weights are sliced along dim 0:
    iw_key = 'module.{}.{}.m_ops.{}.inverted_bottleneck.conv.weight'.format(stage, block, op_idx)
    state_dict[iw_key].data[index, :, :, :] = state_dict_from_model[iw_key]
    dw_key = 'module.{}.{}.m_ops.{}.depth_conv.conv.weight'.format(stage, block, op_idx)
    state_dict[dw_key].data[index, :, :, :] = state_dict_from_model[dw_key]
    # ...but point_linear weights are sliced along dim 1:
    pw_key = 'module.{}.{}.m_ops.{}.point_linear.conv.weight'.format(stage, block, op_idx)
    state_dict[pw_key].data[:, index, :, :] = state_dict_from_model[pw_key]

Can you explain these lines? Why is pw_key's index in the second dimension?

@AberHu (Owner) commented Oct 10, 2020

@touchdreamer The width search only occurs on depth_conv. The output of depth_conv is the input to point_linear, and the shape of convolutional weights in PyTorch is (C_out, C_in/groups, k_h, k_w). Thus, the index for dw_key is in the first dimension (output channels), while the index for pw_key is in the second (input channels).
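
As a quick, standalone illustration of the weight shapes involved (not code from this repo):

    import torch

    channels = 32
    # depthwise conv: groups == channels, weight shape (C_out, 1, k_h, k_w)
    depth_conv = torch.nn.Conv2d(channels, channels, 3, padding=1, groups=channels)
    # pointwise 1x1 conv: weight shape (C_out, C_in, 1, 1)
    point_linear = torch.nn.Conv2d(channels, 64, 1)

    print(depth_conv.weight.shape)    # torch.Size([32, 1, 3, 3])
    print(point_linear.weight.shape)  # torch.Size([64, 32, 1, 1])

    # Keeping a subset of depth_conv's output channels means slicing dim 0
    # of its weight, and dim 1 of point_linear's weight, because depth_conv's
    # outputs are point_linear's inputs.
    index = torch.tensor([0, 2, 5])
    dw_sub = depth_conv.weight.data[index, :, :, :]    # (3, 1, 3, 3)
    pw_sub = point_linear.weight.data[:, index, :, :]  # (64, 3, 1, 1)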
