
Why not do log_softmax("arch_param") in the graph? #1

Open
jxgu1016 opened this issue Aug 17, 2020 · 4 comments

@jxgu1016 commented Aug 17, 2020

In train_search.py, I noticed that you apply log_softmax() outside of the graph. Why? Why not just use a raw parameter alpha and apply log_softmax() in each forward step?

@AberHu (Owner) commented Aug 17, 2020

Hi, thanks for your attention to our repo.

Originally, for convenience, we define the variable "log_alphas" as the log probability distribution over operations. After each architecture optimization step, this definition is violated, because the gradient update does not keep the values normalized. Following ProxylessNAS (Sec. 3.2.1) and DenseNAS (A.3), we apply log_softmax() outside of the graph to rescale the updated values.
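
For readers following along, here is a minimal sketch of this out-of-graph rescaling, assuming a single log_alphas parameter and an Adam optimizer for the architecture step (names and setup are illustrative, not taken from train_search.py):

    import torch
    import torch.nn.functional as F

    # log_alphas is meant to hold log-probabilities over candidate operations
    num_ops = 4  # hypothetical number of candidate ops
    log_alphas = torch.nn.Parameter(F.log_softmax(torch.zeros(num_ops), dim=-1))
    arch_optimizer = torch.optim.Adam([log_alphas], lr=3e-4)

    # ... compute the architecture loss and call backward() ...
    arch_optimizer.step()

    # After the gradient update, log_alphas is no longer a normalized log
    # distribution, so rescale it outside the graph (no gradient tracked here):
    with torch.no_grad():
        log_alphas.copy_(F.log_softmax(log_alphas, dim=-1))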

I think it is also fine to just use a raw parameter alpha and apply log_softmax() in each forward step. I will run an experiment on this. Thanks a lot.
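
That alternative would look roughly like this (a sketch with hypothetical module and parameter names, not the repo's actual mixed-op implementation):

    import torch
    import torch.nn.functional as F

    class MixedOpSketch(torch.nn.Module):
        def __init__(self, num_ops=4):
            super().__init__()
            # raw, unconstrained architecture parameter
            self.alpha = torch.nn.Parameter(1e-3 * torch.randn(num_ops))

        def forward(self, op_outputs):
            # normalize inside the graph on every forward pass, so alpha
            # never needs any out-of-graph rescaling after optimizer steps
            weights = F.log_softmax(self.alpha, dim=-1).exp()
            return sum(w * out for w, out in zip(weights, op_outputs))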

@jxgu1016 (Author)

Please keep me updated on your progress.

@touchdreamer

    # Copy weights for one candidate op from the searched model's state_dict.
    # inverted_bottleneck and depth_conv weights are sliced along dim 0:
    iw_key = 'module.{}.{}.m_ops.{}.inverted_bottleneck.conv.weight'.format(stage, block, op_idx)
    state_dict[iw_key].data[index, :, :, :] = state_dict_from_model[iw_key]
    dw_key = 'module.{}.{}.m_ops.{}.depth_conv.conv.weight'.format(stage, block, op_idx)
    state_dict[dw_key].data[index, :, :, :] = state_dict_from_model[dw_key]
    # ...but point_linear weights are sliced along dim 1:
    pw_key = 'module.{}.{}.m_ops.{}.point_linear.conv.weight'.format(stage, block, op_idx)
    state_dict[pw_key].data[:, index, :, :] = state_dict_from_model[pw_key]

Can you explain these lines? Why is pw_key's index in the second dimension?

@AberHu (Owner) commented Oct 10, 2020

@touchdreamer The width search only occurs on depth_conv. The output of depth_conv is the input to point_linear, and the shape of convolutional weights in PyTorch is (C_out, C_in/groups, k_h, k_w). Thus, the index for dw_key is in the first dimension (output channels), while the index for pw_key is in the second (input channels).
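
As a quick, standalone illustration of the weight shapes involved (not code from this repo):

    import torch

    channels = 32
    # depthwise conv: groups == channels, weight shape (C_out, 1, k_h, k_w)
    depth_conv = torch.nn.Conv2d(channels, channels, 3, padding=1, groups=channels)
    # pointwise 1x1 conv: weight shape (C_out, C_in, 1, 1)
    point_linear = torch.nn.Conv2d(channels, 64, 1)

    print(depth_conv.weight.shape)    # torch.Size([32, 1, 3, 3])
    print(point_linear.weight.shape)  # torch.Size([64, 32, 1, 1])

    # Keeping a subset of depth_conv's output channels means slicing dim 0
    # of its weight, and dim 1 of point_linear's weight, because depth_conv's
    # outputs are point_linear's inputs.
    index = torch.tensor([0, 2, 5])
    dw_sub = depth_conv.weight.data[index, :, :, :]    # (3, 1, 3, 3)
    pw_sub = point_linear.weight.data[:, index, :, :]  # (64, 3, 1, 1)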
