This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

[Feature request] temperature parameter in Softmax and SoftmaxOutput #11016

Closed
slitsey opened this issue May 21, 2018 · 8 comments

Comments

@slitsey

slitsey commented May 21, 2018

MXNet does not appear to have a native temperature parameter in its softmax functions. I would like this to be added, as it has many useful applications when learning a categorical probability distribution, especially in a reinforcement learning setting. It should default to 1 to reproduce the current behavior.

https://en.wikipedia.org/wiki/Softmax_function#Reinforcement_learning

@eric-haibin-lin

@srochel
Contributor

srochel commented Jun 7, 2018

@slitsey Would you be able to provide an example we can use to validate the requested feature?

@slitsey
Author

slitsey commented Jun 7, 2018

@srochel @eric-haibin-lin I'm expecting behavior similar to the short program below:

import numpy as np

def softmax(x, temperature=1):
    # Scale the scores by 1/temperature, then normalize the exponentials.
    e = np.exp(x / temperature)
    return e / e.sum()

x = np.array([1, 2, 3])

print(softmax(x))
print(softmax(x, temperature=10))
print(softmax(x, temperature=0.1))

which returns

[ 0.09003057  0.24472847  0.66524096]
[ 0.30060961  0.33222499  0.3671654 ]
[  2.06106005e-09   4.53978686e-05   9.99954600e-01]

This allows interpolation between a uniform distribution in the high-temperature limit and a greedy distribution with all probability mass on the most probable index in the zero-temperature limit. Starting with a high temperature and decreasing it throughout learning allows transitioning from exploration-heavy to exploitation-heavy policies in RL, if the softmax represents the policy's action distribution. There are potential numerical stability issues here due to the exponentials. But probably something like

import mxnet as mx

data = mx.sym.Variable('data')
net = mx.sym.softmax(data=data, temperature=10)

x = mx.nd.array([1, 2, 3])

# Bind the symbol to the input data and run a forward pass.
ex = net.bind(mx.cpu(), args={'data': x})
ex.forward()

should return

[
[ 0.30060961  0.33222499  0.3671654 ]
 <NDArray 3 @cpu(0)>]

for example. And of course this behavior should generalize as expected across axes, etc., and also be incorporated into mx.sym.SoftmaxOutput (and anywhere else softmax/Boltzmann distributions might arise in MXNet).
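For reference, a numerically stable variant of the NumPy sketch above, which subtracts the maximum score before exponentiating so that large scores or small temperatures do not overflow (just an illustration, not the proposed MXNet internals):

import numpy as np

def stable_softmax(x, temperature=1.0, axis=-1):
    # Dividing by the temperature first and then subtracting the max
    # keeps every exponent <= 0, so np.exp never overflows.
    z = np.asarray(x, dtype=np.float64) / temperature
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

x = np.array([1000.0, 2000.0, 3000.0])
print(stable_softmax(x, temperature=0.1))  # approximately [0. 0. 1.], with no overflow warnings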

@apeforest
Contributor

Hi @slitsey I will start working on this feature request. There are multiple operators related to softmax. I will first add the temperature parameter to mx.sym.softmax and mx.sym.SoftmaxOutput, as suggested in your example. Please let me know if you would prefer a different scope. Thanks!

@apeforest
Contributor

I have created a JIRA for this issue (https://issues.apache.org/jira/browse/MXNET-560) and I have started the implementation. Estimated development time: 3 days

@apeforest
Contributor

@slitsey This feature has been merged. Please pull the latest code and verify. Thanks!
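A quick way to verify, assuming the merged change also exposes the same temperature argument on the imperative mx.nd.softmax as requested above:

import mxnet as mx

x = mx.nd.array([1, 2, 3])

# A high temperature should flatten the distribution toward uniform,
# matching the NumPy example earlier in this thread.
print(mx.nd.softmax(x, temperature=10))
print(mx.nd.softmax(x, temperature=0.1))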

@szha closed this as completed Jul 20, 2018
@slitsey
Author

slitsey commented Jul 24, 2018

@apeforest Looks great; thanks for the work on this!

@apeforest
Contributor

apeforest commented Jul 26, 2018

@slitsey It would be great if you could share with me how you used this feature in reinforcement training. I am curious to learn and think it will be very helpful to other users who are not aware of this feature. A few lines of code or a pointer to your repo will be ideal. Thanks a lot in advance!

@slitsey
Author

slitsey commented Jul 30, 2018

@apeforest I can’t share any code with you because I haven’t written it yet. However, I can explain the idea. One approach to reinforcement learning is to have a function that generates a probability distribution over all possible actions for each state, and then to take each action with the associated probability. If this probability distribution is generated by a softmax at the end of a deep learning model, we can optimize it by taking a series of actions, measuring or estimating the reward of each, and adjusting the distribution to maximize the expected reward.

Temperature comes in because early in training we have little information about the value of each action, so we want to intentionally bias the distribution to be more uniform, encouraging exploration. This can be achieved with a high temperature. As training continues and we learn which actions are actually valuable, we can reduce the temperature, shifting probability mass to the most valuable actions. If there is a unique best action at each state, the zero-temperature limit puts all the probability mass on that best action, yielding an optimal trajectory.
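A minimal sketch of that loop (the scores are placeholders standing in for whatever the policy network outputs; this is an illustration, not code from an actual project):

import numpy as np

def softmax(scores, temperature):
    e = np.exp((scores - scores.max()) / temperature)
    return e / e.sum()

rng = np.random.default_rng(0)
scores = np.array([0.1, 0.5, 2.0])  # hypothetical per-action scores for one state

# Anneal from exploration-heavy (high temperature) to exploitation-heavy (low temperature).
for temperature in (10.0, 1.0, 0.1):
    probs = softmax(scores, temperature)
    action = rng.choice(len(probs), p=probs)  # sample an action from the policy
    print(temperature, probs.round(3), action)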

There are other applications outside of RL as well; for example, temperature can be used in model distillation, i.e. training a simple model to mimic a complex one.
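A small sketch of that use as well (the teacher logits are made up; the point is only that a higher temperature produces softer targets for the simple model to match):

import numpy as np

def softmax(z, temperature=1.0):
    e = np.exp((z - z.max()) / temperature)
    return e / e.sum()

teacher_logits = np.array([5.0, 2.0, 0.5])  # hypothetical outputs of the complex model

hard_targets = softmax(teacher_logits)                 # nearly one-hot; little information about class similarity
soft_targets = softmax(teacher_logits, temperature=4)  # softened targets for the simple model to mimic
print(hard_targets.round(3))
print(soft_targets.round(3))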
