For either policy or ranking usage, it would be helpful to know the confidence associated with any particular suggested action. For example, it would be helpful to be able to distinguish between the following two cases:
a) the probability of achieving maximal reward from the first-ranked action >> probability of achieving maximal reward from the second-ranked action
b) the probability of achieving maximal reward from the first-ranked action =~ probability of achieving maximal reward from the second-ranked action
Furthermore, it would be useful to know whether a particular suggested action is skewed toward exploration, and/or to be able to prevent this on a per-request basis.
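To make cases (a) and (b) concrete, here is a minimal sketch of the kind of signal being requested. It assumes a hypothetical list `p_best`, where `p_best[i]` is an estimate of the probability that action `i` achieves maximal reward; neither that list nor the helper `top_two_gap` is an existing API of the library.

```python
def top_two_gap(p_best):
    """Return (first, second, gap) for the two highest-probability actions.

    p_best is a hypothetical list of per-action probabilities of achieving
    maximal reward; it is an assumption, not something the library exposes.
    """
    if len(p_best) < 2:
        raise ValueError("need at least two actions to compare")
    # Rank action indices from most to least likely to be the best action.
    ranked = sorted(range(len(p_best)), key=lambda a: p_best[a], reverse=True)
    first, second = ranked[0], ranked[1]
    return first, second, p_best[first] - p_best[second]


# Case (a): the first-ranked action clearly dominates the second.
clear_case = top_two_gap([0.70, 0.20, 0.10])
# Case (b): the top two actions are nearly indistinguishable.
close_case = top_two_gap([0.45, 0.44, 0.11])
```

A caller could compare the returned gap against an application-specific threshold to decide which case applies.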
This is tricky, because you are asking for high precision at the
frontier of what can be estimated. There are some exploration strategies
(e.g. epsilon-greedy) where we could estimate this reasonably well. But
if you care, then you should already be using one of the advanced
exploration strategies (e.g. bagging or cover). When you are using
these advanced strategies, there generally isn't any (significant)
probability of exploring obviously suboptimal actions (as determined by
the algorithm), so everything is already in case (b).
-John
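
For reference, the epsilon-greedy case mentioned in the reply can be sketched in a few lines. This is an illustration of the textbook strategy, not the library's implementation: the function names, the `explore` flag, and the convention that higher `scores` are better are all assumptions made for the example. With `k` actions, the greedy action is chosen with probability `1 - epsilon + epsilon / k` and every other action with probability `epsilon / k`.

```python
import random


def epsilon_greedy_pmf(scores, epsilon):
    """Return (greedy_action, pmf) for textbook epsilon-greedy.

    scores: hypothetical predicted scores, one per action (higher is better).
    """
    k = len(scores)
    greedy = max(range(k), key=lambda a: scores[a])
    # Every action receives epsilon / k from uniform exploration...
    pmf = [epsilon / k] * k
    # ...and the greedy action receives the remaining 1 - epsilon.
    pmf[greedy] += 1.0 - epsilon
    return greedy, pmf


def choose(scores, epsilon, explore=True, rng=random):
    """Return (action, probability, was_exploration).

    explore=False is a hypothetical per-request switch that always
    returns the greedy action.
    """
    greedy, pmf = epsilon_greedy_pmf(scores, epsilon)
    if not explore:
        return greedy, 1.0, False
    action = rng.choices(range(len(pmf)), weights=pmf, k=1)[0]
    return action, pmf[action], action != greedy
```

The third return value of `choose` flags whether the sampled action differs from the greedy one, which is the "skewed toward exploration" signal asked about in the original request.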
On 08/03/2016 07:48 PM, aidancc wrote:
For either policy or ranking usage, it would be helpful to know the
confidence associated with any particular suggested action. For
example, it would be helpful to be able to distinguish between the
following 2 cases:
a) the probability of achieving maximal reward from the first-ranked
action >> probability of achieving maximal reward from the
second-ranked action
b) the probability of achieving maximal reward from the first-ranked
action =~ probability of achieving maximal reward from the
second-ranked action
Furthermore, it would be useful to be able to know if a particular
suggested action is skewed toward exploration and/or have the ability
to prevent this on a per-request basis.