-
Notifications
You must be signed in to change notification settings - Fork 527
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Change mlh threshold effect to create a smooth transition #1844
Conversation
@@ -111,6 +111,11 @@ class MEvaluator { | |||
const float child_m = child.GetM(parent_m_); | |||
float m = std::clamp(m_slope_ * (child_m - parent_m_), -m_cap_, m_cap_); | |||
m *= FastSign(-q); | |||
if (q_threshold_ > 0.0f && q_threshold_ < 1.0f) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a comment what GetM
function does?
I though it returns moves left, but then it returns 0 if |q| < q_thredshol, so it's not clear what it's supposed to do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It returns the "moves left head effect" term which gets added to the PUCT score, so that PUCT = Q + U + M
. This PR doesn't change the meaning of GetM()
, but your comment makes clear that it was already confusing before this PR, so a comment (even if it has nothing to do with this PR) would certainly justified.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something like this before the function declaration.
// Returns adjustment to Q based on Q and M value of a node.
Could you add this in this PR?
Also, GetM()
sounds like a wrong name for this function. Maybe rename to GetQAdjustment()
?
(it's more or less clear that it's based on M because it inside the MEvaluator class).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Settled at GetMUtility()
because it best describes both where it comes from (M) and what it is (a utility).
src/mcts/search.cc
Outdated
m *= a_constant_ + a_linear_ * std::abs(q) + a_square_ * q * q; | ||
return m; | ||
} | ||
|
||
float GetM(Node* child, float q) const { | ||
float GetMUtility(Node* child, float q) const { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really like that this code is copypasted; probably we should make this one the actual one, and turn the other into a wrapper function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are only two calls of the other version and one of this. Just keeping this variation and adjusting the callers seems cleaner.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds like the better choice for sure. Can you check whether it makes a difference for the DAG compatibility which of the two we keep?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The exact same patch applies to both master and dag-update and seems to work fine.
src/mcts/search.cc
Outdated
m *= a_constant_ + a_linear_ * std::abs(q) + a_square_ * q * q; | ||
return m; | ||
} | ||
|
||
float GetM(Node* child, float q) const { | ||
float GetMUtility(Node* child, float q) const { | ||
if (!enabled_ || !parent_within_threshold_) return 0.0f; | ||
const float child_m = child->GetM(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
only difference in the old code: this call is without parent_m_
as a default value, which I think is unintended. Obviously works as intended currently, but should be decided whether this is wanted or unwanted when making one of the functions a wrapper of the other.
…Zero#1844) * changed threshold effect to create a smooth transition * renamed GetM() to GetMUtility() to differentiate it from GetM() * changed duplicate GetMUtility() function into wrapper (cherry picked from commit 905d88e)
Back when we implemented MLH, a mlh threshold was introduced to allow the deactivation of mlh if the game isn't decided yet. However, it was done in a way that creates a discontinuity at the threshold even without using a
constant_
, so we never actually use the mlh threshold for its intended purpose. This minor change removes the discontinuity by using a transformation onq
which has the same value forq = +-1
, is continuous, and 0 between-q_threshold
andq_threshold
.While this change should have been done years ago, the main motivation to do it now is to allow contempt working together with mlh.