Dbs #976
Changes from 5 commits
@@ -0,0 +1,354 @@
from axelrod.actions import Actions, Action
from axelrod.player import Player

C, D = Actions.C, Actions.D


def action_to_int(action):
    return 1 if action == C else 0

class DBS(Player):
    """
    Desired Belief Strategy, as described in:

    Tsz-Chiu Au and Dana Nau (University of Maryland), "Accident or
    Intention: That Is the Question (in the Noisy Iterated Prisoner's
    Dilemma)".
    http://www.cs.utexas.edu/%7Echiu/papers/Au06NoisyIPD.pdf

    A strategy that learns the opponent's strategy and uses symbolic
    noise detection to determine whether anomalies in the opponent's
    behavior are deliberate or accidental, hence performing quite well
    in noisy tournaments.

    From the learned opponent's strategy, a tree search is used to
    choose the best move.

    Default values for the parameters are the values suggested in the
    article. With more noise, try lowering violation_threshold and
    reject_threshold.

    Parameters
    ----------
    discount_factor : float, optional
        Discount factor used when computing discounted frequencies to
        learn the opponent's strategy. Must be between 0 and 1.
        Default: 0.75.
    promotion_threshold : int, optional
        Number of observations needed to promote a change in the
        opponent's strategy. Default: 3.
    violation_threshold : int, optional
        Number of observations needed to consider that the opponent's
        strategy has changed. Lowering it seems to help when noise
        increases. Default: 4.
    reject_threshold : int, optional
        Number of observations before forgetting the opponent's old
        strategy. Lowering it seems to help when noise increases.
        Default: 3.
    tree_depth : int, optional
        Depth of the tree for the tree-search algorithm. Higher depth
        means more time to compute the move. Default: 5.
    """
    # Classifier properties for the strategy (not yet precisely
    # determined).
    name = 'DBS'
    classifier = {
        'memory_depth': float('inf'),
        'stochastic': False,
        'makes_use_of': set(),
        'long_run_time': False,
        'inspects_source': False,
        'manipulates_source': False,
        'manipulates_state': False
    }

    # The best value for reject_threshold is not mentioned in the
    # article.
    def __init__(self, discount_factor=.75, promotion_threshold=3,
                 violation_threshold=4, reject_threshold=3, tree_depth=5):
        super().__init__()

        # The opponent's policy is hypothesized to be TitForTat by
        # default.
        self.Rd = Policy.prob_policy(1, 1, 0, 0)
        self.Rc = Policy()
        self.Pi = self.Rd  # policy used by MoveGen
        self.violation_counts = Policy()
        self.reject_threshold = reject_threshold
        self.violation_threshold = violation_threshold
        self.promotion_threshold = promotion_threshold
        self.tree_depth = tree_depth
        # v counts how many of the opponent's recent moves violate the
        # hypothesized old strategy Rd; when it exceeds
        # reject_threshold, Rd is discarded.
        self.v = 0

        self.alpha = discount_factor

        # To compute the discounted frequencies, we keep up to date a
        # history of what has been played for each condition.
        # We save it as a dict history_by_cond; keys are conditions
        # (e.g. (C, C)) and values are tuples of two lists (G, F).
        # For a condition j:
        # G[i] = 1 if condition j held at turn i-1 and the opponent
        # played C; else G[i] = 0.
        # F[i] = 1 if condition j held at turn i-1; else F[i] = 0.
        # The initial hypothesized policy is TitForTat.
        self.history_by_cond = {}
        self.history_by_cond[(C, C)] = ([1], [1])
        self.history_by_cond[(C, D)] = ([1], [1])
        self.history_by_cond[(D, C)] = ([0], [1])
        self.history_by_cond[(D, D)] = ([0], [1])

    def reset(self):
        super().reset()
        # The parameters themselves are set in __init__ and do not need
        # to be reset.
        self.Rd = Policy.prob_policy(1, 1, 0, 0)
        self.Rc = Policy()
        self.Pi = self.Rd  # policy used by MoveGen
        self.violation_counts = Policy()
        self.v = 0
        self.history_by_cond = {}
        self.history_by_cond[(C, C)] = ([1], [1])
        self.history_by_cond[(C, D)] = ([1], [1])
        self.history_by_cond[(D, C)] = ([0], [1])
        self.history_by_cond[(D, D)] = ([0], [1])

    def should_promote(self, r_plus, promotion_threshold=3):
        if r_plus[1] == C:
            to_check = 0
        elif r_plus[1] == D:
            to_check = 1
        k = 1
        count = 0
        while (k < len(self.history_by_cond[r_plus[0]][0])
               and not (self.history_by_cond[r_plus[0]][0][1:][-k] == to_check
                        and self.history_by_cond[r_plus[0]][1][1:][-k] == 1)):
            if self.history_by_cond[r_plus[0]][1][1:][-k] == 1:
                count += 1
            k += 1
        if count >= promotion_threshold:
            return True
        return False

    def should_demote(self, r_minus, violation_threshold=4):
        return self.violation_counts[r_minus[0]] >= violation_threshold

    def update_history_by_cond(self, opponent_history):
        two_moves_ago = (self.history[-2], opponent_history[-2])
        for outcome, GF in self.history_by_cond.items():
            G, F = GF
            if outcome == two_moves_ago:
                if opponent_history[-1] == C:
                    G.append(1)
                else:
                    G.append(0)
                F.append(1)
            else:
                G.append(0)
                F.append(0)

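To make this bookkeeping concrete, here is a standalone sketch (plain `'C'`/`'D'` strings stand in for the library's Action objects; the function name mirrors the method above but the surrounding setup is illustrative only):

```python
# Standalone sketch of the history_by_cond bookkeeping, with 'C'/'D'
# strings instead of the library's Action objects.
def update_history_by_cond(history_by_cond, my_history, opp_history):
    # Condition observed two moves ago: (my move, opponent's move).
    two_moves_ago = (my_history[-2], opp_history[-2])
    for outcome, (G, F) in history_by_cond.items():
        if outcome == two_moves_ago:
            # The condition held: record whether the opponent cooperated.
            G.append(1 if opp_history[-1] == 'C' else 0)
            F.append(1)
        else:
            # The condition did not hold at that turn.
            G.append(0)
            F.append(0)

# Start from the TitForTat hypothesis, then observe one round in which
# the opponent answered mutual cooperation with a defection.
history_by_cond = {
    ('C', 'C'): ([1], [1]), ('C', 'D'): ([1], [1]),
    ('D', 'C'): ([0], [1]), ('D', 'D'): ([0], [1]),
}
update_history_by_cond(history_by_cond, ['C', 'C'], ['C', 'D'])
```

Only the `('C', 'C')` entry records an actual observation (F gets a 1); every other condition records a padding 0 in both lists, which keeps the lists aligned turn by turn.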
    def compute_prob_rule(self, outcome, alpha):
        """Compute the discounted frequency with which the opponent
        cooperated after the given outcome, weighting recent
        observations by increasing powers of alpha."""
        G = self.history_by_cond[outcome][0]
        F = self.history_by_cond[outcome][1]
        discounted_g = 0
        discounted_f = 0
        # alpha_k is multiplied by alpha at each step of the loop, so
        # older observations receive smaller weights.
        alpha_k = 1
        for g, f in zip(G[::-1], F[::-1]):
            discounted_g += alpha_k * g
            discounted_f += alpha_k * f
            alpha_k = alpha * alpha_k
        p_cond = discounted_g / discounted_f
        return p_cond

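The discounting rule is easy to check on its own. A minimal standalone version (hypothetical name, same arithmetic as the method above):

```python
# Discounted cooperation frequency: recent observations weigh more,
# by successive powers of alpha.
def discounted_frequency(G, F, alpha):
    discounted_g = discounted_f = 0.0
    weight = 1.0
    # Walk from the most recent observation backwards.
    for g, f in zip(G[::-1], F[::-1]):
        discounted_g += weight * g
        discounted_f += weight * f
        weight *= alpha
    return discounted_g / discounted_f
```

For example, `discounted_frequency([1, 0], [1, 1], 0.75)` weighs the recent defection by 1 and the older cooperation by 0.75, giving 0.75 / 1.75 ≈ 0.43 rather than the undiscounted 0.5.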
    def strategy(self, opponent: Player) -> Action:
        # First move: cooperate.
        if not self.history:
            return C

        if len(opponent.history) >= 2:
            # Update history_by_cond (i.e. update Rp).
            self.update_history_by_cond(opponent.history)

            two_moves_ago = (self.history[-2], opponent.history[-2])
            r_plus = (two_moves_ago, opponent.history[-1])
            r_minus = (two_moves_ago,
                       ({C, D} - {opponent.history[-1]}).pop())

            if r_plus[0] not in self.Rc.keys():
                if self.should_promote(r_plus, self.promotion_threshold):
                    self.Rc[r_plus[0]] = action_to_int(r_plus[1])
                    self.violation_counts[r_plus[0]] = 0

            # (If r+ or r- is in Rc.)
            if r_plus[0] in self.Rc.keys():
                to_check = (C if self.Rc[r_plus[0]] == 1 else D)
                # (If r+ is in Rc.)
                if r_plus[1] == to_check:
                    # Set the violation count of r+ to 0.
                    self.violation_counts[r_plus[0]] = 0
                # (If r- is in Rc.)
                elif r_minus[1] == to_check:
                    # Increment the violation count of r-.
                    self.violation_counts[r_plus[0]] += 1
                    if self.should_demote(r_minus, self.violation_threshold):
                        self.Rd.update(self.Rc)
                        self.Rc.clear()
                        self.violation_counts.clear()
                        self.v = 0

            # r+ in Rc
            r_plus_in_Rc = (r_plus[0] in self.Rc.keys()
                            and self.Rc[r_plus[0]] == action_to_int(r_plus[1]))
            # r- in Rd
            r_minus_in_Rd = (r_minus[0] in self.Rd.keys()
                             and self.Rd[r_minus[0]]
                             == action_to_int(r_minus[1]))

            if r_minus_in_Rd:
                self.v += 1

            if self.v > self.reject_threshold or (r_plus_in_Rc
                                                  and r_minus_in_Rd):
                self.Rd.clear()
                self.v = 0

            # Compute Rp for conditions that are neither in Rc nor in Rd.
            Rp = Policy()
            all_cond = [(C, C), (C, D), (D, C), (D, D)]
            for outcome in all_cond:
                if (outcome not in self.Rc.keys()
                        and outcome not in self.Rd.keys()):
                    # Compute the opponent's probability of cooperating.
                    Rp[outcome] = self.compute_prob_rule(outcome, self.alpha)

            self.Pi = Policy()
            # The algorithm ensures there are no duplicate keys, so no
            # key is overwritten.
            self.Pi.update(self.Rc)
            self.Pi.update(self.Rd)
            self.Pi.update(Rp)

        # React to the opponent's last move.
        return MoveGen((self.history[-1], opponent.history[-1]), self.Pi,
                       depth_search_tree=self.tree_depth)

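The key step in the method above is splitting the last observation into the rule r+ (what the opponent actually played after a condition) and its complement r- (what it did not play). A small sketch with `'C'`/`'D'` strings in place of the Action objects (the helper name is illustrative):

```python
# How r+ and r- are derived from the last two rounds.
def derive_rules(my_history, opp_history):
    two_moves_ago = (my_history[-2], opp_history[-2])
    # r+ pairs the condition with the move the opponent actually played;
    # r- pairs it with the opposite move.
    r_plus = (two_moves_ago, opp_history[-1])
    r_minus = (two_moves_ago, ({'C', 'D'} - {opp_history[-1]}).pop())
    return r_plus, r_minus
```

For instance, after mutual cooperation followed by an opponent defection, `derive_rules(['C', 'C'], ['C', 'D'])` yields r+ = (('C', 'C'), 'D') and r- = (('C', 'C'), 'C').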
# Policy, as defined in the article: a set of (condition, p) pairs where
# p is the probability that the opponent cooperates in the next move
# given the condition, i.e. the pair of last moves.
class Policy(dict):
    """Dictionary mapping each condition (a pair of last moves, e.g.
    (C, C)) to the probability that the opponent plays C next."""

    @classmethod
    def prob_policy(cls, pCC, pCD, pDC, pDD):
        pol = cls()
        pol[(C, C)] = pCC
        pol[(C, D)] = pCD
        pol[(D, C)] = pDC
        pol[(D, D)] = pDD
        return pol

    def proba(self, action1, action2):
        return self[(action1, action2)]

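Since a Policy is just a dict keyed by pairs of last moves, the default TitForTat hypothesis used in `__init__` can be written directly (with `'C'`/`'D'` strings standing in for the Action objects):

```python
# TitForTat as a policy: the opponent cooperates with probability 1
# whenever we cooperated last, and defects otherwise.
tit_for_tat_policy = {
    ('C', 'C'): 1, ('C', 'D'): 1,
    ('D', 'C'): 0, ('D', 'D'): 0,
}
```

This is exactly what `Policy.prob_policy(1, 1, 0, 0)` builds.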
# Nodes used to build a tree for the tree-search procedure.
# The tree has determinist and stochastic nodes, as the opponent's
# strategy is learned as a probability distribution.

class Node(object):

    # Abstract method.
    def get_siblings(self):
        raise NotImplementedError('subclasses must override get_siblings()!')

    # Abstract method.
    def is_stochastic(self):
        raise NotImplementedError('subclasses must override is_stochastic()!')


class StochasticNode(Node):
    """Node that has a probability pC to reach each sibling.
    Nodes (C, *) or (D, *)."""

    def __init__(self, own_action, pC, depth):
        self.pC = pC
        self.depth = depth
        self.own_action = own_action

    def get_siblings(self):
        # Siblings of a stochastic node get depth += 1.
        opponent_c_choice = DeterministNode(self.own_action, C,
                                            self.depth + 1)
        opponent_d_choice = DeterministNode(self.own_action, D,
                                            self.depth + 1)
        return (opponent_c_choice, opponent_d_choice)

    def is_stochastic(self):
        return True

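The value of a stochastic node (computed later in the search function) is just the expectation over the opponent's two possible replies, weighted by the learned cooperation probability. As a one-line sketch (hypothetical helper name):

```python
# Expected value at a stochastic node: the opponent cooperates with
# probability pC and defects with probability 1 - pC.
def stochastic_node_value(pC, value_if_C, value_if_D):
    return pC * value_if_C + (1 - pC) * value_if_D
```

For example, with pC = 0.5 and subtree values 3 (opponent cooperates) and 0 (opponent defects), the node is worth 1.5.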
class DeterministNode(Node):
    """Node (C, C), (C, D), (D, C), or (D, D) with a determinist choice
    of siblings."""

    def __init__(self, action1, action2, depth):
        self.action1 = action1
        self.action2 = action2
        self.depth = depth

    def get_siblings(self, policy):
        # Build the two siblings (C, *) and (D, *). Siblings of a
        # determinist node are stochastic and have the same depth.
        c_choice = StochasticNode(
            C, policy.proba(self.action1, self.action2), self.depth)
        d_choice = StochasticNode(
            D, policy.proba(self.action1, self.action2), self.depth)
        return (c_choice, d_choice)

    def is_stochastic(self):
        return False

    def get_value(self):
        values = {
            (C, C): 3,
            (C, D): 0,
            (D, C): 5,
            (D, D): 1
        }
        return values[(self.action1, self.action2)]

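The values returned by `get_value` are the standard prisoner's dilemma payoffs from the player's point of view, (R, S, T, P) = (3, 0, 5, 1). As a standalone lookup table with string actions:

```python
# Standard prisoner's dilemma payoffs for (my move, opponent's move).
PAYOFF = {
    ('C', 'C'): 3,  # reward for mutual cooperation
    ('C', 'D'): 0,  # sucker's payoff
    ('D', 'C'): 5,  # temptation to defect
    ('D', 'D'): 1,  # punishment for mutual defection
}
```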
# Tree search function (minimax search procedure).
def F(begin_node, policy, max_depth):
    """Compute the expected value of a node by recursively exploring
    the tree down to max_depth."""
    if begin_node.is_stochastic():
        # A stochastic node cannot have the same depth as its parent
        # node, hence there is no need to check that its depth is less
        # than max_depth.
        siblings = begin_node.get_siblings()
        # The stochastic node value is the expected value of its
        # siblings.
        node_value = (begin_node.pC * F(siblings[0], policy, max_depth)
                      + (1 - begin_node.pC)
                      * F(siblings[1], policy, max_depth))
        return node_value
    else:  # deterministic node
        if begin_node.depth == max_depth:
            # This is a leaf node; just return its outcome value.
            return begin_node.get_value()
        elif begin_node.depth == 0:
            siblings = begin_node.get_siblings(policy)
            # Return the two max expected values, for choice C or D,
            # as a tuple.
            return (F(siblings[0], policy, max_depth)
                    + begin_node.get_value(),
                    F(siblings[1], policy, max_depth)
                    + begin_node.get_value())
        elif begin_node.depth < max_depth:
            siblings = begin_node.get_siblings(policy)
            # The deterministic node value is the max of both siblings'
            # values plus the score of the node's outcome.
            a = F(siblings[0], policy, max_depth)
            b = F(siblings[1], policy, max_depth)
            node_value = max(a, b) + begin_node.get_value()
            return node_value

# Returns the best move considering the opponent's policy and last move,
# using tree search.
def MoveGen(outcome, policy, depth_search_tree=5):
    current_node = DeterministNode(outcome[0], outcome[1], depth=0)
    values_of_choices = F(current_node, policy, depth_search_tree)
    # Return the action which corresponds to the best choice in terms
    # of expected value. In case value(C) == value(D), return C.
    actions_tuple = (C, D)
    return actions_tuple[values_of_choices.index(max(values_of_choices))]
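The whole `MoveGen`/`F` recursion can be condensed into a self-contained sketch that does not depend on the node classes (string actions, plain dict policy; the function names here are illustrative, not the library's API). Deterministic positions take the max over our two choices; stochastic positions take the expectation over the opponent's reply:

```python
# Standard prisoner's dilemma payoffs for (my move, opponent's move).
PAYOFF = {('C', 'C'): 3, ('C', 'D'): 0, ('D', 'C'): 5, ('D', 'D'): 1}

def value(outcome, policy, depth, max_depth):
    # Expected value of optimal play from `outcome`, looking ahead
    # (max_depth - depth) more rounds against the given policy.
    if depth == max_depth:
        return PAYOFF[outcome]
    pC = policy[outcome]  # probability the opponent cooperates next
    best = max(
        pC * value((my, 'C'), policy, depth + 1, max_depth)
        + (1 - pC) * value((my, 'D'), policy, depth + 1, max_depth)
        for my in ('C', 'D')
    )
    return PAYOFF[outcome] + best

def move_gen(outcome, policy, max_depth=5):
    # Compare the expected value of answering C versus D; ties go to C.
    pC = policy[outcome]
    vals = [pC * value((my, 'C'), policy, 1, max_depth)
            + (1 - pC) * value((my, 'D'), policy, 1, max_depth)
            for my in ('C', 'D')]
    return 'C' if vals[0] >= vals[1] else 'D'
```

Against an always-defect policy (all probabilities 0) the search picks D, while against the TitForTat policy it picks C from mutual cooperation, which matches the intended behavior of the tree search above.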