Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dbs #976

Merged
merged 23 commits into from
May 5, 2017
Merged

Dbs #976

merged 23 commits into from
May 5, 2017

Conversation

edouardArgenson
Copy link
Contributor

Hello,
I have written code for one of the desired new strategies, (DBS, DesiredBeliefStrategy).
ref: http://www.cs.utexas.edu/%7Echiu/papers/Au06NoisyIPD.pdf

I've followed the algorithm description in the article, and used the same variable names. When some details where not specified I made a few tests to ensure I had a good implementation.

I've checked that it work as expected in some situations, and that it performs good in noisy tournaments as it should.

I've written and run tests, but I don't really know if those are sufficient. Strategy tests, unit tests and integration tests are working, but I get warnings with strategy test.

This is my first contribution to an open source project so I don't really know if I need to check or detail more things, please let me know if I'm missing some points or anything

@marcharper
Copy link
Member

The failed test for Py 3.6 was due to hypothesis, just restarted...

Copy link
Member

@marcharper marcharper left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution! On the first pass there are some formatting issues, please address those and I'll make another pass with the reference to check all the logic.

C, D = Actions.C, Actions.D

def action_to_int(action):
return (1 if action==C else 0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion for readability:

if action == C:
    return 1
return 0

Or use a dictionary:

d = {C: 1, D: 0}
return d[action]

class DBS(Player):
"""
Desired Belief Strategy as described in:
Accident or Intention: That Is the Question (in the Noisy Iterated Prisoner's Dilemma) by Tsz-Chiu Au and Dana Nau from University of Maryland
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for including the reference. We have a citation file and a specific format we prefer, can you try to convert? We can help...

Copy link
Contributor Author

@edouardArgenson edouardArgenson Apr 17, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, the full citation is:
T.-C. Au and D. S. Nau. Accident or intention: That is the question (in the iterated prisoner’s dilemma). In Proc. Int. Conf. Auton. Agents and Multiagent Syst. (AAMAS), pp. 561–568, 2006
I guess it becomes [Au2006] in the specific format ? I'll add it to the bibliography.rst file and change the description

Accident or Intention: That Is the Question (in the Noisy Iterated Prisoner's Dilemma) by Tsz-Chiu Au and Dana Nau from University of Maryland
http://www.cs.utexas.edu/%7Echiu/papers/Au06NoisyIPD.pdf

A strategy that learns the opponent's strategy, and use symbolic noise de-
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we wrap lines at 80 and not split words with hyphens?

Default values for the parameters are the suggested values in the article
When more noise you can try to diminish violation_threshold and rejection_threshold

Parameters:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see our formatting for other strategies for the parameter docs (and thanks for including them)

# by the opponent; else G[i]=0
# F[i] = 1 if cond j was True at turn i-1; else G[i]=0
# initial hypothesized policy is TitForTat
self.history_by_cond[(C,C)]=([1],[1])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use PEP8's recommended formatting. For this line it would be:
self.history_by_cond[(C, C)] = ([1], [1])
(and similarly below)

Copy link
Contributor Author

@edouardArgenson edouardArgenson Apr 17, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, I'll review the code with PEP8's rules

return MoveGen((self.history[-1],opponent.history[-1]),self.Pi,depth_search_tree=self.tree_depth)


# Policy as defined in the article, i.e. a set of (last_move,p) where p is
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we move the Policy class above the strategy?

# Policy as defined in the article, i.e. a set of (last_move,p) where p is
# the probability to cooperate in the next move considering last move
class Policy(dict):

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we write a good docstring and move the comments just above into the docstring?

# the probability to cooperate in the next move considering last move
class Policy(dict):

def __init__(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use the default __init__ in this case so these two lines are unnecessary

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have deleted the Policy class. Policies are now represented by a simple dictionary, and there is a create_policy methods to instantiate them.

@classmethod
def prob_policy(cls,pCC,pCD,pDC,pDD):
pol = cls()
pol[(C,C)]=pCC
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pol[(C, C) = pCC

return 1

# tree search function (minimax search procedure)
def F(begin_node,policy,max_depth):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's give F a good name. If it's called F in the reference then let's say so in the doc string

@drvinceknight
Copy link
Member

This looks great @edouardArgenson, thanks for the contribution. As well as all of @marcharper's comments this is also failing on coverage (which checks that every line of code in the source files are hit during testing). Currently the following lines are not tested:

133             return True 
198                         self.Rd.update(self.Rc)                                                                                                   
199                         self.Rc.clear()                                                                                                           
200                         self.violation_counts.clear()                                                                                             
201                         self.v=0  
260         raise NotImplementedError('subclasses must override get_siblings()!') 
264         raise NotImplementedError('subclasses must override is_stochastic()!')

If we can assist with writing tests for those please let us know. 👍

@edouardArgenson
Copy link
Contributor Author

@drvinceknight ok I'll look for tests that cover those lines

@edouardArgenson
Copy link
Contributor Author

Hi @drvinceknight,

I added a test that covers the following:
133 return True
and
198 self.Rd.update(self.Rc)
199 self.Rc.clear()
200 self.violation_counts.clear()
201 self.v=0

But I have trouble writing a test for the last 2:
260 raise NotImplementedError('subclasses must override get_siblings()!')
and
264 raise NotImplementedError('subclasses must override is_stochastic()!')
Those come from two abstract methods in the Node class, which is an abstract class with two subclasses StochasticNode and DeterministNode where the methods are implemented.
I don't really know how to test those lines, because it would involve creating a third subclass of Node, in which the two methods are not implemented, hence raising the errors.
I would welcome some help on that !

@edouardArgenson
Copy link
Contributor Author

@marcharper I've pushed some changes, the code should now respect PEP8 and the formatting issues you've raised

@drvinceknight
Copy link
Member

I would welcome some help on that !

No problemo :) This is the kind of thing that would work:

 import axelrod                                                                                                                                       
+import unittest                                                                                                                                      
 from .test_player import TestPlayer                                                                                                                  
                                                                                                                                                      
 C, D = axelrod.Actions.C, axelrod.Actions.D                                                                                                          
                                                                                                                                                      
                                                                                                                                                      
+class TestNode(unittest.TestCase):                                                                                                                   
+    """                                                                                                                                              
+    Tests for the base class                                                                                                                         
+    """                                                                                                                                              
+    node = axelrod.dbs.Node()                                                                                                                        
+                                                                                                                                                     
+    def test_get_siblings(self):                                                                                                                     
+        with self.assertRaises(NotImplementedError) as context:                                                                                      
+            self.node.get_siblings()                                                                                                                 
+                                                                                                                                                     
+    def test_is_stochastic(self):                                                                                                                    
+        with self.assertRaises(NotImplementedError) as context:                                                                                      
+            self.node.is_stochastic()                                                                                                                
+                                                                                                                                                     
+                                                                                                                                                     
+  

So there I'm suggesting testing the base abstract class directly. I've checked those lines and they bump the coverage up to 100% 👍

@drvinceknight
Copy link
Member

There's a failure on the documentation.

This should be:

     def get_siblings(self, policy):                                                                                                             
         """                                                                                                                                     
         build 2 siblings :code:`(C, *)` and :code:`(D, *)`                                                                                      
         siblings of a DeterministicNode are Stochastic, and are of the                                                                          
         same depth                                                                                                                              
         """          

The * is a special rst symbol so there I'm putting in an inline code block,

I also noticed:

class StochasticNode(Node):                                                                                                                     
     "Node that have a probability p to get to each sibling"                                                                                     
     "Nodes (C, *) or (D, *)"    

Could you use """ there please.

@edouardArgenson
Copy link
Contributor Author

edouardArgenson commented Apr 20, 2017

I made some changes on that and pushed it
oh I forgot to change the test

Desired Belief Strategy as described in [Au2006]_
http://www.cs.utexas.edu/%7Echiu/papers/Au06NoisyIPD.pdf

A strategy that learns the opponent's strategy, and use symbolic
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: 'uses' rather than 'use'

article. When noise increases you can try to diminish
violation_threshold and rejection_threshold

Parameters:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docstring format for these parameters isn't quite correct. We use the numpy standard described at https://github.com/numpy/numpy/blob/master/doc/HOWTO_DOCUMENT.rst.txt

e.g. The first parameter description should be:

    discount_factor: float
        used when computing discounted frequencies to learn opponent's strategy
        Must be between 0 and 1. The default is 0.75

opposite_action = 1
k = 1
count = 0
# We iterates on the history, while we do not encounter
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: 'iterate' rather than 'iterates'

k = 1
count = 0
# We iterates on the history, while we do not encounter
# counter-exemples of r_plus, i.e. while we do not encounter
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: 'example' rather than 'exemple'

== opposite_action
and self.history_by_cond[r_plus[0]][1][1:][-k] == 1)
):
# We count every occurence of r_plus in history
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: 'occurrence' rather than 'occurence'

def should_demote(self, r_minus, violation_threshold=4):
if(self.violation_counts[r_minus[0]] >= violation_threshold):
return True
return False
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: This could be done in one line with:

return(self.violation_counts[r_minus[0]] >= violation_threshold)

def is_stochastic(self):
return False

def get_value(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: this might be better with a dict mapping a tuple of action1 and action2 to a value:

values = {
    (C, C): 3,
    (C, D): 0,
    (D, C): 5,
    (D, D): 1
}
return values[(action1, action2)]


Parameters
----------
discount_factor : float, optional
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of optional, can we put in the defaults from init into the docstring?

Copy link
Contributor Author

@edouardArgenson edouardArgenson Apr 27, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The defaults from init are currently specified in the description of each parameters. I can change that, but here I followed the numpy standard suggested by meatballs https://github.com/numpy/numpy/blob/master/doc/HOWTO_DOCUMENT.rst.txt

where p is the probability to cooperate after prev_move,
where prev_move can be (C, C), (C, D), (D, C) or (D, D)
"""
pol = {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine as is but you might consider:
pol = {(C, C): pCC, (C, D): pCD, (D, C): pDC, (D, D): pDD}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it's cleaner

# The stochastic node value is the expected values of siblings
node_value = (
begin_node.pC * minimax_tree_search(
siblings[0],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indent should be <---- (4 spaces deeper than the line above), or:

node_value = (
    begin_node.pC * minimax_tree_search(
        siblings[0], policy, max_depth),
   ...

max_depth)
)
return node_value
else: # determinist node
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

deterministic

return node_value
else: # determinist node
if begin_node.depth == max_depth:
# this is an end node, we just return its outcome value
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For comments please capitalize and use complete sentences when possible

@@ -0,0 +1,108 @@
"""Tests DBS strategy."""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have tests for the tree search functions as well?


# default opponent's policy is TitForTat
self.Rd = create_policy(1, 1, 0, 0)
self.Rc = {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the short variables can you explain with a comment what they are? I assume these are from the paper but we don't want anyone reading the code to have to read the paper to understand the algorithm.

self.violation_threshold = violation_threshold
self.promotion_threshold = promotion_threshold
self.tree_depth = tree_depth
self.v = 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a comment explaining the the variable v

self.history_by_cond[(D, D)] = ([0], [1])

def should_promote(self, r_plus, promotion_threshold=3):
if r_plus[1] == C:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Function needs a docstring

return False

def should_demote(self, r_minus, violation_threshold=4):
return (self.violation_counts[r_minus[0]] >= violation_threshold)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

docstring

def should_demote(self, r_minus, violation_threshold=4):
return (self.violation_counts[r_minus[0]] >= violation_threshold)

def update_history_by_cond(self, opponent_history):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

docstring

F.append(0)

def compute_prob_rule(self, outcome, alpha):
G = self.history_by_cond[outcome][0]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

docstring

if (self.history_by_cond[r_plus[0]][1][1:][-k] == 1):
count += 1
k += 1
if(count >= promotion_threshold):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

space after if

F = self.history_by_cond[outcome][1]
discounted_g = 0
discounted_f = 0
alpha_k = 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does alpha_k ever change? Seems redundant below if it is always equal to 1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes alpha_k is iterated in the loop:
alpha_k = alpha * alpha_k

discounted_f = 0
alpha_k = 1
for g,f in zip(G[::-1], F[::-1]):
discounted_g += alpha_k*g
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spaces around operators please

super().__init__()

# default opponent's policy is TitForTat
self.Rd = create_policy(1, 1, 0, 0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've called policies "four vectors" elsewhere (in strategies/memoryone.py), are they always dictionaries of size 4?

Copy link
Contributor Author

@edouardArgenson edouardArgenson Apr 27, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They can be of size 0, 1, 2, 3 or 4

Copy link
Contributor Author

@edouardArgenson edouardArgenson Apr 27, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A question: maybe I should put the definition of create_policy inside the DBS class, as it is only used inside the class ?

return True


class DeterministNode(Node):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please rename to DeterministicNode

@marcharper
Copy link
Member

Making good progress -- it needs more comments and more tests, and there are some minor style issues.

violation_threshold and rejection_threshold

Parameters
----------
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tests are failing because we don't quite use numpy notation exactly for strategies. In this case we need to remove the heading. So not:

Parameters
----------------

but simply

Parameters

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

@edouardArgenson
Copy link
Contributor Author

Thanks, working on all that

… and is_stochastic functions. Corrects syntax error when initializing history_by_cond in constructor
@drvinceknight
Copy link
Member

@edouardArgenson there is now a merge conflict in the bibliography.rst file with the master branch. Let us know if you need a hand resolving that :)

@edouardArgenson
Copy link
Contributor Author

merged conflict resolved ;)

@marcharper marcharper merged commit 4891d18 into Axelrod-Python:master May 5, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants