Re factor/implement first tournament strategies #1275
@@ -73,107 +73,181 @@ def strategy(self, opponent: Player) -> Action:
            return D
        return C
# TODO Split this into two strategies. It is not clear to me from the internet
# sources that the first implementation was buggy, as opposed to just "poorly
# thought out". The flaw is actually clearly described in the paper's
# description: "Initially, they are both assumed to be .5, which amounts to the
# pessimistic assumption that the other player is not responsive."
# The revised version should be put in its own module.
# I also do not understand where the decision rules come from.
# Need to read https://journals.sagepub.com/doi/10.1177/003755007500600402 to
# gain an understanding of the decision rule.
-class RevisedDowning(Player):
+class FirstByDowning(Player):
""" | ||
Submitted to Axelrod's first tournament by Downing | ||
|
||
The description written in [Axelrod1980]_ is: | ||
|
||
> "This rule selects its choice to maximize its own long- term expected payoff on | ||
> "This rule selects its choice to maximize its own longterm expected payoff on | ||
> the assumption that the other rule cooperates with a fixed probability which | ||
> depends only on whether the other player cooperated or defected on the previous | ||
> move. These two probabilities estimates are con- tinuously updated as the game | ||
> move. These two probabilities estimates are continuously updated as the game | ||
> progresses. Initially, they are both assumed to be .5, which amounts to the | ||
> pessimistic assumption that the other player is not responsive. This rule is | ||
> based on an outcome maximization interpretation of human performances proposed | ||
> by Downing (1975)." | ||
|
||
This strategy attempts to estimate the next move of the opponent by estimating | ||
the probability of cooperating given that they defected (:math:`p(C|D)`) or | ||
cooperated on the previous round (:math:`p(C|C)`). These probabilities are | ||
continuously updated during play and the strategy attempts to maximise the long | ||
term play. Note that the initial values are :math:`p(C|C)=p(C|D)=.5`. | ||
The Downing (1975) paper is "The Prisoner's Dilemma Game as a | ||
Problem-Solving Phenomenon" [Downing1975]_ and this is used to implement the | ||
strategy. | ||
-   # TODO: This paragraph is not correct (see note above)
-   Downing is implemented as `RevisedDowning`. Apparently in the first tournament
-   the strategy was implemented incorrectly and defected on the first two rounds.
-   This can be controlled by setting `revised=True` to prevent the initial defections.
+   There are a number of specific points in this paper, on page 371:

-   This strategy came 10th in Axelrod's original tournament but would have won
-   if it had been implemented correctly.
> "[...] In these strategies, O's [the opponent's] response on trial N is in | ||
some way dependent or contingent on S's [the subject's] response on trial N- | ||
1. All varieties of these lag-one matching strategies can be defined by two | ||
parameters: the conditional probability that O will choose C folloging C by | ||
marcharper marked this conversation as resolved.
Show resolved
Hide resolved
|
||
S, P(C_o | C_s) and the conditional probability that O will choose C | ||
following D by S, P(C_o, D_s)." | ||
    Throughout the paper the strategy (S) assumes that the opponent (O) is
    playing a reactive strategy defined by these two conditional probabilities.

    The strategy aims to maximise the long run utility against such a strategy
    and the mechanism for this is described in Appendix A (more on this later).
    One final point from the main text is, on page 372:

    > "For the various lag-one matching strategies of O, the maximizing
    > strategies of S will be 100% C, or 100% D, or for some strategies all S
    > strategies will be functionally equivalent."
    This implies that the strategy S will either always cooperate or always
    defect (or be indifferent) dependent on the opponent's defining
    probabilities.

    To understand the particular mechanism that describes the strategy S, we
    refer to Appendix A of the paper on page 389.

    The stated goal of the strategy is to maximize (using the notation of the
    paper):

        EV_TOT = #CC(EV_CC) + #CD(EV_CD) + #DC(EV_DC) + #DD(EV_DD)
    I.e. the player aims to maximise the expected value of being in each state,
    weighted by the number of times we expect to be in that state.

Review comment:
    After our conversation today I feel (not sure) that this might need to be
    re-written: #CC is not a state but the number of times the strategy S
    cooperated twice...
Reply:
    Yeah I agree, thanks @Nikoleta-v3, I'll work on this tomorrow (long day!).
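On a literal reading of the expression above (an interpretation, given the ambiguity noted in the review comment: `#CC` etc. are counts of outcome pairs, not states), `EV_TOT` is just a count-weighted sum. The helper name and dictionary layout below are mine, for illustration only:

```python
def ev_tot(counts, expected_values):
    """Compute EV_TOT = #CC(EV_CC) + #CD(EV_CD) + #DC(EV_DC) + #DD(EV_DD).

    `counts` holds the (expected) number of occurrences of each outcome pair,
    e.g. counts["CD"] is how often S cooperated while O defected, and
    `expected_values` holds the expected value of each outcome pair.
    """
    return sum(counts[pair] * expected_values[pair]
               for pair in ("CC", "CD", "DC", "DD"))
```

For example, with the standard payoffs to S (EV_CC = R = 3, EV_CD = S = 0, EV_DC = T = 5, EV_DD = P = 1) and counts 7, 1, 2, 0, this gives 7 * 3 + 1 * 0 + 2 * 5 + 0 * 1 = 31.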
    On the second page of the appendix, figure 4 (page 390) supposedly
    identifies an expression for EV_TOT, however it is not clear how some of
    the steps are carried out. At a best guess, it seems like an asymptotic
    argument is being used. Furthermore, a specific term is made to disappear
    in the case of T - R = P - S (which is not the case for the standard
    (R, P, S, T) = (3, 1, 0, 5)):

    > "Where (t - r) = (p - s), EV_TOT will be a function of alpha, beta, t, r,
    > p, s and N [which] are known, and V which is unknown."
    V is the total number of cooperations of the player S (this is noted
    earlier in the abstract) and as such the final expression (with only V as
    unknown) can be used to decide if V should indicate that S always
    cooperates or not.

    Given the lack of usable details in this paper, the following
    interpretation is used to implement this strategy:

    1. On any given turn, the strategy will estimate alpha = P(C_o | C_s) and
       beta = P(C_o | D_s).
    2. The strategy will calculate the expected utility of always playing C OR
       always playing D against the estimated probabilities. This corresponds
       to:
       a. In the case of the player always cooperating:

          P_CC = alpha and P_CD = 1 - alpha

       b. In the case of the player always defecting:

          P_DC = beta and P_DD = 1 - beta

       Using this we have:

          E_C = alpha R + (1 - alpha) S
          E_D = beta T + (1 - beta) P

    Thus at every turn, the strategy will calculate those two values and
    cooperate if E_C > E_D and will defect if E_C < E_D.
    In the case of E_C = E_D, the player will alternate from their previous
    move. This is based on a specific sentence from Axelrod's original paper:

    > "Under certain circumstances, DOWNING will even determine that the best
    > strategy is to alternate cooperation and defection."
    One final important point is the early game behaviour of the strategy. It
    has been noted that this strategy was implemented in a way that assumed
    that alpha and beta were both 1/2:

    > "Initially, they are both assumed to be .5, which amounts to the
    > pessimistic assumption that the other player is not responsive."

Review comment:
    Is this the "bug" that is later referred to? If so can we explicitly
    mention that here in the docstring?
Reply:
    So it wasn't actually a "bug": Axelrod discusses it at length in the
    paper, so I'm relatively sure (I wouldn't bet my house on it) that the
    strategy was implemented as intended. However it seems that it was
    accepted afterwards that the intent was a mistake. I'll add that in
    future tournaments this strategy was implemented with a modified initial
    behaviour.
    Thus, the player opens with a defection in the first two rounds. Note that
    from the Axelrod publications alone there is nothing to indicate
    defections on the first two rounds, although a defection in the opening
    round is clear. However there is a presentation available at
    http://www.sci.brooklyn.cuny.edu/~sklar/teaching/f05/alife/notes/azhar-ipd-Oct19th.pdf
    that clearly states that Downing defected in the first two rounds, thus
    this is assumed to be the behaviour.

    Note that the response to the first round allows us to estimate
    beta = P(C_o | D_s) and we will use the opponent's opening play to
    estimate alpha = P(C_o | C_s). This is an assumption with no clear
    indication from the literature.
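Under this interpretation the estimates can be recovered from the two histories alone. A hedged sketch (helper name mine), counting the opponent's opening move as a response to a phantom round-0 cooperation:

```python
def estimate_probabilities(my_history, opp_history):
    """Estimate alpha = P(C_o | C_s) and beta = P(C_o | D_s) from the play
    so far. The opponent's opening move is treated as a response to a
    "phantom" cooperation in a non-existent round 0 (an assumption, as
    noted above)."""
    if not opp_history:
        return 0.5, 0.5  # the paper's initial pessimistic estimates
    coops_after_C = 1 if opp_history[0] == "C" else 0  # phantom round 0
    coops_after_D = 0
    # Pair each of my moves with the opponent's move on the following round.
    for mine, theirs in zip(my_history[:-1], opp_history[1:]):
        if theirs == "C":
            if mine == "C":
                coops_after_C += 1
            else:
                coops_after_D += 1
    alpha = coops_after_C / (my_history.count("C") + 1)  # +1: phantom move
    beta = coops_after_D / max(my_history.count("D"), 1)
    return alpha, beta
```

For the plays [(D, C), (D, C)] this gives alpha = 1 (one cooperation in response to the single, phantom cooperation) and beta = 1/2.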
    This strategy came 10th in Axelrod's original tournament.

    Names:

    - Revised Downing: [Axelrod1980]_
    """

-   name = "Revised Downing"
+   name = "First tournament by Downing"
    classifier = {
        "memory_depth": float("inf"),
        "stochastic": False,
-       "makes_use_of": set(),
+       "makes_use_of": {"game"},
        "long_run_time": False,
        "inspects_source": False,
        "manipulates_source": False,
        "manipulates_state": False,
    }
-   def __init__(self, revised: bool = True) -> None:
+   def __init__(self) -> None:
        super().__init__()
-       self.revised = revised
-       self.good = 1.0
-       self.bad = 0.0
-       self.nice1 = 0
-       self.nice2 = 0
-       self.total_C = 0  # note the same as self.cooperations
-       self.total_D = 0  # note the same as self.defections
+       self.number_opponent_cooperations_in_response_to_C = 0
+       self.number_opponent_cooperations_in_response_to_D = 0

Review comment:
    I think we track this in the state distributions in the history class,
    but perhaps you want to be more explicit here.
Reply:
    I'd completely forgotten about our new shiny history class! In this
    instance though I don't think we do track this:

        >>> import axelrod as axl
        >>> players = (axl.CyclerCCD(), axl.CyclerDC())
        >>> match = axl.Match(players)
        >>> interactions = match.play()
        >>> players[0].history.state_distribution
        Counter({(C, D): 67, (C, C): 67, (D, D): 33, (D, C): 33})

    Unless I'm mistaken that's returning the distribution of the strategy's
    own plays, however here we need to count the opponent's response to
    [...]. I've looked through the other properties and don't see what we
    need in there, but that could be something nice to add?
Reply:
    Gotcha, let's leave it as is for now and we can consider adding something
    to the history class later.
    def strategy(self, opponent: Player) -> Action:
        round_number = len(self.history) + 1

-       # According to internet sources, the original implementation defected
-       # on the first two moves. Otherwise it wins (if this code is removed
-       # and the comment restored).
-       # http://www.sci.brooklyn.cuny.edu/~sklar/teaching/f05/alife/notes/azhar-ipd-Oct19th.pdf
-       if self.revised:
-           if round_number == 1:
-               return C
-       elif not self.revised:
-           if round_number <= 2:
-               return D
-       # Update various counts
-       if round_number > 2:
-           if self.history[-1] == D:
-               if opponent.history[-1] == C:
-                   self.nice2 += 1
-               self.total_D += 1
-               self.bad = self.nice2 / self.total_D
-           else:
-               if opponent.history[-1] == C:
-                   self.nice1 += 1
-               self.total_C += 1
-               self.good = self.nice1 / self.total_C
-       # Make a decision based on the accrued counts
-       c = 6.0 * self.good - 8.0 * self.bad - 2
-       alt = 4.0 * self.good - 5.0 * self.bad - 1
-       if c >= 0 and c >= alt:
-           move = C
-       elif (c >= 0 and c < alt) or (alt >= 0):
-           move = self.history[-1].flip()
-       else:
-           move = D
-       return move
+       if round_number == 1:
+           return D
+       if round_number == 2:
+           if opponent.history[-1] == C:
+               self.number_opponent_cooperations_in_response_to_C += 1
+           return D

Review comment:
    Seems like this is a response to a D, since this strategy always defects
    on round 1.
Reply:
    Ah no, that's not what's happening here, but I understand the confusion
    so I'm happy to improve how this is written. What is happening: I assume
    that the player is using these first two turns to build up an early
    estimate of alpha (P(C|C), the probability the opponent cooperates in
    turn k + 1 if the player cooperated in turn k) and beta (P(C|D), the
    probability the opponent cooperates in turn k + 1 if the player defected
    in turn k). I use the opening play of the opponent (k = 1) to estimate
    alpha, i.e. I assume the opponent plays their opening round as if the
    player had cooperated in round k = 0 (a round that does not exist). I use
    the second play of the opponent (k = 2) to estimate beta, as the player
    defects in round k = 1. The player does also defect in round k = 2.
Reply:
    I see, thanks!
+       if self.history[-2] == C and opponent.history[-1] == C:
+           self.number_opponent_cooperations_in_response_to_C += 1
+       if self.history[-2] == D and opponent.history[-1] == C:
+           self.number_opponent_cooperations_in_response_to_D += 1
+
+       # Adding 1 to the cooperation count to account for the assumed
+       # "phantom" cooperation of the non-existent round 0 (see below).
+       alpha = (self.number_opponent_cooperations_in_response_to_C /
+                (self.cooperations + 1))

Review comment:
    I'm confused again here: isn't the first move always a defection? Or is
    this a bug in the original implementation that we're preserving?
Reply:
    I understand the confusion, this can probably be improved. The adjustment
    here is to account for the "phantom", non-existent round 0 cooperation by
    the player that is used to estimate alpha. So for example, if the plays
    are:

        [(D, C), (D, C)]

    then the opponent's first cooperation counts as a cooperation in response
    to the non-existent cooperation of round 0. The total number of
    cooperations in response to a cooperation is 1. We need to take into
    account that extra phantom cooperation to estimate the probability
    alpha = P(C|C) as 1 / 1 = 1. This is all a best guess of course.
Reply:
    Ok, makes sense. Let's comment this well in the code.
+       # Note: there is no division by zero here since the strategy has
+       # always defected on the first two rounds, so self.defections >= 2.
+       beta = (self.number_opponent_cooperations_in_response_to_D /
+               self.defections)

Review comment:
    Maybe add a comment saying that we're never dividing by zero here.

+       R, P, S, T = self.match_attributes["game"].RPST()
+       expected_value_of_cooperating = alpha * R + (1 - alpha) * S
+       expected_value_of_defecting = beta * T + (1 - beta) * P
+
+       if expected_value_of_cooperating > expected_value_of_defecting:
+           return C
+       if expected_value_of_cooperating < expected_value_of_defecting:
+           return D
+       return self.history[-1].flip()
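To see the diffed logic end to end, here is a stateless re-implementation (a best-effort illustrative port, not the library class; all names are mine) together with a short run against an unconditional cooperator:

```python
C, D = "C", "D"

def first_by_downing(history, opp_history, R=3, P=1, S=0, T=5):
    """Return the next move given the play so far; the counters kept by the
    class above are recomputed from scratch on each call."""
    round_number = len(history) + 1
    if round_number <= 2:
        return D  # the assumed opening defections
    # The opponent's opening move is treated as a response to a phantom
    # cooperation in a non-existent round 0.
    coops_after_C = 1 if opp_history[0] == C else 0
    coops_after_D = 0
    for k in range(1, len(opp_history)):
        if opp_history[k] == C:
            if history[k - 1] == C:
                coops_after_C += 1
            else:
                coops_after_D += 1
    alpha = coops_after_C / (history.count(C) + 1)  # +1: phantom cooperation
    beta = coops_after_D / history.count(D)  # never zero: rounds 1-2 are D
    e_c = alpha * R + (1 - alpha) * S
    e_d = beta * T + (1 - beta) * P
    if e_c > e_d:
        return C
    if e_c < e_d:
        return D
    return D if history[-1] == C else C  # indifferent: alternate

# Example: six rounds against an unconditional cooperator.
history, opp_history = [], []
for _ in range(6):
    history.append(first_by_downing(history, opp_history))
    opp_history.append(C)
```

The resulting sequence is D, D, C, D, D, D: the two opening defections, an alternation on the first tie (alpha = 1, beta = 1/2 gives E_C = E_D = 3 with the standard payoffs), and then sustained exploitation of the unresponsive opponent.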
class FirstByFeld(Player):

@@ -278,8 +352,6 @@ class FirstByGraaskamp(Player):
    so it plays Tit For Tat. If not it cooperates and randomly defects every 5
    to 15 moves.

-   # TODO Compare this to Fortran code.

    Note that there is no information about 'Analogy' available thus Step 5 is
    a "best possible" interpretation of the description in the paper.
Review comment:
    Since RevisedDowning is in the second tournament, can we also preserve
    that implementation and/or compare it to the Fortran implementation?
Reply:
    Sorry, I meant to write that in my PR: we need to consider what we do
    with RevisedDowning. As it's in the second tournament, is it worth
    waiting until we translate
    https://github.com/Axelrod-Python/TourExec/blob/v0.2.0/src/strategies/K59R.f
    or implement RevisedDowning as a modification of the strategy in this PR?
Reply:
    The current RevisedDowning looks really similar to the Fortran code. If
    they are basically the same I'm in favor of leaving RevisedDowning
    (eliminating the revised bool) as the second tournament implementation.
    Maybe comparing the fingerprints will tell us if it's essentially correct
    or not?
Reply:
    Yeah, good call. I'm struggling to get axelrod_fortran to work on my
    current machine (I blame an OS update), could you or @meatballs, if you
    get time, paste fingerprints for "k59r"? Something like: [code snippet
    omitted]. Here are the equivalent ones for RevisedDowning:

    Ashlock: [fingerprint plot omitted]
    Transitive: [fingerprint plot omitted]
Reply:
    They are really similar but not 100% identical.
Reply:
    Gosh, they're incredibly similar though. I'm happy, and looking at the
    history of RevisedDowning you implemented it from the Fortran code, so I
    suspect you did it right.
Reply:
    Let's keep it and then tweak it in a follow-up PR.
Reply:
    Fine by me. 👍