Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrite the result set #1166

Merged
merged 11 commits into from
Feb 6, 2018
Merged

Rewrite the result set #1166

merged 11 commits into from
Feb 6, 2018

Commits on Feb 3, 2018

  1. Re write the result set

    At present the result set plays the matches in parallel followed by a
    (sometimes computationally expensive) single process of reading in and
    analysing the interactions.
    
    TLDR: This changes how the internal result set calculations are being
    done. They are more efficiently calculated.
    
    This happens by doing the following:
    
    1. The various match by match calculations are done by the tournament
       (the winner of a match, the cooperation count, the score etc...).
    2. This calculations are written to file (at present we just write the
      actual actions of an interaction).
    3. The form of this file has also changed: for every match there are 2
       rows. One row corresponds to each player. This might seem costly storage
       wise but is done to enable faster analysis (more on that later).
    4. The analysis is now done entirely using `dask`: "Dask is a flexible
       parallel computing library for analytic computing."
       (https://dask.pydata.org/). This ensures all calculations are done on
       disk (so no huge memory problems) but also that they can be done **in
       parallel**. This is all done using the nice Pandas-like API that dask
       has so essentially all calculations for the result set are just done
       by a few `groupby` statements.
    5. There is *some* work being done outside of dask but that's just
       reshaping data. `dask` outputs `pandas.Series` and to be consistent
       with our current setup these are changes to be lists of list etc...
    
    **Some user facing changes** (which is why I suggest this would be for a
    `v4.0.0` release):
    
    - The `result_set.interactions` is no longer possible. This is a choice
      and not forced by the redesign: I don't think this is ever necessary or
      helpful. The data file can always be read in.
    - The ability to `tournament.play(in_memory=True)` has been removed.
      Again, not entirely a forced change (although it would be a tiny bit
      of work to allow this). Given the recent work to make everything work
      on Windows I don't think this is necessary and has allowed for a big
      deletion of code (which is a good thing re maintenance).
    - The interactions data file is now in a different format, this is
      forced by the design choice.
    - I have made a slight modification to `result_set.cooperation`.
      Currently this sets all self interactions to be 0 but I think that's not
      the right way to display it (note that the cooperation rates were all
      being done correctly).
    
    **As well as ensuring the work done in series is reduced and the parallel
    workers also calculate the scores (which I think is more efficient)**
    this also seems to be faster:
    
    On this branch:
    
    ```python
    import axelrod as axl
    players = [s() for s in axl.strategies if s.classifier["memory_depth"] == 1]
    tournament = axl.Tournament(players, turns=200, repetitions=100)
    results = tournament.play(processes=4)
    ```
    
    Took: 1min 49s
    
    ```python
    import axelrod as axl
    players = [s() for s in axl.short_run_time_strategies]
    tournament = axl.Tournament(players, turns=200, repetitions=20)
    results = tournament.play(processes=4)
    ```
    
    Took: 21min 2s
    
    On current master;
    
    ```python
    import axelrod as axl
    players = [s() for s in axl.strategies if s.classifier["memory_depth"] == 1]
    tournament = axl.Tournament(players, turns=200, repetitions=100)
    results = tournament.play(processes=4)
    ```
    
    Took: 2min 36s
    
    ```python
    import axelrod as axl
    players = [s() for s in axl.short_run_time_strategies]
    tournament = axl.Tournament(players, turns=200, repetitions=20)
    results = tournament.play(processes=4)
    ```
    
    Took: 28min 8s
    
    **One final aspect to consider** (I think) is that adding `dask` as a
    dependency open up the potential to use it's `delayed` decorator to do
    all our parallel processing. This would have the benefit of being able
    to use a distributed scheduler that `dask` has. (We'd have to
    investigate if this actually works based on our parallelisation but at
    least the possibility is there).
    drvinceknight authored and marcharper committed Feb 3, 2018
    Configuration menu
    Copy the full SHA
    2e1f4df View commit details
    Browse the repository at this point in the history
  2. Configuration menu
    Copy the full SHA
    9c9f335 View commit details
    Browse the repository at this point in the history
  3. Add iu.read_interactions_from_file to docs.

    This highlights it as a user facing function but I'm happy to do
    something further.
    drvinceknight authored and marcharper committed Feb 3, 2018
    Configuration menu
    Copy the full SHA
    1cbd71a View commit details
    Browse the repository at this point in the history
  4. Use defaultdict.

    drvinceknight authored and marcharper committed Feb 3, 2018
    Configuration menu
    Copy the full SHA
    da99c79 View commit details
    Browse the repository at this point in the history
  5. Modify file comparison to not be shallow.

    Hopefully this now works on windows.
    drvinceknight authored and marcharper committed Feb 3, 2018
    Configuration menu
    Copy the full SHA
    f8b53aa View commit details
    Browse the repository at this point in the history
  6. Use pd.Dataframe.equals

    Attempting to make file comparison work on windows.
    drvinceknight authored and marcharper committed Feb 3, 2018
    Configuration menu
    Copy the full SHA
    b0eb05d View commit details
    Browse the repository at this point in the history
  7. Move swapping of states order to outer context

    Note I did not use `reversed()`, this doesn't return a tuple, I also
    didn't use `.reverse()` this only works on lists.
    drvinceknight authored and marcharper committed Feb 3, 2018
    Configuration menu
    Copy the full SHA
    e3da817 View commit details
    Browse the repository at this point in the history
  8. Configuration menu
    Copy the full SHA
    e10fb4f View commit details
    Browse the repository at this point in the history
  9. Configuration menu
    Copy the full SHA
    30dfed1 View commit details
    Browse the repository at this point in the history
  10. Use np namespace.

    drvinceknight authored and marcharper committed Feb 3, 2018
    Configuration menu
    Copy the full SHA
    5470a4c View commit details
    Browse the repository at this point in the history
  11. Configuration menu
    Copy the full SHA
    43cfc53 View commit details
    Browse the repository at this point in the history