-
Notifications
You must be signed in to change notification settings - Fork 264
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rewrite the result set #1166
Merged
Merged
Rewrite the result set #1166
Commits on Feb 3, 2018
-
At present the result set plays the matches in parallel followed by a (sometimes computationally expensive) single process of reading in and analysing the interactions. TLDR: This changes how the internal result set calculations are being done. They are more efficiently calculated. This happens by doing the following: 1. The various match by match calculations are done by the tournament (the winner of a match, the cooperation count, the score etc...). 2. This calculations are written to file (at present we just write the actual actions of an interaction). 3. The form of this file has also changed: for every match there are 2 rows. One row corresponds to each player. This might seem costly storage wise but is done to enable faster analysis (more on that later). 4. The analysis is now done entirely using `dask`: "Dask is a flexible parallel computing library for analytic computing." (https://dask.pydata.org/). This ensures all calculations are done on disk (so no huge memory problems) but also that they can be done **in parallel**. This is all done using the nice Pandas-like API that dask has so essentially all calculations for the result set are just done by a few `groupby` statements. 5. There is *some* work being done outside of dask but that's just reshaping data. `dask` outputs `pandas.Series` and to be consistent with our current setup these are changes to be lists of list etc... **Some user facing changes** (which is why I suggest this would be for a `v4.0.0` release): - The `result_set.interactions` is no longer possible. This is a choice and not forced by the redesign: I don't think this is ever necessary or helpful. The data file can always be read in. - The ability to `tournament.play(in_memory=True)` has been removed. Again, not entirely a forced change (although it would be a tiny bit of work to allow this). Given the recent work to make everything work on Windows I don't think this is necessary and has allowed for a big deletion of code (which is a good thing re maintenance). - The interactions data file is now in a different format, this is forced by the design choice. - I have made a slight modification to `result_set.cooperation`. Currently this sets all self interactions to be 0 but I think that's not the right way to display it (note that the cooperation rates were all being done correctly). **As well as ensuring the work done in series is reduced and the parallel workers also calculate the scores (which I think is more efficient)** this also seems to be faster: On this branch: ```python import axelrod as axl players = [s() for s in axl.strategies if s.classifier["memory_depth"] == 1] tournament = axl.Tournament(players, turns=200, repetitions=100) results = tournament.play(processes=4) ``` Took: 1min 49s ```python import axelrod as axl players = [s() for s in axl.short_run_time_strategies] tournament = axl.Tournament(players, turns=200, repetitions=20) results = tournament.play(processes=4) ``` Took: 21min 2s On current master; ```python import axelrod as axl players = [s() for s in axl.strategies if s.classifier["memory_depth"] == 1] tournament = axl.Tournament(players, turns=200, repetitions=100) results = tournament.play(processes=4) ``` Took: 2min 36s ```python import axelrod as axl players = [s() for s in axl.short_run_time_strategies] tournament = axl.Tournament(players, turns=200, repetitions=20) results = tournament.play(processes=4) ``` Took: 28min 8s **One final aspect to consider** (I think) is that adding `dask` as a dependency open up the potential to use it's `delayed` decorator to do all our parallel processing. This would have the benefit of being able to use a distributed scheduler that `dask` has. (We'd have to investigate if this actually works based on our parallelisation but at least the possibility is there).
Configuration menu - View commit details
-
Copy full SHA for 2e1f4df - Browse repository at this point
Copy the full SHA 2e1f4dfView commit details -
Configuration menu - View commit details
-
Copy full SHA for 9c9f335 - Browse repository at this point
Copy the full SHA 9c9f335View commit details -
Add iu.read_interactions_from_file to docs.
This highlights it as a user facing function but I'm happy to do something further.
Configuration menu - View commit details
-
Copy full SHA for 1cbd71a - Browse repository at this point
Copy the full SHA 1cbd71aView commit details -
Configuration menu - View commit details
-
Copy full SHA for da99c79 - Browse repository at this point
Copy the full SHA da99c79View commit details -
Modify file comparison to not be shallow.
Hopefully this now works on windows.
Configuration menu - View commit details
-
Copy full SHA for f8b53aa - Browse repository at this point
Copy the full SHA f8b53aaView commit details -
Attempting to make file comparison work on windows.
Configuration menu - View commit details
-
Copy full SHA for b0eb05d - Browse repository at this point
Copy the full SHA b0eb05dView commit details -
Move swapping of states order to outer context
Note I did not use `reversed()`, this doesn't return a tuple, I also didn't use `.reverse()` this only works on lists.
Configuration menu - View commit details
-
Copy full SHA for e3da817 - Browse repository at this point
Copy the full SHA e3da817View commit details -
Configuration menu - View commit details
-
Copy full SHA for e10fb4f - Browse repository at this point
Copy the full SHA e10fb4fView commit details -
Configuration menu - View commit details
-
Copy full SHA for 30dfed1 - Browse repository at this point
Copy the full SHA 30dfed1View commit details -
Configuration menu - View commit details
-
Copy full SHA for 5470a4c - Browse repository at this point
Copy the full SHA 5470a4cView commit details -
Configuration menu - View commit details
-
Copy full SHA for 43cfc53 - Browse repository at this point
Copy the full SHA 43cfc53View commit details
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.