Implementation of Double DQN #52
base: master
Conversation
Awesome! Been hoping someone would implement Double DQN since the paper came out. Thanks!
Thanks for the PR. I'm behind on reviewing, but I'm hoping to get caught up in late December / early January. It looks like the changes aren't very disruptive, so there shouldn't be an issue merging.
Excellent. I'm starting a test run on Space Invaders since it's one where they saw a big increase. I'll let you know how it goes in a couple of days.
Very nice! This is with the double Q-learning? If it's just switching the network, I'm also impressed. My performance with deep_Q_RL never came close to theirs. Best,
@alito Thanks for testing this. Do you mind sharing the results.csv, and perhaps the results.csv from any other Space Invaders models that you have trained? Also, here is a newer paper from DeepMind that claims better performance than Double DQN, which could be interesting to implement: http://arxiv.org/abs/1511.06581
Here is the results.csv for this run (note the extra column in there):

I don't seem to have, or at least haven't kept, a recent results.csv. I've got a few from June that didn't learn at all, and a few from the NIPS era. I've put one up from May, which seems to be the best I've got, but I don't think it makes for a good comparison: http://organicrobot.com/deepqrl/results-20150527.csv

I'm running a plain version now, but it will take a while to see what's going on. Also, there's this: http://arxiv.org/abs/1511.05952 from last week, which, aside from doing better, has plots of epoch vs reward for all 57 games. From those, it seems like even their non-double-Q implementation is very stable, or at least more stable than deep_q_rl seems to be at the moment.
Minor change to update the citation for Double DQN.
Thanks Alejandro. I, for one, am curious to see how this comparison shakes out. The Prioritized Replay paper that you mentioned has been sitting on my reading list. I have always suspected that they sample the game data in a more clever way. Best,
The run without double-Q hasn't finished, but it's not going to go anywhere from its current state. I've put the results up:

It does better than I expected. Looks stable, if nothing else. Double-Q looks like a substantial improvement in this case.

@moscow25 they've released their code, so I suspect they are not cheating in any way they haven't mentioned. I haven't tested their code, but it wouldn't be hard to find out whether they're doing as well as they claim in their papers.
Awesome! I meant that tongue in cheek. And yes, they released code, so it happened :-) Just saying that it's always hard to specify a technical system precisely, especially in 7 pages. And this presumes that the people who wrote the system remember every decision explored and taken.

Glad to see the double Q-learning working so well. It kept starting OK but then diverging into NaN territory when I ran the (Lasagne) version of this back when it came out. Seeing it converge more steadily now is great. The idea from that paper is simple, and I'm glad it just works. Over-optimism is a huge problem for my high-variance poker AI problems, so I'm optimistic about trying this version now. Thanks again for running the baseline.

Best,
There seems to be a bug in your implementation: as far as I can see, you are calculating `maxaction` based on `q_vals` (which contains the Q-values for s_t and NOT s_{t+1}).
@stokasto's note sounds right. I'll do some testing.
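Concretely, the point of @stokasto's note is that the greedy action for the Double DQN target should come from the online network's Q-values at s_{t+1}, not from the Q-values at s_t. Below is a minimal NumPy sketch of that target computation; the array names are illustrative only, not the identifiers used in the PR (the real code works with Theano tensors).

```python
import numpy as np

# Illustrative shapes and names only.
batch_size, num_actions, gamma = 32, 6, 0.99
rng = np.random.RandomState(0)
next_q_online = rng.randn(batch_size, num_actions)  # online net on s_{t+1}
next_q_target = rng.randn(batch_size, num_actions)  # frozen target net on s_{t+1}
rewards = rng.randn(batch_size)
terminals = np.zeros(batch_size)                    # 1.0 where s_{t+1} is terminal

# Buggy form flagged above: argmax taken over Q(s_t, .) instead of Q(s_{t+1}, .)
# greedy = np.argmax(q_vals, axis=1)

# Double DQN: the *online* network selects the greedy action at s_{t+1} ...
greedy = np.argmax(next_q_online, axis=1)
# ... and the *target* network evaluates that action.
double_q = next_q_target[np.arange(batch_size), greedy]
targets = rewards + (1.0 - terminals) * gamma * double_q
```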
I was interested in implementing Double DQN in this codebase, so here are my changes. Feel free to pull these into the main codebase. I didn't change much, since the Double DQN algorithm is not much different from the one described in the Nature paper. I couldn't get the original tests to pass, so I was not able to add a test for Double DQN. I did test everything, though, by running experiments with Breakout. Here is the performance over time:
Of course, the differences here are negligible, and Breakout was noted in the Double DQN paper as not showing a real change under Double DQN. If I had more computing resources, I could test on the games where Double DQN makes a significant difference. Here is perhaps a more useful plot that shows how Double DQN seems to reduce value overestimates:
And here is the change required for Double DQN:
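In outline, the change swaps the max over the target network's next-state Q-values for a greedy action chosen by the online network and evaluated by the target network. A rough, Theano-style sketch of that target follows; the variable names and structure are assumptions for illustration and may not match the actual diff (deep_q_rl builds these tensors from its Lasagne networks).

```python
import theano
import theano.tensor as T

# Symbolic inputs; in deep_q_rl these would come from the two networks.
next_q_online = T.matrix('next_q_online')  # online net applied to s_{t+1}
next_q_target = T.matrix('next_q_target')  # target net applied to s_{t+1}
rewards = T.vector('rewards')
terminals = T.vector('terminals')
discount = 0.99

# Standard DQN target: r + gamma * max_a Q_target(s_{t+1}, a)
dqn_target = rewards + (1.0 - terminals) * discount * T.max(next_q_target, axis=1)

# Double DQN target: the online net picks the action, the target net scores it.
greedy_actions = T.argmax(next_q_online, axis=1)
double_q = next_q_target[T.arange(rewards.shape[0]), greedy_actions]
double_dqn_target = rewards + (1.0 - terminals) * discount * double_q

compute_targets = theano.function(
    [next_q_online, next_q_target, rewards, terminals],
    [dqn_target, double_dqn_target])
```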
If you don't have the time to look over the changes or to test them yourself, I understand. At least this PR will allow others to use it easily if need be.
References:
van Hasselt, H., Guez, A., & Silver, D. (2015). Deep Reinforcement Learning with Double Q-learning. arXiv preprint arXiv:1509.06461.