- My English and coding abilities are limited. If there are any mistakes, please correct me. I will try to make this project better.
In this project, I try to use Reinforcement Learning to play a game called "Flappy Bird". yenchenlin has already provided a version and some ideas, but what I want to do is a little different (or, say, I play it in a different way): I make the pipes move up and down each frame. If the agent can play the game under this condition, I expect that it will also play the original game very well.
"Flappy Bird" is a side-scrolling game in which the gameplay action is viewed from a side-view camera angle, and the onscreen characters can generally only move to the left or right. It is just like "Super Mario Bros". In the game, there is a flying bird, named "Faby", who moves continuously towards the right. The objective is to direct Faby between sets of Mario-like pipes. If Faby hits the pipe, then player lose. Faby briefly flaps upward each time that the player click left mouse button; if the buttuon is not clicked, Faby falls because of gravity. Each pair of pipes that he navigates between earns the player one point. Therefore, player should fly Faby as far as possible to get high score.
- There is no supervisor, only a reward signal.
- Feedback is delayed, not instantaneous.
- Time really matters.
- Agent’s actions affect the subsequent data it receives.
A reward Rt is a scalar feedback signal which indicates how well the agent is doing at step t. The agent's job is to maximise cumulative reward. Reinforcement learning is based on the reward hypothesis.
The agent is just like a human: it makes decisions based on the current observation and reward. The action then influences the environment, and the agent receives a new observation and reward. It learns from these situations and gradually finds a good way to make decisions and get better results.
Interaction of Agent and Environment. At each step t:
- The agent: executes action At, receives observation Ot, receives scalar reward Rt.
- The environment: receives action At, emits observation Ot+1, emits scalar reward Rt+1.
- t increments at the environment step.
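As a toy sketch of this loop (the environment and agent below are made-up placeholders, not the actual Flappy Bird game or the DQN agent used in this project):

```python
import random

class ToyEnv:
    """A toy environment that ends after 10 steps."""
    def __init__(self):
        self.t = 0

    def reset(self):
        self.t = 0
        return 0.0                                # initial observation O_0

    def step(self, action):
        self.t += 1
        observation = float(self.t)               # O_{t+1}
        reward = 1.0 if action == 1 else 0.0      # R_{t+1}
        done = self.t >= 10
        return observation, reward, done

class RandomAgent:
    """An agent that picks actions at random (no learning)."""
    def act(self, observation):
        return random.choice([0, 1])              # choose action A_t

env, agent = ToyEnv(), RandomAgent()
obs, total_reward, done = env.reset(), 0.0, False
while not done:
    action = agent.act(obs)                       # agent executes A_t
    obs, reward, done = env.step(action)          # environment emits O_{t+1}, R_{t+1}
    total_reward += reward                        # agent accumulates reward
print("episode return:", total_reward)
```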
A state St is Markov if and only if P[St+1 | St] = P[St+1 | S1,...,St]
It means that the future is independent of the past given the present.
A Markov process is a memoryless random process (a sequence of random states S1, S2, ... with the Markov property).
A Markov Process (or Markov Chain) is a tuple <S, P>, where S is a (finite) set of states and P is a state transition probability matrix, Pss' = P[St+1=s' | St=s].
A Markov Reward Process is a tuple <S, P, R, γ>, where S is a finite set of states, P is a state transition probability matrix, Pss' = P[St+1=s' | St=s], R is a reward function, Rs = E[Rt+1 | St = s], and γ is a discount factor, γ ∈ [0, 1].
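A tiny made-up Markov Reward Process, just to make the notation concrete (the states, transition probabilities, rewards and discount factor below are illustrative only, not taken from the game):

```python
import numpy as np

states = ["start", "pipe", "crash"]          # S: finite set of states
P = np.array([[0.0, 0.8, 0.2],               # P[s, s'] = P[St+1=s' | St=s]
              [0.0, 0.7, 0.3],
              [0.0, 0.0, 1.0]])              # "crash" is an absorbing state
R = np.array([0.0, 1.0, -1.0])               # Rs = E[Rt+1 | St = s]
gamma = 0.9                                  # discount factor γ ∈ [0, 1]

# Sample one state sequence from the chain, starting in "start".
rng = np.random.default_rng(0)
s, trajectory = 0, []
for _ in range(20):
    trajectory.append(states[s])
    if states[s] == "crash":                 # stop once absorbed
        break
    s = rng.choice(len(states), p=P[s])      # draw St+1 from row P[s]
print(trajectory)
```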
The return Gt is the total discounted reward from time-step t: Gt = Rt+1 + γRt+2 + γ²Rt+3 + ... = Σ_{k=0}^{∞} γ^k Rt+k+1
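For a finite reward sequence this return can be computed directly; here is a minimal sketch, where rewards[0] plays the role of Rt+1:

```python
def discounted_return(rewards, gamma=0.99):
    """Compute Gt = sum over k of gamma^k * R_{t+k+1} for a finite reward list."""
    g = 0.0
    for k, r in enumerate(rewards):
        g += (gamma ** k) * r
    return g

print(discounted_return([1.0, 1.0, 1.0], gamma=0.9))  # 1 + 0.9 + 0.81 = 2.71
```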
First, I'll reproduce what yenchenlin/DeepLearningFlappyBird did, in a clearer and more flexible structure (learned from Morvan). Of course, the main ideas and methods are unchanged. Along the way, I fix some bugs, record the score per episode with TensorBoard, visualize how the images (game screens) are processed, and make it runnable in "dummy" (headless) mode (i.e., running without a video device).
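Two of these changes can be sketched like this (the log directory, tag name and example values are placeholders; the summary code assumes the TensorFlow 1.x API, and the screen size is the one used by the original Flappy Bird clone):

```python
import os

# "Dummy" (headless) mode: SDL's dummy video driver lets pygame run
# without a display/video device. Set this before pygame creates its
# display surface.
os.environ["SDL_VIDEODRIVER"] = "dummy"

import pygame
import tensorflow as tf

pygame.init()
screen = pygame.display.set_mode((288, 512))    # game screen size

# Record the score per episode so it shows up in TensorBoard
# (TensorFlow 1.x summary API).
writer = tf.summary.FileWriter("logs")
episode, score = 7, 42                          # example values
summary = tf.Summary(value=[tf.Summary.Value(tag="score_per_episode",
                                             simple_value=score)])
writer.add_summary(summary, global_step=episode)
writer.flush()
```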
```bash
git clone https://github.com/wenlisong/flappybird.git
cd flappybird
python main.py -g fb -n dqn
```
- python 3.6
- tensorflow 1.8.0
- opencv-python
- pygame