Learning how to make a Reinforcement Learning algorithm
We create a population of X individuals with X wallets, one per individual. At the start, each wallet holds 1 USD and 0 BTC.
We create a neural network with given layers. Each weight of the nodes is initialized randomly in [-1, 1].
A prediction method decides which action to take based on the NN and the inputs. It then updates the wallet's score (a 0.25% fee is charged on each BUY/SELL action).
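A minimal sketch of the idea in Python (the class and method names are mine, not necessarily the repository's; the action encoding and the final layer of size 3, one output per action, are assumptions):

```python
import numpy as np

ACTIONS = ["BUY", "SELL", "HOLD"]
FEE = 0.0025  # 0.25% fee on each BUY/SELL action

class Individual:
    def __init__(self, layer_sizes):
        # One weight matrix per layer, each weight uniform in [-1, 1].
        # layer_sizes should end with 3 (one output per action).
        self.weights = [np.random.uniform(-1, 1, (n_in, n_out))
                        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]
        self.usd, self.btc = 1.0, 0.0  # starting wallet: 1 USD, 0 BTC

    def predict(self, inputs):
        # Forward pass through the NN; the strongest output wins.
        x = np.asarray(inputs, dtype=float)
        for w in self.weights:
            x = np.tanh(x @ w)
        return ACTIONS[int(np.argmax(x))]

    def act(self, action, price):
        # Apply the chosen action to the wallet, charging the 0.25% fee.
        if action == "BUY" and self.usd > 0:
            self.btc = (self.usd / price) * (1 - FEE)
            self.usd = 0.0
        elif action == "SELL" and self.btc > 0:
            self.usd = (self.btc * price) * (1 - FEE)
            self.btc = 0.0
```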
The inputs are one line of a BTC history CSV file, mainly containing: high, low, open, close, weight, volume. Timestamps should never be fed to the NN, otherwise it will learn from the dates instead of the candle patterns.
The fitness score is the money made by the end of the period covered by the CSV file.
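Scoring an individual could then be a replay of the whole file, as in this sketch building on the `Individual` class above (the CSV column names are assumptions about the dataset layout):

```python
import csv

def evaluate(individual, csv_path):
    """Replay the CSV and return the final wallet value as the score."""
    with open(csv_path) as f:
        for row in csv.DictReader(f):
            # The timestamp column is deliberately left out of the inputs.
            inputs = [float(row[k]) for k in
                      ("high", "low", "open", "close", "weight", "volume")]
            price = float(row["close"])
            individual.act(individual.predict(inputs), price)
    # Convert any remaining BTC back to USD at the last seen price.
    return individual.usd + individual.btc * price
```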
- Crossover (a sketch of one full epoch follows this list)
  - Selection: keep the best individuals according to their score.
  - Save the best individuals (this can cause premature convergence over the long term, because the very best individual can be kept forever).
  - Fill the rest of the population with:
    - fully new random individuals;
    - children built with a random probability of taking the father's or the mother's neuron, for child1 and child2;
    - averaged neurons: seems to have a bad convergence property;
    - a random pick between father and mother neurons: seems to have a bad convergence property;
    - a crossover method chosen at random at each epoch?
- Mutation
  - Random probability to mutate each neuron of the NN.
  - Dynamic mutation rate/probability depending on the epoch? (Yes, but poorly implemented.)
  - A random mutation method for each individual?
- Profit?
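Here is a sketch of one evolution epoch under these choices (the elite count, the parent pool size, and the decay schedule are my assumptions; each individual is expected to carry a `score` attribute set by the evaluation step):

```python
import copy
import random
import numpy as np

def next_generation(population, epoch, elite=2, base_rate=0.05):
    # Selection: best scores first.
    population.sort(key=lambda ind: ind.score, reverse=True)
    new_pop = population[:elite]  # elitism: keep the best unchanged

    # Dynamic mutation rate: decay the probability as epochs go by.
    rate = base_rate / (1 + epoch / 100)

    while len(new_pop) < len(population):
        father, mother = random.sample(population[:10], 2)
        child = copy.deepcopy(father)
        for i, w in enumerate(child.weights):
            # Crossover: each weight taken from father or mother at random.
            mask = np.random.rand(*w.shape) < 0.5
            w[mask] = mother.weights[i][mask]
            # Mutation: re-roll a few weights uniformly in [-1, 1].
            mut = np.random.rand(*w.shape) < rate
            w[mut] = np.random.uniform(-1, 1, int(mut.sum()))
        new_pop.append(child)
    return new_pop
```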
This branch uses high, low, open, close, weight, volume and last_action to predict the next action. This method seems quite effective, BUT I'm sure it won't be effective on new data.
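One way the last_action feedback could be wired, as a sketch (the numeric encoding of actions is my assumption, and `rows`/`individual` come from the sketches above):

```python
# Hypothetical encoding of the previous action as a 7th input feature.
ACTION_CODE = {"BUY": 1.0, "SELL": -1.0, "HOLD": 0.0}

last_action = "HOLD"
for row in rows:
    inputs = [float(row[k]) for k in
              ("high", "low", "open", "close", "weight", "volume")]
    inputs.append(ACTION_CODE[last_action])  # feed last_action back in
    last_action = individual.predict(inputs)
    individual.act(last_action, float(row["close"]))
```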
The problem with raw values is: what happens if we break the highest (or lowest) value ever seen? The data only ranges from $200 (2015) up to $19000 (2017) over the 2015-2019 period. How would the Neural Network react to unbounded values? How should I normalize them in real time? Would I have to retrain the model each time we hit a new max/min? Would that be efficient?

Even if we never hit a new max/min, how would I efficiently append new data to my program? The standardization would no longer be centered on 0, and that could make all the learning useless.
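To make the problem concrete, here is a z-score standardization fitted on the training range (the numbers are illustrative only):

```python
import numpy as np

train_close = np.array([200.0, 5000.0, 19000.0])  # rough 2015-2019 range
mean, std = train_close.mean(), train_close.std()

def standardize(price):
    return (price - mean) / std

print(standardize(5000.0))   # ~ -0.4: in-range prices stay near 0
# A new all-time high lands far outside anything the NN was trained on,
# and appending it shifts the true mean/std, so inputs drift off-center.
print(standardize(60000.0))  # ~ 6.5
```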
So: good results on the training data, but I am less sure about real-time values. I suspect I overfit the curve. Maybe I need to feed more history to the NN.
In this example, we started with 1 USD and gained 93262941 USD from 2015 to mid-2019.
This is the structure of one Neural Network trained by reinforcement. Weights were initially set at random in [-1, 1] and modified over 5000 epochs of selection/crossover/mutation.
This branch should not be the master branch; the Xhistory branch should be. The noHistory branch should be a specific case of Xhistory with X = 0, but I can't manage to get the same results.
Here we feed the NN with X lines of data from the CSV file. The goal is to learn patterns across multiple candles. But I cannot find a good structure for the NN yet: the NN must be bigger, thus longer to train, longer to predict, longer to take an action, and therefore a potential money loss.
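A sketch of how the X-line window could be assembled (the function name and indexing are mine); the input layer then needs X * 6 nodes, which is why the network grows with X:

```python
def windowed_inputs(rows, X):
    # Slide a window of the X most recent candles over the parsed CSV rows
    # and flatten each window (oldest candle first) into one input vector.
    keys = ("high", "low", "open", "close", "weight", "volume")
    for i in range(X - 1, len(rows)):
        yield [float(c[k]) for c in rows[i - X + 1:i + 1] for k in keys]
```

Depending on whether X counts only the history or the history plus the current candle, the noHistory case corresponds to X = 0 or X = 1; such an off-by-one could be related to the branch mismatch mentioned above.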
I wanted to try different inputs. Instead of working with normalized and standardized values, I wanted to work with the color of the candles. By abstracting away the actual value of the currency, playing with candle colors could be a good way to generate safer revenue. I'm aware this is not optimal, but maybe it will find a more general rule on when to BUY/SELL/HOLD.
What I hope is that the NN will learn patterns from the history of candle colors.
For example, if each candle is one day, what would our reaction be after seeing 6 red candles? Personally I would buy, since crypto is a very speculative market, and in my opinion there will always be someone trying to make money off crashes.
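A sketch of the color encoding (the exact encoding is my choice): six red daily candles would then reach the NN as [-1, -1, -1, -1, -1, -1].

```python
def candle_color(row):
    # +1 = green (closed above open), -1 = red, 0 = flat/doji.
    o, c = float(row["open"]), float(row["close"])
    return (c > o) - (c < o)

def color_history(rows, n=6):
    # The last n candle colors, oldest first, form the NN input.
    colors = [candle_color(r) for r in rows]
    for i in range(n - 1, len(colors)):
        yield colors[i - n + 1:i + 1]
```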
This reflection is heavily biased by my own way of thinking about the market. In theory, the NN should learn the patterns by itself from basic data like: high, low, open, close, weight, volume.
This is a self-learning project.
In no event shall I be liable for any special, indirect, or consequential damages, or any damages whatsoever resulting from loss of money/cryptocurrency/stocks or profits, if you base your decisions on this learning project. The aim here is to understand how to build a Reinforcement Learning algorithm with Neural Networks, not to earn money. Even if the training results look very appealing, this program will NEVER predict the future. At any instant you can lose all your money, so invest only what you can afford to lose. If you are addicted to gambling, please consult a doctor and do not hope to make money with this project.