---
results
---
- this is a list of interesting observations made while training
---
sgd vs adam
---
- on a one room maze, sgd will fail to find the optimal policy whereas adam finds it quickly
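- a minimal sketch of the comparison (assumes pytorch; the net shape and learning rate here are illustrative, not the repo's actual settings):

    import torch
    import torch.nn as nn

    # toy q-network for a 5x5 one-hot state and 4 actions
    q_net = nn.Sequential(nn.Linear(25, 64), nn.ReLU(), nn.Linear(64, 4))

    # swapping the optimizer is the only change between the two runs
    opt_sgd = torch.optim.SGD(q_net.parameters(), lr=1e-3)
    opt_adam = torch.optim.Adam(q_net.parameters(), lr=1e-3)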
---
small replay memory vs large replay memory
---
- on the small maze, if you try out a capacity=1000 vs capacity=100000 replay memory, it makes a huge difference
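- a minimal replay memory sketch to show what the capacity knob controls (stdlib only; names are illustrative, not the repo's implementation):

    import random
    from collections import deque

    class ReplayMemory:
        # fixed-capacity buffer of (s, a, r, s_next, done) tuples;
        # once full, the oldest transitions fall off the back
        def __init__(self, capacity):
            self.buffer = deque(maxlen=capacity)

        def push(self, transition):
            self.buffer.append(transition)

        def sample(self, batch_size):
            return random.sample(self.buffer, batch_size)

    small = ReplayMemory(capacity=1000)    # forgets quickly, more correlated samples
    large = ReplayMemory(capacity=100000)  # keeps a broad, decorrelated history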
---
numeric vs one-hot state representation
---
- numeric is much worse than one-hot; note, though, that one-hot amounts to rote learning of each position (see the sketch below)
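- a sketch of the two encodings for a 5x5 room (illustrative, not the repo's exact functions):

    import numpy as np

    def numeric_state(row, col, size=5):
        # single scalar cell index - compact, but the network has to
        # infer spatial structure from an arbitrary ordering
        return np.array([row * size + col], dtype=np.float32)

    def one_hot_state(row, col, size=5):
        # one entry per cell - easy to fit, but amounts to a lookup
        # table per position, hence the "rote learning" caveat
        s = np.zeros(size * size, dtype=np.float32)
        s[row * size + col] = 1.0
        return s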
---
5 vs 10 size single room
---
- how much more difficult is the 10 vs 5?
---
10 size room vs 5 size, 2 room maze
---
- how much more difficult do walls make the task?
---
conv vs dense one-hot representations
---
- how does passing the one-hot input to a conv net as a 2d grid compare in performance to the flattened, dense version? (see the sketch below)
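- a sketch of the two architectures on a 5x5 one-hot grid (assumes pytorch; layer sizes are illustrative):

    import torch
    import torch.nn as nn

    # dense version: flatten the grid and feed it to linear layers
    dense_net = nn.Sequential(
        nn.Flatten(),
        nn.Linear(5 * 5, 64), nn.ReLU(),
        nn.Linear(64, 4),
    )

    # conv version: keep the grid as a 1x5x5 "image"
    conv_net = nn.Sequential(
        nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
        nn.Flatten(),
        nn.Linear(16 * 5 * 5, 4),
    )

    grid = torch.zeros(1, 1, 5, 5)  # batch of one one-hot grid
    grid[0, 0, 2, 3] = 1.0          # agent at row 2, col 3
    print(dense_net(grid).shape, conv_net(grid).shape)  # both (1, 4)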
---
number of hidden units
---
- what impact does number of hidden units have?
- seems that networks with fewer units tend to get stuck in local optima that are relatively poor compared to the larger networks (both should be more than sufficient to express the value function?)
---
number of hidden layers
---
- fewer layers seem to work better for the large maze - likely due to a poor learning setup for the larger networks
- it is also possible that the smaller networks simply train faster
- batch norm might improve the deeper networks
- this was also run on cpu on a laptop, so the gap is also likely down to limited memory or compute
---
prioritized experience replay vs none
---
- how much better does it do?
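- for reference, a toy proportional prioritized replay (no sum-tree or importance-sampling weights, purely to show the mechanism; not the repo's implementation):

    import numpy as np

    class PrioritizedReplay:
        def __init__(self, capacity, alpha=0.6):
            self.capacity, self.alpha = capacity, alpha
            self.buffer, self.priorities = [], []

        def push(self, transition):
            if len(self.buffer) >= self.capacity:
                self.buffer.pop(0)
                self.priorities.pop(0)
            self.buffer.append(transition)
            # new transitions get max priority so they are seen at least once
            self.priorities.append(max(self.priorities, default=1.0))

        def sample(self, batch_size):
            # sample proportionally to priority^alpha
            p = np.array(self.priorities) ** self.alpha
            p /= p.sum()
            idx = np.random.choice(len(self.buffer), batch_size, p=p)
            return idx, [self.buffer[i] for i in idx]

        def update(self, idx, td_errors):
            # transitions with larger td error get replayed more often
            for i, e in zip(idx, td_errors):
                self.priorities[i] = abs(float(e)) + 1e-6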
---
increasing batch size increases stability
---
- increasing the batch size improves learning, presumably because averaging over more transitions reduces gradient noise
---
should you regularize q network?
---
- ? i'm not really sure at all
- seems that it does not make a huge difference
- you would think it might, given that the network could overfit the data in the experience replay
- a small regularization value of 1e-4 seems to work well (see the sketch below)
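- the simplest way to apply that (assuming pytorch) is l2 regularization via the optimizer's weight_decay argument:

    import torch
    import torch.nn as nn

    q_net = nn.Sequential(nn.Linear(25, 64), nn.ReLU(), nn.Linear(64, 4))
    # weight_decay=1e-4 is the small value noted above
    opt = torch.optim.Adam(q_net.parameters(), lr=1e-3, weight_decay=1e-4)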
---
clipping td error (i.e., the loss)
---
- does clipping the td error make the loss graphs look pretty / smooth? (see the sketch below)
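- a common way to clip the td error (assuming pytorch) is the huber / smooth-l1 loss: quadratic near zero, linear beyond a threshold, so large errors give bounded gradients and the loss curve tends to look smoother:

    import torch.nn.functional as F

    def td_loss(q_pred, q_target):
        # behaves like mse for small td errors and like a clipped
        # (linear) loss for large ones
        return F.smooth_l1_loss(q_pred, q_target)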