- can use Keras with TensorFlow as the backend
  how stable is this?
- "inference requires less precision than training generally", so smaller floats can be used?
  they use 8 bits.
- Tensor Processing Unit (TPU) - specialized hardware
- https://arxiv.org/abs/1604.00981 revisits synchronous vs. async training
  found that sync is actually faster overall, with less noise
##################
PRIMITIVES
------------------
Operations vs kernels
----------------
Operations are abstract operators; kernels are concrete implementations of an operation for a particular device (e.g. CPU or GPU).
- Send and Receive are themselves nodes in the graph, inserted where tensors cross device boundaries
Graph
----------------
- graph is built implicitly
- automatic differentiation just adds nodes to the graph (similar to Theano)
- serialized as a GraphDef protocol buffer (graph.proto)
- sent to the runtime via gRPC
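A minimal sketch of the above in the Python API of that era (the tiny model is made up):

    import tensorflow as tf

    # building ops implicitly adds nodes to the default graph
    x = tf.placeholder(tf.float32, shape=[None, 10])
    w = tf.Variable(tf.zeros([10, 1]))
    loss = tf.reduce_sum(tf.square(tf.matmul(x, w)))

    # automatic differentiation just adds more nodes to the same graph
    grads = tf.gradients(loss, [w])

    # the whole graph serializes to a GraphDef proto; this is what gets
    # shipped to the runtime (over gRPC in the distributed case)
    graph_def = tf.get_default_graph().as_graph_def()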
Sessions and Distributing Computation
---------------
- a few kinds of processes in a distributed setting (sketch below):
  1. client process (builds the graph)
  2. master process (coordinates execution)
  3. worker processes (run kernels on their devices)
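A rough sketch of a client talking to a master; a single-process local server stands in for a real cluster here (in a real deployment the addresses would come from a tf.train.ClusterSpec):

    import tensorflow as tf

    # a local server exposes the same grpc:// master interface that a
    # real distributed cluster would
    server = tf.train.Server.create_local_server()

    c = tf.constant(42)
    with tf.Session(server.target) as sess:  # client -> master over gRPC
        print(sess.run(c))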
Distributed Placement
--------------
- A variable can be pinned to a particular device https://www.tensorflow.org/versions/r0.10/how_tos/variables/index.html#device-placement
- There's a placement algorithm
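Pinning looks like this in that era's API (device strings are just examples):

    import tensorflow as tf

    # pin a variable to a parameter-server task
    with tf.device("/job:ps/task:0"):
        w = tf.Variable(tf.zeros([784, 10]))

    # pin compute to a GPU; anything left unpinned is assigned by the
    # automatic placement algorithm
    with tf.device("/gpu:0"):
        y = tf.matmul(tf.ones([1, 784]), w)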
Tools
-----------------
1. TensorBoard
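For reference, feeding TensorBoard looked roughly like this in the r0.10 API (log directory is a placeholder):

    import tensorflow as tf

    loss = tf.constant(1.0)
    tf.scalar_summary("loss", loss)        # pre-1.0 summary names
    summary_op = tf.merge_all_summaries()

    with tf.Session() as sess:
        writer = tf.train.SummaryWriter("/tmp/tf_logs", sess.graph)
        writer.add_summary(sess.run(summary_op), global_step=0)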
Model structures that optimize for parallel computation
-----------------
- multiple "towers" that are independent (AlexNet - 2 towers)
- local reuse (conv nets)
- parts of models are only activated for some data/examples
Deep LSTM
-----------------
- he has example code in the slides
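Not his slide code, just a minimal stacked-LSTM sketch in the r0.10 API (sizes made up):

    import tensorflow as tf

    # a 4-layer ("deep") LSTM over [batch, time, features] inputs
    cell = tf.nn.rnn_cell.BasicLSTMCell(512)
    stacked = tf.nn.rnn_cell.MultiRNNCell([cell] * 4)

    inputs = tf.placeholder(tf.float32, [None, 20, 128])
    outputs, state = tf.nn.dynamic_rnn(stacked, inputs, dtype=tf.float32)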
tensorflow queues
--------------------
https://www.tensorflow.org/versions/r0.10/how_tos/threading_and_queues/index.html
- uses (sketch below):
  1. input prefetching
  2. grouping similar examples (e.g. bucketing sequences by length)
  3. randomization and shuffling
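A minimal queue-based input pipeline of that era (filename is a placeholder):

    import tensorflow as tf

    # background threads fill the queues; training just dequeues batches
    filename_queue = tf.train.string_input_producer(["data-00.tfrecords"])
    reader = tf.TFRecordReader()
    _, serialized = reader.read(filename_queue)

    # shuffle_batch is backed by a RandomShuffleQueue: it prefetches
    # examples and hands out shuffled batches
    batch = tf.train.shuffle_batch([serialized], batch_size=32,
                                   capacity=1000, min_after_dequeue=500)

    # (at run time you'd also call tf.train.start_queue_runners)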
network optimization
--------------------------
- they cut tensors down to 16 bits while transferring them over the network
- they don't convert to IEEE 16-bit floats; they preserve the exponent bits and cut off mantissa bits
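A toy illustration of that truncation scheme (not their code): zero the low 16 bits of a float32, keeping the sign, the full 8-bit exponent, and the top 7 mantissa bits.

    import numpy as np

    def truncate_mantissa(x):
        # reinterpret float32 bits as uint32 and mask off the low 16 bits
        bits = np.asarray(x, dtype=np.float32).view(np.uint32)
        return (bits & np.uint32(0xFFFF0000)).view(np.float32)

    print(truncate_mantissa([3.14159, -1e-8, 1234.5678]))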
Quantization for inference
---------------------------
- quantize weights to 8 bits (4 8-bit operations per cycle on mobile)
- harder to do for training because the range of the weights varies through the training process
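A toy version of 8-bit linear quantization (not Google's actual scheme):

    import numpy as np

    def quantize_uint8(w):
        # needs a fixed [lo, hi] range, which is why this is easier once
        # training is done and the weights are frozen
        lo, hi = float(w.min()), float(w.max())
        scale = (hi - lo) / 255.0 or 1.0  # avoid divide-by-zero
        q = np.round((w - lo) / scale).astype(np.uint8)
        return q, lo, scale

    def dequantize(q, lo, scale):
        return q.astype(np.float32) * scale + lo

    w = np.random.randn(4, 4).astype(np.float32)
    q, lo, scale = quantize_uint8(w)
    print(np.abs(dequantize(q, lo, scale) - w).max())  # small reconstruction error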
##################################################################################################################
Google applications
#######################
Google Photos Search
--------------
- cluster and search based on the content of images
Same model repurposed for different purposes
#######################
Street View
----------------
- text detection "in the wild"
- Google Project Sunroof
RankBrain
---------------
- 3rd most important search ranking signal
- traditionally the search ranking team wanted very explainable models
Robotics
----------------
- they don't use ROS; TensorFlow generates the motor commands for the robot software
- they also do this for simulated robots