Reading - Deep Learning without Poor Local Minima #28
History of Deep Learning Training
Training deep NNs was classified as an intractable problem, at least with the technology available at that time.
It means there is no actual need to find the global minimum to make the DNN work: a local minimum is typically good enough for most practical purposes. This is anyway an empirical observation: in practice, when we train a DNN and it works well, we of course cannot assume we have found the actual global minimum (finding it is NP-complete), so we conclude we have found a local minimum and that it is good enough.
Classification of critical points
So, to summarise, this means that in a DNN loss landscape we have the following categories of critical points (see the definitions sketched right below):
- global minima;
- suboptimal local minima (which, under the paper's assumptions, turn out not to exist);
- saddle points, including the degenerate "bad" ones where the Hessian has no negative eigenvalue.
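As a quick reminder, these are the standard definitions (generic notation, not the paper's), for a loss $L(\theta)$:

$$
\begin{aligned}
&\text{critical point:} && \nabla L(\theta^{*}) = 0 \\
&\text{local minimum:} && L(\theta^{*}) \le L(\theta) \ \text{ for all } \theta \text{ in a neighbourhood of } \theta^{*} \\
&\text{global minimum:} && L(\theta^{*}) \le L(\theta) \ \text{ for all } \theta \\
&\text{saddle point:} && \text{a critical point which is neither a local minimum nor a local maximum}
\end{aligned}
$$

A saddle point is called "bad" (degenerate) when the Hessian $\nabla^{2} L(\theta^{*})$ has no negative eigenvalue, so there is no direction of negative curvature pointing the way out.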
The loss landscape properties - Paper contribution
This paper focuses on proving properties of the loss landscape, and more specifically of its minima, starting from a set of assumptions. In this work the assumptions are, essentially, a squared loss and a deep linear network (the nonlinear case is handled through additional simplifying assumptions), and one of the core results is that there are no suboptimal local minima; in the paper's words, every local minimum is a global minimum.
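For reference, the setting of the main result is (up to notation and scaling) a deep linear network trained with the squared Frobenius loss

$$\hat{Y}(W, X) = W_{H+1} W_{H} \cdots W_{1} X, \qquad \bar{L}(W) = \frac{1}{2}\,\lVert \hat{Y}(W, X) - Y \rVert_{F}^{2}$$

where $X$ and $Y$ collect the training inputs and targets and $W_{1}, \dots, W_{H+1}$ are the weight matrices; "no suboptimal local minima" is a statement about the critical points of $\bar{L}(W)$.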
NOTE
So here is a theoretical explanation of why training a DNN with a squared loss function is not hard after all: global minima are abundant, since every local minimum is in fact a global one. The NP-completeness concerns finding one specific global minimizer among all the possible ones, which is not practically relevant. The paper sketches the proof in section 4.3.1, following a case-by-case approach and showing that every time a point satisfies the definition of a local minimum then, in that model context, it also satisfies the definition of a global minimum.
Understanding the loss function
Understanding the loss function landscape is important to design fast and efficient optimization methods which navigate this landscape and find good local minima.
Good Minima
In the context of Deep Learning, it is important to specify that a minimum is good when it allows the DNN to generalize beyond the training set (it is all about generalization). Generalization cannot be checked during training, as by definition it involves data which is not in the training set, so it can only be checked ex post, on the test set. Nevertheless this approach seems to work well, and a quite surprising piece of empirical evidence is that it is not only the global minimum (and the local minima close enough to it) which is good: there are plenty of local minima in the landscape which allow the NN to work, and hence generalize, well enough.
Loss Function - Source of Hardness
How to approach the theoretical study of DNNs, in order of growing complexity: linear models first, then deep linear networks, then deep nonlinear networks (a toy sketch of the deep linear case follows).
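As a toy illustration of that deep linear case, here is a minimal numpy sketch (not code from the paper; the sizes, learning rate and iteration count are arbitrary choices): gradient descent on a two-layer linear network with squared loss, starting from a random initialization, lands on a minimum whose loss matches the global optimum given by ordinary least squares, consistent with the "no suboptimal local minima" result.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes (arbitrary): d_x inputs, d_y outputs, hidden width h, n samples.
d_x, d_y, h, n = 5, 3, 4, 200
X = rng.normal(size=(d_x, n))
W_true = rng.normal(size=(d_y, d_x))
Y = W_true @ X + 0.01 * rng.normal(size=(d_y, n))   # near-linear targets

# Two-layer linear network: Y_hat = W2 @ W1 @ X, trained with squared loss.
W1 = 0.1 * rng.normal(size=(h, d_x))
W2 = 0.1 * rng.normal(size=(d_y, h))

def loss(W1, W2):
    E = W2 @ W1 @ X - Y
    return 0.5 * np.sum(E ** 2) / n

# Plain gradient descent on the non-convex factored problem.
lr = 0.05
for _ in range(5000):
    E = (W2 @ W1 @ X - Y) / n
    grad_W2 = E @ (W1 @ X).T
    grad_W1 = W2.T @ E @ X.T
    W1 -= lr * grad_W1
    W2 -= lr * grad_W2

# Global optimum of the equivalent unfactored problem: ordinary least squares.
W_ols = Y @ X.T @ np.linalg.inv(X @ X.T)
global_min = 0.5 * np.sum((W_ols @ X - Y) ** 2) / n

print("loss at the minimum reached by GD:", loss(W1, W2))
print("global minimum (least squares):   ", global_min)
```

The comparison is meaningful because, with a hidden width of at least $\min(d_x, d_y)$, the product $W_2 W_1$ can represent any linear map from inputs to outputs, so the factored and unfactored problems share the same global minimum value.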
Challenges in the Navigation of the Loss Landscape
The difficulty of this optimization process is that the loss function landscape is highly non-convex. While there are plenty of good local minima, i.e. weight configurations which make the DNN generalize well (there are quite a lot of "low hanging fruits", which is one key factor explaining the success of the first generation of DNNs: they are quite easy to train with the current technology, i.e. GPUs, memory, ...), there are also saddle points which make the navigation harder, especially the "bad saddle points", the ones where the Hessian has no negative eigenvalue, so that to second order they look like flat basins with no descent direction to escape along.
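To make the "bad saddle point" idea concrete, here is another minimal numpy sketch (again not from the paper; the scalar "networks" and the finite-difference helper are just illustrative): both losses below have a critical point at the origin, but for a product of two weights the Hessian there has a negative eigenvalue (a regular, escapable saddle), while for a product of three weights the Hessian has no negative eigenvalue at all (a degenerate "bad" saddle), in line with the paper's observation that bad saddle points show up in the deeper linear networks.

```python
import numpy as np

def hessian(f, w, eps=1e-4):
    """Numerical Hessian of f at w via central finite differences."""
    d = len(w)
    H = np.zeros((d, d))
    for i in range(d):
        for j in range(d):
            e_i, e_j = np.zeros(d), np.zeros(d)
            e_i[i], e_j[j] = eps, eps
            H[i, j] = (f(w + e_i + e_j) - f(w + e_i - e_j)
                       - f(w - e_i + e_j) + f(w - e_i - e_j)) / (4 * eps ** 2)
    return H

# Scalar "deep linear networks" fitting the target 1 with a squared loss.
loss2 = lambda w: 0.5 * (w[0] * w[1] - 1.0) ** 2          # product of 2 weights
loss3 = lambda w: 0.5 * (w[0] * w[1] * w[2] - 1.0) ** 2   # product of 3 weights

# The origin is a critical point of both losses (every partial derivative vanishes there).
print(np.linalg.eigvalsh(hessian(loss2, np.zeros(2))))  # ~ [-1, 1]: a regular saddle
print(np.linalg.eigvalsh(hessian(loss3, np.zeros(3))))  # ~ [0, 0, 0]: a "bad" (degenerate) saddle
```

At a bad saddle point the gradient vanishes and there is no direction of negative curvature, so gradient-based methods can stall on what locally looks like a flat basin even though it is not a minimum.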
Overview
Reading Deep Learning without Poor Local Minima: a paper which achieved very interesting theoretical results on DNNs back in 2016.
The abstract is very interesting
NOTE
For the best rendering, please install and activate Tex all the things - Chrome Plugin, which provides browser-side math rendering.
If it is active, you should see inline math like $a=b$ and display math equations rendered correctly.