Commit d1750b4 (1 parent: b4abb67)

Showing 85 changed files with 2,584 additions and 2,558 deletions.
404.md
@@ -1,7 +1,23 @@
 ---
-title: Page Not Found
+number-sections: false
+title: 👓 Eureka!
+toc: false
+heading: false
 ---

-The page you requested cannot be found (perhaps it was moved or renamed).
+You've just discovered uncharted territory on [💎fmin.xyz](/index.md)!

-You may want to try searching to find the page's new location.
+It seems you've tried to access a page that's as elusive as a global minimum in a non-convex optimization problem. 😄
+
+:::{.plotly}
+docs/theory/dual_balls.html
+:::
+
+But fear not, intrepid explorer! Here are some tools to navigate back to familiar ground:
+
+* [💎fmin.xyz home page](/index.md): Like restarting your gradient descent, head back to the start.
+* 👆 Search with precision: Use our search engine, more reliable than Newton's method with a far-off starting point.
+
+Keep Calm and Optimize On!
+
+Who knew a 404 error could be an opportunity for an adventure in learning? Happy exploring, and may your journey be gradient-vanishing-free! 🚀
_quarto.yml
@@ -3,6 +3,7 @@ project:
   render:
     - /docs/**/*.md
     - index.md
+    - 404.md
   output-dir: _site
   resources:
     - "docs/**/*.mp4"
@@ -1,42 +1,42 @@
 ---
 title: Deep learning
 ---

-# Problem
+## Problem

 ![Illustration](dl.png)

 Many practical tasks nowadays are solved using the deep learning approach, which usually implies finding a local minimum of a non-convex function that generalizes well (enough 😉). The goal of this short text is to show you the importance of the optimization behind neural network training.

-## Cross entropy
+### Cross entropy
 One of the most commonly used loss functions in classification tasks is the normalized categorical cross-entropy for a $K$-class problem:

 $$
 L(\theta) = - \dfrac{1}{n}\sum_{i=1}^n (y_i^\top\log(h_\theta(x_i)) + (1 - y_i)^\top\log(1 - h_\theta(x_i))), \qquad h_\theta^k(x_i) = \dfrac{e^{\theta_k^\top x_i}}{\sum_{j = 1}^K e^{\theta_j^\top x_i}}
 $$

 Since the number of points in a deep learning dataset can be really huge, we usually use {%include link.html title='Stochastic gradient descent'%}-based approaches as a workhorse.

 Such algorithms use an estimate of the gradient at each step instead of the full gradient vector; for the cross-entropy above, the full gradient is:

 $$
 \nabla_\theta L(\theta) = \dfrac{1}{n} \sum\limits_{i=1}^n \left( h_\theta(x_i) - y_i \right) x_i^\top
 $$

 The simplest such approximation is the statistically unbiased mini-batch estimate of the gradient:

 $$
 g(\theta) = \dfrac{1}{b} \sum\limits_{i=1}^b \left( h_\theta(x_i) - y_i \right) x_i^\top \approx \nabla_\theta L(\theta)
 $$

 where we sample only $b \ll n$ points uniformly at random and compute the sample average. It can also be viewed as a noisy version of the full-gradient approach.

 ![Illustration](MLP_optims.svg)

-# Code
+## Code
 [Open In Colab](https://colab.research.google.com/github/MerkulovDaniil/optim/blob/master/assets/Notebooks/Deep%20learning.ipynb){: .btn }

-# References
+## References
 * [Optimization for Deep Learning Highlights in 2017](http://ruder.io/deep-learning-optimization-2017/)
 * [An overview of gradient descent optimization algorithms](http://ruder.io/optimizing-gradient-descent/)
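
A side note on the mini-batch estimator $g(\theta)$ in the page above: the sketch below, in plain NumPy, shows how it is typically computed for the softmax model $h_\theta(x)$ defined there. It is illustrative only; the function names and the synthetic data are assumptions, not part of the repository.

```python
import numpy as np

def softmax(Z):
    # Row-wise softmax with the usual max-subtraction for numerical stability.
    Z = Z - Z.max(axis=1, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def minibatch_gradient(theta, X, Y, b, rng):
    # Unbiased estimate g(theta): sample b << n points uniformly at random
    # and average the per-sample gradients (h_theta(x_i) - y_i) x_i^T.
    # Shapes: theta (d, K), X (n, d), Y (n, K) one-hot; result (d, K),
    # i.e. the transpose of the row-vector convention used in the text.
    idx = rng.choice(X.shape[0], size=b, replace=False)
    Xb, Yb = X[idx], Y[idx]
    H = softmax(Xb @ theta)          # rows are h_theta(x_i)
    return Xb.T @ (H - Yb) / b

# Illustrative usage: plain SGD on random synthetic data.
rng = np.random.default_rng(0)
n, d, K, b = 1000, 20, 5, 32
X = rng.normal(size=(n, d))
Y = np.eye(K)[rng.integers(K, size=n)]   # random one-hot labels
theta = np.zeros((d, K))
for step in range(200):
    theta -= 0.5 * minibatch_gradient(theta, X, Y, b, rng)
```

Averaged over the random choice of the batch, $\mathbb{E}[g(\theta)] = \nabla_\theta L(\theta)$, which is exactly the "noisy version of the full-gradient approach" the page refers to.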