[RU] 12-2, 12-1, 12-3, 12 translation, index fix, [EN] 12-3 fix #703

Merged
merged 10 commits into from
Dec 11, 2020
Changes from 9 commits
5 changes: 5 additions & 0 deletions docs/_config.yml
@@ -640,3 +640,8 @@ ru:
        - path: ru/week01/01-1.md
        - path: ru/week01/01-2.md
        - path: ru/week01/01-3.md
    - path: ru/week12/12.md
      sections:
        - path: ru/week12/12-1.md
        - path: ru/week12/12-2.md
        - path: ru/week12/12-3.md
4 changes: 2 additions & 2 deletions docs/en/week12/12-3.md
@@ -284,7 +284,7 @@ Throughout the training of a transformer, many hidden representations are genera

We will now see the blocks of transformers discussed above in a far more understandable format, code!

- The first module we will look at the multi-headed attention block. Depenending on query, key, and values entered into this block, it can either be used for self or cross attention.
+ The first module we will look at the multi-headed attention block. Depending on query, key, and values entered into this block, it can either be used for self or cross attention.


```python
@@ -392,7 +392,7 @@ Recall that self attention by itself does not have any recurrence or convolution

$$
\begin{aligned}
- E(p, 2) &= \sin(p / 10000^{2i / d}) \\
+ E(p, 2i) &= \sin(p / 10000^{2i / d}) \\
E(p, 2i+1) &= \cos(p / 10000^{2i / d})
\end{aligned}
$$
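
As a quick check of the corrected indexing in this hunk, the two formulas can be sketched in plain Python. This is only an illustration: the function name `positional_encoding` and the requirement that `d` be even are assumptions made here, not something taken from the course code.

```python
import math


def positional_encoding(num_positions, d):
    """Build a num_positions x d table E where
    E[p][2i]   = sin(p / 10000^(2i / d)) and
    E[p][2i+1] = cos(p / 10000^(2i / d)).
    """
    if d % 2 != 0:
        # Assumption of this sketch: sine/cosine columns come in pairs.
        raise ValueError("d is assumed to be even in this sketch")
    table = []
    for p in range(num_positions):
        row = [0.0] * d
        for i in range(d // 2):
            angle = p / (10000 ** (2 * i / d))
            row[2 * i] = math.sin(angle)      # even columns use sine
            row[2 * i + 1] = math.cos(angle)  # odd columns use cosine
        table.append(row)
    return table


# Example: 4 positions, embedding dimension 6.
E = positional_encoding(num_positions=4, d=6)
```

With the old left-hand side `E(p, 2)`, the index `i` on the right would have had nothing to bind to; the corrected `E(p, 2i)` makes each sine column pair up with the cosine column `2i+1` at the same frequency.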
17 changes: 17 additions & 0 deletions docs/ru/index.md
@@ -325,6 +325,23 @@ lang: ru
<a href="https://youtu.be/DL7iew823c0">🎥</a>
</td>
</tr>
<!-- =============================== WEEK 15 =============================== -->
<tr>
<td rowspan="2" align="center"><a href="{{site.baseurl}}/ru/week15/15">⑮</a></td>
<td rowspan="2">Practicum</td>
<td><a href="{{site.baseurl}}/ru/week15/15-1">Inference for latent variable energy-based models</a></td>
<td rowspan="1">
<a href="https://github.com/Atcold/pytorch-Deep-Learning/blob/master/slides/12%20-%20EBM.pdf">🖥️</a>
<a href="https://youtu.be/sbhr2wjU1-I">🎥</a>
</td>
</tr>
<tr>
<td><a href="{{site.baseurl}}/ru/week15/15-2">Training latent variable energy-based models</a></td>
<td rowspan="1">
<a href="https://github.com/Atcold/pytorch-Deep-Learning/blob/master/slides/12%20-%20EBM.pdf">🖥️</a>
<a href="https://youtu.be/XLSb1Cs1Jao">🎥</a>
</td>
</tr>
</tbody>
</table>

431 changes: 431 additions & 0 deletions docs/ru/week12/12-1.md

Large diffs are not rendered by default.

638 changes: 638 additions & 0 deletions docs/ru/week12/12-2.md

Large diffs are not rendered by default.

886 changes: 886 additions & 0 deletions docs/ru/week12/12-3.md

Large diffs are not rendered by default.

30 changes: 30 additions & 0 deletions docs/ru/week12/12.md
@@ -0,0 +1,30 @@
---
lang: ru
lang-ref: ch.12
title: Week 12
translation-date: 01 Dec 2020
translator: Evgeniy Pak
---


<!-- ## Lecture part A -->
## Lecture part A

<!-- In this section we discuss the various architectures used in NLP applications, beginning with CNNs, RNNs, and eventually covering the state-of-the-art architecture, transformers. We then discuss the various modules that comprise transformers and how they make transformers advantageous for NLP tasks. Finally, we discuss tricks that allow transformers to be trained effectively. -->

In this section we discuss the various architectures used in natural language processing applications, starting with CNNs and RNNs and eventually covering the state-of-the-art architecture, transformers. We then discuss the various modules that make up transformers and how they give transformers an advantage in natural language processing tasks. Finally, we discuss techniques that allow transformers to be trained effectively.


<!-- ## Lecture part B -->
## Lecture part B

<!-- In this section we introduce beam search as a middle ground between greedy decoding and exhaustive search. We consider the case of wanting to sample from the generative distribution (*i.e.* when generating text) and introduce "top-k" sampling. Subsequently, we introduce sequence to sequence models (with a transformer variant) and backtranslation. We then introduce unsupervised learning approaches for learning embeddings and discuss word2vec, GPT, and BERT. -->

In this section we introduce beam search as a middle ground between greedy decoding and exhaustive search. We consider the case where we want to sample from the generative distribution (*i.e.* when generating text) and introduce "top-k" sampling. We then introduce sequence-to-sequence models (in a transformer variant) and backtranslation. Afterwards, we cover unsupervised approaches to learning embeddings and discuss word2vec, GPT, and BERT.

<!-- ## Practicum -->
## Practicum

<!-- We introduce attention, focusing on self-attention and its hidden layer representations of the inputs. Then, we introduce the key-value store paradigm and discuss how to represent queries, keys, and values as rotations of an input. Finally, we use attention to interpret the transformer architecture, taking a forward pass through a basic transformer, and comparing the encoder-decoder paradigm to sequential architectures. -->

We introduce attention, focusing on self-attention and its hidden-layer representations of the inputs. Then we present the key-value store paradigm and discuss how to represent queries, keys, and values as rotations of the input. Finally, we use attention to interpret the transformer architecture, taking a forward pass through a basic transformer and comparing the encoder-decoder paradigm to sequential architectures.
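
For readers of this summary, the idea of obtaining queries, keys, and values as linear transformations of the same inputs can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the practicum's implementation: the names `self_attention`, `W_q`, `W_k`, and `W_v` are hypothetical, and the `1 / sqrt(d)` scaling of the scores is an assumed choice.

```python
import math


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(X, W_q, W_k, W_v):
    """Self-attention over a list of input vectors X.

    Queries, keys, and values are all linear transformations of the
    same inputs, which is what makes this *self*-attention.
    """
    Q = [matvec(W_q, x) for x in X]
    K = [matvec(W_k, x) for x in X]
    V = [matvec(W_v, x) for x in X]
    d = len(K[0])
    H = []
    for q in Q:
        # Score the query against every key (scaling is an assumption).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        a = softmax(scores)
        # Each output is a convex combination of the value vectors.
        H.append([sum(a_j * v[c] for a_j, v in zip(a, V)) for c in range(len(V[0]))])
    return H


# Example with identity transformations and two one-hot inputs.
I = [[1.0, 0.0], [0.0, 1.0]]
X = [[1.0, 0.0], [0.0, 1.0]]
H = self_attention(X, I, I, I)
```

Passing keys and values computed from a different sequence than the queries would turn the same computation into cross-attention, which matches the self versus cross distinction made in the `12-3.md` hunk.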