[RU] 12-2, 12-1, 12-3, 12 translation, index fix, [EN] 12-3 fix (Atcold#703)

* 12 week translation started

* 12-1 init

* 12-1 translation fixes

* [RU] translation of 12-1.md

* [RU] index fixed 15th week added

* [RU] 12-2 translated

* [RU] 12-3 translation

* [EN] 12-3 fixes

* [RU] config fixes

Co-authored-by: Alfredo Canziani <[email protected]>
2 people authored and t46 committed Dec 17, 2020
1 parent b6bce1a commit bb77203
Showing 7 changed files with 2,009 additions and 2 deletions.
5 changes: 5 additions & 0 deletions docs/_config.yml
@@ -688,6 +688,11 @@ ru:
- path: ru/week01/01-1.md
- path: ru/week01/01-2.md
- path: ru/week01/01-3.md
- path: ru/week12/12.md
sections:
- path: ru/week12/12-1.md
- path: ru/week12/12-2.md
- path: ru/week12/12-3.md

################################## Vietnamese ##################################
vi:
4 changes: 2 additions & 2 deletions docs/en/week12/12-3.md
@@ -284,7 +284,7 @@ Throughout the training of a transformer, many hidden representations are genera

We will now see the blocks of transformers discussed above in a far more understandable format, code!

-The first module we will look at the multi-headed attention block. Depenending on query, key, and values entered into this block, it can either be used for self or cross attention.
+The first module we will look at the multi-headed attention block. Depending on query, key, and values entered into this block, it can either be used for self or cross attention.
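The notebook's actual code for this block is collapsed in the diff below. Purely as an illustration of the point above, that the same block performs self or cross attention depending on the query, key, and value tensors it receives, here is a minimal PyTorch sketch; the class name, dimensions, and layout are assumptions, not the course implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiHeadAttention(nn.Module):
    """Illustrative sketch only; not the course notebook's implementation."""
    def __init__(self, d_model=512, num_heads=8):
        super().__init__()
        assert d_model % num_heads == 0
        self.num_heads = num_heads
        self.d_head = d_model // num_heads
        # Separate projections for queries, keys, and values, plus an output projection.
        self.W_q = nn.Linear(d_model, d_model)
        self.W_k = nn.Linear(d_model, d_model)
        self.W_v = nn.Linear(d_model, d_model)
        self.W_o = nn.Linear(d_model, d_model)

    def _split(self, x):
        # (batch, seq, d_model) -> (batch, heads, seq, d_head)
        b, t, _ = x.shape
        return x.view(b, t, self.num_heads, self.d_head).transpose(1, 2)

    def forward(self, query, key, value):
        # Self attention: query, key, and value are the same tensor.
        # Cross attention: query comes from the decoder, key/value from the encoder.
        q = self._split(self.W_q(query))
        k = self._split(self.W_k(key))
        v = self._split(self.W_v(value))
        scores = q @ k.transpose(-2, -1) / self.d_head ** 0.5  # scaled dot-product
        attn = F.softmax(scores, dim=-1)
        out = attn @ v                                          # (batch, heads, seq_q, d_head)
        b, _, t, _ = out.shape
        out = out.transpose(1, 2).contiguous().view(b, t, -1)   # merge heads back together
        return self.W_o(out)
```

Calling it as `mha(x, x, x)` would give self attention over a single sequence, while `mha(decoder_states, encoder_states, encoder_states)` would give cross attention.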


```python
@@ -392,7 +392,7 @@ Recall that self attention by itself does not have any recurrence or convolution

$$
\begin{aligned}
-E(p, 2) &= \sin(p / 10000^{2i / d}) \\
+E(p, 2i) &= \sin(p / 10000^{2i / d}) \\
E(p, 2i+1) &= \cos(p / 10000^{2i / d})
\end{aligned}
$$
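To make the corrected formula concrete, here is a small sketch of how such sinusoidal positional encodings can be computed; again only an illustration under assumed names, not the notebook's code:

```python
import torch

def positional_encoding(max_len, d_model):
    # E(p, 2i)   = sin(p / 10000^(2i / d))
    # E(p, 2i+1) = cos(p / 10000^(2i / d))
    assert d_model % 2 == 0
    p = torch.arange(max_len, dtype=torch.float32).unsqueeze(1)  # positions, shape (max_len, 1)
    two_i = torch.arange(0, d_model, 2, dtype=torch.float32)     # even dimension indices 2i
    angle = p / (10000 ** (two_i / d_model))                     # shape (max_len, d_model / 2)
    E = torch.zeros(max_len, d_model)
    E[:, 0::2] = torch.sin(angle)  # even dimensions
    E[:, 1::2] = torch.cos(angle)  # odd dimensions
    return E
```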
17 changes: 17 additions & 0 deletions docs/ru/index.md
@@ -325,6 +325,23 @@ lang: ru
<a href="https://youtu.be/DL7iew823c0">🎥</a>
</td>
</tr>
<!-- =============================== WEEK 15 =============================== -->
<tr>
<td rowspan="2" align="center"><a href="{{site.baseurl}}/ru/week15/15">⑮</a></td>
<td rowspan="2">Практикум</td>
<td><a href="{{site.baseurl}}/ru/week15/15-1">Вывод для энергетических моделей со скрытыми переменными</a></td>
<td rowspan="1">
<a href="https://github.com/Atcold/pytorch-Deep-Learning/blob/master/slides/12%20-%20EBM.pdf">🖥️</a>
<a href="https://youtu.be/sbhr2wjU1-I">🎥</a>
</td>
</tr>
<tr>
<td><a href="{{site.baseurl}}/ru/week15/15-2">Обучение энергетических моделей со скрытыми переменными</a></td>
<td rowspan="1">
<a href="https://github.com/Atcold/pytorch-Deep-Learning/blob/master/slides/12%20-%20EBM.pdf">🖥️</a>
<a href="https://youtu.be/XLSb1Cs1Jao">🎥</a>
</td>
</tr>
</tbody>
</table>

431 changes: 431 additions & 0 deletions docs/ru/week12/12-1.md

Large diffs are not rendered by default.

638 changes: 638 additions & 0 deletions docs/ru/week12/12-2.md

Large diffs are not rendered by default.

886 changes: 886 additions & 0 deletions docs/ru/week12/12-3.md

Large diffs are not rendered by default.

30 changes: 30 additions & 0 deletions docs/ru/week12/12.md
@@ -0,0 +1,30 @@
---
lang: ru
lang-ref: ch.12
title: Неделя 12
translation-date: 01 Dec 2020
translator: Evgeniy Pak
---


<!-- ## Lecture part A -->
## Часть A лекции

<!-- In this section we discuss the various architectures used in NLP applications, beginning with CNNs, RNNs, and eventually covering the state of-the art architecture, transformers. We then discuss the various modules that comprise transformers and how they make transformers advantageous for NLP tasks. Finally, we discuss tricks that allow transformers to be trained effectively. -->

В этом разделе мы обсуждаем различные архитектуры, используемые в приложениях обработки естественного языка, начиная с CNNs, RNNs, и, в конечном итоге, рассматривая state-of-the-art архитектуру, трансформеры. Затем мы обсуждаем различные модули, которые включают трансформеры и то, как они дают преимущество трансформерам в задачах естественной обработки языка. В итоге мы обсудим приёмы, позволяющие эффективно обучать трансформеры.


<!-- ## Lecture part B -->
## Часть B лекции

<!-- In this section we introduce beam search as a middle ground between greedy decoding and exhaustive search. We consider the case of wanting to sample from the generative distribution (*i.e.* when generating text) and introduce "top-k" sampling. Subsequently, we introduce sequence to sequence models (with a transformer variant) and backtranslation. We then introduce unsupervised learning approaches for learning embeddings and discuss word2vec, GPT, and BERT. -->

В этом разделе мы знакомим с лучевым поиском как золотой серединой между жадным декодированием и полным перебором. Мы рассматриваем случай, когда требуется выборка из порождающего распределения (*т.e.* при генерации текста) и вводим понятие "top-k" выборки. Затем мы знакомим с моделями sequence to sequence (в варианте трансформера) и обратным переводом. После рассматриваем подход обучения без учителя к обучению характеристик и обсуждаем word2vec, GPT и BERT.

<!-- ## Practicum -->
## Практикум

<!-- We introduce attention, focusing on self-attention and its hidden layer representations of the inputs. Then, we introduce the key-value store paradigm and discuss how to represent queries, keys, and values as rotations of an input. Finally, we use attention to interpret the transformer architecture, taking a forward pass through a basic transformer, and comparing the encoder-decoder paradigm to sequential architectures. -->

Вводим понятие внимания, фокусируясь на self-attention и его представлениях входов на скрытом слое. Затем мы представляем парадигму хранилища ключ-значение и обсуждаем, как представить запросы, ключи и значения, как повороты входов. Наконец мы используем внимание для интерпретации архитектуры трансформер, взяв результат прямого прохода через базовый трансформер и сравнивая парадигму кодирования-декодирования с последовательной архитектурой.
