[RU] 12-2, 12-1, 12-3, 12 translation, index fix, [EN] 12-3 fix (Atcold#703)

* 12 week translation started

* 12-1 init

* 12-1 translation fixes

* [RU] translation of 12-1.md

* [RU] index fixed 15th week added

* [RU] 12-2 translated

* [RU] 12-3 translation

* [EN] 12-3 fixes

* [RU] config fixes

Co-authored-by: Alfredo Canziani <[email protected]>
2 people authored and t46 committed Dec 17, 2020
1 parent b6bce1a commit bb77203
Showing 7 changed files with 2,009 additions and 2 deletions.
5 changes: 5 additions & 0 deletions docs/_config.yml
@@ -688,6 +688,11 @@ ru:
- path: ru/week01/01-1.md
- path: ru/week01/01-2.md
- path: ru/week01/01-3.md
- path: ru/week12/12.md
sections:
- path: ru/week12/12-1.md
- path: ru/week12/12-2.md
- path: ru/week12/12-3.md

################################## Vietnamese ##################################
vi:
4 changes: 2 additions & 2 deletions docs/en/week12/12-3.md
@@ -284,7 +284,7 @@ Throughout the training of a transformer, many hidden representations are genera

We will now see the blocks of transformers discussed above in a far more understandable format, code!

-The first module we will look at the multi-headed attention block. Depenending on query, key, and values entered into this block, it can either be used for self or cross attention.
+The first module we will look at the multi-headed attention block. Depending on query, key, and values entered into this block, it can either be used for self or cross attention.
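The notebook's actual code for this block is collapsed in the diff below. Purely as an illustration of the point above, that the same block performs self or cross attention depending on the query, key, and value tensors it receives, here is a minimal PyTorch sketch; the class name, dimensions, and layout are assumptions, not the course implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiHeadAttention(nn.Module):
    """Illustrative sketch only; not the course notebook's implementation."""
    def __init__(self, d_model=512, num_heads=8):
        super().__init__()
        assert d_model % num_heads == 0
        self.num_heads = num_heads
        self.d_head = d_model // num_heads
        # Separate projections for queries, keys, and values, plus an output projection.
        self.W_q = nn.Linear(d_model, d_model)
        self.W_k = nn.Linear(d_model, d_model)
        self.W_v = nn.Linear(d_model, d_model)
        self.W_o = nn.Linear(d_model, d_model)

    def _split(self, x):
        # (batch, seq, d_model) -> (batch, heads, seq, d_head)
        b, t, _ = x.shape
        return x.view(b, t, self.num_heads, self.d_head).transpose(1, 2)

    def forward(self, query, key, value):
        # Self attention: query, key, and value are the same tensor.
        # Cross attention: query comes from the decoder, key/value from the encoder.
        q = self._split(self.W_q(query))
        k = self._split(self.W_k(key))
        v = self._split(self.W_v(value))
        scores = q @ k.transpose(-2, -1) / self.d_head ** 0.5  # scaled dot-product
        attn = F.softmax(scores, dim=-1)
        out = attn @ v                                          # (batch, heads, seq_q, d_head)
        b, _, t, _ = out.shape
        out = out.transpose(1, 2).contiguous().view(b, t, -1)   # merge heads back together
        return self.W_o(out)
```

Calling it as `mha(x, x, x)` would give self attention over a single sequence, while `mha(decoder_states, encoder_states, encoder_states)` would give cross attention.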


```python
@@ -392,7 +392,7 @@ Recall that self attention by itself does not have any recurrence or convolution

$$
\begin{aligned}
-E(p, 2) &= \sin(p / 10000^{2i / d}) \\
+E(p, 2i) &= \sin(p / 10000^{2i / d}) \\
E(p, 2i+1) &= \cos(p / 10000^{2i / d})
\end{aligned}
$$
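To make the corrected formula concrete, here is a small sketch of how such sinusoidal positional encodings can be computed; again only an illustration under assumed names, not the notebook's code:

```python
import torch

def positional_encoding(max_len, d_model):
    # E(p, 2i)   = sin(p / 10000^(2i / d))
    # E(p, 2i+1) = cos(p / 10000^(2i / d))
    assert d_model % 2 == 0
    p = torch.arange(max_len, dtype=torch.float32).unsqueeze(1)  # positions, shape (max_len, 1)
    two_i = torch.arange(0, d_model, 2, dtype=torch.float32)     # even dimension indices 2i
    angle = p / (10000 ** (two_i / d_model))                     # shape (max_len, d_model / 2)
    E = torch.zeros(max_len, d_model)
    E[:, 0::2] = torch.sin(angle)  # even dimensions
    E[:, 1::2] = torch.cos(angle)  # odd dimensions
    return E
```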
17 changes: 17 additions & 0 deletions docs/ru/index.md
@@ -325,6 +325,23 @@ lang: ru
<a href="https://youtu.be/DL7iew823c0">🎥</a>
</td>
</tr>
<!-- =============================== WEEK 15 =============================== -->
<tr>
<td rowspan="2" align="center"><a href="{{site.baseurl}}/ru/week15/15">⑮</a></td>
<td rowspan="2">Практикум</td>
<td><a href="{{site.baseurl}}/ru/week15/15-1">Вывод для энергетических моделей со скрытыми переменными</a></td>
<td rowspan="1">
<a href="https://github.com/Atcold/pytorch-Deep-Learning/blob/master/slides/12%20-%20EBM.pdf">🖥️</a>
<a href="https://youtu.be/sbhr2wjU1-I">🎥</a>
</td>
</tr>
<tr>
<td><a href="{{site.baseurl}}/ru/week15/15-2">Обучение энергетических моделей со скрытыми переменными</a></td>
<td rowspan="1">
<a href="https://github.com/Atcold/pytorch-Deep-Learning/blob/master/slides/12%20-%20EBM.pdf">🖥️</a>
<a href="https://youtu.be/XLSb1Cs1Jao">🎥</a>
</td>
</tr>
</tbody>
</table>

431 changes: 431 additions & 0 deletions docs/ru/week12/12-1.md

Large diffs are not rendered by default.

638 changes: 638 additions & 0 deletions docs/ru/week12/12-2.md

Large diffs are not rendered by default.

886 changes: 886 additions & 0 deletions docs/ru/week12/12-3.md

Large diffs are not rendered by default.

30 changes: 30 additions & 0 deletions docs/ru/week12/12.md
@@ -0,0 +1,30 @@
---
lang: ru
lang-ref: ch.12
title: Неделя 12
translation-date: 01 Dec 2020
translator: Evgeniy Pak
---


<!-- ## Lecture part A -->
## Часть A лекции

<!-- In this section we discuss the various architectures used in NLP applications, beginning with CNNs, RNNs, and eventually covering the state of-the art architecture, transformers. We then discuss the various modules that comprise transformers and how they make transformers advantageous for NLP tasks. Finally, we discuss tricks that allow transformers to be trained effectively. -->

В этом разделе мы обсуждаем различные архитектуры, используемые в приложениях обработки естественного языка, начиная с CNNs, RNNs, и, в конечном итоге, рассматривая state-of-the-art архитектуру, трансформеры. Затем мы обсуждаем различные модули, которые включают трансформеры и то, как они дают преимущество трансформерам в задачах естественной обработки языка. В итоге мы обсудим приёмы, позволяющие эффективно обучать трансформеры.


<!-- ## Lecture part B -->
## Часть B лекции

<!-- In this section we introduce beam search as a middle ground between greedy decoding and exhaustive search. We consider the case of wanting to sample from the generative distribution (*i.e.* when generating text) and introduce "top-k" sampling. Subsequently, we introduce sequence to sequence models (with a transformer variant) and backtranslation. We then introduce unsupervised learning approaches for learning embeddings and discuss word2vec, GPT, and BERT. -->

В этом разделе мы знакомим с лучевым поиском как золотой серединой между жадным декодированием и полным перебором. Мы рассматриваем случай, когда требуется выборка из порождающего распределения (*т.e.* при генерации текста) и вводим понятие "top-k" выборки. Затем мы знакомим с моделями sequence to sequence (в варианте трансформера) и обратным переводом. После рассматриваем подход обучения без учителя к обучению характеристик и обсуждаем word2vec, GPT и BERT.

<!-- ## Practicum -->
## Практикум

<!-- We introduce attention, focusing on self-attention and its hidden layer representations of the inputs. Then, we introduce the key-value store paradigm and discuss how to represent queries, keys, and values as rotations of an input. Finally, we use attention to interpret the transformer architecture, taking a forward pass through a basic transformer, and comparing the encoder-decoder paradigm to sequential architectures. -->

Вводим понятие внимания, фокусируясь на self-attention и его представлениях входов на скрытом слое. Затем мы представляем парадигму хранилища ключ-значение и обсуждаем, как представить запросы, ключи и значения, как повороты входов. Наконец мы используем внимание для интерпретации архитектуры трансформер, взяв результат прямого прохода через базовый трансформер и сравнивая парадигму кодирования-декодирования с последовательной архитектурой.
