image test3 240910
SeokHwanHong committed Sep 10, 2024
1 parent 11f5fd3 commit 81a29e3
Showing 1 changed file with 4 additions and 4 deletions.
8 changes: 4 additions & 4 deletions _posts/2024-02-22-Attention is all you need copy.md
@@ -28,7 +28,7 @@ To reduce sequential computation, CNN-based models (Extended Neural GPU,
#### - Attention


<p align = "center"><img src = "./images/attentionisallyouneed/selfattention.jpg">
<p align = "center"><img src = "images/attentionisallyouneed/selfattention.jpg">

The attention mechanism has become an essential part of powerful sequence modeling and transduction models across a wide range of tasks, since it can model dependencies in the input and output sequences regardless of their distance. In this paper, recurrence is avoided entirely; only an attention mechanism that captures global dependencies between input and output is used. The Transformer architecture also allows far more parallelization and reaches state-of-the-art performance.

@@ -52,14 +52,14 @@ Query, Key, and Value start from the same values. Through the dot product with itself

- overall architecture

<p align = "center"><img src = "./images/attentionisallyouneed/model architecture.jpg">
<p align = "center"><img src = "images/attentionisallyouneed/model architecture.jpg">


## 3.1. Attention

- Scaled Dot-Product Attention

<p align = "center"><img src = "./images/attentionisallyouneed/sdpa-1.jpg">
<p align = "center"><img src = "images/attentionisallyouneed/sdpa-1.jpg">

$Attention(Q,K,V) = softmax(QK^{T}/\sqrt{d_{k}})V$
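
Below is a minimal NumPy sketch (not from the post itself) of the scaled dot-product attention defined by this formula. The function name, shapes, and toy inputs are assumptions for illustration.

```python
# A minimal NumPy sketch of the scaled dot-product attention formula above.
# Shapes, names, and the toy inputs are illustrative assumptions.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Q: (n_q, d_k), K: (n_k, d_k), V: (n_k, d_v) -> output: (n_q, d_v)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # (n_q, n_k) scaled similarities
    scores -= scores.max(axis=-1, keepdims=True)     # stabilize the softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax over keys
    return weights @ V                               # weighted sum of the values

# Toy check: 4 queries, 6 key/value pairs, d_k = 8, d_v = 16
rng = np.random.default_rng(0)
out = scaled_dot_product_attention(rng.normal(size=(4, 8)),
                                   rng.normal(size=(6, 8)),
                                   rng.normal(size=(6, 16)))
print(out.shape)  # -> (4, 16)
```

Dividing by $\sqrt{d_{k}}$ keeps the dot products from growing with the key dimension, which would otherwise push the softmax into regions with very small gradients.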

@@ -77,7 +77,7 @@ input : queries and keys of dimensions $d_{k}$ (= $d_{q}$), values of $d_{v}$

- Multi-Head Attention

<p align = "center"><img src = "./images/attentionisallyouneed/mha-1.jpg">
<p align = "center"><img src = "images/attentionisallyouneed/mha-1.jpg">

## 3.1. Encoder & Decoder Stacks

