Week 4 : Transfer Learning and Transformers

1. Transfer Learning in Computer Vision

Background

  • ResNet-50 performs well, but the network is very large, so training it from scratch on a small dataset risks overfitting
  • Solution : train the network on ImageNet's large labeled dataset (over a million images) -> fine-tune it on the target task

Transfer Learning

  • Train the model on ImageNet -> add or replace the last layer(s) for the new task
  • Enables faster, more accurate training with much less task-specific data
  • Well supported in TensorFlow and PyTorch (see the sketch below)
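
A minimal PyTorch sketch of this pattern, assuming an ImageNet-pretrained ResNet-50 and an arbitrary 10-class target task (the class count is illustrative):

```python
import torch.nn as nn
from torchvision import models

# Load a ResNet-50 pre-trained on ImageNet
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)

# Freeze the pre-trained backbone so only the new head is trained at first
for param in model.parameters():
    param.requires_grad = False

# Replace the final fully-connected layer for the new task
num_classes = 10  # assumed example value
model.fc = nn.Linear(model.fc.in_features, num_classes)

# During training, only model.fc.parameters() are updated;
# deeper layers can be unfrozen later for full fine-tuning.
```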


2. Embeddings and Language Models

  • Input data

    • in NLP : sequence of words
    • in Deep Learning : vectors
  • How do we convert a word into a vector?

    • one-hot encoding

One-hot encoding

  • embedding : the result of mapping text into numbers a machine can work with (contrasted with one-hot vectors in the sketch below)
  • Solution 1 : Learn as part of the task
  • Solution 2 : Learn a Language Model
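
A small PyTorch sketch contrasting the two representations discussed here: a sparse one-hot vector versus a dense nn.Embedding learned as part of the task (the toy vocabulary and embedding size are made up for illustration):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy vocabulary, purely for illustration
vocab = ["the", "cat", "sat", "on", "mat"]
word_to_idx = {w: i for i, w in enumerate(vocab)}

# One-hot encoding: a sparse |V|-dimensional vector with a single 1
one_hot = F.one_hot(torch.tensor(word_to_idx["cat"]), num_classes=len(vocab))
print(one_hot)  # tensor([0, 1, 0, 0, 0])

# Solution 1: a dense embedding learned as part of the downstream task
embedding = nn.Embedding(num_embeddings=len(vocab), embedding_dim=8)
vec = embedding(torch.tensor(word_to_idx["cat"]))  # shape (8,), trained by backprop
```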


3. "NLP's ImageNet Moment" : ELMO/ULMFit

Beyond Embeddings

  • Word2Vec and GloVe embeddings became popular in ~2013-14
  • But these representations are shallow:
    • only the first layer would have the benefit of seeing all of Wikipedia
    • the rest of the model -- LSTMs, etc. -- would be trained only on the (much smaller) task dataset

  • ELMo (2018)
    • Bidirectional stacked LSTM (see the sketch after this list)
    • SQuAD dataset
    • SNLI dataset
    • GLUE dataset
  • ULMFiT
    • similar idea to ELMo: pre-train a language model, then fine-tune it on the task
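
Not ELMo itself, but a minimal PyTorch sketch of the core ingredient named above, a bidirectional stacked LSTM encoder that produces one contextual vector per token (all hyperparameters are illustrative, not ELMo's actual configuration):

```python
import torch
import torch.nn as nn

class BiLSTMEncoder(nn.Module):
    """Bidirectional, stacked LSTM over word embeddings -> contextual token vectors."""
    def __init__(self, vocab_size=10000, embed_dim=128, hidden_dim=256, num_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, num_layers=num_layers,
                            bidirectional=True, batch_first=True)

    def forward(self, token_ids):          # (batch, seq_len)
        x = self.embed(token_ids)          # (batch, seq_len, embed_dim)
        outputs, _ = self.lstm(x)          # (batch, seq_len, 2 * hidden_dim)
        return outputs                     # one contextual vector per token

encoder = BiLSTMEncoder()
tokens = torch.randint(0, 10000, (1, 7))  # one sentence of 7 token ids
contextual = encoder(tokens)               # shape (1, 7, 512)
```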


4. Transformers


Attention in detail

Basic self-attention

  • Not learned weights, but a function of the inputs x_i and x_j (e.g., their dot product), as sketched below
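
A minimal sketch of basic self-attention with no learned parameters: the weight on position j when computing output i is the softmax-normalized dot product of x_i and x_j:

```python
import torch
import torch.nn.functional as F

def basic_self_attention(x):
    """x: (seq_len, dim). No learned parameters: weights come only from x itself."""
    scores = x @ x.T                      # raw weight w'_ij = x_i . x_j
    weights = F.softmax(scores, dim=-1)   # normalize over j
    return weights @ x                    # y_i = sum_j w_ij * x_j

x = torch.randn(5, 16)                    # 5 tokens, 16-dim vectors
y = basic_self_attention(x)               # same shape: (5, 16)
```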

Attention Function

  • Query, Key, Value : each input is projected into a query, a key, and a value vector with learned weight matrices; attention weights come from query-key dot products, and the output is a weighted sum of values (sketched below)
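
A sketch of the parameterized attention function, assuming single-head scaled dot-product attention (dimensions are illustrative):

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttention(nn.Module):
    """Single-head self-attention with learned query/key/value projections."""
    def __init__(self, dim):
        super().__init__()
        self.to_q = nn.Linear(dim, dim, bias=False)
        self.to_k = nn.Linear(dim, dim, bias=False)
        self.to_v = nn.Linear(dim, dim, bias=False)

    def forward(self, x):                                           # (batch, seq_len, dim)
        q, k, v = self.to_q(x), self.to_k(x), self.to_v(x)
        scores = q @ k.transpose(-2, -1) / math.sqrt(x.size(-1))    # scaled dot products
        weights = F.softmax(scores, dim=-1)                         # attention over keys
        return weights @ v                                          # weighted sum of values

attn = SelfAttention(dim=16)
out = attn(torch.randn(2, 5, 16))                                   # (2, 5, 16)
```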

Transformer

  • self-attention layer -> layer normalization -> dense (feed-forward) layer, stacked into repeated blocks (see the sketch below)
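
A minimal sketch of one such block, reusing the SelfAttention module from the previous sketch and adding the residual connections used in the standard architecture (the feed-forward width is illustrative):

```python
import torch
import torch.nn as nn

class TransformerBlock(nn.Module):
    """Self-attention -> LayerNorm -> dense feed-forward -> LayerNorm, with residuals."""
    def __init__(self, dim, ff_dim=64):
        super().__init__()
        self.attention = SelfAttention(dim)   # from the previous sketch
        self.norm1 = nn.LayerNorm(dim)
        self.ff = nn.Sequential(nn.Linear(dim, ff_dim), nn.ReLU(), nn.Linear(ff_dim, dim))
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x):
        x = self.norm1(x + self.attention(x))  # residual + layer norm
        x = self.norm2(x + self.ff(x))         # residual + layer norm
        return x

block = TransformerBlock(dim=16)
out = block(torch.randn(2, 5, 16))             # (2, 5, 16)
```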

BERT, GPT-2, DistilBERT, T5

GPT/GPT-2

  • Generative Pre-trained Transformer

BERT

  • Bidirectional Encoder Representations from Transformers

T5

  • Text-to-Text Transfer Transformer

GPT-3

DistilBERT

  • knowledge distillation : a smaller student model is trained to reproduce the output of a larger teacher model (see the sketch below)
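
A hedged sketch of that idea: the student is trained to match the teacher's softened output distribution with a KL-divergence loss; the temperature and tensor shapes here are placeholders, not DistilBERT's actual training setup:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between softened teacher and student output distributions."""
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * temperature ** 2

# Usage: run the frozen teacher and the student on the same batch,
# then backpropagate the loss through the student only.
student_logits = torch.randn(8, 30522)   # vocabulary-sized outputs (assumed shape)
teacher_logits = torch.randn(8, 30522)
loss = distillation_loss(student_logits, teacher_logits)
```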

