- Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism. September 2019
- Memorizing Transformers. March 2022
- Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM. April 2021
- Scaling Language Models: Methods, Analysis & Insights from Training Gopher. December 2021
- LaMDA: Language Models for Dialog Applications. January 2022
- [RETRO] Improving language models by retrieving from trillions of tokens. December 2021
- [DALL-E] Zero-Shot Text-to-Image Generation. February 2021
- [Transformer] Attention Is All You Need. June 2017
- Scaling Laws for Neural Language Models. January 2020
- [CLIP] Learning Transferable Visual Models From Natural Language Supervision. February 2021
- RoBERTa: A Robustly Optimized BERT Pretraining Approach. July 2019
- Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model. January 2022
- Scale Efficiently: Insights from Pre-training and Fine-tuning Transformers. September 2021
- [Chinchilla] Training Compute-Optimal Large Language Models. March 2022
- A data-driven approach for learning to control computers. February 2022. DeepMind
- Data Distributional Properties Drive Emergent Few-Shot Learning in Transformers. April 2022. Stanford, University College London, DeepMind
- CvT: Introducing Convolutions to Vision Transformers. March 2021. Microsoft, McGill University
- [DETR] End-to-End Object Detection with Transformers. May 2020. Facebook
- [Vision Transformer] An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. October 2020. Google
- It's Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners. September 2020. Sulzer, LMU Munich
- Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity. January 2021. Google
- Pathways: Asynchronous Distributed Dataflow for ML. March 2022. Google
- GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding. June 2020. Google
- Flamingo: a Visual Language Model for Few-Shot Learning. April 2022. DeepMind
- Carbon Emissions and Large Neural Network Training. April 2021. Google, UC Berkeley
- [Meena] Towards a Human-like Open-Domain Chatbot. January 2020. Google
- The Evolved Transformer. January 2019. Google
- Decision Transformer: Reinforcement Learning via Sequence Modeling. June 2021. Google, Facebook, UC Berkeley
- Perceiver IO: A General Architecture for Structured Inputs & Outputs. July 2021. DeepMind
- An Evolutionary Approach to Dynamic Introduction of Tasks in Large-scale Multitask Learning Systems. May 2022. Google
- FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness. May 2022. University at Buffalo, Stanford
- Linformer: Self-Attention with Linear Complexity. June 2020. Facebook
- [SimCLR] A Simple Framework for Contrastive Learning of Visual Representations. February 2020. Google
- [MoCo] Momentum Contrast for Unsupervised Visual Representation Learning. November 2019. Facebook
- Deep Double Descent: Where Bigger Models and More Data Hurt. December 2019. OpenAI, Harvard
- Extracting Training Data from Large Language Models. December 2020. Northeastern, Harvard, UC Berkeley, Google, Apple, OpenAI, Stanford
- An Empirical Model of Large-Batch Training. December 2018. OpenAI, Johns Hopkins. Blog
- LayoutLM: Pre-training of Text and Layout for Document Image Understanding. December 2019. Microsoft
- Understanding the Capabilities, Limitations, and Societal Impact of Large Language Models. February 2021. Stanford, OpenAI
- Pretrained Transformers as Universal Computation Engines. March 2021. Facebook, UC Berkeley, Google
- PaLM: Scaling Language Modeling with Pathways. April 2022. Google
- [T5] Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. October 2019. Google