- Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism. September 2019
- Memorizing Transformers. March 2022
- Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM. April 2021
- Scaling Language Models: Methods, Analysis & Insights from Training Gopher. December 2021
- LaMDA: Language Models for Dialog Applications. January 2022
- [RETRO] Improving language models by retrieving from trillions of tokens. December 2021
- [DALL-E] Zero-Shot Text-to-Image Generation. February 2021
- [Transformer] Attention Is All You Need. June 2017
- Scaling Laws for Neural Language Models. January 2020
- [CLIP] Learning Transferable Visual Models From Natural Language Supervision. February 2021
- RoBERTa: A Robustly Optimized BERT Pretraining Approach. July 2019
- Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model. January 2022
- Scale Efficiently: Insights from Pre-training and Fine-tuning Transformers. September 2021
- [Chinchilla] Training Compute-Optimal Large Language Models. March 2022
- A data-driven approach for learning to control computers. February 2022. DeepMind
- Data Distributional Properties Drive Emergent Few-Shot Learning in Transformers. April 2022. Stanford, University College London, DeepMind
- CvT: Introducing Convolutions to Vision Transformers. March 2021. Microsoft, McGill University
- [DETR] End-to-End Object Detection with Transformers. May 2020. Facebook
- [Vision Transformer] An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. October 2020. Google
- It's Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners. September 2020. Sulzer, LMU Munich
- Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity. January 2021. Google
- Pathways: Asynchronous Distributed Dataflow for ML. March 2022. Google
- GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding. June 2020. Google
- Flamingo: a Visual Language Model for Few-Shot Learning. April 2022. DeepMind
- Carbon Emissions and Large Neural Network Training. April 2021. Google, UC Berkeley
- [Meena] Towards a Human-like Open-Domain Chatbot. January 2020. Google
- The Evolved Transformer. January 2019. Google
- Decision Transformer: Reinforcement Learning via Sequence Modeling. June 2021. Google, Facebook, UC Berkeley
- Perceiver IO: A General Architecture for Structured Inputs & Outputs. July 2021. DeepMind
- An Evolutionary Approach to Dynamic Introduction of Tasks in Large-scale Multitask Learning Systems. May 2022. Google
- FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness. May 2022. University at Buffalo, Stanford
- Linformer: Self-Attention with Linear Complexity. June 2020. Facebook
- [SimCLR] A Simple Framework for Contrastive Learning of Visual Representations. February 2020. Google
- [MoCo] Momentum Contrast for Unsupervised Visual Representation Learning. November 2019. Facebook
- Deep Double Descent: Where Bigger Models and More Data Hurt. December 2019. OpenAI, Harvard
- Extracting Training Data from Large Language Models. December 2020. Northeastern, Harvard, UC Berkeley, Google, Apple, OpenAI, Stanford
- An Empirical Model of Large-Batch Training. December 2018. OpenAI, Johns Hopkins. Blog
- LayoutLM: Pre-training of Text and Layout for Document Image Understanding. December 2019. Microsoft
- Understanding the Capabilities, Limitations, and Societal Impact of Large Language Models. February 2021. Stanford, OpenAI
- Pretrained Transformers as Universal Computation Engines. March 2021. Facebook, UC Berkeley, Google
- PaLM: Scaling Language Modeling with Pathways. April 2022. Google
- [T5] Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. October 2019. Google