(Image credit: Steve Johnson, Unsplash.com)
Generative AI models use deep learning to create new data that resembles the data they were trained on. Generative models have long been used in statistics to analyze numerical data; with the advent of deep learning, they were extended to complex data types such as images and speech. Variational autoencoders (VAEs), introduced in 2013, were among the first deep learning models to generate realistic images and speech.
Autoencoders work by encoding unlabeled data into a compressed representation and then decoding it back to its original form. Plain autoencoders have been used for diverse tasks, such as restoring blurred or damaged images. Variational autoencoders improved on this by learning to generate variations of the original data, not just recreate it.
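As a minimal sketch of this encode/decode cycle, here is a toy autoencoder in PyTorch; the 784-dimensional input (think of a flattened 28x28 image) and all layer sizes are arbitrary choices for illustration, not taken from any particular model:

```python
import torch
from torch import nn

class Autoencoder(nn.Module):
    """Toy autoencoder: compress 784-dim inputs down to a 32-dim code,
    then reconstruct them."""
    def __init__(self, input_dim=784, code_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 128), nn.ReLU(),
            nn.Linear(128, code_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(code_dim, 128), nn.ReLU(),
            nn.Linear(128, input_dim), nn.Sigmoid(),
        )

    def forward(self, x):
        code = self.encoder(x)     # compress to a dense code
        return self.decoder(code)  # reconstruct the input

model = Autoencoder()
x = torch.rand(16, 784)  # a batch of random stand-in "images"
loss = nn.functional.mse_loss(model(x), x)  # reconstruction error
```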
This ability to create new data sparked a rapid succession of new technologies, such as generative adversarial networks (GANs) and diffusion models, which can produce even more lifelike (but still synthetic) images. VAEs thus paved the way for the generative AI we see today.
Blocks of encoders and decoders form the basis of these models, an architecture that also underpins today's large language models. Encoders compress a dataset into a dense representation, grouping similar data points closer together in an abstract latent space. Decoders sample from this space to generate new content while preserving the dataset's most significant features.
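What makes a VAE generative is that the encoder outputs the parameters of a distribution over this latent space rather than a single point. Here is a sketch of that latent layer via the reparameterization trick (names and sizes are illustrative):

```python
import torch
from torch import nn

class LatentSampler(nn.Module):
    """Sketch of a VAE's latent layer: the encoder output parameterizes
    a Gaussian, and samples from it feed the decoder."""
    def __init__(self, hidden_dim=128, latent_dim=32):
        super().__init__()
        self.to_mu = nn.Linear(hidden_dim, latent_dim)      # mean
        self.to_logvar = nn.Linear(hidden_dim, latent_dim)  # log-variance

    def forward(self, h):
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        std = torch.exp(0.5 * logvar)
        z = mu + std * torch.randn_like(std)  # differentiable sample
        return z, mu, logvar

sampler = LatentSampler()
z, mu, logvar = sampler(torch.rand(16, 128))
# At generation time, decoding draws z ~ N(0, I) yields new samples.
```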
Transformers, introduced by Google in 2017 in a landmark paper “Attention Is All You Need”, combined the encoder-decoder architecture with a text-processing mechanism called attention to change how language models were trained. An encoder converts raw unannotated text into representations known as embeddings; the decoder takes these embeddings together with previous outputs of the model, and successively predicts each word in a sentence.
Through fill-in-the-blank guessing games, the encoder learns how words and sentences relate to each other, building up a powerful representation of language without anyone having to label parts of speech and other grammatical features. Transformers, in fact, can be pre-trained at the outset without a particular task in mind. Once these powerful representations are learned, the models can later be specialized — with much less data — to perform a given task.
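This fill-in-the-blank objective can be tried directly with the Hugging Face transformers library, using bert-base-uncased as an example checkpoint:

```python
from transformers import pipeline

# BERT was pre-trained with exactly this fill-in-the-blank objective.
fill = pipeline("fill-mask", model="bert-base-uncased")

for pred in fill("The capital of France is [MASK]."):
    print(f"{pred['token_str']:>10}  score={pred['score']:.3f}")
```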
Transformers revolutionized language processing by handling entire sentences at once, learning word positions and contexts. This allowed faster training and inference than earlier sequential methods such as RNNs and LSTMs. Rather than training a separate model from scratch for each task, it became possible to pre-train a model on large volumes of text and then fine-tune it for many different tasks, as sketched below. Models of this kind, known as foundation models, learn from vast unlabeled data and generalize well to diverse tasks.
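A sketch of that specialization step: reuse a pre-trained encoder and attach a fresh classification head, which can then be fine-tuned with comparatively little labeled data (the checkpoint and label count here are illustrative):

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Reuse the pre-trained encoder and attach a fresh 2-class head; only a
# small labeled dataset is then needed to fine-tune it for, say,
# sentiment analysis. The head's weights start out untrained.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

inputs = tokenizer("The workshop was great!", return_tensors="pt")
logits = model(**inputs).logits  # meaningless until fine-tuned
```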
Transformers are utilized for classification, entity extraction, machine translation, automatic summarization, and question answering. They've recently impressed with their ability to generate dialogue and essays. They can be categorized as encoder-only, decoder-only, or encoder-decoder models.
Encoder-only models like BERT are used in search engines and customer-service bots, including IBM’s Watson Assistant. They're ideal for tasks such as classifying feedback and extracting information from extensive documents.
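Both of these encoder-only use cases can be tried with off-the-shelf pipelines from the transformers library; the checkpoints downloaded by default are examples that vary by library version, not the specific models behind products like Watson Assistant:

```python
from transformers import pipeline

# Sentiment classification, a typical encoder-only task.
classifier = pipeline("sentiment-analysis")
print(classifier("The workshop was incredibly helpful!"))

# Extractive question answering pulls a span out of a document.
qa = pipeline("question-answering")
context = ("The DataLab series covers Hugging Face, Ollama, LangChain, "
           "fine-tuning, and retrieval-augmented generation.")
print(qa(question="What does the series cover?", context=context))
```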
Decoder-only models like GPT predict the next word without building a separate encoded representation. OpenAI released GPT-4 in March 2023; its parameter count has not been officially disclosed, though outside estimates put it at roughly 1.76 trillion. Since then, other large models, such as Google's Gemma, Meta's Llama, Microsoft's Phi-3, Mistral AI's Mistral, and BigScience's BLOOM, have emerged.
You can check the Open LLM Leaderboard on the Hugging Face website to compare model performance.
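To see this next-word prediction in action, a small open decoder-only model such as GPT-2 (chosen here only because it is compact and freely downloadable, not because it is state of the art) can be run in a few lines:

```python
from transformers import pipeline

# Decoder-only generation: GPT-2 repeatedly predicts the next token.
generator = pipeline("text-generation", model="gpt2")
out = generator("Large language models are", max_new_tokens=20)
print(out[0]["generated_text"])
```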
Encoder-decoder models like Google's T5 combine features of BERT-style encoders and GPT-style decoders. They can perform generative tasks like decoder-only models, but their compact size makes them faster and cheaper to run.
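A sketch of T5's text-to-text interface, using the small public t5-small checkpoint:

```python
from transformers import pipeline

# T5 frames every task as text-to-text; this pipeline applies its
# built-in English-to-German translation prefix.
translator = pipeline("translation_en_to_de", model="t5-small")
print(translator("The weather is nice today.")[0]["translation_text"])
```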
Large language models (LLMs) are a category of foundation models trained on immense amounts of data, making them capable of understanding and generating natural language and other types of content to perform a wide range of tasks.
LLMs can be used for a variety of tasks, including generating and translating text, recognizing speech, performing natural language processing (NLP) tasks, creating chatbots, summarizing text, and answering questions.
The efficacy of an LLM depends largely on its training process, which includes pre-training the model on massive amounts of text data, such as books, articles, or web pages.
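At its core, that pre-training step is next-token prediction; here is a minimal sketch of the objective, with GPT-2 standing in for any causal language model:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

batch = tokenizer("Books, articles, and web pages supply the text.",
                  return_tensors="pt")
# Passing labels=input_ids makes the model compute the shifted
# next-token cross-entropy loss minimized during pre-training.
out = model(**batch, labels=batch["input_ids"])
print(f"loss on this snippet: {out.loss.item():.2f}")
```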
Here you will find a collection of learning resources on Generative AI, Large Language Models, and Applications.
- Sep 5: Introduction to Hugging Face
- Sep 12: Getting started with Ollama
- Sep 19: Getting started with LibreChat
- Sep 26: Getting started with Phi 1.5
- Oct 3: Getting started with Gemini API
- Oct 10: Gradio
- Oct 17: LangChain
- Oct 24: Fine-Tuning LLMs
- Oct 31: Retrieval-Augmented Generation (RAG)
- Nov 7: Build an LLM from Scratch
- An LLM Reading List. Evan Miller. GitHub, 2023.
- Chatbot Arena Leaderboard: LLM ratings & performance. LMSYS.
- GPT-4 Technical Report. OpenAI. Mar 27, 2023.
- Hugging Face Arxiv Daily Papers. A. Khalik.
- Hugging Face Models.
- Ollama: running LLMs locally, with downloadable models.
- Papers with Code: State of the Art.
- Sparks of Artificial General Intelligence: Early experiments with GPT-4. Sébastien Bubeck et al. Apr 13, 2023.
- State of GPT. Andrej Karpathy. OpenAI. May 23, 2023.
- The Practical Guide for Large Language Models (based on the arXiv paper "Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond").
Created: 06/10/2024 (C. Lizárraga)
Updated: 06/30/2024 (C. Lizárraga)
UArizona DataLab, Data Science Institute, University of Arizona, 2024.