
Welcome to Exploring the LLM Frontier: From Hugging Face to RAG and Beyond

(Image credit: Steve Johnson, Unsplash.com)


What is Generative AI?

Generative AI models use deep learning to create new data that resembles the data they were trained on, such as text, images, or audio.

Generative models have long been used in statistics for numerical data analysis. With the advent of deep learning, these models were extended to handle complex data types such as images and speech. Variational autoencoders (VAEs), introduced in 2013, were among the first deep learning models to make this transition and generate realistic images and speech.

Autoencoders

Autoencoders work by encoding unlabeled data into a simpler representation and then decoding it back to its original form. Standard autoencoders have been used for diverse tasks, such as restoring blurred or damaged images. Variational autoencoders improved on this by being able to create variations of the original data, not just reconstruct it.

This ability to create new data sparked a rapid succession of new technologies, such as generative adversarial networks (GANs) and diffusion models, which can produce even more lifelike, though still synthetic, images. In this way, VAEs paved the way for the generative AI we see today.

Blocks of encoders and decoders form the basis of these structures, an architecture that also supports today's large language models. Encoders compress a dataset into a dense representation, grouping similar data points closer together in an abstract space. Decoders sample from this space to generate new content while maintaining the most significant features of the dataset.
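
As a rough sketch of this encode-compress-decode idea (not code from this workshop), the minimal PyTorch autoencoder below squeezes a flattened image into a small latent vector and reconstructs it; the layer sizes and input dimensions are arbitrary illustrative choices.

```python
import torch
import torch.nn as nn

# A minimal (non-variational) autoencoder: the encoder compresses each input
# into a small latent vector; the decoder reconstructs the input from it.
class AutoEncoder(nn.Module):
    def __init__(self, n_features=784, n_latent=16):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_features, 128), nn.ReLU(),
                                     nn.Linear(128, n_latent))
        self.decoder = nn.Sequential(nn.Linear(n_latent, 128), nn.ReLU(),
                                     nn.Linear(128, n_features))

    def forward(self, x):
        z = self.encoder(x)      # dense latent representation
        return self.decoder(z)   # reconstruction of the input

model = AutoEncoder()
x = torch.rand(8, 784)                         # toy batch of flattened 28x28 "images"
loss = nn.functional.mse_loss(model(x), x)     # reconstruction loss to minimize
print(loss.item())
```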

Transformers

Transformers, introduced by Google in 2017 in the landmark paper “Attention Is All You Need”, combined the encoder-decoder architecture with a text-processing mechanism called attention to change how language models were trained. An encoder converts raw, unannotated text into representations known as embeddings; the decoder takes these embeddings together with the model's previous outputs and successively predicts each word in a sentence.
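
A minimal sketch of the encoder side, assuming the Hugging Face transformers library and the publicly available bert-base-uncased checkpoint (an illustrative choice, not one named above): the encoder maps each token of a sentence to an embedding vector.

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Tokenize a sentence and run it through an encoder-only Transformer
# to obtain one embedding vector per token.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("Transformers turn raw text into embeddings.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

print(outputs.last_hidden_state.shape)  # (batch, tokens, hidden_size)
```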

Through fill-in-the-blank guessing games, the encoder learns how words and sentences relate to each other, building up a powerful representation of language without anyone having to label parts of speech and other grammatical features. Transformers, in fact, can be pre-trained at the outset without a particular task in mind. Once these powerful representations are learned, the models can later be specialized — with much less data — to perform a given task.
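
The "fill-in-the-blank" objective can be tried directly with the fill-mask pipeline; the BERT checkpoint below is an illustrative assumption, not a model prescribed by the text.

```python
from transformers import pipeline

# Masked-language-model guessing game: the model predicts the hidden word.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")
for prediction in fill_mask("The capital of France is [MASK]."):
    print(prediction["token_str"], round(prediction["score"], 3))
```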

Transformers revolutionized language processing by handling entire sentences at once, learning word positions and context. This allowed faster training and inference than earlier sequential methods such as RNNs and LSTMs. They also reduced the need for task-specific training from scratch: models can be pre-trained on large volumes of text and then fine-tuned for many different tasks. Known as foundation models, they learn from vast amounts of unlabeled data and generalize well to diverse tasks.

Transformers are utilized for classification, entity extraction, machine translation, automatic summarization, and question answering. They've recently impressed with their ability to generate dialogue and essays. They can be categorized as encoder-only, decoder-only, or encoder-decoder models.

Encoder-only models like BERT are used in search engines and customer-service bots, including IBM’s Watson Assistant. They're ideal for tasks such as classifying feedback and extracting information from extensive documents.
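
As a small illustration of an encoder-only model applied to classifying feedback, the sketch below uses a sentiment checkpoint; the model name is an illustrative assumption.

```python
from transformers import pipeline

# Encoder-only model fine-tuned for sentiment classification.
classifier = pipeline(
    "text-classification",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)
print(classifier("The workshop materials were clear and helpful."))
```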

Decoder-only models like GPT predict the next word without an encoded representation. OpenAI released GPT-4, reported (though not officially confirmed) to have roughly 1.76 trillion parameters, in March 2023. Many other large models, such as Google's Gemma, Meta's Llama, Microsoft's Phi-3, Mistral AI's Mistral models, and BigScience's BLOOM, have also emerged.
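
A minimal sketch of next-word generation with an openly available decoder-only model; GPT-2 stands in here as a small, illustrative substitute for the much larger models above.

```python
from transformers import pipeline

# Decoder-only model generating a continuation one token at a time.
generator = pipeline("text-generation", model="gpt2")
result = generator("Large language models are", max_new_tokens=30)
print(result[0]["generated_text"])
```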

You can check the Open LLM Leaderboard on the Hugging Face website to compare model performance.

Encoder-decoder models like Google's T5 combine features of BERT-style encoders and GPT-style decoders. They can perform generative tasks like decoder-only models, but their compact size often makes them faster and cheaper to run.
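
A short sketch of an encoder-decoder model in action, assuming the t5-small checkpoint (an illustrative choice) and the summarization pipeline:

```python
from transformers import pipeline

# Encoder-decoder (T5) model performing summarization as a text-to-text task.
summarizer = pipeline("summarization", model="t5-small")
text = (
    "Transformers process entire sentences at once, learning word positions "
    "and context. This allows faster training than recurrent networks and "
    "makes it possible to pre-train on large text collections and then "
    "fine-tune the same model for many different tasks."
)
print(summarizer(text, max_length=30, min_length=5)[0]["summary_text"])
```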


Large Language Models

Large language models (LLMs) are a category of foundation models trained on immense amounts of data, which makes them capable of understanding and generating natural language and other kinds of content across a wide range of tasks.

LLMs can be used for a variety of tasks, including generating and translating text, recognizing speech, performing natural language processing (NLP) tasks, building chatbots, summarizing text, and answering questions.

The efficacy of an LLM depends largely on its training process, which includes pre-training the model on massive amounts of text data, such as books, articles, or web pages.
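
To make the idea of a pre-trained model concrete, the sketch below loads pre-trained weights directly (GPT-2 again as a small stand-in) instead of using the pipeline helper; this is an illustrative assumption rather than a workflow prescribed by this page.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load a small pre-trained causal language model and its tokenizer.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Generate a continuation of a prompt with greedy decoding.
inputs = tokenizer("Pre-training on large text corpora lets a model", return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=25, do_sample=False)

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```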


Here you will find a collection of learning resources on Generative AI, Large Language Models, and applications.



Topics

  1. Introduction to NLP with Hugging Face Transformers
  2. Computer Vision with Hugging Face Transformers
  3. Multimodal LLM with Hugging Face Transformers
  4. Running LLM locally: Ollama
  5. Introduction to Langchain
  6. Getting started with Phi-3
  7. Getting started with Gemini/Gemma
  8. Introduction to Gradio
  9. Introduction to Retrieval Augmented Generation (RAG)

Software Tools

MLflow

DVC (Data Version Control)

Other IDE


General References


Created: 06/10/2024 (C. Lizárraga)

Updated: 09/11/2024 (C. Lizárraga)

Data Lab, Data Science Institute, University of Arizona.

CC BY-NC-SA 4.0