Large Language Models
(Image credit: Google DeepMind. Unsplash)
Large Language Models (LLMs) are deep learning models that are trained to work on human languages. They are capable of understanding and generating text in a human-like fashion. LLMs are based on large transformer models that use a combination of artificial neural networks and natural language processing techniques to process and generate text.
Overall, LLMs represent a significant advance in the field of natural language processing and have the potential to revolutionize the way we interact with computers and other digital devices.
Large language models (LLMs) are trained on massive datasets of text and code. This training, a form of generative AI, allows the models to learn the patterns and relationships between words, which in turn allows them to perform a variety of tasks, including:
- Generating text: LLMs can be used to generate text that is similar to human-written text. This can be used for a variety of purposes, such as creating chatbots, writing articles, or generating creative content.
- Translating languages: LLMs can be used to translate text from one language to another. This can be useful for businesses that need to communicate with customers in multiple languages, or for individuals who want to read content in a language they don't speak.
- Answering questions: LLMs can be used to answer questions about the world. This can be useful for students who are doing research, or for people who want to learn more about a particular topic.
- Summarizing text: LLMs can be used to summarize text, extracting the key points and presenting them in a concise way. This can be useful for people who don't have time to read long articles, or for students who need to summarize their research findings.
LLMs are still under development, but they have the potential to revolutionize the way we interact with computers. In the future, LLMs could be used to create more natural and engaging user interfaces, or to provide us with personalized recommendations and advice.
Here are some of the current applications of LLMs:
- Chatbots: LLMs are being used to power chatbots that can have conversations with humans in a natural way. This is being used in a variety of industries, such as customer service, healthcare, and education.
- Virtual assistants: LLMs are also being used to power virtual assistants like Amazon Alexa and Google Assistant. These assistants can answer questions, set reminders, and control smart devices.
- Content generation: LLMs are being used to generate content, such as articles, blog posts, and even creative writing. This is being used by businesses to create marketing content, and by individuals to share their thoughts and ideas.
- Machine translation: LLMs are being used to improve machine translation. This is making it possible to translate text from one language to another more accurately and fluently.
(Image credit: Mooler0410 )
LLMs are still a new technology, but they have the potential to have a major impact on the way we interact with computers. As they continue to develop, we can expect to see even more innovative and exciting applications for this technology.
Multiple large language models have been developed, including:
- GPT-3 (Jun 2020), GPT-3.5 (Mar 2022), and GPT-4 (Mar 2023) from OpenAI, used in ChatGPT,
- LLaMA (Feb 2023), LLaMA-2 (Jul 2023), LLaMA-3 (Apr 2024) from Meta,
- Gemini/Bard (Mar 2023) from Google,
- Claude (Jul 2023) from Anthropic,
- Mistral (Apr 2023) from Mistral AI.
There is an LLM ratings & performance leaderboard (the Chatbot Arena Leaderboard), where you can follow recent LLM performance comparisons (Apr 2024 ratings below):
LLMs can understand language and generate text. They are based on the transformer architecture, which uses attention mechanisms to capture context and to generate text conditioned on previously generated tokens.
Transformer Architecture (CC Image credit: Wikimedia Commons)
Attention Mechanism (CC Image credit: Wikimedia Commons)
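To make the attention idea more concrete, here is a minimal, self-contained sketch of scaled dot-product attention in Python with NumPy. It is illustrative only: a single attention head with no learned projection matrices, and the token vectors are random placeholders rather than real embeddings.

```python
# Minimal sketch of scaled dot-product attention (illustrative only).
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V, mask=None):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)            # similarity of each query to each key
    if mask is not None:
        scores = np.where(mask, scores, -1e9)  # block attention to future tokens
    weights = softmax(scores, axis=-1)         # attention weights sum to 1 per token
    return weights @ V                         # weighted mix of value vectors

# Toy sequence of 4 tokens, each represented by an 8-dimensional vector.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
# Causal mask: token i may only attend to tokens 0..i (previously generated tokens).
causal = np.tril(np.ones((4, 4), dtype=bool))
out = scaled_dot_product_attention(x, x, x, mask=causal)
print(out.shape)  # (4, 8): one context-aware vector per token
```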
ChatGPT from OpenAI, a chatbot built on a large language model (with comparable systems such as Google Gemini and Anthropic's Claude), gained popularity and attention when it was introduced, but many people still don't understand how it works.
Unlike conventional software, ChatGPT is built on a neural network trained on billions of words of ordinary language, making it difficult for anyone to fully comprehend its inner workings.
Although researchers are working to gain a better understanding, it will take years or even decades to fully comprehend LLMs. This article aims to explain the inner workings of language models by discussing word vectors, transformers, and the training process.
(See: LLM Models Architecture Visualization)
Relative frequency of letters in English language root words (CC Image credit: Wikimedia Commons)
In languages like English, words are represented as a sequence of letters, like C-A-T for cat. Language models instead use vectors (a list of 300 numbers, for example). Words live in this 300-dimensional "word space", and words with similar meanings are placed close together. Examples of words close to cat are: kitten, dog, pet.
Similarly, related word pairs sit close together: big and small (size), German and Germany (nationality, language), Berlin and Germany (capital), mouse and mice (plural), man and woman (gender), king and queen (role).
(Image credit: Understanding AI, Timothy B. Lee and Sean Trott)
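The idea that similar words get nearby vectors can be illustrated with cosine similarity. The three-dimensional vectors below are invented for this example; real embeddings such as word2vec or GPT word vectors have hundreds or thousands of dimensions and are learned from data.

```python
# Toy illustration of word vectors and cosine similarity.
# The vectors are made up for this example; real embeddings are learned from data.
import numpy as np

word_vectors = {
    "cat":    np.array([0.90, 0.80, 0.10]),
    "kitten": np.array([0.85, 0.75, 0.15]),
    "dog":    np.array([0.80, 0.60, 0.20]),
    "car":    np.array([0.10, 0.20, 0.90]),
}

def cosine_similarity(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

for word in ["kitten", "dog", "car"]:
    print(f"cat vs {word}: {cosine_similarity(word_vectors['cat'], word_vectors[word]):.3f}")
# "kitten" and "dog" score close to 1 (nearby in word space); "car" scores much lower.
```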
In Linguistics, there are several ways to classify words.
- Polysemy, words with closely related meanings: magazine, bank, man, ...
- Homonyms, words having the same form but unrelated meanings: bank, row, bark, ...
- Homophones / Homographs, words that are pronounced (or spelled) the same but have different meanings: rose, read, by (buy), merry (marry, Mary), depend (deep end), example (egg sample), ...
- Ambiguous: "the professor urged the student to do her homework", "fruit flies like a banana"
We usually resolve ambiguities based on context, with no deterministic rules for doing this. Language models, with the help of word vectors, provide a way of representing a precise meaning in a specific context.
The [Generative Pre-trained Transformer 3](https://en.wikipedia.org/wiki/GPT-3) (GPT-3), released by OpenAI in 2020, is the large language model behind the original ChatGPT. The GPT-3 transformer is organized into dozens of neural network layers.
Suppose we have the partial sentence: "John wants his bank to cash the". These words, represented as word2vec-style vectors are fed into the first transformer, as shown in the figure below.
(Image credit: Understanding AI, Timothy B. Lee and Sean Trott)
The transformer figures out that wants and cash are verbs (words that can also be nouns). The next layer figures out that bank is a financial institution rather than a river bank, and that the pronoun his refers to John.
This is a simple example of a two-layer model. GPT-3 has 96 layers and uses word vectors of 12,288 dimensions.
In a 1,000-word story, the 60th layer might include a vector for John with an annotation like "John (main character, male, married to Cheryl, cousin of Donald, from Minnesota, currently in Boise, trying to find his missing wallet)".
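GPT-3 itself is only available through OpenAI's API, but the same idea of feeding a partial sentence to a transformer and asking it to continue can be tried with a small, freely available model such as GPT-2 via the Hugging Face transformers library. The sketch below assumes `transformers` and a backend such as PyTorch are installed; GPT-2 stands in for GPT-3 here, since both are decoder-only transformers that predict the next token from the tokens seen so far.

```python
# Sketch: continue a partial sentence with a small open transformer model (GPT-2).
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
prompt = "John wants his bank to cash the"
outputs = generator(prompt, max_new_tokens=10, do_sample=True, num_return_sequences=3)
for out in outputs:
    print(out["generated_text"])
```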
(Image credit: State of GPT. Andrej Karpathy.)
From the above image, the GPT Assistant training pipeline consists of four stages:
- Pretraining
- Supervised Finetuning
- Reward Modeling
- Reinforcement Learning
Before pretraining takes place, there are two preparation steps: data collection and tokenization.
(Image credit: State of GPT. Andrej Karpathy.)
The developers of the major LLMs (GPT-4, Claude, PaLM, LLaMA 2) don't disclose the datasets used to train their models. From the above table, we can observe the datasets and sizes used to train the LLaMA model from Meta (Facebook): publicly available datasets from the Internet. One portion of the datasets includes nearly 200,000 pirated books available on the internet, and there are already lawsuits against OpenAI and Meta for copyright infringement (see note).
(Image credit: State of GPT. Andrej Karpathy.)
Tokenization is a lossless translation between pieces of text and integers. LLaMA was trained on 1–1.4 trillion tokens, versus roughly 300 billion for GPT-3.
(Image credit: State of GPT. Andrej Karpathy.)
Example
(Images credits: State of GPT. Andrej Karpathy.)
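As a concrete illustration of the tokenization step described above, the sketch below uses OpenAI's open-source tiktoken library (assumed installed with `pip install tiktoken`) to round-trip a sentence through the GPT-style `cl100k_base` encoding. Because the mapping is lossless, decoding the integers recovers the original text exactly.

```python
# Lossless round trip between text and integer tokens using tiktoken.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")   # encoding used by GPT-3.5/GPT-4 models
text = "John wants his bank to cash the"
token_ids = enc.encode(text)                 # text -> list of integers
print(token_ids)
print([enc.decode([t]) for t in token_ids])  # the word pieces behind each integer
assert enc.decode(token_ids) == text         # lossless: decoding recovers the text
```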
(Image credit: State of GPT. Andrej Karpathy.)
The supervised fine-tuned model is still not a good-quality product. In the reward modeling stage, users review the outputs the supervised fine-tuned model generates for a question, compare three versions, and rank which one is best. The model is then retrained with the users' selections.
(Images credits: State of GPT. Andrej Karpathy.)
This is done by giving higher weight to the better-ranked responses.
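One common way to weight the better-ranked responses is the pairwise ranking loss used for reward models in RLHF-style training (as described in OpenAI's InstructGPT work): the reward assigned to the preferred completion should exceed the reward assigned to the rejected one. The sketch below is illustrative, with made-up scalar rewards; in practice the rewards come from a neural network scoring whole prompt + response pairs.

```python
# Sketch of the pairwise reward-model loss used in RLHF-style training:
#   loss = -log(sigmoid(r_chosen - r_rejected))
# The reward values below are invented for illustration.
import math

def pairwise_ranking_loss(r_chosen: float, r_rejected: float) -> float:
    return -math.log(1.0 / (1.0 + math.exp(-(r_chosen - r_rejected))))

print(pairwise_ranking_loss(2.0, 0.5))  # small loss: model already prefers the better answer
print(pairwise_ranking_loss(0.5, 2.0))  # large loss: ranking disagrees with the human preference
```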
(Image credit: State of GPT. Andrej Karpathy.)
Things we need to keep in mind about LLMs:
- Base models do not answer questions
- They only want to complete internet documents
- They can be tricked into performing tasks with prompt engineering (see the few-shot prompt sketch after this list)
- Models may be biased
- Models may fabricate (“hallucinate”) information
- Models may have reasoning errors
- Models may have knowledge cutoffs (e.g. September 2021)
- Models are susceptible to prompt injection, “jailbreak” attacks, data poisoning attacks,...
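Because a base model only tries to continue documents, few-shot prompt engineering works by writing the start of a document whose natural continuation is the answer you want. The translation examples in the sketch below are invented for illustration; the pattern follows the few-shot prompting style popularized with GPT-3.

```python
# A base model completes documents, so a few-shot prompt frames the task as a
# document to be continued. The example pairs below are invented for illustration.
few_shot_prompt = """Translate English to French.

English: cheese
French: fromage

English: good morning
French: bonjour

English: thank you
French:"""

# Fed to a base model, the most likely continuation of this "document" is the
# French translation ("merci"), even though the model was never explicitly
# instructed to act as a translator.
print(few_shot_prompt)
```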
- Large language models, explained with a minimum of math and jargon. Timothy B. Lee, Sean Trott. Understanding AI. July 17th, 2023.
- A LLM Reading List. Evan Miller. Github. 2023.
- The Practical Guide for Large Language Models.
- Chatbot Arena Leaderboard: LLMs ratings & performance
- What we know about LLMs (Primer). Will Thompson. July 23rd, 2023.
- State of GPT. Andrej Karpathy. OpenAI. May 23, 2023.
- Catching up on the weird world of LLMs. Simon Willison. Aug 3, 2023.
- GPT-4 Technical Report. OpenAI. Mar 27, 2023.
- Sparks of Artificial General Intelligence: Early experiments with GPT-4. Sebastien Bubeck. Apr 13, 2023.
- Llama 2: Open Foundation and Fine-Tuned Chat Models. Hugo Touvron et al. Jul 19, 2023.
- Patterns for Building LLM-based Systems & Products. Eugene Yan. Jul, 2023.
Created: 08/06/2023
Updated: 04/22/2024
Carlos Lizárraga
University of Arizona, Data Science Institute.
UArizona DataLab, Data Science Institute, 2024