Add AI onboarding directory to python directory (#13)
cld-vasconcelos authored Aug 2, 2024
1 parent 51c930c commit c09b4ec
Showing 2 changed files with 79 additions and 0 deletions.
35 changes: 35 additions & 0 deletions docs/python/AI-onboarding/1_machine_learning.md
@@ -0,0 +1,35 @@
# Machine Learning

This machine learning onboarding consists of a set of free courses and competitions on [Kaggle](https://www.kaggle.com/), which requires registration (also free).

## Week 1
1. [Pandas](https://www.kaggle.com/learn/pandas)
- Optional, only for people with no experience with Pandas
2. [Intro to Machine Learning](https://www.kaggle.com/learn/intro-to-machine-learning)
- Explains the basics of machine learning and shows how to build and evaluate ML models
- Focuses on regression models, namely decision trees and then random forests
3. [Intermediate Machine Learning](https://www.kaggle.com/learn/intermediate-machine-learning)
- Explains basic data preparation
- Expands on model evaluation
- Also introduces gradient boosting
4. [Data Cleaning](https://www.kaggle.com/learn/data-cleaning)
- Continues the presentation of the data preparation techniques introduced in the previous course
5. [Feature Engineering](https://www.kaggle.com/learn/feature-engineering)
- Describes techniques to identify important features or potential new features
- Suggests entering a competition at the end, which is a good way to apply the knowledge gathered from the previous courses
6. [House Prices - Advanced Regression Techniques](https://www.kaggle.com/competitions/house-prices-advanced-regression-techniques/overview)
- Competition suggested at the end of the previous course
- Useful as an exercise for consolidating the knowledge from all the previous courses
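
As a rough illustration of the build-and-evaluate loop that the Week 1 courses revolve around, here is a minimal sketch using only the Python standard library. It fits a depth-1 regression tree (a "stump") on one feature and scores it with mean absolute error on held-out data. The data values and the helper names (`fit_stump`, `predict_stump`) are made up for this example and do not come from the courses, which work with library models rather than hand-written ones.

```python
from statistics import mean


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def fit_stump(xs, ys):
    """Fit a depth-1 regression tree on a single numeric feature.

    Tries every distinct feature value (except the smallest) as a split
    threshold and keeps the one with the lowest training squared error.
    Returns (threshold, left_value, right_value).
    """
    best = None
    for threshold in sorted(set(xs))[1:]:
        left = [y for x, y in zip(xs, ys) if x < threshold]
        right = [y for x, y in zip(xs, ys) if x >= threshold]
        left_value, right_value = mean(left), mean(right)
        error = sum((y - left_value) ** 2 for y in left)
        error += sum((y - right_value) ** 2 for y in right)
        if best is None or error < best[0]:
            best = (error, threshold, left_value, right_value)
    return best[1:]


def predict_stump(stump, xs):
    """Predict the leaf mean on whichever side of the threshold x falls."""
    threshold, left_value, right_value = stump
    return [left_value if x < threshold else right_value for x in xs]


# Hypothetical toy data: living area and sale price (arbitrary units).
train_area = [50, 60, 70, 120, 130, 140]
train_price = [100, 110, 120, 300, 310, 320]
valid_area = [55, 135]
valid_price = [105, 315]

stump = fit_stump(train_area, train_price)
valid_mae = mean_absolute_error(valid_price, predict_stump(stump, valid_area))
print(stump, valid_mae)
```

Scoring on data the model has not seen is the key habit here; deeper trees, random forests and gradient boosting follow the same fit, predict, evaluate pattern.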

## Week 2
1. Continue working on your solution to the [House Prices - Advanced Regression Techniques](https://www.kaggle.com/competitions/house-prices-advanced-regression-techniques/overview) competition

## Week 3
1. [Intro to Deep Learning](https://www.kaggle.com/learn/intro-to-deep-learning)
- Shows how to build, evaluate and tune neural networks
- In addition to regression, explains how to use neural networks to solve classification problems
2. Enter one of the following competitions suggested in the previous course:
- [Petals to the Metal - Flower Classification on TPU](https://www.kaggle.com/c/tpu-getting-started)
- [I’m Something of a Painter Myself](https://www.kaggle.com/c/gan-getting-started)
- [Natural Language Processing with Disaster Tweets](https://www.kaggle.com/c/nlp-getting-started)
- [Contradictory, My Dear Watson](https://www.kaggle.com/c/contradictory-my-dear-watson)
44 changes: 44 additions & 0 deletions docs/python/AI-onboarding/2_large_language_models.md
@@ -0,0 +1,44 @@
# Large Language Models

## Large Language Models (LLMs)

### Relevant Topics
- Pre-training
- Fine-tuning

### Resources
| Type | Title | Comments |
| ----------- | ----------- | ----------- |
| Course | [Introduction to Large Language Models](https://www.cloudskillsboost.google/course_templates/539) | <ul><li>8 hour free course. Requires enrollment</li><li>Begins with a very good 15 minute video describing LLMs and their main concepts</li><li>Includes a collection of useful pages of information about LLMs.</li><li>Has a mini quiz at the end about LLMs</li></ul> |
| Video | [A Hackers' Guide to Language Models](https://www.youtube.com/watch?v=jkrNMKz9pWU) | <ul><li>1h30 YouTube video</li><li>Describes LLMs and important concepts (e.g., tokenization, training, fine-tuning)</li><li>Explains some actual models (e.g., GPT-4) and their limitations</li></ul> |
| Course | [Training & Fine-Tuning LLMs for Production](https://learn.activeloop.ai/courses/llms) | <ul><li>Hands-on course for training, fine-tuning and adapting LLMs to specific tasks</li><li>Running all of the course’s examples will cost around $100, although it is not necessary to complete the course</li></ul> |
| Video (Playlist) | [Training & Fine-Tuning LLMs Course](https://www.youtube.com/playlist?list=PLD80i8An1OEGqqXeNZ5w0IBmeZcxpZEYL) | <ul><li>Series of 4 YouTube videos of about 1 hour each</li><li>They explain the basics of LLMs and important concepts (e.g., evaluation, data, training and fine-tuning) in a practical way</li><li>Skip video #4 since it has audio issues; video #5 is a reupload with fixed audio</li></ul> |

## Retrieval Augmented Generation (RAG)

### Relevant Topics
- Embeddings
- Vector Databases
- Document Retrieval
- LangChain

### Resources
| Type | Title | Comments |
| ----------- | ----------- | ----------- |
| Course | [LangChain for LLM Application Development](https://learn.deeplearning.ai/langchain/) | <ul><li>Series of 8 videos</li><li>Each video (except for the first and last) is accompanied by a notebook and you are encouraged to explore it, testing different prompts to produce different outputs</li><li>The course was partially created and taught by the creator of LangChain</li></ul> |
| Course | [LangChain Chat with Your Data](https://learn.deeplearning.ai/langchain-chat-with-your-data) | <ul><li>Series of 8 videos</li><li>Follows up on the first course</li><li>Thoroughly explains how to implement the RAG pipeline using LangChain, offering several different approaches for each of the steps</li><li>Explains how to produce an end-to-end chatbot that can answer questions about a certain dataset</li></ul> |
| Video | [Vector Embeddings for Beginners](https://www.youtube.com/watch?v=PR7xz5vQKGg) | <ul><li>35 minute video</li><li>Covers vector embeddings, vector databases and LangChain</li></ul> |
| Video | [What is Retrieval-Augmented Generation](https://www.youtube.com/watch?v=T-D1OfcDW1M) | <ul><li>6 minute video by IBM</li><li>Describes some of the challenges presented by LLMs</li><li>Describes what RAG is and how it solves those problems</li></ul> |
| Video | [Chatbots with RAG: LangChain Full Walkthrough](https://www.youtube.com/watch?v=LhnCsygAvzY) | <ul><li>35 minute video</li><li>Good explanation of the RAG pipeline</li><li>Explains how to build a RAG chatbot with code</li><li>Requires API keys for OpenAI and Pinecone</li></ul> |
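
To make the topics above more concrete, the following is a minimal sketch of the retrieval step of a RAG pipeline, written with the Python standard library only. It is not LangChain code: the word-count "embedding", the in-memory document list standing in for a vector database, and the names `embed`, `retrieve` and `build_prompt` are all assumptions made for illustration. A real pipeline would use a learned embedding model and would send the final prompt to an LLM.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts (a stand-in for a learned model)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, documents, top_k=1):
    """Return the top_k documents most similar to the question."""
    query = embed(question)
    ranked = sorted(
        documents,
        key=lambda doc: cosine_similarity(query, embed(doc)),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question, documents):
    """Augment the question with retrieved context before generation."""
    context = "\n".join(retrieve(question, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Hypothetical mini knowledge base.
documents = [
    "Decision trees split data on feature thresholds",
    "Vector databases store embeddings for similarity search",
    "Gradient boosting combines many weak models",
]
question = "Where are embeddings stored"

top = retrieve(question, documents)
prompt = build_prompt(question, documents)
print(prompt)
```

The structure is the part that carries over to real systems: embed the documents, embed the question, rank by similarity, and place the best matches in the prompt.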


## PandasAI

### Relevant Topics
- Smart Data Frames & Datalakes

### Resources
| Type | Title | Comments |
| ----------- | ----------- | ----------- |
| Video | [PandasAI - Talk to Your Data](https://www.youtube.com/watch?v=mQmRi2QTebM) | <ul><li>27 minute video, presented by the creator of PandasAI</li><li>Includes an explanation of how it works</li><li>Shows several examples of PandasAI’s functionalities</li></ul> |
| Webpage | [An Introduction to Pandas AI](https://www.datacamp.com/blog/an-introduction-to-pandas-ai) | <ul><li>Short introduction to PandasAI with examples</li><li>Covers basic aspects like setting up PandasAI, prompting a dataframe for basic answers and charts</li></ul> |
