diff --git a/_images/1f2ee11f4771a887b67c05df37494adfb8aecbd44a7a9f5fbd5afc1cb1c75920.png b/_images/1f2ee11f4771a887b67c05df37494adfb8aecbd44a7a9f5fbd5afc1cb1c75920.png
new file mode 100644
index 000000000..8d370da61
Binary files /dev/null and b/_images/1f2ee11f4771a887b67c05df37494adfb8aecbd44a7a9f5fbd5afc1cb1c75920.png differ
diff --git a/_images/274f0024bb02add113fae8af2240bd62251e182ddbe6c52fb96ef0a75fbd28dd.png b/_images/274f0024bb02add113fae8af2240bd62251e182ddbe6c52fb96ef0a75fbd28dd.png
new file mode 100644
index 000000000..bfb3c03be
Binary files /dev/null and b/_images/274f0024bb02add113fae8af2240bd62251e182ddbe6c52fb96ef0a75fbd28dd.png differ
diff --git a/_images/4d22edbf348f43256d1e465ca1629b5d492e11651a4230519554fc5c37f7d2f0.png b/_images/4d22edbf348f43256d1e465ca1629b5d492e11651a4230519554fc5c37f7d2f0.png
deleted file mode 100644
index a23e11040..000000000
Binary files a/_images/4d22edbf348f43256d1e465ca1629b5d492e11651a4230519554fc5c37f7d2f0.png and /dev/null differ
diff --git a/_images/4d4a3e47e3571e377430828ed1dbfee87be0cc52411c5507ffb75aff723921cd.png b/_images/4d4a3e47e3571e377430828ed1dbfee87be0cc52411c5507ffb75aff723921cd.png
new file mode 100644
index 000000000..ba291cbcd
Binary files /dev/null and b/_images/4d4a3e47e3571e377430828ed1dbfee87be0cc52411c5507ffb75aff723921cd.png differ
diff --git a/_images/4f0ebdab72143a09a7cd1f17453e98aeb23cd8e7a08dcb9d395a24f24a8dd991.png b/_images/4f0ebdab72143a09a7cd1f17453e98aeb23cd8e7a08dcb9d395a24f24a8dd991.png
new file mode 100644
index 000000000..db5963180
Binary files /dev/null and b/_images/4f0ebdab72143a09a7cd1f17453e98aeb23cd8e7a08dcb9d395a24f24a8dd991.png differ
diff --git a/_images/526bbfc5df5c554dcedb08f917749ac49a6112047439f59c84109ab3df812acb.png b/_images/526bbfc5df5c554dcedb08f917749ac49a6112047439f59c84109ab3df812acb.png
deleted file mode 100644
index 55db7f5b6..000000000
Binary files a/_images/526bbfc5df5c554dcedb08f917749ac49a6112047439f59c84109ab3df812acb.png and /dev/null differ
diff --git a/_images/5eed64af5ddd3df53dc20ef443825b72e14ad0f98e1f03e24b4c7dfcd5a27566.png b/_images/5eed64af5ddd3df53dc20ef443825b72e14ad0f98e1f03e24b4c7dfcd5a27566.png
new file mode 100644
index 000000000..1f4812101
Binary files /dev/null and b/_images/5eed64af5ddd3df53dc20ef443825b72e14ad0f98e1f03e24b4c7dfcd5a27566.png differ
diff --git a/_images/5f42c5cf3a26d34a551c635ee83a8eabadc455be26c75bc2f7485177a27c255d.png b/_images/5f42c5cf3a26d34a551c635ee83a8eabadc455be26c75bc2f7485177a27c255d.png
new file mode 100644
index 000000000..39ede6218
Binary files /dev/null and b/_images/5f42c5cf3a26d34a551c635ee83a8eabadc455be26c75bc2f7485177a27c255d.png differ
diff --git a/_images/64058d37f289e3656e6d296491b14d5cfc7641127067f41d0c6a3f6a91b7e179.png b/_images/64058d37f289e3656e6d296491b14d5cfc7641127067f41d0c6a3f6a91b7e179.png
deleted file mode 100644
index 41956a1ec..000000000
Binary files a/_images/64058d37f289e3656e6d296491b14d5cfc7641127067f41d0c6a3f6a91b7e179.png and /dev/null differ
diff --git a/_images/6b176037749651980a9a02a512bd69e994655b36cab66cb2d1a25f8d9af0c528.png b/_images/6b176037749651980a9a02a512bd69e994655b36cab66cb2d1a25f8d9af0c528.png
deleted file mode 100644
index 21c96c5e9..000000000
Binary files a/_images/6b176037749651980a9a02a512bd69e994655b36cab66cb2d1a25f8d9af0c528.png and /dev/null differ
diff --git a/_images/7acdaa9daf0507c86354349586991516c97f08178468e1d51655fc8892312fe9.png b/_images/7acdaa9daf0507c86354349586991516c97f08178468e1d51655fc8892312fe9.png
deleted file mode 100644
index 208e02118..000000000
Binary files a/_images/7acdaa9daf0507c86354349586991516c97f08178468e1d51655fc8892312fe9.png and /dev/null differ
diff --git a/_images/8a1a0f9de4ef832c52f09191ecc0214d8b6996df8d6613698f36e29e05832d63.png b/_images/8a1a0f9de4ef832c52f09191ecc0214d8b6996df8d6613698f36e29e05832d63.png
deleted file mode 100644
index 5d055831b..000000000
Binary files a/_images/8a1a0f9de4ef832c52f09191ecc0214d8b6996df8d6613698f36e29e05832d63.png and /dev/null differ
diff --git a/_images/8b7e2067a571cd81babf409c1cd9ca7c041032ef0e13cfdd4b3ed56ebe978122.png b/_images/8b7e2067a571cd81babf409c1cd9ca7c041032ef0e13cfdd4b3ed56ebe978122.png
new file mode 100644
index 000000000..6343ea505
Binary files /dev/null and b/_images/8b7e2067a571cd81babf409c1cd9ca7c041032ef0e13cfdd4b3ed56ebe978122.png differ
diff --git a/_images/ee0219c6b1af7da294c7b8282f6483798df6b41b8798f2663e1a8b3c19d3e74f.png b/_images/ee0219c6b1af7da294c7b8282f6483798df6b41b8798f2663e1a8b3c19d3e74f.png
deleted file mode 100644
index 67ea915ef..000000000
Binary files a/_images/ee0219c6b1af7da294c7b8282f6483798df6b41b8798f2663e1a8b3c19d3e74f.png and /dev/null differ
diff --git a/_sources/tutorials/W3D1_TimeSeriesAndNaturalLanguageProcessing/student/W3D1_Tutorial2.ipynb b/_sources/tutorials/W3D1_TimeSeriesAndNaturalLanguageProcessing/student/W3D1_Tutorial2.ipynb
index 35ff008aa..88f4cdb3e 100644
--- a/_sources/tutorials/W3D1_TimeSeriesAndNaturalLanguageProcessing/student/W3D1_Tutorial2.ipynb
+++ b/_sources/tutorials/W3D1_TimeSeriesAndNaturalLanguageProcessing/student/W3D1_Tutorial2.ipynb
@@ -8,7 +8,7 @@
"id": "view-in-github"
},
"source": [
- " "
+ " "
]
},
{
@@ -23,7 +23,7 @@
"\n",
"**By Neuromatch Academy**\n",
"\n",
- "__Content creators:__ Lyle Ungar, Jordan Matelsky, Konrad Kording, Shaonan Wang\n",
+ "__Content creators:__ Lyle Ungar, Jordan Matelsky, Konrad Kording, Shaonan Wang, Alish Dipani\n",
"\n",
"__Content reviewers:__ Shaonan Wang, Weizhe Yuan, Dalia Nasr, Stephen Kiilu, Alish Dipani, Dora Zhiyu Yang, Adrita Das\n",
"\n",
@@ -378,7 +378,7 @@
"\n",
"In classical transformer systems, a core principle is encoding and decoding. We can encode an input sequence as a vector (that implicitly codes what we just read). And we can then take this vector and decode it, e.g., as a new sentence. So a sequence-to-sequence (e.g., sentence translation) system may read a sentence (made out of words embedded in a relevant space) and encode it as an overall vector. It then takes the resulting encoding of the sentence and decodes it into a translated sentence.\n",
"\n",
- "In modern transformer systems, such as GPT, all words are used in parallel. In that sense, the transformers generalize the encoding/decoding idea. Examples of this strategy include all the modern large language models (such as GPT)."
+ "In modern transformer systems, such as GPT, all words are used parallelly. In that sense, the transformers generalize the encoding/decoding idea. Examples of this strategy include all the modern large language models (such as GPT)."
]
},
{
@@ -601,7 +601,6 @@
},
"outputs": [],
"source": [
- "# Try playing with these hyperparameters!\n",
"VOCAB_SIZE = 12_000"
]
},
@@ -677,6 +676,15 @@
"])"
]
},
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "execution": {}
+ },
+ "source": [
+ "**Note:** In practice, it is not necessary to use pre-tokenizers, but we use it for demonstration purposes. For instance, \"2-3\" is not the same as \"23\", so removing punctuation or splitting up digits or punctuation is a bad idea! Moreover, the current tokenizer is powerful enough to deal with punctuation."
+ ]
+ },
{
"cell_type": "markdown",
"metadata": {
@@ -708,6 +716,26 @@
")"
]
},
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "execution": {}
+ },
+ "source": [
+ "### Special Tokens\n",
+ "\n",
+ "Tokenizers often have special tokens representing certain concepts such as:\n",
+ "* [PAD]: Added to the end of shorter input sequences to ensure equal input length for the whole batch\n",
+ "* [START]: Start of the sequence\n",
+ "* [END]: End of the sequence\n",
+ "* [UNK]: Unknown characters not present in the vocabulary\n",
+ "* [BOS]: Beginning of sentence\n",
+ "* [EOS]: End of sentence\n",
+ "* [SEP]: Separation between two sentences in a sequence\n",
+ "* [CLS]: Token used for classification tasks to represent the whole sequence\n",
+ "* [MASK]: Used in pre-training phase for masked language modeling tasks in models like BERT"
+ ]
+ },
{
"cell_type": "markdown",
"metadata": {
@@ -794,50 +822,7 @@
"execution": {}
},
"source": [
- "### Think 2.1! Is it a good idea to do pre_tokenizers?"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "colab_type": "text",
- "execution": {}
- },
- "source": [
- "[*Click for solution*](https://github.com/NeuromatchAcademy/course-content-dl/tree/main/tutorials/W3D1_TimeSeriesAndNaturalLanguageProcessing/solutions/W3D1_Tutorial2_Solution_802b4f3d.py)\n",
- "\n"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "#### Submit your feedback\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "cellView": "form",
- "execution": {},
- "tags": [
- "hide-input"
- ]
- },
- "outputs": [],
- "source": [
- "# @title Submit your feedback\n",
- "content_review(f\"{feedback_prefix}_Is_it_a_good_idea_to_do_pre_tokenizers_Discussion\")"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "execution": {}
- },
- "source": [
- "### Think 2.2! Tokenizer good practices\n",
+ "### Think 2.1! Tokenizer good practices\n",
"\n",
"We established that the tokenizer is a better move than the One-Hot-Encoder because it can handle out-of-vocabulary words. But what if we just made a one-hot encoding where the vocabulary is all possible two-character combinations? Would there still be an advantage to the tokenizer?\n",
"\n",
@@ -884,7 +869,7 @@
"execution": {}
},
"source": [
- "### Think 2.3: Chinese and English tokenizer\n",
+ "### Think 2.2: Chinese and English tokenizer\n",
"\n",
"Let's think about a language like Chinese, where words are each composed of a relatively fewer number of characters compared to English (`hungry` is six unicode characters, but `饿` is one unicode character), but there are many more unique Chinese characters than there are letters in the English alphabet.\n",
"\n",
@@ -1487,7 +1472,7 @@
"execution": {}
},
"source": [
- "### Coding Exercise 4.1: Implement the code to fine-tune the model\n",
+ "### Implement the code to fine-tune the model\n",
"\n",
"Here are the big pieces of what we do below:\n",
"\n",
@@ -1538,7 +1523,15 @@
" tokenizer=tokenizer, mlm=False,\n",
")\n",
"\n",
- "trainer = ..."
+ "# Trainer:\n",
+ "trainer = Trainer(\n",
+ " model=model,\n",
+ " args=training_args,\n",
+ " train_dataset=encoded_dataset,\n",
+ " tokenizer=tokenizer,\n",
+ " compute_metrics=compute_metrics,\n",
+ " data_collator=data_collator,\n",
+ ")"
]
},
{
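The `Trainer` above references a `training_args` object defined elsewhere in the notebook; for orientation, a minimal `TrainingArguments` sketch (values assumed, not the tutorial's exact settings) looks like:

```python
# Minimal sketch of TrainingArguments (all values assumed for illustration).
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./results",         # where checkpoints and logs are written
    num_train_epochs=1,             # a single pass for a quick fine-tune
    per_device_train_batch_size=8,
    logging_steps=50,
)
```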
@@ -1549,18 +1542,19 @@
},
"outputs": [],
"source": [
- "trainer = ..."
+ "# Run the actual training:\n",
+ "trainer.train()"
]
},
{
"cell_type": "markdown",
"metadata": {
- "colab_type": "text",
"execution": {}
},
"source": [
- "[*Click for solution*](https://github.com/NeuromatchAcademy/course-content-dl/tree/main/tutorials/W3D1_TimeSeriesAndNaturalLanguageProcessing/solutions/W3D1_Tutorial2_Solution_b453433d.py)\n",
- "\n"
+ "### Coding Exercise 4.1: Implement the code to generate text after fine-tuning.\n",
+ "\n",
+ "To generate text, we provide input tokens to the model, let it generate the next token and append it into the input tokens. Now, keep repeating this process until you reach the desired output length."
]
},
{
@@ -1571,17 +1565,64 @@
},
"outputs": [],
"source": [
- "# Run the actual training:\n",
- "trainer.train()"
+ "# Number of tokens to generate\n",
+ "num_tokens = 100\n",
+ "\n",
+ "# Move the model to the CPU for inference\n",
+ "model.to(\"cpu\")\n",
+ "\n",
+ "# Print input prompt\n",
+ "print(f'Input prompt: \\n{input_prompt}')\n",
+ "\n",
+ "#################################################\n",
+ "# Implement a the correct tokens and outputs\n",
+ "raise NotImplementedError(\"Text Generation\")\n",
+ "#################################################\n",
+ "\n",
+ "# Encode the input prompt\n",
+ "# https://huggingface.co/docs/transformers/en/main_classes/tokenizer\n",
+ "input_tokens = ...\n",
+ "\n",
+ "# Turn off storing gradients\n",
+ "with torch.no_grad():\n",
+ " # Keep iterating until num_tokens are generated\n",
+ " for tkn_idx in tqdm(range(num_tokens)):\n",
+ " # Forward pass through the model\n",
+ " # The model expects the tensor to be of Long or Int dtype\n",
+ " output = ...\n",
+ " # Get output logits\n",
+ " logits = output.logits[-1, :]\n",
+ " # Convert into probabilities\n",
+ " probs = nn.functional.softmax(logits, dim=-1)\n",
+ " # Get the index of top token\n",
+ " top_token = ...\n",
+ " # Append the token into the input sequence\n",
+ " input_tokens.append(top_token)\n",
+ "\n",
+ "# Decode and print the generated text\n",
+ "# https://huggingface.co/docs/transformers/en/main_classes/tokenizer\n",
+ "decoded_text = ...\n",
+ "print(f'Generated text: \\n{decoded_text}')"
]
},
{
"cell_type": "markdown",
"metadata": {
+ "colab_type": "text",
"execution": {}
},
"source": [
- "Finally, we will try our model on the same code snippet to see how it performs after fine-tuning:"
+ "[*Click for solution*](https://github.com/NeuromatchAcademy/course-content-dl/tree/main/tutorials/W3D1_TimeSeriesAndNaturalLanguageProcessing/solutions/W3D1_Tutorial2_Solution_0f765585.py)\n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "execution": {}
+ },
+ "source": [
+ "We can also directly generate text using the generation_pipeline:"
]
},
{
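A minimal sketch of that pipeline call (argument values are assumptions; `model`, `tokenizer`, and `input_prompt` come from the cells above):

```python
# Sketch: greedy text generation via the high-level pipeline API.
from transformers import pipeline

generation_pipeline = pipeline(
    "text-generation", model=model, tokenizer=tokenizer, device=-1  # -1 = CPU
)
out = generation_pipeline(input_prompt, max_new_tokens=100, do_sample=False)
print(out[0]["generated_text"])
```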
@@ -1801,9 +1842,7 @@
"source": [
"## Play around with LLMs\n",
"\n",
- "1. Try using LLMs' API to do tasks, such as utilizing the GPT-2 API to extend text from a provided context. To achieve this, ensure you have a HuggingFace account and secure an API token.\n",
- "\n",
- "\n"
+ "1. Try using LLMs' API to do tasks, such as utilizing the GPT-2 API to extend text from a provided context. To achieve this, ensure you have a HuggingFace account and secure an API token."
]
},
{
@@ -1817,10 +1856,10 @@
"import requests\n",
"\n",
"def query(payload, model_id, api_token):\n",
- " headers = {\"Authorization\": f\"Bearer {api_token}\"}\n",
- " API_URL = f\"https://api-inference.huggingface.co/models/{model_id}\"\n",
- " response = requests.post(API_URL, headers=headers, json=payload)\n",
- " return response.json()\n",
+ " headers = {\"Authorization\": f\"Bearer {api_token}\"}\n",
+ " API_URL = f\"https://api-inference.huggingface.co/models/{model_id}\"\n",
+ " response = requests.post(API_URL, headers=headers, json=payload)\n",
+ " return response.json()\\\n",
"\n",
"model_id = \"gpt2\"\n",
"api_token = \"hf_****\" # get yours at hf.co/settings/tokens\n",
diff --git a/projects/ComputerVision/data_augmentation.html b/projects/ComputerVision/data_augmentation.html
index 414908acf..1422c1683 100644
--- a/projects/ComputerVision/data_augmentation.html
+++ b/projects/ComputerVision/data_augmentation.html
@@ -1763,8 +1763,8 @@
Cutout
@@ -2424,7 +2424,7 @@ Submit your feedback
@@ -1489,7 +1489,7 @@ Section 1: Welcome to Neuromatch Deep learning course
This will be an intensive 3-week adventure. We will all learn Deep Learning (DL) in a group. Groups need standards. Read our
Code of Conduct.
@@ -1510,14 +1510,14 @@ Submit your feedback
@@ -1536,7 +1536,7 @@ Submit your feedback
@@ -1549,8 +1549,8 @@ Coding Exercise 2: Implement a general-purpose MLP in Pytorch (use Leaky ReLU in all hidden layers)
Leaky ReLU is described by the following mathematical formula:
-
-
(35)\[\begin{align}
+
+(35)\[\begin{align}
\text{LeakyReLU}(x) &= \text{max}(0,x) + \text{negative_slope} \cdot \text{min}(0, x) \\
&=
\left\{
\begin{array}{ll}
x, & \text{if } x \geq 0 \\
\text{negative_slope} \cdot x, & \text{otherwise}
\end{array}
\right.
\end{align}\]
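A quick sketch checking this formula against PyTorch's built-in module (sample values are arbitrary):

```python
# Verify: max(0, x) + negative_slope * min(0, x) matches nn.LeakyReLU.
import torch
import torch.nn as nn

x = torch.tensor([-2.0, -0.5, 0.0, 1.5])
negative_slope = 0.01
manual = torch.clamp(x, min=0) + negative_slope * torch.clamp(x, max=0)
print(torch.allclose(manual, nn.LeakyReLU(negative_slope)(x)))  # True
```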
@@ -1664,7 +1664,7 @@
Submit your feedback
+
@@ -1674,7 +1674,7 @@ Section 2.1: Classification with MLPs
@@ -1693,7 +1693,7 @@ Submit your feedback
@@ -1802,8 +1802,8 @@ Submit your feedback
Before we can start optimizing these loss functions, we need a dataset!
Let’s turn this fancy-looking equation into a classification dataset
-
-
(38)\[\begin{equation}
+
+
(38)\[\begin{equation}
\begin{array}{c}
X_{k}(t)=t\left(\begin{array}{c}
\sin \left[\frac{2 \pi}{K}\left(2 t+k-1\right)\right]+\mathcal{N}\left(0, \sigma\right) \\
\cos \left[\frac{2 \pi}{K}\left(2 t+k-1\right)\right]+\mathcal{N}\left(0, \sigma\right)
\end{array}\right), \quad 0 \leq t \leq 1, \quad k=1, \ldots, K
\end{array}
\end{equation}\]
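A rough NumPy sketch of sampling this dataset (function name and parameter defaults are our own):

```python
# Sketch: sample K spiral arms from the equation above (sigma and N assumed).
import numpy as np

def make_spirals(K=4, sigma=0.16, N=1000, seed=0):
    rng = np.random.default_rng(seed)
    t = np.linspace(0, 1, N)
    X, y = [], []
    for k in range(1, K + 1):
        angle = 2 * np.pi / K * (2 * t + k - 1)
        x1 = t * (np.sin(angle) + rng.normal(0, sigma, N))
        x2 = t * (np.cos(angle) + rng.normal(0, sigma, N))
        X.append(np.stack([x1, x2], axis=1))
        y.append(np.full(N, k - 1))
    return np.concatenate(X), np.concatenate(y)

X, y = make_spirals()
print(X.shape, y.shape)  # (4000, 2) (4000,)
```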
@@ -1874,7 +1874,7 @@
Section 2.3: Training and Evaluation
@@ -1893,7 +1893,7 @@ Submit your feedback
+
@@ -1985,7 +1985,7 @@ Submit your feedback
@@ -2279,7 +2279,7 @@ Bonus: Neuron Physiology and Motivation to Deep Learning
@@ -2298,23 +2298,23 @@ Submit your feedback
Leaky Integrate-and-fire (LIF) neuronal model
The basic idea of the LIF neuron was proposed in 1907 by Louis Édouard Lapicque, long before we understood the electrophysiology of a neuron (see a translation of Lapicque’s paper). More details of the model can be found in the book Theoretical Neuroscience by Peter Dayan and Laurence F. Abbott.
The model dynamics are defined by the following formula,
-
-
(39)\[\begin{equation}
+
+(39)\[\begin{equation}
\frac{d V_m}{d t}=\left\{\begin{array}{cc}
\frac{1}{C_m}\left(-\frac{V_m}{R_m} + I \right) & t>t_{rest} \\
0 & \text { otherwise }
\end{array}\right.
\end{equation}\]
Note that \(V_{m}\), \(C_{m}\), and \(R_{m}\) are the membrane voltage, capacitance, and resistance of the neuron, respectively, so the \(-\frac{V_{m}}{R_{m}}\) term denotes the leakage current. When \(I\) is sufficiently strong such that \(V_{m}\) reaches a certain threshold value \(V_{\rm th}\), it momentarily spikes and then \(V_{m}\) is reset to \(V_{\rm reset}< V_{\rm th}\), and voltage stays at \(V_{\rm reset}\) for \(\tau_{\rm ref}\) ms, mimicking the refractoriness of the neuron during an action potential (note that \(V_{\rm reset}\) and \(\tau_{\rm ref}\) are assumed to be zero in the lecture):
-
-
(40)\[\begin{eqnarray}
+
+(40)\[\begin{eqnarray}
V_{m}(t)=V_{\rm reset} \text{ for } t\in(t_{\text{sp}}, t_{\text{sp}} + \tau_{\text{ref}}]
\end{eqnarray}\]
where \(t_{\rm sp}\) is the spike time when \(V_{m}(t)\) just exceeded \(V_{\rm th}\).
@@ -2331,8 +2331,8 @@
Leaky Integrate-and-fire (LIF) neuronal model
The cell below defines a function for the LIF neuron model, with its arguments described.
Note that we use Euler’s method to numerically approximate the derivative. Hence we use the following implementation of the model dynamics,
-
-
(41)\[\begin{equation}
+
+(41)\[\begin{equation}
V_m^{[n]}=\left\{\begin{array}{cc}
V_m^{[n-1]} + \frac{1}{C_m}\left(-\frac{V_m^{[n-1]}}{R_m}+I \right) \Delta t & t>t_{rest} \\
0 & \text { otherwise }
\end{array}\right.
\end{equation}\]
@@ -2471,7 +2471,7 @@
-
+
@@ -2496,7 +2496,7 @@
Submit your feedback
+
diff --git a/tutorials/W1D3_MultiLayerPerceptrons/student/W1D3_Tutorial2.html b/tutorials/W1D3_MultiLayerPerceptrons/student/W1D3_Tutorial2.html
index a638278d1..47d0253ab 100644
--- a/tutorials/W1D3_MultiLayerPerceptrons/student/W1D3_Tutorial2.html
+++ b/tutorials/W1D3_MultiLayerPerceptrons/student/W1D3_Tutorial2.html
@@ -50,7 +50,7 @@
-
+
@@ -989,7 +989,7 @@ Tutorial Objectives