From 0d282f0645c2856654467bd0a7c1055985105aff Mon Sep 17 00:00:00 2001
From: Oliver Zhang
Date: Wed, 11 Dec 2024 17:04:59 -0800
Subject: [PATCH] Fix ML workshop and add Spotify to header/footer. (#529)

* Fix ML workshop and add Spotify to header/footer.

* Fix.

---------

Co-authored-by: Oliver Zhang
---
 .../ml-machine-learning/02-simple-linear-regression.md | 4 ++--
 .../ml-machine-learning/03-confidence-intervals.md | 9 ++++-----
 content/english/ml-machine-learning/04-model-fit.md | 2 +-
 .../english/ml-machine-learning/05-making-predictions.md | 4 ++--
 themes/docdock/layouts/partials/custom-footer.html | 4 ++++
 themes/docdock/layouts/partials/header.html | 6 ++++++
 6 files changed, 19 insertions(+), 10 deletions(-)

diff --git a/content/english/ml-machine-learning/02-simple-linear-regression.md b/content/english/ml-machine-learning/02-simple-linear-regression.md
index 53f31141f..cc6d8ff6f 100644
--- a/content/english/ml-machine-learning/02-simple-linear-regression.md
+++ b/content/english/ml-machine-learning/02-simple-linear-regression.md
@@ -150,7 +150,7 @@ In our linear equation, let's add that error with the greek letter **ε**.
 Scikit-learn is a machine-learning library that will help us analyze and use the built-in simple linear regression model to predict data. In the Replit window below, you can run the program `02-e1.py` which will use a data set of employees alongside their years of experience. The program will plot a sample of 30 employees out of the employees within the company:
-
+Launch Replit
 # Exercise 2: Finding the Slope and Intercept
@@ -187,7 +187,7 @@ model = linear_model.LinearRegression()
 model.fit(x,y)
 ```
-
+Launch Replit
 As you can see, the code has returned the value for the **coefficient** and **intercept** of our linear equation. Let's update our linear equation with this.
diff --git a/content/english/ml-machine-learning/03-confidence-intervals.md b/content/english/ml-machine-learning/03-confidence-intervals.md
index d91e91fd7..26f69aba1 100644
--- a/content/english/ml-machine-learning/03-confidence-intervals.md
+++ b/content/english/ml-machine-learning/03-confidence-intervals.md
@@ -37,7 +37,7 @@ We need to run the linear model with more random samples.
 Suppose you were able to find over 10,000 records for employees within your company 😯! This is amazing since, in the world of machine learning, the more data you have the better the results you will obtain. Now let's take 30 random records from that 10,000+ dataset and check if the intercept and coefficient values differ from the original sample we had. On the Replit window below run the code as many times as you want but notice how the **intercept** and **coefficient** values are somewhat similar to the ones we calculated before.
-
+Launch Replit
 Why is this happening? Why are the **intercept** and **coefficient** values different every time? Why is the original sample line (i.e: green line) very close to the blue line (i.e: new sample line).
@@ -64,8 +64,7 @@ We can use the [StatsModels](https://www.statsmodels.org/stable/index.html) libr
 |:--:|
 |Summary of Statsmodel run|
-
-
+Launch Replit
 Within the table lets focus on the standard error which is the value labeled **stderr**. In this case, is **409.40**. This means is that, for any random sample set, the **coefficient** or **slope** of our line will vary by 409.40 or, in other words:
@@ -93,7 +92,7 @@ As you can see, it is very unlikely to see a very short person or a very tall pe
 If you run the Replit below you will see how the generated histogram resembles the bell curve. The program creates a [histogram](https://corporatefinanceinstitute.com/resources/excel/histogram/) which shows the amount of times a value shows up in our data set. Meaning, several employees having the same salary.
-
+Launch Replit
 When this happens we can use the standard error and the following equation to say: "We are 95% confident that the value of the coefficient will be in this range". But what is that range? This range is our **confidence interval**.
@@ -130,4 +129,4 @@ What we are saying with the **coefficient range** above is:
 The code below will take a sample of 100 random employees and create histograms to show you how they resemble a bell curve. As you can see the values never lie outside of the coefficient range.
-
\ No newline at end of file
+Launch Replit
\ No newline at end of file
diff --git a/content/english/ml-machine-learning/04-model-fit.md b/content/english/ml-machine-learning/04-model-fit.md
index f8f77d16a..bd73ab35e 100644
--- a/content/english/ml-machine-learning/04-model-fit.md
+++ b/content/english/ml-machine-learning/04-model-fit.md
@@ -29,7 +29,7 @@ There are many ways to find this but in the world of machine learning and statis
 The `Experience_vs_Salary-More_Data` file has over 10,000+ entries where you can see the salary and years of experience of employees of the company you work for. The Replit code below will take the initial sample of 30 employees and find the standard error and R2.
-
+Launch Replit
 As you can see, the value of R2 is 0.973. Now if anyone asks us if there is any relation in our data, we can say that "we are 97.3% confident that the years of experience of an employee is related to the salary they have".
diff --git a/content/english/ml-machine-learning/05-making-predictions.md b/content/english/ml-machine-learning/05-making-predictions.md
index 807feeeb6..fad9e3baa 100644
--- a/content/english/ml-machine-learning/05-making-predictions.md
+++ b/content/english/ml-machine-learning/05-making-predictions.md
@@ -20,10 +20,10 @@ As the number of employees increases so will the dataset and the value of will R
 In the Replit below, you can see how the code creates a "training dataset" and a "testing dataset" by splitting the data from the 10,000+ record file and running predictions for both data sets.
-
+Launch Replit
 As you can see, the prediction line generated in both graphs is very similar for both the training and test datasets. You can also see that the R2 for both sets is almost identical or sometimes identical.
 You can now use the code below and change the `experience` variable to whatever you want, the plot will show the predicted salary based on the experience you add.
-
\ No newline at end of file
+Launch Replit
\ No newline at end of file
diff --git a/themes/docdock/layouts/partials/custom-footer.html b/themes/docdock/layouts/partials/custom-footer.html
index 38c72de9f..c7a497d93 100644
--- a/themes/docdock/layouts/partials/custom-footer.html
+++ b/themes/docdock/layouts/partials/custom-footer.html
@@ -25,6 +25,10 @@ class="footer-pics">
+
+
+
diff --git a/themes/docdock/layouts/partials/header.html b/themes/docdock/layouts/partials/header.html
index 76be1d488..52599f500 100644
--- a/themes/docdock/layouts/partials/header.html
+++ b/themes/docdock/layouts/partials/header.html
@@ -75,6 +75,12 @@
+
+
+
+
+