Fix ML workshop and add Spotify to header/footer. #529

Merged: 2 commits, merged on Dec 12, 2024
@@ -150,7 +150,7 @@ In our linear equation, let's add that error with the Greek letter **ε**.

Scikit-learn is a machine-learning library that we will use to analyze our data and make predictions with its built-in simple linear regression model. In the Replit window below, you can run the program `02-e1.py`, which uses a dataset of employees and their years of experience. The program plots a sample of 30 of the company's employees:

<iframe height="500px" width="100%" src="https://replit.com/@nuevofoundation/LinearRegression-ConsoleApp#src/02-e1.py" scrolling="no" frameborder="no" allowtransparency="true" allowfullscreen="true" sandbox="allow-forms allow-pointer-lock allow-popups allow-same-origin allow-scripts allow-modals"></iframe>
<a class="my-2 mx-4 btn btn-info" href="https://replit.com/@nuevofoundation/LinearRegression-ConsoleApp#src/02-e1.py" target="_blank">Launch Replit</a>
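To picture the sampling step on its own, here is a minimal sketch in plain Python (standard library only, no scikit-learn or plotting). It assumes a hypothetical company of 1,000 employees whose salaries roughly follow a straight line in years of experience; the `make_population` helper, the line 25,000 + 9,000 × years, and the noise level are illustrative assumptions, not the workshop's dataset.

```python
import random

def make_population(n, seed=0):
    """Build a hypothetical list of (years_of_experience, salary) pairs.

    Assumption: salary is about 25,000 + 9,000 * years, plus random noise.
    """
    rng = random.Random(seed)
    population = []
    for _ in range(n):
        years = round(rng.uniform(0, 15), 1)
        noise = rng.gauss(0, 5_000)
        population.append((years, 25_000 + 9_000 * years + noise))
    return population

population = make_population(1_000)

# Draw 30 distinct employees at random, similar to the sample the Replit plots.
sample = random.Random(42).sample(population, 30)

for years, salary in sample[:5]:
    print(f"{years:5.1f} years -> ${salary:,.0f}")
print(f"... {len(sample)} employees sampled out of {len(population)}")
```

`random.sample` picks without replacement, so no employee appears twice in the sample.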

# Exercise 2: Finding the Slope and Intercept

@@ -187,7 +187,7 @@ model = linear_model.LinearRegression()
model.fit(x,y)
```

<iframe height="500px" width="100%" src="https://replit.com/@nuevofoundation/LinearRegression-ConsoleApp#src/02-e2.py" scrolling="no" frameborder="no" allowtransparency="true" allowfullscreen="true" sandbox="allow-forms allow-pointer-lock allow-popups allow-same-origin allow-scripts allow-modals"></iframe>
<a class="my-2 mx-4 btn btn-info" href="https://replit.com/@nuevofoundation/LinearRegression-ConsoleApp#src/02-e2.py" target="_blank">Launch Replit</a>

As you can see, the code returns the values of the **coefficient** and the **intercept** of our linear equation. Let's update our linear equation with these values.
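For a single input variable, least squares has a closed-form answer: the coefficient (slope) is the sum of co-deviations of x and y divided by the sum of squared deviations of x, and the intercept makes the line pass through the point of averages. The sketch below computes both in plain Python on a small made-up sample; the data are hypothetical, and `fit_line` is an illustrative helper, not part of scikit-learn.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one input: returns (intercept, coefficient)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Slope = sum of co-deviations / sum of squared x deviations.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    coefficient = sxy / sxx
    # The intercept makes the line pass through (x_mean, y_mean).
    intercept = y_mean - coefficient * x_mean
    return intercept, coefficient

# Hypothetical sample: years of experience and salary.
years = [1, 2, 3, 4, 5]
salary = [35_000, 43_000, 53_000, 62_000, 70_000]

intercept, coefficient = fit_line(years, salary)
print(f"salary = {intercept:.2f} + {coefficient:.2f} * years")
# prints: salary = 25900.00 + 8900.00 * years
```

On the same data, `LinearRegression` should report the same two numbers in `model.intercept_` and `model.coef_`.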

@@ -37,7 +37,7 @@ We need to run the linear model with more random samples.

Suppose you were able to find over 10,000 records for employees within your company 😯! This is great news since, in machine learning, more data generally leads to better results. Now let's take 30 random records from that 10,000+ dataset and check whether the intercept and coefficient values differ from the original sample we had. In the Replit window below, run the code as many times as you want, and notice how the **intercept** and **coefficient** values change a little each time but stay close to the ones we calculated before.

<iframe height="500px" width="100%" src="https://replit.com/@nuevofoundation/LinearRegression-ConsoleApp#src/03-e1.py" scrolling="no" frameborder="no" allowtransparency="true" allowfullscreen="true" sandbox="allow-forms allow-pointer-lock allow-popups allow-same-origin allow-scripts allow-modals"></iframe>
<a class="my-2 mx-4 btn btn-info" href="https://replit.com/@nuevofoundation/LinearRegression-ConsoleApp#src/03-e1.py" target="_blank">Launch Replit</a>
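The run-to-run variation can be reproduced with a small simulation. The sketch below assumes a hypothetical population of 10,000 employees built from the line 25,000 + 9,000 × years plus noise (all assumed values), draws five random samples of 30, and fits a line to each with plain least squares. The estimates differ from sample to sample but stay close to one another.

```python
import random

def fit_line(xs, ys):
    """Least-squares (intercept, coefficient) for one input variable."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    coefficient = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sxx
    return y_mean - coefficient * x_mean, coefficient

rng = random.Random(7)

# Hypothetical population of 10,000 employees on an assumed line plus noise.
population = []
for _ in range(10_000):
    years = rng.uniform(0, 15)
    population.append((years, 25_000 + 9_000 * years + rng.gauss(0, 5_000)))

# Repeat the "take 30 random records" step several times.
estimates = []
for trial in range(1, 6):
    xs, ys = zip(*rng.sample(population, 30))
    intercept, coefficient = fit_line(xs, ys)
    estimates.append((intercept, coefficient))
    print(f"sample {trial}: intercept = {intercept:,.0f}, coefficient = {coefficient:,.0f}")
```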

Why is this happening? Why are the **intercept** and **coefficient** values different every time? And why is the original sample line (the green line) so close to the new sample line (the blue line)?

@@ -64,8 +64,7 @@ We can use the [StatsModels](https://www.statsmodels.org/stable/index.html) libr
|:--:|
|Summary of StatsModels run|


<iframe height="500px" width="100%" src="https://replit.com/@nuevofoundation/LinearRegression-ConsoleApp#src/03-e2.py" scrolling="no" frameborder="no" allowtransparency="true" allowfullscreen="true" sandbox="allow-forms allow-pointer-lock allow-popups allow-same-origin allow-scripts allow-modals"></iframe>
<a class="my-2 mx-4 btn btn-info" href="https://replit.com/@nuevofoundation/LinearRegression-ConsoleApp#src/03-e2.py" target="_blank">Launch Replit</a>
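For a simple linear regression, the standard error that the summary reports for the slope can be computed by hand: estimate the noise variance from the residuals (dividing by n - 2, because two parameters were fitted), divide it by the sum of squared deviations of x, and take the square root. The sketch below applies that formula to a small hypothetical sample; on the workshop's real data, the StatsModels summary remains the number to use.

```python
import math

def slope_standard_error(xs, ys):
    """Return (slope, standard error of the slope) for a simple linear regression."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sxx
    intercept = y_mean - slope * x_mean
    # Residuals: vertical distance from each point to the fitted line.
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    # Noise variance estimate; n - 2 because we fitted an intercept and a slope.
    sigma_squared = sum(r ** 2 for r in residuals) / (n - 2)
    return slope, math.sqrt(sigma_squared / sxx)

years = [1, 2, 3, 4, 5]                            # hypothetical sample
salary = [35_000, 43_000, 53_000, 62_000, 70_000]
slope, std_err = slope_standard_error(years, salary)
print(f"coefficient = {slope:.2f}, std err = {std_err:.2f}")
```

More data points (a larger `sxx`) or less scatter around the line (smaller residuals) both shrink the standard error.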

Within the table, let's focus on the standard error, which is the value labeled **std err**. In this case, it is **409.40**. This means that, from one random sample to the next, the **coefficient** (or **slope**) of our line will typically vary by about 409.40, or, in other words:

@@ -93,7 +92,7 @@ As you can see, it is very unlikely to see a very short person or a very tall pe

If you run the Replit below, you will see how the generated histogram resembles the bell curve. The program creates a [histogram](https://corporatefinanceinstitute.com/resources/excel/histogram/), which shows how many times each value (or range of values) appears in our dataset, for example, how many employees have the same salary.

<iframe height="500px" width="100%" src="https://replit.com/@nuevofoundation/LinearRegression-ConsoleApp#src/03-e3.py" scrolling="no" frameborder="no" allowtransparency="true" allowfullscreen="true" sandbox="allow-forms allow-pointer-lock allow-popups allow-same-origin allow-scripts allow-modals"></iframe>
<a class="my-2 mx-4 btn btn-info" href="https://replit.com/@nuevofoundation/LinearRegression-ConsoleApp#src/03-e3.py" target="_blank">Launch Replit</a>
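A histogram groups values into equal-width bins and counts how many values fall into each bin. The sketch below does this by hand for 1,000 hypothetical, normally distributed salaries (the mean of 60,000 and standard deviation of 8,000 are assumptions) and prints each count as a bar of `#` characters, so the bell shape is visible without a plotting library.

```python
import random

def histogram(values, bins):
    """Count how many values fall into each of `bins` equal-width intervals."""
    low, high = min(values), max(values)
    width = (high - low) / bins
    counts = [0] * bins
    for v in values:
        # The maximum value would land in bin `bins`; clamp it into the last bin.
        index = min(int((v - low) / width), bins - 1)
        counts[index] += 1
    return low, width, counts

rng = random.Random(3)
salaries = [rng.gauss(60_000, 8_000) for _ in range(1_000)]

low, width, counts = histogram(salaries, bins=10)
for i, count in enumerate(counts):
    start = low + i * width
    # One '#' per 5 salaries in the bin.
    print(f"{start:>9,.0f} | {'#' * (count // 5)}")
```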

When the values follow a bell curve like this, we can use the standard error and the following equation to say: "We are 95% confident that the value of the coefficient will be in this range". But what is that range? This range is our **confidence interval**.
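A common way to build a 95% range, assuming the coefficient estimates follow a bell curve, is estimate ± 1.96 × standard error, since about 95% of a normal distribution lies within 1.96 standard deviations of its center. The sketch below applies that rule to the standard error of 409.40 from the table; the coefficient is a hypothetical placeholder, so substitute the value from your own run. StatsModels uses a t-distribution multiplier instead of 1.96, which gives a slightly wider interval for a small sample such as 30 employees.

```python
def confidence_interval(estimate, standard_error, z=1.96):
    """Return (low, high) for estimate +/- z * standard_error.

    z = 1.96 is the normal-approximation multiplier for a 95% interval.
    """
    margin = z * standard_error
    return estimate - margin, estimate + margin

coefficient = 9_000.0     # hypothetical coefficient; replace with yours
standard_error = 409.40   # std err reported in the summary table

low, high = confidence_interval(coefficient, standard_error)
print(f"95% confidence interval: {low:,.2f} to {high:,.2f}")
# prints: 95% confidence interval: 8,197.58 to 9,802.42
```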

@@ -130,4 +129,4 @@ What we are saying with the **coefficient range** above is:

The code below takes a sample of 100 random employees and creates histograms that show how the values resemble a bell curve. As you can see, the values only rarely fall outside the coefficient range.

<iframe height="500px" width="100%" src="https://replit.com/@nuevofoundation/LinearRegression-ConsoleApp#src/03-e3.py" scrolling="no" frameborder="no" allowtransparency="true" allowfullscreen="true" sandbox="allow-forms allow-pointer-lock allow-popups allow-same-origin allow-scripts allow-modals"></iframe>
<a class="my-2 mx-4 btn btn-info" href="https://replit.com/@nuevofoundation/LinearRegression-ConsoleApp#src/03-e3.py" target="_blank">Launch Replit</a>
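To check how often a 95% range actually captures the true coefficient, here is a simulation sketch. It assumes a hypothetical straight-line model (intercept 25,000, slope 9,000, noise standard deviation 5,000) and draws each sample of 30 employees directly from that model instead of from a file. For every sample, it builds slope ± 1.96 × std err and counts how often the true slope lands inside. Expect roughly 95%, a little less because 1.96 is slightly too small for only 30 points.

```python
import math
import random

TRUE_INTERCEPT = 25_000  # assumed population line
TRUE_SLOPE = 9_000
NOISE_SD = 5_000

def draw_sample(rng, n=30):
    """Simulate n hypothetical employees from the assumed line plus noise."""
    xs = [rng.uniform(0, 15) for _ in range(n)]
    ys = [TRUE_INTERCEPT + TRUE_SLOPE * x + rng.gauss(0, NOISE_SD) for x in xs]
    return xs, ys

def slope_and_se(xs, ys):
    """Least-squares slope and its standard error."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sxx
    intercept = y_mean - slope * x_mean
    ss_res = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    return slope, math.sqrt(ss_res / (n - 2) / sxx)

rng = random.Random(11)
trials = 200
inside = 0
for _ in range(trials):
    slope, se = slope_and_se(*draw_sample(rng))
    # Does this sample's range (slope +/- 1.96 * se) contain the true slope?
    if abs(slope - TRUE_SLOPE) <= 1.96 * se:
        inside += 1

coverage = inside / trials
print(f"{inside} of {trials} ranges contained the true slope ({coverage:.0%})")
```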
2 changes: 1 addition & 1 deletion content/english/ml-machine-learning/04-model-fit.md
@@ -29,7 +29,7 @@ There are many ways to find this but in the world of machine learning and statis

The `Experience_vs_Salary-More_Data` file has over 10,000 entries showing the salary and years of experience of the employees at the company you work for. The Replit code below takes the initial sample of 30 employees and finds the standard error and R<sup>2</sup>.

<iframe height="500px" width="100%" src="https://replit.com/@nuevofoundation/LinearRegression-ConsoleApp#src/04-e1.py" scrolling="no" frameborder="no" allowtransparency="true" allowfullscreen="true" sandbox="allow-forms allow-pointer-lock allow-popups allow-same-origin allow-scripts allow-modals"></iframe>
<a class="my-2 mx-4 btn btn-info" href="https://replit.com/@nuevofoundation/LinearRegression-ConsoleApp#src/04-e1.py" target="_blank">Launch Replit</a>

As you can see, the value of R<sup>2</sup> is 0.973. Now, if anyone asks whether there is a relationship in our data, we can say that "about 97.3% of the variation in salary in our sample is explained by a straight-line relationship with years of experience".
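R<sup>2</sup> compares the squared errors of the fitted line with the squared errors of always guessing the average salary: R<sup>2</sup> = 1 - SS_res / SS_tot. The sketch below fits a least-squares line and computes R<sup>2</sup> for a small hypothetical sample; the data are made up, and `r_squared` is an illustrative helper.

```python
def r_squared(xs, ys):
    """Fit a least-squares line, then return 1 - SS_res / SS_tot."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sxx
    intercept = y_mean - slope * x_mean
    # Squared errors left over after using the fitted line.
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    # Squared errors if we always guessed the average salary.
    ss_tot = sum((y - y_mean) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

years = [1, 2, 3, 4, 5]                            # hypothetical sample
salary = [35_000, 43_000, 53_000, 62_000, 70_000]
print(f"R^2 = {r_squared(years, salary):.3f}")  # prints: R^2 = 0.999
```

For a least-squares line with an intercept, R<sup>2</sup> = 1 means the line passes through every point, and R<sup>2</sup> = 0 means it does no better than the average.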

4 changes: 2 additions & 2 deletions content/english/ml-machine-learning/05-making-predictions.md
@@ -20,10 +20,10 @@ As the number of employees increases so will the dataset and the value of will R

In the Replit below, you can see how the code creates a "training dataset" and a "testing dataset" by splitting the data from the 10,000+ record file, and then runs predictions on both datasets.

<iframe height="500px" width="100%" src="https://replit.com/@nuevofoundation/LinearRegression-ConsoleApp#src/05-e1.py" scrolling="no" frameborder="no" allowtransparency="true" allowfullscreen="true" sandbox="allow-forms allow-pointer-lock allow-popups allow-same-origin allow-scripts allow-modals"></iframe>
<a class="my-2 mx-4 btn btn-info" href="https://replit.com/@nuevofoundation/LinearRegression-ConsoleApp#src/05-e1.py" target="_blank">Launch Replit</a>
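Here is a minimal sketch of the same idea in plain Python. It assumes a hypothetical 10,000-employee dataset (line 25,000 + 9,000 × years plus noise) and an 80/20 split, both illustrative choices that may differ from the Replit. It shuffles the records, fits a line on the training part only, and scores that line with R<sup>2</sup> on both parts; similar scores suggest the line generalizes to employees it has not seen.

```python
import random

def fit_line(xs, ys):
    """Least-squares (intercept, slope) for one input variable."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sxx
    return y_mean - slope * x_mean, slope

def r_squared(xs, ys, intercept, slope):
    """R^2 of a given line on the given data."""
    y_mean = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_mean) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

rng = random.Random(5)
data = []
for _ in range(10_000):
    years = rng.uniform(0, 15)
    data.append((years, 25_000 + 9_000 * years + rng.gauss(0, 5_000)))

# Shuffle, then keep 80% for training and 20% for testing.
rng.shuffle(data)
cut = int(len(data) * 0.8)
train, test = data[:cut], data[cut:]

train_x, train_y = zip(*train)
test_x, test_y = zip(*test)
intercept, slope = fit_line(train_x, train_y)  # fitted on training data only

r2_train = r_squared(train_x, train_y, intercept, slope)
r2_test = r_squared(test_x, test_y, intercept, slope)
print(f"R^2 train = {r2_train:.3f}, R^2 test = {r2_test:.3f}")
```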

As you can see, the prediction lines in the two graphs, one for the training dataset and one for the test dataset, are very similar. The R<sup>2</sup> values of the two sets are also almost identical, and sometimes exactly the same.

You can now use the code below: change the `experience` variable to any value you like, and the plot will show the predicted salary for that amount of experience.

<iframe height="500px" width="100%" src="https://replit.com/@nuevofoundation/LinearRegression-ConsoleApp#src/05-e2.py" scrolling="no" frameborder="no" allowtransparency="true" allowfullscreen="true" sandbox="allow-forms allow-pointer-lock allow-popups allow-same-origin allow-scripts allow-modals"></iframe>
<a class="my-2 mx-4 btn btn-info" href="https://replit.com/@nuevofoundation/LinearRegression-ConsoleApp#src/05-e2.py" target="_blank">Launch Replit</a>
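A prediction is just the fitted line evaluated at a new input: salary = intercept + coefficient × experience. The sketch below shows that arithmetic with hypothetical fitted values; replace them with the intercept and coefficient your own model reports. With a fitted scikit-learn model, `model.predict` performs the same calculation, and it expects a 2-D input such as `[[experience]]`.

```python
def predict_salary(experience, intercept, coefficient):
    """Apply the fitted line: salary = intercept + coefficient * experience."""
    return intercept + coefficient * experience

# Hypothetical fitted values; use the ones your own model reports.
intercept = 25_900.0
coefficient = 8_900.0

experience = 7.5  # change this, as in the Replit
salary = predict_salary(experience, intercept, coefficient)
print(f"Predicted salary for {experience} years: ${salary:,.2f}")
# prints: Predicted salary for 7.5 years: $92,650.00
```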
4 changes: 4 additions & 0 deletions themes/docdock/layouts/partials/custom-footer.html
@@ -25,6 +25,10 @@
class="footer-pics">
<i aria-label="Nuevo Foundation YouTube" class="fab fa-2x fa-youtube"></i>
</a>
<a target="_blank" rel="noopener noreferrer" href="https://open.spotify.com/playlist/0uQ8AwLs4SIpY4A4G52pTB?si=e30542420a8c40b1"
class="footer-pics">
<i aria-label="Nuevo Foundation Spotify" class="fab fa-2x fa-spotify"></i>
</a>
<a target="_blank" rel="noopener noreferrer" href="https://github.com/NuevoFoundation"
class="footer-pics">
<i aria-label="Nuevo Foundation Github" class="fab fa-2x fa-github"></i>
6 changes: 6 additions & 0 deletions themes/docdock/layouts/partials/header.html
@@ -75,6 +75,12 @@
<i aria-label="Nuevo Foundation YouTube" class="fab fa-youtube"></i>
</a>
</div>
<div class="sc-bZQynM dXgQaH">
<a class="sc-gzVnrw cZJBlR" target="_blank" rel="noopener noreferrer"
href="https://open.spotify.com/playlist/0uQ8AwLs4SIpY4A4G52pTB?si=e30542420a8c40b1">
<i aria-label="Nuevo Foundation Spotify" class="fab fa-spotify"></i>
</a>
</div>
<div class="sc-bZQynM dXgQaH">
<a class="sc-gzVnrw cZJBlR" target="_blank" rel="noopener noreferrer"
href="https://github.com/NuevoFoundation">