---
Title: 'Gradient Descent'
Description: 'Gradient descent is an optimization algorithm that minimizes a cost function by iteratively moving model parameters in the direction of the negative gradient.'
Subjects:
  - 'Machine Learning'
  - 'Data Science'
  - 'Computer Science'
Tags:
  - 'AI'
  - 'Machine Learning'
  - 'Neural Networks'
  - 'Functions'
CatalogContent:
  - 'machine-learning'
  - 'path/data-science'
---

**Gradient Descent** is an optimization algorithm commonly used to minimize a cost function in machine learning and neural networks. The goal of gradient descent is to find the optimal parameters (weights) of a model that minimize the error or loss function.

In the context of neural networks, gradient descent adjusts the model's parameters by computing the gradient (or derivative) of the cost function with respect to each parameter. The algorithm then updates the parameters in the direction that reduces the cost.
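
As a rough illustration of this mechanism, the sketch below uses made-up data and a simple linear model with two weights: it computes the gradient of a squared-error cost with respect to each weight and applies a single update step (all names and values here are for illustration only):

```py
import numpy as np

# Made-up data: two input features per sample, one target value
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0]])
y = np.array([5.0, 4.0, 11.0])

w = np.zeros(2)       # One weight per input feature
learning_rate = 0.05  # Step size

# Gradient of the squared-error cost with respect to each weight
errors = X @ w - y
gradient = (1 / len(y)) * (X.T @ errors)

# Move every weight a small step against its gradient
w = w - learning_rate * gradient
print(w)
```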
## Types of Gradient Descent

There are three main types of gradient descent:

| Type                                  | Description                                                                                                                                                     |
| ------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------|
| **Batch Gradient Descent**            | Uses the entire dataset to compute the gradient and update the weights. The updates are stable and accurate, but each one is slow and memory-intensive for large datasets. |
| **Stochastic Gradient Descent (SGD)** | Uses a single sample to compute the gradient and update the weights. Faster per update, but the updates are noisy and can lead to fluctuations in the convergence path. |
| **Mini-batch Gradient Descent**       | A compromise between batch and stochastic gradient descent, using a small batch of samples to compute the gradient. It balances the speed and accuracy of the learning process. |
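
The sketch below contrasts the three variants on a made-up one-dimensional regression problem; the only thing that changes between them is how many samples feed each gradient computation (the data, names, and hyperparameter values are all illustrative):

```py
import numpy as np

# Made-up data for the model y ~ theta * x
rng = np.random.default_rng(0)
X = np.linspace(0.0, 1.0, 100)
y = 3.0 * X + rng.normal(0.0, 0.1, size=X.shape)

def gradient(X_part, y_part, theta):
  # Gradient of the squared-error cost on the given subset of samples
  m = len(y_part)
  return (1 / m) * np.sum(X_part * (X_part * theta - y_part))

def run(batch_size, learning_rate=0.1, epochs=100):
  theta = 0.0
  for _ in range(epochs):
    # Shuffle once per epoch, then walk through the data in batches
    order = rng.permutation(len(y))
    for start in range(0, len(y), batch_size):
      idx = order[start:start + batch_size]
      theta -= learning_rate * gradient(X[idx], y[idx], theta)
  return theta

print(run(batch_size=len(y)))  # Batch: the whole dataset per update
print(run(batch_size=1))       # Stochastic: one sample per update
print(run(batch_size=16))      # Mini-batch: a small batch per update
```

All three runs move `theta` toward the same minimizer; they differ in how many updates they make per pass over the data and in how noisy each update is.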
## Gradient Descent Update Rule

The basic update rule for gradient descent is:

```pseudo
theta = theta - learning_rate * gradient_of_cost_function
```

- `theta`: The parameter (weight) of the model.
- `learning_rate`: A hyperparameter that controls the step size.
- `gradient_of_cost_function`: The gradient (derivative) of the cost function with respect to the parameters.
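
For instance, with made-up numbers, a single application of this rule looks like the following:

```py
theta = 2.0          # Current parameter value
learning_rate = 0.1  # Step size
gradient = 4.0       # Gradient of the cost function at theta = 2.0 (made-up value)

theta = theta - learning_rate * gradient
print(theta)  # 2.0 - 0.1 * 4.0 = 1.6
```

Because the gradient is positive, the parameter moves to a smaller value; a negative gradient would push it the other way.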
## Syntax

Here's a basic syntax for gradient descent in the context of machine learning, specifically for updating the model parameters (weights) in order to minimize the cost function:

```pseudo
# Initialize parameters (weights) and learning rate
theta = initial_value              # Parameters (weights)
learning_rate = value              # Learning rate
iterations = number_of_iterations  # Number of iterations

# Repeat until convergence
for i in range(iterations):
  # Calculate the gradient of the cost function
  gradient = compute_gradient(X, y, theta)

  # Update the parameters (weights)
  theta = theta - learning_rate * gradient  # Update rule

  # Optionally, compute and store the cost (for monitoring convergence)
  cost = compute_cost(X, y, theta)
  store(cost)
```
## Example

In the following example, we implement simple gradient descent to minimize the cost function of a linear regression problem:

```py
import numpy as np

# Sample data (X: inputs, y: actual outputs)
X = np.array([1, 2, 3, 4, 5])
y = np.array([1, 2, 1.3, 3.75, 2.25])

# Parameters initialization
theta = 0.0           # Initial weight
learning_rate = 0.01  # Step size
iterations = 1000     # Number of iterations

# Cost function (Mean Squared Error)
def compute_cost(X, y, theta):
  m = len(y)
  cost = (1 / (2 * m)) * np.sum((X * theta - y) ** 2)
  return cost

# Gradient Descent function
def gradient_descent(X, y, theta, learning_rate, iterations):
  m = len(y)
  cost_history = []

  for i in range(iterations):
    gradient = (1 / m) * np.sum(X * (X * theta - y))  # Derivative of the cost function
    theta = theta - learning_rate * gradient          # Update theta
    cost_history.append(compute_cost(X, y, theta))    # Track cost
  return theta, cost_history

# Run Gradient Descent
theta_optimal, cost_history = gradient_descent(X, y, theta, learning_rate, iterations)

print(f"Optimal Theta: {theta_optimal}")
```

The output for the above code will be something like this:

```shell
Optimal Theta: 0.6390909090909086
```

> **Note**: The optimal `theta` value will be an approximation, as the gradient descent approach iteratively updates the weight to reduce the cost function.
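
Since the example above already stores `cost_history`, one simple way to monitor convergence is to check when the cost stops changing by more than some small tolerance. The sketch below is meant to be appended to the example above, and the tolerance value is an arbitrary choice:

```py
# Appended after the example above: cost_history is the list returned
# by gradient_descent, and the tolerance below is an arbitrary choice.
tolerance = 1e-8

for i in range(1, len(cost_history)):
  if abs(cost_history[i - 1] - cost_history[i]) < tolerance:
    print(f"Cost change fell below {tolerance} at iteration {i}")
    break
else:
  print("Cost was still changing noticeably after the final iteration")
```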