docs: Improve Machine-Learning example (#382)
Related to Issue #380 

### Summary of Changes

- Added links to other classes
- Added sub-headings where appropriate
- Replaced "=" with ":" where appropriate
- Fixed indent
WinPlay02 authored Jan 28, 2023
1 parent 641a966 commit f8d7e04
Showing 1 changed file with 27 additions and 12 deletions.
39 changes: 27 additions & 12 deletions docs/Stdlib/python/Tutorials/machine_learning.md
@@ -1,8 +1,10 @@
# Machine Learning Tutorial

## Create SupervisedDataset

Here is a short introduction to training and predicting with a machine learning model in safe-DS.

First we need to create a SupervisedDataset from the training data.
First we need to create a [SupervisedDataset][safe_ds.data.SupervisedDataset] from the training data.

```python
from safe_ds.data import Table, SupervisedDataset
@@ -21,38 +23,51 @@ to_be_predicted_table = Table({
sup_dataset = SupervisedDataset(table, target_column="target")
```

SupervisedDatasets are used in safe-DS to train supervised machine learning models, because they keep track of the target
vector. A SupervisedDataset can be created from a table and specifying the target vector in the table.
[SupervisedDatasets][safe_ds.data.SupervisedDataset] are used in safe-DS to train supervised machine learning models
(e.g. [RandomForest][safe_ds.classification.RandomForest] for classification and
[LinearRegression][safe_ds.regression.LinearRegression] for regression), because they keep track of the target
vector. A [SupervisedDataset][safe_ds.data.SupervisedDataset] is created from a [Table][safe_ds.data.Table] by
specifying which column of the table is the target vector.
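
To illustrate what "keeping track of the target vector" means, here is a minimal sketch in plain Python. It does not use safe-DS: `ToySupervisedDataset`, its `features` and `target` properties, and the sample values are hypothetical names and data made up for this illustration, and the real class may be implemented quite differently.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ToySupervisedDataset:
    """Hypothetical stand-in (not the safe-DS class): a table plus the name of its target column."""

    columns: Dict[str, List[float]]
    target_column: str

    def __post_init__(self) -> None:
        # Fail early if the requested target column is not part of the table.
        if self.target_column not in self.columns:
            raise KeyError(f"unknown target column: {self.target_column!r}")

    @property
    def features(self) -> Dict[str, List[float]]:
        # Every column except the target is treated as a feature.
        return {
            name: values
            for name, values in self.columns.items()
            if name != self.target_column
        }

    @property
    def target(self) -> List[float]:
        # The target vector is simply the column remembered at construction time.
        return self.columns[self.target_column]


# Made-up data in which the target is the sum of the other columns.
toy = ToySupervisedDataset(
    {"column1": [3, 4, 8], "column2": [2, 1, 5], "target": [5, 5, 13]},
    target_column="target",
)
print(sorted(toy.features))  # ['column1', 'column2']
print(toy.target)            # [5, 5, 13]
```

Because the wrapper remembers the target column name, a model that receives it can separate features from the target without being told again which column to learn.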

## Create and train model

In this code example, we want to predict the sum of a row. The `table` contains the target vector we want to
train with (the sum of the rows). The `to_be_predicted_table` is the table we want to make predictions with, so it
does not contain a target vector.

In order to train the `LinearRegression`-model we need to make the following calls in safe-DS.
In order to train the [LinearRegression][safe_ds.regression.LinearRegression]-model we need to make the following calls
in safe-DS:

```python
linear_reg_model = LinearRegression()
linear_reg_model.fit(sup_dataset)
```

As we can see, a `LinearRegression`-object is created.
As we can see, a [LinearRegression][safe_ds.regression.LinearRegression]-object is created.

In safe-DS, machine learning models are organized into separate classes, and each class implements the fit and
predict methods for its own model.

So in order to train a linear regression model we create a `LinearRegression`-object and call then the `.fit()`
-method on this object. Now the `linear_reg_model` is a fitted linear regression model, and we can call
the `predict(dataset = SupervisedDataset)`-method on this model.
## Predicting new values

So in order to train a linear regression model, we create a [LinearRegression][safe_ds.regression.LinearRegression]-object
and then call the [`.fit()`][safe_ds.regression._linear_regression.LinearRegression.fit]-method
on this object. Now the `linear_reg_model` is a fitted linear regression model, and we can call the
[`predict(dataset: SupervisedDataset)`][safe_ds.regression._linear_regression.LinearRegression.predict]-method
on this model.

```python
prediction = linear_reg_model.predict(dataset=to_be_predicted_table,
target_name="predicted_values")
```

After we trained the `linear_reg_model`-object we can make predictions with the model. To do this we call the
`predict(dataset = Table, target_name = Optional[str])`-method on the trained model. The `target_name`-parameter
is optional, so you do not need to specify it. If you do not specify the `target_name`, the name of
the `target_vector` in the given `SupervisedDataset`will be used.
[`predict(dataset: Table, target_name: Optional[str])`][safe_ds.regression._linear_regression.LinearRegression.predict]-method
on the trained model. The `target_name`-parameter is optional, so you do not need to specify it.
If you do not specify the `target_name`, the name of the `target_vector` in the given
[SupervisedDataset][safe_ds.data.SupervisedDataset] will be used.
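
The fallback described above can be sketched in plain Python. This is not the safe-DS implementation: `ToySumRegressor` is a hypothetical stand-in that sums each row instead of fitting a regression, and the column names `a`, `b`, `c` are made up. It only shows how an optional `target_name` could default to the name remembered during fitting.

```python
from typing import Dict, List, Optional


class ToySumRegressor:
    """Hypothetical stand-in for a fitted model; it just sums each row."""

    def __init__(self) -> None:
        self._target_name: Optional[str] = None

    def fit(self, columns: Dict[str, List[float]], target_column: str) -> None:
        if target_column not in columns:
            raise KeyError(f"unknown target column: {target_column!r}")
        # Remember the target name from training so predict() can reuse it.
        self._target_name = target_column

    def predict(
        self,
        columns: Dict[str, List[float]],
        target_name: Optional[str] = None,
    ) -> Dict[str, List[float]]:
        if self._target_name is None:
            raise RuntimeError("call fit() before predict()")
        # Use the explicit name if given, otherwise fall back to the training name.
        name = target_name if target_name is not None else self._target_name
        n_rows = len(next(iter(columns.values())))
        predictions = [
            float(sum(values[i] for values in columns.values()))
            for i in range(n_rows)
        ]
        # Return the input columns plus one new column holding the predictions.
        return {**columns, name: predictions}


model = ToySumRegressor()
model.fit({"a": [1, 2], "b": [3, 4], "c": [0, 1], "target": [4, 7]}, "target")

new_rows = {"a": [4], "b": [7], "c": [1]}
default_named = model.predict(new_rows)
custom_named = model.predict(new_rows, target_name="predicted_values")

print(list(default_named))               # ['a', 'b', 'c', 'target']
print(custom_named["predicted_values"])  # [12.0]
```

Passing `target_name` only changes the label of the prediction column; the predicted values are the same in both calls.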

## Results

So for the call above we will get the following output:

@@ -65,4 +80,4 @@ So for the call above we will get the following output:
| 4 | 7 | 1 | 12.0 |

!!! note
Your target-vector may differ from our result.
Your target-vector may differ from our result.
