From f8d7e0457bc3ca38565b80672c2b2f6014a2041c Mon Sep 17 00:00:00 2001
From: WinPlay02
Date: Sat, 28 Jan 2023 11:18:37 +0100
Subject: [PATCH] docs: Improve Machine-Learning example (#382)

Related to Issue #380

### Summary of Changes

Added Links to other classes
Added sub-headings where appropriate
Replaced "=" with ":" where appropriate
Fixed indent
---
 .../python/Tutorials/machine_learning.md      | 39 +++++++++++++------
 1 file changed, 27 insertions(+), 12 deletions(-)

diff --git a/docs/Stdlib/python/Tutorials/machine_learning.md b/docs/Stdlib/python/Tutorials/machine_learning.md
index cf2d80061..efbfd72db 100644
--- a/docs/Stdlib/python/Tutorials/machine_learning.md
+++ b/docs/Stdlib/python/Tutorials/machine_learning.md
@@ -1,8 +1,10 @@
 # Machine Learning Tutorial
 
+## Create SupervisedDataset
+
 Here is a short introduction to train and predict with a machine learning model in safe-ds.
 
-First we need to create a SupervisedDataset from the training data.
+First we need to create a [SupervisedDataset][safe_ds.data.SupervisedDataset] from the training data.
 
 ```python
 from safe_ds.data import Table, SupervisedDataset
@@ -21,28 +23,38 @@ to_be_predicted_table = Table({
 sup_dataset = SupervisedDataset(table, target_column="target")
 ```
 
-SupervisedDatasets are used in safe-DS to train supervised machine learning models, because they keep track of the target
-vector. A SupervisedDataset can be created from a table and specifying the target vector in the table.
+[SupervisedDatasets][safe_ds.data.SupervisedDataset] are used in safe-DS to train supervised machine learning models
+(e.g. [RandomForest][safe_ds.classification.RandomForest] for classification and
+[LinearRegression][safe_ds.regression.LinearRegression] as a regression model), because they keep track of the target
+vector. A [SupervisedDataset][safe_ds.data.SupervisedDataset] can be created from a [Table][safe_ds.data.Table] and
+specifying the target vector in the table.
+
+## Create and train model
 
 In this code example, we want to predict the sum of a row. The `table` contains the target vector we want to train with
 (the sum of the rows). The `to_predicted_table` is the table we want to make predictions with, so it does not contain a
 target vector.
 
-In order to train the `LinearRegression`-model we need to make the following calls in safe-DS.
+In order to train the [LinearRegression][safe_ds.regression.LinearRegression]-model we need to make the following calls
+in safe-DS:
 
 ```python
 linear_reg_model = LinearRegression()
 linear_reg_model.fit(sup_dataset)
 ```
 
-As we can see, a `LinearRegression`-object is created.
+As we can see, a [LinearRegression][safe_ds.regression.LinearRegression]-object is created.
 
 In safe-DS machine learning models are separated in different classes where the different fit and predictions methods
 are implemented for the given machine learning model.
 
-So in order to train a linear regression model we create a `LinearRegression`-object and call then the `.fit()`
--method on this object. Now the `linear_reg_model` is a fitted linear regression model, and we can call
-the `predict(dataset = SupervisedDataset)`-method on this model.
+## Predicting new values
+
+So in order to train a linear regression model we create a [LinearRegression][safe_ds.regression.LinearRegression]-object
+and call then the [`.fit()`][safe_ds.regression._linear_regression.LinearRegression.fit]
+-method on this object. Now the `linear_reg_model` is a fitted linear regression model, and we can call the
+[`predict(dataset: SupervisedDataset)`][safe_ds.regression._linear_regression.LinearRegression.predict]-method
+on this model.
 
 ```python
 prediction = linear_reg_model.predict(dataset=to_be_predicted_table,
@@ -50,9 +62,12 @@ prediction = linear_reg_model.predict(dataset=to_be_predicted_table,
 ```
 
 After we trained the `linear_reg_model`-object we can make predictions with the model. To do this we call the
-`predict(dataset = Table, target_name = Optional[str])`-method on the trained model. The `target_name`-parameter
-is optional, so you do not need to specify it. If you do not specify the `target_name`, the name of
-the `target_vector` in the given `SupervisedDataset`will be used.
+[`predict(dataset: Table, target_name: Optional[str])`][safe_ds.regression._linear_regression.LinearRegression.predict]-method
+on the trained model. The `target_name`-parameter is optional, so you do not need to specify it.
+If you do not specify the `target_name`, the name of the `target_vector` in the given
+[SupervisedDataset][safe_ds.data.SupervisedDataset] will be used.
+
+## Results
 
 So for the call above we will get the following output:
 
@@ -65,4 +80,4 @@ So for the call above we will get the following output:
 | 4 | 7 | 1 | 12.0 |
 
 !!! note
-Your target-vector may differ from our result.
+    Your target-vector may differ from our result.