diff --git a/v1/python-sdk/tutorials/automl-with-azureml/forecasting-bike-share/auto-ml-forecasting-bike-share.ipynb b/v1/python-sdk/tutorials/automl-with-azureml/forecasting-bike-share/auto-ml-forecasting-bike-share.ipynb index b4d25b0076..a7f4eb319b 100644 --- a/v1/python-sdk/tutorials/automl-with-azureml/forecasting-bike-share/auto-ml-forecasting-bike-share.ipynb +++ b/v1/python-sdk/tutorials/automl-with-azureml/forecasting-bike-share/auto-ml-forecasting-bike-share.ipynb @@ -1,7 +1,6 @@ { "cells": [ { - "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ @@ -11,7 +10,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ @@ -19,7 +17,13 @@ ] }, { - "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "!Important! This notebook is outdated and is not supported by the AutoML Team. Please use the supported version ([link](https://github.com/Azure/azureml-examples/tree/main/sdk/python/jobs/automl-standalone-jobs/automl-forecasting-task-bike-share))." + ] + }, + { "cell_type": "markdown", "metadata": {}, "source": [ @@ -37,7 +41,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ @@ -56,7 +59,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ @@ -86,7 +88,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ @@ -103,7 +104,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ @@ -137,7 +137,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ @@ -177,7 +176,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ @@ -201,7 +199,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": { "nteract": { @@ -218,6 +215,7 @@ "cell_type": "code", "execution_count": null, "metadata": { + "collapsed": false, "jupyter": { "outputs_hidden": false, "source_hidden": false @@ -237,7 +235,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": { "nteract": { @@ -256,6 +253,7 @@ "cell_type": "code", "execution_count": null, "metadata": { + "collapsed": false, "gather": { "logged": 1680247376789 }, @@ -277,7 +275,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ @@ -290,6 +287,7 @@ "cell_type": "code", "execution_count": null, "metadata": { + "collapsed": false, "jupyter": { "outputs_hidden": false, "source_hidden": false @@ -316,7 +314,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ @@ -334,7 +331,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ @@ -359,7 +355,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ @@ -378,7 +373,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ @@ -398,7 +392,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ @@ -441,7 +434,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ @@ -467,7 +459,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ @@ -486,7 +477,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ @@ -512,7 +502,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ @@ -556,7 +545,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ @@ -564,7 +552,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ @@ -583,7 +570,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ @@ -606,7 +592,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ @@ -637,7 +622,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ @@ -656,7 +640,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ @@ -673,7 +656,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ @@ -705,7 +687,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ @@ -715,7 +696,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ @@ -747,7 +727,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ @@ -822,7 +801,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.10.9" + "version": "3.8.10" }, "microsoft": { "ms_spell_check": { diff --git a/v1/python-sdk/tutorials/automl-with-azureml/forecasting-energy-demand/auto-ml-forecasting-energy-demand.ipynb b/v1/python-sdk/tutorials/automl-with-azureml/forecasting-energy-demand/auto-ml-forecasting-energy-demand.ipynb index 4f097661fa..03b4408679 100644 --- a/v1/python-sdk/tutorials/automl-with-azureml/forecasting-energy-demand/auto-ml-forecasting-energy-demand.ipynb +++ b/v1/python-sdk/tutorials/automl-with-azureml/forecasting-energy-demand/auto-ml-forecasting-energy-demand.ipynb @@ -2,22 +2,30 @@ "cells": [ { "cell_type": "markdown", + "metadata": {}, "source": [ "Copyright (c) Microsoft Corporation. All rights reserved.\n", "\n", "Licensed under the MIT License." - ], - "metadata": {} + ] }, { "cell_type": "markdown", + "metadata": {}, "source": [ "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/automated-machine-learning/forecasting-energy-demand/auto-ml-forecasting-energy-demand.png)" - ], - "metadata": {} + ] }, { "cell_type": "markdown", + "metadata": {}, + "source": [ + "!Important! This notebook is outdated and is not supported by the AutoML Team. Please use the supported version ([link](https://github.com/Azure/azureml-examples/blob/main/sdk/python/jobs/automl-standalone-jobs/automl-forecasting-task-energy-demand/automl-forecasting-task-energy-demand-advanced-mlflow.ipynb))." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, "source": [ "# Automated Machine Learning\n", "_**Forecasting using the Energy Demand Dataset**_\n", @@ -32,11 +40,11 @@ "Advanced Forecasting\n", "1. [Advanced Training](#advanced_training)\n", "1. [Advanced Results](#advanced_results)" - ], - "metadata": {} + ] }, { "cell_type": "markdown", + "metadata": {}, "source": [ "# Introduction\n", "\n", @@ -52,18 +60,20 @@ "1. Generate the forecast and compute the out-of-sample accuracy metrics\n", "1. Configuration and remote run of AutoML for a time-series model with lag and rolling window features\n", "1. Run and explore the forecast with lagging features" - ], - "metadata": {} + ] }, { "cell_type": "markdown", + "metadata": {}, "source": [ "# Setup" - ], - "metadata": {} + ] }, { "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "import json\n", "import logging\n", @@ -82,36 +92,36 @@ "from azureml.core import Experiment, Workspace, Dataset\n", "from azureml.train.automl import AutoMLConfig\n", "from datetime import datetime" - ], - "outputs": [], - "execution_count": null, - "metadata": {} + ] }, { "cell_type": "markdown", + "metadata": {}, "source": [ "This notebook is compatible with Azure ML SDK version 1.35.0 or later." - ], - "metadata": {} + ] }, { "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")" - ], - "outputs": [], - "execution_count": null, - "metadata": {} + ] }, { "cell_type": "markdown", + "metadata": {}, "source": [ "As part of the setup you have already created an Azure ML `Workspace` object. For Automated ML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments." - ], - "metadata": {} + ] }, { "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "ws = Workspace.from_config()\n", "\n", @@ -133,13 +143,11 @@ "pd.set_option(\"display.max_colwidth\", None)\n", "outputDf = pd.DataFrame(data=output, index=[\"\"])\n", "outputDf.T" - ], - "outputs": [], - "execution_count": null, - "metadata": {} + ] }, { "cell_type": "markdown", + "metadata": {}, "source": [ "## Create or Attach existing AmlCompute\n", "A compute target is required to execute a remote Automated ML run. \n", @@ -149,11 +157,13 @@ "#### Creation of AmlCompute takes approximately 5 minutes. \n", "If the AmlCompute with that name is already in your workspace this code will skip the creation process.\n", "As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota." - ], - "metadata": {} + ] }, { "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "from azureml.core.compute import ComputeTarget, AmlCompute\n", "from azureml.core.compute_target import ComputeTargetException\n", @@ -172,24 +182,22 @@ " compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, compute_config)\n", "\n", "compute_target.wait_for_completion(show_output=True)" - ], - "outputs": [], - "execution_count": null, - "metadata": {} + ] }, { "cell_type": "markdown", + "metadata": {}, "source": [ "# Data\n", "\n", "We will use energy consumption [data from New York City](http://mis.nyiso.com/public/P-58Blist.htm) for model training. The data is stored in a tabular format and includes energy demand and basic weather data at an hourly frequency. \n", "\n", "With Azure Machine Learning datasets you can keep a single copy of data in your storage, easily access data during model training, share data and collaborate with other users. Below, we will upload the datatset and create a [tabular dataset](https://docs.microsoft.com/bs-latn-ba/azure/machine-learning/service/how-to-create-register-datasets#dataset-types) to be used training and prediction." - ], - "metadata": {} + ] }, { "cell_type": "markdown", + "metadata": {}, "source": [ "Let's set up what we know about the dataset.\n", "\n", @@ -197,64 +205,66 @@ "Time column is the time axis along which to predict.\n", "\n", "The other columns, \"temp\" and \"precip\", are implicitly designated as features." - ], - "metadata": {} + ] }, { "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "target_column_name = \"demand\"\n", "time_column_name = \"timeStamp\"" - ], - "outputs": [], - "execution_count": null, - "metadata": {} + ] }, { "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "dataset = Dataset.Tabular.from_delimited_files(\n", " path=\"https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/nyc_energy.csv\"\n", ").with_timestamp_columns(fine_grain_timestamp=time_column_name)\n", "dataset.take(5).to_pandas_dataframe().reset_index(drop=True)" - ], - "outputs": [], - "execution_count": null, - "metadata": {} + ] }, { "cell_type": "markdown", + "metadata": {}, "source": [ "The NYC Energy dataset is missing energy demand values for all datetimes later than August 10th, 2017 5AM. Below, we trim the rows containing these missing values from the end of the dataset." - ], - "metadata": {} + ] }, { "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "# Cut off the end of the dataset due to large number of nan values\n", "dataset = dataset.time_before(datetime(2017, 10, 10, 5))" - ], - "outputs": [], - "execution_count": null, - "metadata": {} + ] }, { "cell_type": "markdown", + "metadata": {}, "source": [ "## Split the data into train and test sets" - ], - "metadata": {} + ] }, { "cell_type": "markdown", + "metadata": {}, "source": [ "The first split we make is into train and test sets. Note that we are splitting on time. Data before and including August 8th, 2017 5AM will be used for training, and data after will be used for testing." - ], - "metadata": {} + ] }, { "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "# split into train based on time\n", "train = (\n", @@ -263,13 +273,13 @@ " .reset_index(drop=True)\n", ")\n", "train.sort_values(time_column_name).tail(5)" - ], - "outputs": [], - "execution_count": null, - "metadata": {} + ] }, { "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "# split into test based on time\n", "test = (\n", @@ -278,13 +288,23 @@ " .reset_index(drop=True)\n", ")\n", "test.head(5)" - ], - "outputs": [], - "execution_count": null, - "metadata": {} + ] }, { "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": false, + "jupyter": { + "outputs_hidden": false + }, + "nteract": { + "transient": { + "deleting": false + } + } + }, + "outputs": [], "source": [ "# register the splitted train and test data in workspace storage\n", "from azureml.data.dataset_factory import TabularDatasetFactory\n", @@ -296,23 +316,11 @@ "test_dataset = TabularDatasetFactory.register_pandas_dataframe(\n", " test, target=(datastore, \"dataset/\"), name=\"nyc_energy_test\"\n", ")" - ], - "outputs": [], - "execution_count": null, - "metadata": { - "jupyter": { - "source_hidden": false, - "outputs_hidden": false - }, - "nteract": { - "transient": { - "deleting": false - } - } - } + ] }, { "cell_type": "markdown", + "metadata": {}, "source": [ "### Setting the maximum forecast horizon\n", "\n", @@ -321,20 +329,20 @@ "Learn more about forecast horizons in our [Auto-train a time-series forecast model](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-auto-train-forecast#configure-and-run-experiment) guide.\n", "\n", "In this example, we set the horizon to 48 hours." - ], - "metadata": {} + ] }, { "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "forecast_horizon = 48" - ], - "outputs": [], - "execution_count": null, - "metadata": {} + ] }, { "cell_type": "markdown", + "metadata": {}, "source": [ "## Forecasting Parameters\n", "To define forecasting parameters for your experiment training, you can leverage the ForecastingParameters class. The table below details the forecasting parameter we will be passing into our experiment.\n", @@ -345,11 +353,11 @@ "|**forecast_horizon**|The forecast horizon is how many periods forward you would like to forecast. This integer horizon is in units of the timeseries frequency (e.g. daily, weekly).|\n", "|**freq**|Forecast frequency. This optional parameter represents the period with which the forecast is desired, for example, daily, weekly, yearly, etc. Use this parameter for the correction of time series containing irregular data points or for padding of short time series. The frequency needs to be a pandas offset alias. Please refer to [pandas documentation](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#dateoffset-objects) for more information.\n", "|**cv_step_size**|Number of periods between two consecutive cross-validation folds. The default value is \"auto\", in which case AutoMl determines the cross-validation step size automatically, if a validation set is not provided. Or users could specify an integer value." - ], - "metadata": {} + ] }, { "cell_type": "markdown", + "metadata": {}, "source": [ "# Train\n", "\n", @@ -367,18 +375,20 @@ "|**n_cross_validations**|Number of cross-validation folds to use for model/pipeline selection. The default value is \"auto\", in which case AutoMl determines the number of cross-validations automatically, if a validation set is not provided. Or users could specify an integer value.\n", "|**enable_early_stopping**|Flag to enble early termination if the score is not improving in the short term.|\n", "|**forecasting_parameters**|A class holds all the forecasting related parameters.|\n" - ], - "metadata": {} + ] }, { "cell_type": "markdown", + "metadata": {}, "source": [ "This notebook uses the blocked_models parameter to exclude some models that take a longer time to train on this dataset. You can choose to remove models from the blocked_models list but you may need to increase the experiment_timeout_hours parameter value to get results." - ], - "metadata": {} + ] }, { "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "from azureml.automl.core.forecasting_parameters import ForecastingParameters\n", "\n", @@ -402,65 +412,65 @@ " verbosity=logging.INFO,\n", " forecasting_parameters=forecasting_parameters,\n", ")" - ], - "outputs": [], - "execution_count": null, - "metadata": {} + ] }, { "cell_type": "markdown", + "metadata": {}, "source": [ "Call the `submit` method on the experiment object and pass the run configuration. Depending on the data and the number of iterations this can run for a while.\n", "One may specify `show_output = True` to print currently running iterations to the console." - ], - "metadata": {} + ] }, { "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "remote_run = experiment.submit(automl_config, show_output=False)" - ], - "outputs": [], - "execution_count": null, - "metadata": {} + ] }, { "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "remote_run.wait_for_completion()" - ], - "outputs": [], - "execution_count": null, - "metadata": {} + ] }, { "cell_type": "markdown", + "metadata": {}, "source": [ "## Retrieve the Best Run details\n", "Below we retrieve the best Run object from among all the runs in the experiment." - ], - "metadata": {} + ] }, { "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "best_run = remote_run.get_best_child()\n", "best_run" - ], - "outputs": [], - "execution_count": null, - "metadata": {} + ] }, { "cell_type": "markdown", + "metadata": {}, "source": [ "## Featurization\n", "We can look at the engineered feature names generated in time-series featurization via. the JSON file named 'engineered_feature_names.json' under the run outputs." - ], - "metadata": {} + ] }, { "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "# Download the JSON file locally\n", "best_run.download_file(\n", @@ -470,13 +480,11 @@ " records = json.load(f)\n", "\n", "records" - ], - "outputs": [], - "execution_count": null, - "metadata": {} + ] }, { "cell_type": "markdown", + "metadata": {}, "source": [ "### View featurization summary\n", "You can also see what featurization steps were performed on different raw features in the user data. For each raw feature in the user data, the following information is displayed:\n", @@ -486,11 +494,13 @@ "+ Type detected\n", "+ If feature was dropped\n", "+ List of feature transformations for the raw feature" - ], - "metadata": {} + ] }, { "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "# Download the featurization summary JSON file locally\n", "best_run.download_file(\n", @@ -512,41 +522,41 @@ " \"Transformations\",\n", " ]\n", "]" - ], - "outputs": [], - "execution_count": null, - "metadata": {} + ] }, { "cell_type": "markdown", + "metadata": {}, "source": [ "# Forecasting\n", "\n", "Now that we have retrieved the best pipeline/model, it can be used to make predictions on test data. We will do batch scoring on the test dataset which should have the same schema as training dataset.\n", "\n", "The inference will run on a remote compute. In this example, it will re-use the training compute." - ], - "metadata": {} + ] }, { "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "test_experiment = Experiment(ws, experiment_name + \"_inference\")" - ], - "outputs": [], - "execution_count": null, - "metadata": {} + ] }, { "cell_type": "markdown", + "metadata": {}, "source": [ "### Retrieving forecasts from the model\n", "We have created a function called `run_forecast` that submits the test data to the best model determined during the training run and retrieves forecasts. This function uses a helper script `forecasting_script` which is uploaded and expecuted on the remote compute." - ], - "metadata": {} + ] }, { "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "from run_forecast import run_remote_inference\n", "\n", @@ -561,32 +571,32 @@ "\n", "# download the inference output file to the local machine\n", "remote_run_infer.download_file(\"outputs/predictions.csv\", \"predictions.csv\")" - ], - "outputs": [], - "execution_count": null, - "metadata": {} + ] }, { "cell_type": "markdown", + "metadata": {}, "source": [ "### Evaluate\n", "To evaluate the accuracy of the forecast, we'll compare against the actual sales quantities for some select metrics, included the mean absolute percentage error (MAPE). For more metrics that can be used for evaluation after training, please see [supported metrics](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-understand-automated-ml#regressionforecasting-metrics), and [how to calculate residuals](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-understand-automated-ml#residuals)." - ], - "metadata": {} + ] }, { "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "# load forecast data frame\n", "fcst_df = pd.read_csv(\"predictions.csv\", parse_dates=[time_column_name])\n", "fcst_df.head()" - ], - "outputs": [], - "execution_count": null, - "metadata": {} + ] }, { "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "from azureml.automl.core.shared import constants\n", "from azureml.automl.runtime.shared.score import scoring\n", @@ -613,31 +623,31 @@ " (test_pred, test_test), (\"prediction\", \"truth\"), loc=\"upper left\", fontsize=8\n", ")\n", "plt.show()" - ], - "outputs": [], - "execution_count": null, - "metadata": {} + ] }, { "cell_type": "markdown", + "metadata": {}, "source": [ "# Advanced Training \n", "We did not use lags in the previous model specification. In effect, the prediction was the result of a simple regression on date, time series identifier columns and any additional features. This is often a very good prediction as common time series patterns like seasonality and trends can be captured in this manner. Such simple regression is horizon-less: it doesn't matter how far into the future we are predicting, because we are not using past data. In the previous example, the horizon was only used to split the data for cross-validation." - ], - "metadata": {} + ] }, { "cell_type": "markdown", + "metadata": {}, "source": [ "### Using lags and rolling window features\n", "Now we will configure the target lags, that is the previous values of the target variables, meaning the prediction is no longer horizon-less. We therefore must still specify the `forecast_horizon` that the model will learn to forecast. The `target_lags` keyword specifies how far back we will construct the lags of the target variable, and the `target_rolling_window_size` specifies the size of the rolling window over which we will generate the `max`, `min` and `sum` features.\n", "\n", "This notebook uses the blocked_models parameter to exclude some models that take a longer time to train on this dataset. You can choose to remove models from the blocked_models list but you may need to increase the iteration_timeout_minutes parameter value to get results." - ], - "metadata": {} + ] }, { "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "advanced_forecasting_parameters = ForecastingParameters(\n", " time_column_name=time_column_name,\n", @@ -668,63 +678,63 @@ " verbosity=logging.INFO,\n", " forecasting_parameters=advanced_forecasting_parameters,\n", ")" - ], - "outputs": [], - "execution_count": null, - "metadata": {} + ] }, { "cell_type": "markdown", + "metadata": {}, "source": [ "We now start a new remote run, this time with lag and rolling window featurization. AutoML applies featurizations in the setup stage, prior to iterating over ML models. The full training set is featurized first, followed by featurization of each of the CV splits. Lag and rolling window features introduce additional complexity, so the run will take longer than in the previous example that lacked these featurizations." - ], - "metadata": {} + ] }, { "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "advanced_remote_run = experiment.submit(automl_config, show_output=False)" - ], - "outputs": [], - "execution_count": null, - "metadata": {} + ] }, { "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "advanced_remote_run.wait_for_completion()" - ], - "outputs": [], - "execution_count": null, - "metadata": {} + ] }, { "cell_type": "markdown", + "metadata": {}, "source": [ "### Retrieve the Best Run details" - ], - "metadata": {} + ] }, { "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "best_run_lags = remote_run.get_best_child()\n", "best_run_lags" - ], - "outputs": [], - "execution_count": null, - "metadata": {} + ] }, { "cell_type": "markdown", + "metadata": {}, "source": [ "# Advanced Results\n", "We did not use lags in the previous model specification. In effect, the prediction was the result of a simple regression on date, time series identifier columns and any additional features. This is often a very good prediction as common time series patterns like seasonality and trends can be captured in this manner. Such simple regression is horizon-less: it doesn't matter how far into the future we are predicting, because we are not using past data. In the previous example, the horizon was only used to split the data for cross-validation." - ], - "metadata": {} + ] }, { "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "test_experiment_advanced = Experiment(ws, experiment_name + \"_inference_advanced\")\n", "advanced_remote_run_infer = run_remote_inference(\n", @@ -741,23 +751,23 @@ "advanced_remote_run_infer.download_file(\n", " \"outputs/predictions.csv\", \"predictions_advanced.csv\"\n", ")" - ], - "outputs": [], - "execution_count": null, - "metadata": {} + ] }, { "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "fcst_adv_df = pd.read_csv(\"predictions_advanced.csv\", parse_dates=[time_column_name])\n", "fcst_adv_df.head()" - ], - "outputs": [], - "execution_count": null, - "metadata": {} + ] }, { "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "from azureml.automl.core.shared import constants\n", "from azureml.automl.runtime.shared.score import scoring\n", @@ -786,10 +796,7 @@ " (test_pred, test_test), (\"prediction\", \"truth\"), loc=\"upper left\", fontsize=8\n", ")\n", "plt.show()" - ], - "outputs": [], - "execution_count": null, - "metadata": {} + ] } ], "metadata": { @@ -802,40 +809,40 @@ "how-to-use-azureml", "automated-machine-learning" ], + "kernel_info": { + "name": "python3" + }, "kernelspec": { - "name": "python3", + "display_name": "Python 3.8 - AzureML", "language": "python", - "display_name": "Python 3 (ipykernel)" + "name": "python38-azureml" }, "language_info": { - "name": "python", - "version": "3.8.5", - "mimetype": "text/x-python", "codemirror_mode": { "name": "ipython", "version": 3 }, - "pygments_lexer": "ipython3", + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", "nbconvert_exporter": "python", - "file_extension": ".py" - }, - "vscode": { - "interpreter": { - "hash": "6bd77c88278e012ef31757c15997a7bea8c943977c43d6909403c00ae11d43ca" - } + "pygments_lexer": "ipython3", + "version": "3.8.10" }, "microsoft": { "ms_spell_check": { "ms_spell_check_language": "en" } }, - "kernel_info": { - "name": "python3" - }, "nteract": { "version": "nteract-front-end@1.0.0" + }, + "vscode": { + "interpreter": { + "hash": "6bd77c88278e012ef31757c15997a7bea8c943977c43d6909403c00ae11d43ca" + } } }, "nbformat": 4, - "nbformat_minor": 2 -} \ No newline at end of file + "nbformat_minor": 4 +} diff --git a/v1/python-sdk/tutorials/automl-with-azureml/forecasting-github-dau/auto-ml-forecasting-github-dau.ipynb b/v1/python-sdk/tutorials/automl-with-azureml/forecasting-github-dau/auto-ml-forecasting-github-dau.ipynb index 5d858f0729..7bc17c7f5e 100644 --- a/v1/python-sdk/tutorials/automl-with-azureml/forecasting-github-dau/auto-ml-forecasting-github-dau.ipynb +++ b/v1/python-sdk/tutorials/automl-with-azureml/forecasting-github-dau/auto-ml-forecasting-github-dau.ipynb @@ -22,6 +22,13 @@ "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/automated-machine-learning/forecasting-beer-remote/auto-ml-forecasting-beer-remote.png)" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "!Important! This notebook is outdated and is not supported by the AutoML Team. Please use the supported version ([link](https://github.com/Azure/azureml-examples/tree/main/sdk/python/jobs/automl-standalone-jobs/automl-forecasting-github-dau))." + ] + }, { "cell_type": "markdown", "metadata": { @@ -695,7 +702,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.6.9" + "version": "3.8.10" } }, "nbformat": 4, diff --git a/v1/python-sdk/tutorials/automl-with-azureml/forecasting-hierarchical-timeseries/auto-ml-forecasting-hierarchical-timeseries.ipynb b/v1/python-sdk/tutorials/automl-with-azureml/forecasting-hierarchical-timeseries/auto-ml-forecasting-hierarchical-timeseries.ipynb index 1e65c10331..6bace379bf 100644 --- a/v1/python-sdk/tutorials/automl-with-azureml/forecasting-hierarchical-timeseries/auto-ml-forecasting-hierarchical-timeseries.ipynb +++ b/v1/python-sdk/tutorials/automl-with-azureml/forecasting-hierarchical-timeseries/auto-ml-forecasting-hierarchical-timeseries.ipynb @@ -16,6 +16,13 @@ "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/automated-machine-learning/forecasting-hierarchical-timeseries/auto-ml-forecasting-hierarchical-timeseries.png)" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "!Important! This notebook is outdated and is not supported by the AutoML Team. Please use the supported version ([link](https://github.com/Azure/azureml-examples/tree/main/sdk/python/jobs/pipelines/1k_demand_forecasting_with_pipeline_components/automl-forecasting-demand-hierarchical-timeseries-in-pipeline))." + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -666,7 +673,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.6.8" + "version": "3.8.10" } }, "nbformat": 4, diff --git a/v1/python-sdk/tutorials/automl-with-azureml/forecasting-many-models/auto-ml-forecasting-many-models.ipynb b/v1/python-sdk/tutorials/automl-with-azureml/forecasting-many-models/auto-ml-forecasting-many-models.ipynb index ef122603a7..aab5043cb9 100644 --- a/v1/python-sdk/tutorials/automl-with-azureml/forecasting-many-models/auto-ml-forecasting-many-models.ipynb +++ b/v1/python-sdk/tutorials/automl-with-azureml/forecasting-many-models/auto-ml-forecasting-many-models.ipynb @@ -16,6 +16,13 @@ "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/automated-machine-learning/forecasting-hierarchical-timeseries/auto-ml-forecasting-hierarchical-timeseries.png)" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "!Important! This notebook is outdated and is not supported by the AutoML Team. Please use the supported version ([link](https://github.com/Azure/azureml-examples/tree/main/sdk/python/jobs/pipelines/1k_demand_forecasting_with_pipeline_components/automl-forecasting-demand-many-models-in-pipeline))." + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -306,7 +313,7 @@ "from azureml.core.compute import ComputeTarget, AmlCompute\n", "\n", "# Name your cluster\n", - "compute_name = \"mm-compute\"\n", + "compute_name = \"mm-compute-v1\"\n", "\n", "\n", "if compute_name in ws.compute_targets:\n", @@ -316,7 +323,7 @@ "else:\n", " print(\"Creating a new compute target...\")\n", " provisioning_config = AmlCompute.provisioning_configuration(\n", - " vm_size=\"STANDARD_D16S_V3\", max_nodes=20\n", + " vm_size=\"STANDARD_D14_V2\", max_nodes=20\n", " )\n", " # Create the compute target\n", " compute_target = ComputeTarget.create(ws, compute_name, provisioning_config)\n", @@ -864,9 +871,9 @@ "automated-machine-learning" ], "kernelspec": { - "display_name": "Python 3.8.5 ('base')", + "display_name": "Python 3.8 - AzureML", "language": "python", - "name": "python3" + "name": "python38-azureml" }, "language_info": { "codemirror_mode": { @@ -878,7 +885,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.8.5" + "version": "3.8.10" }, "vscode": { "interpreter": { diff --git a/v1/python-sdk/tutorials/automl-with-azureml/forecasting-orange-juice-sales/auto-ml-forecasting-orange-juice-sales.ipynb b/v1/python-sdk/tutorials/automl-with-azureml/forecasting-orange-juice-sales/auto-ml-forecasting-orange-juice-sales.ipynb index d770e2434c..0b1a7242cf 100644 --- a/v1/python-sdk/tutorials/automl-with-azureml/forecasting-orange-juice-sales/auto-ml-forecasting-orange-juice-sales.ipynb +++ b/v1/python-sdk/tutorials/automl-with-azureml/forecasting-orange-juice-sales/auto-ml-forecasting-orange-juice-sales.ipynb @@ -1,7 +1,6 @@ { "cells": [ { - "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ @@ -11,7 +10,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ @@ -19,7 +17,13 @@ ] }, { - "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "!Important! This notebook is outdated and is not supported by the AutoML Team. Please use the supported version ([link](https://github.com/Azure/azureml-examples/tree/main/sdk/python/jobs/automl-standalone-jobs/automl-forecasting-orange-juice-sales))." + ] + }, + { "cell_type": "markdown", "metadata": {}, "source": [ @@ -37,7 +41,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ @@ -50,7 +53,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ @@ -75,7 +77,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ @@ -92,7 +93,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ @@ -126,7 +126,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ @@ -166,7 +165,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ @@ -190,7 +188,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ @@ -211,7 +208,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ @@ -231,7 +227,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ @@ -264,7 +259,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ @@ -290,7 +284,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ @@ -307,7 +300,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ @@ -335,7 +327,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ @@ -374,7 +365,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ @@ -392,7 +382,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ @@ -466,7 +455,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ @@ -493,7 +481,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ @@ -513,7 +500,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ @@ -551,7 +537,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ @@ -572,7 +557,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ @@ -581,7 +565,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ @@ -610,7 +593,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ @@ -666,7 +648,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ @@ -674,7 +655,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ @@ -697,7 +677,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ @@ -717,7 +696,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ @@ -763,7 +741,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ @@ -812,7 +789,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ @@ -854,9 +830,9 @@ "friendly_name": "Forecasting orange juice sales with deployment", "index_order": 1, "kernelspec": { - "display_name": "Python 3.8.5 ('base')", + "display_name": "Python 3.8 - AzureML", "language": "python", - "name": "python3" + "name": "python38-azureml" }, "language_info": { "codemirror_mode": { @@ -868,7 +844,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.8.5" + "version": "3.8.10" }, "tags": [ "None" diff --git a/v1/python-sdk/tutorials/automl-with-azureml/forecasting-pipelines/auto-ml-forecasting-pipelines.ipynb b/v1/python-sdk/tutorials/automl-with-azureml/forecasting-pipelines/auto-ml-forecasting-pipelines.ipynb index 6277b7f013..90ff8bc552 100644 --- a/v1/python-sdk/tutorials/automl-with-azureml/forecasting-pipelines/auto-ml-forecasting-pipelines.ipynb +++ b/v1/python-sdk/tutorials/automl-with-azureml/forecasting-pipelines/auto-ml-forecasting-pipelines.ipynb @@ -1,5 +1,21 @@ { "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "!Important! This notebook is outdated and is not supported by the AutoML Team. Please use the supported version ([link](https://github.com/Azure/azureml-examples/tree/main/sdk/python/jobs/pipelines/1h_automl_in_pipeline/automl-forecasting-in-pipeline)).\n", + "\n", + "\n", + "\n", + "For examples illustrating how to build pipelines with components, please use the following links:\n", + "
Conclusion
\n", - "\n", - "Visual examination does not suggest clear seasonal patterns. We will set the STL_TYPE = None, and we will move to the next section that examines stationarity. \n", - "\n", - "\n", - "Say, we are working with a different data set that shows clear patterns of seasonality, we have several options for setting the settings:is hard to say which option will work best in your case, hence you will need to run both options to see which one results in more accurate forecasts. \n", - "Conclusion
\n", - "Since we found the original process to be non-stationary (contains unit root), we will have to model the data in first differences. As a result, we will set the DIFFERENCE_SERIES parameter to True." - ], - "metadata": {} - }, - { - "cell_type": "markdown", - "source": [ - "# 3 Check if there is a clear auto-regressive pattern\n", - "We need to determine if we should include lags of the target variable as features in order to improve forecast accuracy. To do this, we will examine the ACF and partial ACF (PACF) plots of the stationary series. In our case, it is a series in first differences.\n", - "\n", - "Conclusion
\n", - "Since we do not see a clear indication of an AR(p) process, we will not be using target lags and will set the TARGET_LAGS parameter to None." - ], - "metadata": {} - }, - { - "cell_type": "markdown", - "source": [ - "AutoML Experiment Settings
\n", - "Based on the analysis performed, we should try the following settings for the AutoML experiment and use them in the \"2_run_experiment\" notebook.\n", - "Conclusion
\n", + "\n", + "Visual examination does not suggest clear seasonal patterns. We will set the STL_TYPE = None, and we will move to the next section that examines stationarity. \n", + "\n", + "\n", + "Say, we are working with a different data set that shows clear patterns of seasonality, we have several options for setting the settings:is hard to say which option will work best in your case, hence you will need to run both options to see which one results in more accurate forecasts. \n", + "Conclusion
\n", + "Since we found the original process to be non-stationary (contains unit root), we will have to model the data in first differences. As a result, we will set the DIFFERENCE_SERIES parameter to True." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# 3 Check if there is a clear auto-regressive pattern\n", + "We need to determine if we should include lags of the target variable as features in order to improve forecast accuracy. To do this, we will examine the ACF and partial ACF (PACF) plots of the stationary series. In our case, it is a series in first differences.\n", + "\n", + "Conclusion
\n", + "Since we do not see a clear indication of an AR(p) process, we will not be using target lags and will set the TARGET_LAGS parameter to None." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "AutoML Experiment Settings
\n", + "Based on the analysis performed, we should try the following settings for the AutoML experiment and use them in the \"2_run_experiment\" notebook.\n", + "