Add preview label to HTS and MM notebooks and update data sources #2490

Merged
merged 4 commits into from
Jul 26, 2023
@@ -14,7 +14,14 @@
"metadata": {},
"source": [
"# Automated Machine Learning\n",
"**Demand Forecasting Using HTS**\n",
"\n",
"\n",
"## Demand Forecasting Using HTS (preview)\n",
"\n",
"> [!IMPORTANT]\n",
"> Items marked (preview) in this article are currently in public preview.\n",
"> The preview version is provided without a service level agreement, and it's not recommended for production workloads. Certain features might not be supported or might have constrained capabilities.\n",
"> For more information, see [Supplemental Terms of Use for Microsoft Azure Previews](https://azure.microsoft.com/support/legal/preview-supplemental-terms/).\n",
"\n",
"## Contents\n",
"1. [Introduction](#Introduction)\n",
@@ -33,7 +40,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## <pre> 1. Introduction <a id=\"Introduction\"></pre>\n",
"## 1. Introduction <a id=\"Introduction\">\n",
"\n",
"The objective of this notebook is to illustrate how to use the component-based AutoML hierarchical time series solution for demand forecasting tasks. It walks you through all stages of the model evaluation and production process, starting with data ingestion and concluding with batch endpoint deployment for production. Please see the following [link](placeholder) for a detailed description of hierarchical time series modeling.\n",
"\n",
@@ -48,7 +55,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## <pre> 2. Setup <a id=\"Setup\"></pre>"
"## 2. Setup <a id=\"Setup\">"
]
},
{
@@ -159,7 +166,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## <pre> 3. Compute <a id=\"Compute\"></pre>\n",
"## 3. Compute <a id=\"Compute\">\n",
"\n",
"#### Create or Attach existing AmlCompute\n",
"\n",
@@ -205,9 +212,22 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## <pre> 4. Data <a id=\"Data\"></pre>\n",
"## 4. Data <a id=\"Data\">\n",
"\n",
"For illustration purposes we use the UCI electricity data ([link](https://archive.ics.uci.edu/ml/datasets/ElectricityLoadDiagrams20112014#)). The original dataset contains electricity consumption data for 370 consumers measured at 15-minute intervals. For this demonstration, we have aggregated the data to an hourly frequency and converted it to kilowatt hours (kWh) for 10 customers. Each customer is assigned to one of two groups, as denoted by the entries in the `group_id` column.\n",
"\n",
"For illustration purposes we use the UCI electricity data ([link](https://archive.ics.uci.edu/ml/datasets/ElectricityLoadDiagrams20112014#)). The original dataset contains electricity consumption data for 370 consumers measured at 15-minute intervals. For this demonstration, we have aggregated the data to an hourly frequency and converted it to kilowatt hours (kWh) for 10 customers. Each customer is assigned to one of two groups, as denoted by the entries in the `group_id` column. The following cells read and print the first few rows of the training data and print the number of unique time series in the dataset."
"The data for this notebook is located in the `automl-sample-notebook-data` container in the datastore and is publicly available. In the next few cells, we will download the train, test and inference datasets from the public datastore and store them locally."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"train_data_path = \"https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/uci-demand-pipeline-data-hts/train/uci_electro_small_hts_train.parquet\"\n",
"test_data_path = \"https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/uci-demand-pipeline-data-hts/test/uci_electro_small_hts_test.parquet\"\n",
"inference_data_path = \"https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/uci-demand-pipeline-data-hts/inference/uci_electro_small_hts_inference.parquet\""
]
},
{
@@ -221,16 +241,41 @@
"hierarchy_column_names = [\"group_id\", \"customer_id\"]"
]
},
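The two entries in `hierarchy_column_names` define the levels of the hierarchy: customers roll up into groups. As a rough illustration only (the rows and the `usage` column name below are invented, and this is not a claim about how the AutoML components are implemented), summing customer-level values gives the group-level series:

```python
from collections import defaultdict

hierarchy_column_names = ["group_id", "customer_id"]

# Hypothetical rows: one reading per customer (values are invented).
rows = [
    {"group_id": "g1", "customer_id": "c1", "usage": 2.0},
    {"group_id": "g1", "customer_id": "c2", "usage": 3.5},
    {"group_id": "g2", "customer_id": "c3", "usage": 1.5},
]


def aggregate_to_level(rows, level_columns, value_column):
    """Sum value_column over every unique combination of level_columns."""
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[column] for column in level_columns)
        totals[key] += row[value_column]
    return dict(totals)


# Leaf level: one series per (group_id, customer_id) pair.
leaf_totals = aggregate_to_level(rows, hierarchy_column_names, "usage")
# Group level: customers in the same group are summed together.
group_totals = aggregate_to_level(rows, hierarchy_column_names[:1], "usage")
```

Here `leaf_totals` has one entry per customer, while `group_totals` has one entry per `group_id`.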
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def create_folder_and_save_as_parquet(file_uri, output_folder):\n",
" os.makedirs(output_folder, exist_ok=True)\n",
" data_frame = pd.read_parquet(file_uri)\n",
" file_name = os.path.split(file_uri)[-1]\n",
" data_path = os.path.join(output_folder, file_name)\n",
" data_frame.to_parquet(data_path, index=False)\n",
" return None\n",
"\n",
"\n",
"create_folder_and_save_as_parquet(train_data_path, \"./data/train\")\n",
"create_folder_and_save_as_parquet(test_data_path, \"./data/test\")\n",
"create_folder_and_save_as_parquet(inference_data_path, \"./data/inference\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The following cells read and print the first few rows of the training data as well as the number of unique time series in the dataset."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"dataset_type = \"train\"\n",
"df = pd.read_parquet(\n",
" f\"./data/{dataset_type}/uci_electro_small_mm_{dataset_type}.parquet\"\n",
")\n",
"df = pd.read_parquet(f\"./data/{dataset_type}\")\n",
"df.head(3)"
]
},
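Because `create_folder_and_save_as_parquet` keeps the file name from the URI, each folder under `./data` should end up holding a single parquet file, which is presumably why reading the folder itself is sufficient. A small sketch of the path logic only, with no download involved (the helper name `local_parquet_path` is ours, not part of the notebook):

```python
import os


def local_parquet_path(file_uri, output_folder):
    """Return the local path the helper would write to, without downloading."""
    # The last URI segment is reused as the local file name.
    file_name = os.path.split(file_uri)[-1]
    return os.path.join(output_folder, file_name)


train_data_path = "https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/uci-demand-pipeline-data-hts/train/uci_electro_small_hts_train.parquet"
local_path = local_parquet_path(train_data_path, "./data/train")
```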
@@ -257,7 +302,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## <pre> 5. Import Components From Registry <a id=\"ImportComponents\"></pre>\n",
"## 5. Import Components From Registry <a id=\"ImportComponents\">\n",
"\n",
"An Azure Machine Learning component is a self-contained piece of code that does one step in a machine learning pipeline. A component is analogous to a function - it has a name, inputs, outputs, and a body. Components are the building blocks of the Azure Machine Learning pipelines. It's a good engineering practice to build a machine learning pipeline where each step has well-defined inputs and outputs. In Azure Machine Learning, a component represents one reusable step in a pipeline. Components are designed to help improve the productivity of pipeline building. Specifically, components offer:\n",
"- Well-defined interface: Components require a well-defined interface (input and output). The interface allows the user to build steps and connect steps easily. The interface also hides the complex logic of a step and removes the burden of understanding how the step is implemented.\n",
@@ -413,7 +458,7 @@
"tags": []
},
"source": [
"## <pre> 6. Create a Pipeline <a id=\"CreatePipeline\"></pre>\n",
"## 6. Create a Pipeline <a id=\"CreatePipeline\">\n",
"\n",
"Now that we have imported the components, we will build an evaluation pipeline. This pipeline will allow us to partition the data, train the best models for each partition, generate rolling forecasts on the test set, and, finally, calculate metrics on the test set output."
]
@@ -635,7 +680,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## <pre> 7. Kick Off Pipeline Runs <a id=\"PipelineRuns\"></pre>\n",
"## 7. Kick Off Pipeline Runs <a id=\"PipelineRuns\">\n",
"\n",
"Now that the pipeline is defined, we will use it to kick off several runs. First, we will kick off an experiment which will train, inference and evaluate the performance for the best AutoML model for each `hierarchy_training_level`. Next, we will kick off the same pipeline which will only use the naive model for the same training level of the hierarchy. This will allow us to establish a baseline and compare performance results."
]
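For intuition about the baseline: a naive forecast usually means carrying the last observed value forward. The sketch below assumes that definition and uses invented numbers; the AutoML naive model and the compute metrics component may differ in detail.

```python
def naive_forecast(history, horizon):
    """Repeat the last observed value for every step of the horizon."""
    if not history:
        raise ValueError("history must contain at least one observation")
    return [history[-1]] * horizon


def mean_absolute_error(actuals, predictions):
    """Average absolute difference between paired actuals and predictions."""
    if len(actuals) != len(predictions):
        raise ValueError("actuals and predictions must have the same length")
    return sum(abs(a - p) for a, p in zip(actuals, predictions)) / len(actuals)


history = [10.0, 12.0, 11.0]  # invented hourly kWh readings
actuals = [12.0, 13.0, 9.0]  # invented test-set values
baseline = naive_forecast(history, horizon=len(actuals))
baseline_mae = mean_absolute_error(actuals, baseline)
```

A trained model is only worth keeping if its error on the same test window is lower than `baseline_mae`.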
@@ -810,7 +855,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## <pre> 8. Download Pipeline Output <a id=\"DownloadOutput\"></pre>\n",
"## 8. Download Pipeline Output <a id=\"DownloadOutput\">\n",
"Next, we will download the output files generated by the compute metrics components for each executed pipeline and save them in the corresponding subfolder of the `output` folder. First, we create the corresponding output directories. Then, we execute the `ml_client.jobs.download` command, which downloads the experiments' outputs."
]
},
@@ -878,7 +923,7 @@
}
},
"source": [
"## <pre> 9. Compare Evaluation Results <a id=\"CompareResults\"></pre>\n",
"## 9. Compare Evaluation Results <a id=\"CompareResults\">\n",
"\n",
"### 9.1. Examine Metrics\n",
"\n",
@@ -1090,7 +1135,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## <pre> 10. Deployment <a id=\"Deployment\"></pre>\n",
"## 10. Deployment <a id=\"Deployment\">\n",
"\n",
"In this section, we will illustrate how to deploy models and run inference using a batch endpoint. Batch endpoints are used to do batch inferencing on large volumes of data in an asynchronous way. Batch endpoints receive pointers to data and run jobs asynchronously to process the data in parallel on compute clusters and store outputs to a datastore for further analysis. For more information on batch endpoints, see this [link](https://learn.microsoft.com/en-us/azure/machine-learning/concept-endpoints-batch?view=azureml-api-2).\n",
"\n",
Binary file not shown.
Binary file not shown.
Binary file not shown.
@@ -14,7 +14,13 @@
"metadata": {},
"source": [
"# Automated Machine Learning\n",
"**Demand Forecasting Using Many Models**\n",
"\n",
"## Demand Forecasting Using Many Models (preview)\n",
"\n",
"> [!IMPORTANT]\n",
"> Items marked (preview) in this article are currently in public preview.\n",
"> The preview version is provided without a service level agreement, and it's not recommended for production workloads. Certain features might not be supported or might have constrained capabilities.\n",
"> For more information, see [Supplemental Terms of Use for Microsoft Azure Previews](https://azure.microsoft.com/support/legal/preview-supplemental-terms/).\n",
"\n",
"## Contents\n",
"1. [Introduction](#Introduction)\n",
@@ -33,7 +39,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## <pre>1. Introduction <a id=\"Introduction\"></pre>\n",
"## 1. Introduction <a id=\"Introduction\">\n",
"\n",
"The objective of this notebook is to illustrate how to use the component-based AutoML many models solution for demand forecasting tasks. It walks you through all stages of the model evaluation and production process, starting with data ingestion and concluding with batch endpoint deployment for production.\n",
"\n",
@@ -48,7 +54,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## <pre> 2. Setup <a id=\"Setup\"></pre>"
"## 2. Setup <a id=\"Setup\">"
]
},
{
@@ -159,7 +165,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## <pre> 3. Compute <a id=\"Compute\">\n",
"## 3. Compute <a id=\"Compute\">\n",
"\n",
"#### Create or Attach existing AmlCompute\n",
"\n",
@@ -205,7 +211,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## <pre> 4. Data <a id=\"Data\"></pre>\n",
"## 4. Data <a id=\"Data\">\n",
"\n",
"For illustration purposes we use the UCI electricity data ([link](https://archive.ics.uci.edu/ml/datasets/ElectricityLoadDiagrams20112014#)). The original dataset contains electricity consumption data for 370 consumers measured at 15-minute intervals. For this demonstration, we have aggregated the data to an hourly frequency and converted it to kilowatt hours (kWh) for 10 customers.\n",
"\n",
@@ -218,9 +224,9 @@
"metadata": {},
"outputs": [],
"source": [
"train_data_path = \"https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/uci_electro_small_public_mm_train/uci_electro_small_mm_train.csv\"\n",
"test_data_path = \"https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/uci_electro_small_public_mm_test/uci_electro_small_mm_test.csv\"\n",
"inference_data_path = \"https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/uci_electro_small_public_mm_infer/uci_electro_small_mm_inference.csv\""
"train_data_path = \"https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/uci-demand-pipeline-data-mm/train/uci_electro_small_mm_train.parquet\"\n",
"test_data_path = \"https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/uci-demand-pipeline-data-mm/test/uci_electro_small_mm_test.parquet\"\n",
"inference_data_path = \"https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/uci-demand-pipeline-data-mm/inference/uci_electro_small_mm_inference.parquet\""
]
},
{
@@ -242,11 +248,11 @@
"source": [
"def create_folder_and_save_as_parquet(file_uri, output_folder):\n",
" os.makedirs(output_folder, exist_ok=True)\n",
" data_frame = pd.read_csv(file_uri, parse_dates=[time_column_name])\n",
" file_name = os.path.splitext(os.path.split(file_uri)[-1])[0] + \".parquet\"\n",
" data_frame = pd.read_parquet(file_uri)\n",
" file_name = os.path.split(file_uri)[-1]\n",
" data_path = os.path.join(output_folder, file_name)\n",
" data_frame.to_parquet(data_path, index=False)\n",
" return data_frame\n",
" return None\n",
"\n",
"\n",
"create_folder_and_save_as_parquet(train_data_path, \"./data/train\")\n",
@@ -258,7 +264,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The following cells read and print the first few rows of the training data as well as print the number of unique time series in the data."
"The following cells read and print the first few rows of the training data as well as the number of unique time series in the dataset."
]
},
{
@@ -285,7 +291,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## <pre> 5. Import Components From Registry <a id=\"ImportComponents\"></pre>\n",
"## 5. Import Components From Registry <a id=\"ImportComponents\">\n",
"\n",
"An Azure Machine Learning component is a self-contained piece of code that does one step in a machine learning pipeline. A component is analogous to a function - it has a name, inputs, outputs, and a body. Components are the building blocks of the Azure Machine Learning pipelines. It's a good engineering practice to build a machine learning pipeline where each step has well-defined inputs and outputs. In Azure Machine Learning, a component represents one reusable step in a pipeline. Components are designed to help improve the productivity of pipeline building. Specifically, components offer:\n",
"- Well-defined interface: Components require a well-defined interface (input and output). The interface allows the user to build steps and connect steps easily. The interface also hides the complex logic of a step and removes the burden of understanding how the step is implemented.\n",
@@ -463,7 +469,7 @@
"tags": []
},
"source": [
"## <pre> 6. Create a Pipeline <a id=\"CreatePipeline\"></pre>\n",
"## 6. Create a Pipeline <a id=\"CreatePipeline\">\n",
"\n",
"Now that we have imported the components, we will build an evaluation pipeline. This pipeline will allow us to partition the data, train the best models for each partition, generate rolling forecasts on the test set, and, finally, calculate metrics on the test set output."
]
@@ -778,7 +784,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## <pre> 7. Kick Off Pipeline Runs <a id=\"PipelineRuns\"></pre>\n",
"## 7. Kick Off Pipeline Runs <a id=\"PipelineRuns\">\n",
"\n",
"Now that the pipeline is defined, we will use it to kick off several runs. First, we will kick off an experiment which will train, inference and evaluate the performance for the best AutoML model for each partition. Next, we will kick off the same pipeline which will only use the naive model for the same partitions. This will allow us to establish a baseline and compare performance results."
]
@@ -957,7 +963,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## <pre> 8. Download Pipeline Output <a id=\"DownloadOutput\"></pre>\n",
"## 8. Download Pipeline Output <a id=\"DownloadOutput\">\n",
"Next, we will download the output files generated by the compute metrics components for each executed pipeline and save them in the corresponding subfolder of the `output` folder. First, we create the corresponding output directories. Then, we execute the `ml_client.jobs.download` command, which downloads the experiments' outputs."
]
},
@@ -1023,7 +1029,7 @@
}
},
"source": [
"## <pre> 9. Compare Evaluation Results <a id=\"CompareResults\"></pre>\n",
"## 9. Compare Evaluation Results <a id=\"CompareResults\">\n",
"\n",
"### 9.1. Examine Metrics\n",
"\n",
@@ -1231,7 +1237,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## <pre> 10. Deployment <a id=\"Deployment\"></pre>\n",
"## 10. Deployment <a id=\"Deployment\">\n",
"\n",
"In this section, we will illustrate how to deploy models and run inference using a batch endpoint. Batch endpoints are used to do batch inferencing on large volumes of data in an asynchronous way. Batch endpoints receive pointers to data and run jobs asynchronously to process the data in parallel on compute clusters and store outputs to a datastore for further analysis. For more information on batch endpoints, see this [link](https://learn.microsoft.com/en-us/azure/machine-learning/concept-endpoints-batch?view=azureml-api-2).\n",
"\n",