From 67c0a10a34e7da13cae04131999a79c5068081a8 Mon Sep 17 00:00:00 2001
From: nick863 <30440255+nick863@users.noreply.github.com>
Date: Tue, 23 May 2023 19:34:57 -0700
Subject: [PATCH] Use metrics package in the notebook. (#2130)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* Amah/update e2e sample (#1898)
* updated the notebook
* updated the e2e sample notebook
* removed test value
* linting
* removed unwanted cell
Co-authored-by: Ali M
* Fixed kernels for V2 notebooks (#1902)
* Fix batch SDK sample script errors (#1910)
* Adding automated tests and README for SDK v2 tutorials (#1903)
* Adding test for AML in a day
* Update tutorials-azureml-in-a-day-azureml-in-a-day.yml
* Update tutorials-azureml-in-a-day-azureml-in-a-day.yml
* Update tutorials-azureml-in-a-day-azureml-in-a-day.yml
* Update azureml-in-a-day.ipynb
* Adding tests for more tutorials
* Update azureml-in-a-day.ipynb
* Adding README files
* Update readme.py
* Update azureml-in-a-day.ipynb
* Added Description to all files
* Update README.md
* Update readme.py
* Updated automl regression notebook to work with the latest matplotlib (#1901)
* Updated automl regression notebook to work with the latest matplotlib
* Updated generate_workflows.py based on move to v1 folder
* Updated automl_env files for SDK 1.47.0
* Revert "Revert Update v1 Many Models and HTS Notebook" (#1823)
* Revert "Revert "Update v1 Many Models and HTS Notebook" (#1763)"
  This reverts commit 13f44d077266b0ba200ec153563c8e65eae11c80.
* fix PR comments
* added new line at end
* fix black issue
Co-authored-by: Rahul Kumar
* fix: mlflow deployment NYC (#1908)
* Fixing git workflow for cli/standalone/multi-label image classification (#1904)
* renaming multi-label image classification git workflow; correcting yaml name
* Updates in github workflows from PR (https://github.com/Azure/azureml-examples/pull/1879) after running cli/readme.py
* Revert "Updates in github workflows from PR (https://github.com/Azure/azureml-examples/pull/1879) after running cli/readme.py"
  This reverts commit 3bdee939c9ea403d8bb22698a78c811d3108c687.
* Updates in github workflows after running cli/readme.py
* Updates in github workflows for model/job-model-as-output and model/job-model-as-input
* Removing workflows related to model-job-model-as-input/output
* Security scanner for Azure ML Compute Instance (#1755)
* Security scanner for Azure ML Compute Instance detecting malware and vulnerabilities
* Apply black formatting
* Apply doc template
* Address PR feedback
* Update doc
* Update README.md
* Address PR feedback
* Adding some basic samples for autogenerated SDKv2 for TypeScript language (#1895)
* Adding checks for k8 extension and check for attaching the compute to workspace
* Adding a note on the top of all the workflows about regenerating using python script
* Update header in the workflows
* Adding the template for account and container
* Update the resource names for November to recreate resources
* Adding some basic samples for autogenerated SDKv2 for TypeScript language
* Address the linting issues. (#1920)
* Adding checks for k8 extension and check for attaching the compute to workspace
* Adding a note on the top of all the workflows about regenerating using python script
* Update header in the workflows
* Adding the template for account and container
* Update the resource names for November to recreate resources
* Adding some basic samples for autogenerated SDKv2 for TypeScript language
* Reformat the files to address the linting issue
* add metadata (#1915)
  Add metadata to cells for docs
* add metadata to cells (#1916)
* rename automl nlp workflows (#1925)
* Rename automl-nlp-text-classification-multiclass-task-sentiment-mlflow.ipynb to automl-nlp-classification-multiclass-sentiment-mlflow.ipynb
  remove text and task from the name
* Rename automl-nlp-text-classification-multiclass-task-sentiment.ipynb to automl-nlp-classification-multiclass-sentiment.ipynb
* Rename automl-nlp-text-classification-multilabel-task-paper-cat.ipynb to automl-nlp-classification-multilabel-paper-cat.ipynb
* remove classification in the names
* workflow file changes and README.md
* remove original workflow files with long name
* Fix typo in title of distributed MNIST TensorFlow notebook #1922 (#1922)
* Update data.ipynb (#1918)
* Update data.ipynb
* Update data.ipynb
* Update data.ipynb
* Update data.ipynb
* Do not remove cpu-cluster compute (#1934)
Co-authored-by: Nikolay Rovinskiy
* Update workspace sample notebook (#1929)
* Update workspace sample notebook
* Fix errors in the sample notebook
* Update workspace.ipynb
* Update workspace.ipynb
  fix code formatting
* update formatting>
* Update the sdk_helper to clear the hacks
* revert sdk_helper hacks
* Update workspace.ipynb
  update ws name in the last cell for cleanup
* Update black formatting :*(
* Update kernelspec
* change to python310-sdkv2
Co-authored-by: Yisheng Yao
* add metadata to cells (#1917)
* add metadata to cells
* delete comma
* add metadata to cell (#1933)
* Update RAI CLI examples to use component version 0.3.0 (#1928)
* Update cli-responsibleaidashboard-housing-classification.yml
* Update cli-responsibleaidashboard-programmer-regression.yml
* rename local folder data example to avoid conflict (#1935)
* Regenerating the workflows and addressing the linting issues in the repo (#1941)
* nlp notebook build failed due to duplicated runs names (#1931)
* add wait() to delete deployment
* rename the endpoint name
* add back time string to endpoint
* reformat notebook code
* data.ipynb is not related to this PR, but build failed due to the reformat requirement
* endpoint name number of char 3 to 32
* Update data.ipynb
* revert change to data.ipynb
* revert change to data.ipynb
* Readdress safe rollout (#1884)
  Adjust syntax
* Fix tags (#1944)
* added preprocessing code (#1906)
* added preprocessing component/node in automl-pipelines example for sdk and cli
* Fix mlflow-for-batch-tabular notebook (#1937)
* Fix mlflow-for-batch-tabular notebook
* Wait for job to finish to download
* Update notebook
* Update path
* Fix format
Co-authored-by: nancy-mejia
* Fix custom output batch notebook (#1943)
* Fix custom output batch notebook
* Update format
Co-authored-by: nancy-mejia
* add metadata to another cell (#1948)
* add metadata to another cell
* fix metadata
* fix syntax
* Updating identity function for data sample notebook (authenticating in a job) (#1952)
Co-authored-by: Pritam Kumar Das
* replace stale base image (#1947)
* Update mldesigner component env style (#1954)
* update component env style
* rename and update env yaml
* AutoML Images: Generating MLTable on the fly; Prepping data in user specified folder (#1911)
* AutoML Images: Generating MLTable on the fly; downloading and prepping the data in user specified folder
* AutoML Images: Clearing output of notebooks
* AutoML Images: Cleaning up
* AutoML Images: black and black-nb formatting
* AutoML Images: refactoring
* AutoML Images: black formatting
* AutoML Images: black formatting
* AutoML Images: Fixing typo; refactoring
* Reverting metadata updates
* Fix cli-assets-model-job-model-as-input and cli-assets-model-job-model-as-output (#1949)
* Fix nyc_taxi_data_regression-create-and-deploy-model (#1958)
* updated model example structure (#1909)
* updated model example structure
* remove on needed file reference
* updated path
* added updated workflows
* Update README.md
* Santiagxf/aml batch fixes (#1936)
* test fixes
* Fix broken link (#1702)
* Migrate IcM generation to monitor (#1926)
* Remove IcMs from readme.py
* Update workflows
* Remove job folder under assets (#1960)
Co-authored-by: nancy-mejia
* move mlflow examples into python sdk v2 folder (#1942)
* move mlflow examples into python sdk v2 folder
* checking black and readme.py
* readme
Co-authored-by: Facundo Santiago
* Update RAI notebooks to add RAI built-in component version (#1957)
* Update responsibleaidashboard-diabetes-decision-making.ipynb
* Update responsibleaidashboard-diabetes-regression-model-debugging.ipynb
* Update responsibleaidashboard-housing-classification-model-debugging.ipynb
* Update responsibleaidashboard-housing-decision-making.ipynb
* Update responsibleaidashboard-programmer-regression-model-debugging.ipynb
* Update RAI components to 0.4.0 in RAI CLI examples (#1956)
* Update cli-responsibleaidashboard-housing-classification.yml
* Update cli-responsibleaidashboard-programmer-regression.yml
* updated mm related notebooks (#1959)
* updated mm related notebooks
* fix format for black
* fix PR comments
Co-authored-by: Rahul Kumar
* Updated classification vision notebooks with XAI (#1734)
* v1 multi-class batch scoring notebook
* v2 vision notebooks
* minor updates
* minor version updates
* column name update
* links updated
* removed multi-class batch scoring notebook
* link updates
* format changes
* format changes 1
* format changes
* format changes
* minor update
* minor updates
* format changes
* format changes
* minor updates
* link update
* minor update
* minor update
* minor update
* format changes
* AutoMode Examples (#1601)
* Multiclass mlflow image classification notebook with AutoMode.
* AutoMode for the rest of the notebooks.
* AutoMode via CLI.
* Reformat with black.
* Wait after AutoMode runs.
* Patch compute cluster in configuration files.
* Add GitHub actions for AutoMode CLI configuration files.
* Clarify description of AutoMode.
* Patch merge bugs.
* Lead with paragraph about AutoMode.
* Patch multilabel notebook.
* Lead with Automode in CLI examples.
* Move some configuration functions in AutoMode section.
* v1 multi-class batch scoring notebook
* v2 vision notebooks
* minor updates
* minor version updates
* column name update
* Attempt to fix AutoMode CLI example for multiclass classification.
* Reformat three main image notebooks.
* Fix setup for AutoMode YAML job definitions.
* Unbreak scoring code and data (fix bad merge).
* Newline character at end of multiclass-batch-scoring file.
* More fixing of bad merge.
* Newline character at end of multiclass batch scoring notebook.
* Use the term AutoMode side-by-side with automatic sweeps in notebooks.
* Use the term AutoMode side-by-side with automatic sweeps in CLI instructions.
* Make YAML files for AutoMode look more similar to regular sweeps.
Co-authored-by: Vadthyavath Ram <7171558+vadthyavath@users.noreply.github.com>
* update scikit-image to 0.19.3 (#1974)
* Avoid warning in Many-Models Notebook (#1971)
* avoid warning
* update reason for dropping column
* update data_preprocessing_tabular script
Co-authored-by: Rahul Kumar
* distributed training of yolov5 files (#1896)
* distributed training of yolov5 files
* Add files via upload
  editing connecting to workspace cells.
* Add files via upload
  ran black on the notebook for formatting
* Add files via upload
  edited the handle to workspace (ml_client)
* Add files via upload
  renamed compute_name to "gpu-cluster"
* Delete sdk/python/jobs/single-step/pytorch/distributed-training-yolov5/yolov5/datasets directory
  deleting dataset directory
* Delete sdk/python/jobs/single-step/pytorch/distributed-training-yolov5/yolov5/data directory
  deleted data folder and is uploaded in https://azuremlexamples.blob.core.windows.net/datasets/yolov5/data/
* Add files via upload
  Changed the input data path to azure blob storage
* Delete .pre-commit-config.yaml
* Remove timeouts for AutoMode to reduce potential for confusion (#1968)
* Remove timeout for AutoMode examples.
* Reformat.
* Update sdk/python/jobs/automl-standalone-jobs/automl-image-classification-multiclass-task-fridge-items/automl-image-classification-multiclass-task-fridge-items.ipynb
Co-authored-by: sharma-riti <52715641+sharma-riti@users.noreply.github.com>
* Update sdk/python/jobs/automl-standalone-jobs/automl-image-classification-multilabel-task-fridge-items/automl-image-classification-multilabel-task-fridge-items.ipynb
Co-authored-by: sharma-riti <52715641+sharma-riti@users.noreply.github.com>
* Update sdk/python/jobs/automl-standalone-jobs/automl-image-object-detection-task-fridge-items/automl-image-object-detection-task-fridge-items.ipynb
Co-authored-by: sharma-riti <52715641+sharma-riti@users.noreply.github.com>
* Update sdk/python/jobs/automl-standalone-jobs/automl-image-instance-segmentation-task-fridge-items/automl-image-instance-segmentation-task-fridge-items.ipynb
Co-authored-by: sharma-riti <52715641+sharma-riti@users.noreply.github.com>
* Fix the suggestions.
Co-authored-by: sharma-riti <52715641+sharma-riti@users.noreply.github.com>
* fix amlarc workflow (#1976)
* install docker dependency
* install docker package before local deployment
* set single step spark directly
* update filter
* fix wrong directory
* remove 4b_datastore_datapath_uri
* update scikit-image version for automl pipeline (#1980)
* deleting the "using-mlflow" folder from v1 (#1963)
* Move from loading JSON to CSV during inference. (#1973)
* Directly take the csv file from the batch endpoint
* Change file extensions and comments
* Fix
* Add separator notation.
* Add option to use the comma instead of tab separated files.
Co-authored-by: Nikolay Rovinskiy
* Fix documentation to notebooks (#1966)
* Patch hf-textgen and simple deployment (#1979)
* fix: image_classification_with_densenet (#1981)
* fix: AZ_BATCH_JOB_ID => AZUREML_RUN_ID
* fix: black
* Parse forecast origin (#1983)
Co-authored-by: Nikolay Rovinskiy
* Changed to use `ml_client.online_endpoints` (#1969)
  changed from `ml_client.begin_create_or_update` to `ml_client.online_endpoints.begin_create_or_update`
* [AutoML Images] Add mltable asset links for sdk (#1970)
* AutoML Images: Add mltable asset links; update docs
* renaming classes back to labels
* Reverting metadata update
* Fixing typo
* Fixing typo
* Refactoring mltable to MLTable
* Update sdk helper to include cli (#1939)
  update max_trials for cli as well
* add mltable asset doc links (#1965)
Co-authored-by: Nick Mecklenburg
* add codegen v2 example (#1987)
* add codegen example v2
* add explanation
* add mlflow import
* add mlflow in yaml
* add tracking uri
* Remove old RAI examples (#1988)
* remove RAI examples under Pipelines
* remove related GitHub Actions
* Custom deployments in Online Endpoints with MLflow (#1978)
* move mlflow examples into python sdk v2 folder
* checking black and readme.py
* readme
* feat: new example for MLflow custom deployment
* build
* fix: PR comments
Co-authored-by: msakande <17515964+msakande@users.noreply.github.com>
* Update CIFAR-10 examples distribution config (#1989)
* try new distribution config
* update CLI example config
* Increase maximum nodes on clusters (#1992)
* remove extra leading space in component command (#1993)
* [bhavanatumma/mlflow-model-deploy-v3] - conflict resolution with new … (#1990)
* [bhavanatumma/mlflow-model-deploy-v3] - conflict resolution with new and workflow fix
* [bhavanatumma/mlflow-model-deploy-v3] - papermill fix for the flow
* [bhavanatumma/mlflow-model-deploy-v3] - papermill fix for the flow
* [bhavanatumma/mlflow-model-deploy-v3] - papermill fix for the flow
Co-authored-by: bhavanatumma
* fix wrong filter (#1985)
* fix wrong filter
* add more time for automl-dynamic job
* remove duplicate step
* Address LROPoller error message that appears when cpu-cluster is created for the first time (#1995)
* Address LROPoller error message that appears when cpu-cluster is created for the first time
* Removing local deployments
* MLflow refresh for 2.0 (#2000)
* feat: changes
* comments
* formatting
* fix
* fix: adding predict with deployment client
* update markdown cells to sync with doc (#2001)
* Avoid use common path as output path (#1996)
* Avoid use common path as output path in azureml-examples
* update
* update
* update
* update readme
Co-authored-by: Ying Chen <2601502859@qq.com>
* remove old RAI examples from README (#2004)
* Changed process_count_per_instance to 1 for NC6 compute (#2005)
* remove model loading (#1975)
Co-authored-by: Miseon Park
* Refactor the script for adding identity with compute (#2014)
* Refactor the script for adding identity with compute
* fix (#2017)
* minor edit of markdown content (#2019)
* Address begin delete (#2011)
  Change begin delete statement to address intermittent failures in:
  - Safe Rollout
  - SAI/UAI
* Removing step to archive anonymous model on triton deployment (#1997)
* Remove archiving model step
  No need to archive an anonymous model registered by the system. Eventually, this action will trigger an error given it is a stage transition not allowed.
* Removing archive model step
  No need to archive an anonymous model registered by the system. Eventually, this action will trigger an error given it is a stage transition not allowed.
* Update online-endpoints-triton-cc.ipynb
* Update online-endpoints-triton.ipynb
* Check for failed pipeline. (#2013)
Co-authored-by: Nikolay Rovinskiy
* Create workflows to run pipeline job using registry components (#1900)
Co-authored-by: Ubuntu
* Format with black (#2021)
* adjust multilabel notebook to adapt prediction (#1972)
  data format change
* Changed Advice of n_cross_validations In V2 Notebooks (#2029)
* Update auto-ml-forecasting-bike-share.ipynb
* Update automl-forecasting-orange-juice-sales-mlflow.ipynb
* Update automl-forecasting-task-energy-demand-advanced.ipynb
* Update automl-forecasting-in-pipeline.ipynb
* [AutoML-Image] MLFlow Inference: Decode the image utf-8 str (#2027)
* Decode the image read to utf-8 encoding
* Added escape char before double quotes in raw .ipynb file
* Formatted notebooks as per black-nb
* delay registry component workflows by 59 min (#2033)
* delay registry component workflows by 59 min
* fix cleanup.sh
Co-authored-by: Ubuntu
Co-authored-by: Jun Qi
* Update azureml-getting-started-studio.ipynb (#1849)
* Update for azure-ai-ml==1.3.0 and ml extension 2.13.0 (#2007)
* Add SDK release candidate
* Add CLI release candidate
* Update sdk release candidate
* Add Default Values for Environment Variables in Workspace Connections Notebook (#2010)
* Add Default Value for ACR Username/Password
* add defaults for git_pat and python_feed_sas
* Update connections.ipynb
* Update connections.ipynb
* update workflow file
* add step in readme to update workflow with env vars
* revert deletion
* revert changes to notebook
* Update connections.ipynb
* Update sdk release candidate
* Update SDK release candidate
* Use released versions of azure-ai-ml (version 1.3.0) and ml extension (version 2.13.0)
Co-authored-by: Neehar Duvvuri <40341266+needuv@users.noreply.github.com>
* Update the readme.py script in cli folder to work on Windows as well as Linux (#2035)
* Updated the cli readme.py to support Windows
* Changing the json request folder location to align better with the docs (#2036)
* Update instance type to Standard_DS3_V2 (#2037)
* Ran sdk/python/readme.py (#2040)
  This fixed the kernel for some notebooks and cleaned some whitespace in workflow files.
* Add deepspeed autotuning and training examples (#2028)
* add deepspeed example
* fix mlflow logging in train.py
* move examples to cli folder
* move to jobs, add job.yaml files
* generate workflow file
* add comment in train.py
* add latest tag to env
* try different way of starting run
* set up job as pipeline
* move to pipelines folder
* add temp yaml file to pass validation
* modify readme.py to support deepspeed
* move generate-yml script
* recreate workflow files.
* move generated key location
* change environment
* change env for both examples
* fix env
* try v100 computes
* create nd40 cluster
* add compute create line
* change max number of nodes
* change data directory
* change command to block
* move data files into src
* remove unused data thing
* change data file type
* fix mlflow get run fail
* change command style in job
* change location of job
* change generate key path
* change code: path
* rename results folder
* change back key path
* change max number of nd40 nodes
* try moving autotuning folder into src
* change output file name
* try overwriting output
* add dockerfile for custom env
* change env in yaml
* update README and move autotune example
* move dockerfile to parent dir
* Fixed readme.py loading notebook (#2041)
* resolve log_param conflict issue (#2045)
* resolve log_param conflict issue
* resolve black format
---------
Co-authored-by: Clement Wang
* lock jupyter-client version (#2046)
* Changed schedule to reduce quota usage (#2043)
* Changed schedule to reduce quota usage
* Changed to schedule every 12 hours
* feat: remove reference to deprecated json schema (#2047)
* Run readme.py on two remaining notebooks (#2053)
* Changed schedule for cli samples to run every 12 hours at random times (#2054)
* fix kubernetes deployment resources (#2055)
* rectified resource sections under new rules
* remove redundant changes
* remove redundant changes
---------
Co-authored-by: Mingda Jia
* fix code cell (#2039)
* fix code cell
  If creating the compute cluster, the print didn't work. This change fixes that. (Default code won't hit this, because the compute cluster already exists; it only occurs when that default cluster is not present.)
* Update azureml-in-a-day.ipynb
* Update azureml-in-a-day.ipynb
* move print and revert var change
* Add working-directory for sampledata creation (#2059)
* Fix deepspeed examples bugs (#2044)
* add timeout for deepspeed jobs
* re format readme with black
* change timeout length
* change dockerfile to use acpt image
* add training custom env
* fix hostfile bug
* fix bash generation
* address comments
* increase number of gpus being used
* make sure deepspeed is upgraded to latest version
* write to hostfile in single process
* Init (#2065)
* Prs examples (#2064)
* Update PRS examples
* Update PRS examples
* Update prs examples
* Update PRS examples
* Update format
* Update by comments
* fix sdk default datastore issue.
* Update wording
* [ML][Pipelines]Remove DLLogger dependency from pipelines samples (#2063)
* remove DLLogger dependency
* update
* fix black
---------
Co-authored-by: Clement Wang
* Update RAI component version to 0.5.0 in CLI examples (#2067)
* Update cli-responsibleaidashboard-housing-classification.yml
* Update cli-responsibleaidashboard-programmer-regression.yml
* Rename MLFlow Sample to NCD (#2069)
* Init
* Readme
* [add-codegen-block] - re-added codegen part (#2070)
* [add-codegen-block] - re-added codegen part
* [bhavanatumma/add-codegen-block] - review comments
---------
Co-authored-by: bhavanatumma
* RAI housing decision making notebook: Change metrics to regression metrics (#2076)
  So far, the scorecard metrics were classification metrics, leading to a validation error during execution. Changing to regression metrics fixes the problem.
* Fix bug where single node cannot be used (#2077)
* add hostfile check with only one node
* change logic
* add changes to training example
* update data store path (#2079)
* update data store path
* output file
* sleep
* using folder
* Update tensorflow version to 2.1.4 (#2081)
* Update tensorflow version to 2.1.4
* Revert change for failing cli examples
* Upgrade tensorflow in cli jobs
* Revert upgrade tensorflow in failing jobs
* Unblock create_registry PR Check (#2086)
* delete this .sh to unblock the PR check. The creation test is somewhere else.
* deleted related workflow
* [PRS] Install mltable>=1.2.0 in the user conda env (#2082)
* install dependency
* unpin version
* refine
* refine
* refine
* pin 1.2.0
* refine
* tf1
---------
Co-authored-by: Xiaole Wen
* Add role assignment cleanup and workaround for the exit code 100 error. (#2049)
* Add role assignment cleanup
* Add verbose logging for troubleshooting the exit code 100 issue
* bypass the error for grub-efi-amd64-signed
* Adding the redirections back
* minor updates added to AzureML in day file (#2085)
* minor updates added to AzureML in day file
* add text to modify notebook
* fix meta data
* fix meta data
---------
Co-authored-by: Kenwiggan
Co-authored-by: Sheri Gilley
* Fix deploy-custom-container-mlflow-multiployment-scikit.sh after mlflow folder rename (#2091)
* [Component/Pipeline]: Add Typescript/JavaScript samples for SDKv2 (#2080)
* update
* update
* [PipelineJob]: Add Typescript/JavaScript samples for SDKv2 (#2087)
* add sample
* update
* update
* fix indent
* Update sdk/typescript/src/resources/jobs/pipelines/pipelineJobCreateOrUpdateSample.ts
Co-authored-by: guanghechen <42513619+guanghechen@users.noreply.github.com>
* update gitignore
* update folder structure
* remove not supported delete sample
---------
Co-authored-by: guanghechen <42513619+guanghechen@users.noreply.github.com>
Co-authored-by: Clement Wang
---------
Co-authored-by: Ying Chen <2601502859@qq.com>
Co-authored-by: Han Wang
Co-authored-by: guanghechen <42513619+guanghechen@users.noreply.github.com>
Co-authored-by: Clement Wang
* Update RAI notebook: enable use_model_dependency and add metadata example (#2089)
* Update responsibleaidashboard-diabetes-decision-making.ipynb
* Update responsibleaidashboard-diabetes-regression-model-debugging.ipynb
* Update responsibleaidashboard-housing-classification-model-debugging.ipynb
* Update responsibleaidashboard-housing-decision-making.ipynb
* Update responsibleaidashboard-programmer-regression-model-debugging.ipynb
* Change 18.04 to 20.04 in MOE (#2078)
* Update config notebook links for the forecasting notebooks (#2095)
* changed link in energy demand
* updated URL in remaining notebooks
* Updated automl_env.yml to SDK 1.49.0 (#2096)
* Updated automl_env.yml to SDK 1.49.0
* Simplified automl_env_linux.yml
* Changed validation to use ubuntu 20.04
* Updated papermill to avoid conflict with packaging package
* Added pip install azureml-core for creating sampledata (#2008)
* Samuel100/using mltable (#2083)
* init using-mltable
* print data
* deleted created artifact
* following contributing guidelines
* remove from_config for ws details
* formatted train.py
* removing files not in this PR
* added local-to-cloud example
* local-to-cloud example added
* removed nuisance files
* Fix cli scripts batch score test automation (#2108)
* Update DeepSpeed Training example launcher (#2103)
* change training example
* fix workflow
* fix syntax
* update readmes
* edit readme
* remove env variables
* Demand forecasting data prep and TCN (#2101)
* initial commit after deleting previous branch due to rebase issues
* data prep notebook
* generated yml files and modified README.md
* changed the dataset name for data from public blob. This is necessary because the data-prep notebook uploads the data with the same name and causes version mismatch. Hence, the test and train data have different grains
* added create new version option to dataset registration
* addressed all PR comments
* removed variable that was referenced before declaration
* changed compute target creation cell from markdown to code
* Schema improvements for Interactive Jobs (#2109)
* new schema changes
* black format
* First V2 notebook validation (#2102)
* First V2 notebook validation
* Updated MLFlow download_artifact to remove warning
* Moved V1 validation definitions to under .github (#2117)
* Moved V1 validation definitions to under .github for consistency with V2 validation definitions
* updated URL in TCN and data prep notebooks (#2118)
* Added validation for more V2 notebooks (#2110)
* Added validation for more V2 notebooks
* Fixed deprecation warnings
* Add mlflow for validation
* increase timeout for automl-dynamic jobs (#2122)
* Use metrics package to calculate metrics
* Use metrics package to calculate metrics
* Changed training V1 notebooks to ubuntu 20.04 (#2100)
* Changed training V1 notebooks to ubuntu 20.04
* Updated requirements.txt to speed up install
* Updated creds, workspace and resource group
* Added pin for mlflow-skinny
* update service principal (#2133)
* delete registry after creation (#2107)
* delete registry after creation
* always run deletion
* init env
* set working dir
* add sdk_helper
* add sub_id
* edit for debugging
* try debugging
* try deletion inside the notebook
* delete in the same step
* print registry name for debugging
* remove deletion for testing.
* change everything back for final pr
* modify format
* make deletion inside the ipnb
* try sdk deletion
* try sdk
* use verified api.
* change format
* Update registry-create.ipynb
* Users/anksing/release 1.5.0 (#2140)
* Update share-models-components-environments.ipynb
* Update online-endpoints-keyvault.ipynb
* Update share-models-components-environments.ipynb
* Adding best practices for large scale deep learning (#2144)
  Adding best-practices for large-scale deep learning workloads.
* Get started tutorials (#2111)
  New tutorial series
* Wait for delete deployment (#2145)
Co-authored-by: nancy-mejia
* Fixing broken links in the BestPractices folder. (#2146)
  Fixing broken links under BestPractices folder, used relative paths.
* Added basic validation for automl nlp notebooks (#2114)
* Added basic validation for automl nlp notebooks
* Added exception for Readonly warning
* Updated download_artifacts
* Fixed check endpoint exception
* more MLTable example notebooks (#2131)
* more MLTable example notebooks
* included requirements file install
* updated requirements file
* pinned package versions
* fixed typo
* fixed the file path
* upversioned azureml-dataprep
* in cmd job ensure latest asset version is pulled
* updated versioning strategy
* removed delta example
* remove delta gh workflow as example removed
* Update deploy-model notebook (#2149)
Co-authored-by: nancy-mejia
* Table of contents fix and added smoke yaml (#2148)
* Table of contents fix and added smoke yaml
* added sample page_type
* changed description
* updated description
* metadata and md updates (#2150)
* Update README.md (#2151)
* Update README.md
* resolved comments
* removed space
* Changed Monitoring and optimization to Bold as well.
* Update README.md (#2147) fix typo * Reindexing the quantiles dataframe (#2143) * Reindexing the quantiles dataframe * Updated fix for reindex issue * [bhavanatumma/forecasting-revert] - revert batch deployment changes (#2073) * [bhavanatumma/forecasting-revert] - revert batch deployment changes * [bhavanatumma/forecasting-revert] - revert batch deployment changes * [bhavanatumma/forecasting-revert] - edited readme * [bhavanatumma/forecasting-revert] - adding workflow * [bhavanatumma/forecasting-revert] - readme modification * update image for inference --------- Co-authored-by: bhavanatumma Co-authored-by: Rahul Kumar * Added validation for forecast automl notebooks (#2132) * Added validation for forecast automl notebooks * Ignore downloading artifact message * Update nebula.md for support of memory buffer size (#2153) * remove experiment_name (#2154) * remove experiment_name * remove bullet * Batch Endpoints refactor of examples (#2158) * batch endpoint CLI folders --------- Co-authored-by: santiagxf * Remove batch-cluster from infra cleanup script (#2157) * update * update * sample notebook for data asset in registry (#2167) * Quick fixes for files referenced on docs (#2164) * batch endpoint CLI folders * fixes * docs sync * black --------- Co-authored-by: santiagxf * fixing languages section in the sample metadata (#2171) * fixing languages in the sample metadata * fixing title in the regression automl CLI pipeline sample * Fixes (#2173) * Fix data for hello-iris-datastore-folder and cli-jobs-basics-hello-iris-datastore-file (#2179) * Fixed Pipelines readme files that prevents samples from being indexed. (#2178) * giving unique titles to each of the pipelines samples * improved description on TF sample * Clean old resource group as well (#2180) * Clean old resource group as well * Added escape in regular expression for "AzureML Metrics Writer (preview)". 
So that more Role Assignments are cleaned * Fix share-models-components-environments notebook (#2185) Co-authored-by: nancy-mejia * adjust order to align with CLI/Studio samples (#2168) (#2169) * Update share-models-components-environments notebook (#2187) Co-authored-by: nancy-mejia * Changing readme under hierarchical timeseries folder (#2194) * creating first level markdown title. * Lochen/pipeline component pup (#2090) * init * add identity * init * re-structure * update * update * update readme * update * bring back spark component example * fix typos * fix typo * add inputs * add tag for doc * add CI * remove pipeline component * remove is_deterministic * reformat --------- Co-authored-by: Clement Wang * For data-import feature public preview (#2165) * For data-import feature public preview * Fixed broken smoke test test case * Fixing failed smoketests * smoke test fixes * format fixes * Fix smoketest error * updated package to fix smoketest * smoketest fixes * smoke test format fixes * format corrected * package name corrected * package name correction * Model import Notebook and workflow (#2172) * example added to use registred component to create pipeline * reformatted notebook with black * reformatted notebook with black * reformatted notebook with black * reformatted notebook with black * changed model example from ai4bharat to hugging face * changed model example from ai4bharat to hugging face * Update automl-classification-bankmarketing-in-pipeline.ipynb * Update automl-classification-bankmarketing-in-pipeline.ipynb * updated notebook * Update pipeline_with_registered_components.ipynb * Update pipeline_with_registered_components.ipynb * Update pipeline_with_registered_components.ipynb * Update pipeline_with_registered_component_in_registry.ipynb * Update bert-base-uncased-test.json * notebook added for pipeline component * notebook added using pipeline component and individual components * notebook added using pipeline component and individual 
components * notebook update with bug bash review * Update Model Import Pipeline.ipynb * Update pipeline_with_registered_components.ipynb * Update pipeline_with_registered_component_in_registry.ipynb * Update pipeline_with_registered_components.ipynb * Update Model Import Pipeline.ipynb * Update pipeline_with_registered_component_in_registry.ipynb * notebook update * notebook updated with review changes * notebook updated with review changes * review comments implemented * updated the folder structure * Update import_model_into_registry.ipynb * notebook updated * code updated * workflow added * workflow added * registry updated * notebook updated for preview-test1 * review comments addressed * review comments addressed * error for false task value updated in description * cell added to show the registered model * header fonts changed * nit formatted * Update online-endpoints-simple-deployment.ipynb * Update online-endpoints-simple-deployment.ipynb * Update online-endpoints-simple-deployment.ipynb * Update online-endpoints-simple-deployment.ipynb * headers changed * registry name changed * string to int for version logic * nit formatted * nit formatted * Removed a modified file from pull request * Removed a modified file from pull request * Update import_model_into_registry.ipynb * Update import_model_into_registry.ipynb * Update import_model_into_registry.ipynb * Update import_model_into_registry.ipynb * Update import_model_into_registry.ipynb * image updated in notebook * Update import_model_into_registry.ipynb * Update import_model_into_registry.ipynb * Update import_model_into_registry.ipynb * Update import_model_into_registry.ipynb * Update import_model_into_registry.ipynb * Update import_model_into_registry.ipynb * Add workflows for all endpoints (#2203) * Add workflows for all endpoints * Fixed online vs batch endpoint workflow command bug * Testing workflows catting endpoint .yml files * Removed all newly generated non-endpoint related workflows that were
under endpoint folder * Testing workflows deleting endpoints before creation if exists * Add continue on failure for delete endpoint * Testing replacing endpoint names during action and deleting * Testing extra delete section * Testing -n over -f * Testing new endpoint names * Reduced endpoint names to 32 chars max * Replaced lowercase endpoint_name with uppercase, added quotes * Removed uai create endpoint workflow * Added create deployment sections to workflows if exist * Standardized all usages of conda*.yml to use .yaml ending in cli/endpoints * Standardized all usages of conda*.yml to use .yaml ending in cli/endpoints * Used same modified endpoint name for deployments in workflow * Temporarily removed deployments and endpoints which were failing due to malformed .yml and other transient reasons * Removed extra print statement and comment * Added missing name for deployment * Added 1-sai-create-endpoint to ignore list * updating python version (#2196) (#2197) * updating python version (#2196) Co-authored-by: Raghu Ramaswamy * Added endpoint paths to trigger create deployment workflows to test Python ver change * Removed unrelated workflows that are failing * Removed unrelated workflows * Ignore kubernetes green deployment * Fixed typo in pip and joblib versions --------- Co-authored-by: Raghu Ramaswamy <13340619+raghutillu@users.noreply.github.com> Co-authored-by: Raghu Ramaswamy * Rename conda.yml to conda.yaml (#2206) * Rename conda.yml to conda.yaml * Re-ran sdk/python/readme.py to update workflows * Shorten validation configuration file paths (#2214) * Shorten validation configuration file paths * use nc6s_v3 gpu for OD tasks (#2213) * use nc6s_v3 gpu for OD tasks * update gpu name * delta example and up-version mltable (#2205) * add fast-checkpoint examples for nebulaml (#2123) * add nebulaml examples readme * add nebulaml examples readme * remove unused readme * add some fix * run python readme.py * add cifar10 case * Add README file and other examples of 
Nebula * reformat files * reformat files in the nebulaml examples * reformat files in the nebulaml examples --------- Co-authored-by: luwei Co-authored-by: Ziqi Wang * Adding OBO sample (#2193) * Adding OBO sample * Formatted file * Review comments to add readme * Adding readme for obo sample * Updating language for sample * Update cli/jobs/single-step/on-behalf-of/README.md Co-authored-by: Bala P V <33712765+balapv@users.noreply.github.com> * Update cli/jobs/single-step/on-behalf-of/README.md Co-authored-by: Bala P V <33712765+balapv@users.noreply.github.com> * Update job.yaml --------- Co-authored-by: Bala P V <33712765+balapv@users.noreply.github.com> * fix od batch scoring notebook (#2220) * fix od batch scoring notebook * format file * update broken path * Update RAI component version to 0.7.0 in CLI examples (#2230) * Update RAI component version to 0.7.0 * Update cli-responsibleaidashboard-programmer-regression.yml * Santiagxf/patch scripts docs (#2232) * Update deploy-and-run.sh * Update deploy-and-run.sh * Update deploy-and-run.sh * Update deploy-and-run.sh * Update deploy-and-run.sh * Upgrade the mode training environment for RAI CLI examples (#2231) * Upgrade the mode training environment for RAI CLI examples * Update train_programmers.yml * Change torch_nebula to nebulaml (#2219) * Change torch_nebula to nebulaml * Renaming the doc as README file * Rename the package name from torch_nebula to nebulaml --------- Co-authored-by: xiaoranli Co-authored-by: Ziqi Wang * Mabables/foundation models (#2182) * foundation model samples - text classification - emotion detection `date` * foundation model samples - text classification - emotion detection Wed Mar 15 23:03:38 PDT 2023 * foundation model samples - text classification - emotion detection Wed Mar 15 23:13:42 PDT 2023 * foundation model samples - text classification - emotion detection Wed Mar 22 11:15:27 PDT 2023 * foundation model samples - text classification - emotion detection Wed Mar 22 17:03:46 PDT
2023 * foundation model samples - text classification - emotion detection Wed Mar 22 18:43:05 PDT 2023 * foundation model samples - text classification - emotion detection Wed Mar 22 21:56:19 PDT 2023 * foundation model samples - text classification - emotion detection Wed Mar 22 22:02:51 PDT 2023 * foundation model samples - text classification - emotion detection Wed Mar 22 23:02:34 PDT 2023 * foundation model samples - text classification - emotion detection Wed Mar 22 23:10:16 PDT 2023 * foundation model samples - text classification - emotion detection Thu Mar 23 11:09:07 PDT 2023 * summarization sample * ft notebooks - fixes for eval and latest label * inference sample (#2159) * emotion detection cli sample * cli sample emotion detection * cli sample emotion detection * cli sample emotion detection * cli sample emotion detection * fill mask inference cli sample * fill mask inference cli sample * Updating FT notebooks and CLI examples (#2160) * notebook samples added for translation and token classification * Adding cli examples --------- Co-authored-by: HrishikeshGeedMS * mlflow for metrics * adding eval components * FT notebook updates - added dependencies installation, command to find number of GPUs in SKU and Metrics (#2162) * Adding requirements install, cmd to find gpu number and metrics to notebooks * Adding deployment supported SKU list to notebooks * Removing version dependency for azure-ai-ml * Updating gpus query to python code * renaming compute_model_selector -> compute_model_import * evaluation text classification sample * renaming compute_model_selector -> compute_model_import * evaluation text classification sample * evaluation text classification sample * evaluation text classification sample * correcting the evaluation config extension from jsonl -> json * updating the logic to calculate the gpu count * evaluation text classification sample * inference text classification sample * Adding github workflows for FT notebooks * model eval
dashboard screenshot * Inference samples for all tasks (#2175) * Sample scripts for fill-mask and translation * Add generic inference scripts * remove task-specific folders * remove whisper files * Update sample inputs * use task specific scripts * Sample scripts for fill-mask and translation * Add generic inference scripts * remove task-specific folders * remove whisper files * Update sample inputs * use task specific scripts * Sample scripts for fill-mask and translation * Add generic inference scripts * remove task-specific folders * remove whisper files * Update sample inputs * use task specific scripts * Add generic inference scripts * remove task-specific folders * remove whisper files * Update sample inputs * use task specific scripts * inference samples for all tasks * Add sample script for text-classification * Remove generic samples * remove token-classification samples * delete extra dataset * Add token-classification-samples (#2176) * Model Evaluation sample notebooks * Model Evaluation sample notebooks - Removing cell outputs * Model Evaluation sample notebooks - Cleaning up notebooks * Model Evaluation sample notebooks - Adding Git actions * Model Evaluation sample notebooks - Renaming cluster name * Model Evaluation sample notebooks - Adding model evaluation dashboard screenshots * Model Evaluation sample notebooks - Fixing black workflow runs * add whisper files and update scripts (#2183) * Model Evaluation sample notebooks - Adding configs for text-gen and fill-mask tasks * Model Evaluation sample notebooks - Removing multilabel Github workflow * Model Evaluation sample notebooks - Modifying eval-configs for fill-mask and text-gen * Model Evaluation sample notebooks - Fixing missing info for fill-mask and text-gen notebooks * Add ground truth comparison, update sample scores (#2186) * Fixing github runners (#2188) * Try to read workspace details from config * Reformatted versions with black * Fixing text classification notebook issue * Adding 
mlflow installation * Update QnA metric * Handling runs with None * Model Evaluation sample notebooks - Adding documentation for evaluation configs * Model Evaluation sample notebooks - Fixing black check run * added ymls for model-evaluation-subgraph cli (#2189) * added ymls for model-evaluation-subgraph cli * changed type to pipeline in cli files * updated model id * Removing input_column_names from FT cli * batch sample for text classification (#2199) * batch sample * batch sample * batch samples * batch samples for text-classification * batch samples for text-classification * Updating FT notebooks to use latest NCD input format * updating config for text and token classification. (#2204) Co-authored-by: Chandra Sekhar Gupta Aravpalli * Model Evaluation sample notebooks - Modifying to Azure production registry * Model Evaluation sample notebooks - Fixing black runs * Updating registry_name to azureml-preview-test1 temporarily * Model Evaluation sample notebooks - Fixing text-classification notebook * Model Evaluation sample notebooks - Fixing text-classification notebook * Model Evaluation sample notebooks - Changing registry for model fetching * Model Evaluation sample notebooks - Changing registry for model fetching * Model Evaluation sample notebooks - Fixing fill mask masks * Model Evaluation sample notebooks - Finalising notebooks * Model Evaluation sample notebooks - Fixing black runs * task name typo (#2207) * Setting up FT notebooks cron to run at midnight daily * Changing NCD compute * Changing registry to azureml * Aditisingh/update names (#2212) * renamed directories * renamed cli directories * updated data paths * Updating FT notebooks inference compute * updating notebooks for evaluation of base models (#2216) * updating evaluation text classification notebooks. * adding notebook for evaluation of sentiment analysis models. --------- Co-authored-by: Chandra Sekhar Gupta Aravpalli * Add batch fill mask example.
(#2208) Add fill mask batch endpoint example notebook. * Add remaining batch examples for HF foundation models (#2217) adding batch endpoint example notebooks for all except summarization and asr, which did not pass. * Pmanoj/foundational models cli issues (#2223) * fixing cli script bugs * Changing inference sku --------- Co-authored-by: Pavan Manoj Jonnalagadda * Reformatting inference files * Model Evaluation sample notebooks - Fixing Github workflows --------- Co-authored-by: Manoj Bableshwar Co-authored-by: skanakamedal <116672436+skanakamedal@users.noreply.github.com> Co-authored-by: HrishikeshGeedMS Co-authored-by: Narayanan Madhu Co-authored-by: SitaRam Chaitanya Kanakamedala Co-authored-by: Pavan Manoj Jonnalagadda Co-authored-by: Sumadhva Sridhar <109793745+susridhar@users.noreply.github.com> Co-authored-by: Sarthak Singhal Co-authored-by: Aditi Singh <114134940+s-aditi@users.noreply.github.com> Co-authored-by: Chandra Sekhar Gupta <38103118+guptha23@users.noreply.github.com> Co-authored-by: Chandra Sekhar Gupta Aravpalli Co-authored-by: amymsft <94562419+amymsft@users.noreply.github.com> Co-authored-by: Haritha Pallavi Bendapudi * add jsonl conversion helpers for automl-image notebooks (#2202) * prototype notebookes for jsonl conversion: multiclass, multilabel, object detection coco, object detection voc, instance segmentation coco, instance segmentation voc * implement jsonl conversion for multiclass and multilabel classification, demonstrate in jsonl-conversion/notebooks and modify automl-image-classification-multiclass-task-fridge-items notebook to use the new implementation * change multilabel notebook to use new jsonl conversion * implement coco jsonl converter for object detection and change notebook to use new implementation, verified that new implementation produces the same jsonl file as the old implementation * implement voc jsonl converter for object detection and demonstrate in object detection notebook, verified that new implementation 
produces the same train and validation json files as the original * clear outputs in changed notebooks * add instance segmentation to voc jsonl converter, verify that it generates the same train and val annotation files as original, verify that it generates the same annotation file as the coco jsonl converter for object detection * implement coco to jsonl conversion for instance segmentation for iscrowd==0, tested with example notebook from Azure/medical-imaging which ignores crowd annotations, also verified that this does not break the coco to jsonl code for object detection * implement coco to jsonl for instance segmentation for iscrowd==1, generate coco data for instance segmentation notebook, add coco usage to instance segmentation notebook, verified that the jsonl files generated for both voc to jsonl and coco to jsonl (using the newly generated data) are equivalent * refactor mask to polygon using automl.dnn.vision helpers * handling for compressed and uncompressed rle in coco 2 jsonl converter * generate odFridgeObjects data in coco format using rle instead of polygons * demonstrate coco to jsonl for rle data * add docstrings to jsonl conversion code, test with notebooks again, clean up extraneous files * remove extraneous imports * add azureml-automl-dnn-vision pip install for voc to jsonl conversion * reformat with black * add od batch scoring notebook * respond to pr comments: remove unnecessary pip installs, revert notebook metadata, revert modified experiment names, remove az login calls * restore pip install for azureml-automl-dnn-vision, needed to pass gate * revert notebook metadata * copy masktools helpers from azureml-automl-dnn-vision directly into source code * remove unnecessary pip installs, reformat with black, restore metadata * include imports for pycocotools and simplification, necessary for jsonl conversion * add skimage pip install * fix skimage -> scikit-image pip install * clarify markdown for pip install prompts Co-authored-by:
sharma-riti <52715641+sharma-riti@users.noreply.github.com> --------- Co-authored-by: Rehaan Bhimani Co-authored-by: sharma-riti <52715641+sharma-riti@users.noreply.github.com> * Fix model name (#2236) Co-authored-by: Aishani Bhalla * Import model notebook changes (#2234) * notebook review changes done * Update import_model_into_registry.ipynb * notebook review changes done * notebook review changes done * Update import_model_into_registry.ipynb * review changes * Update import_model_into_registry.ipynb changed kernel * review changes * review changes * review changes * review changes * Rename and define models for ncd MOE script (#2237) * rename models for ncd scenario * fix lightgbm model name --------- Co-authored-by: Aishani Bhalla * Blurb for agreement with license terms (#2243) * notebook updated for cela * notebook updated for cela * notebook updated for cela * notebook updated for cela tnc * notebook updated for cela * notebook updated for cela * notebook updated for cela * notebook updated for cela * notebook updated for cela * notebook updated for cela * notebook updated for cela * notebook updated for cela * ASR batch endpoint example (#2225) Add speech recognition example using batch endpoints. * Add batch summarization example (#2226) Add summarization example using batch endpoints. Includes a cropping step on the data, which will need to be removed when the model no longer needs this to be done by the user and crops inputs during preprocessing. * fix typos in the low priority notebook. (#2246) used local scoped variable within the cluster creation function. 
* add env to inference config (#2240) Co-authored-by: Miseon Park * Fix notebooks * Fix notebooks * Updating model versions (#2244) * Updating model versions * Rename ft pipeline names in notebooks * updating the model version and the registry * changing the job name --------- Co-authored-by: Pavan Manoj Jonnalagadda * Small notebook fix * Small notebook fix * fixing S3 URL format (#2251) * #2156 - fix dir for README link (#2210) Co-authored-by: jgussman * Fix GitHub DAU to use metrics and mltable; fix linting in bike share * Fix GitHub DAU to use metrics and mltable; fix linting in bike share * Update sample "image_classification_keras_minist_convnet" conda dependency (#2249) * update * update --------- Co-authored-by: Ying Chen <2601502859@qq.com> * Rename job_service_type to type (#2253) * Remove `kubernetes-compute-*` workflows that aren't testing a customer facing sample (#2252) * Remove kubernetes-compute* workflows that don't appear to have a customer facing example * Remove .github/kubernetes-compute * Fixes * Fixes * Fix OJ notebook * Fix OJ notebook * Demand forecasting using many models (#2238) * many models demand forecasting * updated README.md * updated azure workflows * changed compute to dedicated, removed references to TCN * resolved most of Nikolay's comments * changed VM type * fixed formatting * changed to a smaller compute * Changed credentials call in yml * Revert "Changed credentials call in yml" This reverts commit 0de650c1b2f24ff241538200aed1d632ee7bd4ec.
* Changed credentials call in yml * remove bert example in Nebulaml (#2258) Co-authored-by: xiaoranli * remove news summary dataset from repo, download using script in sample (#2250) * remove news summary dataset from repo, download using script in sample * fix to cli sample to remove dataset from repo * typo fix * formatting * Fix shared models components environments (#2262) * revert #2182 because of data issues (#2264) * change model name for nlp to use kebab-case model names instead of enum or snake_case (#2266) Co-authored-by: Rehaan Bhimani * Whitelist more warnings * Whitelist more warnings * Whitelist another warning. * Whitelist another warning. * Suppress warnings * Suppress warnings * Try warning suppression in different place * Try warning suppression in different place * Add forecasting dependencies to GitHub DAU * Fix * Handle warnings properly * Suppress warnings * Exclude downloading message from warnings * Fix warning --------- Co-authored-by: amah Co-authored-by: Ali M Co-authored-by: jeff-shepherd <39775772+jeff-shepherd@users.noreply.github.com> Co-authored-by: SagarikaKengunte <56979207+SagarikaKengunte@users.noreply.github.com> Co-authored-by: Bala P V <33712765+balapv@users.noreply.github.com> Co-authored-by: Rahul Kumar <74648335+iamrk04@users.noreply.github.com> Co-authored-by: Rahul Kumar Co-authored-by: Facundo Santiago Co-authored-by: Rupal jain Co-authored-by: Matthieu Maitre Co-authored-by: Harneet Virk Co-authored-by: Sheri Gilley Co-authored-by: Xiaoxiao Li <115597441+MelonXiaoxiao@users.noreply.github.com> Co-authored-by: daholste <43974253+daholste@users.noreply.github.com> Co-authored-by: Roope Astala Co-authored-by: Nikolay Rovinskiy Co-authored-by: seanyao1 Co-authored-by: Yisheng Yao Co-authored-by: Gaurav Gupta <47334368+gaugup@users.noreply.github.com> Co-authored-by: Scott Vickers <2232140+code-vicar@users.noreply.github.com> Co-authored-by: Alex Wallace <80542152+xanwal@users.noreply.github.com> Co-authored-by: HrishikeshGeedMS
<113683726+HrishikeshGeedMS@users.noreply.github.com> Co-authored-by: nancy-mejia <106106141+nancy-mejia@users.noreply.github.com> Co-authored-by: nancy-mejia Co-authored-by: Pritam Kumar Das Co-authored-by: Pritam Kumar Das Co-authored-by: Zeliang Tian <83852443+zetiaatgithub@users.noreply.github.com> Co-authored-by: Zhengfei Wang <38847871+zhengfeiwang@users.noreply.github.com> Co-authored-by: Abraham Omorogbe Co-authored-by: kdestin <101366538+kdestin@users.noreply.github.com> Co-authored-by: Mope Akande <17515964+msakande@users.noreply.github.com> Co-authored-by: Ramu Vadthyavath Co-authored-by: rdondera-microsoft <98922913+rdondera-microsoft@users.noreply.github.com> Co-authored-by: Vadthyavath Ram <7171558+vadthyavath@users.noreply.github.com> Co-authored-by: sharma-riti <52715641+sharma-riti@users.noreply.github.com> Co-authored-by: jyravi <56615890+jyravi@users.noreply.github.com> Co-authored-by: Xingzhi Zhang <37076709+elliotzh@users.noreply.github.com> Co-authored-by: Shohei Nagata Co-authored-by: MaurisLucis Co-authored-by: Nick Mecklenburg Co-authored-by: cassieesvelt <73311224+cassieesvelt@users.noreply.github.com> Co-authored-by: Korin <0mza987@gmail.com> Co-authored-by: Bhavana Co-authored-by: bhavanatumma Co-authored-by: Ying Chen Co-authored-by: Ying Chen <2601502859@qq.com> Co-authored-by: Miseon Co-authored-by: Miseon Park Co-authored-by: Han Wang Co-authored-by: Hugo Aponte Co-authored-by: Jun <64982533+robertq0910@users.noreply.github.com> Co-authored-by: Ubuntu Co-authored-by: Chuan Tian <77308530+ctian-msft@users.noreply.github.com> Co-authored-by: Vivek Dani <110168656+vivek-dani@users.noreply.github.com> Co-authored-by: Jun Qi Co-authored-by: saoh <90349400+Frogglew@users.noreply.github.com> Co-authored-by: Neehar Duvvuri <40341266+needuv@users.noreply.github.com> Co-authored-by: Clement Wang <47586720+wangchao1230@users.noreply.github.com> Co-authored-by: Clement Wang Co-authored-by: Martin-Jia 
<54347731+Martin-Jia@users.noreply.github.com> Co-authored-by: Mingda Jia Co-authored-by: Alain LI <67991576+alainli0928@users.noreply.github.com> Co-authored-by: Roman Lutz Co-authored-by: Diondra <16376603+diondrapeck@users.noreply.github.com> Co-authored-by: Komnus丶Q <40655746+quchuyuan@users.noreply.github.com> Co-authored-by: xiaolewen Co-authored-by: Xiaole Wen Co-authored-by: Kenwiggan <108840031+Kenwiggan@users.noreply.github.com> Co-authored-by: Kenwiggan Co-authored-by: guanghechen <42513619+guanghechen@users.noreply.github.com> Co-authored-by: tongy-msft <91754176+tongyu-microsoft@users.noreply.github.com> Co-authored-by: vbejan-msft <65432549+vlbejan@users.noreply.github.com> Co-authored-by: Samuel Kemp Co-authored-by: yetamsft <71487683+yetamsft@users.noreply.github.com> Co-authored-by: Srujan Saggam <41802116+srsaggam@users.noreply.github.com> Co-authored-by: Ankit Singhal <30610298+singankit@users.noreply.github.com> Co-authored-by: Razvan Tanase Co-authored-by: savitamittal1 <39776179+savitamittal1@users.noreply.github.com> Co-authored-by: ccozianu Co-authored-by: SamGos93 <115183100+SamGos93@users.noreply.github.com> Co-authored-by: santiagxf Co-authored-by: Mathieu St-Louis <81435026+mastloui-msft@users.noreply.github.com> Co-authored-by: Kriti <53083330+fkriti@users.noreply.github.com> Co-authored-by: Sasidhar Kasturi Co-authored-by: SeokJin Han <4353157+dem108@users.noreply.github.com> Co-authored-by: Cloga Chen Co-authored-by: AmarBadal <51719265+AmarBadal@users.noreply.github.com> Co-authored-by: Vivian Li Co-authored-by: Raghu Ramaswamy <13340619+raghutillu@users.noreply.github.com> Co-authored-by: Raghu Ramaswamy Co-authored-by: alexwong2024 <125523190+alexwong2024@users.noreply.github.com> Co-authored-by: luwei Co-authored-by: Ziqi Wang Co-authored-by: Li, Xiaoran Co-authored-by: xiaoranli Co-authored-by: sarthaks95 <13473111+sarthaks95@users.noreply.github.com> Co-authored-by: Manoj Bableshwar Co-authored-by: skanakamedal 
<116672436+skanakamedal@users.noreply.github.com> Co-authored-by: HrishikeshGeedMS Co-authored-by: Narayanan Madhu Co-authored-by: SitaRam Chaitanya Kanakamedala Co-authored-by: Pavan Manoj Jonnalagadda Co-authored-by: Sumadhva Sridhar <109793745+susridhar@users.noreply.github.com> Co-authored-by: Sarthak Singhal Co-authored-by: Aditi Singh <114134940+s-aditi@users.noreply.github.com> Co-authored-by: Chandra Sekhar Gupta <38103118+guptha23@users.noreply.github.com> Co-authored-by: Chandra Sekhar Gupta Aravpalli Co-authored-by: amymsft <94562419+amymsft@users.noreply.github.com> Co-authored-by: Haritha Pallavi Bendapudi Co-authored-by: Rehaan Bhimani Co-authored-by: Rehaan Bhimani Co-authored-by: Aishani Bhalla Co-authored-by: Aishani Bhalla Co-authored-by: Jude Gussman <39165899+jgussman@users.noreply.github.com> Co-authored-by: jgussman Co-authored-by: Ada <61294872+adrosa@users.noreply.github.com> --- ...hub-dau-auto-ml-forecasting-github-dau.yml | 2 + sdk/python/forecasting-requirements.txt | 1 + .../auto-ml-forecasting-github-dau.ipynb | 16 ++-- .../helpers/generate_ml_table.py | 22 ++--- .../helpers/metrics_helper.py | 88 ++++++++----------- ...orecasting-orange-juice-sales-mlflow.ipynb | 48 ++++------ .../metrics_helper.py | 66 +++++++------- .../auto-ml-forecasting-bike-share.ipynb | 25 +++--- .../forecast/generate_ml_table.py | 22 ++--- .../metrics_helper.py | 88 ++++++++----------- ...g-task-energy-demand-advanced-mlflow.ipynb | 11 +-- .../metrics_helper.py | 66 +++++++------- .../validation/check_notebook_output.py | 2 + 13 files changed, 201 insertions(+), 256 deletions(-) diff --git a/.github/workflows/sdk-jobs-automl-standalone-jobs-automl-forecasting-github-dau-auto-ml-forecasting-github-dau.yml b/.github/workflows/sdk-jobs-automl-standalone-jobs-automl-forecasting-github-dau-auto-ml-forecasting-github-dau.yml index 5347e1e4bf..1c4ce610ed 100644 --- 
a/.github/workflows/sdk-jobs-automl-standalone-jobs-automl-forecasting-github-dau-auto-ml-forecasting-github-dau.yml +++ b/.github/workflows/sdk-jobs-automl-standalone-jobs-automl-forecasting-github-dau-auto-ml-forecasting-github-dau.yml @@ -36,6 +36,8 @@ jobs: run: pip install -r sdk/python/dev-requirements.txt - name: pip install mlflow reqs run: pip install -r sdk/python/mlflow-requirements.txt + - name: pip install forecasting reqs + run: pip install -r sdk/python/forecasting-requirements.txt - name: azure login uses: azure/login@v1 with: diff --git a/sdk/python/forecasting-requirements.txt b/sdk/python/forecasting-requirements.txt index bf80325c9b..e82332ab0d 100644 --- a/sdk/python/forecasting-requirements.txt +++ b/sdk/python/forecasting-requirements.txt @@ -1,6 +1,7 @@ # Specific requirements of the forecasting notebooks. # This file is deprecated and will be removed when # metrics package will be able to evaluate forecasting models. +azureml-metrics>=0.0.6.post1 scikit-learn>=0.19.0,<0.23.0 arch<=5.3.1 statsmodels>=0.11.0,<0.13.5 diff --git a/sdk/python/jobs/automl-standalone-jobs/automl-forecasting-github-dau/auto-ml-forecasting-github-dau.ipynb b/sdk/python/jobs/automl-standalone-jobs/automl-forecasting-github-dau/auto-ml-forecasting-github-dau.ipynb index 72f0fa3750..34f401d310 100644 --- a/sdk/python/jobs/automl-standalone-jobs/automl-forecasting-github-dau/auto-ml-forecasting-github-dau.ipynb +++ b/sdk/python/jobs/automl-standalone-jobs/automl-forecasting-github-dau/auto-ml-forecasting-github-dau.ipynb @@ -170,18 +170,22 @@ }, "outputs": [], "source": [ + "import os\n", + "import shutil\n", + "import pandas as pd\n", + "\n", "from helpers.generate_ml_table import create_ml_table\n", "\n", - "create_ml_table(\"github_dau_2011-2018_train.csv\", \"./data/training-mltable-folder\")\n", + "train = pd.read_csv(\"github_dau_2011-2018_train.csv\", parse_dates=[\"date\"])\n", + "create_ml_table(\n", + " train, \"github_dau_2011-2018_train.parquet\", 
\"./data/training-mltable-folder\"\n", + ")\n", "\n", "# Training MLTable defined locally, with local data to be uploaded\n", "my_training_data_input = Input(\n", " type=AssetTypes.MLTABLE, path=\"./data/training-mltable-folder\"\n", ")\n", "\n", - "import os\n", - "import shutil\n", - "\n", "os.makedirs(\"test_dataset\", exist_ok=True)\n", "shutil.copy(\n", " \"github_dau_2011-2018_test.csv\",\n", @@ -625,8 +629,6 @@ "metadata": {}, "outputs": [], "source": [ - "import pandas as pd\n", - "\n", "pd.DataFrame(best_run.data.metrics, index=[0]).T" ] }, @@ -897,7 +899,7 @@ "source": [ "from helpers.metrics_helper import calculate_metrics\n", "\n", - "calculate_metrics(fcst_df[target_column_name], fcst_df[\"predicted\"])" + "calculate_metrics(train, fcst_df, target_column_name, time_column_name)" ] }, { diff --git a/sdk/python/jobs/automl-standalone-jobs/automl-forecasting-github-dau/helpers/generate_ml_table.py b/sdk/python/jobs/automl-standalone-jobs/automl-forecasting-github-dau/helpers/generate_ml_table.py index b81e3ec389..3d2ea25593 100644 --- a/sdk/python/jobs/automl-standalone-jobs/automl-forecasting-github-dau/helpers/generate_ml_table.py +++ b/sdk/python/jobs/automl-standalone-jobs/automl-forecasting-github-dau/helpers/generate_ml_table.py @@ -1,17 +1,11 @@ -import shutil +import mltable import os -import yaml -def create_ml_table(csv_file, output, delimiter=",", encoding="ascii"): - os.makedirs(output, exist_ok=True) - fname = os.path.split(csv_file)[-1] - mltable = { - "paths": [{"file": f"./{fname}"}], - "transformations": [ - {"read_delimited": {"delimiter": delimiter, "encoding": encoding}} - ], - } - with open(os.path.join(output, "MLTable"), "w") as f: - f.write(yaml.dump(mltable)) - shutil.copy(csv_file, os.path.join(output, fname)) +def create_ml_table(data_frame, file_name, output_folder): + os.makedirs(output_folder, exist_ok=True) + data_path = os.path.join(output_folder, file_name) + data_frame.to_parquet(data_path, index=False) + paths = 
[{"file": data_path}] + ml_table = mltable.from_parquet_files(paths) + ml_table.save(output_folder) diff --git a/sdk/python/jobs/automl-standalone-jobs/automl-forecasting-github-dau/helpers/metrics_helper.py b/sdk/python/jobs/automl-standalone-jobs/automl-forecasting-github-dau/helpers/metrics_helper.py index b5e4c020b1..23aa6fd9cb 100644 --- a/sdk/python/jobs/automl-standalone-jobs/automl-forecasting-github-dau/helpers/metrics_helper.py +++ b/sdk/python/jobs/automl-standalone-jobs/automl-forecasting-github-dau/helpers/metrics_helper.py @@ -1,54 +1,38 @@ import pandas as pd -import numpy as np - - -def mean_absolute_error(actual, pred): - """Calculate mean absolute error.""" - return np.mean(np.abs(actual - pred)) - - -def mean_squared_error(actual, pred): - """Calculate mean squared error.""" - return np.mean((actual - pred) ** 2) - - -def r2_score(actual, pred): - """Calculate r2 score""" - numerator = ((actual - pred) ** 2).sum() - denominator = ((actual - np.mean(actual)) ** 2).sum() - - return 1.0 - numerator / denominator - - -def APE(actual, pred): - """ - Calculate absolute percentage error. - Returns a vector of APE values with same length as actual/pred. - """ - return 100 * np.abs((actual - pred) / actual) - - -def MAPE(actual, pred): - """ - Calculate mean absolute percentage error. 
-    Remove NA and values where actual is close to zero
-    """
-    not_na = ~(np.isnan(actual) | np.isnan(pred))
-    not_zero = ~np.isclose(actual, 0.0)
-    actual_safe = actual[not_na & not_zero]
-    pred_safe = pred[not_na & not_zero]
-    return np.mean(APE(actual_safe, pred_safe))
-
-
-def calculate_metrics(actual, pred):
-    not_na = ~(np.isnan(actual) | np.isnan(pred))
-    actual_safe = actual[not_na]
-    pred_safe = pred[not_na]
-    rmse = np.sqrt(mean_squared_error(actual_safe, pred_safe))
-    metrics_dict = {}
-    metrics_dict["R2 score"] = r2_score(actual_safe, pred_safe)
-    metrics_dict["mean absolute error"] = mean_absolute_error(actual_safe, pred_safe)
-    metrics_dict["mean_absolute_percentage_error"] = MAPE(actual_safe, pred_safe)
-    metrics_dict["root mean squared error"] = rmse
-
+import warnings
+
+with warnings.catch_warnings(record=True):
+    from azureml.metrics import constants
+    from azureml.metrics import compute_metrics
+
+
+def calculate_metrics(
+    X_train,
+    X_test,
+    actuals_colum_name,
+    time_column_name,
+    time_series_id_column_names=None,
+    predictions_column_name="predicted",
+):
+    # Remove all NaNs in the train set
+    X_train = X_train.copy()
+    X_train.dropna(subset=[actuals_colum_name], inplace=True)
+    y_train = X_train.pop(actuals_colum_name).values
+    # Remove all NaNs in the test set.
+    X_test = X_test.copy()
+    X_test.dropna(subset=[actuals_colum_name, predictions_column_name], inplace=True)
+    actual = X_test.pop(actuals_colum_name).values
+    pred = X_test.pop(predictions_column_name).values
+    metrics = compute_metrics(
+        task_type=constants.Tasks.FORECASTING,
+        y_test=actual,
+        y_pred=pred,
+        X_test=X_test,
+        X_train=X_train,
+        y_train=y_train,
+        time_column_name=time_column_name,
+        time_series_id_column_names=time_series_id_column_names,
+        metrics=constants.Metric.SCALAR_REGRESSION_SET,
+    )
+    metrics_dict = metrics[constants.Metric.Metrics]
     return pd.DataFrame(metrics_dict.items(), columns=["metric name", "score"])
diff --git a/sdk/python/jobs/automl-standalone-jobs/automl-forecasting-orange-juice-sales/automl-forecasting-orange-juice-sales-mlflow.ipynb b/sdk/python/jobs/automl-standalone-jobs/automl-forecasting-orange-juice-sales/automl-forecasting-orange-juice-sales-mlflow.ipynb
index 80a8915d13..20776c96d3 100644
--- a/sdk/python/jobs/automl-standalone-jobs/automl-forecasting-orange-juice-sales/automl-forecasting-orange-juice-sales-mlflow.ipynb
+++ b/sdk/python/jobs/automl-standalone-jobs/automl-forecasting-orange-juice-sales/automl-forecasting-orange-juice-sales-mlflow.ipynb
@@ -234,10 +234,7 @@
     "train, test = split_last_n_by_series_id(data_subset, n_test_periods)\n",
     "\n",
     "# Save the DataFrame objects to files\n",
-    "train_data_path = \"./data/dominicks_OJ_train.csv\"\n",
-    "test_data_path = \"./data/dominicks_OJ_test.csv\"\n",
-    "train.to_csv(train_data_path, index=False)\n",
-    "test.to_csv(test_data_path, index=False)"
+    "train_data_path = \"./data/dominicks_OJ_train.parquet\""
    ]
   },
   {
@@ -257,28 +254,21 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "import yaml\n",
+    "import mltable\n",
     "import os\n",
-    "import shutil\n",
     "\n",
     "\n",
-    "def create_folder_and_ml_table(csv_file, output, delimiter=\",\", encoding=\"ascii\"):\n",
-    "    os.makedirs(output, exist_ok=True)\n",
-    "    fname = os.path.split(csv_file)[-1]\n",
-    "\n",
-    "    mltable = {\n",
-    "        \"paths\": [{\"file\": f\"./{fname}\"}],\n",
-    "        \"transformations\": [\n",
-    "            {\"read_delimited\": {\"delimiter\": delimiter, \"encoding\": encoding}}\n",
-    "        ],\n",
-    "    }\n",
-    "    with open(os.path.join(output, \"MLTable\"), \"w\") as f:\n",
-    "        f.write(yaml.dump(mltable))\n",
-    "    shutil.copy(csv_file, os.path.join(output, fname))\n",
+    "def create_folder_and_ml_table(data_frame, file_name, output_folder):\n",
+    "    os.makedirs(output_folder, exist_ok=True)\n",
+    "    data_path = os.path.join(output_folder, file_name)\n",
+    "    data_frame.to_parquet(data_path, index=False)\n",
+    "    paths = [{\"file\": data_path}]\n",
+    "    ml_table = mltable.from_parquet_files(paths)\n",
+    "    ml_table.save(output_folder)\n",
     "\n",
     "\n",
     "train_mltable_path = \"./data/training-mltable-folder\"\n",
-    "create_folder_and_ml_table(train_data_path, train_mltable_path)\n",
+    "create_folder_and_ml_table(train, \"dominicks_OJ_train.parquet\", train_mltable_path)\n",
     "\n",
     "# Training MLTable defined locally, with local data to be uploaded\n",
     "my_training_data_input = Input(type=AssetTypes.MLTABLE, path=train_mltable_path)"
@@ -299,11 +289,7 @@
    "outputs": [],
    "source": [
     "os.makedirs(\"test_dataset\", exist_ok=True)\n",
-    "shutil.copy(\n",
-    "    test_data_path,\n",
-    "    \"test_dataset/dominicks_OJ_test.csv\",\n",
-    ")\n",
-    "\n",
+    "test.to_csv(os.path.join(\"test_dataset\", \"dominicks_OJ_test.csv\"), index=False)\n",
     "my_test_data_input = Input(\n",
     "    type=AssetTypes.URI_FOLDER,\n",
     "    path=\"test_dataset/\",\n",
@@ -1134,7 +1120,9 @@
    "source": [
     "from metrics_helper import calculate_metrics\n",
     "\n",
-    "calculate_metrics(fcst_df[target_column_name], fcst_df[\"predicted\"])"
+    "calculate_metrics(\n",
+    "    train, fcst_df, target_column_name, time_column_name, time_series_id_column_names\n",
+    ")"
    ]
   },
   {
@@ -1152,11 +1140,9 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "history_data = pd.read_csv(\n",
-    "    \"./data/training-mltable-folder/dominicks_OJ_train.csv\",\n",
-    "    parse_dates=[time_column_name],\n",
-    ")\n",
-    "history_data = history_data.query(\"Store == 2 and Brand == 'dominicks'\")\n",
+    "history_data = mltable.load(\"./data/training-mltable-folder\").to_pandas_dataframe()\n",
+    "history_data[time_column_name] = pd.to_datetime(history_data[time_column_name])\n",
+    "history_data = history_data.query(\"Store == 2 and Brand == 'dominicks'\").copy()\n",
     "history_data.sort_values(by=time_column_name, inplace=True)\n",
     "history_data = history_data.iloc[-3 * forecast_horizon :]\n",
     "# Merge predictions to historic data.\n",
diff --git a/sdk/python/jobs/automl-standalone-jobs/automl-forecasting-orange-juice-sales/metrics_helper.py b/sdk/python/jobs/automl-standalone-jobs/automl-forecasting-orange-juice-sales/metrics_helper.py
index b2f35ce43a..23aa6fd9cb 100644
--- a/sdk/python/jobs/automl-standalone-jobs/automl-forecasting-orange-juice-sales/metrics_helper.py
+++ b/sdk/python/jobs/automl-standalone-jobs/automl-forecasting-orange-juice-sales/metrics_helper.py
@@ -1,40 +1,38 @@
 import pandas as pd
-import numpy as np
+import warnings
 
-from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
+with warnings.catch_warnings(record=True):
+    from azureml.metrics import constants
+    from azureml.metrics import compute_metrics
 
 
-def APE(actual, pred):
-    """
-    Calculate absolute percentage error.
-    Returns a vector of APE values with same length as actual/pred.
-    """
-    return 100 * np.abs((actual - pred) / actual)
-
-
-def MAPE(actual, pred):
-    """
-    Calculate mean absolute percentage error.
-    Remove NA and values where actual is close to zero
-    """
-    not_na = ~(np.isnan(actual) | np.isnan(pred))
-    not_zero = ~np.isclose(actual, 0.0)
-    actual_safe = actual[not_na & not_zero]
-    pred_safe = pred[not_na & not_zero]
-    return np.mean(APE(actual_safe, pred_safe))
-
-
-def calculate_metrics(actual, pred):
-    not_na = ~(np.isnan(actual) | np.isnan(pred))
-    actual_safe = actual[not_na]
-    pred_safe = pred[not_na]
-    rmse = np.sqrt(mean_squared_error(actual_safe, pred_safe))
-    metrics_dict = {}
-    metrics_dict["R2 score"] = r2_score(actual_safe, pred_safe)
-    metrics_dict["mean absolute error"] = mean_absolute_error(actual_safe, pred_safe)
-    metrics_dict["mean_absolute_percentage_error"] = MAPE(actual_safe, pred_safe)
-    metrics_dict["root mean squared error"] = rmse
-    metrics_dict["normalized root mean squared error"] = rmse / np.abs(
-        actual_safe.max() - actual_safe.min()
+def calculate_metrics(
+    X_train,
+    X_test,
+    actuals_colum_name,
+    time_column_name,
+    time_series_id_column_names=None,
+    predictions_column_name="predicted",
+):
+    # Remove all NaNs in the train set
+    X_train = X_train.copy()
+    X_train.dropna(subset=[actuals_colum_name], inplace=True)
+    y_train = X_train.pop(actuals_colum_name).values
+    # Remove all NaNs in the test set.
+    X_test = X_test.copy()
+    X_test.dropna(subset=[actuals_colum_name, predictions_column_name], inplace=True)
+    actual = X_test.pop(actuals_colum_name).values
+    pred = X_test.pop(predictions_column_name).values
+    metrics = compute_metrics(
+        task_type=constants.Tasks.FORECASTING,
+        y_test=actual,
+        y_pred=pred,
+        X_test=X_test,
+        X_train=X_train,
+        y_train=y_train,
+        time_column_name=time_column_name,
+        time_series_id_column_names=time_series_id_column_names,
+        metrics=constants.Metric.SCALAR_REGRESSION_SET,
     )
+    metrics_dict = metrics[constants.Metric.Metrics]
     return pd.DataFrame(metrics_dict.items(), columns=["metric name", "score"])
diff --git a/sdk/python/jobs/automl-standalone-jobs/automl-forecasting-task-bike-share/auto-ml-forecasting-bike-share.ipynb b/sdk/python/jobs/automl-standalone-jobs/automl-forecasting-task-bike-share/auto-ml-forecasting-bike-share.ipynb
index 794de3b2dc..5c678911b1 100644
--- a/sdk/python/jobs/automl-standalone-jobs/automl-forecasting-task-bike-share/auto-ml-forecasting-bike-share.ipynb
+++ b/sdk/python/jobs/automl-standalone-jobs/automl-forecasting-task-bike-share/auto-ml-forecasting-bike-share.ipynb
@@ -156,11 +156,10 @@
     "all_data.drop([\"casual\", \"registered\"], axis=1, inplace=True)\n",
     "\n",
     "os.makedirs(\"data\", exist_ok=True)\n",
-    "all_data[all_data[time_column_name] <= pd.Timestamp(\"2012-08-31\")].to_csv(\n",
-    "    os.path.join(\"data\", \"bike-no-train.csv\"), index=False\n",
-    ")\n",
     "create_ml_table(\n",
-    "    os.path.join(\"data\", \"bike-no-train.csv\"), \"./data/training-mltable-folder\"\n",
+    "    all_data[all_data[time_column_name] <= pd.Timestamp(\"2012-08-31\")],\n",
+    "    \"bike-no-train.parquet\",\n",
+    "    \"./data/training-mltable-folder\",\n",
     ")"
    ]
   },
@@ -464,7 +463,7 @@
    "source": [
     "# Create the AutoML forecasting job with the related factory-function. Force the target column, to be integer type (To be added in phase 2)\n",
     "forecasting_job = automl.forecasting(\n",
-    "    compute=cluster_name,\n",
+    "    compute=\"bike-share-v2\",\n",
     "    experiment_name=exp_name,\n",
     "    training_data=my_training_data_input,\n",
     "    target_column_name=target_column_name,\n",
@@ -1018,7 +1017,8 @@
    "source": [
     "from metrics_helper import calculate_metrics\n",
     "\n",
-    "calculate_metrics(fcst_df[target_column_name], fcst_df[\"predicted\"])"
+    "train = all_data[all_data[time_column_name] <= pd.Timestamp(\"2012-08-31\")]\n",
+    "calculate_metrics(train, fcst_df, target_column_name, time_column_name)"
    ]
   },
   {
@@ -1046,15 +1046,15 @@
    "metadata": {},
    "outputs": [],
    "source": [
+    "import mltable\n",
+    "\n",
     "fcst_df_h14 = (\n",
     "    fcst_df.groupby(\"forecast_origin\", as_index=False)\n",
     "    .last()\n",
     "    .drop(columns=[\"forecast_origin\"])\n",
     ")\n",
-    "train_data = pd.read_csv(\n",
-    "    \"./data/training-mltable-folder/bike-no-train.csv\",\n",
-    "    parse_dates=[time_column_name],\n",
-    ")\n",
+    "train_data = mltable.load(\"./data/training-mltable-folder/\").to_pandas_dataframe()\n",
+    "train_data[time_column_name] = pd.to_datetime(train_data[time_column_name])\n",
     "test_data = pd.read_csv(\n",
     "    \"./test_dataset/bike-no-test.csv\",\n",
     "    parse_dates=[time_column_name],\n",
@@ -1116,9 +1116,8 @@
    "outputs": [],
    "source": [
     "date_filter = (fcst_df.date != \"2012-10-29\") & (fcst_df.date < \"2012-11-22\")\n",
-    "calculate_metrics(\n",
-    "    fcst_df[date_filter][target_column_name], fcst_df[date_filter][\"predicted\"]\n",
-    ")"
+    "\n",
+    "calculate_metrics(train, fcst_df[date_filter], target_column_name, time_column_name)"
    ]
   },
   {
diff --git a/sdk/python/jobs/automl-standalone-jobs/automl-forecasting-task-bike-share/forecast/generate_ml_table.py b/sdk/python/jobs/automl-standalone-jobs/automl-forecasting-task-bike-share/forecast/generate_ml_table.py
index b81e3ec389..3d2ea25593 100644
--- a/sdk/python/jobs/automl-standalone-jobs/automl-forecasting-task-bike-share/forecast/generate_ml_table.py
+++ b/sdk/python/jobs/automl-standalone-jobs/automl-forecasting-task-bike-share/forecast/generate_ml_table.py
@@ -1,17 +1,11 @@
-import shutil
+import mltable
 import os
-import yaml
 
 
-def create_ml_table(csv_file, output, delimiter=",", encoding="ascii"):
-    os.makedirs(output, exist_ok=True)
-    fname = os.path.split(csv_file)[-1]
-    mltable = {
-        "paths": [{"file": f"./{fname}"}],
-        "transformations": [
-            {"read_delimited": {"delimiter": delimiter, "encoding": encoding}}
-        ],
-    }
-    with open(os.path.join(output, "MLTable"), "w") as f:
-        f.write(yaml.dump(mltable))
-    shutil.copy(csv_file, os.path.join(output, fname))
+def create_ml_table(data_frame, file_name, output_folder):
+    os.makedirs(output_folder, exist_ok=True)
+    data_path = os.path.join(output_folder, file_name)
+    data_frame.to_parquet(data_path, index=False)
+    paths = [{"file": data_path}]
+    ml_table = mltable.from_parquet_files(paths)
+    ml_table.save(output_folder)
diff --git a/sdk/python/jobs/automl-standalone-jobs/automl-forecasting-task-bike-share/metrics_helper.py b/sdk/python/jobs/automl-standalone-jobs/automl-forecasting-task-bike-share/metrics_helper.py
index b5e4c020b1..23aa6fd9cb 100644
--- a/sdk/python/jobs/automl-standalone-jobs/automl-forecasting-task-bike-share/metrics_helper.py
+++ b/sdk/python/jobs/automl-standalone-jobs/automl-forecasting-task-bike-share/metrics_helper.py
@@ -1,54 +1,38 @@
 import pandas as pd
-import numpy as np
-
-
-def mean_absolute_error(actual, pred):
-    """Calculate mean absolute error."""
-    return np.mean(np.abs(actual - pred))
-
-
-def mean_squared_error(actual, pred):
-    """Calculate mean squared error."""
-    return np.mean((actual - pred) ** 2)
-
-
-def r2_score(actual, pred):
-    """Calculate r2 score"""
-    numerator = ((actual - pred) ** 2).sum()
-    denominator = ((actual - np.mean(actual)) ** 2).sum()
-
-    return 1.0 - numerator / denominator
-
-
-def APE(actual, pred):
-    """
-    Calculate absolute percentage error.
-    Returns a vector of APE values with same length as actual/pred.
-    """
-    return 100 * np.abs((actual - pred) / actual)
-
-
-def MAPE(actual, pred):
-    """
-    Calculate mean absolute percentage error.
-    Remove NA and values where actual is close to zero
-    """
-    not_na = ~(np.isnan(actual) | np.isnan(pred))
-    not_zero = ~np.isclose(actual, 0.0)
-    actual_safe = actual[not_na & not_zero]
-    pred_safe = pred[not_na & not_zero]
-    return np.mean(APE(actual_safe, pred_safe))
-
-
-def calculate_metrics(actual, pred):
-    not_na = ~(np.isnan(actual) | np.isnan(pred))
-    actual_safe = actual[not_na]
-    pred_safe = pred[not_na]
-    rmse = np.sqrt(mean_squared_error(actual_safe, pred_safe))
-    metrics_dict = {}
-    metrics_dict["R2 score"] = r2_score(actual_safe, pred_safe)
-    metrics_dict["mean absolute error"] = mean_absolute_error(actual_safe, pred_safe)
-    metrics_dict["mean_absolute_percentage_error"] = MAPE(actual_safe, pred_safe)
-    metrics_dict["root mean squared error"] = rmse
-
+import warnings
+
+with warnings.catch_warnings(record=True):
+    from azureml.metrics import constants
+    from azureml.metrics import compute_metrics
+
+
+def calculate_metrics(
+    X_train,
+    X_test,
+    actuals_colum_name,
+    time_column_name,
+    time_series_id_column_names=None,
+    predictions_column_name="predicted",
+):
+    # Remove all NaNs in the train set
+    X_train = X_train.copy()
+    X_train.dropna(subset=[actuals_colum_name], inplace=True)
+    y_train = X_train.pop(actuals_colum_name).values
+    # Remove all NaNs in the test set.
+    X_test = X_test.copy()
+    X_test.dropna(subset=[actuals_colum_name, predictions_column_name], inplace=True)
+    actual = X_test.pop(actuals_colum_name).values
+    pred = X_test.pop(predictions_column_name).values
+    metrics = compute_metrics(
+        task_type=constants.Tasks.FORECASTING,
+        y_test=actual,
+        y_pred=pred,
+        X_test=X_test,
+        X_train=X_train,
+        y_train=y_train,
+        time_column_name=time_column_name,
+        time_series_id_column_names=time_series_id_column_names,
+        metrics=constants.Metric.SCALAR_REGRESSION_SET,
+    )
+    metrics_dict = metrics[constants.Metric.Metrics]
     return pd.DataFrame(metrics_dict.items(), columns=["metric name", "score"])
diff --git a/sdk/python/jobs/automl-standalone-jobs/automl-forecasting-task-energy-demand/automl-forecasting-task-energy-demand-advanced-mlflow.ipynb b/sdk/python/jobs/automl-standalone-jobs/automl-forecasting-task-energy-demand/automl-forecasting-task-energy-demand-advanced-mlflow.ipynb
index 227610ed54..729c1d022e 100644
--- a/sdk/python/jobs/automl-standalone-jobs/automl-forecasting-task-energy-demand/automl-forecasting-task-energy-demand-advanced-mlflow.ipynb
+++ b/sdk/python/jobs/automl-standalone-jobs/automl-forecasting-task-energy-demand/automl-forecasting-task-energy-demand-advanced-mlflow.ipynb
@@ -985,9 +985,14 @@
    "metadata": {},
    "outputs": [],
    "source": [
+    "import mltable\n",
     "from metrics_helper import calculate_metrics\n",
     "\n",
-    "calculate_metrics(fcst_df[target_column_name], fcst_df[\"predicted\"])"
+    "ml_table = mltable.load(\"./data/training-mltable-folder\")\n",
+    "history_data = ml_table.to_pandas_dataframe()\n",
+    "history_data[time_column_name] = pd.to_datetime(history_data[time_column_name])\n",
+    "\n",
+    "calculate_metrics(history_data, fcst_df, target_column_name, time_column_name)"
    ]
   },
   {
@@ -1005,10 +1010,6 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "history_data = pd.read_csv(\n",
-    "    \"./data/training-mltable-folder/nyc_energy_training_clean.csv\",\n",
-    "    parse_dates=[time_column_name],\n",
-    ")\n",
     "history_data.sort_values(by=time_column_name, inplace=True)\n",
     "history_data = history_data.iloc[-3 * forecast_horizon :]\n",
     "# Merge predictions to historic data.\n",
diff --git a/sdk/python/jobs/automl-standalone-jobs/automl-forecasting-task-energy-demand/metrics_helper.py b/sdk/python/jobs/automl-standalone-jobs/automl-forecasting-task-energy-demand/metrics_helper.py
index b2f35ce43a..23aa6fd9cb 100644
--- a/sdk/python/jobs/automl-standalone-jobs/automl-forecasting-task-energy-demand/metrics_helper.py
+++ b/sdk/python/jobs/automl-standalone-jobs/automl-forecasting-task-energy-demand/metrics_helper.py
@@ -1,40 +1,38 @@
 import pandas as pd
-import numpy as np
+import warnings
 
-from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
+with warnings.catch_warnings(record=True):
+    from azureml.metrics import constants
+    from azureml.metrics import compute_metrics
 
 
-def APE(actual, pred):
-    """
-    Calculate absolute percentage error.
-    Returns a vector of APE values with same length as actual/pred.
-    """
-    return 100 * np.abs((actual - pred) / actual)
-
-
-def MAPE(actual, pred):
-    """
-    Calculate mean absolute percentage error.
-    Remove NA and values where actual is close to zero
-    """
-    not_na = ~(np.isnan(actual) | np.isnan(pred))
-    not_zero = ~np.isclose(actual, 0.0)
-    actual_safe = actual[not_na & not_zero]
-    pred_safe = pred[not_na & not_zero]
-    return np.mean(APE(actual_safe, pred_safe))
-
-
-def calculate_metrics(actual, pred):
-    not_na = ~(np.isnan(actual) | np.isnan(pred))
-    actual_safe = actual[not_na]
-    pred_safe = pred[not_na]
-    rmse = np.sqrt(mean_squared_error(actual_safe, pred_safe))
-    metrics_dict = {}
-    metrics_dict["R2 score"] = r2_score(actual_safe, pred_safe)
-    metrics_dict["mean absolute error"] = mean_absolute_error(actual_safe, pred_safe)
-    metrics_dict["mean_absolute_percentage_error"] = MAPE(actual_safe, pred_safe)
-    metrics_dict["root mean squared error"] = rmse
-    metrics_dict["normalized root mean squared error"] = rmse / np.abs(
-        actual_safe.max() - actual_safe.min()
+def calculate_metrics(
+    X_train,
+    X_test,
+    actuals_colum_name,
+    time_column_name,
+    time_series_id_column_names=None,
+    predictions_column_name="predicted",
+):
+    # Remove all NaNs in the train set
+    X_train = X_train.copy()
+    X_train.dropna(subset=[actuals_colum_name], inplace=True)
+    y_train = X_train.pop(actuals_colum_name).values
+    # Remove all NaNs in the test set.
+    X_test = X_test.copy()
+    X_test.dropna(subset=[actuals_colum_name, predictions_column_name], inplace=True)
+    actual = X_test.pop(actuals_colum_name).values
+    pred = X_test.pop(predictions_column_name).values
+    metrics = compute_metrics(
+        task_type=constants.Tasks.FORECASTING,
+        y_test=actual,
+        y_pred=pred,
+        X_test=X_test,
+        X_train=X_train,
+        y_train=y_train,
+        time_column_name=time_column_name,
+        time_series_id_column_names=time_series_id_column_names,
+        metrics=constants.Metric.SCALAR_REGRESSION_SET,
     )
+    metrics_dict = metrics[constants.Metric.Metrics]
     return pd.DataFrame(metrics_dict.items(), columns=["metric name", "score"])
diff --git a/v1/scripts/validation/check_notebook_output.py b/v1/scripts/validation/check_notebook_output.py
index 01e8328786..69f77df488 100644
--- a/v1/scripts/validation/check_notebook_output.py
+++ b/v1/scripts/validation/check_notebook_output.py
@@ -52,6 +52,8 @@
     "Readonly attribute primary_metric will be ignored",
     "Downloading artifact ",
     "Warnings:",
+    "Downloading builder script",
+    "Downloading extra modules",
     "custom base image or base dockerfile detected",
 ]