diff --git a/examples/00_quick_start/sar_movieratings_with_azureml_designer.ipynb b/examples/00_quick_start/sar_movieratings_with_azureml_designer.ipynb index ec5cf005cb..0605e0558b 100644 --- a/examples/00_quick_start/sar_movieratings_with_azureml_designer.ipynb +++ b/examples/00_quick_start/sar_movieratings_with_azureml_designer.ipynb @@ -10,9 +10,9 @@ "\n", "[AzureML Designer](https://docs.microsoft.com/en-us/azure/machine-learning/concept-designer) lets you visually connect datasets and modules on an interactive canvas to create machine learning models. \n", "\n", - "![img](https://recodatasets.blob.core.windows.net/images/designer-drag-and-drop.gif)\n", + "One of the features of AzureML Designer is that it is possible for developers to integrate any Python library to make it available as a module/component. In this notebook we are going to show how to integrate [SAR](sar_movielens.ipynb) and several other modules in Designer.\n", "\n", - "One of the features of AzureML Designer is that it is possible for developers to integrate any python library to make it available as a module. 
In this notebook are are going to show how to integrate [SAR](sar_movielens.ipynb) and several other modules in Designer\n", + "Note that custom modules have been renamed to components.\n", "\n", "\n", "## Installation\n", @@ -24,10 +24,11 @@ "# Uninstall azure-cli-ml (the `az ml` commands)\n", "az extension remove -n azure-cli-ml\n", "# Install local version of azure-cli-ml (which includes `az ml module` commands)\n", - "az extension add --source https://azuremlsdktestpypi.azureedge.net/CLI-SDK-Runners-Validation/13082891/azure_cli_ml-0.1.0.13082891-py3-none-any.whl --pip-extra-index-urls https://azuremlsdktestpypi.azureedge.net/CLI-SDK-Runners-Validation/13082891 --yes\n", + "CLI_SDK_VERSION=26005222\n", + "az extension add --source https://azuremlsdktestpypi.azureedge.net/CLI-SDK-Runners-Validation/$CLI_SDK_VERSION/azure_cli_ml-0.1.0.$CLI_SDK_VERSION-py3-none-any.whl --pip-extra-index-urls https://azuremlsdktestpypi.azureedge.net/CLI-SDK-Runners-Validation/$CLI_SDK_VERSION --yes --verbose\n", "```\n", "\n", - "## Module implementation\n", + "## Component implementation\n", "\n", "The scenario that we are going to reproduce in Designer, as a reference example, is the content of the [SAR quickstart notebook](sar_movielens.ipynb). 
In it, we load a dataset, split it into train and test sets, train SAR algorithm, predict using the test set and compute several ranking metrics (precision at k, recall at k, MAP and nDCG).\n", "\n", @@ -91,82 +92,76 @@ "Once we have the python entry, we need to create the yaml file that will interact with Designer, [precision_at_k.yaml](../../reco_utils/azureml/azureml_designer_modules/module_specs/precision_at_k.yaml).\n", "\n", "```yaml\n", - "moduleIdentifier: \n", - " namespace: microsoft.com/cat\n", - " moduleName: Precision at K\n", - " moduleVersion: 1.1.0\n", - "description: \"Precision at K metric from Recommenders repo: https://github.com/Microsoft/Recommenders.\"\n", - "metadata:\n", - " annotations:\n", - " tags: [\"Recommenders\", \"Metrics\"]\n", + "$schema: http://azureml/sdk-2-0/CommandComponent.json\n", + "name: microsoft.com.cat.precision_at_k\n", + "version: 1.1.1\n", + "display_name: Precision at K\n", + "type: CommandComponent\n", + "description: 'Precision at K metric from Recommenders repo: https://github.com/Microsoft/Recommenders.'\n", + "tags:\n", + " Recommenders:\n", + " Metrics:\n", "inputs:\n", - "- name: Rating true\n", - " type: DataFrameDirectory\n", - " description: True DataFrame.\n", - "- name: Rating pred\n", - " type: DataFrameDirectory\n", - " description: Predicted DataFrame.\n", - "- name: User column\n", - " type: String\n", - " default: UserId\n", - " description: Column name of user IDs.\n", - "- name: Item column\n", - " type: String\n", - " default: MovieId\n", - " description: Column name of item IDs.\n", - "- name: Rating column\n", - " type: String\n", - " default: Rating\n", - " description: Column name of ratings.\n", - "- name: Prediction column\n", - " type: String\n", - " default: prediction\n", - " description: Column name of predictions.\n", - "- name: Relevancy method\n", - " type: String\n", - " default: top_k\n", - " description: method for determining relevancy ['top_k', 'by_threshold'].\n", - "- name: 
Top k\n", - " type: Integer\n", - " default: 10\n", - " description: Number of top k items per user.\n", - "- name: Threshold\n", - " type: Float\n", - " default: 10.0\n", - " description: Threshold of top items per user.\n", + " rating_true:\n", + " type: AnyDirectory\n", + " description: True DataFrame.\n", + " optional: false\n", + " rating_pred:\n", + " type: AnyDirectory\n", + " description: Predicted DataFrame.\n", + " optional: false\n", + " user_column:\n", + " type: String\n", + " description: Column name of user IDs.\n", + " default: UserId\n", + " optional: false\n", + " item_column:\n", + " type: String\n", + " description: Column name of item IDs.\n", + " default: MovieId\n", + " optional: false\n", + " rating_column:\n", + " type: String\n", + " description: Column name of ratings.\n", + " default: Rating\n", + " optional: false\n", + " prediction_column:\n", + " type: String\n", + " description: Column name of predictions.\n", + " default: prediction\n", + " optional: false\n", + " relevancy_method:\n", + " type: String\n", + " description: method for determining relevancy ['top_k', 'by_threshold'].\n", + " default: top_k\n", + " optional: false\n", + " top_k:\n", + " type: Integer\n", + " description: Number of top k items per user.\n", + " default: 10\n", + " optional: false\n", + " threshold:\n", + " type: Float\n", + " description: Threshold of top items per user.\n", + " default: 10.0\n", + " optional: false\n", "outputs:\n", - "- name: Score\n", - " type: DataFrameDirectory\n", - " description: Precision at k (min=0, max=1).\n", - "implementation:\n", - " container:\n", - " amlEnvironment:\n", - " python:\n", - " condaDependenciesFile: sar_conda.yaml\n", - " additionalIncludes:\n", - " - ../../../\n", - " command: [python, reco_utils/azureml/azureml_designer_modules/entries/precision_at_k_entry.py]\n", - " args:\n", - " - --rating-true\n", - " - inputPath: Rating true\n", - " - --rating-pred\n", - " - inputPath: Rating pred\n", - " - 
--col-user\n", - " - inputValue: User column\n", - " - --col-item\n", - " - inputValue: Item column\n", - " - --col-rating\n", - " - inputValue: Rating column\n", - " - --col-prediction\n", - " - inputValue: Prediction column\n", - " - --relevancy-method\n", - " - inputValue: Relevancy method\n", - " - --k\n", - " - inputValue: Top k\n", - " - --threshold\n", - " - inputValue: Threshold\n", - " - --score-result\n", - " - outputPath: Score\n", + " score:\n", + " type: AnyDirectory\n", + " description: Precision at k (min=0, max=1).\n", + "code:\n", + " ../../../../\n", + "command: >-\n", + " python reco_utils/azureml/azureml_designer_modules/entries/precision_at_k_entry.py\n", + " --rating-true {inputs.rating_true} --rating-pred {inputs.rating_pred} --col-user\n", + " {inputs.user_column} --col-item {inputs.item_column} --col-rating {inputs.rating_column}\n", + " --col-prediction {inputs.prediction_column} --relevancy-method {inputs.relevancy_method}\n", + " --k {inputs.top_k} --threshold {inputs.threshold} --score-result {outputs.score}\n", + "environment:\n", + " conda:\n", + " conda_dependencies_file: reco_utils/azureml/azureml_designer_modules/module_specs/sar_conda.yaml\n", + " os: Linux\n", + "\n", "```\n", "\n", "In the yaml file we can see a number of sections. The heading defines attributes like name, version or description. In the section inputs, all inputs are defined. The two main dataframes have ports, which can be connected to other modules. The inputs without port appear in a canvas menu. The output is defined as a DataFrame as well. 
The last section, implementation, defines the conda environment, the associated python entry and the arguments to the python file.\n", @@ -237,15 +232,15 @@ } ], "source": [ - "# Regsiter modules with spec via Azure CLI\n", + "# Register components with spec via Azure CLI\n", "root_path = os.path.abspath(os.path.join(os.getcwd(), \"../../\"))\n", "specs_folder = os.path.join(root_path, \"reco_utils/azureml/azureml_designer_modules/module_specs\")\n", "github_prefix = 'https://github.com/microsoft/recommenders/blob/master/reco_utils/azureml/azureml_designer_modules/module_specs/'\n", "specs = os.listdir(specs_folder)\n", "for spec in specs:\n", " spec_path = github_prefix + spec\n", - " print(f\"Start to register module spec: {spec} ...\")\n", - " subprocess.run(f\"az ml module register --spec-file {spec_path}\", shell=True)\n", + " print(f\"Start to register component spec: {spec} ...\")\n", + " subprocess.run(f\"az ml component create --file {spec_path}\", shell=True)\n", " print(f\"Done.\")" ] }, @@ -257,7 +252,7 @@ "\n", "Once the modules are registered, they will appear in the canvas as the module `Recommenders`. 
There you will be able to create a pipeline like this:\n", "\n", - "![img](https://recodatasets.blob.core.windows.net/images/azureml_designer_sar_precisionatk.png)\n", + "![img](https://raw.githubusercontent.com/Azure/AzureMachineLearningGallery/main/pipelines/sar-pipeline/sar-pipeline.png)\n", "\n", "Now, thanks to AzureML Designer, users can compute the latest state of the art algorithms in recommendation systems without writing a line of python code.\n", "\n", @@ -272,9 +267,13 @@ ], "metadata": { "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" + "name": "python3", + "display_name": "Python 3.6.8 64-bit ('test': conda)", + "metadata": { + "interpreter": { + "hash": "ad1389e27ccf93b6cb9b27912fdce5bd72b7d47f7c4b29627ffa9bc4b1e3e5d1" + } + } }, "language_info": { "codemirror_mode": { @@ -286,7 +285,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.6.10" + "version": "3.6.8-final" } }, "nbformat": 4, diff --git a/reco_utils/azureml/azureml_designer_modules/module_specs/map.yaml b/reco_utils/azureml/azureml_designer_modules/module_specs/map.yaml index cd95b58b7d..0a917b5ba8 100644 --- a/reco_utils/azureml/azureml_designer_modules/module_specs/map.yaml +++ b/reco_utils/azureml/azureml_designer_modules/module_specs/map.yaml @@ -1,76 +1,69 @@ -amlModuleIdentifier: - namespace: microsoft.com/cat - moduleName: MAP - moduleVersion: 1.1.1 -description: "Mean Average Precision at K metric from Recommenders repo: https://github.com/Microsoft/Recommenders." -metadata: - annotations: - tags: ["Recommenders", "Metrics"] +$schema: http://azureml/sdk-2-0/CommandComponent.json +name: microsoft.com.cat.map +version: 1.1.1 +display_name: MAP +type: CommandComponent +description: 'Mean Average Precision at K metric from Recommenders repo: https://github.com/Microsoft/Recommenders.' 
+tags: + Recommenders: + Metrics: inputs: -- name: Rating true - type: AnyDirectory - description: True DataFrame. -- name: Rating pred - type: AnyDirectory - description: Predicted DataFrame. -- name: User column - type: String - default: UserId - description: Column name of user IDs. -- name: Item column - type: String - default: MovieId - description: Column name of item IDs. -- name: Rating column - type: String - default: Rating - description: Column name of ratings. -- name: Prediction column - type: String - default: prediction - description: Column name of predictions. -- name: Relevancy method - type: String - default: top_k - description: method for determining relevancy ['top_k', 'by_threshold']. -- name: Top k - type: Integer - default: 10 - description: Number of top k items per user. -- name: Threshold - type: Float - default: 10.0 - description: Threshold of top items per user. + rating_true: + type: AnyDirectory + description: True DataFrame. + optional: false + rating_pred: + type: AnyDirectory + description: Predicted DataFrame. + optional: false + user_column: + type: String + description: Column name of user IDs. + default: UserId + optional: false + item_column: + type: String + description: Column name of item IDs. + default: MovieId + optional: false + rating_column: + type: String + description: Column name of ratings. + default: Rating + optional: false + prediction_column: + type: String + description: Column name of predictions. + default: prediction + optional: false + relevancy_method: + type: String + description: method for determining relevancy ['top_k', 'by_threshold']. + default: top_k + optional: false + top_k: + type: Integer + description: Number of top k items per user. + default: 10 + optional: false + threshold: + type: Float + description: Threshold of top items per user. + default: 10.0 + optional: false outputs: -- name: Score - type: AnyDirectory - description: MAP at k (min=0, max=1). 
-implementation: - container: - amlEnvironment: - python: - condaDependenciesFile: sar_conda.yaml - additionalIncludes: - - ../../../ - command: [python, reco_utils/azureml/azureml_designer_modules/entries/map_entry.py] - args: - - --rating-true - - inputPath: Rating true - - --rating-pred - - inputPath: Rating pred - - --col-user - - inputValue: User column - - --col-item - - inputValue: Item column - - --col-rating - - inputValue: Rating column - - --col-prediction - - inputValue: Prediction column - - --relevancy-method - - inputValue: Relevancy method - - --k - - inputValue: Top k - - --threshold - - inputValue: Threshold - - --score-result - - outputPath: Score \ No newline at end of file + score: + type: AnyDirectory + description: MAP at k (min=0, max=1). +code: + ../../../../ +command: >- + python reco_utils/azureml/azureml_designer_modules/entries/map_entry.py --rating-true + {inputs.rating_true} --rating-pred {inputs.rating_pred} --col-user {inputs.user_column} + --col-item {inputs.item_column} --col-rating {inputs.rating_column} --col-prediction + {inputs.prediction_column} --relevancy-method {inputs.relevancy_method} --k {inputs.top_k} + --threshold {inputs.threshold} --score-result {outputs.score} +environment: + conda: + conda_dependencies_file: reco_utils/azureml/azureml_designer_modules/module_specs/sar_conda.yaml + os: Linux diff --git a/reco_utils/azureml/azureml_designer_modules/module_specs/ndcg.yaml b/reco_utils/azureml/azureml_designer_modules/module_specs/ndcg.yaml index 678e7f6411..f394c57ed0 100644 --- a/reco_utils/azureml/azureml_designer_modules/module_specs/ndcg.yaml +++ b/reco_utils/azureml/azureml_designer_modules/module_specs/ndcg.yaml @@ -1,76 +1,70 @@ -amlModuleIdentifier: - namespace: microsoft.com/cat - moduleName: nDCG - moduleVersion: 1.1.1 -description: "Normalized Discounted Cumulative Gain (nDCG) at K metric from Recommenders repo: https://github.com/Microsoft/Recommenders." 
-metadata: - annotations: - tags: ["Recommenders", "Metrics"] +$schema: http://azureml/sdk-2-0/CommandComponent.json +name: microsoft.com.cat.ndcg +version: 1.1.1 +display_name: nDCG +type: CommandComponent +description: 'Normalized Discounted Cumulative Gain (nDCG) at K metric from Recommenders + repo: https://github.com/Microsoft/Recommenders.' +tags: + Recommenders: + Metrics: inputs: -- name: Rating true - type: AnyDirectory - description: True DataFrame. -- name: Rating pred - type: AnyDirectory - description: Predicted DataFrame. -- name: User column - type: String - default: UserId - description: Column name of user IDs. -- name: Item column - type: String - default: MovieId - description: Column name of item IDs. -- name: Rating column - type: String - default: Rating - description: Column name of ratings. -- name: Prediction column - type: String - default: prediction - description: Column name of predictions. -- name: Relevancy method - type: String - default: top_k - description: method for determining relevancy ['top_k', 'by_threshold']. -- name: Top k - type: Integer - default: 10 - description: Number of top k items per user. -- name: Threshold - type: Float - default: 10.0 - description: Threshold of top items per user. + rating_true: + type: AnyDirectory + description: True DataFrame. + optional: false + rating_pred: + type: AnyDirectory + description: Predicted DataFrame. + optional: false + user_column: + type: String + description: Column name of user IDs. + default: UserId + optional: false + item_column: + type: String + description: Column name of item IDs. + default: MovieId + optional: false + rating_column: + type: String + description: Column name of ratings. + default: Rating + optional: false + prediction_column: + type: String + description: Column name of predictions. + default: prediction + optional: false + relevancy_method: + type: String + description: method for determining relevancy ['top_k', 'by_threshold']. 
+ default: top_k + optional: false + top_k: + type: Integer + description: Number of top k items per user. + default: 10 + optional: false + threshold: + type: Float + description: Threshold of top items per user. + default: 10.0 + optional: false outputs: -- name: Score - type: AnyDirectory - description: nDCG at k (min=0, max=1). -implementation: - container: - amlEnvironment: - python: - condaDependenciesFile: sar_conda.yaml - additionalIncludes: - - ../../../ - command: [python, reco_utils/azureml/azureml_designer_modules/entries/ndcg_entry.py] - args: - - --rating-true - - inputPath: Rating true - - --rating-pred - - inputPath: Rating pred - - --col-user - - inputValue: User column - - --col-item - - inputValue: Item column - - --col-rating - - inputValue: Rating column - - --col-prediction - - inputValue: Prediction column - - --relevancy-method - - inputValue: Relevancy method - - --k - - inputValue: Top k - - --threshold - - inputValue: Threshold - - --score-result - - outputPath: Score \ No newline at end of file + score: + type: AnyDirectory + description: nDCG at k (min=0, max=1). 
+code: + ../../../../ +command: >- + python reco_utils/azureml/azureml_designer_modules/entries/ndcg_entry.py --rating-true + {inputs.rating_true} --rating-pred {inputs.rating_pred} --col-user {inputs.user_column} + --col-item {inputs.item_column} --col-rating {inputs.rating_column} --col-prediction + {inputs.prediction_column} --relevancy-method {inputs.relevancy_method} --k {inputs.top_k} + --threshold {inputs.threshold} --score-result {outputs.score} +environment: + conda: + conda_dependencies_file: reco_utils/azureml/azureml_designer_modules/module_specs/sar_conda.yaml + os: Linux diff --git a/reco_utils/azureml/azureml_designer_modules/module_specs/precision_at_k.yaml b/reco_utils/azureml/azureml_designer_modules/module_specs/precision_at_k.yaml index 7b3009b668..c1a2978b24 100644 --- a/reco_utils/azureml/azureml_designer_modules/module_specs/precision_at_k.yaml +++ b/reco_utils/azureml/azureml_designer_modules/module_specs/precision_at_k.yaml @@ -1,76 +1,69 @@ -amlModuleIdentifier: - namespace: microsoft.com/cat - moduleName: Precision at K - moduleVersion: 1.1.1 -description: "Precision at K metric from Recommenders repo: https://github.com/Microsoft/Recommenders." -metadata: - annotations: - tags: ["Recommenders", "Metrics"] +$schema: http://azureml/sdk-2-0/CommandComponent.json +name: microsoft.com.cat.precision_at_k +version: 1.1.1 +display_name: Precision at K +type: CommandComponent +description: 'Precision at K metric from Recommenders repo: https://github.com/Microsoft/Recommenders.' +tags: + Recommenders: + Metrics: inputs: -- name: Rating true - type: AnyDirectory - description: True DataFrame. -- name: Rating pred - type: AnyDirectory - description: Predicted DataFrame. -- name: User column - type: String - default: UserId - description: Column name of user IDs. -- name: Item column - type: String - default: MovieId - description: Column name of item IDs. -- name: Rating column - type: String - default: Rating - description: Column name of ratings. 
-- name: Prediction column - type: String - default: prediction - description: Column name of predictions. -- name: Relevancy method - type: String - default: top_k - description: method for determining relevancy ['top_k', 'by_threshold']. -- name: Top k - type: Integer - default: 10 - description: Number of top k items per user. -- name: Threshold - type: Float - default: 10.0 - description: Threshold of top items per user. + rating_true: + type: AnyDirectory + description: True DataFrame. + optional: false + rating_pred: + type: AnyDirectory + description: Predicted DataFrame. + optional: false + user_column: + type: String + description: Column name of user IDs. + default: UserId + optional: false + item_column: + type: String + description: Column name of item IDs. + default: MovieId + optional: false + rating_column: + type: String + description: Column name of ratings. + default: Rating + optional: false + prediction_column: + type: String + description: Column name of predictions. + default: prediction + optional: false + relevancy_method: + type: String + description: method for determining relevancy ['top_k', 'by_threshold']. + default: top_k + optional: false + top_k: + type: Integer + description: Number of top k items per user. + default: 10 + optional: false + threshold: + type: Float + description: Threshold of top items per user. + default: 10.0 + optional: false outputs: -- name: Score - type: AnyDirectory - description: Precision at k (min=0, max=1). 
-implementation: - container: - amlEnvironment: - python: - condaDependenciesFile: sar_conda.yaml - additionalIncludes: - - ../../../ - command: [python, reco_utils/azureml/azureml_designer_modules/entries/precision_at_k_entry.py] - args: - - --rating-true - - inputPath: Rating true - - --rating-pred - - inputPath: Rating pred - - --col-user - - inputValue: User column - - --col-item - - inputValue: Item column - - --col-rating - - inputValue: Rating column - - --col-prediction - - inputValue: Prediction column - - --relevancy-method - - inputValue: Relevancy method - - --k - - inputValue: Top k - - --threshold - - inputValue: Threshold - - --score-result - - outputPath: Score \ No newline at end of file + score: + type: AnyDirectory + description: Precision at k (min=0, max=1). +code: + ../../../../ +command: >- + python reco_utils/azureml/azureml_designer_modules/entries/precision_at_k_entry.py + --rating-true {inputs.rating_true} --rating-pred {inputs.rating_pred} --col-user + {inputs.user_column} --col-item {inputs.item_column} --col-rating {inputs.rating_column} + --col-prediction {inputs.prediction_column} --relevancy-method {inputs.relevancy_method} + --k {inputs.top_k} --threshold {inputs.threshold} --score-result {outputs.score} +environment: + conda: + conda_dependencies_file: reco_utils/azureml/azureml_designer_modules/module_specs/sar_conda.yaml + os: Linux diff --git a/reco_utils/azureml/azureml_designer_modules/module_specs/recall_at_k.yaml b/reco_utils/azureml/azureml_designer_modules/module_specs/recall_at_k.yaml index 65c4940db4..042d790f36 100644 --- a/reco_utils/azureml/azureml_designer_modules/module_specs/recall_at_k.yaml +++ b/reco_utils/azureml/azureml_designer_modules/module_specs/recall_at_k.yaml @@ -1,76 +1,69 @@ -amlModuleIdentifier: - namespace: microsoft.com/cat - moduleName: Recall at K - moduleVersion: 1.1.1 -description: "Recall at K metric from Recommenders repo: https://github.com/Microsoft/Recommenders." 
-metadata: - annotations: - tags: ["Recommenders", "Metrics"] +$schema: http://azureml/sdk-2-0/CommandComponent.json +name: microsoft.com.cat.recall_at_k +version: 1.1.1 +display_name: Recall at K +type: CommandComponent +description: 'Recall at K metric from Recommenders repo: https://github.com/Microsoft/Recommenders.' +tags: + Recommenders: + Metrics: inputs: -- name: Rating true - type: AnyDirectory - description: True DataFrame. -- name: Rating pred - type: AnyDirectory - description: Predicted DataFrame. -- name: User column - type: String - default: UserId - description: Column name of user IDs. -- name: Item column - type: String - default: MovieId - description: Column name of item IDs. -- name: Rating column - type: String - default: Rating - description: Column name of ratings. -- name: Prediction column - type: String - default: prediction - description: Column name of predictions. -- name: Relevancy method - type: String - default: top_k - description: method for determining relevancy ['top_k', 'by_threshold']. -- name: Top k - type: Integer - default: 10 - description: Number of top k items per user. -- name: Threshold - type: Float - default: 10.0 - description: Threshold of top items per user. + rating_true: + type: AnyDirectory + description: True DataFrame. + optional: false + rating_pred: + type: AnyDirectory + description: Predicted DataFrame. + optional: false + user_column: + type: String + description: Column name of user IDs. + default: UserId + optional: false + item_column: + type: String + description: Column name of item IDs. + default: MovieId + optional: false + rating_column: + type: String + description: Column name of ratings. + default: Rating + optional: false + prediction_column: + type: String + description: Column name of predictions. + default: prediction + optional: false + relevancy_method: + type: String + description: method for determining relevancy ['top_k', 'by_threshold']. 
+ default: top_k + optional: false + top_k: + type: Integer + description: Number of top k items per user. + default: 10 + optional: false + threshold: + type: Float + description: Threshold of top items per user. + default: 10.0 + optional: false outputs: -- name: Score - type: AnyDirectory - description: Recall at k (min=0, max=1). -implementation: - container: - amlEnvironment: - python: - condaDependenciesFile: sar_conda.yaml - additionalIncludes: - - ../../../ - command: [python, reco_utils/azureml/azureml_designer_modules/entries/recall_at_k_entry.py] - args: - - --rating-true - - inputPath: Rating true - - --rating-pred - - inputPath: Rating pred - - --col-user - - inputValue: User column - - --col-item - - inputValue: Item column - - --col-rating - - inputValue: Rating column - - --col-prediction - - inputValue: Prediction column - - --relevancy-method - - inputValue: Relevancy method - - --k - - inputValue: Top k - - --threshold - - inputValue: Threshold - - --score-result - - outputPath: Score \ No newline at end of file + score: + type: AnyDirectory + description: Recall at k (min=0, max=1). 
+code: + ../../../../ +command: >- + python reco_utils/azureml/azureml_designer_modules/entries/recall_at_k_entry.py + --rating-true {inputs.rating_true} --rating-pred {inputs.rating_pred} --col-user + {inputs.user_column} --col-item {inputs.item_column} --col-rating {inputs.rating_column} + --col-prediction {inputs.prediction_column} --relevancy-method {inputs.relevancy_method} + --k {inputs.top_k} --threshold {inputs.threshold} --score-result {outputs.score} +environment: + conda: + conda_dependencies_file: reco_utils/azureml/azureml_designer_modules/module_specs/sar_conda.yaml + os: Linux diff --git a/reco_utils/azureml/azureml_designer_modules/module_specs/sar_score.yaml b/reco_utils/azureml/azureml_designer_modules/module_specs/sar_score.yaml index 00f54c461d..178ce331bb 100644 --- a/reco_utils/azureml/azureml_designer_modules/module_specs/sar_score.yaml +++ b/reco_utils/azureml/azureml_designer_modules/module_specs/sar_score.yaml @@ -1,80 +1,82 @@ -amlModuleIdentifier: - namespace: microsoft.com/cat - moduleName: SAR Scoring - moduleVersion: 1.1.1 +$schema: http://azureml/sdk-2-0/CommandComponent.json +name: microsoft.com.cat.sar_scoring +version: 1.1.1 +display_name: SAR Scoring +type: CommandComponent description: | - Python SAR Recommenders - repo: https://github.com/Microsoft/Recommenders -metadata: - annotations: - tags: ["Recommenders"] + Python SAR Recommenders + repo: https://github.com/Microsoft/Recommenders +tags: + Recommenders: inputs: -- name: Trained model - type: AnyDirectory - description: The directory contains SAR model. 
-- name: Dataset to score - type: AnyDirectory - description: Dataset to score -- name: Score type - type: Enum - default: Item recommendation - description: The type of score which the recommender should output - options: - - Rating prediction: - - name: Items to predict - type: Enum - default: Items in score set - description: The set of items to predict for test users - options: - - Items in training set - - Items in score set - - Item recommendation: - - name: Ranking metric - type: Enum - default: Rating - description: The metric of ranking used in item recommendation - options: - - Rating: - - name: Remove seen items - type: Boolean - description: Flag to remove items seen in training from recommendation - default: false - - Similarity - - Popularity - - name: Top k - type: Integer - default: 10 - description: The number of top items to recommend. - min: 1 - - name: Sort top k - type: Boolean - description: Flag to sort top k results. - default: true -- name: Normalize - type: Boolean - default: false - description: Flag to normalize predictions to scale of original ratings + trained_model: + type: AnyDirectory + description: The directory contains SAR model. 
+ optional: false + dataset_to_score: + type: AnyDirectory + description: Dataset to score + optional: false + score_type: + type: Enum + description: The type of score which the recommender should output + enum: + - Rating prediction + - Item recommendation + default: Item recommendation + optional: false + items_to_predict: + type: Enum + description: The set of items to predict for test users + enum: + - Items in training set + - Items in score set + default: Items in score set + optional: true + ranking_metric: + type: Enum + description: The metric of ranking used in item recommendation + enum: + - Rating + - Similarity + - Popularity + default: Rating + optional: true + remove_seen_items: + type: Boolean + description: Flag to remove items seen in training from recommendation + default: false + optional: true + top_k: + type: Integer + description: The number of top items to recommend. + min: 1 + default: 10 + optional: true + sort_top_k: + type: Boolean + description: Flag to sort top k results. 
+    default: true
+    optional: true
+  normalize:
+    type: Boolean
+    description: Flag to normalize predictions to scale of original ratings
+    default: false
+    optional: false
 outputs:
-- name: Score result
-  type: AnyDirectory
-  description: Ratings or items to output
-implementation:
-  container:
-    amlEnvironment:
-      python:
-        condaDependenciesFile: sar_conda.yaml
-    additionalIncludes:
-      - ../../../
-    command: [python, reco_utils/azureml/azureml_designer_modules/entries/score_sar_entry.py]
-    args: [
-      --trained-model, {inputPath: Trained model},
-      --dataset-to-score, {inputPath: Dataset to score},
-      --score-type, {inputValue: Score type},
-      [--items-to-predict, {inputValue: Items to predict}],
-      --normalize, {inputValue: Normalize},
-      [--ranking-metric, {inputValue: Ranking metric}],
-      [--top-k, {inputValue: Top k}],
-      [--sort-top-k, {inputValue: Sort top k}],
-      [--remove-seen-items, {inputValue: Remove seen items}],
-      --score-result, {outputPath: Score result},
-    ]
+  score_result:
+    type: AnyDirectory
+    description: Ratings or items to output
+code:
+  ../../../../
+command: >-
+  python reco_utils/azureml/azureml_designer_modules/entries/score_sar_entry.py --trained-model
+  {inputs.trained_model} --dataset-to-score {inputs.dataset_to_score} --score-type
+  {inputs.score_type} [--items-to-predict {inputs.items_to_predict}] --normalize {inputs.normalize}
+  [--ranking-metric {inputs.ranking_metric}] [--top-k {inputs.top_k}] [--sort-top-k
+  {inputs.sort_top_k}] [--remove-seen-items {inputs.remove_seen_items}] --score-result
+  {outputs.score_result}
+environment:
+  conda:
+    conda_dependencies_file: reco_utils/azureml/azureml_designer_modules/module_specs/sar_conda.yaml
+  os: Linux
diff --git a/reco_utils/azureml/azureml_designer_modules/module_specs/sar_train.yaml b/reco_utils/azureml/azureml_designer_modules/module_specs/sar_train.yaml
index 4787fe84ad..23c99a9ad8 100644
--- a/reco_utils/azureml/azureml_designer_modules/module_specs/sar_train.yaml
+++ b/reco_utils/azureml/azureml_designer_modules/module_specs/sar_train.yaml
@@ -1,65 +1,58 @@
-amlModuleIdentifier:
-  namespace: microsoft.com/cat
-  moduleName: SAR Training
-  moduleVersion: 1.1.1
-metadata:
-  annotations:
-    tags: ["Recommenders"]
-description: "SAR Train from Recommenders repo: https://github.com/Microsoft/Recommenders."
+$schema: http://azureml/sdk-2-0/CommandComponent.json
+name: microsoft.com.cat.sar_training
+version: 1.1.1
+display_name: SAR Training
+type: CommandComponent
+description: 'SAR Train from Recommenders repo: https://github.com/Microsoft/Recommenders.'
+tags:
+  Recommenders:
 inputs:
-- name: Input path
-  type: AnyDirectory
-  description: The directory contains dataframe.
-- name: User column
-  type: String
-  default: UserId
-  description: Column name of user IDs.
-- name: Item column
-  type: String
-  default: MovieId
-  description: Column name of item IDs.
-- name: Rating column
-  type: String
-  default: Rating
-  description: Column name of rating.
-- name: Timestamp column
-  type: String
-  default: Timestamp
-  description: Column name of timestamp.
-- name: Normalize
-  type: Boolean
-  default: false
-  description: Flag to normalize predictions to scale of original ratings
-- name: Time decay
-  type: Boolean
-  default: false
-  description: Flag to apply time decay
+  input_path:
+    type: AnyDirectory
+    description: The directory contains dataframe.
+    optional: false
+  user_column:
+    type: String
+    description: Column name of user IDs.
+    default: UserId
+    optional: false
+  item_column:
+    type: String
+    description: Column name of item IDs.
+    default: MovieId
+    optional: false
+  rating_column:
+    type: String
+    description: Column name of rating.
+    default: Rating
+    optional: false
+  timestamp_column:
+    type: String
+    description: Column name of timestamp.
+    default: Timestamp
+    optional: false
+  normalize:
+    type: Boolean
+    description: Flag to normalize predictions to scale of original ratings
+    default: false
+    optional: false
+  time_decay:
+    type: Boolean
+    description: Flag to apply time decay
+    default: false
+    optional: false
 outputs:
-- name: Output model
-  type: AnyDirectory
-  description: The output directory contains a trained model
-implementation:
-  container:
-    amlEnvironment:
-      python:
-        condaDependenciesFile: sar_conda.yaml
-    additionalIncludes:
-      - ../../../
-    command: [python, reco_utils/azureml/azureml_designer_modules/entries/train_sar_entry.py]
-    args:
-    - --input-path
-    - inputPath: Input path
-    - --col-user
-    - inputValue: User column
-    - --col-item
-    - inputValue: Item column
-    - --col-rating
-    - inputValue: Rating column
-    - --col-timestamp
-    - inputValue: Timestamp column
-    - --normalize
-    - inputValue: Normalize
-    - --time-decay
-    - inputValue: Time decay
-    - --output-model
-    - outputPath: Output model
\ No newline at end of file
+  output_model:
+    type: AnyDirectory
+    description: The output directory contains a trained model
+code:
+  ../../../../
+command: >-
+  python reco_utils/azureml/azureml_designer_modules/entries/train_sar_entry.py --input-path
+  {inputs.input_path} --col-user {inputs.user_column} --col-item {inputs.item_column}
+  --col-rating {inputs.rating_column} --col-timestamp {inputs.timestamp_column} --normalize
+  {inputs.normalize} --time-decay {inputs.time_decay} --output-model {outputs.output_model}
+environment:
+  conda:
+    conda_dependencies_file: reco_utils/azureml/azureml_designer_modules/module_specs/sar_conda.yaml
+  os: Linux
diff --git a/reco_utils/azureml/azureml_designer_modules/module_specs/stratified_splitter.yaml b/reco_utils/azureml/azureml_designer_modules/module_specs/stratified_splitter.yaml
index 428b48b95b..e649ff6e0a 100644
--- a/reco_utils/azureml/azureml_designer_modules/module_specs/stratified_splitter.yaml
+++ b/reco_utils/azureml/azureml_designer_modules/module_specs/stratified_splitter.yaml
@@ -1,65 +1,54 @@
-amlModuleIdentifier:
-  namespace: microsoft.com/cat
-  moduleName: Stratified Splitter
-  moduleVersion: 1.1.1
-metadata:
-  annotations:
-    tags: ["Recommenders"]
-description: "Python stratified splitter from Recommenders repo: https://github.com/Microsoft/Recommenders."
+$schema: http://azureml/sdk-2-0/CommandComponent.json
+name: microsoft.com.cat.stratified_splitter
+version: 1.1.1
+display_name: Stratified Splitter
+type: CommandComponent
+description: 'Python stratified splitter from Recommenders repo: https://github.com/Microsoft/Recommenders.'
+tags:
+  Recommenders:
 inputs:
-- name: Input path
-  type: AnyDirectory
-  description: The directory contains dataframe.
-- name: Ratio
-  type: Float
-  default: 0.75
-  max: 1.0
-  min: 0.0
-  description: >
-    Ratio for splitting data. If it is a single float number,
-    it splits data into two halves and the ratio argument indicates the ratio of
-    training data set; if it is a list of float numbers, the splitter splits
-    data into several portions corresponding to the split ratios. If a list is
-    provided and the ratios are not summed to 1, they will be normalized.
-- name: User column
-  type: String
-  default: UserId
-  description: Column name of user IDs.
-- name: Item column
-  type: String
-  default: MovieId
-  description: Column name of item IDs.
-- name: Seed
-  type: Integer
-  default: 42
-  description: Seed.
+  input_path:
+    type: AnyDirectory
+    description: The directory contains dataframe.
+    optional: false
+  ratio:
+    type: Float
+    description: |
+      Ratio for splitting data. If it is a single float number, it splits data into two halves and the ratio argument indicates the ratio of training data set; if it is a list of float numbers, the splitter splits data into several portions corresponding to the split ratios. If a list is provided and the ratios are not summed to 1, they will be normalized.
+    min: 0.0
+    max: 1.0
+    default: 0.75
+    optional: false
+  user_column:
+    type: String
+    description: Column name of user IDs.
+    default: UserId
+    optional: false
+  item_column:
+    type: String
+    description: Column name of item IDs.
+    default: MovieId
+    optional: false
+  seed:
+    type: Integer
+    description: Seed.
+    default: 42
+    optional: false
 outputs:
-- name: Output train data
-  type: AnyDirectory
-  description: The output directory contains a training dataframe.
-- name: Output test data
-  type: AnyDirectory
-  description: The output directory contains a test dataframe.
-implementation:
-  container:
-    amlEnvironment:
-      python:
-        condaDependenciesFile: sar_conda.yaml
-    additionalIncludes:
-      - ../../../
-    command: [python, reco_utils/azureml/azureml_designer_modules/entries/stratified_splitter_entry.py]
-    args:
-    - --input-path
-    - inputPath: Input path
-    - --ratio
-    - inputValue: Ratio
-    - --col-user
-    - inputValue: User column
-    - --col-item
-    - inputValue: Item column
-    - --seed
-    - inputValue: Seed
-    - --output-train
-    - outputPath: Output train data
-    - --output-test
-    - outputPath: Output test data
\ No newline at end of file
+  output_train_data:
+    type: AnyDirectory
+    description: The output directory contains a training dataframe.
+  output_test_data:
+    type: AnyDirectory
+    description: The output directory contains a test dataframe.
+code:
+  ../../../../
+command: >-
+  python reco_utils/azureml/azureml_designer_modules/entries/stratified_splitter_entry.py
+  --input-path {inputs.input_path} --ratio {inputs.ratio} --col-user {inputs.user_column}
+  --col-item {inputs.item_column} --seed {inputs.seed} --output-train {outputs.output_train_data}
+  --output-test {outputs.output_test_data}
+environment:
+  conda:
+    conda_dependencies_file: reco_utils/azureml/azureml_designer_modules/module_specs/sar_conda.yaml
+  os: Linux
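In the new component specs, the old `args` list is replaced by a single `command` string that references `{inputs.<name>}` and `{outputs.<name>}` placeholders, and the arguments backed by `optional: true` inputs (for example `--items-to-predict` and `--top-k` in the SAR scoring spec) are wrapped in square brackets. The sketch below illustrates how such a template could resolve into a command line. It is a hypothetical helper, not the AzureML implementation: `render_command` is an invented name, and the rules it applies (a bracketed group is dropped when any placeholder inside it has no value, values are shell-quoted, booleans are passed as Python `str()`) are assumptions.

```python
import re
import shlex

# Hypothetical helper, not part of the AzureML SDK or CLI. It assumes that a
# "[...]" group is dropped when any placeholder inside it has no value, and that
# every placeholder outside such a group is required.
PLACEHOLDER = re.compile(r"\{(inputs|outputs)\.(\w+)\}")
OPTIONAL_GROUP = re.compile(r"\[([^\[\]]*)\]")

# The `command: >-` block of sar_score.yaml, folded into one line as YAML does.
SCORE_COMMAND = (
    "python reco_utils/azureml/azureml_designer_modules/entries/score_sar_entry.py --trained-model "
    "{inputs.trained_model} --dataset-to-score {inputs.dataset_to_score} --score-type "
    "{inputs.score_type} [--items-to-predict {inputs.items_to_predict}] --normalize {inputs.normalize} "
    "[--ranking-metric {inputs.ranking_metric}] [--top-k {inputs.top_k}] [--sort-top-k "
    "{inputs.sort_top_k}] [--remove-seen-items {inputs.remove_seen_items}] --score-result "
    "{outputs.score_result}"
)


def render_command(template, inputs, outputs):
    """Resolve a component command template into an argv list (illustrative only)."""
    values = {"inputs": inputs, "outputs": outputs}

    def keep_or_drop(match):
        body = match.group(1)
        # Drop the whole optional group if any of its placeholders is unset.
        if any(values[kind].get(name) is None for kind, name in PLACEHOLDER.findall(body)):
            return ""
        return body

    def substitute(match):
        kind, name = match.groups()
        value = values[kind].get(name)
        if value is None:
            raise KeyError(f"missing required {kind}.{name}")
        # Quote each value so inputs such as "Item recommendation" stay one argument.
        # Booleans become str(value), e.g. "False" (an assumption).
        return shlex.quote(str(value))

    # Resolve optional groups first, so unset optional inputs never reach substitute().
    resolved = PLACEHOLDER.sub(substitute, OPTIONAL_GROUP.sub(keep_or_drop, template))
    return shlex.split(resolved)


if __name__ == "__main__":
    argv = render_command(
        SCORE_COMMAND,
        inputs={
            "trained_model": "/mnt/model",
            "dataset_to_score": "/mnt/test",
            "score_type": "Item recommendation",
            "normalize": False,
            "top_k": 10,
        },
        outputs={"score_result": "/mnt/out"},
    )
    print(argv)
```

With only `top_k` set among the optional inputs, the sketch keeps `--top-k 10` and drops the other bracketed arguments, while a missing required input such as `trained_model` raises a `KeyError`.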