Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reject asset upload with invalid id #273

Merged
merged 6 commits into from
Feb 24, 2022

Conversation

JAulet
Copy link
Contributor

@JAulet JAulet commented Dec 1, 2021

Excerpt from issue #209:

The actual id of the dataset asset in MLX was codenet-langclass. Notice the dash - which correctly replaced the underscore _ character. However, the component which generated the DataShim metadata used the value of the id field from the YAML file, with the id containing the underscore character.

Proposed fix:

Any _upload_XXX method in api/server/swagger_server/controllers_impl/XXX_service_controller_impl.py which takes an id (or similar field like model_identifier) from the uploaded YAML file to generate the asset id should call a new validate_id() method similar to the swagger_server.controllers_impl.validate_parameters function in api/server/swagger_server/controllers_impl/__init__.py:

api/server/swagger_server/controllers_impl/:

  • __init__.py, add validate_id(id: str) method
  • dataset_service_controller_impl.py, line 372: def _upload_dataset_yaml(...) ... use validate_id(id: str)
  • model_service_controller_impl.py, line 381: def _upload_model_yaml(...)... use validate_id(id: str)
  • notebook_service_controller_impl.py, line 392: def _upload_notebook_yaml(...) ... use validate_id(id: str)

Resolves #209

@ckadner ckadner removed the request for review from yhwang December 2, 2021 00:04
@ckadner ckadner changed the title [WIP] Added validate-id to __init__.py [WIP] Reject asset upload with invalid id Dec 2, 2021
@ckadner ckadner added the RCOS Potential work items for RCOS student interns label Dec 2, 2021
Copy link
Member

@ckadner ckadner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @JAulet -- the original problem was not that the id we generate was invalid, but the id field inside the YAML could still be invalid. And since that YAML itself was used by another component to get the id the DataShim logic failed. So this PR should add some code to verify the ids in the YAML adhere to the k8s identifier format.

@@ -396,6 +396,11 @@ def _upload_notebook_yaml(yaml_file_content: AnyStr, name=None, access_token=Non
requirements = yaml_dict["implementation"]["github"].get("requirements")
filter_categories = yaml_dict.get("filter_categories") or dict()

errors, status = validate_id(notebook_id)
Copy link
Member

@ckadner ckadner Feb 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's move this check above line 391 where we generate the notebook_id and validate the yaml_dict.get("id")

    errors, status = validate_id(yaml_dict.get("id"))

@@ -397,6 +397,11 @@ def _upload_model_yaml(yaml_file_content: AnyStr, name=None, existing_id=None):
if type(api_model.servable_tested_platforms) == str:
api_model.servable_tested_platforms = api_model.servable_tested_platforms.replace(", ", ",").split(",")

errors, status = validate_id(api_model.id)
Copy link
Member

@ckadner ckadner Feb 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's move this check to line 375 after we load the yaml and before we create the ApiModel object and validate the model_def.get("model_identifier")

    errors, status = validate_id(model_def.get("model_identifier"))

@@ -371,6 +371,11 @@ def _upload_dataset_yaml(yaml_file_content: AnyStr, name=None, existing_id=None)
# if yaml_dict.get("id") != dataset_id:
# raise ValueError(f"Dataset.id contains non k8s character: {yaml_dict.get('id')}")

errors, status = validate_id(dataset_id)
Copy link
Member

@ckadner ckadner Feb 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's move this check above line 366 after we load the YAML and before we generate the dataset_id and validate the yaml_dict.get("id")

    errors, status = validate_id(yaml_dict.get("id"))

call to check the id of the yaml file itself rather than generated id.

Signed-off-by: JAulet <[email protected]>
@JAulet
Copy link
Contributor Author

JAulet commented Feb 18, 2022

@ckadner Please Review.

@ckadner
Copy link
Member

ckadner commented Feb 18, 2022

@ckadner Please Review.

@JAulet -- did you try this out by uploading a model, dataset, and notebook YAML respectively with invalid ID values (and with valid ones)?

@ckadner ckadner marked this pull request as ready for review February 19, 2022 00:52
@ckadner ckadner changed the title [WIP] Reject asset upload with invalid id Reject asset upload with invalid id Feb 19, 2022
@JAulet
Copy link
Contributor Author

JAulet commented Feb 22, 2022

@ckadner Please Review.

@JAulet -- did you try this out by uploading a model, dataset, and notebook YAML respectively with invalid ID values (and with valid ones)?

Yes, I did. I got the correct output for each.

Copy link
Member

@ckadner ckadner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

Thanks @JAulet

@mlx-bot
Copy link
Collaborator

mlx-bot commented Feb 24, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ckadner, JAulet

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ckadner ckadner merged commit 4e8c003 into machine-learning-exchange:main Feb 24, 2022
krishnakumar27 pushed a commit to krishnakumar27/mlx that referenced this pull request Mar 30, 2022
* Added validate-id to __init__.py
* Added check to notebook_service_controller_impl.py
* Added validate-id check to dataset_service_controller_impl.py and
  model_service_controller_impl.py.
* Moved validate_id check to before id is generated and changed
  function call to check the id of the yaml file itself rather than generated id.

Resolves machine-learning-exchange#209

Signed-off-by: JAulet <[email protected]>
Signed-off-by: Krishna Kumar <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved lgtm RCOS Potential work items for RCOS student interns
Projects
None yet
Development

Successfully merging this pull request may close these issues.

API should reject asset upload when id contain invalid characters
3 participants