Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(sdk): Add BaseModel to component_spec data classes #6372

Merged
merged 12 commits into from
Aug 25, 2021

Conversation

ji-yaqi
Copy link
Contributor

@ji-yaqi ji-yaqi commented Aug 17, 2021

No description provided.

@ji-yaqi
Copy link
Contributor Author

ji-yaqi commented Aug 18, 2021

/retest

Copy link
Member

@chensun chensun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Yaqi. Left a couple nitpicks but overall looks good.

default: null
type: str
labels: null
name: component_1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can be done in a later PR, but the name better be on the top. (likely we need some custom sort logic).

@chensun chensun requested review from neuromage and removed request for Ark-kun August 18, 2021 07:36
@ji-yaqi ji-yaqi changed the title feat(sdk): Add yaml_model to component_spec data classes feat(sdk): Add BaseModel to component_spec data classes Aug 18, 2021
Converter([bool], ['Boolean', 'Bool', 'bool'], _serialize_bool, _bool_deserializer_code, _bool_deserializer_definitions),
Converter([list], ['JsonArray', 'List', 'list'], _serialize_json, 'json.loads', 'import json'), # ! JSON map keys are always strings. Python converts all keys to strings without warnings
Converter([dict], ['JsonObject', 'Dictionary', 'Dict', 'dict'], _serialize_json, 'json.loads', 'import json'), # ! JSON map keys are always strings. Python converts all keys to strings without warnings
Converter([str,'str'], ['String', 'str'], _serialize_str, 'str', None),
Copy link
Member

@chensun chensun Aug 23, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this change needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is needed to convert the type 'str' to actual type 'String'. Otherwise, the input as 'str' will be identified as None type

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if this is the right place to change though. It looks to me the code build around this won't really make use of the added "type".

For example, type_name_to_type will only use the first item in this list:

type_name_to_type = {type_name: converter.types[0] for converter in _converters for type_name in converter.type_names if converter.types}

And for type_to_type_name, you'll get both {str, 'String'} and {'str': 'String'}

type_to_type_name = {typ: converter.type_names[0] for converter in _converters for typ in converter.types}

But then the key 'str' (with quotes) will never be used because type(value) will never return 'str'.
type_name = type_to_type_name.get(type(value), type(value).__name__)

I assume you got an error without changing this? maybe we can sync offline and take a look at the callstack to see if there's a better way to address that.

component_spec.OutputSpec(name='output1', type=str),
],
inputs={
'input1': component_spec.InputSpec(type="str"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: be consistent with quote sign.

Converter([bool], ['Boolean', 'Bool', 'bool'], _serialize_bool, _bool_deserializer_code, _bool_deserializer_definitions),
Converter([list], ['JsonArray', 'List', 'list'], _serialize_json, 'json.loads', 'import json'), # ! JSON map keys are always strings. Python converts all keys to strings without warnings
Converter([dict], ['JsonObject', 'Dictionary', 'Dict', 'dict'], _serialize_json, 'json.loads', 'import json'), # ! JSON map keys are always strings. Python converts all keys to strings without warnings
Converter([str,'str'], ['String', 'str'], _serialize_str, 'str', None),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if this is the right place to change though. It looks to me the code build around this won't really make use of the added "type".

For example, type_name_to_type will only use the first item in this list:

type_name_to_type = {type_name: converter.types[0] for converter in _converters for type_name in converter.type_names if converter.types}

And for type_to_type_name, you'll get both {str, 'String'} and {'str': 'String'}

type_to_type_name = {typ: converter.type_names[0] for converter in _converters for typ in converter.types}

But then the key 'str' (with quotes) will never be used because type(value) will never return 'str'.
type_name = type_to_type_name.get(type(value), type(value).__name__)

I assume you got an error without changing this? maybe we can sync offline and take a look at the callstack to see if there's a better way to address that.

@@ -80,7 +80,7 @@ def test_instantiate_component_omitting_arguments_with_default(
arguments={
'input1': 'hello',
'input2': 100,
'input3': 3.14,
'input3': '3.14',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this a string? (yet the integer default is not a string)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed offline, it's a default field which gets parsed to string originally, will think about future improvement.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be a good idea to only support string-serialized data here.
This data must be serialized at some point to a string that becomes a part of the command-line or a file.
It might be better to serialize as early as possible.
Otherwise, the serialization happens every time the pipeline is executed and the serialized data can be different every time (e.g. we Python <3.6 was serializing dicts to JSON with random key ordering).

This applies to default values as well. KFP v1's create_component_from_func serializes all default values to strings at the time of the component creation. So every component instance in every pipeline has the same serialized value.

default: Optional[Any] = None
# TODO(ji-yaqi): Add logic to cast default value into the specified type.
type: str
default: Union[str, int, float, bool, dict, list] = None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Optional[Union[...]] = None

Optional[...] == Union[..., None]

component_spec.OutputSpec(name='output1', type=str),
],
inputs={
'input1': component_spec.InputSpec(type="str")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: single quote for consistency.

Copy link
Member

@chensun chensun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

inputs={
'input1': component_spec.InputSpec(type='String'),
'input2': component_spec.InputSpec(type='int'),
'input3': component_spec.InputSpec(type='float', default=3.14),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: maybe Integer?

@chensun
Copy link
Member

chensun commented Aug 25, 2021

/lgtm

@ji-yaqi
Copy link
Contributor Author

ji-yaqi commented Aug 25, 2021

/approve

@google-oss-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ji-yaqi

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ji-yaqi ji-yaqi merged commit 65174cd into kubeflow:master Aug 25, 2021
@ji-yaqi ji-yaqi deleted the save_component branch August 25, 2021 21:00
raise NotImplementedError
with open(output_file, 'a') as output_file:
json_component = self.json(exclude_unset=True, exclude_none=True)
yaml_file = yaml.safe_dump(json.loads(json_component))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better use kfp.components._yaml_utils.dump_yaml(...)

open_mock = mock_open()
expected_yaml = textwrap.dedent("""\
implementation:
commands:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kubernetes and Google Cloud use command + args naming.

- 'set -ex

echo "$0" > "$1"'
- name: input1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does name mean? How does it differentiate between inputs and outputs (e.g. when the component is transforming Model input to Model output)?

commands=[
'sh',
'-c',
'set -ex\necho "$0" > "$1"',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JFYI: The resulting file will have \n at the end. And unlike Argo, Vertex Pipelines will currently preserve that end-of-line character and put it on the command-line. Sometimes this is not desired.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants