-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(sdk): Add BaseModel to component_spec data classes #6372
Conversation
/retest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Yaqi. Left a couple nitpicks but overall looks good.
sdk/python/kfp/v2/compiler_cli_tests/test_data/experimental_v2_component.json
Outdated
Show resolved
Hide resolved
sdk/python/kfp/v2/components/experimental/base_component_test.py
Outdated
Show resolved
Hide resolved
sdk/python/kfp/v2/components/experimental/component_spec_test.py
Outdated
Show resolved
Hide resolved
sdk/python/kfp/v2/components/experimental/component_spec_test.py
Outdated
Show resolved
Hide resolved
default: null | ||
type: str | ||
labels: null | ||
name: component_1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: can be done in a later PR, but the name better be on the top. (likely we need some custom sort logic).
95d5ec6
to
432fc7f
Compare
Converter([bool], ['Boolean', 'Bool', 'bool'], _serialize_bool, _bool_deserializer_code, _bool_deserializer_definitions), | ||
Converter([list], ['JsonArray', 'List', 'list'], _serialize_json, 'json.loads', 'import json'), # ! JSON map keys are always strings. Python converts all keys to strings without warnings | ||
Converter([dict], ['JsonObject', 'Dictionary', 'Dict', 'dict'], _serialize_json, 'json.loads', 'import json'), # ! JSON map keys are always strings. Python converts all keys to strings without warnings | ||
Converter([str,'str'], ['String', 'str'], _serialize_str, 'str', None), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this change needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is needed to convert the type 'str' to actual type 'String'. Otherwise, the input as 'str'
will be identified as None type
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if this is the right place to change though. It looks to me the code build around this won't really make use of the added "type".
For example, type_name_to_type
will only use the first item in this list:
type_name_to_type = {type_name: converter.types[0] for converter in _converters for type_name in converter.type_names if converter.types} |
And for type_to_type_name
, you'll get both {str, 'String'}
and {'str': 'String'}
type_to_type_name = {typ: converter.type_names[0] for converter in _converters for typ in converter.types} |
But then the key
'str'
(with quotes) will never be used because type(value)
will never return 'str'
.type_name = type_to_type_name.get(type(value), type(value).__name__) |
I assume you got an error without changing this? maybe we can sync offline and take a look at the callstack to see if there's a better way to address that.
component_spec.OutputSpec(name='output1', type=str), | ||
], | ||
inputs={ | ||
'input1': component_spec.InputSpec(type="str"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: be consistent with quote sign.
Converter([bool], ['Boolean', 'Bool', 'bool'], _serialize_bool, _bool_deserializer_code, _bool_deserializer_definitions), | ||
Converter([list], ['JsonArray', 'List', 'list'], _serialize_json, 'json.loads', 'import json'), # ! JSON map keys are always strings. Python converts all keys to strings without warnings | ||
Converter([dict], ['JsonObject', 'Dictionary', 'Dict', 'dict'], _serialize_json, 'json.loads', 'import json'), # ! JSON map keys are always strings. Python converts all keys to strings without warnings | ||
Converter([str,'str'], ['String', 'str'], _serialize_str, 'str', None), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if this is the right place to change though. It looks to me the code build around this won't really make use of the added "type".
For example, type_name_to_type
will only use the first item in this list:
type_name_to_type = {type_name: converter.types[0] for converter in _converters for type_name in converter.type_names if converter.types} |
And for type_to_type_name
, you'll get both {str, 'String'}
and {'str': 'String'}
type_to_type_name = {typ: converter.type_names[0] for converter in _converters for typ in converter.types} |
But then the key
'str'
(with quotes) will never be used because type(value)
will never return 'str'
.type_name = type_to_type_name.get(type(value), type(value).__name__) |
I assume you got an error without changing this? maybe we can sync offline and take a look at the callstack to see if there's a better way to address that.
@@ -80,7 +80,7 @@ def test_instantiate_component_omitting_arguments_with_default( | |||
arguments={ | |||
'input1': 'hello', | |||
'input2': 100, | |||
'input3': 3.14, | |||
'input3': '3.14', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this a string? (yet the integer default is not a string)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed offline, it's a default field which gets parsed to string originally, will think about future improvement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be a good idea to only support string-serialized data here.
This data must be serialized at some point to a string that becomes a part of the command-line or a file.
It might be better to serialize as early as possible.
Otherwise, the serialization happens every time the pipeline is executed and the serialized data can be different every time (e.g. we Python <3.6 was serializing dicts to JSON with random key ordering).
This applies to default values as well. KFP v1's create_component_from_func
serializes all default values to strings at the time of the component creation. So every component instance in every pipeline has the same serialized value.
default: Optional[Any] = None | ||
# TODO(ji-yaqi): Add logic to cast default value into the specified type. | ||
type: str | ||
default: Union[str, int, float, bool, dict, list] = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Optional[Union[...]] = None
Optional[...]
== Union[..., None]
component_spec.OutputSpec(name='output1', type=str), | ||
], | ||
inputs={ | ||
'input1': component_spec.InputSpec(type="str") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: single quote for consistency.
8301fd3
to
cd2ca37
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
inputs={ | ||
'input1': component_spec.InputSpec(type='String'), | ||
'input2': component_spec.InputSpec(type='int'), | ||
'input3': component_spec.InputSpec(type='float', default=3.14), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: maybe Integer
?
/lgtm |
/approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: ji-yaqi The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
raise NotImplementedError | ||
with open(output_file, 'a') as output_file: | ||
json_component = self.json(exclude_unset=True, exclude_none=True) | ||
yaml_file = yaml.safe_dump(json.loads(json_component)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better use kfp.components._yaml_utils.dump_yaml(...)
def dump_yaml(data): |
open_mock = mock_open() | ||
expected_yaml = textwrap.dedent("""\ | ||
implementation: | ||
commands: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Kubernetes and Google Cloud use command
+ args
naming.
- 'set -ex | ||
|
||
echo "$0" > "$1"' | ||
- name: input1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does name
mean? How does it differentiate between inputs and outputs (e.g. when the component is transforming Model
input to Model
output)?
commands=[ | ||
'sh', | ||
'-c', | ||
'set -ex\necho "$0" > "$1"', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
JFYI: The resulting file will have \n
at the end. And unlike Argo, Vertex Pipelines will currently preserve that end-of-line character and put it on the command-line. Sometimes this is not desired.
No description provided.