[UI] inputs/outputs tab in KFP semantics #5670

Closed
Tracked by #5675
Bobgy opened this issue May 18, 2021 · 4 comments · Fixed by #5859

@Bobgy
Contributor

Bobgy commented May 18, 2021

The KFP Inputs/Outputs tab on the run details page is currently tightly coupled to Argo.

For v2 compatible pipelines, we can use information from MLMD to render the Inputs/Outputs tab in KFP semantics.

@Bobgy
Contributor Author

Bobgy commented May 18, 2021

/assign @zijianjoy

@zijianjoy
Collaborator

Current INPUT/OUTPUT tab

Renders Input Parameters, Input Artifacts, Output Parameters, and Output Artifacts.

They are read from the Workflow object.

Use MLMD

Given an execution, we can look up its list of events to identify each related artifact and whether it is an input or an output of that execution. Details are in the MLMD proto comment:

// An event represents a relationship between an artifact and an execution.
// There are different kinds of events, relating to both input and output, as
// well as how they are used by the mlmd powered system.
// For example, the DECLARED_INPUT and DECLARED_OUTPUT events are part of the
// signature of an execution. For example, consider:
//
// my_result = my_execution({"data":[3,7],"schema":8})
//
// Where 3, 7, and 8 are artifact_ids, Assuming execution_id of my_execution is
// 12 and artifact_id of my_result is 15, the events are:
// {
// artifact_id:3,
// execution_id: 12,
// type:DECLARED_INPUT,
// path:{step:[{"key":"data"},{"index":0}]}
// }
// {
// artifact_id:7,
// execution_id: 12,
// type:DECLARED_INPUT,
// path:{step:[{"key":"data"},{"index":1}]}
// }
// {
// artifact_id:8,
// execution_id: 12,
// type:DECLARED_INPUT,
// path:{step:[{"key":"schema"}]}
// }
// {
// artifact_id:15,
// execution_id: 12,
// type:DECLARED_OUTPUT,
// path:{step:[{"key":"my_result"}]}
// }
// Other event types include INPUT/OUTPUT and INTERNAL_INPUT/_OUTPUT.
// * The INPUT/OUTPUT is an event that actually reads/writes an artifact by an
// execution. The input/output artifacts may not declared in the signature,
// For example, the trainer may output multiple caches of the parameters
// (as an OUTPUT), then finally write the SavedModel as a DECLARED_OUTPUT.
// * The INTERNAL_INPUT/_OUTPUT are event types which are only meaningful to
// an orchestration system to keep track of the details for later debugging.
// For example, a fork happened conditioning on an artifact, then an execution
// is triggered, such fork implementating may need to log the read and write
// of artifacts and may not be worth displaying to the users.
//
// For instance, in the above example,
//
// my_result = my_execution({"data":[3,7],"schema":8})
//
// there is another execution (id: 15), which represents a `garbage_collection`
// step in an orchestration system
//
// gc_result = garbage_collection(my_result)
//
// that cleans `my_result` if needed. The details should be invisible to the
// end users and lineage tracking. The orchestrator can emit following events:
//
// {
// artifact_id: 15,
// execution_id: 15,
// type:INTERNAL_INPUT,
// }
// {
// artifact_id:16, // New artifact containing the GC job result.
// execution_id: 15,
// type:INTERNAL_OUTPUT,
// path:{step:[{"key":"gc_result"}]}
// }
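
To make the event model above concrete, here is a minimal TypeScript sketch of how a UI could bucket one execution's artifacts by event type, using the IDs from the proto comment. The `EventType` enum, the `MlmdEvent` and `PathStep` interfaces, and both helper functions are hypothetical stand-ins written for this illustration; they are not the generated MLMD client types.

```typescript
// Hypothetical stand-ins for the MLMD Event message (not the generated client types).
enum EventType {
  DECLARED_INPUT = 'DECLARED_INPUT',
  DECLARED_OUTPUT = 'DECLARED_OUTPUT',
  INPUT = 'INPUT',
  OUTPUT = 'OUTPUT',
  INTERNAL_INPUT = 'INTERNAL_INPUT',
  INTERNAL_OUTPUT = 'INTERNAL_OUTPUT',
}

interface PathStep {
  key?: string;
  index?: number;
}

interface MlmdEvent {
  artifactId: number;
  executionId: number;
  type: EventType;
  path: PathStep[];
}

type ArtifactIdsByType = Partial<Record<EventType, number[]>>;

// Collect the artifact IDs attached to one execution, keyed by event type.
function groupArtifactIdsByEventType(events: MlmdEvent[], executionId: number): ArtifactIdsByType {
  const grouped: ArtifactIdsByType = {};
  for (const event of events) {
    if (event.executionId !== executionId) {
      continue;
    }
    const ids = grouped[event.type] || [];
    ids.push(event.artifactId);
    grouped[event.type] = ids;
  }
  return grouped;
}

// Use the first "key" step of the event path as a display name, if there is one.
function artifactNameFromPath(path: PathStep[]): string | undefined {
  for (const step of path) {
    if (step.key !== undefined) {
      return step.key;
    }
  }
  return undefined;
}

// The events from the proto comment, plus the garbage_collection input event.
const events: MlmdEvent[] = [
  { artifactId: 3, executionId: 12, type: EventType.DECLARED_INPUT, path: [{ key: 'data' }, { index: 0 }] },
  { artifactId: 7, executionId: 12, type: EventType.DECLARED_INPUT, path: [{ key: 'data' }, { index: 1 }] },
  { artifactId: 8, executionId: 12, type: EventType.DECLARED_INPUT, path: [{ key: 'schema' }] },
  { artifactId: 15, executionId: 12, type: EventType.DECLARED_OUTPUT, path: [{ key: 'my_result' }] },
  { artifactId: 15, executionId: 15, type: EventType.INTERNAL_INPUT, path: [] },
];

const grouped = groupArtifactIdsByEventType(events, 12);
```

For execution 12, `grouped` holds the three declared inputs and the single declared output; the `INTERNAL_INPUT` event belongs to execution 15 and is filtered out.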

@zijianjoy
Collaborator

Questions

  1. What is the relationship between the Input/Output tab and the ML Metadata tab in
    <SectionIO
    title={'Declared Inputs'}
    artifactIds={this.state.events[Event.Type.DECLARED_INPUT]}
    artifactTypeMap={this.state.artifactTypeMap}
    />
    <SectionIO
    title={'Inputs'}
    artifactIds={this.state.events[Event.Type.INPUT]}
    artifactTypeMap={this.state.artifactTypeMap}
    />
    <SectionIO
    title={'Declared Outputs'}
    artifactIds={this.state.events[Event.Type.DECLARED_OUTPUT]}
    artifactTypeMap={this.state.artifactTypeMap}
    />
    <SectionIO
    title={'Outputs'}
    artifactIds={this.state.events[Event.Type.OUTPUT]}
    artifactTypeMap={this.state.artifactTypeMap}
    />
    ? How should we merge them into one? Possible solution: move the INPUT/OUTPUT/DECLARED_INPUT/DECLARED_OUTPUT sections to the Input/Output tab, and show only Properties/Custom Properties in the ML Metadata tab.
  2. How do we differentiate Parameter and Artifact from MLMD?
  3. What is the relationship between DECLARED_INPUT and INPUT? How to show them in static pipeline mode?

@Bobgy
Contributor Author

Bobgy commented Jun 6, 2021

These questions are right to the point! Let me try to explain some context. I don't have a clear answer to some of them, so you'll need to do some design work.

Questions

  1. What is the relationship between the Input/Output tab and the ML Metadata tab in
    <SectionIO
    title={'Declared Inputs'}
    artifactIds={this.state.events[Event.Type.DECLARED_INPUT]}
    artifactTypeMap={this.state.artifactTypeMap}
    />
    <SectionIO
    title={'Inputs'}
    artifactIds={this.state.events[Event.Type.INPUT]}
    artifactTypeMap={this.state.artifactTypeMap}
    />
    <SectionIO
    title={'Declared Outputs'}
    artifactIds={this.state.events[Event.Type.DECLARED_OUTPUT]}
    artifactTypeMap={this.state.artifactTypeMap}
    />
    <SectionIO
    title={'Outputs'}
    artifactIds={this.state.events[Event.Type.OUTPUT]}
    artifactTypeMap={this.state.artifactTypeMap}
    />

    ? How should we merge them into one? Possible solution: move the INPUT/OUTPUT/DECLARED_INPUT/DECLARED_OUTPUT sections to the Input/Output tab, and show only Properties/Custom Properties in the ML Metadata tab.

In KFP v1, the Input/Output tab shows info parsed from Argo workflows, while the ML Metadata tab shows info from MLMD. In v2 and v2 compatible mode, both will come from MLMD (so they are duplicated), which means some merging or rearrangement of information is necessary, as you thought.

Here's the arrangement my gut feeling suggests (mostly similar to your proposal); feel free to discuss:

Input/output tab

  • shows info from MLMD
  • in addition to showing preview + download link, we can add a link to MLMD artifact details page

ML Metadata tab

  • I suggest removing the tab altogether: in KFP v2 compatible mode, users cannot customize execution metadata or custom properties, so there is not much left to show

A link to the execution details page should probably be shown all the time (e.g. as the side content title; see the screenshot below).
image

  2. How do we differentiate Parameter and Artifact from MLMD?

The KFP MLMD data model is that input parameters are logged as input:<parameter-name> custom properties of the execution.
Output parameters are logged as output:<parameter-name> custom properties.

Similar to PR #5793, we will soon standardize on moving parameters into fields of a custom property named metadata. metadata is of Struct type, so it can hold key-value pairs such as input:<param> and output:<param>, as mentioned above.

Artifacts are what you already observed in the ML Metadata tab; they are connected to executions by events.
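
As a rough illustration of the naming convention described above, the following TypeScript sketch separates input and output parameters from an execution's custom properties by key prefix. It assumes the custom properties have already been flattened into a plain key/value object; `splitParameters`, `ExecutionParams`, and the sample keys are hypothetical names made up for this example, not part of the KFP frontend.

```typescript
type ParamValue = string | number;

interface ExecutionParams {
  inputs: { [name: string]: ParamValue };
  outputs: { [name: string]: ParamValue };
}

const INPUT_PREFIX = 'input:';
const OUTPUT_PREFIX = 'output:';

// Split custom properties into input and output parameters by key prefix.
// Keys without either prefix are not parameters and are ignored.
function splitParameters(customProperties: { [key: string]: ParamValue }): ExecutionParams {
  const result: ExecutionParams = { inputs: {}, outputs: {} };
  for (const key of Object.keys(customProperties)) {
    const value = customProperties[key];
    if (key.indexOf(INPUT_PREFIX) === 0) {
      result.inputs[key.slice(INPUT_PREFIX.length)] = value;
    } else if (key.indexOf(OUTPUT_PREFIX) === 0) {
      result.outputs[key.slice(OUTPUT_PREFIX.length)] = value;
    }
  }
  return result;
}

// Made-up sample data for illustration only.
const params = splitParameters({
  'input:learning_rate': 0.01,
  'input:dataset': 'train',
  'output:accuracy': 0.93,
  pipeline_name: 'demo',
});
```

Artifacts do not appear in this object at all; they would still be resolved through events, as noted above.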

  3. What is the relationship between DECLARED_INPUT and INPUT? How to show them in static pipeline mode?

Answer (ref from proto file):

For example, the DECLARED_INPUT and DECLARED_OUTPUT events are part of the signature of an execution

https://github.com/google/ml-metadata/blob/47150524ee5ceee9766a034c4fbe5427440dd79e/ml_metadata/proto/metadata_store.proto#L100-L138

Thanks for the question. After re-reading the documentation, I realized I had misunderstood DECLARED_INPUT. All inputs/outputs of KFP tasks should be declared inputs/outputs, because they are part of the KFP component signature. Non-declared inputs/outputs are a concept only TFX uses.

However, because KFP does not have the concept of non-declared inputs/outputs, we have so far been logging all inputs and outputs as plain INPUT and OUTPUT events. We need to confirm whether this is something we need to change.
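
Until that question is settled, one option is for the UI to stay agnostic and treat declared and plain events the same way. The TypeScript sketch below shows that idea; it is an assumption made for illustration, not a decision taken in this thread, and `EventKind`, `Direction`, and `directionOf` are hypothetical names.

```typescript
// Event type names as listed in the MLMD proto comment.
type EventKind =
  | 'DECLARED_INPUT'
  | 'DECLARED_OUTPUT'
  | 'INPUT'
  | 'OUTPUT'
  | 'INTERNAL_INPUT'
  | 'INTERNAL_OUTPUT';

type Direction = 'input' | 'output' | 'hidden';

// Map an event type to the section it would be shown in.
// Declared and plain events share a section; internal events are not shown.
function directionOf(kind: EventKind): Direction {
  switch (kind) {
    case 'DECLARED_INPUT':
    case 'INPUT':
      return 'input';
    case 'DECLARED_OUTPUT':
    case 'OUTPUT':
      return 'output';
    default:
      return 'hidden';
  }
}
```

With this mapping, the rendered tab would look the same whether KFP keeps logging INPUT/OUTPUT or later switches to DECLARED_INPUT/DECLARED_OUTPUT.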

Ref: MLMD Terminology section of KFP v2 design

google-oss-robot pushed a commit that referenced this issue Jun 25, 2021
…5670 (#5859)

* feat(frontend) Support Input/Output from MLMD for V2-compatible

* fix test

* address nit comments

* Artifact Preview component, use events to get artifact name.

* comment and UX rework

* downloadable link