
Ability to link plots to an experiment #1626

Closed
NeroOkwa opened this issue Jun 17, 2022 · 7 comments
Assignees
Labels
Component: Experiment Tracking 🧪 Issue/PR that addresses functionality related to experiment tracking Stage: Technical Design 🎨 Ticket needs to undergo technical design before implementation Type: Parent Issue

Comments

@NeroOkwa
Contributor

Description

This is based on the first high priority issue resulting from the experiment tracking user research, which is:

Ability to save and link images of plots/model artefacts to an experiment. This would provide users with more insight (images and metrics together) to track/compare the evolution of runs across a timeline

What is the problem? 

  • User wants the ability to save images of model artefacts (such as Roc curve, or confusion matrix) alongside the metrics of a run 
  • "For example, you go into the UI, say okay, this is the run that's important to me. I can get certain objects that I store." "90% of the cases would be CSVs and images"

Who are the users of this functionality? 

  • Data Scientist 

Why do our users currently have this problem?

  • Existing Solution 1: Use MLflow - "MLflow allows us to save images and not just metrics"
  • Existing Solution 2: Kedro - "I am saving those as PNG files (in the Azure blob storage) and using some parameters to set the sub folder names so that I can compare to previous runs … not perfect but works". "I'd like to be able to flag some PNGs to be included in the experiment tracking so I have a record (with timeline) of how they've changed"
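The workaround described in Existing Solution 2 could look roughly like the following catalog entry (a hypothetical sketch; the dataset name, container path, credentials key, and templated subfolder parameter are all illustrative assumptions, not taken from the user's project):

```yaml
# Hypothetical catalog.yml entry sketching the workaround above: a matplotlib
# figure written as a PNG to Azure Blob Storage, with a parameterised
# subfolder name so earlier runs can be compared by hand.
confusion_matrix_plot:
  type: matplotlib.MatplotlibWriter
  filepath: abfs://experiments/${run_subfolder}/confusion_matrix.png
  credentials: azure_blob
```

As the user notes, this works but gives no first-class record in the experiment tracking UI; the comparison across runs is entirely manual.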

What is the impact of solving this problem?

  • User can keep track of specific artefacts alongside the experiment results 
  • "If I run a model I want to save the columns that were created next to it, I might want to create a model saved next to it (artefacts below the model) - something I am used to that I didn't have. There is a lot of artefacts I would want to save with an experiment" 

What could we possibly do?

  • Enable the ability to save model artefacts such as images and CSVs, which make up 90% of use cases
@NeroOkwa NeroOkwa added Stage: Technical Design 🎨 Ticket needs to undergo technical design before implementation Component: Experiment Tracking 🧪 Issue/PR that addresses functionality related to experiment tracking labels Jun 17, 2022
@NeroOkwa NeroOkwa self-assigned this Jun 17, 2022
@yetudada
Contributor

yetudada commented Jun 20, 2022

This makes sense for users. In a previous iteration of this functionality we used to allow users to:

  • Track PNGs, PDFs, CSVs and Excel spreadsheets as part of an experiment and see them in the UI
  • Compare the artifacts to each other

[Screenshots from the previous iteration: "PAI - Comparison - Artifact groups - hover" and "PAI - Comparison - 2 artifacts"]

@antonymilne
Contributor

antonymilne commented Jun 20, 2022

Copying this to here so we don't lose it:

I spoke to Lim about this a long time ago and made some notes on his thoughts. He thinks we should have a dataset called something like tracking.ArtifactDataSet which is basically for everything that's not a metric or json. kedro-viz would then work out how to render the dataset dependent on the file type (e.g. png).

I am not sure how this fits in with our existing matplotlib and plotly datasets. In particular, the plotly dataset saves to JSON, so how would kedro-viz know to render that as a plot? Do we need another type tracking.PlotlyDataSet to handle this case? Or should we just be using the existing matplotlib/plotly datasets for this?

@idanov thought that tracking.JSONDataSet was not the right approach (vs. the pre-existing json.JSONDataSet) so I am guessing would also not like this tracking.ArtifactDataSet. We need to figure out exactly what datasets we want to use here and what the significance of a "tracked" dataset is (i.e. is it the same as a versioned one? is it a separate dataset altogether?).
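To make the proposal concrete, here is a rough sketch of what Lim's tracking.ArtifactDataSet idea could look like. This is not the real Kedro API: the class, its method names, and the extension-to-renderer mapping are all illustrative assumptions about how kedro-viz might decide what to render from the file type alone.

```python
from pathlib import Path

class ArtifactDataSet:
    """Hypothetical sketch of a catch-all tracked artifact dataset.

    Saves raw artifact bytes and exposes a renderer hint derived from the
    file extension, which a front end (e.g. kedro-viz) could use to pick
    an image view, a table view, etc. Illustrative only, not Kedro code.
    """

    # Assumed mapping from file type to how the UI would render it.
    RENDERERS = {".png": "image", ".csv": "table", ".json": "json"}

    def __init__(self, filepath: str):
        self._filepath = Path(filepath)

    def save(self, data: bytes) -> None:
        # Persist the artifact as raw bytes, creating parent dirs as needed.
        self._filepath.parent.mkdir(parents=True, exist_ok=True)
        self._filepath.write_bytes(data)

    def load(self) -> bytes:
        return self._filepath.read_bytes()

    def describe(self) -> dict:
        # The renderer hint is the piece kedro-viz would need to decide
        # how to display the artifact; unknown extensions fall back to binary.
        return {
            "filepath": str(self._filepath),
            "renderer": self.RENDERERS.get(self._filepath.suffix, "binary"),
        }
```

The open question in the comment above is exactly this dispatch-on-extension step: a plotly dataset also saves JSON, so extension alone cannot distinguish "render as plot" from "render as plain JSON".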

@merelcht
Member

merelcht commented Jun 22, 2022

Notes from Technical Design session:

The team discussed possible solutions to enable users to track plots and other artifacts.

Possible solutions:

  1. The tracking.ArtifactDataSet as proposed by Lim (see comment above). This dataset would allow users to store any type of data that can be considered an artifact, e.g. images, plots etc. Viz would then figure out how to render whatever data is stored under this dataset type.

The general consensus about this approach is that special tracking datasets shouldn't be the way to log more data as part of a run. It raises the question about how many "tracking" datasets we'd end up adding. The discussion led to the option of not having tracking datasets anymore at all.

  2. No tracking datasets at all
  • Tracking datasets are really just versioned datasets with some extra logic when it comes to the tracking.MetricsDataSet, but the tracking.JSONDataSet is just the same as the regular JSONDataSet with versioning on by default.
  • Originally, one of the main reasons why we decided we needed them was as a way to tell viz what data to show as part of the experiment tracking panel.
  • All existing datasets in Kedro now allow users to log artifacts (plots, images, etc.) so it's silly to add special tracking datasets that would pretty much do the same thing
  • Arguably, versioning isn't exactly the same as tracking. As in, a user might want to version a dataset, but not make it part of the experiment tracking data. Letting the user decide what data to show in the experiment tracking panel could happen on the UI side (needs design).
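Under option 2, surfacing a plot in experiment tracking would amount to nothing more than versioning its dataset. A hypothetical catalog entry (dataset name and path are illustrative):

```yaml
# Hypothetical catalog.yml entry: under option 2, a plot appears in the
# experiment tracking panel simply because it is versioned - no special
# "tracking" dataset type involved.
roc_curve_plot:
  type: matplotlib.MatplotlibWriter
  filepath: data/08_reporting/roc_curve.png
  versioned: true
```

The trade-off noted in the last bullet is that versioning then carries two meanings at once (keep history, and show in the tracking panel), which is why a UI-side opt-in may still be needed.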

Follow up actions:
The decision was made to go for option 2: move away from special tracking datasets and instead show all versioned and visualisable datasets on the experiment tracking panel. This leads to the following actions:

  • Kedro currently throws an error when versioning is turned on for a dataset later in the process. We need to fix that workflow, as showing versioned datasets in experiment tracking might be an incentive for users to turn on versioning later, when they find they need this data to be displayed.
  • We will not immediately remove or deprecate the existing tracking datasets, but we need to decide on the future of those, keeping in mind the use case for showing the metric timeline.
  • Add functionality to render all versioned datasets on the Viz side. This links to: Kedro-Viz to show preview of data kedro-viz#907

@antonymilne
Contributor

antonymilne commented Jun 23, 2022

Just to record this in writing also: while I agree with the "tracked plot = versioned dataset" approach, it does feel like an inconsistent and confusing UX given the already-existing tracking datasets:

  • Want to track json data? Change your dataset type to tracking.JSONDataSet.
  • Want to track a plot? Keep the same dataset type but set versioned: true.

Hence I think we do need to work out what happens with tracking.JSONDataSet and tracking.MetricsDataSet sooner rather than later. tracking.JSONDataSet could be easily deprecated in favour of json.JSONDataSet with versioned: true, but tracking.MetricsDataSet is trickier. To me this is directly coupled to questions like "how do I search runs by metric" and "why not just do a log_metric call" (which we decided against before). Overall, adding plots to experiment tracking sounds straightforward and I'm very happy to do it by versioned: true, but we need to work out a more holistic and complete solution here or experiment tracking becomes a bit of a mish-mash of different approaches.
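The UX inconsistency described above can be seen side by side in a catalog. These are hypothetical entries (dataset names and paths are illustrative), showing the two idioms that would coexist:

```yaml
# Two tracking idioms side by side (hypothetical catalog entries).

# To track JSON data: switch the dataset *type*.
model_summary:
  type: tracking.JSONDataSet
  filepath: data/09_tracking/model_summary.json

# To track a plot: keep the type, flip a *flag*.
confusion_matrix:
  type: matplotlib.MatplotlibWriter
  filepath: data/08_reporting/confusion_matrix.png
  versioned: true
```

The deprecation path sketched above would collapse the first idiom into the second (json.JSONDataSet with versioned: true), leaving only tracking.MetricsDataSet as the special case because of its metric-timeline behaviour.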

@antonymilne
Contributor

antonymilne commented Jun 23, 2022

Now on the question of showing plots in experiment tracking:

@comym comym moved this from Todo to In Progress in Kedro-Viz Jun 23, 2022
@yetudada yetudada changed the title Experiment Tracking Adoption: Issue 1 - Ability to save and link plots/model artefacts to an experiment. Ability to link plots to an experiment Jun 23, 2022
@yetudada yetudada removed this from Kedro-Viz Jun 23, 2022
@yetudada yetudada added this to Roadmap Jun 23, 2022
@yetudada yetudada moved this to Now in Roadmap Jun 23, 2022
@yetudada yetudada moved this from Delivery to Discovery or Research in Roadmap Jun 23, 2022
@NeroOkwa
Contributor Author

NeroOkwa commented Jun 23, 2022

Notes from Follow up Design/Engineering session:

The team discussed a way for users to visualise and compare dataset plots during experiment tracking.

Follow up actions:

  • Design (@GabrielComymQB and @Mackay031) to start exploratory designs: low-fi mockups, and then provide feedback to the team
  • Once completed, engineering (@tynandebold) to scope and commence development

Timeline:

To be completed by the end of the next sprint: 15/07/22

@yetudada
Contributor

yetudada commented Oct 4, 2022

This issue was completed in kedro-org/kedro-viz#953.

@yetudada yetudada closed this as completed Oct 4, 2022
Projects
Status: Shipped 🚀
Development

No branches or pull requests

6 participants