
Ability to link plots to an experiment #1626

Closed
NeroOkwa opened this issue Jun 17, 2022 · 7 comments
Assignees
Labels
Component: Experiment Tracking 🧪 Issue/PR that addresses functionality related to experiment tracking Stage: Technical Design 🎨 Ticket needs to undergo technical design before implementation Type: Parent Issue

Comments

@NeroOkwa
Contributor

Description

This is based on the first high priority issue resulting from the experiment tracking user research, which is:

Ability to save and link images of plots/model artefacts to an experiment. This would provide users with more insight (images and metrics together) to track/compare the evolution of runs across a timeline

What is the problem? 

  • User wants the ability to save images of model artefacts (such as Roc curve, or confusion matrix) alongside the metrics of a run 
  • "For example, you go into the UI, say okay, this is the run that's important to me. I can get certain objects that I store." "90% of the cases would be CSVs and images"

Who are the users of this functionality? 

  • Data Scientist 

Why do our users currently have this problem?

  • Existing Solution 1: Use MLflow - "MLflow allows us to save images and not just metrics"
  • Existing Solution 2: Kedro - "I am saving those as PNG files (in the Azure blob storage) and using some parameters to set the sub folder names so that I can compare to previous runs … not perfect but works". "I'd like to be able to flag some PNGs to be included in the experiment tracking so I have a record (with timeline) of how they've changed"
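The workaround described in Existing Solution 2 could look roughly like the following catalog entry (a hypothetical sketch; the dataset name, container path, credentials key, and templated subfolder parameter are all illustrative assumptions, not taken from the user's project):

```yaml
# Hypothetical catalog.yml entry sketching the workaround above: a matplotlib
# figure written as a PNG to Azure Blob Storage, with a parameterised
# subfolder name so earlier runs can be compared by hand.
confusion_matrix_plot:
  type: matplotlib.MatplotlibWriter
  filepath: abfs://experiments/${run_subfolder}/confusion_matrix.png
  credentials: azure_blob
```

As the user notes, this works but gives no first-class record in the experiment tracking UI; the comparison across runs is entirely manual.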

What is the impact of solving this problem?

  • User can keep track of specific artefacts alongside the experiment results 
  • "If I run a model I want to save the columns that were created next to it, I might want to create a model saved next to it (artefacts below the model) - something I am used to that I didn't have. There is a lot of artefacts I would want to save with an experiment" 

What could we possibly do?

  • Enable the ability to save model artefacts such as images and CSVs, which make up 90% of use cases
@NeroOkwa NeroOkwa added Stage: Technical Design 🎨 Ticket needs to undergo technical design before implementation Component: Experiment Tracking 🧪 Issue/PR that addresses functionality related to experiment tracking labels Jun 17, 2022
@NeroOkwa NeroOkwa self-assigned this Jun 17, 2022
@yetudada
Contributor

yetudada commented Jun 20, 2022

This makes sense for users. In a previous iteration of this functionality we used to allow users to:

  • Track PNGs, PDFs, CSVs and Excel spreadsheets as part of an experiment and see them in the UI
  • Compare the artifacts to each other

[Screenshots from the previous iteration: "PAI - Comparison - Artifact groups - hover" and "PAI - Comparison - 2 artifacts"]

@antonymilne
Contributor

antonymilne commented Jun 20, 2022

Copying this to here so we don't lose it:

I spoke to Lim about this a long time ago and made some notes on his thoughts. He thinks we should have a dataset called something like tracking.ArtifactDataSet which is basically for everything that's not a metric or json. kedro-viz would then work out how to render the dataset dependent on the file type (e.g. png).

I am not sure how this fits in with our existing matplotlib and plotly datasets. In particular, the plotly dataset saves to JSON, so how would kedro-viz know to render that as a plot? Do we need another type tracking.PlotlyDataSet to handle this case? Or should we just be using the existing matplotlib/plotly datasets for this?

@idanov thought that tracking.JSONDataSet was not the right approach (vs. the pre-existing json.JSONDataSet) so I am guessing would also not like this tracking.ArtifactDataSet. We need to figure out exactly what datasets we want to use here and what the significance of a "tracked" dataset is (i.e. is it the same as a versioned one? is it a separate dataset altogether?).
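To make the proposal concrete, here is a rough sketch of what Lim's tracking.ArtifactDataSet idea could look like. This is not the real Kedro API: the class, its method names, and the extension-to-renderer mapping are all illustrative assumptions about how kedro-viz might decide what to render from the file type alone.

```python
from pathlib import Path

class ArtifactDataSet:
    """Hypothetical sketch of a catch-all tracked artifact dataset.

    Saves raw artifact bytes and exposes a renderer hint derived from the
    file extension, which a front end (e.g. kedro-viz) could use to pick
    an image view, a table view, etc. Illustrative only, not Kedro code.
    """

    # Assumed mapping from file type to how the UI would render it.
    RENDERERS = {".png": "image", ".csv": "table", ".json": "json"}

    def __init__(self, filepath: str):
        self._filepath = Path(filepath)

    def save(self, data: bytes) -> None:
        # Persist the artifact as raw bytes, creating parent dirs as needed.
        self._filepath.parent.mkdir(parents=True, exist_ok=True)
        self._filepath.write_bytes(data)

    def load(self) -> bytes:
        return self._filepath.read_bytes()

    def describe(self) -> dict:
        # The renderer hint is the piece kedro-viz would need to decide
        # how to display the artifact; unknown extensions fall back to binary.
        return {
            "filepath": str(self._filepath),
            "renderer": self.RENDERERS.get(self._filepath.suffix, "binary"),
        }
```

The open question in the comment above is exactly this dispatch-on-extension step: a plotly dataset also saves JSON, so extension alone cannot distinguish "render as plot" from "render as plain JSON".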

@merelcht
Member

merelcht commented Jun 22, 2022

Notes from Technical Design session:

The team discussed possible solutions to enable users to track plots and other artifacts.

Possible solutions:

  1. The tracking.ArtifactDataSet as proposed by Lim (see comment above). This dataset would allow users to store any type of data that can be considered an artifact, e.g. images, plots etc. Viz would then figure out how to render whatever data is stored under this dataset type.

The general consensus about this approach is that special tracking datasets shouldn't be the way to log more data as part of a run. It raises the question about how many "tracking" datasets we'd end up adding. The discussion led to the option of not having tracking datasets anymore at all.

  2. No tracking datasets at all
  • Tracking datasets are really just versioned datasets with some extra logic when it comes to the tracking.MetricsDataSet, but the tracking.JSONDataSet is just the same as the regular JSONDataSet with versioning on by default.
  • Originally, one of the main reasons why we decided we needed them was as a way to tell viz what data to show as part of the experiment tracking panel.
  • All existing datasets in Kedro now allow users to log artifacts (plots, images, etc.) so it's silly to add special tracking datasets that would pretty much do the same thing
  • Arguably, versioning isn't exactly the same as tracking. As in, a user might want to version a dataset, but not make it part of the experiment tracking data. Letting the user decide what data to show in the experiment tracking panel could happen on the UI side (needs design).
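Under option 2, surfacing a plot in experiment tracking would amount to nothing more than versioning its dataset. A hypothetical catalog entry (dataset name and path are illustrative):

```yaml
# Hypothetical catalog.yml entry: under option 2, a plot appears in the
# experiment tracking panel simply because it is versioned - no special
# "tracking" dataset type involved.
roc_curve_plot:
  type: matplotlib.MatplotlibWriter
  filepath: data/08_reporting/roc_curve.png
  versioned: true
```

The trade-off noted in the last bullet is that versioning then carries two meanings at once (keep history, and show in the tracking panel), which is why a UI-side opt-in may still be needed.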

Follow up actions:
The decision was made to go for option 2: move away from special tracking datasets and instead show all versioned and visualisable datasets on the experiment tracking panel. This leads to the following actions:

  • Kedro currently throws an error when versioning is turned on for a dataset later in the process. We need to fix that workflow, as showing versioned datasets in experiment tracking might be an incentive for users to turn on versioning later, when they find they need this data to be displayed.
  • We will not immediately remove or deprecate the existing tracking datasets, but we need to decide on the future of those, keeping in mind the use case for showing the metric timeline.
  • Add functionality to render all versioned datasets on the Viz side. This links to: Kedro-Viz to show preview of data kedro-viz#907

@antonymilne
Contributor

antonymilne commented Jun 23, 2022

Just to record this in writing also: while I agree with the "tracked plot = versioned dataset" approach, it does feel like an inconsistent and confusing UX given the already-existing tracking datasets:

  • Want to track json data? Change your dataset type to tracking.JSONDataSet.
  • Want to track a plot? Keep the same dataset type but set versioned: true.

Hence I think we do need to work out what happens with tracking.JSONDataSet and tracking.MetricsDataSet sooner rather than later. tracking.JSONDataSet could be easily deprecated in favour of json.JSONDataSet with versioned: true, but tracking.MetricsDataSet is trickier. To me this is directly coupled to questions like "how do I search runs by metric" and "why not just do a log_metric call" (which we decided against before). Overall, adding plots to experiment tracking sounds straightforward and I'm very happy to do it by versioned: true, but we need to work out a more holistic and complete solution here or experiment tracking becomes a bit of a mish-mash of different approaches.
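The UX inconsistency described above can be seen side by side in a catalog. These are hypothetical entries (dataset names and paths are illustrative), showing the two idioms that would coexist:

```yaml
# Two tracking idioms side by side (hypothetical catalog entries).

# To track JSON data: switch the dataset *type*.
model_summary:
  type: tracking.JSONDataSet
  filepath: data/09_tracking/model_summary.json

# To track a plot: keep the type, flip a *flag*.
confusion_matrix:
  type: matplotlib.MatplotlibWriter
  filepath: data/08_reporting/confusion_matrix.png
  versioned: true
```

The deprecation path sketched above would collapse the first idiom into the second (json.JSONDataSet with versioned: true), leaving only tracking.MetricsDataSet as the special case because of its metric-timeline behaviour.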

@antonymilne
Contributor

antonymilne commented Jun 23, 2022

Now on the question of showing plots in experiment tracking:

@comym comym moved this from Todo to In Progress in Kedro-Viz Jun 23, 2022
@yetudada yetudada changed the title Experiment Tracking Adoption: Issue 1 - Ability to save and link plots/model artefacts to an experiment. Ability to link plots to an experiment Jun 23, 2022
@yetudada yetudada removed this from Kedro-Viz Jun 23, 2022
@yetudada yetudada added this to Roadmap Jun 23, 2022
@yetudada yetudada moved this to Now in Roadmap Jun 23, 2022
@yetudada yetudada moved this from Delivery to Discovery or Research in Roadmap Jun 23, 2022
@NeroOkwa
Contributor Author

NeroOkwa commented Jun 23, 2022

Notes from Follow up Design/Engineering session:

The team discussed a way for users to visualise and compare dataset plots during experiment tracking.

Follow up actions:

  • Design (@GabrielComymQB and @Mackay031) to start exploratory designs: low-fi mockups, and then provide feedback to the team
  • Once completed, engineering (@tynandebold) to scope and commence development

Timeline:

To be completed by the end of the next sprint: 15/07/22

@yetudada
Contributor

yetudada commented Oct 4, 2022

This issue was completed in kedro-org/kedro-viz#953.

@yetudada yetudada closed this as completed Oct 4, 2022
Projects
Status: Shipped 🚀
Development

No branches or pull requests

6 participants