# Experiment Tracking in Kedro #1070
(Comment copied over, originally written by @limdauto)

## Technical Design

### Introduction

This document describes the design of a set of features that enable experiment tracking as a native capability in Kedro. It will also break down the technical work required to implement it in an iterative manner.

### Background

Experiment tracking in Kedro means:
Being Kedro native means experiment tracking concepts/abstractions map directly to Kedro concepts/abstractions and can be visualised transparently with Kedro-Viz. This principle informs the following technical choices:
### Milestones

The MVP will be released iteratively with the following milestones (some of these milestones can be worked on in parallel):

#### Milestone 1: Visualisation of metrics & other JSON-compatible artefacts

During an experiment, users will want to track performance metrics of their ML models. This is usually captured as a dictionary of metric names and metric values. Furthermore, sometimes they want to track arbitrary key-value pairs, as seen with the use of:

```python
from kedro.pipeline import node

node(
    train_model,  # train_model is defined in the project's nodes module
    inputs="model_input",
    outputs=dict(
        model="model",
        features="features",
        metrics="metrics",
    ),
)
```

To track and visualise quantitative metrics:
```yaml
metrics:
  type: tracking.MetricsDataSet
  filepath: data/06_models/metrics.json
```
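To connect the node example above with this catalog entry, here is a hedged sketch of what `train_model` could return; the model choice and metric names are invented for illustration, and only the shape of the returned dictionary matters:

```python
from sklearn.linear_model import LinearRegression


def train_model(model_input):
    """Hypothetical training node whose outputs map onto the node() example above."""
    X = model_input.drop(columns=["target"])
    y = model_input["target"]
    model = LinearRegression().fit(X, y)

    features = {"columns": list(X.columns)}                # -> "features" dataset (see below)
    metrics = {"r2": model.score(X, y), "n_rows": len(X)}  # -> "metrics" dataset above
    return dict(model=model, features=features, metrics=metrics)
```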
To track and visualise unstructured features:

```yaml
features:
  type: tracking.JSONDataSet
  filepath: data/05_model_input/features.json
```

These datasets are versioned by default. For the first milestone release, Kedro-Viz can simply pick up this dataset's …

For metrics, we can construct a timeseries of the metrics by loading data from previous runs and automatically render a plot:

#### Milestone 2: Runs list visualisation & comparison

See #1070 (comment)

#### Milestone 3: MLflow compatibility

See #1070 (comment)

#### Milestone 4: Timeline view
(Comments copied over, originally written by @datajoely) On Milestone 4 - I had a good chat with @mkretsch327 re the benefits of being able to compare different plotly vizs together. I think there is a lot of scope for (a) comparing one particular plot across different runs and (b) selecting multiple plots and seeing them side by side. I would also say that Milestone 2 opens up two important points:
(video attachment: time-bar-play-540px.mp4)

In general there is a lot of inspiration we could take from KeyLines...

(video attachment: kronograph-flow-net.mp4)
(Comments copied over, originally written by @yetudada) This is fantastic work! I have the following comments:
I have one question about Milestone 1:
(Comment copied over, originally written by @limdauto)

## Milestone 2: Runs list visualisation - Technical Design

### Introduction

In this milestone, on Kedro-Viz, we will display:
### Design (internal)

Prototype: https://projects.invisionapp.com/share/E2113DWF5A7R#/screens/452880665

### Background

Some background knowledge that will be useful:
```json
{
  "package_name": "spaceflights_0174",
  "session_id": "2021-08-10T12.12.42.311Z",
  "cli": {
    "args": [],
    "params": {
      "from_inputs": [],
      "to_outputs": [],
      "from_nodes": [],
      "to_nodes": [],
      "node_names": [],
      "runner": null,
      "parallel": false,
      "is_async": false,
      "env": null,
      "tag": [],
      "load_version": {},
      "pipeline": null,
      "config": null,
      "params": {}
    },
    "command_name": "run",
    "command_path": "kedro run"
  },
  "project_path": "/Users/lim_Hoang/Projects/spaceflights-0174"
}
```
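For illustration only, flattening such a record into the handful of fields a runs list needs could look like the sketch below; the field selection and function name are assumptions, not the Kedro-Viz data access layer:

```python
import json


def session_to_runs_list_row(raw_record: str) -> dict:
    """Pick out the fields a runs list would display from a stored session record."""
    record = json.loads(raw_record)
    return {
        "session_id": record["session_id"],
        "package_name": record["package_name"],
        "command": record.get("cli", {}).get("command_path", ""),
        "project_path": record["project_path"],
    }
```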
### Challenges
To solve this challenge, we could adapt the store location to be divided into …

### Proposal

For this milestone, I propose that:
### Technical

The implementation for this milestone can live entirely in Kedro-Viz as follows.

#### Session store schema

To start with, we can use the following initial relational schema for the session store. A few notes:
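As a minimal sketch only, assuming a single `runs` table keyed by `session_id` (the table and column names here are illustrative, not the schema referred to above):

```python
import sqlite3

# Illustrative only: one row per run/session, with a few denormalised columns
# for cheap display in the runs list and the raw session record kept as JSON.
SCHEMA = """
CREATE TABLE IF NOT EXISTS runs (
    session_id   TEXT PRIMARY KEY,  -- e.g. "2021-08-10T12.12.42.311Z"
    package_name TEXT,
    command_path TEXT,              -- e.g. "kedro run"
    blob         TEXT               -- full session record as JSON
);
"""

with sqlite3.connect("session_store.db") as conn:
    conn.executescript(SCHEMA)
```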
#### Data Access

In …

In the future, we will allow users to query by metrics. To that end, we need a metrics-friendly search index. At the very least, we need to set up an index in SQLite to do it: https://www.tutorialspoint.com/sqlite/sqlite_indexes.htm -- but there are other solutions, including an in-memory search index where we pay the cost up front when starting Viz, or we could even use a full-blown disk-based search index: https://whoosh.readthedocs.io/en/latest/index.html. There are pros and cons to each approach. I will write a separate design doc just for the metrics query, but that will be for a later iteration.

#### Front-end

The technical design for the frontend will depend on the product design, i.e. where we want to show the runs list. We might do the following:
#### API

The backend design is trivial: simply integrate the API responses for …

#### Open problems
## Milestone 3: MLflow compatibility

### Introduction

In this milestone we will add compatibility with the MLflow Model Registry and potentially the MLflow UI. This means that Kedro users will be able to log MLflow models that can then be registered, viewed and served with MLflow.

### Background

MLflow is a popular tool to use for experiment tracking, which also contains a model registry. Experiment tracking in Kedro will focus on logging and visualising run data, but not on managing model lineage. Offering compatibility with MLflow models will allow Kedro users to use the MLflow Model Registry to manage model lifecycles.

### Proposal

To enable compatibility with the MLflow Model Registry I propose the following implementation:
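Purely as a hedged sketch of the general shape such compatibility could take (the class name, constructor arguments and use of `mlflow.sklearn` are assumptions for illustration, not the implementation proposed in this comment), a custom Kedro dataset could push models to MLflow on save:

```python
from typing import Any, Dict, Optional

import mlflow
import mlflow.sklearn
from kedro.io import AbstractDataSet


class MLflowSklearnModelDataSet(AbstractDataSet):
    """Hypothetical dataset that logs a scikit-learn model to MLflow on save."""

    def __init__(self, artifact_path: str, registered_model_name: Optional[str] = None):
        self._artifact_path = artifact_path
        self._registered_model_name = registered_model_name

    def _save(self, data: Any) -> None:
        # Log the model under an MLflow run and optionally register it,
        # making it visible in the MLflow UI / Model Registry.
        with mlflow.start_run(nested=True):
            mlflow.sklearn.log_model(
                sk_model=data,
                artifact_path=self._artifact_path,
                registered_model_name=self._registered_model_name,
            )

    def _load(self) -> Any:
        raise NotImplementedError("Loading from the registry is out of scope for this sketch.")

    def _describe(self) -> Dict[str, Any]:
        return {
            "artifact_path": self._artifact_path,
            "registered_model_name": self._registered_model_name,
        }
```

A catalog entry pointing a node's model output at a dataset of this kind would then register the model as part of a normal `kedro run`.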
### Outstanding questions
(Comment copied over, originally written by @AntonyMilneQB) This is great stuff, amazing work all! I'm so happy we ended up going with the metrics = dataset idea in the end 😀 So far I've had a good think about everything until Milestone 3 and just have a few suggestions and challenges to make. Not wanting to undermine anything, since I agree with pretty much everything said above, but just some extra things I think we should consider 🙂

## General concepts

### Experiment
I think the concept that one kedro run = one set of tracked data = one experiment is correct. However, I wouldn't see this as an MVP but rather the full solution. If a user wants to track multiple models in their kedro run or generate lots of different metrics in different nodes then that's already possible. And the name, organisation and topology of the node/pipeline that generates the tracked datasets already provides the organisation into different "experiments" without the need for explicitly introducing experiment as a new concept.

As such, I would propose that we don't use the term "experiment" at all within kedro. It just seems to be introducing more terminology for something that we don't need, and there are already enough concepts within kedro for a new user to pick up. It does make sense to describe our feature as adding "experiment tracking" to kedro, because that's what mlflow etc. refer to it as. This would provide a bridge for existing mlflow users to understand that kedro now supports experiment tracking as a feature and see how it fits into already existing kedro concepts. But apart from that, I don't see the need for the concept or terminology of experiments in kedro at all, as part of the MVP or the full version.

## Milestone 1

### How to mark which datasets are tracked on kedro viz

We need to make it clear which datasets are the tracked ones in the kedro viz pipeline view, or make these datasets easy to find in the search. Chances are that it's just going to be one or two datasets in a big pipeline with lots of datasets and nodes, so the user should easily be able to pick out which ones they need to click on to see their metrics.

### How to visualise metrics dataset

Say you've tracked two metrics (accuracy and MSE) over 3 different runs and got this:

What @limdauto suggests is something like the following plot (N.B. the x axis should have uniformly spaced points even if timestamps aren't uniform):

As per Yetu/Ignacio's comments, this doesn't work well if you're tracking metrics which have drastically different ranges. In the above plot, accuracy is essentially just a flat line at the bottom of the plot since the whole scale is overwhelmed by the values for MSE. To fix this, you could have two separate y-axes with different scales, but as soon as you have 3 metrics this runs into problems. So it's best just to rescale the points like this:

@datajoely In all the above graphs, lines connect points corresponding to the same metric. The alternative way to plot that @mkretsch327 is suggesting is a parallel coordinates plot, in which each metric has its own y axis, and lines connect points corresponding to the same timestamp. Here this would look like:

This naturally extends nicely to plotting many metrics, since you can have as many parallel y-axes as you like. PAI plotted metrics like this but with the axes arranged radially rather than parallel (called a radial or spider plot). Both the time series view and the parallel coordinate plots are useful in different scenarios, so if possible I think it would be nice to have both options available in kedro viz (though just time series is fine as MVP).
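To make the parallel coordinates idea concrete, here is a minimal sketch using plotly, with invented metric values (an illustration of the plot type, not the Kedro-Viz implementation):

```python
import pandas as pd
import plotly.express as px

# Hypothetical tracked metrics: one row per run, one column per metric.
runs = pd.DataFrame(
    {
        "run": ["2021-08-08", "2021-08-09", "2021-08-10"],
        "accuracy": [0.78, 0.81, 0.84],
        "mse": [12.4, 9.7, 8.1],
    }
)

# Each metric gets its own vertical axis; each line is one run.
fig = px.parallel_coordinates(runs, dimensions=["accuracy", "mse"], color="accuracy")
fig.show()
```

Because every metric has its own axis and scale, this shape copes well with metrics of very different ranges.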
(Comment copied over, originally written by @AntonyMilneQB)

## Milestone 2

This sounds awesome 🔥 and generally makes a lot of sense, but I don't fully understand the design here, sorry @limdauto. I also have some concerns about scalability.

### Session type
So as I understand it the options are:
My questions here would be: what is …

### Scalability of querying by metric
The biggest problem with PAI was always the performance, which came from the limitation of mlflow's storage system that you mentioned. Do not underestimate how many …

Now let's say you want to find all the runs that have accuracy > 0.8. How would this perform? Presumably you need to …

I'm wondering whether it would be wise to speed up querying by including some other information in the …

I really don't have an idea of how performant the proposed scheme is, so maybe this is going to be a complete non-issue. I would just caution that people are going to end up with a lot of metrics stored over the course of a project, and we should have something that scales well to that.

### Scalability of many runs

Related to the above, I'd just warn that you're going to end up with potentially a very long list of runs, and that would need to scale well. Not part of the MVP I know, but I think we should consider how people are going to be able to browse and filter a huge list. In PAI you could filter by run time, author and run tags. This filtering was absolutely essential to be able to use the tool (in particular tags, which allow for very powerful and flexible filtering). We should consider adding some of these things to the session. Dmitrii suggested in the past that the …
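On the "accuracy > 0.8" question: if metrics were denormalised into the runs table as a JSON column, a query like the hedged sketch below would avoid loading every blob in Python (this assumes SQLite's JSON1 functions and a `metrics` column that is not part of any agreed schema):

```python
import sqlite3

conn = sqlite3.connect("session_store.db")

# Assumes a `runs` table with a TEXT `metrics` column holding JSON such as
# '{"accuracy": 0.84, "mse": 8.1}'; json_extract needs SQLite's JSON1 support.
rows = conn.execute(
    """
    SELECT session_id, json_extract(metrics, '$.accuracy') AS accuracy
    FROM runs
    WHERE json_extract(metrics, '$.accuracy') > 0.8
    ORDER BY session_id DESC
    """
).fetchall()

for session_id, accuracy in rows:
    print(session_id, accuracy)
```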
(Comment copied over, originally written by @limdauto) @AntonyMilneQB thanks for the amazing comments as always!

## Re: General Concept

100% agree that we don't need experiment as an abstraction. I wrote "we can" but I also don't think "we should" do it. I'd be interested to see if any user has any legitimate use case after trying our workflow. It's nice to have an escape hatch in the design.

## Re: Milestone 1

### How to mark which datasets are tracked on kedro viz

Yea actually this is a great point. Let me bring it up with @GabrielComymQB tomorrow. We can do something similar to the parameters.

### Metrics Plot
## Re: Milestone 2

### Session Type

I think I'm specifically discussing the data type here when we represent the session in the viz database. For experiment tracking purposes, we only care about …

### Scalability of querying by metrics

This touches on a design iteration that I haven't mentioned. If we want to query by metrics, we need a metrics-friendly search index. At the very least, we need to set up an index in SQLite to do it: https://www.tutorialspoint.com/sqlite/sqlite_indexes.htm -- but there are other solutions, including an in-memory search index where we pay the cost up front when starting Viz, or we could even use a full-blown disk-based search index: https://whoosh.readthedocs.io/en/latest/index.html. There are pros and cons to each approach. I will write a separate design doc just for the metrics query, but that will be for a later iteration.

### Scalability of many runs

Since this was still being (visually) designed when I wrote the tech design, I didn't put it in. But I absolutely agree with you that the ability to find runs in a long list is essential. In the first iteration, from a product point of view, our solution is:
In terms of technical performance, I'm still considering the pros and cons of whether to perform the search client-side or server-side. But I know for a fact we can do text search client-side up to thousands of rows easily. For millions of rows, you can employ an embedded in-memory search index like this one to help: https://github.com/techfort/LokiJS. I'm still debating though.
Some thoughts after today's tech design session. The general statement above gives me the impression that Kedro is offering some "MLOps" capabilities. I tried to group the experiment tracking features into two different categories:
I think the main focus of this GH issue is on point 1, and I see a lot of consideration with …

So my question is: how big a role do we expect Kedro to play in this space, and how far do we want to go? Or what are the things that we are not going to do for experiment tracking? (Like Kedro is not going to do any orchestration work.)
Is there a way to know at which milestone we stand at the moment? Or is progress mostly captured in linked issues?
@astrojuanlu I think the plan has evolved a bit from what's written here, after the research we did last year. We've done 1 and 2, but 3 isn't really a focus at the moment. We're now working on kedro-org/kedro-viz#1218. AFAIK most tickets are now tracked on the Kedro-Viz project. @NeroOkwa can probably give more insights as well on the priorities now 🙂 Perhaps I should close this issue so it's clear this is not the active plan anymore.
kedro-org/kedro-viz#1218 was closed, so as per @merelcht's comment above, I'm closing this issue.
## Why should we care about Experiment Tracking?
Experiment tracking is a way to record all information that you would need to recreate a data science experiment. We think of it as logging for parameters, metrics, models and other artefacts.
Kedro currently has parts of this functionality. For example, it’s possible to log parameters as part of your codebase and snapshot models and other artefacts like plots with Kedro’s versioning capabilities for datasets. However, Kedro is missing a way to log metrics and capture all this logged metadata as a timestamped run of an experiment. It is also missing a way for users to visualise, discover and compare this logged metadata.
This change is essential to us because we want to standardise how logging for ML is done. There should be one easy way to capture this information, and we’re going to give users the Kedro way to do this.
This functionality is also expected to increase Kedro Lab usage by Data Scientists, as anecdotal evidence suggests that people performing the Data Engineering workflow get the most benefit from Kedro-Viz, while the Data Science workflow is not accounted for.
## What evidence do we have to suggest that we do this?

Our users sense the gap, and one of the most common usage patterns of Kedro is with MLflow Tracking, which provides this additional functionality. We have seen evidence here:

- `kedro-mlflow` plugin
- `kedro-kubeflow` plugin

We also know that our internal users relied on PerformanceAI for this functionality. We sunset PerformanceAI, but PerformanceAI was fantastic to use because:
Our vertical teams, namely C1 (@deepyaman), InsureX (@imdoroshenko @benhorsburgh) and OptimusAI (@mkretsch327) consider this high priority and will be confirmed users of this functionality.
## What metrics will we track to prove the success of this feature?

- `kedro viz` terminal runs

## What design requirements do we have?

We must allow users to:

We must think about:

- `KedroSession`