Versioning: planning for engineering phase #4199
Comments
Versioning research result: |
Sharing a couple of thoughts after going through the materials above.
Defining the version scope
In the research, we touched on a wide range of versioning aspects, including experiment tracking. For the current stage, we suggest defining the version scope by aligning on the exact meaning of versioning and deciding whether to separate it from experiment tracking. We suggest that the primary focus, our bare-minimum target, should be on mapping a single version number to the corresponding versions of parameters, I/O data, and code, so that users can retrieve the full project state, including data, at any point in time.
Pros:
Cons:
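To make the bare-minimum target above more concrete, here is a minimal sketch of what "one version number maps to parameters, I/O data, and code" could look like. It is an illustration only, not Kedro's API or a proposed design: `VersionManifest`, `ManifestStore`, `fingerprint`, and all field names are hypothetical, and the store is kept in memory purely for brevity.

```python
import hashlib
import json
from dataclasses import dataclass


def fingerprint(obj) -> str:
    """Return a stable SHA-256 hash of a JSON-serialisable object."""
    payload = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


@dataclass(frozen=True)
class VersionManifest:
    """Hypothetical record tying one version number to a full project state."""

    version: str         # the single user-facing version number
    code_ref: str        # e.g. a git commit SHA (assumption)
    params_hash: str     # fingerprint of the resolved parameters
    data_versions: dict  # dataset name -> version of that dataset


class ManifestStore:
    """In-memory store; a real one would persist manifests somewhere."""

    def __init__(self):
        self._manifests = {}

    def record(self, version, code_ref, params, data_versions):
        # Refuse to overwrite: a version number must stay immutable.
        if version in self._manifests:
            raise ValueError(f"version {version!r} already recorded")
        manifest = VersionManifest(
            version=version,
            code_ref=code_ref,
            params_hash=fingerprint(params),
            data_versions=dict(data_versions),
        )
        self._manifests[version] = manifest
        return manifest

    def resolve(self, version):
        # Look up everything needed to restore the project state.
        return self._manifests[version]
```

With something like this, retrieving a project state becomes a single lookup (`store.resolve("v1")`), after which the code reference, parameters, and per-dataset versions can each be restored by whatever backend is chosen under Option 1 or Option 2.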
Options that we see
Option 1: Rework and Improve Kedro’s Existing Versioning Mechanism
Option 2: Deprecate current versioning mechanism and Integrate with an external tool like DVC
Suggestions for moving forward
Pros:
Cons:
More philosophical question
How can we make Kedro more attractive for other tools to integrate with, similar to how MLflow integrates with Keras via the |
@astrojuanlu suggestions:
|
This is a major issue IMO. If this is introduced, I believe it should be a plugin (i.e. Furthermore, how widely adopted is |
That's a valid point and the ideal solution for us would be having them ( On the other hand, if we don't go with them, we will have to design and implement optimal data manipulations and caching, which is not a 5-minute task and, given the amount of resources we have, might take some time. Additionally, it will probably require some other dependencies. So, before going there, we would like to check what we can get from the existing tools. |
This isn't completely true. Kedro-Viz uses the same timestamp format for experiment tracking, but it doesn't use the same mechanism used in the catalog. So removing |
Kedro-Viz uses a combination of session storage and data from the catalog. The session ID, which is a timestamp, is the same timestamp that is applied to the dataset if |
@merelcht, @rashidakanchwala thanks for clarifying it! What I meant is that it will require some effort from our side to make it work with Kedro-Viz when versioning is updated. |
It has around the same level of downloads as Kedro, a bit less than 700k per month as per https://clickpy.clickhouse.com/dashboard/dvc?min_date=2024-01-14&max_date=2024-11-03 |
Description
The initial research phase for Versioning is now complete. This task signals the start of the engineering phase. To plan this, the lead engineer(s) on this workstream need to: