No experiment tracking for datasets defined with a dataset factory #1480
Comments
I looked a bit more carefully at the code and came up with a hook that seems to solve the problem. It enforces the discovery of data sets defined in the registered pipelines:

import logging
from typing import Dict

from kedro.framework.hooks import hook_impl
from kedro.framework.project import pipelines
from kedro.io.core import DataSetNotFoundError
from kedro.io.data_catalog import DataCatalog
from kedro.pipeline import Pipeline

LOGGER = logging.getLogger(__name__)


class DataCatalogDiscoveryHooks:
    """
    Custom hooks for Kedro.
    """

    @hook_impl
    def after_catalog_created(self, catalog: DataCatalog) -> None:
        """
        Enforce the discovery of all the data sets in the project.
        """
        _pipelines: Dict[str, Pipeline] = dict(pipelines)
        LOGGER.info("Enforcing data set pattern discovery...")
        data_set_names = {
            data_set_name
            for pipeline in _pipelines.values()
            for data_set_name in pipeline.data_sets()
        }
        # Sort data sets by name, then by namespace to display similar data sets together in kedro viz
        sorted_data_set_names = sorted(
            data_set_names, key=lambda name: ".".join(reversed(name.split(".")))
        )
        for data_set_name in sorted_data_set_names:
            try:
                # Enforce data set pattern discovery
                catalog._get_dataset(data_set_name)  # pylint: disable=protected-access
            except DataSetNotFoundError:
                # Names that match neither an explicit entry nor a pattern are skipped
                continue

Would this be a suitable solution? If so, I can come up with a PR to add this logic in
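To illustrate why forcing a lookup helps, here is a minimal sketch of the idea behind the hook. It assumes that entries defined by a pattern only become listable once they have been looked up for the first time. `ToyCatalog` and `metrics_factory` are hypothetical names invented for this illustration; they are not part of Kedro, and the real `DataCatalog` is considerably more involved.

```python
from typing import Callable, Dict, Iterable, List, Optional


class ToyCatalog:
    """Hypothetical stand-in for a catalog with lazily resolved entries (not Kedro's DataCatalog)."""

    def __init__(
        self,
        explicit: Dict[str, str],
        factories: Iterable[Callable[[str], Optional[str]]],
    ) -> None:
        self._datasets: Dict[str, str] = dict(explicit)
        # Each factory maps a dataset name to a file path, or None when the name does not match.
        self._factories = list(factories)

    def list(self) -> List[str]:
        # Only entries that are explicit or already materialized are visible here.
        return sorted(self._datasets)

    def get(self, name: str) -> str:
        if name not in self._datasets:
            for factory in self._factories:
                resolved = factory(name)
                if resolved is not None:
                    # Materialize the entry on first access.
                    self._datasets[name] = resolved
                    break
            else:
                raise KeyError(name)
        return self._datasets[name]


def metrics_factory(name: str) -> Optional[str]:
    # Assumed pattern for this sketch: any name ending in ".metrics".
    return f"data/{name}.json" if name.endswith(".metrics") else None


catalog = ToyCatalog({"companies": "data/companies.csv"}, [metrics_factory])
before = catalog.list()

# Same shape as the hook: touch every name used by the pipelines, skip unknown ones.
for dataset_name in ["my_namespace.metrics", "companies", "unknown"]:
    try:
        catalog.get(dataset_name)
    except KeyError:
        continue

after = catalog.list()
```

In this toy model, `before` contains only the explicit entry, while `after` also contains `my_namespace.metrics`, which is the effect the hook aims for on the real catalog.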
Hi @pierre-godard, thanks so much for the ticket and this investigation! Please do open a PR with the solution you've outlined above and we can start taking a closer look to get it merged in and the problem fixed.
Hi! Here is the PR: #1491
Amazing, thank you! We'll have a look soon.
Hi @pierre-godard, thank you for the PR. After looking at the PR and the way we access datasets, I feel the discovery should be via the catalog object. We need to get all the datasets available (both factory pattern and normal) via the DataCatalog object. Further looking into the issue, we get the list of dataset names via DataCatalog object's Happy to discuss further with the team. @ankatiyar @merelcht Thank you
@ravi-kumar-pilla I'll look into this!
I've left some comments on the PR but the approach does seem like the most straightforward one. :)
Hello, I am using Kedro-viz 7.10. I tried using experiment tracking with dataset factories and it does not work. If I write the following in the catalog, data is generated but not shown in Kedro-Viz:

"{dataset_name}#metrics":
  type: tracking.MetricsDataset
  filepath: data/10_tracking/{dataset_name}.json

If I add the following after this code (without running the experiments again, just refreshing Kedro-Viz), I can then see the results in experiment tracking:

"pca_target_regression.train_dataset_metrics":
  type: tracking.MetricsDataset
  filepath: data/10_tracking/pca_target_regression.train_dataset_metrics.json

Any idea why this is happening?
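As a rough illustration of how a factory entry like the one above can be turned into a concrete dataset configuration, the sketch below matches a dataset name against the pattern and substitutes the captured placeholder into the `filepath`. The `resolve_pattern` helper is hypothetical and written with the standard library only; it is not Kedro's implementation, whose matching rules may differ.

```python
import re
from typing import Dict, Optional


def resolve_pattern(
    pattern: str, config: Dict[str, str], name: str
) -> Optional[Dict[str, str]]:
    """Return config with placeholders filled in, or None if name does not match pattern."""
    # re.split with a capture group alternates literal text and placeholder names.
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = "".join(
        re.escape(part) if index % 2 == 0 else f"(?P<{part}>.+)"
        for index, part in enumerate(parts)
    )
    match = re.fullmatch(regex, name)
    if match is None:
        return None
    # Fill every config value with the placeholders captured from the name.
    return {key: value.format(**match.groupdict()) for key, value in config.items()}


pattern = "{dataset_name}#metrics"
config = {
    "type": "tracking.MetricsDataset",
    "filepath": "data/10_tracking/{dataset_name}.json",
}

# A name that ends with the literal "#metrics" suffix matches the pattern.
resolved = resolve_pattern(
    pattern, config, "pca_target_regression.train_dataset#metrics"
)
# A name without the "#metrics" suffix does not match in this sketch.
unmatched = resolve_pattern(
    pattern, config, "pca_target_regression.train_dataset_metrics"
)
```

Under these assumptions, `resolved["filepath"]` is `data/10_tracking/pca_target_regression.train_dataset.json`, and `unmatched` is `None` because that name lacks the literal `#metrics` suffix. The resolved entry only exists once a concrete name has been matched, which is why a tool that reads just the explicit catalog entries would not see it.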
Hi @EloyID, I think #1689 hasn't been solved yet and might be affecting you. @ravi-kumar-pilla, is this still the parent issue of #1689?
@astrojuanlu yes. As mentioned in the ticket, we tried resolving this on the viz side but ran into a few issues. As of now, experiment tracking does not support factory patterns. We will try to resolve this in future sprints. Thank you
Description
Experiment tracking does not detect datasets when defined with a dataset factory.
Context
I've been using the recent dataset factory feature and it's been wonderful!
However, when I tried to visualize the tracked dataset with the Kedro-Viz experiment tracking feature, no tracked dataset was detected.
Steps to Reproduce
my_namespace.metrics
namedata/my_namespace.metrics.json/<SESSION_ID>/my_namespace.metrics.json
kedro viz
and go to the latest experiment, under the "Overview" tab
Expected Result
The expected behavior would be for the my_namespace.metrics JSON content to appear on the UI.
Actual Result
There is no JSON content on the UI.
Your Environment
Include as many relevant details as possible about the environment you experienced the bug in:
Checklist