kedro tries to instantiate DataSet '_sql' that is used for YAML factorization #2363

Closed
edwardcjohnson opened this issue Feb 25, 2023 · 6 comments
Labels
Issue: Bug Report 🐞 Bug that needs to be fixed


@edwardcjohnson

Description

Hello, I upgraded from 0.18.2 to 0.18.5 and now get an error in a section of my catalog that uses the YAML factorization capability (ref https://kedro.readthedocs.io/en/stable/data/data_catalog.html#load-multiple-datasets-with-similar-configuration) with pandas.SQLQueryDataSet.

It seems that Kedro 0.18.5 now tries to instantiate my factorized YAML template, even though its name "_sql" starts with the "_" prefix described in the docs:
"It’s important that the name of the template entry starts with a _ so Kedro knows not to try and instantiate it as a dataset."

Context

I can easily work around the bug by not using the factorization capability, but that means I have to copy and paste the same lines multiple times in my catalog.yml.

Note that I have only checked this for pandas.SQLQueryDataSet.

Steps to Reproduce

  1. Add the following to catalog.yml:

_sql: &sql
  type: pandas.SQLQueryDataSet
  credentials: <a reference to an entry in credentials.yml>

my_test_sql_query:
  <<: *sql
  sql: <a test SQL query goes here>

  2. Reference the my_test_sql_query dataset in a Kedro pipeline.
  3. Run that pipeline.
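
For readers unfamiliar with YAML merge keys, the snippet below is a rough sketch of what the catalog above is expected to look like once parsed, and of the underscore convention quoted from the docs. It uses plain dicts instead of a YAML parser, the values `db_credentials` and `SELECT 1` are placeholders, and `entries_to_instantiate` is a hypothetical helper, not Kedro's actual implementation.

```python
# Hypothetical illustration: plain dicts stand in for the parsed catalog.yml.
# "db_credentials" and "SELECT 1" are placeholder values, not from the report.

# What the `_sql: &sql` template looks like once parsed.
sql_template = {
    "type": "pandas.SQLQueryDataSet",
    "credentials": "db_credentials",
}

# `<<: *sql` merges the template's keys into the entry; keys set on the
# entry itself (here `sql`) are added on top.
parsed_catalog = {
    "_sql": sql_template,
    "my_test_sql_query": {**sql_template, "sql": "SELECT 1"},
}


def entries_to_instantiate(catalog: dict) -> dict:
    """Drop template entries, i.e. top-level keys starting with an underscore."""
    return {name: cfg for name, cfg in catalog.items() if not name.startswith("_")}


datasets = entries_to_instantiate(parsed_catalog)
print(sorted(datasets))
```

Under this convention only `my_test_sql_query` would be turned into a dataset; the `_sql` entry exists purely to be merged into other entries.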

Expected Result

The pipeline should complete without error when it uses the referenced catalog dataset

Actual Result

Error message that stops the pipeline run

DataSetError: 
'sql' and 'filepath' arguments cannot both be empty.Please provide a sql query or path to a sql query file..
Failed to instantiate DataSet '_sql' of type 'kedro.extras.datasets.pandas.sql_dataset.SQLQueryDataSet'.
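
The message is consistent with the `_sql` template itself being validated as a dataset: the template carries only `type` and `credentials`, so a check that requires `sql` or `filepath` fails on it. A minimal sketch of that reasoning follows; `check_sql_query_config` is a hypothetical function modelled on the error text, not Kedro's code, and the config values are placeholders.

```python
def check_sql_query_config(name: str, config: dict) -> None:
    """Hypothetical check modelled on the error text above; not Kedro's actual code."""
    if not (config.get("sql") or config.get("filepath")):
        raise ValueError(
            "'sql' and 'filepath' arguments cannot both be empty "
            f"(while instantiating '{name}')."
        )


# Placeholder values standing in for the entries in catalog.yml.
template = {"type": "pandas.SQLQueryDataSet", "credentials": "db_credentials"}
concrete = {**template, "sql": "SELECT 1"}

# The concrete entry has a query, so the check passes silently.
check_sql_query_config("my_test_sql_query", concrete)

# The template has neither `sql` nor `filepath`, so the check fails,
# which mirrors the failure reported for '_sql'.
try:
    check_sql_query_config("_sql", template)
    template_error = None
except ValueError as err:
    template_error = str(err)
    print(template_error)
```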

Your Environment

Include as many relevant details about the environment in which you experienced the bug:

  • Kedro version used (pip show kedro or kedro -V): kedro, version 0.18.5
  • Python version used (python -V): Python 3.9.12
  • Operating system and version: Debian GNU/Linux 11 (bullseye)
@datajoely
Contributor

Hi @edwardcjohnson, this looks like a bug. Please downgrade to 0.18.4 for now and we'll look into it on our end.

@datajoely datajoely added the Issue: Bug Report 🐞 Bug that needs to be fixed label Feb 27, 2023
@edwardcjohnson
Author

Thank you, @datajoely.

@merelcht merelcht moved this to To Do in Kedro Framework Mar 1, 2023
@ankatiyar ankatiyar self-assigned this Mar 6, 2023
@ankatiyar ankatiyar moved this from To Do to In Progress in Kedro Framework Mar 8, 2023
@ankatiyar
Contributor

Hi @edwardcjohnson, could you confirm which config loader you were using in your project? I'm able to reproduce this error with OmegaConfigLoader but not the default ConfigLoader (I've tried with pandas.CSVDataSet). If you're using OmegaConfigLoader, the templating of datasets is not yet supported. See this comment -> #2399 (comment)

@edwardcjohnson
Author

To reproduce the issue, I used OmegaConfigLoader, as you said. ConfigLoader is indeed fine.
I will postpone using OmegaConfigLoader for the time being. Thank you for your help!
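
For anyone following along, the loader is chosen in the project's settings.py. A minimal sketch of staying on the default loader, assuming a standard Kedro 0.18.x project; the file location shown in the comment is an assumption about the project layout:

```python
# src/<package_name>/settings.py (location assumed from the standard project template)
from kedro.config import ConfigLoader

# Use the default loader, which was reported above to handle the `_sql` template.
CONFIG_LOADER_CLASS = ConfigLoader
```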

@ankatiyar
Contributor

Thanks for reporting this @edwardcjohnson! I'll close this issue but this feature is on the agenda for OmegaConfigLoader #2175 😄

@github-project-automation github-project-automation bot moved this from In Progress to Done in Kedro Framework Mar 13, 2023
@edwardcjohnson
Author

Thanks @ankatiyar!
