Support for Database queries/Views as data sources #2945

mtsz-thiago · 2019-12-12T21:10:38Z

As it is not always convenient to keep all data sources as text files, CSV, or related formats, I think it would be nice to declare views and queries from databases as a project's data sources.

efiop · 2019-12-12T21:24:05Z

Hi @mtsz-thiago! Thanks for the request. Could you please elaborate? Maybe share some thoughts on how you see that working in DVC.

dmpetrov · 2019-12-23T08:46:24Z

@mtsz-thiago DVC usually comes into play once you have already extracted data from the DB into files and are ready for the ML phase (which usually consumes raw data).
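To illustrate the extraction step mentioned above, here is a minimal sketch that dumps a query result to a CSV file. It uses only the Python standard library and SQLite as a stand-in database; the function name `export_query_to_csv`, the table, and the file name are hypothetical and not part of DVC.

```python
import csv
import sqlite3


def export_query_to_csv(conn, query, out_path):
    """Run `query` on `conn` and write a header plus all rows to `out_path`.

    Returns the number of data rows written. Hypothetical helper, not a DVC API.
    """
    cur = conn.execute(query)
    header = [col[0] for col in cur.description]  # column names of the result
    count = 0
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        for row in cur:
            writer.writerow(row)
            count += 1
    return count


if __name__ == "__main__":
    # In-memory SQLite database used purely as a stand-in for a real DB.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id INTEGER, label TEXT)")
    conn.executemany("INSERT INTO events VALUES (?, ?)", [(1, "a"), (2, "b")])
    # ORDER BY keeps the output stable, so the file only changes when data does.
    n = export_query_to_csv(conn, "SELECT id, label FROM events ORDER BY id", "events.csv")
    print(f"wrote {n} rows")
```

The resulting file could then be tracked like any other data file, for example with `dvc add events.csv`.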

However, we are thinking about a tighter integration with DBs (see #1577 and #2378), where you can version and control the ML phase and have some hook that gives some control over the DB.

If your scenario is not ML but more analytical, and you spend 100% of your time in the DB, then dbt (data build tool) might be a better fit for you.

If I understand your question correctly - this issue is a duplicate of #1577. Please let me know if it's not.

mike-weinberg · 2020-02-04T17:31:11Z

Data governance on S3 is painful, data access on S3 is slow, and S3 has no built-in compute. Warehouse environments like Snowflake, BigQuery, and (recently) Redshift all offer low-cost storage pricing. In many cases a data science project involves performing data prep on a data warehouse, so why not let a database table be an option for a storage backend? This would greatly reduce the amount of glue code required to make DVC work.

dbt is a good tool, but oftentimes it's more overhead than data scientists want. Don't even get me started on Airflow for development.

It's becoming increasingly common for companies to implement snapshot strategies for their raw data because storage is so cheap on BQ, Snowflake, etc., so a snapshot timestamp plus a set of tables is a great abstraction that DVC could leverage. Food for thought.
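A minimal sketch of that abstraction, assuming a dataset version is identified only by its table names and a snapshot timestamp: hashing the two together gives a stable identifier that changes whenever either one changes. The function name `snapshot_fingerprint` and the example table names are hypothetical; this is not an existing DVC feature.

```python
import hashlib
import json


def snapshot_fingerprint(tables, snapshot_ts):
    """Return a SHA-256 hex digest for a set of tables at a snapshot timestamp.

    Table order and duplicates do not affect the result. Hypothetical helper.
    """
    payload = json.dumps(
        {"tables": sorted(set(tables)), "snapshot_ts": snapshot_ts},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # Illustrative table names and timestamp only.
    fp = snapshot_fingerprint(
        ["analytics.orders", "analytics.users"], "2020-02-04T00:00:00Z"
    )
    print(fp)
```

A tool could store this digest the way it stores a file checksum and treat a changed digest as a changed dependency.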

triage-new-issues bot added the "triage" label Dec 12, 2019

efiop added the "awaiting response" and "feature request" labels Dec 12, 2019

triage-new-issues bot removed the "triage" label Dec 12, 2019

dmpetrov closed this as completed Dec 23, 2019

mdekstrand mentioned this issue Mar 12, 2020

Support callback dependencies #2378

Open

dberenbaum mentioned this issue Sep 14, 2023

Epic: Database table/non-file dependencies #9945

Closed

