SDK support for Database-type Streams #74

MeltyBot · 2021-03-31T18:16:33Z

Migrated from GitLab: https://gitlab.com/meltano/sdk/-/issues/74

Originally created by @aaronsteers on 2021-03-31 18:16:33

This feature would add formal support for database-type streams.

Background

We removed database-type streams from the initial 0.1.0 release because they were a lower priority than API-type streams.
Unlike SaaS taps, for DB-type taps, we can assume:
1. Catalogs are less stable. We should expect cached catalogs will require explicit refreshing.
2. Catalog detection should be decoupled from Stream definition. We expect the catalog to be defined by querying the information_schema or similar, and it should be much more performant to query this at the DB level or schema level than individually for each stream/table.
3. Fewer Stream classes are needed. One Stream class per source or one per extraction type is probably sufficient.
4. Catalogs are more authoritative. If a catalog declares a table and column, the Tap should assume it exists as defined in the catalog.
  1. Types need to be overridable. More so than other stream types, database-type sources have a history of type incompatibilities. When custom type declarations are provided in the input catalog, we should take care to apply them as given.
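
To illustrate the type-override point above, here is a minimal sketch of coercing a record to the types declared in an input catalog. The `coerce_record` helper and the `CASTS` table are hypothetical names used only for illustration; they are not part of the SDK, and a real implementation would need to handle dates, decimals, and nested types.

```python
# Hypothetical mapping from JSON Schema type names to Python casts.
CASTS = {"string": str, "integer": int, "number": float, "boolean": bool}


def coerce_record(record: dict, schema: dict) -> dict:
    """Cast each value to the type the catalog declares for its column."""
    properties = schema.get("properties", {})
    coerced = {}
    for column, value in record.items():
        declared = properties.get(column, {}).get("type", [])
        # JSON Schema allows either a single type name or a list of names.
        type_names = [declared] if isinstance(declared, str) else declared
        target = next((t for t in type_names if t != "null"), None)
        cast = CASTS.get(target)
        # Leave NULLs and columns without a known override untouched.
        coerced[column] = value if value is None or cast is None else cast(value)
    return coerced


# The catalog declares `id` as a string even though the database returns an int.
schema = {
    "properties": {
        "id": {"type": ["string", "null"]},
        "amount": {"type": "number"},
    }
}
record = coerce_record({"id": 7, "amount": "1.50", "note": None}, schema)
```

The catalog is treated as authoritative here: the value is cast to whatever the catalog says, rather than to whatever the driver happened to return.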

Implementation Proposal

Decoupling Streams from Catalog Discovery

Instead of stream objects reporting back their own schema, a SQLCatalogDiscovery class (or similar) will create a full catalog. Then, instances of the appropriate Stream class can be instantiated from the discovered catalog. This also means that if an input catalog is provided, we can skip the discovery process entirely.

Docs links:

https://docs.sqlalchemy.org/en/14/core/reflection.html#reflecting-all-tables-at-once
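
The decoupling described above can be sketched roughly as follows. To keep the example runnable without extra dependencies, it uses the standard-library `sqlite3` module as a stand-in for SQLAlchemy reflection. The names `discover_catalog`, `load_streams`, `SQLStream`, and `JSON_TYPES` are assumptions made for illustration, not the SDK's actual API.

```python
import sqlite3
from dataclasses import dataclass

# Hypothetical mapping from SQLite declared types to JSON Schema types.
JSON_TYPES = {"INTEGER": "integer", "REAL": "number", "TEXT": "string"}


def discover_catalog(conn: sqlite3.Connection) -> dict:
    """Build the full catalog with one schema-level query, not one per table."""
    rows = conn.execute(
        """
        SELECT m.name, p.name, p.type
        FROM sqlite_master AS m, pragma_table_info(m.name) AS p
        WHERE m.type = 'table' AND m.name NOT LIKE 'sqlite_%'
        ORDER BY m.name, p.cid
        """
    ).fetchall()
    streams = {}
    for table, column, declared_type in rows:
        entry = streams.setdefault(
            table,
            {"tap_stream_id": table, "schema": {"type": "object", "properties": {}}},
        )
        json_type = JSON_TYPES.get(declared_type.upper(), "string")
        entry["schema"]["properties"][column] = {"type": [json_type, "null"]}
    return {"streams": list(streams.values())}


@dataclass
class SQLStream:
    """Generic stream built from a catalog entry; it never inspects the database."""

    name: str
    schema: dict


def load_streams(conn, input_catalog=None):
    """Skip discovery entirely when an input catalog is supplied."""
    catalog = input_catalog if input_catalog is not None else discover_catalog(conn)
    return [SQLStream(s["tap_stream_id"], s["schema"]) for s in catalog["streams"]]
```

Because `load_streams` only reads the catalog dictionary, the same generic stream class serves both the discovered and the user-supplied case, and no connection is needed when a catalog is passed in.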

Entry-Level capabilities with `SQLAlchemy`

The SQLAlchemy library (built on top of DB-API 2.0 drivers) can provide out-of-the-box catalog discovery capabilities as well as entry-level select capabilities. This means that, given a valid SQLAlchemy driver and connection URI, we can provide generic discovery and get_records() capabilities.
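
As a rough illustration of what a generic get_records() could look like, the sketch below uses only DB-API 2.0 cursor calls (`execute`, `description`, `fetchmany`), with the standard-library `sqlite3` driver standing in for whichever driver SQLAlchemy would wrap. The function signature and the `fetch_size` parameter are assumptions, and identifiers are assumed to come from a trusted catalog since they are interpolated into the query.

```python
import sqlite3
from typing import Iterator, Sequence


def get_records(
    conn, table: str, columns: Sequence[str], fetch_size: int = 1000
) -> Iterator[dict]:
    """Yield each row of `table` as a dict, using only DB-API 2.0 calls."""
    # Identifiers cannot be bound as parameters, so quote them explicitly.
    column_list = ", ".join(f'"{c}"' for c in columns)
    cursor = conn.cursor()
    cursor.execute(f'SELECT {column_list} FROM "{table}"')
    names = [description[0] for description in cursor.description]
    while True:
        rows = cursor.fetchmany(fetch_size)
        if not rows:
            break
        for row in rows:
            yield dict(zip(names, row))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "ada"), (2, "grace")])
records = sorted(get_records(conn, "users", ["id", "name"]), key=lambda r: r["id"])
```

Nothing here is specific to SQLite beyond the connection itself, which is the appeal of a generic implementation: any compliant driver should work, at the cost of reading rows through the slowest common path.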

NOTE: We should probably develop a generic tap-sqlalchemy or tap-dbapi which purely leverages generic SQLAlchemy constructs. This would never be as performant as a custom-built tap, but it would be good in general for interop purposes.

Developers Provide Performance improvements

There is probably little case for overriding discovery unless SQLAlchemy does not support Inspection for the database type in question. Since catalog discovery is generally not a performance bottleneck, and since it is likely to be cached anyway, a generic implementation should be "good enough" for 95% of DB types.

Provide developers with bulk-based performance boosts

The SQLAlchemy library will be good at generic selects but will not be able to take advantage of batching capabilities. I suggest we tap into the discussions around #9 (batch message type) to allow developers to define their batch capabilities.
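
One possible shape for this, sketched under heavy assumptions: a generic base class falls back to chunking get_records(), and a developer overrides a single hook when the database has a native bulk path. The class names, the `get_batches` hook, and `batch_size` are hypothetical; the actual design would depend on the outcome of the batch message discussion in #9.

```python
from itertools import islice
from typing import Iterable, Iterator, List


class GenericStream:
    """Hypothetical base class: row-by-row reads with a naive batching fallback."""

    batch_size = 2

    def __init__(self, rows: Iterable[dict]):
        self._rows = list(rows)

    def get_records(self) -> Iterator[dict]:
        yield from self._rows

    def get_batches(self) -> Iterator[List[dict]]:
        """Default: chunk the generic record iterator into fixed-size lists."""
        records = self.get_records()
        while True:
            batch = list(islice(records, self.batch_size))
            if not batch:
                return
            yield batch


class BulkExportStream(GenericStream):
    """Developer override that stands in for a native bulk-export command."""

    def get_batches(self) -> Iterator[List[dict]]:
        # A real tap would call the database's bulk unload facility here.
        yield list(self._rows)


rows = [{"id": 1}, {"id": 2}, {"id": 3}]
generic_batches = list(GenericStream(rows).get_batches())
bulk_batches = list(BulkExportStream(rows).get_batches())
```

The point of the fallback is that every tap supports batches from day one, while the override remains the only place a developer has to touch to get the performance boost.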

MeltyBot · 2022-05-29T23:33:22Z

View 13 previous comments from the original issue on GitLab

MeltyBot assigned aaronsteers May 29, 2022

MeltyBot closed this as completed May 29, 2022

This was referenced May 29, 2022

Planning for our "1.0" SDK release #188

Closed

SQL-type Targets and Sinks #261

Closed

Add support for SQL Taps and Targets - [merged] #435

Closed
