Originally created by @aaronsteers on 2021-03-31 18:16:33
This feature would add formal support for Database type streams.
Background
We removed database-type streams from the initial 0.1.0 release because they were a lower priority than API-type streams.
Unlike SaaS taps, for DB-type taps, we can assume:
Catalogs are less stable. We should expect cached catalogs will require explicit refreshing.
Catalog detection should be decoupled from Stream definition. We expect the catalog to be defined by querying the information_schema or similar - and it should be much more performant to query this at the DB-level or schema-level, versus individually for each stream/table.
Fewer Stream classes are needed. One Stream class per source or one per extraction type is probably sufficient.
Catalogs are more authoritative. If a catalog declares a table and column, the Tap should assume it exists as defined in the catalog.
Types need to be overridable. More so than other stream types, database-type sources have a history of type incompatibilities. When custom type declarations are provided in the input catalog, we should take care to apply them exactly as given.
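To make the last two points concrete, here is a minimal sketch of how declarations from an input catalog could be layered over discovered types. The catalog shape (`{stream_name: {column_name: json_schema_fragment}}`) and the `apply_type_overrides` helper are simplifications invented for illustration, not the SDK's actual catalog format or API.

```python
from copy import deepcopy
from typing import Dict, Optional

# Hypothetical, simplified catalog shape:
# {stream_name: {column_name: json_schema_fragment}}
Catalog = Dict[str, Dict[str, dict]]


def apply_type_overrides(discovered: Catalog, input_catalog: Optional[Catalog]) -> Catalog:
    """Return a copy of `discovered` with declarations from `input_catalog` applied."""
    merged = deepcopy(discovered)
    for stream_name, columns in (input_catalog or {}).items():
        # The input catalog is treated as authoritative: streams and columns it
        # declares are kept as given, even if discovery reported something else.
        merged.setdefault(stream_name, {}).update(deepcopy(columns))
    return merged


discovered = {"orders": {"id": {"type": "integer"}, "amount": {"type": "number"}}}
overrides = {"orders": {"amount": {"type": "string"}}}
merged = apply_type_overrides(discovered, overrides)
```

Here `amount` ends up typed as the override declares it, while `id` keeps its discovered type and the `discovered` mapping itself is left unmodified.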
Implementation Proposal
Decoupling Streams from Catalog Discovery
Instead of stream objects reporting back their own schema, a SQLCatalogDiscovery class (or similar) will create a full catalog. Then, instances of the appropriate Stream class can be instantiated from the discovered catalog. This also means that if an input catalog is provided, we can skip the discovery process entirely.
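A rough sketch of that flow, using only the standard-library `sqlite3` driver so it runs anywhere. Every name here (`SQLCatalogDiscovery`, `SQLStream`, `load_streams`) and the simplified catalog shape are hypothetical stand-ins, not SDK classes; SQLite's `sqlite_master` and `pragma_table_info` play the role of `information_schema`.

```python
import sqlite3
from typing import Dict, Iterator, List, Optional

# Hypothetical, simplified catalog shape: {table_name: {column_name: declared_type}}
Catalog = Dict[str, Dict[str, str]]


class SQLCatalogDiscovery:
    """Hypothetical: builds the full catalog with one schema-level query."""

    def __init__(self, connection: sqlite3.Connection) -> None:
        self.connection = connection

    def discover(self) -> Catalog:
        rows = self.connection.execute(
            "SELECT m.name, p.name, p.type "
            "FROM sqlite_master AS m JOIN pragma_table_info(m.name) AS p "
            "WHERE m.type = 'table' AND m.name NOT LIKE 'sqlite_%' "
            "ORDER BY m.name, p.cid"
        ).fetchall()
        catalog: Catalog = {}
        for table_name, column_name, declared_type in rows:
            catalog.setdefault(table_name, {})[column_name] = declared_type
        return catalog


class SQLStream:
    """Hypothetical: one generic stream class, instantiated per catalog entry."""

    def __init__(self, connection: sqlite3.Connection, name: str, columns: Dict[str, str]) -> None:
        self.connection = connection
        self.name = name
        self.columns = list(columns)

    def get_records(self) -> Iterator[dict]:
        # Trust the catalog: select exactly the columns it declares.
        column_list = ", ".join(f'"{c}"' for c in self.columns)
        for row in self.connection.execute(f'SELECT {column_list} FROM "{self.name}"'):
            yield dict(zip(self.columns, row))


def load_streams(connection: sqlite3.Connection, input_catalog: Optional[Catalog] = None) -> List[SQLStream]:
    # Discovery is skipped entirely when an input catalog is provided.
    if input_catalog is None:
        input_catalog = SQLCatalogDiscovery(connection).discover()
    return [SQLStream(connection, name, cols) for name, cols in input_catalog.items()]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'a@example.com')")
streams = load_streams(conn)
records = list(streams[0].get_records())
```

Passing an explicit catalog, for example `load_streams(conn, {"users": {"email": "TEXT"}})`, bypasses discovery and yields only the declared column.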
Entry-Level capabilities with SQLAlchemy
The SQLAlchemy tool (powered by DB-API 2.0) can provide out-of-the-box catalog discovery capabilities as well as entry-level select capabilities. This means that given a valid SQLAlchemy driver and connection URI, we can provide generic discovery and get_records() capabilities.
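As an illustration of what "generic" could look like, the sketch below uses SQLAlchemy's inspector for discovery and a reflected `Table` for a plain select. It assumes SQLAlchemy 1.4 or later and an in-memory SQLite URI; the `discover_catalog` and `get_records` function names and the returned catalog shape are placeholders, not a proposed SDK interface.

```python
from typing import Dict, Iterator

from sqlalchemy import MetaData, Table, create_engine, inspect, select, text
from sqlalchemy.engine import Engine


def discover_catalog(engine: Engine) -> Dict[str, Dict[str, str]]:
    """Generic discovery: {table_name: {column_name: type_as_string}}."""
    inspector = inspect(engine)
    return {
        table_name: {
            col["name"]: str(col["type"])
            for col in inspector.get_columns(table_name)
        }
        for table_name in inspector.get_table_names()
    }


def get_records(engine: Engine, table_name: str) -> Iterator[dict]:
    """Generic select: reflect the table, then yield each row as a dict."""
    table = Table(table_name, MetaData(), autoload_with=engine)
    with engine.connect() as conn:
        for row in conn.execute(select(table)):
            yield dict(row._mapping)


# Any SQLAlchemy URI should work here; in-memory SQLite keeps the sketch self-contained.
engine = create_engine("sqlite://")
with engine.begin() as conn:
    conn.execute(text("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)"))
    conn.execute(text("INSERT INTO users (id, email) VALUES (1, 'a@example.com')"))

catalog = discover_catalog(engine)
records = list(get_records(engine, "users"))
```

Nothing in the two functions is specific to SQLite, which is the appeal: swapping the URI for another supported dialect should be the only change needed, at the cost of row-at-a-time performance.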
NOTE: We should probably develop a generic tap-sqlalchemy or tap-dbapi which purely leverages generic SQLAlchemy constructs. This would never be as performant as a custom-built tap, but it would be good in general for interop purposes.
Developers Provide Performance improvements
There is probably little case for overriding discovery unless SQLAlchemy does not support Inspection for the database type in question. Since catalog discovery is generally not a performance bottleneck, and since it is likely to be cached anyway, a generic implementation should be "good enough" for 95% of DB types.
Provide developers with bulk-based performance boosts
The SQLAlchemy library will be good at generic selects but will not be able to take advantage of batching capabilities. I suggest we tap into the discussions around #9 (batch message type) to allow developers to define their batch capabilities.
Migrated from GitLab: https://gitlab.com/meltano/sdk/-/issues/74