Deduping on primary keys during insert or upsert: TODO
Latest Spec as of 2021-10-29:
SQLConnector class
The new SQLConnector class handles the following:
- connecting to the source
- generating SQLAlchemy connection and engine objects
- discovering schema catalog entries
- performing type conversions to/from JSONSchema types
- dialect-specific functions, such as escaping and fully qualified names
Most developers should only need to override get_sqlalchemy_url(). Developers may also override create_sqlalchemy_engine() and/or create_sqlalchemy_connection() to tune performance of the base engine and connection configuration.
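As a minimal sketch of the kind of logic a get_sqlalchemy_url() override contains (shown here as a standalone function for illustration; in the SDK it would be a SQLConnector method, and the config keys are assumptions, not the SDK's canonical setting names):

```python
def get_sqlalchemy_url(config: dict) -> str:
    """Build a SQLAlchemy connection URL from tap/target settings.

    Standalone sketch; config keys (user, password, host, port, database)
    are hypothetical examples of what a connector might expect.
    """
    port = config.get("port", 5432)  # fall back to the default Postgres port
    return (
        f"postgresql://{config['user']}:{config['password']}"
        f"@{config['host']}:{port}/{config['database']}"
    )

url = get_sqlalchemy_url({
    "user": "tap",
    "password": "secret",
    "host": "localhost",
    "database": "warehouse",
})
```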
Discovery
Discovery is handled by the method discover_catalog_entries(tap_config: dict) -> Dict[str, List[dict]]. This should not need to be overridden unless there are bugs and/or gaps in the DB API implementation.
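To make the discovery output concrete, here is a hypothetical shape for a single discovered catalog entry (field names follow common Singer conventions; the exact keys emitted by the SDK may differ):

```python
# Illustrative catalog entry for one discovered table (shape is an
# assumption based on Singer conventions, not the SDK's exact output).
entry = {
    "tap_stream_id": "public-users",   # naming convention still TBD per the todos below
    "table_name": "users",
    "schema": {
        "type": "object",
        "properties": {
            "id": {"type": ["integer"]},
            "email": {"type": ["string", "null"]},
        },
    },
    "key_properties": ["id"],          # primary key columns
}
```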
Type conversions from SQL to JSON Schema
In the short term, some developers may also need to extend to_jsonschema_type() for any type conversions which are not standard or not yet handled by the SDK. That effort should shrink as more contributions come back to the SDK for generic and robust type conversion for an increasing number of SQL data types.
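The kind of fallback mapping such an extension adds can be sketched as follows (standalone function taking a type name string for simplicity; the real SDK hook operates on SQLAlchemy type objects, and the specific mappings here are illustrative assumptions):

```python
def to_jsonschema_type(sql_type_name: str) -> dict:
    """Map a SQL type name to a JSON Schema snippet.

    Sketch of the override pattern: handle nonstandard types first,
    then fall back to generic defaults. All mappings are examples.
    """
    overrides = {
        "money": {"type": ["number"]},
        "uuid": {"type": ["string"], "format": "uuid"},
    }
    name = sql_type_name.lower()
    if name in overrides:
        return overrides[name]
    defaults = {
        "varchar": {"type": ["string"]},
        "integer": {"type": ["integer"]},
    }
    # Unknown types degrade to string rather than failing discovery.
    return defaults.get(name, {"type": ["string"]})
```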
SQLStream class
The SQLStream leverages a SQLConnector for all core functionality.
connector_class - reference to the custom SQLConnector class
Stream performance tuning - option 1: get_records()
To improve performance, developers may optionally override get_records() if they can provide better performance than the generic SQLAlchemy interfaces used by the base built-in implementation.
Stream performance tuning - option 2: BATCH (#9)
The long-term vision for performance tuning is to implement batch handlers as part of #9.
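A get_records() override might look like the following sketch. This is not the SDK's implementation: sqlite3 stands in for the SQLAlchemy connection to keep the example self-contained, and the real method's signature (e.g. a partition/context argument) differs:

```python
import sqlite3

def get_records(connection: sqlite3.Connection, table_name: str):
    """Hypothetical get_records() override: yield rows one at a time
    as dicts rather than materializing the full result set."""
    connection.row_factory = sqlite3.Row  # name-addressable rows
    # Table name is assumed to come from trusted, discovered metadata.
    cursor = connection.execute(f"SELECT * FROM {table_name}")
    for row in cursor:
        yield dict(row)

# Usage: stream two rows from an in-memory table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, email TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "a@x"), (2, "b@x")])
records = list(get_records(conn, "users"))
```

Yielding rows lazily is also what the execution_options(stream_results=True) todo below is about: avoiding holding the whole table in memory.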
SQLTap class
The SQLTap leverages the SQLConnector from your stream class for all core functionality.
default_stream_class - reference to the custom SQLStream class. The tap will use its connector to discover available streams.
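The wiring between the three classes can be illustrated with minimal stand-ins (all bodies here are hypothetical stubs; the real SDK classes carry far more behavior, and the "streams" key in the discovery result is an assumption):

```python
# Minimal stand-ins showing how SQLTap reaches its connector through
# default_stream_class.connector_class. Stub implementations only.
class SQLConnector:
    def discover_catalog_entries(self, tap_config: dict) -> dict:
        # Stubbed discovery result; assumed Dict[str, List[dict]] shape.
        return {"streams": [{"tap_stream_id": "public-users"},
                            {"tap_stream_id": "public-orders"}]}

class SQLStream:
    connector_class = SQLConnector  # reference to the custom connector class

    def __init__(self, catalog_entry: dict):
        self.name = catalog_entry["tap_stream_id"]

class SQLTap:
    default_stream_class = SQLStream  # reference to the custom stream class

    def __init__(self, config: dict):
        self.config = config

    def discover_streams(self) -> list:
        # The tap uses its stream class's connector to discover streams.
        connector = self.default_stream_class.connector_class()
        entries = connector.discover_catalog_entries(self.config)
        return [self.default_stream_class(e) for e in entries["streams"]]

streams = SQLTap(config={}).discover_streams()
```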
Target and Sink implementation
This MR now incorporates the target implementation from !200. Essentially, both taps and targets rely on the SQLConnector class (which in turn builds on sqlalchemy and DBAPI 2.0). Combining taps and targets makes the "end-to-end" and "round-trip" tests more robust and the SQLConnector implementation more complete.
Future iterations should add more tests and should expand the "standard" tests suite.
Important remaining todos:
- Decide on connection pool strategy (currently each table's stream gets its own engine and connection).
- Decide on stream_name and tap_stream_id naming conventions.
- Address execution_options(stream_results=True) feedback regarding streaming records instead of holding them in memory.
- JSON Schema to SQL type conversions: give developers the flexibility to override these. (Per @edgarrmondragon's comment.)
Not handled here:
- Incremental replication is not yet supported.
- Property selection from catalog metadata (handled in base SDK code; can be further optimized in the future).
Merges 74-database-type-streams -> main
Migrated from GitLab: https://gitlab.com/meltano/sdk/-/merge_requests/44