Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: create schema and table on add_sink #1036

Merged
merged 28 commits into from
Oct 19, 2022
Merged

fix: create schema and table on add_sink #1036

merged 28 commits into from
Oct 19, 2022

Conversation

kgpayne
Copy link
Contributor

@kgpayne kgpayne commented Oct 4, 2022

@kgpayne kgpayne marked this pull request as draft October 4, 2022 14:52
@kgpayne kgpayne marked this pull request as ready for review October 4, 2022 16:39
@codecov
Copy link

codecov bot commented Oct 4, 2022

Codecov Report

Attention: Patch coverage is 90.47619% with 4 lines in your changes missing coverage. Please review.

Project coverage is 83.39%. Comparing base (fab2cc3) to head (7139d08).
Report is 1069 commits behind head on main.

Files with missing lines Patch % Lines
singer_sdk/streams/sql.py 80.95% 1 Missing and 3 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1036      +/-   ##
==========================================
+ Coverage   83.13%   83.39%   +0.26%     
==========================================
  Files          39       39              
  Lines        3747     3758      +11     
  Branches      628      628              
==========================================
+ Hits         3115     3134      +19     
+ Misses        470      464       -6     
+ Partials      162      160       -2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@kgpayne
Copy link
Contributor Author

kgpayne commented Oct 5, 2022

I see codecov is failing. Wasn't exactly sure where best to add tests, but will look again in the morning 👍

@aaronsteers
Copy link
Contributor

@kgpayne - Looks like we've all had a first pass at this, and I see you are making a couple iterations, so I'll mark as draft. Feel free to move back to ready status once you are ready for final review. 👍

@aaronsteers aaronsteers marked this pull request as draft October 11, 2022 20:54
@kgpayne kgpayne marked this pull request as ready for review October 12, 2022 12:34
tests/core/test_sqlite.py Show resolved Hide resolved
singer_sdk/sinks/sql.py Show resolved Hide resolved
singer_sdk/sinks/sql.py Outdated Show resolved Hide resolved
@BuzzCutNorman
Copy link
Contributor

BuzzCutNorman commented Oct 18, 2022

@kgpayne I was wondering how should a developer set a target's default schema once this code is put in place? Say for argument's sake I created a target-postgres and set the default schema on connect with some code like this.

        @event.listens_for(engine, "connect", insert=True)
        def set_search_path(dbapi_connection, connection_record):
            existing_autocommit = dbapi_connection.autocommit
            dbapi_connection.autocommit = True
            cursor = dbapi_connection.cursor()
            cursor.execute(f"SET SESSION search_path='{self.config.get('schema') or 'public'}'")
            cursor.close()
            dbapi_connection.autocommit = existing_autocommit

I believe with this PR in place any blank table would be created in the tap's schema not in the target's default schema from the meltano.yml

@aaronsteers
Copy link
Contributor

aaronsteers commented Oct 18, 2022

@BuzzCutNorman - Thanks for raising.

@kgpayne - There are probably several discussion points to unwind here, but what do you think of using a setting name of load_schema (aligned with internal Meltano naming) or default_target_schema (Pipelinewise convention)?

New discussion here where we can dive deep:

If we can remove the reference to config.get('schema') and/or config["schema"] from this PR and apply a calculated schema when available (and just don't set a load schema otherwise?), that safely would push the new scope to a subsequent PR.

What do you think?

@kgpayne
Copy link
Contributor Author

kgpayne commented Oct 19, 2022

@aaronsteers @BuzzCutNorman

If we can remove the reference to config.get('schema') and/or config["schema"] from this PR...

We don't currently have any references to config.get('schema') in this PR or on main - that call is in BuzzCutNormans' example only. This PR already implements the behaviour you describe:

apply a calculated schema when available (and just don't set a load schema otherwise?)

I.e. the incoming schema_name property will try to derive a schema name (by splitting the stream name on -) and return None if the name is not splittable 👍

I posted feedback on the discussion too, but TLDR; is that we don't currently allow the specification of a schema/search path during the construction of the SQLAlchemy engine, but could easily do so with a connect_args setting (here or in another PR). This would then provide a default schema in cases where the schema_name property returns None, without the need for a custom snippet (like the one BuzzCutNorman suggested) or the overriding of create_sqlalchemy_engine on the SQLConnecter class 🙂 In future we could additionally support a load_schema to 'force' the schema to a user value irrespective of connection or derived schema names.

WDYT?

@aaronsteers
Copy link
Contributor

We don't currently have any references to config.get('schema') in this PR or on main - that call is in BuzzCutNormans' example only.

This was my misreading then. Thanks very much for clarifying.

@kgpayne
Copy link
Contributor Author

kgpayne commented Oct 19, 2022

We don't currently have any references to config.get('schema') in this PR or on main - that call is in BuzzCutNormans' example only.

This was my misreading then. Thanks very much for clarifying.

@aaronsteers I had to check, as I thought we used it somewhere too 🙂 Whilst its common in SQL target implementations (as you point out in the discussion) it isn't (yet) in the SDK 👍

Ken Payne and others added 2 commits October 19, 2022 17:08
Copy link
Contributor

@aaronsteers aaronsteers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kgpayne - Looks good from my side.

@edgarrmondragon - Any other changes you'd want to see in this iteration?

Copy link
Collaborator

@edgarrmondragon edgarrmondragon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kgpayne Looks good to me!

@aaronsteers
Copy link
Contributor

aaronsteers commented Oct 19, 2022

@BuzzCutNorman re:

@kgpayne I was wondering how should a developer set a target's default schema once this code is put in place?

I think we should follow this PR up with this one:

That would give a standard way for users to control the target load schemas. And presumably those behaviors would be built into the SQLStream.schema_name property (or a similar built-in featureset), such that the schema_name returned there would honor the user's default schema preference and/or a preference for schema (re)mappings.

@kgpayne kgpayne merged commit 2721dc5 into main Oct 19, 2022
@kgpayne kgpayne deleted the kgpayne/issue1027 branch October 19, 2022 22:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

bug: Tables not created if no records arrive with SQLSink
4 participants