Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bug: STATE not emitted if get_records produces no values #1750

Closed
1 task
shopmatfournier opened this issue Jun 5, 2023 · 4 comments · Fixed by #1753
Closed
1 task

bug: STATE not emitted if get_records produces no values #1750

shopmatfournier opened this issue Jun 5, 2023 · 4 comments · Fixed by #1753
Labels
kind/Bug Something isn't working valuestream/SDK

Comments

@shopmatfournier
Copy link

Singer SDK Version

master

Is this a regression?

  • Yes

Python Version

3.10

Bug scope

Taps (catalog, state, etc.)

Operating System

OS X

Description

Due to Meltano/Singer format being vague around STATE messages, it is possible for some targets to nuke state when the tap does not produce a message when in INCREMENTAL mode.

When you subclass the Stream class and overwrite the get_records method, it is possible that when using a state/bookmark that the stream produces no records --perhaps because there are no new records in the source to be produced. The meltanoSDK does not produce a State message in this case when get_records does not produce a single record. Other, non-meltano taps have a different approach where they will always produce a state message even if no records were produced (example: tap-prometheus) .

So we have a situation where one tap will produce a record and another tap may not produce a record. Targets seem to interpret this in various ways. For example, target-bigquery assumes it will always receive at least on SCHEMA message when it processes the data. Unfortunately, if it does not receive a STATE message it ends up writing the empty dictionary as the state during finalization. I have opened a PR here to fix it on their end. Target-jsonl is not affected --it does not overwrite the state when no STATE message is ingested.

Now you can argue that Target-Bigquery shouldn't make this assumption and I would agree; however, I can find no specification in Singer for what the expected behavior should be when no records are produced in incremental mode. Due to the ambiguity, I think the SDK should always emit a STATE message in incremental mode even if that STATE message is the existing state due to no records being produced otherwise we are depending on undefined behavior in the target implementations.

Code

No response

@shopmatfournier shopmatfournier added kind/Bug Something isn't working valuestream/SDK labels Jun 5, 2023
@tayloramurphy
Copy link
Collaborator

This seems like a reasonable proposal to me. @edgarrmondragon @kgpayne ?

@edgarrmondragon
Copy link
Collaborator

edgarrmondragon commented Jun 6, 2023

This seems like a reasonable proposal to me. @edgarrmondragon @kgpayne ?

@tayloramurphy I agree, it seems like a quick win and should be safe for any target to receive a state message before anything else.

EDIT: Got a PR #1753

@tayloramurphy
Copy link
Collaborator

@edgarrmondragon awesome! and just to confirm, that PR it's not always going to emit an empty state, it's going to emit either the state the tap was invoked with (i.e. the incremental state from Meltano) or an empty state if none is provided, correct?

@edgarrmondragon
Copy link
Collaborator

@tayloramurphy correct. It'll emit the value contained in the state file if one is passed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/Bug Something isn't working valuestream/SDK
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants