Because the Meltano/Singer format is vague about STATE messages, some targets can wipe out ("nuke") the saved state when a tap running in INCREMENTAL mode does not emit any STATE message.
When you subclass the Stream class and override the get_records method, a stream running with a state/bookmark may produce no records, for example because the source has no new records since the last bookmark. In that case, when get_records does not yield a single record, the Meltano SDK does not emit a STATE message. Other, non-Meltano taps take a different approach: they always emit a STATE message, even when no records were produced (example: tap-prometheus).
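To make the difference concrete, here is a minimal sketch in plain Python (not the Meltano SDK's actual internals) of two hypothetical tap loops that write Singer messages as JSON lines. `emit_only_with_records` mirrors the behavior described above, where STATE is emitted only alongside records; `emit_always` mirrors the tap-prometheus style. The function names, the placeholder schema, and the `updated_at` bookmark key are assumptions for illustration.

```python
import json
from typing import Any, Iterable


def _message(msg_type: str, **fields: Any) -> str:
    """Serialize one Singer message as a single JSON line."""
    return json.dumps({"type": msg_type, **fields})


def _schema(stream: str) -> str:
    # Minimal placeholder schema; a real tap would describe its fields here.
    return _message("SCHEMA", stream=stream, schema={"type": "object"}, key_properties=[])


def emit_only_with_records(stream: str, records: Iterable[dict], state: dict) -> list[str]:
    """Hypothetical loop: STATE is written only after a record advances the bookmark."""
    lines = [_schema(stream)]
    for record in records:
        lines.append(_message("RECORD", stream=stream, record=record))
        state = {"bookmarks": {stream: {"updated_at": record["updated_at"]}}}
        lines.append(_message("STATE", value=state))
    return lines  # zero records -> zero STATE messages


def emit_always(stream: str, records: Iterable[dict], state: dict) -> list[str]:
    """Hypothetical loop: a final STATE is always written, even with no records."""
    lines = [_schema(stream)]
    for record in records:
        lines.append(_message("RECORD", stream=stream, record=record))
        state = {"bookmarks": {stream: {"updated_at": record["updated_at"]}}}
    lines.append(_message("STATE", value=state))  # repeats the input state if nothing new
    return lines
```

Called with an empty record list and an existing bookmark, `emit_only_with_records` produces only the SCHEMA line, while `emit_always` produces the SCHEMA line followed by a STATE line that repeats the incoming bookmark.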
So, depending on the tap, a target may or may not receive a STATE message when there is nothing new to sync, and targets interpret this in different ways. For example, target-bigquery assumes it will always receive at least one SCHEMA message when it processes the data. Unfortunately, if it does not receive a STATE message, it ends up writing an empty dictionary as the state during finalization. I have opened a PR here to fix it on their end. target-jsonl is not affected: it does not overwrite the state when it receives no STATE message.
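The two target behaviors can be sketched the same way. This is an assumed simplification, not the real code of either target: `finalize_buggy` models the described target-bigquery problem (a missing STATE defaults to `{}` and is written anyway), while `finalize_safe` models the target-jsonl behavior of leaving the stored state untouched when no STATE message arrived. All function names here are hypothetical.

```python
import json
from typing import Iterable, Optional


def last_state(lines: Iterable[str]) -> Optional[dict]:
    """Return the value of the last STATE message in a Singer stream, or None if there was none."""
    state = None
    for line in lines:
        message = json.loads(line)
        if message.get("type") == "STATE":
            state = message["value"]
    return state


def finalize_buggy(lines: Iterable[str]) -> dict:
    """Assumed model of the bug: no STATE message becomes {} and is persisted anyway."""
    return last_state(lines) or {}


def finalize_safe(lines: Iterable[str], stored_state: dict) -> dict:
    """Keep the previously stored state when the tap sent no STATE message."""
    state = last_state(lines)
    return stored_state if state is None else state
```

The key design choice is distinguishing "no STATE message received" (`None`) from "an empty state was received" (`{}`); collapsing the two is what turns a quiet incremental run into a full state reset.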
Now, you could argue that target-bigquery shouldn't make this assumption, and I would agree; however, I can find nothing in the Singer specification that defines the expected behavior when no records are produced in incremental mode. Given this ambiguity, I think the SDK should always emit a STATE message in incremental mode, even if that message simply repeats the existing state because no records were produced. Otherwise, we are depending on undefined behavior in the target implementations.
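A minimal sketch of the proposal, again in plain Python rather than the SDK's real internals: after a stream's records are exhausted, the tap always writes one final STATE message, falling back to the state it was invoked with, or to an empty state when none was provided. `sync_incremental`, its parameters, and the `get_records(bookmark)` callable shape are all hypothetical, and records are assumed to arrive sorted by the replication key.

```python
import json
import sys
from typing import Callable, Iterable, Optional, TextIO


def sync_incremental(
    stream: str,
    get_records: Callable[[Optional[str]], Iterable[dict]],
    replication_key: str,
    input_state: Optional[dict] = None,
    out: TextIO = sys.stdout,
) -> dict:
    """Hypothetical incremental sync that always finishes with a STATE message."""
    # Start from the state the tap was invoked with, or an empty state if none was given.
    state = input_state if input_state is not None else {}
    bookmark = state.get("bookmarks", {}).get(stream, {}).get(replication_key)

    # Assumes records are sorted by replication_key, so the last one is the newest.
    for record in get_records(bookmark):
        out.write(json.dumps({"type": "RECORD", "stream": stream, "record": record}) + "\n")
        bookmark = record[replication_key]

    if bookmark is not None:
        # Build a new dict instead of mutating the caller's input state.
        bookmarks = dict(state.get("bookmarks", {}))
        bookmarks[stream] = {replication_key: bookmark}
        state = {**state, "bookmarks": bookmarks}

    # Emitted unconditionally, so targets never have to guess what "no STATE" means.
    out.write(json.dumps({"type": "STATE", "value": state}) + "\n")
    return state
```

For example, `sync_incremental("users", fetch_users_since, "updated_at", {"bookmarks": {"users": {"updated_at": "2023-01-01"}}})`, with `fetch_users_since` a hypothetical callable that returns no new rows, still writes a single STATE line repeating the incoming bookmark instead of writing nothing.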
Code
No response
Singer SDK Version
master
Is this a regression?
Python Version
3.10
Bug scope
Taps (catalog, state, etc.)
Operating System
OS X
@edgarrmondragon Awesome! Just to confirm: that PR won't always emit an empty state, correct? It will emit either the state the tap was invoked with (i.e., the incremental state from Meltano) or an empty state if none is provided?