Rename most usages of “bad rows” to “failed events” #915
Conversation
“Failed events” and “bad rows” are used interchangeably in the docs; however, we should have a single term for this concept (which is “failed events”).

That said, it still makes sense to use “bad rows” in parts of the docs that are closely tied to working with the (soon to be legacy) badrow format.

This commit changes all colloquial usages of “bad rows” to “failed events” and retains “bad rows” in APIs and pages related to recovery and querying S3/GCS.

In the future, we might need to distinguish between “failed events, new format” (warehouse) and “failed events, old format” (aka “bad rows”, S3/GCS), but we will do so as needed by searching for “failed events” references and qualifying them. In most contexts, only the fact that there is a failed event matters, not the format.
@@ -38,7 +38,7 @@ The Java tracker does not yet provide the ability to automatically assign entities

The Java tracker provides the `SelfDescribingJson` class for custom events and entities. There is no in-built distinction between schemas used for events and those used for entities: they can be used interchangeably.

Your schemas must be accessible to your pipeline, within an [Iglu server](/docs/pipeline-components-and-applications/iglu/index.md). Tracked events containing self-describing JSON are validated against their schemas during the enrichment phase of the pipeline. If the data don't match the schema, the events end up in the Bad Rows storage instead of the data warehouse.
This is the kind of sentence I want to avoid, as it implies that failed events don’t make it to the warehouse :)
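For context, this is roughly what `SelfDescribingJson` looks like in use. A minimal sketch only: the schema URI and field names are illustrative, and tracker initialization and sending are omitted.

```java
import java.util.HashMap;
import java.util.Map;

import com.snowplowanalytics.snowplow.tracker.payload.SelfDescribingJson;

public class SelfDescribingExample {
    public static void main(String[] args) {
        // Custom data that must conform to a schema resolvable from your
        // Iglu server. The schema URI and fields here are illustrative.
        Map<String, Object> data = new HashMap<>();
        data.put("targetUrl", "https://example.com");

        SelfDescribingJson sdj = new SelfDescribingJson(
                "iglu:com.example/link_click/jsonschema/1-0-0",
                data);

        // The same class is used whether the JSON describes a custom event
        // or an entity attached to another event; there is no built-in
        // distinction between the two.
        System.out.println(sdj.getMap());
    }
}
```

If the tracked data fails schema validation during enrichment, a failed event is emitted.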
@@ -217,7 +217,7 @@ The sink is configured using a HOCON file, for which you can find examples [here

| output.good.cluster.documentType | Optional. The Elasticsearch index type. Index types are deprecated in ES >=7.x, therefore it shouldn't be set with ES >=7.x. |
| output.good.chunk.byteLimit | Optional. Bulk requests to Elasticsearch will be split into chunks according to the given byte limit. Default value 1000000. |
| output.good.chunk.recordLimit | Optional. Bulk requests to Elasticsearch will be split into chunks according to the given record limit. Default value 500. |
| output.bad.type | Required. Configure where to write bad rows. Can be "kinesis", "nsq", "stderr" or "none". |
| output.bad.type | Required. Configure where to write failed events. Can be "kinesis", "nsq", "stderr" or "none". |
We will need to adjust this for Enrich 5.0.0 because we will have 2 streams of failed events. I prefer to say they are both failed events in 2 different formats, rather than say there are bad rows and there are failed events (or there are bad rows and there are incomplete events).
But for now just making it uniform.
`output.incomplete.*` is on its way; I'll prepare the docs next week. How should we differentiate them? Failed events for the warehouse and failed events for storage?
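For reference, here is roughly where these keys sit in the sink's HOCON file. A minimal sketch based only on the table above; a real configuration needs many more settings (input, Elasticsearch client, and so on).

```hocon
# Sketch of the output section only; values shown are the documented
# defaults for the chunk settings and an example choice for the bad sink.
output {
  good {
    chunk {
      byteLimit = 1000000   # split bulk Elasticsearch requests at this many bytes
      recordLimit = 500     # ... or at this many records, whichever is hit first
    }
  }
  bad {
    type = "kinesis"        # where failed events go: "kinesis", "nsq", "stderr" or "none"
  }
}
```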
@@ -137,7 +137,7 @@ To disable `ttl` so keys could be stored in cache until job is done `0` value

#### `ignoreOnError`

When set to `true`, no bad row will be emitted if the API call fails and the enriched event will be emitted without the context added by this enrichment.
When set to `true`, no failed event will be emitted if the API call fails, and the enriched event will be emitted without the context added by this enrichment.
Useful change, because a failed event is emitted regardless of the format.
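To show where this flag lives, here is a sketch of an API request enrichment configuration with `ignoreOnError` enabled. The schema version, endpoint, and field values are assumptions for illustration and should be checked against the enrichment reference for your Enrich version.

```json
{
  "schema": "iglu:com.snowplowanalytics.snowplow.enrichments/api_request_enrichment_config/jsonschema/1-0-1",
  "data": {
    "vendor": "com.snowplowanalytics.snowplow.enrichments",
    "name": "api_request_enrichment_config",
    "enabled": true,
    "parameters": {
      "inputs": [
        { "key": "user", "pojo": { "field": "user_id" } }
      ],
      "api": {
        "http": {
          "method": "GET",
          "uri": "https://api.example.com/users/{{user}}?format=json",
          "timeout": 5000,
          "authentication": {}
        }
      },
      "outputs": [
        { "schema": "iglu:com.example/user/jsonschema/1-0-0", "json": { "jsonPath": "$.record" } }
      ],
      "cache": { "size": 3000, "ttl": 60 },
      "ignoreOnError": true
    }
  }
}
```

With this in place, a failing API call degrades gracefully: the event continues through enrichment without the extra context, instead of producing a failed event.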
- If a pulled record is a valid event, Repeater will wait some time (15 minutes by default) after the `etl_tstamp` before attempting to re-insert it, in order to let Mutator do its job.
- If the database responds with an error, the row will get transformed into a `loader_recovery_error` bad row.
- All entities in the dead-letter bucket are valid Snowplow [bad rows](https://github.com/snowplow/snowplow-badrows).
- If the database responds with an error, the row will get transformed into a `loader_recovery_error` failed event.
Note: docs for very old versions are deleted because I don’t have the patience to address this again and again :D
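For readers landing here: entities in the dead-letter bucket are self-describing JSONs following the badrows format linked above. The shape below is only an illustrative assumption (placeholder values, unverified field names); the authoritative definition lives in the snowplow-badrows repo.

```json
{
  "schema": "iglu:com.snowplowanalytics.snowplow.badrows/loader_recovery_error/jsonschema/1-0-0",
  "data": {
    "processor": { "artifact": "snowplow-bigquery-repeater", "version": "x.y.z" },
    "failure": { "error": "example database error message" },
    "payload": "<original row as a string>"
  }
}
```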
Great changes 👌