Rename most usages of “bad rows” to “failed events” #915
Conversation
“Failed events” and “bad rows” are used interchangeably in the docs; however, we should have a single term for this concept (which is “failed events”).

That said, it still makes sense to use “bad rows” in parts of the docs that are closely tied to working with the (soon to be legacy) badrow format.

This commit changes all colloquial usages of “bad rows” to “failed events” and retains “bad rows” in APIs and pages related to recovery and querying S3/GCS.

In the future, we might need to distinguish between “failed events, new format” (warehouse) and “failed events, old format” (aka “bad rows”, S3/GCS), but we will do so as needed by searching for “failed events” references and qualifying them. In most contexts, only the fact that there is a failed event matters, not the format.
@@ -38,7 +38,7 @@ The Java tracker does not yet provide the ability to automatically assign entities

The Java tracker provides the `SelfDescribingJson` class for custom events and entities. There is no in-built distinction between schemas used for events and those used for entities: they can be used interchangeably.

Your schemas must be accessible to your pipeline, within an [Iglu server](/docs/pipeline-components-and-applications/iglu/index.md). Tracked events containing self-describing JSON are validated against their schemas during the enrichment phase of the pipeline. If the data don't match the schema, the events end up in the Bad Rows storage instead of the data warehouse.
This is the kind of sentence I want to avoid, as it implies that failed events don’t make it to the warehouse :)
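For context, this is roughly what `SelfDescribingJson` looks like in use. A minimal sketch only: the schema URI and field names are illustrative, and tracker initialization and sending are omitted.

```java
import java.util.HashMap;
import java.util.Map;

import com.snowplowanalytics.snowplow.tracker.payload.SelfDescribingJson;

public class SelfDescribingExample {
    public static void main(String[] args) {
        // Custom data that must conform to a schema resolvable from your
        // Iglu server. The schema URI and fields here are illustrative.
        Map<String, Object> data = new HashMap<>();
        data.put("targetUrl", "https://example.com");

        SelfDescribingJson sdj = new SelfDescribingJson(
                "iglu:com.example/link_click/jsonschema/1-0-0",
                data);

        // The same class is used whether the JSON describes a custom event
        // or an entity attached to another event; there is no built-in
        // distinction between the two.
        System.out.println(sdj.getMap());
    }
}
```

If the tracked data fails schema validation during enrichment, a failed event is emitted.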
@@ -217,7 +217,7 @@ The sink is configured using a HOCON file, for which you can find examples [here

| output.good.cluster.documentType | Optional. The Elasticsearch index type. Index types are deprecated in ES >=7.x, therefore it shouldn't be set with ES >=7.x. |
| output.good.chunk.byteLimit | Optional. Bulk requests to Elasticsearch will be split into chunks according to the given byte limit. Default value 1000000. |
| output.good.chunk.recordLimit | Optional. Bulk requests to Elasticsearch will be split into chunks according to the given record limit. Default value 500. |
| output.bad.type | Required. Configure where to write bad rows. Can be "kinesis", "nsq", "stderr" or "none". |
| output.bad.type | Required. Configure where to write failed events. Can be "kinesis", "nsq", "stderr" or "none". |
We will need to adjust this for Enrich 5.0.0 because we will have 2 streams of failed events. I prefer to say they are both failed events in 2 different formats, rather than say there are bad rows and there are failed events (or there are bad rows and there are incomplete events).
But for now just making it uniform.
`output.incomplete.*` is on its way; I'll prepare the docs next week. How should we differentiate them? Failed events for the warehouse and failed events for storage?
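For reference, here is roughly where these keys sit in the sink's HOCON file. A minimal sketch based only on the table above; a real configuration needs many more settings (input, Elasticsearch client, and so on).

```hocon
# Sketch of the output section only; values shown are the documented
# defaults for the chunk settings and an example choice for the bad sink.
output {
  good {
    chunk {
      byteLimit = 1000000   # split bulk Elasticsearch requests at this many bytes
      recordLimit = 500     # ... or at this many records, whichever is hit first
    }
  }
  bad {
    type = "kinesis"        # where failed events go: "kinesis", "nsq", "stderr" or "none"
  }
}
```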
@@ -137,7 +137,7 @@ To disable `ttl` so keys could be stored in cache until job is done `0` value

#### `ignoreOnError`

When set to `true`, no bad row will be emitted if the API call fails and the enriched event will be emitted without the context added by this enrichment.
When set to `true`, no failed event will be emitted if the API call fails, and the enriched event will be emitted without the context added by this enrichment.
Useful change, because a failed event is emitted regardless of the format.
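To show where this flag lives, here is a sketch of an API request enrichment configuration with `ignoreOnError` enabled. The schema version, endpoint, and field values are assumptions for illustration and should be checked against the enrichment reference for your Enrich version.

```json
{
  "schema": "iglu:com.snowplowanalytics.snowplow.enrichments/api_request_enrichment_config/jsonschema/1-0-1",
  "data": {
    "vendor": "com.snowplowanalytics.snowplow.enrichments",
    "name": "api_request_enrichment_config",
    "enabled": true,
    "parameters": {
      "inputs": [
        { "key": "user", "pojo": { "field": "user_id" } }
      ],
      "api": {
        "http": {
          "method": "GET",
          "uri": "https://api.example.com/users/{{user}}?format=json",
          "timeout": 5000,
          "authentication": {}
        }
      },
      "outputs": [
        { "schema": "iglu:com.example/user/jsonschema/1-0-0", "json": { "jsonPath": "$.record" } }
      ],
      "cache": { "size": 3000, "ttl": 60 },
      "ignoreOnError": true
    }
  }
}
```

With this in place, a failing API call degrades gracefully: the event continues through enrichment without the extra context, instead of producing a failed event.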
- If a pulled record is a valid event, Repeater will wait some time (15 minutes by default) after the `etl_tstamp` before attempting to re-insert it, in order to let Mutator do its job.
- If the database responds with an error, the row will get transformed into a `loader_recovery_error` bad row.
- All entities in the dead-letter bucket are valid Snowplow [bad rows](https://github.com/snowplow/snowplow-badrows).
- If the database responds with an error, the row will get transformed into a `loader_recovery_error` failed event.
Note: docs for very old versions are deleted because I don’t have the patience to address this again and again :D
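For readers landing here: entities in the dead-letter bucket are self-describing JSONs following the badrows format linked above. The shape below is only an illustrative assumption (placeholder values, unverified field names); the authoritative definition lives in the snowplow-badrows repo.

```json
{
  "schema": "iglu:com.snowplowanalytics.snowplow.badrows/loader_recovery_error/jsonschema/1-0-0",
  "data": {
    "processor": { "artifact": "snowplow-bigquery-repeater", "version": "x.y.z" },
    "failure": { "error": "example database error message" },
    "payload": "<original row as a string>"
  }
}
```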
Great changes 👌