Rename most usages of “bad rows” to “failed events” #915

Merged · 1 commit · Jun 11, 2024
README.md (2 changes: 1 addition & 1 deletion)
@@ -81,7 +81,7 @@ There are several key concepts in Snowplow: events (self-describing, structured)
**Please, use up-to-date terms:**
* _Self-describing event_, not _unstructured event_
* _Entities_, not _contexts_ (it’s ok-ish to refer to a set of entities as “context”, but only in a casual sense, as in “these provide some context to the event”)
* _Failed events_ and not _bad rows_
* _Failed events_ and not _bad rows_, unless specifically referring to the legacy bad row JSON format and associated tooling
* If you are writing about schemas, pick “schema” or “data structure” and stick with it

**Please, do not over-explain these in any of your writing.** Instead, just link to one of the existing concept pages:
@@ -6,15 +6,15 @@ sidebar_position: 30

Self-describing (self-referential) JSON schemas are at the core of Snowplow tracking. Read more about them [here](/docs/understanding-your-pipeline/schemas/index.md). They allow you to track completely customised data, and are also used internally throughout Snowplow pipelines.

In all our trackers, self-describing JSON are used in two places. One is in the `SelfDescribing` event type that wraps custom self-describing JSONs for sending. The second use is to attach custom data to any tracked event. It's one of the most powerful Snowplow features.

When tracking user behavior, the event describes the specific activity they performed, e.g. a user added an item to an eCommerce cart. To understand the meaning of the event, and how it relates to your business, it's ideal to also track the relatively persistent environment in which the activity was performed. For example, is the user a repeat customer? Which item did they add, and how many are in stock?

These environmental factors can be tracked as the event "context", using self-describing JSON. When self-describing JSON are tracked as part of an event, they are called "entities". All the entities of an event together form the context. Read more in this [thorough blog post](https://snowplowanalytics.com/blog/2020/03/25/what-are-snowplow-events-and-entities-and-what-makes-them-so-powerful/).

### Adding custom entities to any event

Every `Event.Builder` in the Java tracker allows for a list of `SelfDescribingJson` objects to be added to the `Event`. It's fine to add multiple entities of the same type. There's no official limit to how many entities you can add to a single event, but consider if the payload size could become problematic if you are adding a large number.

Context entities can be added to any event using the `customContext()` Builder method:
```java
```
@@ -36,11 +36,11 @@ The Java tracker does not yet provide the ability to automatically assign entiti

The Java tracker provides the `SelfDescribingJson` class for custom events and entities. There is no in-built distinction between schemas used for events and those used for entities: they can be used interchangeably.

Your schemas must be accessible to your pipeline, within an [Iglu server](/docs/pipeline-components-and-applications/iglu/index.md). Tracked events containing self-describing JSON are validated against their schemas during the enrichment phase of the pipeline. If the data don't match the schema, the events end up in the Bad Rows storage instead of the data warehouse.
Your schemas must be accessible to your pipeline, within an [Iglu server](/docs/pipeline-components-and-applications/iglu/index.md). Tracked events containing self-describing JSON are validated against their schemas during the enrichment phase of the pipeline. If the data don't match the schema, the events end up as [failed events](/docs/understanding-your-pipeline/failed-events/index.md).

A self-describing JSON needs two keys, `schema` and `data`. The `schema` key is the Iglu URI for the schema. The `data` value must match the properties described by the specified schema. It is usually provided as a map.

A simple initialisation looks like this:
```java
// This map will be used for the "data" key
Map<String, String> eventData = new HashMap<>();
```
@@ -38,7 +38,7 @@ The Java tracker does not yet provide the ability to automatically assign entiti

The Java tracker provides the `SelfDescribingJson` class for custom events and entities. There is no in-built distinction between schemas used for events and those used for entities: they can be used interchangeably.

Your schemas must be accessible to your pipeline, within an [Iglu server](/docs/pipeline-components-and-applications/iglu/index.md). Tracked events containing self-describing JSON are validated against their schemas during the enrichment phase of the pipeline. If the data don't match the schema, the events end up in the Bad Rows storage instead of the data warehouse.
Collaborator Author:
This is the kind of sentence I want to avoid, as it implies that failed events don’t make it to the warehouse :)

Your schemas must be accessible to your pipeline, within an [Iglu server](/docs/pipeline-components-and-applications/iglu/index.md). Tracked events containing self-describing JSON are validated against their schemas during the enrichment phase of the pipeline. If the data don't match the schema, the events end up as [failed events](/docs/understanding-your-pipeline/failed-events/index.md).

A self-describing JSON needs two keys, `schema` and `data`. The `schema` key is the Iglu URI for the schema. The `data` value must match the properties described by the specified schema. It is usually provided as a map.

@@ -22,7 +22,7 @@ A Self Describing event is a [self-describing JSON](http://snowplowanalytics.com
**Required properties**

- `schema`: (string) – A valid Iglu schema path. This must point to the location of the custom event’s schema, of the format: `iglu:{vendor}/{name}/{format}/{version}`.
- `data`: (object) – The custom data for your event. This data must conform to the schema specified in the `schema` argument, or the event will fail validation and land in bad rows.
- `data`: (object) – The custom data for your event. This data must conform to the schema specified in the `schema` argument, or the event will fail validation and become a [failed event](/docs/understanding-your-pipeline/failed-events/index.md).

To track a custom self-describing event, use the `trackSelfDescribingEvent` method of the tracker.
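
For illustration, a minimal sketch of such a call (assuming a tracker instance named `tracker` has already been created, and using a made-up schema URI) might look like this:

```typescript
// Illustrative sketch only: `tracker` is assumed to be an already-initialised tracker instance,
// and the schema URI below is a placeholder for a schema hosted on your Iglu server.
tracker.trackSelfDescribingEvent({
  // A valid Iglu schema path of the form iglu:{vendor}/{name}/{format}/{version}
  schema: 'iglu:com.acme/button_click/jsonschema/1-0-0',
  // Must conform to the schema above, otherwise the event fails validation
  // and becomes a failed event
  data: {
    buttonId: 'checkout',
  },
});
```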

@@ -38,7 +38,7 @@ tracker.trackScreenViewEvent({screenName: 'myScreenName'});
In the previous 0.1.x releases, initializing the tracker was done differently. The following example describes the API change, for a quick migration to v0.2.0:

```typescript
/* Previous API (v0.1.x
import Tracker from '@snowplow/react-native-tracker'; // (a)

const initPromise = Tracker.initialize({ // (b)
```

@@ -273,7 +273,7 @@ tracker.trackSelfDescribingEvent({
**Required properties**:

- `schema`: (string) – A valid Iglu schema path. This must point to the location of the custom event’s schema, of the format: `iglu:{vendor}/{name}/{format}/{version}`.
- `data`: (object) – The custom data for your event. This data must conform to the schema specified in the `schema` argument, or the event will fail validation and land in bad rows.
- `data`: (object) – The custom data for your event. This data must conform to the schema specified in the `schema` argument, or the event will fail validation and become a [failed event](/docs/understanding-your-pipeline/failed-events/index.md).

To attach custom contexts, pass a second argument to the function, containing an array of self-describing JSON.
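
As an illustrative sketch (both schema URIs and the data fields below are placeholders, and `tracker` is assumed to be an already-created tracker instance), attaching a single context entity could look like this:

```typescript
// Illustrative sketch only: the schema URIs are placeholders for schemas hosted on your Iglu server.
tracker.trackSelfDescribingEvent(
  {
    schema: 'iglu:com.acme/link_click/jsonschema/1-0-0',
    data: { targetUrl: 'https://www.example.com' },
  },
  [
    // The second argument is an array of self-describing JSON, one entry per context entity.
    {
      schema: 'iglu:com.acme/user/jsonschema/1-0-0',
      data: { userType: 'tester' },
    },
  ]
);
```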

docs/destinations/forwarding-events/elasticsearch/index.md (2 changes: 1 addition & 1 deletion)
@@ -217,7 +217,7 @@ The sink is configured using a HOCON file, for which you can find examples [her
| output.good.cluster.documentType | Optional. The Elasticsearch index type. Index types are deprecated in ES >=7.x. Therefore, it shouldn't be set with ES >=7.x. |
| output.good.chunk.byteLimit | Optional. Bulk requests to Elasticsearch will be split into chunks according to the given byte limit. Default value: 1000000. |
| output.good.chunk.recordLimit | Optional. Bulk requests to Elasticsearch will be split into chunks according to the given record limit. Default value: 500. |
| output.bad.type | Required. Configure where to write bad rows. Can be "kinesis", "nsq", "stderr" or "none". |
| output.bad.type | Required. Configure where to write failed events. Can be "kinesis", "nsq", "stderr" or "none". |
Collaborator Author:
We will need to adjust this for Enrich 5.0.0 because we will have 2 streams of failed events. I prefer to say they are both failed events in 2 different formats, rather than say there are bad rows and there are failed events (or there are bad rows and there are incomplete events).

But for now just making it uniform.

Contributor:
output.incomplete.* is on its way; I'll prepare the docs next week. How should we differentiate them? Failed events for the warehouse and failed events for storage?

| output.bad.streamName | Required. Stream name for events which are rejected by Elasticsearch. |
| output.bad.region | Used when `output.bad.type` is kinesis. Optional if it can be resolved with [AWS region provider chain](https://sdk.amazonaws.com/java/api/latest/software/amazon/awssdk/regions/providers/DefaultAwsRegionProviderChain.html). Region where the bad Kinesis stream is located. |
| output.bad.customEndpoint | Used when `output.bad.type` is kinesis. Optional. Custom endpoint to override AWS Kinesis endpoints, this can be used to specify local endpoints when using localstack. |
@@ -137,7 +137,7 @@ To disable `ttl` so keys could be stored in cache until job is done `0` valu

#### `ignoreOnError`

When set to `true`, no bad row will be emitted if the API call fails and the enriched event will be emitted without the context added by this enrichment.
When set to `true`, no failed event will be emitted if the API call fails and the enriched event will be emitted without the context added by this enrichment.
Collaborator Author:
Useful change, because a failed event is emitted regardless of the format.


### Data sources

@@ -171,7 +171,7 @@ A Snowplow enrichment can run many millions of time per hour, effectively launch

#### `ignoreOnError`

When set to `true`, no bad row will be emitted if the SQL query fails and the enriched event will be emitted without the context added by this enrichment.
When set to `true`, no failed event will be emitted if the SQL query fails and the enriched event will be emitted without the context added by this enrichment.

## Examples

@@ -279,7 +279,7 @@ This single context would be added to the `derived_contexts` array:
```json
{
"SKU": "456",
"prod_name": "Ray-Bans"
}
]
}
```