Skip to content

Commit

Permalink
[DSTREAM][DOC] Add documentation for kinesis retry configurations
Browse files Browse the repository at this point in the history
## What changes were proposed in this pull request?

The changes were merged as part of - #17467.
The documentation was missed somewhere in the review iterations. Adding the documentation where it belongs.

## How was this patch tested?
Docs. Not tested.

cc budde , brkyvz

Author: Yash Sharma <[email protected]>

Closes #18028 from yssharma/ysharma/kinesis_retry_docs.
  • Loading branch information
yashs360 authored and brkyvz committed May 18, 2017
1 parent 8fb3d5c commit 92580bd
Showing 1 changed file with 4 additions and 0 deletions.
4 changes: 4 additions & 0 deletions docs/streaming-kinesis-integration.md
Original file line number Diff line number Diff line change
Expand Up @@ -216,3 +216,7 @@ de-aggregate records during consumption.
- If no Kinesis checkpoint info exists when the input DStream starts, it will start either from the oldest record available (`InitialPositionInStream.TRIM_HORIZON`) or from the latest tip (`InitialPositionInStream.LATEST`). This is configurable.
- `InitialPositionInStream.LATEST` could lead to missed records if data is added to the stream while no input DStreams are running (and no checkpoint info is being stored).
- `InitialPositionInStream.TRIM_HORIZON` may lead to duplicate processing of records where the impact is dependent on checkpoint frequency and processing idempotency.

#### Kinesis retry configuration
- `spark.streaming.kinesis.retry.waitTime` : Wait time between Kinesis retries as a duration string. When reading from Amazon Kinesis, users may hit `ProvisionedThroughputExceededException`'s, when consuming faster than 5 transactions/second or, exceeding the maximum read rate of 2 MB/second. This configuration can be tweaked to increase the sleep between fetches when a fetch fails to reduce these exceptions. Default is "100ms".
- `spark.streaming.kinesis.retry.maxAttempts` : Max number of retries for Kinesis fetches. This config can also be used to tackle the Kinesis `ProvisionedThroughputExceededException`'s in scenarios mentioned above. It can be increased to have more number of retries for Kinesis reads. Default is 3.

0 comments on commit 92580bd

Please sign in to comment.