# Update 4.1 Executor 端长时容错详解.md #52

Open · wants to merge 2 commits into `master`
```diff
@@ -337,7 +337,7 @@ override def next(): ByteBuffer = synchronized {
 - Spark Streaming manages the offsets itself: given an offset range, it can read the data directly from Kafka's disks, using Spark Streaming's own load balancing in place of the balancing done by Kafka
 - This guarantees that each offset range belongs to one and only one batch, which in turn guarantees exactly-once

-Taking the Direct approach as an example, we now walk through in detail how Spark Streaming replays data from upstream after the source data "takes effect" (实效).
+Taking the Direct approach as an example, we now walk through in detail how Spark Streaming replays data from upstream after the source data fails (失效).

 The implementation here has two layers:

```
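To make the offset-range mechanism concrete, here is a minimal sketch against the spark-streaming-kafka-0-10 Direct API. It is an illustration, not code from the book: the broker address, topic, and group id are placeholders. The key point is that each batch's RDD carries the exact Kafka offset ranges it covers, so a lost batch can be recomputed by re-reading the same ranges from Kafka's log.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.{HasOffsetRanges, KafkaUtils}

object DirectOffsetDemo {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(
      new SparkConf().setAppName("direct-offset-demo"), Seconds(5))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker1:9092",              // placeholder address
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "demo-group",                // placeholder group id
      // Spark Streaming, not Kafka, owns the offsets, so every batch is a
      // well-defined [from, until) range that can be replayed deterministically.
      "enable.auto.commit" -> (false: java.lang.Boolean)
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent,
      Subscribe[String, String](Seq("demo-topic"), kafkaParams)) // placeholder topic

    stream.foreachRDD { rdd =>
      // Each partition of the batch RDD maps to exactly one Kafka
      // topic-partition offset range; replaying the batch means re-reading
      // these same ranges from Kafka's on-disk log.
      rdd.asInstanceOf[HasOffsetRanges].offsetRanges.foreach { r =>
        println(s"${r.topic}-${r.partition}: [${r.fromOffset}, ${r.untilOffset})")
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Because auto-commit is disabled and the ranges are recorded per batch, a recomputed batch reads exactly the same data it read the first time, which is the "each offset range belongs to one and only one batch" guarantee described in the hunk above.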
```diff
@@ -263,7 +263,7 @@ writeStream
 | **Kafka** | ![negative](1.imgs/negative.png) | supported | Kafka does not yet support idempotent writes, so duplicate writes are possible<br/>(the recommendation is to follow Kafka with streaming de-duplication) |
 | **ForeachSink** (non-idempotent custom operation) | ![negative](1.imgs/negative.png) | supported | non-idempotent custom operations are not recommended |

-A point worth stressing: although Structured Streaming also ships with a built-in `console` Source, its main purpose is really just demos at technical conferences and talks, and it should not be used in production systems.
+A point worth stressing: although Structured Streaming also ships with a built-in `console` Sink, its main purpose is really just demos at technical conferences and talks, and it should not be used in production systems.

 ## References

```
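Since the table recommends pairing the non-idempotent Kafka sink with streaming de-duplication, here is a minimal sketch of that pattern; the broker address, topic, and the `eventId`/`ts` fields are assumptions made for illustration. It writes to the `console` sink, which, as the hunk above notes, is appropriate for a demo like this but not for production.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object DedupDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("dedup-demo").getOrCreate()
    import spark.implicits._

    // Upstream, a retried (non-idempotent) Kafka write may have produced
    // duplicate records in the topic; we assume each record is JSON with a
    // unique eventId and an event-time field ts.
    val records = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092") // placeholder address
      .option("subscribe", "demo-topic")                 // placeholder topic
      .load()
      .select(
        get_json_object($"value".cast("string"), "$.eventId").as("eventId"),
        get_json_object($"value".cast("string"), "$.ts").cast("timestamp").as("ts"))

    // Streaming de-duplication: the watermark bounds the de-duplication state,
    // so duplicates arriving within the 10-minute window are dropped while
    // older state can safely be discarded.
    val deduped = records
      .withWatermark("ts", "10 minutes")
      .dropDuplicates("eventId", "ts")

    deduped.writeStream
      .format("console")   // demo-only sink, per the note above
      .outputMode("append")
      .start()
      .awaitTermination()
  }
}
```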