[SPARK-23033][SS][Follow Up] Task level retry for continuous processing #20675
Conversation
cc @tdas and @jose-torres
Test build #87665 has finished for PR 20675 at commit
retest this please
Test build #87666 has finished for PR 20675 at commit
I think this does work, but the outstanding question is whether we should support task-level retry for continuous processing at all. My instinct is that we should not, for a few reasons:
- The semantics aren't quite right. Task-level retry can happen a fixed number of times for the lifetime of the task, which is the lifetime of the query - even if it runs for days after, the attempt number will never be reset.
- We end up with two divergent ways for continuous readers to get their starting offset, through the constructor and through the GetLastEpochAndOffset RPC.
- We have to complicate the data source API.
What I think we should do instead is detect when a task retry would have happened, convert it to a global retry in ContinuousExecution, and impose some kind of time-aware (e.g. N per day) limit. Since every task is checkpointing and retries are expected to be rare, making the other tasks restart too shouldn't cause problems.
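A minimal sketch of what such a time-aware limit could look like, purely as an illustration of the idea (the `RestartBudget` name, the sliding-window policy, and its wiring into `ContinuousExecution` are assumptions, not anything in this PR):

```scala
import scala.collection.mutable

// Hypothetical helper: allow at most `maxRestarts` query-level restarts within a
// sliding time window; anything beyond that fails the query instead of retrying.
class RestartBudget(maxRestarts: Int, windowMs: Long) {
  private val restartTimes = mutable.Queue.empty[Long]

  def tryAcquire(nowMs: Long = System.currentTimeMillis()): Boolean = synchronized {
    // Forget restarts that have fallen out of the window.
    while (restartTimes.nonEmpty && nowMs - restartTimes.head > windowMs) {
      restartTimes.dequeue()
    }
    if (restartTimes.size < maxRestarts) {
      restartTimes.enqueue(nowMs)
      true
    } else {
      false
    }
  }
}

// e.g. tolerate at most 5 global restarts per day:
// val budget = new RestartBudget(maxRestarts = 5, windowMs = 24L * 60 * 60 * 1000)
// if (budget.tryAcquire()) restartAllTasksFromLastEpoch() else failTheQuery()
```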
IncrementEpoch(),
// Check the answer exactly; if there were duplicated results, CheckAnswerRowsContains
// would still return true, so use CheckAnswerRowsContainsOnlyOnce instead.
CheckAnswerRowsContainsOnlyOnce(scala.Range(0, 20).map(Row(_))),
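For context, a rough standalone sketch of the property the new check is meant to assert (this is not the actual `StreamTest` implementation, just the intended semantics):

```scala
import org.apache.spark.sql.Row

// Every expected row must appear exactly once in the collected sink output; rows
// outside the expected set are tolerated (unlike an exact CheckAnswer).
def containsEachExactlyOnce(output: Seq[Row], expected: Seq[Row]): Boolean =
  expected.forall(row => output.count(_ == row) == 1)

// Example: the sink emitted 0..24, but each expected value 0..19 appears exactly once,
// so a contains-only-once check passes where an exact answer check would fail.
val expected = scala.Range(0, 20).map(Row(_))
val output = scala.Range(0, 25).map(Row(_))
assert(containsEachExactlyOnce(output, expected))
```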
Checking the exact answer can just be `CheckAnswer(0 to 20: _*)`.
Actually I first used `CheckAnswer(0 to 19: _*)` here, but the test case failed, probably because continuous processing may not stop exactly at Range(0, 20) every time. See the logs below:
== Plan ==
== Parsed Logical Plan ==
WriteToDataSourceV2 org.apache.spark.sql.execution.streaming.sources.MemoryStreamWriter@6435422d
+- Project [value#13L]
+- StreamingDataSourceV2Relation [timestamp#12, value#13L], org.apache.spark.sql.execution.streaming.continuous.RateStreamContinuousReader@5c5d9c45
== Analyzed Logical Plan ==
WriteToDataSourceV2 org.apache.spark.sql.execution.streaming.sources.MemoryStreamWriter@6435422d
+- Project [value#13L]
+- StreamingDataSourceV2Relation [timestamp#12, value#13L], org.apache.spark.sql.execution.streaming.continuous.RateStreamContinuousReader@5c5d9c45
== Optimized Logical Plan ==
WriteToDataSourceV2 org.apache.spark.sql.execution.streaming.sources.MemoryStreamWriter@6435422d
+- Project [value#13L]
+- StreamingDataSourceV2Relation [timestamp#12, value#13L], org.apache.spark.sql.execution.streaming.continuous.RateStreamContinuousReader@5c5d9c45
== Physical Plan ==
WriteToDataSourceV2 org.apache.spark.sql.execution.streaming.sources.MemoryStreamWriter@6435422d
+- *(1) Project [value#13L]
+- *(1) DataSourceV2Scan [timestamp#12, value#13L], org.apache.spark.sql.execution.streaming.continuous.RateStreamContinuousReader@5c5d9c45
ScalaTestFailureLocation: org.apache.spark.sql.streaming.StreamTest$class at (StreamTest.scala:436)
org.scalatest.exceptions.TestFailedException:
== Results ==
!== Correct Answer - 20 == == Spark Answer - 25 ==
!struct<value:int> struct<value:bigint>
[0] [0]
[10] [10]
[11] [11]
[12] [12]
[13] [13]
[14] [14]
[15] [15]
[16] [16]
[17] [17]
[18] [18]
[19] [19]
[1] [1]
![2] [20]
![3] [21]
![4] [22]
![5] [23]
![6] [24]
![7] [2]
![8] [3]
![9] [4]
! [5]
! [6]
! [7]
! [8]
! [9]
== Progress ==
StartStream(ContinuousTrigger(3600000),org.apache.spark.util.SystemClock@343e225a,Map(),null)
AssertOnQuery(<condition>, )
AssertOnQuery(<condition>, )
AssertOnQuery(<condition>, )
AssertOnQuery(<condition>, )
AssertOnQuery(<condition>, )
AssertOnQuery(<condition>, )
=> CheckAnswer: [0],[1],[2],[3],[4],[5],[6],[7],[8],[9],[10],[11],[12],[13],[14],[15],[16],[17],[18],[19]
StopStream
Ah, right, my bad.
Now that I think about it, we may eventually need a way to set the starting partition offset after creation for other reasons, so I'm less confident in those second and third reasons. But on the whole I still think converting to global restarts makes sense.
Great, thanks for your detailed reply!
It's not semantically wrong that the attempt number is never reset; it just means that for very long-running streams task restarts will eventually run out. You make a good point that in high parallelism cases we might need to be able to restart only a single task, although I think we'd still need query-level restart on top of that. But if you're worried that the current implementation of task restart will become incorrect as more complex scenarios are supported, I'd definitely lean towards deferring it until continuous processing is more feature-complete. I was working on getting basic aggregation working, and I think we definitely will need some kind of setOffset-like functionality. Do you want to spin that off into a separate PR? (I can handle it otherwise.)
 *
 * @param offset last offset before task retry.
 */
default void setOffset(PartitionOffset offset) {
I think it might be better to create a new interface `ContinuousDataReaderFactory`, and implement this there as something like `createDataReaderWithOffset(PartitionOffset offset)`. That way the intended lifecycle is explicit.
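A rough Scala rendering of that suggestion (the real DataSourceV2 interfaces live in Java and differ in detail; the simplified traits below are only stand-ins to show the intended lifecycle):

```scala
// Simplified stand-ins for the DataSourceV2 reader abstractions; not the real Java API.
trait PartitionOffset extends Serializable
trait DataReader[T] extends AutoCloseable {
  def next(): Boolean
  def get(): T
}
trait DataReaderFactory[T] extends Serializable {
  def createDataReader(): DataReader[T]
}

// A continuous-specific factory makes the restart-from-offset lifecycle explicit: a
// retried task asks the factory for a fresh reader positioned at the recovered offset,
// instead of creating a reader and then mutating it with setOffset.
trait ContinuousDataReaderFactory[T] extends DataReaderFactory[T] {
  def createDataReaderWithOffset(offset: PartitionOffset): DataReader[T]
}
```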
Cool, that's clearer.
Ah, I see what you mean. Yeah, if we support task-level retry we should also make the task retry number unlimited.
Yep, the "complex scenarios" I mentioned are mainly the shuffle and aggregation scenarios discussed in https://issues.apache.org/jira/browse/SPARK-20928?focusedCommentId=16245556&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16245556. In those scenarios task-level retry may need to consider epoch alignment, but I think the current implementation of task restart is complete for map-only continuous processing. I agree with you about deferring it, so I'll just leave a comment in SPARK-23033 and close this, or do you think this should be reviewed by others?
Of course, #20689 added a new interface `ContinuousDataReaderFactory`.
…rtOffset ## What changes were proposed in this pull request? As discussed in #20675, we need to add a new interface `ContinuousDataReaderFactory` to support setting the start offset in Continuous Processing. ## How was this patch tested? Existing UT. Author: Yuanjian Li <[email protected]> Closes #20689 from xuanyuanking/SPARK-23533.
Looks like the patch is outdated, and when continuous queries support shuffled stateful operators, implementing task-level retry is not that trivial. To get a correct aggregation result, when one task fails at epoch N, all tasks and states should be restored to epoch N. I definitely agree that it would be ideal to have stable task-level retry; I'm just wondering whether this patch would work with the follow-up tasks for continuous mode.
@HeartSaVioR Thanks for your reply, and sorry that I only just saw your comment. Yep, I will keep tracking this feature after we support shuffled stateful operators.
What changes were proposed in this pull request?
Here we want to reimplement task-level retry for continuous processing. The changes include:
- A new `EpochCoordinatorMessage` named `GetLastEpochAndOffset`, used for getting the last epoch and offset of a particular partition when a task is restarted.
- Support in `ContinuousDataReader` for restarting the reader from a given offset.

How was this patch tested?
Added a new UT in `ContinuousSuite` and a new `StreamAction` named `CheckAnswerRowsContainsOnlyOnce` for more accurate result checking.
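As an illustration of the first change, a hedged sketch of the coordinator-side bookkeeping that a `GetLastEpochAndOffset` message could be answered from (the `PartitionProgress` class, its field names, and the local `PartitionOffset` stand-in are assumptions; the real `EpochCoordinator` RPC plumbing is omitted):

```scala
import scala.collection.mutable

// Local stand-in for the DataSourceV2 PartitionOffset type.
trait PartitionOffset extends Serializable

// Message a restarted task would send to ask where its partition left off.
case class GetLastEpochAndOffset(partitionId: Int)

case class EpochAndOffset(epoch: Long, offset: PartitionOffset)

// Hypothetical coordinator-side state that such a message could be answered from.
class PartitionProgress {
  private val lastCommitted = mutable.Map.empty[Int, EpochAndOffset]

  // Record a partition's committed epoch together with the offset it reached.
  def recordCommit(partitionId: Int, epoch: Long, offset: PartitionOffset): Unit =
    synchronized { lastCommitted(partitionId) = EpochAndOffset(epoch, offset) }

  // Answer to GetLastEpochAndOffset: a restarted task resumes its ContinuousDataReader
  // from this offset (via setOffset) rather than from the query's original start offset.
  def lastEpochAndOffset(partitionId: Int): Option[EpochAndOffset] =
    synchronized { lastCommitted.get(partitionId) }
}
```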