Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve robustness when adaptive playbacks encounter unexpected discontinuities #7030

Closed
stevemayhew opened this issue Feb 28, 2020 · 16 comments

Comments

@stevemayhew
Copy link
Contributor

stevemayhew commented Feb 28, 2020

Issue description

ExoPlayer freezes playback in Player.STATE_BUFFERING forever when faced with un-marked discontinuities in the stream.

Other players (Safari/QuickTime, VisualOn, Hls.js) recover from this to varying degrees (Safari looses A/V sync, not desirable, but at least the evidence is more clear the problem is with the stream not the player.

I'm not suggesting ExoPlayer's design of insisting on streams that conform to the HLS Draft is incorrect, just that best effort should be made to report the playback error without freezes.

Reproduction steps

Play the sample HLS stream through the Verizon commercial, starting at 20 seconds in, ExoPlayer freezes the last frame in the OR scene before the Cheerios commercial. Note, the stream has continuity issues around the two ad insertions, that is clear.

Actual Behavior

ExoPlayer freezes indefinitely in Player.STATE_BUFFERING with plenty of buffered segments.

Expected Behavior

In order of decreasing desirability, one of the following:

  1. Recover from the error by introducing a discontinuity internally (split the segment and create a new TimestampAdjuster linked to the media time)
  2. Detect the error during playback (one or more Renderer will be stuck with Renderer.isReady() false but playback is fully buffered.
  3. Detect the error at load and throw a load exception (simplest course of action, this is in the pull request)

Link to test content

The content was mailed to [email protected]

A full bug report captured from the device

Again emailed to [email protected]

Version of ExoPlayer being used

Version 2.10.4, with the pull request code. Note multiple versions of ExoPlayer all reproduce this.
The pull request for the suggested fix is: #7034

Device(s) and version(s) of Android being used

Multiple devices, the bug-report is from an Arris vip6102w running Android P.

@phhusson
Copy link
Contributor

This seems to be related to #6983

@stevemayhew
Copy link
Contributor Author

stevemayhew commented Feb 28, 2020 via email

@ojw28
Copy link
Contributor

ojw28 commented Mar 1, 2020

[Internal note: Content emailed to dev.exoplayer@ for this issue refs 7029 rather than 7030]

@ojw28
Copy link
Contributor

ojw28 commented Mar 1, 2020

Am I correct in saying that the types of discontinuity you're seeing do not necessarily occur at segment boundaries? It seems that way, both from the sample provided and the comment above about splitting the segment. Just double checking!

@stevemayhew
Copy link
Contributor Author

Yes, that is correct. The example shows not only PTS resets twice, but the sound was incorrectly spliced (audio on the Verizon ad ends early so the audio for the Cheerios ad overlaps it). This is where Safari appears to try and 'fix things', I would guess by re-syncing media time to the video from sound, so sound lags from then on.

A better fix is to drop the remainder of the bad segment (what the exception does) and re-sync playback after the error.

ffplay reports the error in segment 263346109.tsv:

[mpegts @ 0x7fe0eb000000] DTS discontinuity in stream 0: packet 37 with DTS 124104395, packet 38 with DTS 8590019591
[mpegts @ 0x7fe0eb000000] pid=201 pes_code=0x1e0
[mpegts @ 0x7fe0eb000000] pid=201 pes_code=0x1e0B sq=    0B f=0/0   
...

So, seeking past the end of the Verizon ad (segment 263346110.tsv) and thus reseting the TimestampAdjuster at that point would playback with perfect A/V sync from then on.

Once we have a solution we can provide many test cases for it ;-). The ProBowl broadcast alone has a half dozen issues like this.

Again we wholly agree this is a stream/origin server issue, some origin servers solve this better than others, some simply drop content to solve it.

A solution where ExoPlayer:

  1. Allowed the client (UI) to skip past the bad content
  2. Reported the actual segments with issues (so the origin servers that care to do so can fix it)

Would be welcomed. We are grateful for your interest and happy to contribute however we can.

@stevemayhew
Copy link
Contributor Author

To skip the balance of the bad segment (with the reported intra-extractor discontinuity) we would:

  1. throw the exception and discard the remainder of the samples
  2. mark the next chunk as requiring a new Extractor / TimestampAdjuster, this resets the A/V sync to the new PTS sequence

Looking at the TimestampAdjusterProvider this is currently keyed by DSN. So this idea would require a separate indication on the chunk and storing the TimestampAdjuster by media sequence or some other ID.

One question I have, why are the TimestampAdjuster's saved? Is this avoid having the wait() block loads for the time master to initializes the first timestamp when you seek back to a segment? Also, when the playlist DSN changes (on reload of a live) it would seem you can certainly discard any adjusters that are lower sequence numbers (as these segments are gone from the playlist)

@stevemayhew
Copy link
Contributor Author

Here is the proposed solution. The detection part of the solution handles the issue @ojw28 pointed out here: #7034 (comment) as the timestamps are compared to the range of the segment itself.

Since GitHub does not support embedding images in markdown, here is a PDF: https://drive.google.com/file/d/1dT64gp8ES7tPYX_dEErvuHhb0XfY9a8X/view?usp=sharing

Also, the Markdown source, if you want to edit it or use it for anything else feel free. https://drive.google.com/file/d/1FJnIUC7Z-3lDAagntNH7dkeuxqQ1aFHg/view?usp=sharing

@krocard
Copy link
Contributor

krocard commented Mar 9, 2020

@ojw28, assigned to you as you seem to have already looked into it. Feel free to reassigned.

@ojw28
Copy link
Contributor

ojw28 commented Mar 10, 2020

Hi. I took a look at the proposal. It looks sensible! What's the current status of the pull request? Is it still being worked on?

@stevemayhew
Copy link
Contributor Author

stevemayhew commented Mar 11, 2020

@ojw28 Yes, I've coded the first pass at the design (link above) and am testing with our various streams and origin servers.

If you catch the exception and 'hack' the timestamp adjuster as follows:

  private void feedDataToExtractor(
      DataSource dataSource, DataSpec dataSpec, boolean dataIsEncrypted)
      throws IOException, InterruptedException {
...
    try {
      ExtractorInput input = prepareExtraction(dataSource, loadDataSpec);
      if (skipLoadedBytes) {
        input.skipFully(nextLoadPosition);
      }
      try {
        int result = Extractor.RESULT_CONTINUE;
        while (result == Extractor.RESULT_CONTINUE && !loadCanceled) {
          result = extractor.read(input, /* seekPosition= */ null);
        }
      } catch (UnreportedDiscontinuityException e) {
        Log.d(TAG, "Unreported discontinuity at timeUs: "+ e.timesUs + " uri: " + dataSpec.uri);

        timestampAdjuster.reset();
        timestampAdjuster.setFirstSampleTimestampUs(TimestampAdjuster.DO_NOT_OFFSET);
      } finally {
        nextLoadPosition = (int) (input.getPosition() - dataSpec.absoluteStreamPosition);
      }

Then the sample "Pro Bowl" video I sent plays through. Note the stream duration does not match the playlist duration (as samples are discarded) but it plays out in perfect A/V sync!

Next bit I would appreciate input is how we should integrate this into ExoPlayer in a way that:

  1. Clients could selectively enable it (eg maybe, mode == "pedantic", specify the tolerance on the time values?).
  2. Where recovery logic should go (if any)? My thought would be the LoadErrorAction allowing the client to select hard playback fail vs some form of recovery (skipping content and reseting timestamp adjustment)

At this point, I think the method and algorithm are solid, just how to mainstream it with no risk to existing ExoPlayer.

@ojw28
Copy link
Contributor

ojw28 commented Mar 12, 2020

(1) is just plumbing, so we can sort it out at the end if we need it. I'm unsure about where best to put (2). Ideally we'll come up with a recovery mechanism that's safe enough to be enabled by default, and robust enough so that we don't need to provide app developers with the ability to customize.

@ojw28 ojw28 changed the title ExoPlayer stalls forever with unreported discontinuity Improve robustness when adaptive playbacks encounter unexpected discontinuities Mar 12, 2020
@stevemayhew
Copy link
Contributor Author

Thanks for the feedback @ojw28 I'll make the changes today. My current thinking for (2) is to use load error reporting to allow the UI client to choose to make it fatal or not. The sample code in the comment above should be a safe recovery, log it and reset the timestampAdjuster, the only downside is there is no report up to the UI client that content has disappeared, so the load error report would cover that.

@ojw28
Copy link
Contributor

ojw28 commented Mar 12, 2020

I do agree that there should be an event surfaced to indicate to the client that an unexpected discontinuity was encountered, but I don't think the client should get to respond on a per-error basis whether to recover or not. We should probably just automatically recover (if the client wants to stop the player upon receiving the event, then it'll still be possible for them to do that with this setup). If needed, we can provide a flag to turn automatic recovery on and off (but not on a per-error basis).

@stevemayhew
Copy link
Contributor Author

Sounds good... I'll have your comments all addressed and a checkin later today you can look at (I'm US PST).

@ojw28 ojw28 assigned tonihei and unassigned ojw28 Jul 12, 2022
@ojw28
Copy link
Contributor

ojw28 commented Jul 12, 2022

I'm a little unclear where this ended up, since the pull request ref'd at the top of the issue was merged. Given this hasn't received updates for quite some time, I'll go ahead and close this. If there's still remaining work to be tracked, it's probably going to be clearer to file a fresh issue.

@ojw28 ojw28 closed this as completed Jul 12, 2022
@stevemayhew
Copy link
Contributor Author

stevemayhew commented Jul 12, 2022 via email

@google google locked and limited conversation to collaborators Sep 11, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

6 participants