SSA/ASS subtitles - Overlapping start/end times and position tag is not handled #6595

Merged
merged 10 commits into from
Dec 5, 2019

Contributor

@szaboa szaboa commented Oct 29, 2019

Pull request for #6320.

Collaborator

@icbaker icbaker left a comment

Thanks for the contribution!

The position stuff looks good, just a few small comments.

I've added a larger comment about the general approach to the handling of overlapping start/end times. It's an awkward bit of algorithm/logic to get right (and in the future if we need to implement it a third time we might try and abstract it into a common utility class that handles all the overlapping resolution etc.).

On the subject of testing:
I think it probably makes sense to test at the level of SsaDecoder (rather than directly testing SsaSubtitle, for example). It's probably worth adding my "equal start/end times" case as one of your tests :) and a nested-subtitles case too:

[3, 7] -> "A"
[4, 5] -> "B"

i++;
} while (i != endTimeIndex);
}
}
Collaborator

I could be wrong, but I'm not convinced this correctly handles multiple cues that have the same start or end time.
e.g.

[3, 5] -> "A"
[4, 5] -> "B"

We'll insert the first one, after which we have: cueTimesUs=[3, 5], startTimeIndex=0 & endTimeIndex=1 and cues=[["A"], []].

Then insert the second one: cueTimesUs=[3, 4, 5, 5], startTimeIndex=1 & endTimeIndex=3 and cues=[["A"], ["A", "B"], ["B"], []]

Whereas I think with this input we'd want these lists: cueTimesUs=[3, 4, 5] and cues=[["A"], ["A", "B"], []]

It also took quite a lot of thought for me to follow that through, especially with the multiple mutations to the cue list on each iteration of the outer loop.

Note that we have this same overlapping challenge in the webvtt package and we actually solve it in a slightly different way, by more lazily evaluating Subtitle#getCues (every call to that iterates over all the subtitles we have) [1]. I chatted a bit to the team, and that seems unfortunately inefficient, so I think it makes sense to keep the logic here and not copy webvtt, but maybe we can correct it and make it a bit easier to follow.
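
For comparison, the lazy approach mentioned above can be sketched roughly as follows. This is not the webvtt package's actual code; LazyCueList, TimedCue and the use of String in place of Cue are hypothetical names for illustration. It shows why each lookup costs a full pass over all stored cues.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of resolving overlaps lazily at lookup time. */
final class LazyCueList {
  /** A cue with its own start and end time; overlaps are not pre-computed. */
  private static final class TimedCue {
    final long startTimeUs;
    final long endTimeUs;
    final String text;

    TimedCue(long startTimeUs, long endTimeUs, String text) {
      this.startTimeUs = startTimeUs;
      this.endTimeUs = endTimeUs;
      this.text = text;
    }
  }

  private final List<TimedCue> timedCues = new ArrayList<>();

  void add(long startTimeUs, long endTimeUs, String text) {
    timedCues.add(new TimedCue(startTimeUs, endTimeUs, text));
  }

  /** Scans every stored cue on each call, so a lookup is O(n) in the number of cues. */
  List<String> getCues(long timeUs) {
    List<String> active = new ArrayList<>();
    for (TimedCue cue : timedCues) {
      // A cue is active from its start time (inclusive) to its end time (exclusive).
      if (cue.startTimeUs <= timeUs && timeUs < cue.endTimeUs) {
        active.add(cue.text);
      }
    }
    return active;
  }
}
```

Insertion is trivial here, but the cost moves to every getCues call, which is the inefficiency noted above.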

My suggestion is to get rid of insertToCueTimes() and do something more like (I haven't tested this, it might have other problems...):

  • (binary?) search for startTimeUs in cueTimesUs
    • if startTimeUs is already there, then get the matching list (by index) from cues and add cue to it.
    • else insert startTimeUs to cueTimesUs and insert a new matching list to cues (containing all the Cues from index - 1 plus cue).
  • Walk through cueTimesUs, adding cue to every matching entry in cues until you find a time that's either equal to or greater than endTimeUs (mostly your existing do/while loop)
    • On each step, store a reference to the matching list of cues before you add cue. (This reference should also store the list from the else in the first bullet before cue is added)
    • If the time you stopped on is equal to endTimeUs, then do nothing (the cues list already has the correct 'end' value, right?)
    • If it's greater, then insert a new cues list equal to the most recent list you stored at the top of this sub-section.
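
A minimal sketch of the boundary-based idea in plain Java. It is a simplified variant, not the exact steps listed above and not the code that was eventually merged: it first makes sure both startTimeUs and endTimeUs exist as entries in cueTimesUs (a new entry inherits a copy of the cues active just before it), then adds the cue to every entry in between. CueTimeline, ensureBoundary and the use of String in place of Cue are hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch: keeps cueTimesUs sorted and free of duplicate times. */
final class CueTimeline {
  final List<Long> cueTimesUs = new ArrayList<>();
  // cues.get(i) holds the cues shown from cueTimesUs.get(i) until the next time entry.
  final List<List<String>> cues = new ArrayList<>();

  void add(long startTimeUs, long endTimeUs, String cue) {
    if (startTimeUs >= endTimeUs) {
      return; // Assumption for this sketch: ignore empty or inverted ranges.
    }
    int startIndex = ensureBoundary(startTimeUs);
    int endIndex = ensureBoundary(endTimeUs);
    // The cue is active in every interval that begins in [startTimeUs, endTimeUs).
    for (int i = startIndex; i < endIndex; i++) {
      cues.get(i).add(cue);
    }
  }

  /** Returns the index of timeUs, inserting it (with inherited cues) if it is missing. */
  private int ensureBoundary(long timeUs) {
    int index = Collections.binarySearch(cueTimesUs, timeUs);
    if (index >= 0) {
      return index; // Time already present: reuse its list, no duplicate entry.
    }
    int insertAt = -(index + 1);
    // A new boundary starts with whatever was showing just before it.
    List<String> inherited =
        insertAt == 0 ? new ArrayList<>() : new ArrayList<>(cues.get(insertAt - 1));
    cueTimesUs.add(insertAt, timeUs);
    cues.add(insertAt, inherited);
    return insertAt;
  }
}
```

With [3, 5] -> "A" followed by [4, 5] -> "B", this produces cueTimesUs=[3, 4, 5] and cues=[["A"], ["A", "B"], []]. The nested case [3, 7] -> "A", [4, 5] -> "B" produces cueTimesUs=[3, 4, 5, 7] and cues=[["A"], ["A", "B"], ["A"], []].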

[1]

@@ -226,4 +285,15 @@ public static long parseTimecodeUs(String timeString) {
return timestampUs;
}

@Nullable
public static Pair<Float, Float> parsePosition(String line){
Collaborator

Might make more sense to use android.graphics.PointF here because it's a little less general/ambiguous than Pair<Float, Float>

Also avoids auto-boxing

@@ -98,6 +100,12 @@ protected Subtitle decode(byte[] bytes, int length, boolean reset) {
private void parseHeader(ParsableByteArray data) {
String currentLine;
while ((currentLine = data.readLine()) != null) {
if (currentLine.startsWith("PlayResX:")) {
playResX = Integer.valueOf(currentLine.substring(9).trim());
Collaborator

I think it's a tiny bit clearer to use the string literal again instead of a 'magic' length (saves a future reader manually counting the number of characters in "PlayResX:" :))

playResX = Integer.valueOf(currentLine.substring("PlayResX:".length()).trim());

@@ -196,16 +204,67 @@ private void parseDialogueLine(String dialogueLine, List<Cue> cues, LongArray cu
}
}

// Parse \pos{x,y} attribute
Collaborator

nit: I'd move this comment to the javadoc of parsePosition

@@ -50,6 +53,9 @@
private int formatEndIndex;
private int formatTextIndex;

private int playResX;
private int playResY;
Collaborator

It's probably safer to explicitly initialise these to an invalid value to clearly indicate 'unset', because 0 seems like a potentially genuine value we could see in a subtitle file? And then also update the comparison below on L216.

We have C.LENGTH_UNSET which I think would work well.
https://github.com/google/ExoPlayer/blob/release-v2/library/core/src/main/java/com/google/android/exoplayer2/C.java
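
A rough sketch of the 'unset' sentinel idea, combined with the prefix-length suggestion from the earlier comment. ScriptInfo, parseHeaderLine and hasPlayRes are hypothetical names, and LENGTH_UNSET here is a local stand-in (assumed to be -1) for C.LENGTH_UNSET so that the snippet compiles without ExoPlayer on the classpath.

```java
/** Hypothetical holder for the script resolution read from the SSA header. */
final class ScriptInfo {
  // Local stand-in for ExoPlayer's C.LENGTH_UNSET (assumed here to be -1).
  static final int LENGTH_UNSET = -1;

  private static final String PLAY_RES_X_PREFIX = "PlayResX:";
  private static final String PLAY_RES_Y_PREFIX = "PlayResY:";

  // Start as 'unset' so that a genuine 0 in a file is not mistaken for a missing value.
  int playResX = LENGTH_UNSET;
  int playResY = LENGTH_UNSET;

  /** Parses one header line; a malformed number throws NumberFormatException. */
  void parseHeaderLine(String line) {
    if (line.startsWith(PLAY_RES_X_PREFIX)) {
      // Using the prefix's own length avoids a hand-counted 'magic' offset.
      playResX = Integer.parseInt(line.substring(PLAY_RES_X_PREFIX.length()).trim());
    } else if (line.startsWith(PLAY_RES_Y_PREFIX)) {
      playResY = Integer.parseInt(line.substring(PLAY_RES_Y_PREFIX.length()).trim());
    }
  }

  /** True only once both dimensions have been read from the header. */
  boolean hasPlayRes() {
    return playResX != LENGTH_UNSET && playResY != LENGTH_UNSET;
  }
}
```

Any later comparison can then check against LENGTH_UNSET rather than 0.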

}

@Override
public List<Cue> getCues(long timeUs) {
int index = Util.binarySearchFloor(cueTimesUs, timeUs, true, false);
- if (index == -1 || cues[index] == Cue.EMPTY) {
+ if (index == -1 || cues.get(index).isEmpty()) {
Collaborator

I think you can get rid of the empty check, since we'll just return the empty list below anyway (right?)

Contributor Author

Yes, that's right.
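
A small sketch of the lookup without the extra empty check. CueLookup and floorIndex are hypothetical; floorIndex is a stand-in for the Util.binarySearchFloor call in the diff and assumes the times are sorted and unique. String is used in place of Cue.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical lookup helper illustrating the simplified condition. */
final class CueLookup {
  static List<String> getCues(long[] cueTimesUs, List<List<String>> cues, long timeUs) {
    int index = floorIndex(cueTimesUs, timeUs);
    // No isEmpty() check needed: an empty list at 'index' is returned as-is.
    return index == -1 ? Collections.<String>emptyList() : cues.get(index);
  }

  /** Index of the largest entry <= timeUs, or -1 if every entry is greater. */
  static int floorIndex(long[] sortedTimesUs, long timeUs) {
    int result = Arrays.binarySearch(sortedTimesUs, timeUs);
    // A negative result encodes the insertion point as -(insertionPoint) - 1.
    return result >= 0 ? result : -(result + 1) - 1;
  }
}
```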

Comment on lines 1 to 5
<component name="InspectionProjectProfileManager">
<profile version="1.0">
<option name="myName" value="Project Default" />
</profile>
</component>
Contributor Author

Deleted in commit fb2a702 :)

Contributor Author

szaboa commented Oct 30, 2019

Applied the suggested changes (thank you!) except the one with same start or end time. I'll follow up on that tomorrow :)

Contributor Author

szaboa commented Nov 3, 2019

I've pushed the latest changes.

Testing:
I couldn't test the code strictly at the decoder level because parseDialogueLine is private, so I verified the cue times and the number of cues from the decoded subtitle. I also needed to correct the "no endtime" test case: with this overlapping mechanism that case now behaves differently (it carries over the previous subtitles). Is it OK like this?

Overlapping challenge:

> I could be wrong, but I'm not convinced this correctly handles multiple cues that have the same start or end time.

Yes, the previous implementation resulted in redundant cues and cueTimes in this case, though visually it was fine.

I agree that we shouldn't add redundant things, and the algorithm you've described is clearer and a bit more efficient (no need to search ahead for where endTimeUs fits).

I tried to implement it following that approach (with a lot of trial and error) but couldn't really succeed because of the corner cases (e.g. handling the first cue; endTimeUs being greater than all of the times; the same startTime/endTime; and storing a reference to the cues to add later, which means inspecting the next+1 cueTime, not just the next one), so I decided to go back to the first approach and just correct the same startTime/endTime problem.

What do you think?

Collaborator

icbaker commented Nov 5, 2019

Looks good, thanks! I'll work on getting this merged.

Contributor Author

szaboa commented Nov 5, 2019

Great, let me know if any further changes are needed.

Collaborator

icbaker commented Nov 11, 2019

Just to keep you posted, I haven't forgotten about this :)

It turns out it's a little tricky to merge this while also supporting the 'blank' end timecode behaviour currently in SsaDecoder (where the intention is that the line appears only until the next line...). I've chatted with the team, and we're likely to remove the blank end timecode 'feature' as it's not really supported by the spec afaict.

Once that support is removed, I'll be able to merge this more easily.

Contributor Author

szaboa commented Nov 11, 2019

Sure, there's no hurry :)

ojw28 pushed a commit that referenced this pull request Nov 15, 2019
SSA spec allows the lines in any order, so they must all have an end time:
http://moodub.free.fr/video/ass-specs.doc

The Matroska write-up of SubRip assumes the end time is present:
https://matroska.org/technical/specs/subtitles/srt.html

This will massively simplify merging issue:#6595

PiperOrigin-RevId: 279926730
@MurtadhaS

Hi guys, any news regarding this PR?

icbaker added a commit that referenced this pull request Dec 5, 2019
@icbaker icbaker merged commit 3f5654a into google:dev-v2 Dec 5, 2019
Collaborator

icbaker commented Dec 5, 2019

Just merged it in to dev-v2 (with fairly significant changes, but the functionality originally proposed should all be there).

ojw28 pushed a commit that referenced this pull request Dec 6, 2019
@google google locked and limited conversation to collaborators Feb 4, 2020