Robustness of Daisy3 SMIL generation #67

Open
rom1mouret opened this issue Dec 2, 2014 · 4 comments
@rom1mouret
Collaborator

We are in trouble with this kind of structure:

<sent id="sent3">begin sentence <pagenum id="pn1">42</pagenum> after</sent>

The DAISY3 specs expect the pagenum to have a SMIL counterpart. But if sent3 is synthesized, the current script creates a <par> node in the SMIL that links back to sent3. Yet <par> nodes can't have any children. In particular, they can't have the pagenum as a child.

So far it worked because the NLP scripts transform the structure into:

<sent id="sent3">
    <span id="b1">begin sentence</span>
    <pagenum id="pn1">42</pagenum>
    <span id="a1">after</span>
</sent>

This lets the Pipeline create a separate <par> for each of b1, pn1 and a1. However, depending on NLP is not a robust approach, especially when the NLP scripts are mocked for testing purposes.

When this kind of structure is detected, the Daisy3 SMIL generation algorithm should ignore the audio attached to sent3 and delete the corresponding mp3 from the audio-map (otherwise it will be copied to the output dir). Alternatively, perhaps the specs allow placing the pagenum after the sent in the SMIL file.
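A minimal sketch of the detection step described above, not the Pipeline's actual code. It assumes the audio-map can be modelled as a plain dict from element id to mp3 path (a hypothetical representation), ignores XML namespaces, and uses the hypothetical helper names `has_inline_pagenum` and `drop_unusable_audio`.

```python
import xml.etree.ElementTree as ET


def has_inline_pagenum(sent):
    """Return True if `sent` has a pagenum child mixed with bare text."""
    has_pagenum = any(child.tag == "pagenum" for child in sent)
    # Bare text is either the text before the first child or the tail of any child.
    has_bare_text = bool((sent.text or "").strip()) or any(
        (child.tail or "").strip() for child in sent
    )
    return has_pagenum and has_bare_text


def drop_unusable_audio(root, audio_map):
    """Remove audio-map entries for sentences that cannot get a single <par>.

    `audio_map` is an assumed dict {element id: mp3 path}; it is modified
    in place. Returns the mp3 paths that were dropped, so that the caller
    can avoid copying them to the output directory.
    """
    dropped = []
    for sent in root.iter("sent"):
        sent_id = sent.get("id")
        if sent_id in audio_map and has_inline_pagenum(sent):
            dropped.append(audio_map.pop(sent_id))
    return dropped


if __name__ == "__main__":
    doc = ET.fromstring(
        '<p><sent id="sent3">begin sentence '
        '<pagenum id="pn1">42</pagenum> after</sent>'
        '<sent id="sent4">plain</sent></p>'
    )
    audio = {"sent3": "part1.mp3", "sent4": "part2.mp3"}
    print(drop_unusable_audio(doc, audio), audio)
```

A sentence that NLP has already split into spans contains no bare text, so it is left alone and keeps its audio.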

@josteinaj
Member

If NLP is not robust enough, could you instead wrap text siblings in a span (or p depending on context) in a preprocessing XSLT step?

If you have to place the pagenums after the sentence then I doubt it would be a problem. We often move pagebreaks to the end of sentences when producing DTBooks. I'm not sure what our narrators do when pagebreaks occur in the middle of a sentence, but I can check.

@rom1mouret rom1mouret self-assigned this Dec 2, 2014
@rom1mouret
Collaborator Author

Yes, I can wrap text siblings in case NLP didn't do it.
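A rough sketch of such a fallback, written in Python rather than XSLT for illustration only. The function name `wrap_text_siblings` and the generated id scheme (`<sent id>_t<n>`) are assumptions; namespaces are ignored, and surrounding whitespace is stripped, which a real preprocessing step may want to preserve.

```python
import xml.etree.ElementTree as ET


def wrap_text_siblings(sent, wrapper_tag="span"):
    """Wrap bare text that sits next to child elements of `sent` in wrappers.

    Sentences without child elements are returned unchanged. The ids given
    to the new wrappers follow a hypothetical scheme: "<sent id>_t<n>".
    """
    children = list(sent)
    if not children:
        return sent  # plain-text sentence: one <par> is enough

    sent_id = sent.get("id", "sent")
    counter = 0

    def make_wrapper(text):
        nonlocal counter
        counter += 1
        wrapper = ET.Element(wrapper_tag, {"id": f"{sent_id}_t{counter}"})
        wrapper.text = text
        return wrapper

    new_children = []
    # Text before the first child is stored in sent.text.
    if sent.text and sent.text.strip():
        new_children.append(make_wrapper(sent.text.strip()))
    sent.text = None

    for child in children:
        # Text after a child is stored in that child's tail.
        tail, child.tail = child.tail, None
        sent.remove(child)
        new_children.append(child)
        if tail and tail.strip():
            new_children.append(make_wrapper(tail.strip()))

    sent.extend(new_children)
    return sent


if __name__ == "__main__":
    s = ET.fromstring(
        '<sent id="sent3">begin sentence <pagenum id="pn1">42</pagenum> after</sent>'
    )
    print(ET.tostring(wrap_text_siblings(s), encoding="unicode"))
```

Running this only when NLP has not already produced the spans keeps the step idempotent: a sentence whose text is fully wrapped has no bare text left to wrap.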

I don't mind moving the pagenums. I think we are allowed to move them in the SMIL files only, thereby leaving the DTBook unchanged, but I have to check. Moving the noterefs is more questionable.
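For illustration, here is a sketch of what moving the pagenum in the SMIL only could look like, under the unconfirmed assumption that the specs allow it. The function name `smil_pars_for_sentence`, the `par_` id prefix and the `book.xml` file name are hypothetical, audio elements are left out, and the `customTest="pagenum"` attribute reflects my understanding of the DAISY3 convention for skippable page numbers rather than anything verified in this thread.

```python
import xml.etree.ElementTree as ET


def smil_pars_for_sentence(sent, content_src="book.xml"):
    """Build one <par> for the sentence, followed by one <par> per pagenum.

    The DTBook element `sent` is only read, never modified, so the pagenum
    stays mid-sentence in the content document and moves in the SMIL only.
    """

    def make_par(target_id, custom_test=None):
        par = ET.Element("par", {"id": "par_" + target_id})
        if custom_test is not None:
            par.set("customTest", custom_test)  # assumed DAISY3 convention
        ET.SubElement(par, "text", {"src": f"{content_src}#{target_id}"})
        return par

    pars = [make_par(sent.get("id"))]
    for pagenum in sent.iter("pagenum"):
        pars.append(make_par(pagenum.get("id"), custom_test="pagenum"))
    return pars


if __name__ == "__main__":
    s = ET.fromstring(
        '<sent id="sent3">begin sentence <pagenum id="pn1">42</pagenum> after</sent>'
    )
    for par in smil_pars_for_sentence(s):
        print(ET.tostring(par, encoding="unicode"))
```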

@josteinaj
Member

For the record, I asked Roald here at NLB, who produces narrated books, what we do in these cases. Where pagebreaks and noterefs are read aloud is a judgement call that narrators make in each individual case.

The pagebreaks are almost never read aloud mid-sentence. Usually they are moved to after the sentence, but often they are also moved to before the sentence (there's no clear rule). They can also be moved to after the paragraph, if it makes sense based on the content (and I assume they make similar judgements when the pagebreaks occur within figures, tables, etc).

For noterefs they seem to usually either read the note at the location of the noteref or at the end of the sentence where the noteref occurs, depending on what the content calls for. Endnotes are read only as noterefs it seems, while the notes themselves are read together with the other endnotes at the end.

I believe there would be fewer "judgement calls" and stricter guidelines if there were widespread support for skippable content in reading systems (but I'm not sure).

Anyway, I don't know if this helps your issue at all, as we don't encounter the SMIL problem you describe (we almost exclusively do audio-only books; what full-text we have is still experimental).

@rom1mouret
Collaborator Author

Thanks. This is good to know.
In normal circumstances, NLP does the job and the TTS engines use SSML marks to find where the noterefs are located in the audio files, so I think I'll choose the least costly option that complies with the specs, even if it is not perfect with regard to end users' expectations.
