Languages issue #39

Closed
NPavie opened this issue Feb 5, 2024 · 3 comments

Comments

@NPavie
Contributor

NPavie commented Feb 5, 2024

Issue extracted from Pipeline_App_Issues_Tracker.xlsx

Dipendra: Word to DTBook XML to DAISY 3 with TTS audio is our preferred workflow. Most of the time, wrong language tags are added to the DTBook XML, so almost all books are not created in the right language and voice. Users have to manually open the DTBook XML and correct the language tags before any book can be recorded. Let the Pipeline app override the language tag of the DTBook XML until the Save As DAISY issue on this is resolved.

Lukerya: In many sentences, Armenian text was ignored and not recorded. Only numbers were recorded, in an English voice. In other places, the Armenian voice was used.
This is happening because of wrong language tags in the DTBook XML, due to an issue in Save As DAISY. A separate issue is being created for this. If the language tags are correct, the recording is fine.

@prashantverma2014 if you can gather additional information about the add-in version used by the testers, along with example documents and expected results, I will look at them as soon as possible.

@NPavie
Contributor Author

NPavie commented Feb 12, 2024

I first thought this was related to the latest changes I made for bidirectional element handling, but after reading the issue description more carefully, I realize it may be a long-standing latent issue linked to how the plugin retrieves the document language.

If I understand correctly, even when authors set a different document proofing language in the Word UI, the wrong language code is reported.

By default, the plugin looks at the default styles embedded in the document, but that might not be how Word stores this "proofing language" information.

I'll do some research on that.

@NPavie
Contributor Author

NPavie commented Feb 19, 2024

A short summary of the research on where the "document language" is stored within Word:

Currently, in the plugin conversion process, the main language is extracted from the language associated with the default style in the styles.xml data file within a Word document.
One problem noted in some example files is that multiple languages are defined on the corresponding element through 3 different attributes: one in val, which seems to be the default, a "bidirectional" alternative in bidi, and an East Asian alternative in eastAsia.
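
To make the three attributes concrete, here is a minimal Python sketch that reads them from a styles.xml payload. It is an illustration only, not the plugin's code: it assumes the "default style" language is the `w:lang` element under `w:docDefaults/w:rPrDefault/w:rPr`, and the function name `default_style_languages` and the sample values are hypothetical.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by styles.xml and document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def default_style_languages(styles_xml: str) -> dict:
    """Return the val / eastAsia / bidi languages of the document defaults.

    Assumption: the default language lives in
    w:docDefaults/w:rPrDefault/w:rPr/w:lang (hypothetical helper, not plugin code).
    """
    root = ET.fromstring(styles_xml)
    lang = root.find("w:docDefaults/w:rPrDefault/w:rPr/w:lang", {"w": W_NS})
    if lang is None:
        return {}
    result = {}
    for name in ("val", "eastAsia", "bidi"):
        # Attribute keys are namespace-qualified: "{namespace}name".
        value = lang.get(f"{{{W_NS}}}{name}")
        if value is not None:
            result[name] = value
    return result


# Made-up sample showing three different languages on one element.
SAMPLE = """<w:styles xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:docDefaults><w:rPrDefault><w:rPr>
    <w:lang w:val="en-US" w:eastAsia="ja-JP" w:bidi="ar-SA"/>
  </w:rPr></w:rPrDefault></w:docDefaults>
</w:styles>"""

print(default_style_languages(SAMPLE))
# → {'val': 'en-US', 'eastAsia': 'ja-JP', 'bidi': 'ar-SA'}
```

For a real file, the payload could be obtained with `zipfile.ZipFile(path).read("word/styles.xml").decode("utf-8")`, since a .docx file is a ZIP archive.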

When changing the document style with Word's proofing language selector or with the language button provided in the plugin ribbon, no changes were observed in this field.

The following actions are planned next:

  • Check the plugin ribbon language button action to see what is changed by it

Based on the results, I think the following options are available:

  • If the ribbon button was supposed to change that default style, check if it can be fixed
  • Set a language selector in the conversion UI with detected languages in the document
  • Compute the main language based on how much text is associated with each language
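
The last option could be sketched as follows in Python. This is an illustration under assumptions, not the implementation used in the plugin: it assumes each run declares its language in `w:rPr/w:lang` with a `w:val` attribute, weights languages by character count, and uses a hypothetical `rank_languages` helper with `"und"` (the BCP 47 "undetermined" tag) for untagged runs.

```python
import xml.etree.ElementTree as ET
from collections import Counter

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}


def rank_languages(document_xml: str, fallback: str = "und") -> list:
    """Rank languages by the number of characters tagged with each one.

    Hypothetical helper: runs without a w:lang element are counted under
    `fallback` rather than resolved through the style hierarchy.
    """
    counts = Counter()
    root = ET.fromstring(document_xml)
    for run in root.iter(f"{{{W_NS}}}r"):
        lang_el = run.find("w:rPr/w:lang", NS)
        lang = lang_el.get(f"{{{W_NS}}}val") if lang_el is not None else None
        # A run may hold several w:t text nodes; join them before counting.
        text = "".join(t.text or "" for t in run.findall("w:t", NS))
        if text:
            counts[lang or fallback] += len(text)
    # most_common() sorts by descending count.
    return counts.most_common()


# Made-up sample: a short English run, a longer Armenian run, an untagged run.
SAMPLE = """<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:body><w:p>
    <w:r><w:rPr><w:lang w:val="en-US"/></w:rPr><w:t>Hello</w:t></w:r>
    <w:r><w:rPr><w:lang w:val="hy-AM"/></w:rPr><w:t>Բարև ձեզ, ինչպես եք</w:t></w:r>
    <w:r><w:t>123</w:t></w:r>
  </w:p></w:body>
</w:document>"""

ranking = rank_languages(SAMPLE)
print(ranking[0][0])  # the language covering the most text
```

Counting characters rather than runs keeps a document with many short foreign-language fragments from outvoting its main body text; counting runs instead would only require replacing `len(text)` with `1`.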

A small note: for each run with a dedicated language that is not bidirectional, a span carrying the correct language should be defined in the result.
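
As a rough illustration of that note, the Python sketch below wraps the text of a run in a span with an `xml:lang` attribute when the run language differs from the main language. The function name `run_to_span` is hypothetical, the comparison rule is an assumption, and bidirectional runs are deliberately left out because they are handled separately.

```python
from typing import Optional
from xml.sax.saxutils import escape, quoteattr


def run_to_span(text: str, run_lang: Optional[str], main_lang: str) -> str:
    """Serialize one non-bidirectional run for the output document.

    Hypothetical helper: a span is emitted only when the run declares a
    language that differs from the document main language.
    """
    if run_lang and run_lang != main_lang:
        # quoteattr() adds the surrounding quotes and escapes the value.
        return f"<span xml:lang={quoteattr(run_lang)}>{escape(text)}</span>"
    return escape(text)


print(run_to_span("Bonjour", "fr-FR", "en-US"))
# → <span xml:lang="fr-FR">Bonjour</span>
print(run_to_span("Tom & Jerry", "en-US", "en-US"))
# → Tom &amp; Jerry
```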

NPavie added a commit to NPavie/word-save-as-daisy that referenced this issue Feb 21, 2024
- Computing a list of languages in the document, ordered by a count of the runs using each language
- Modified the conversion parameters form to propose a list of language codes
- Modified the XSLs to use the new language parameter as the document main language if provided
NPavie added a commit that referenced this issue Feb 28, 2024
- For automatic pagination, inline pagenum elements in paragraph (fix #40)
- A new language selector is provided in conversion parameters (fix #39)
  - Document languages are computed by checking the text languages instead of the default style, which was not updated by the Word language dialogs in tests.
- Fixed a language test for bdo element
@NPavie
Contributor Author

NPavie commented Apr 2, 2024

Fixed: a language selector is now provided, with languages computed in order of appearance in the document (if tagged correctly in the document).

@NPavie NPavie closed this as completed Apr 2, 2024