Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add plural forms for needed strings in NVDA's code base #15864

Merged
merged 22 commits into from
Dec 8, 2023

Conversation

CyrilleB79
Copy link
Collaborator

@CyrilleB79 CyrilleB79 commented Nov 30, 2023

Link to issue number:

Closes #12445
Follow-up of #15013.

Summary of the issue:

When possible NVDA needs to distinguish between singular and plural in UI messages. The impact may be more significant in languages such as Slavic ones, where there are more than one plural form.

Description of user facing changes

In NVDA UI messages, i.e. in GUI or when NVDA speaks or brailles messages, messages will use singular/plural form as needed.

Description of development approach

  • Look for all strings containing "%" or "{" in the .pot and check if a plural form is needed. And replace _ / pgettext by ngettext / npgettext where needed.
  • Some expression containing a number have been pluralized even if they do not differ between singular and plural form in English due to the difference being visible in translations due to a different grammar. E.g.: "{numFingers} finger {action}" in English to be translated to "{action} {numFingers} doigt" (singular) vs "{action} {numFingers} doigts" (plural) in French.
  • table-rowcount and table-columncount control fields of the virtual buffer's text info have had to be converted to integer (previously strings), so that computation can be done.
  • No change log needed except for the code's API breaking change.

Testing strategy:

  • Manual smoke test on a subset of strings.
  • Feedback from translators during translation period and communication with the translators mailing list

Known issues with pull request:

No conversion of foobar messages

Regarding the foobar appModule which uses ``utils.localisation.TimeOutputFormat`, it manages plural as follows:

  • "{D} day {H}:{M}:{S}"
  • "{D} days {H}:{M}:{S}"
    That does not match needs of languages having more than one plural form.

I have not modified utils.localisation.TimeOutputFormat not foobar appModule however since it is not straight forward at all; I am not sure that making TimeOutputFormat class inheriting from DisplayStringEnum was a good idea.
Also there would be more things to change to how time is reported in foobar:

  • "3 remaining" would be reported for 3 seconds remaining; it's not very clear.
  • "1:0:4 remaining" is not parsed correctly by many synthesizers (OneCore, eSpeak, IBMTTS) due to single digit used where two would be needed by the synthesizer to recognize a time format
  • ":44" reports "3 44 remaining" in English, which is not very clear and it even reports "3 hours 44 remaining" in French instead of "3 minutes 44 remaining" with eSpeak or OneCore.

Since utils.localisation.TimeOutputFormat is only used in foobar, since tracks lasting more than 24 hours are very rare and since more work needs to be done in utils.localisation.TimeOutputFormat anyway, I have not included changes in utils.localisation.TimeOutputFormat or foobar appModule in this PR.

Declension matching issue

In some languages, there may still be issues in translation for messages such as "list with %s item" due to declension. Indeed, when encountering a number written with digits, the synth does not know which case should be used to pronounce it and usually uses nominative case. This cause something strange because the number (nominative case) and the noun (instrumental case) do not match whereas they should match, i.e. the number should be declensed to instrumental case too. However synth cannot handle the complexity of this correctly.

It has been discussed in this thread of the translators mailing list. And it has been agreed that such situations should be handled by translators through work-around in their translations.
E.g. maybe translate to "list, 5 items" instead of "list with 5 items".

Open questions

Here are various expressions that I have not converted yet. The relevance of converting them is to be discussed with reviewers as well as with translators on the mailing list. If a language needs one of them to be converted, please inform me and I will do it.

pt

Even if in English the abbreviation is invariable (e.g. "1 pt", "3 pt" without "s"), some language may have translated it as a full word, i.e. translation of "point" instead of "pt".

  • "%s pt"
  • "at least %.1f pt"
  • "Cell width: {0.x:.1f} pt"
  • "Cell height: {0.y:.1f} pt"
  • "exactly {space:.1f} pt"
  • "{offset:.3g} pt"

Abbreviations of measurement units

  • "{val:.2f} in"
  • "{val:.2f} cm"
  • "%s px"
  • "%s em"
  • "%s ex"
  • "%s rem"

Are there languages where a plural form is used for these measurement units? E.g. "8 cms" vs "1 cm"
For the last one which are HTML character size specification ("ex", "em", "rem"), I doubt there is any difference at all between English and translation.

Percent (full word)

Message where percent is written as a word ("percent" or "per cent") and not as a symbol ("%")

  • "%d percent"
  • "Object edges positioned {left:.1f} per cent from left edge of screen, "
  • " fraction {fractionValue:.2f} Percent slice {pointIndex} of {pointCount}"

Are there languages where "percent" / "per cent" is used? If yes, can the "%" symbol be used instead to avoid to deal with a plural form?

Percent symbol ("%")

Are there languages where percent symbol is translated to something that have singular and plural forms?

  • "%s highlighted" (from Kindle appModule)

"3 highlighted" is not a very consistent sentence from a grammar point of view. Some languages may have translated to something more consistent, with a possible plural form.
From the languages that I have seen, only French however uses a translation of something like "highlighted %s times"; but in French the translation would not change depending on singular or plural ("times" and "time" both translate to "fois").

  • Are there languages which need a specific plural form for this message?
  • Or should I convert the source English to something more grammatically correct, e.g. "highlighted 3 times"; the drawback would be a longer form, and the presence of "2 times" where "twice" may be preferred.

Code Review Checklist:

  • Documentation:
    • Change log entry
    • User Documentation
    • Developer / Technical Documentation
    • Context sensitive help for GUI changes
  • Testing:
    • Unit tests
    • System (end to end) tests
    • Manual testing
  • UX of all users considered:
    • Speech
    • Braille
    • Low Vision
    • Different web browsers
    • Localization in other languages / culture than English
  • API is compatible with existing add-ons.
  • Security precautions taken.

source/globalCommands.py Outdated Show resolved Hide resolved
@AppVeyorBot
Copy link

See test results for failed build of commit 13dce07ef8

source/globalCommands.py Outdated Show resolved Hide resolved
@AppVeyorBot
Copy link

See test results for failed build of commit 67faf09ab0

@CyrilleB79 CyrilleB79 marked this pull request as ready for review November 30, 2023 14:49
@CyrilleB79 CyrilleB79 requested a review from a team as a code owner November 30, 2023 14:49
@CyrilleB79 CyrilleB79 requested a review from seanbudd November 30, 2023 14:49
@Adriani90
Copy link
Collaborator

Adriani90 commented Dec 1, 2023

In the romanian language we are using plural for cm, and also for percent, as sign and as a full written word but this is being handled in the translation itself so I think there is no rule needed from NVDA side as far as I understand. However I think this change is wellcome though maybe for other languages.

In my view the biggest problems in this regard are (I) plural for acronyms such as GUIs, PPAs NGOs etc. where all letters are upper case apart from the s at the end which is small letter see proposal in #11472) and (II) the plurals where changes in the middle or at the beginning of the word are needed to indicate it such as some latin languages, I guess also in slavic languages. cc: @zstanecic, @hkatic, @lukaszgo1 for slavic languages and @ultrasound1372, @amirsol81 and @Mohamed00 for acronyms.

@cary-rowen, @josephsl, @larry801, @minakononogaki, @nishimotz, @ManshulBelani, @sumandogra, @dinakar-td, @khsbory, @ungjinPark, @dnz3d4c maybe you guys can give some feedback on how it works in asian languages such as chinese Japanese, indian and corean,

and @mohammad-suliman, @Mohamed00 maybe you have some feedback on arabic languages perspective.

Maybe @nvdaes, @abdel792 or @fernando-jose-silva have some feedback on spanish and portuguese language, although I think this is very similar to romanian gramar.

@CyrilleB79
Copy link
Collaborator Author

@Adriani90, to clarify, this PR has nothing to do with how speech pronounce things. It only deals with better spelling of what is spoken or written.

In the romanian language we are using plural for cm, and also for percent, as sign and as a full written word but this is being handled in the translation itself so I think there is no rule needed from NVDA side as far as I understand.

I do not understand: do you use plural forms for the abbreviation "cm" and even for the symbole "%". How do you write it?

Also, you say that plural is handled by the translation. How?
With today's translation, in Romanian, NVDA reports:

  • "1 cm" and "7 cm". Is it a correct spelling? If no, what would be the correct spelling for these two expressions?
  • "1 la sută" and "7 la sută" for progress bars to 1% and 7%. Is it correct spelling? If not which one would it be?
  • "Marginile obiectului poziționate la 1 procente față de marginea stângă a ecranului" and "Marginile obiectului poziționate la 7 procente față de marginea stângă a ecranului" for object positionning. Is it correct spelling?

In my view the biggest problems in this regard are (I) plural for acronyms such as GUIs, PPAs NGOs etc. where all letters are upper case apart from the s at the end which is small letter see proposal in #11472)

As written before, this seems off-topic of this PR since #11472 deals with how words are spoken, not with good spelling. Could you clarify the link?

(II) the plurals where changes in the middle or at the beginning of the word are needed to indicate it such as some latin languages,

Could you provide some examples, if possible well known latin languages such as Spanish or French, else in Romanian (that I do not know at all unfortunately).

@josephsl
Copy link
Collaborator

josephsl commented Dec 1, 2023

Hi,

The overall topic of this PR deals with how text is presented and localized in NVDA's user interface and its messages, which is separate from speech output which is controlled by users and TTS engines. If words like "GUI's" and "NGO's" are included in NVDA's user interface, then we can consider them, but I can't find them via Grep.

As for Korean, singular and plural forms are announced the same.

By the way, in case it isn't covered, I advise looking at employing gettext.npgettext to cover plural forms with context as Python 3.8 and later includes gettext.pgettext family of functions.

Thanks.

@CyrilleB79
Copy link
Collaborator Author

By the way, in case it isn't covered, I advise looking at employing gettext.npgettext to cover plural forms with context as Python 3.8 and later includes gettext.pgettext family of functions.

Already covered. I.e. where pgettext was used and I needed to add a plural form, I have replaced by a call to npgettext.

@Adriani90
Copy link
Collaborator

I do not understand: do you use plural forms for the abbreviation "cm" and even for the symbole "%".

Not for the symbols when writing. After reading more exactly your PR description I understood that this is not targeting pronounciation but the spelling in NVDA core.

Also, you say that plural is handled by the translation. How?

For example for % often we use "x of hundreed" to avoid the difference between singular and plural when pronouncing. So 3 of hundreed or 1 of hundreed means 3% or 1% but 3% is spoken "3 procente" and 1% is spoken "1 procent". The most percent or % or cm are spelled correctly as of now in NVDA. However, I didn't test all situations where % is used to see if it makes sense to let % or "la suta". I have to test the kindle use case where NVDA reports the current location by percentage.

"Marginile obiectului poziționate la 1 procente față de marginea stângă a ecranului"

For example this is wrong because procente is plural and it is spoken like this even though it is only 1 percent so it should be singular. But this is something we can handle in the translation as I said above. For this specific example I guess it has not been done properly yet.

@CyrilleB79
Copy link
Collaborator Author

Also, you say that plural is handled by the translation. How?

For example for % often we use "x of hundreed" to avoid the difference between singular and plural when pronouncing. So 3 of hundreed or 1 of hundreed means 3% or 1% but 3% is spoken "3 procente" and 1% is spoken "1 procent". The most percent or % or cm are spelled correctly as of now in NVDA. However, I didn't test all situations where % is used to see if it makes sense to let % or "la suta". I have to test the kindle use case where NVDA reports the current location by percentage.

"Marginile obiectului poziționate la 1 procente față de marginea stângă a ecranului"

For example this is wrong because procente is plural and it is spoken like this even though it is only 1 percent so it should be singular. But this is something we can handle in the translation as I said above. For this specific example I guess it has not been done properly yet.

OK. So if I understand correctly, you may use "la suta" (i.e. "of hundred") for any translation of "percent", "per cent" and even "%" if you want. Testing with eSpeak, I realize that in Romanian "1%" seems to be pronounced correctly, but all other ones ("2%", "3%") are not correct.
Anyway, in all these cases, if Romanian translators feel that "1 procent" / "3 procente" sounds better, it can be handled correctly in this PR. Same for other languages if impacted. But beware that replacing "%" by a full word may be less interesting if the message is also brailled.

@CyrilleB79 CyrilleB79 marked this pull request as draft December 2, 2023 00:01
@codeofdusk
Copy link
Contributor

A few notes:

  • For measurement units, "in" might be pluralized (depending on language) but "cm" is universally singular as part of SI.
  • From your PR description, in English, you'd say "declined", not "declensed" (which isn't a word).

@Adriani90
Copy link
Collaborator

Testing with eSpeak, I realize that in Romanian "1%" seems to be pronounced correctly

even this is not completely correct but I think this is synthesizer related. eSpeak says "unu procent" which is actually wrong, it should pronounce "un procent". "Un" means the number has no preposition after it and is not related to another subject. "Unu" means there is a reference to another subject so there is a preposition between number and the subject it refers to. I.e. Unu la suta means one of hundreed so one refers to hundreed. Un procent means 1% but there is no direct reference to another number.

in all these cases, if Romanian translators feel that "1 procent" / "3 procente" sounds better, it can be handled correctly in this PR. Same for other languages if impacted.

Actually for Romanian "la suta" is the more common used expression in day to day language anyway, procent or procente is used in a more professional context. In fact procente would even consume more braille cells than "la suta".

@CyrilleB79 CyrilleB79 marked this pull request as ready for review December 4, 2023 22:13
@CyrilleB79
Copy link
Collaborator Author

Thanks @Adriani90 for this clarification.

OK. So no more change needed for now, either for Romanian or for other languages.

If translators ask more changes later, it can always be done in a subsequent PR.

ource Outdated Show resolved Hide resolved
source/NVDAObjects/window/_msOfficeChart.py Outdated Show resolved Hide resolved
source/NVDAObjects/window/_msOfficeChart.py Outdated Show resolved Hide resolved
source/appModules/powerpnt.py Show resolved Hide resolved
source/appModules/powerpnt.py Outdated Show resolved Hide resolved
source/appModules/powerpnt.py Outdated Show resolved Hide resolved
source/appModules/powerpnt.py Outdated Show resolved Hide resolved
source/appModules/powerpnt.py Outdated Show resolved Hide resolved
source/appModules/powerpnt.py Outdated Show resolved Hide resolved
Comment on lines 2805 to 2811
columnCountText = ngettext(
# Translators: Sub-part of the compound string to report a table
"{columnCount} column",
"{columnCount} columns",
columnCount,
).format(columnCount=columnCount)
rowCountText = ngettext(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps all these repeated components should be factored out into helper functions that just take the count?

e.g. _columnCountStr(count), _rowCountStr(count)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I do not understand which part of the code I can put in a common helper function. Could you clarify please?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The following component is repeated multiple times in this code, same with the row equivalent:

ngettext(
	"{columnCount} column",
	"{columnCount} columns",
	columnCount,
).format(columnCount=columnCount)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@seanbudd, I do not see the point in making a helper function to just call ngettext and `format``; IMO it makes code less easy to read.
Instead, I have made a helper function to factor out the whole computation of table size announcement. See commit 70d32e4.

If you do not agree with this approach, I can revert this change.

Copy link
Member

@seanbudd seanbudd Dec 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should be reverted.

The reason this was suggested is there is 2 instances of each of these components.
This is like when we assign translatable strings that are used repeatedly to a variable, so that there is less duplication and the strings can be updated across usages easily.

The current approach doesn't remove the duplication - the components are still here, only the other usage of the components were refactored.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK I got it. Done in 6db8c7d.

Note that I have kept the function _rowAndColumnCountText that I had extracted from getPropertiesSpeech since it allows to reduce a bit its complexity. getPropertiesSpeech remains much too complex though.

CyrilleB79 and others added 3 commits December 5, 2023 09:07
Accept first part of suggestions in review

Co-authored-by: Sean Budd <[email protected]>
source/appModules/powerpnt.py Outdated Show resolved Hide resolved
source/appModules/powerpnt.py Show resolved Hide resolved
@seanbudd seanbudd self-requested a review December 6, 2023 02:07
Copy link
Member

@seanbudd seanbudd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @CyrilleB79

@seanbudd seanbudd merged commit 0179a84 into nvaccess:master Dec 8, 2023
@nvaccessAuto nvaccessAuto added this to the 2024.1 milestone Dec 8, 2023
seanbudd added a commit that referenced this pull request Dec 8, 2023
@CyrilleB79 CyrilleB79 deleted the plural branch December 9, 2023 15:22
CyrilleB79 added a commit to CyrilleB79/nvda that referenced this pull request Jan 7, 2024
seanbudd pushed a commit that referenced this pull request Jan 7, 2024
Fix-up of #15864.

Reported in this thread of the translators mailing list.

Summary of the issue:
When pluralizing a lot of string in #15864, the one dedicated to battery time reporting was forgotten.

Description of user facing changes
Battery time estimation will be reported with correct plural forms.

Description of development approach
Split up the string in subparts and use ngettext for hours and for minutes.
Adriani90 pushed a commit to Adriani90/nvda that referenced this pull request Mar 13, 2024
…access#16011)

Fix-up of nvaccess#15864.

Reported in this thread of the translators mailing list.

Summary of the issue:
When pluralizing a lot of string in nvaccess#15864, the one dedicated to battery time reporting was forgotten.

Description of user facing changes
Battery time estimation will be reported with correct plural forms.

Description of development approach
Split up the string in subparts and use ngettext for hours and for minutes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Plural forms should be reported with correct grammar
9 participants