Add plural forms for needed strings in NVDA's code base #15864
Co-authored-by: Luke Davis <[email protected]>
In Romanian we use plural forms for cm and also for percent, both as a sign and as a fully written word, but this is handled in the translation itself, so as far as I understand no rule is needed on NVDA's side. However, I think this change is welcome, maybe for other languages. In my view the biggest problems in this regard are (I) plurals of acronyms such as GUIs, PPAs, NGOs etc., where all letters are upper case apart from the lower-case s at the end (see the proposal in #11472), and (II) plurals that require changes in the middle or at the beginning of the word, as in some Latin languages and, I guess, also in Slavic languages. cc: @zstanecic, @hkatic, @lukaszgo1 for Slavic languages and @ultrasound1372, @amirsol81 and @Mohamed00 for acronyms. @cary-rowen, @josephsl, @larry801, @minakononogaki, @nishimotz, @ManshulBelani, @sumandogra, @dinakar-td, @khsbory, @ungjinPark, @dnz3d4c maybe you can give some feedback on how it works in Asian languages such as Chinese, Japanese, Indian languages and Korean, and @mohammad-suliman, @Mohamed00 maybe you have some feedback from the Arabic perspective. Maybe @nvdaes, @abdel792 or @fernando-jose-silva have some feedback on Spanish and Portuguese, although I think they are very similar to Romanian grammar.
@Adriani90, to clarify, this PR has nothing to do with how speech pronounces things. It only deals with better spelling of what is spoken or written.
I do not understand: do you use plural forms for the abbreviation "cm" and even for the symbol "%"? How do you write them? Also, you say that plural is handled by the translation. How?
As written before, this seems off-topic for this PR since #11472 deals with how words are spoken, not with correct spelling. Could you clarify the link?
Could you provide some examples, if possible in well-known Latin languages such as Spanish or French, or else in Romanian (which I unfortunately do not know at all)?
Hi, the overall topic of this PR deals with how text is presented and localized in NVDA's user interface and its messages, which is separate from speech output, which is controlled by users and TTS engines. If words like "GUI's" and "NGO's" are included in NVDA's user interface, then we can consider them, but I can't find them via grep. As for Korean, singular and plural forms are announced the same. By the way, in case it isn't covered, I advise looking at employing gettext.npgettext to cover plural forms with context, as Python 3.8 and later includes the gettext.pgettext family of functions. Thanks.
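As an illustration of the functions mentioned above, here is a minimal sketch of `ngettext` and `npgettext` from Python's standard `gettext` module. It uses `gettext.NullTranslations`, which has no catalog and therefore falls back to the English-style rule (singular only when the count is 1). The message strings, the context name and the function names are assumptions made up for this example, not strings or helpers from NVDA.

```python
import gettext

# No catalog is loaded: NullTranslations returns the singular msgid when n == 1
# and the plural msgid otherwise.
translations = gettext.NullTranslations()


def itemCountText(count: int) -> str:
    # Hypothetical message, used only for illustration.
    return translations.ngettext(
        "{count} item",
        "{count} items",
        count,
    ).format(count=count)


def matchCountText(count: int) -> str:
    # "searchExample" is a hypothetical context name.
    # npgettext is available from Python 3.8 onwards.
    return translations.npgettext(
        "searchExample",
        "{count} match",
        "{count} matches",
        count,
    ).format(count=count)
```

With a real catalog loaded, the same calls would select among as many plural forms as the target language declares.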
Already covered. I.e. where |
Not for the symbols when writing. After reading your PR description more carefully, I understood that this is not targeting pronunciation but spelling in NVDA core.
For example, for % we often use "x of hundred" to avoid the difference between singular and plural when pronouncing. So "3 of hundred" or "1 of hundred" means 3% or 1%, but 3% is spoken "3 procente" and 1% is spoken "1 procent". Most occurrences of percent, % or cm are spelled correctly in NVDA as of now. However, I didn't test all situations where % is used to see whether it makes sense to keep % or use "la suta". I have to test the Kindle use case where NVDA reports the current location as a percentage.
For example this is wrong because "procente" is plural and it is spoken like this even though it is only 1 percent, so it should be singular. But this is something we can handle in the translation, as I said above. For this specific example I guess it has not been done properly yet.
OK. So if I understand correctly, you may use "la suta" (i.e. "of hundred") for any translation of "percent", "per cent" and even "%" if you want. Testing with eSpeak, I realize that in Romanian "1%" seems to be pronounced correctly, but all the others ("2%", "3%") are not correct.
A few notes:
Even this is not completely correct, but I think this is synthesizer related. eSpeak says "unu procent", which is actually wrong; it should pronounce "un procent". "Un" means the number has no preposition after it and is not related to another subject. "Unu" means there is a reference to another subject, so there is a preposition between the number and the subject it refers to. I.e. "unu la suta" means one of hundred, so one refers to hundred. "Un procent" means 1%, but there is no direct reference to another number.
Actually, for Romanian "la suta" is the more commonly used expression in day-to-day language anyway; "procent" or "procente" is used in a more professional context. In fact "procente" would even consume more braille cells than "la suta".
Thanks @Adriani90 for this clarification. OK. So no more changes are needed for now, either for Romanian or for other languages. If translators ask for more changes later, it can always be done in a subsequent PR.
source/speech/speech.py
columnCountText = ngettext(
	# Translators: Sub-part of the compound string to report a table
	"{columnCount} column",
	"{columnCount} columns",
	columnCount,
).format(columnCount=columnCount)
rowCountText = ngettext(
Perhaps all these repeated components should be factored out into helper functions that just take the count?
e.g. _columnCountStr(count), _rowCountStr(count)
Sorry, I do not understand which part of the code I can put in a common helper function. Could you clarify please?
The following component is repeated multiple times in this code, same with the row equivalent:
ngettext(
"{columnCount} column",
"{columnCount} columns",
columnCount,
).format(columnCount=columnCount)
@seanbudd, I do not see the point in making a helper function to just call `ngettext` and `format`; IMO it makes code less easy to read.
Instead, I have made a helper function to factor out the whole computation of table size announcement. See commit 70d32e4.
If you do not agree with this approach, I can revert this change.
I think it should be reverted.
The reason this was suggested is that there are 2 instances of each of these components.
This is like when we assign translatable strings that are used repeatedly to a variable, so that there is less duplication and the strings can be updated across usages easily.
The current approach doesn't remove the duplication: the components are still here, and only the code using the components was refactored.
OK I got it. Done in 6db8c7d.
Note that I have kept the function `_rowAndColumnCountText` that I had extracted from `getPropertiesSpeech`, since it allows reducing its complexity a bit. `getPropertiesSpeech` remains much too complex though.
Accept first part of suggestions in review Co-authored-by: Sean Budd <[email protected]>
Thanks @CyrilleB79
Fix-up of #15864. Reported in this thread of the translators mailing list.
Summary of the issue: When pluralizing a lot of strings in #15864, the one dedicated to battery time reporting was forgotten.
Description of user facing changes: Battery time estimation will be reported with correct plural forms.
Description of development approach: Split up the string into sub-parts and use ngettext for hours and for minutes.
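The approach described in this fix-up can be sketched as follows. This is only an illustration under stated assumptions: the function name `batteryTimeText`, the message strings and the use of `gettext.NullTranslations` (no catalog, English fallback) are hypothetical and do not reproduce NVDA's actual code.

```python
import gettext

# Stand-in for the application's translation functions.
translations = gettext.NullTranslations()
ngettext = translations.ngettext
_ = translations.gettext


def batteryTimeText(secondsRemaining: int) -> str:
    # Split the remaining time into whole hours and whole minutes.
    hours, remainder = divmod(secondsRemaining, 3600)
    minutes = remainder // 60
    # Each sub-part gets its own plural selection.
    hourText = ngettext("{hours:d} hour", "{hours:d} hours", hours).format(hours=hours)
    minuteText = ngettext(
        "{minutes:d} minute",
        "{minutes:d} minutes",
        minutes,
    ).format(minutes=minutes)
    # The compound message only assembles the already pluralized sub-parts.
    return _("{hourText} and {minuteText} remaining").format(
        hourText=hourText,
        minuteText=minuteText,
    )
```

Splitting the message is needed because a single `ngettext` call can only pluralize on one number, whereas this message contains two.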
Link to issue number:
Closes #12445
Follow-up of #15013.
Summary of the issue:
When possible, NVDA needs to distinguish between singular and plural in UI messages. The impact may be more significant in languages such as Slavic ones, where there is more than one plural form.
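To make "more than one plural form" concrete, here is a small sketch of the three-form rule commonly declared in the `Plural-Forms` header of Russian gettext catalogs, written out in Python. The function names and the sample noun are chosen for this example only; in a real application gettext evaluates the rule from the catalog itself, so no such function has to be written by hand.

```python
def russianPluralIndex(n: int) -> int:
    """Return the index (0, 1 or 2) of the plural form to use for n."""
    if n % 10 == 1 and n % 100 != 11:
        return 0  # 1, 21, 31, ... but not 11
    if 2 <= n % 10 <= 4 and not (10 <= n % 100 < 20):
        return 1  # 2-4, 22-24, ... but not 12-14
    return 2  # everything else: 0, 5-20, 25-30, ...


# The three forms of the Russian word for "row".
ROW_FORMS = ("строка", "строки", "строк")


def rowCountTextRu(n: int) -> str:
    return f"{n} {ROW_FORMS[russianPluralIndex(n)]}"
```

A message built with a simple singular/plural switch cannot express the difference between the second and third forms, which is why `ngettext` with a per-language rule is needed.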
Description of user facing changes
In NVDA UI messages, i.e. in GUI or when NVDA speaks or brailles messages, messages will use singular/plural form as needed.
Description of development approach
Replaced `_`/`pgettext` by `ngettext`/`npgettext` where needed.
`table-rowcount` and `table-columncount` control fields of the virtual buffer's text info have had to be converted to integer (previously strings), so that computation can be done.
Testing strategy:
Known issues with pull request:
No conversion of foobar messages
Regarding the foobar appModule which uses `utils.localisation.TimeOutputFormat`, it manages plural as follows:
That does not match the needs of languages having more than one plural form.
I have not modified `utils.localisation.TimeOutputFormat` nor the foobar appModule, however, since it is not straightforward at all; I am not sure that making the `TimeOutputFormat` class inherit from `DisplayStringEnum` was a good idea.
Also there would be more things to change to how time is reported in foobar:
Since `utils.localisation.TimeOutputFormat` is only used in foobar, since tracks lasting more than 24 hours are very rare and since more work needs to be done in `utils.localisation.TimeOutputFormat` anyway, I have not included changes in `utils.localisation.TimeOutputFormat` or the foobar appModule in this PR.
Declension matching issue
In some languages, there may still be issues in translation for messages such as "list with %s item" due to declension. Indeed, when encountering a number written with digits, the synth does not know which case should be used to pronounce it and usually uses the nominative case. This causes a mismatch: the number (nominative case) and the noun (instrumental case) do not agree, whereas the number should be declined to the instrumental case too. However, the synth cannot handle this complexity correctly.
It has been discussed in this thread of the translators mailing list. And it has been agreed that such situations should be handled by translators through work-around in their translations.
E.g. maybe translate to "list, 5 items" instead of "list with 5 items".
Open questions
Here are various expressions that I have not converted yet. The relevance of converting them is to be discussed with reviewers as well as with translators on the mailing list. If a language needs one of them to be converted, please inform me and I will do it.
pt
Even if in English the abbreviation is invariable (e.g. "1 pt", "3 pt" without "s"), some languages may have translated it as a full word, i.e. a translation of "point" instead of "pt".
Abbreviations of measurement units
Are there languages where a plural form is used for these measurement units? E.g. "8 cms" vs "1 cm"
For the last ones, which are HTML character size specifications ("ex", "em", "rem"), I doubt there is any difference at all between English and translation.
Percent (full word)
Message where percent is written as a word ("percent" or "per cent") and not as a symbol ("%")
Are there languages where "percent" / "per cent" is used? If yes, can the "%" symbol be used instead to avoid dealing with a plural form?
Percent symbol ("%")
Are there languages where the percent symbol is translated to something that has singular and plural forms?
"3 highlighted" is not a very consistent sentence from a grammar point of view. Some languages may have translated to something more consistent, with a possible plural form.
From the languages that I have seen, only French however uses a translation of something like "highlighted %s times"; but in French the translation would not change depending on singular or plural ("times" and "time" both translate to "fois").
Code Review Checklist: