Hebrew long-form spelling, gender, ordinals, fractions, maxval=1e66, construct forms, etc #490
…odern long-form (ktiv male); add hebrew gender; and hebrew ordinals; add hebrew decimal fraction and negative; fix hebrew word order for singular; fix hebrew currency names; fix global double space after minus; fix tests
sorry not sure if there is anything i need to do?
@eyaler The issue is related to the Flake8 maximum line length. You can check the test report to see which lines need adjusting. Thanks for your contribution.
@mrodriguezg1991 fixed. flake8 + tox pass ok on my side
@eyaler can you add some tests? The coverage decreased with this PR, thanks
@mrodriguezg1991 i actually added a lot of tests! could you point me to where the coverage is lacking?
@mrodriguezg1991 thanks for your guidance!
Fixes Eyal Gruss
Changes proposed in this pull request:
Hebrew:
** change spelling from biblical and short-form (ktiv haser) to modern long-form (ktiv male)
** male/female genders (default to female for cardinal and to male for ordinal)
** ordinals (with optional definite and plural forms)
** decimal fraction and negatives
** fix currency name (NIS->ILS), add currency genders, pluralize סנט, add optional singular forms for plural when allowed.
** fix word order for singular form with currency
** fix "two" to always use construct form with currency
** construct forms (סמיכות)
** max val = 1e66 (up from 10,000)
** simplify tables for hundreds and thousands (DRY)
** add tests
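To illustrate the gender defaults listed above (female for cardinals, male for ordinals), here is a minimal sketch. The tables and the function names `cardinal` and `ordinal` are hypothetical and cover only 1 to 3; they are not the data structures or signatures used in this PR.

```python
# Illustrative lookup tables only (long-form spelling, 1..3).
# These are NOT the PR's actual tables; names are hypothetical.
CARDINALS = {
    1: {"m": "אחד", "f": "אחת"},
    2: {"m": "שניים", "f": "שתיים"},
    3: {"m": "שלושה", "f": "שלוש"},
}
ORDINALS = {
    1: {"m": "ראשון", "f": "ראשונה"},
    2: {"m": "שני", "f": "שנייה"},
    3: {"m": "שלישי", "f": "שלישית"},
}


def cardinal(n, gender="f"):
    """Return the cardinal word; gender defaults to female."""
    return CARDINALS[n][gender]


def ordinal(n, gender="m"):
    """Return the ordinal word; gender defaults to male."""
    return ORDINALS[n][gender]
```

The only point of the sketch is that the two conversions carry different defaults, so callers who pass no gender get the feminine cardinal and the masculine ordinal.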
Global:
** fix potential unintended double-space after negword
** fix new lines in test CLI
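The double-space item can be pictured with a small sketch. It assumes the negative marker is configured with a trailing space and is then joined to the number words with another space; `prefix_negword` is a hypothetical helper, not the function changed in this PR.

```python
def prefix_negword(negword, words):
    """Join a negative marker and spelled-out number with exactly one space.

    Hypothetical helper: stripping the marker first means a trailing
    space in the configured negword cannot yield two spaces.
    """
    return "%s %s" % (negword.strip(), words)


# Naive joining keeps the marker's own trailing space and adds another.
naive = "%s %s" % ("minus ", "five")   # contains a double space
fixed = prefix_negword("minus ", "five")
```

Stripping at the join site keeps the behaviour the same for every language, whether or not its marker is defined with a trailing space.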
Status
How to verify this change
To verify spelling, search for the word at https://hebrew-academy.org.il/ and check the form labeled "ללא ניקוד" (= without diacritics).
Additional notes
about short vs long form: https://hebrew-academy.org.il/topic/hahlatot/missingvocalizationspelling/
note that for this commit i chose to override the short-form. a future feature could bring back the (non-biblical) short-form as a non-default secondary option, preferably with diacritics. the biblical form (e.g. שלש) could be yet another option.
about decimal fractions: https://hebrew-academy.org.il/2019/12/03/%D7%A2%D7%9C-%D7%94%D7%91%D7%A2%D7%AA-%D7%94%D7%9E%D7%A1%D7%A4%D7%A8-%D7%94%D7%9E%D7%A2%D7%95%D7%A8%D7%91/
Note that there are actually 4 allowed forms. i chose the one that seems most relevant to the convention of reading out the digits one by one after the decimal point.
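A digit-by-digit readout of this kind can be sketched as follows. This is not the PR's code: `spell_decimal`, the feminine digit table and the use of "נקודה" as the point word are assumptions made for illustration, and the integer part is limited to a single digit unless another speller is passed in.

```python
# Assumed for illustration: feminine digit words and the point word.
FEMININE_DIGITS = [
    "אפס", "אחת", "שתיים", "שלוש", "ארבע",
    "חמש", "שש", "שבע", "שמונה", "תשע",
]
POINT_WORD = "נקודה"


def spell_decimal(text, spell_int=lambda n: FEMININE_DIGITS[n]):
    """Spell a decimal string, reading fraction digits one at a time.

    `text` is a string such as "3.14" so that leading zeros after the
    point are preserved. `spell_int` spells the integer part; the
    default only handles 0..9.
    """
    whole, _, fraction = text.partition(".")
    words = [spell_int(int(whole))]
    if fraction:
        words.append(POINT_WORD)
        words.extend(FEMININE_DIGITS[int(d)] for d in fraction)
    return " ".join(words)
```

Working on the string form rather than a float is a deliberate choice in the sketch: "0.05" keeps its leading zero after the point, which a digit-by-digit readout needs.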
For large numbers (millions, etc.) there are several allowed variants; i chose the simplest one (singular non-construct).
Missing features: add comma separators in long numbers, allow hyphen (or hebrew hyphen) for numbers in construct form
If applicable, explain the rationale behind your change.
This brings the Hebrew support to a usable level for modern use.