-
Notifications
You must be signed in to change notification settings - Fork 152
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
apostrophe fixes #39
apostrophe fixes #39
Conversation
…nunciation Dictionary (AmEPD) pronunciations.
This looks good to me. I guess the first commit should only be pulled together with the amepd dictionary? (as I understand the first commit message) All in all it looks good to me I'll download some shakespear to test it further =) @zeehio do you have any opinion? |
Re: the first commit -- yes, that is correct. Re: Shakespeare -- currently, it will spell out words like "giv'st" and "mak'st" as it does not have the rule for |
To me it makes sense to improve the handling of apostrophes to cover more cases. @rhdunn language skills are much better than mine, so I trust him. My main concern when extending rules is to minimize possible regressions. In this case the patch seems local and targeted so I would not worry. However, it would be awesome to have a tool that uses mimic to transcribe a list of words into a list of phonetic transcriptions, so we can run it and check if any word has changed (and manually then we can confirm that the new transcription is better than before). (Sorry for the delay replying, I am trying to catch up with everything) |
@zeehio the late reply is quite alright, we're just glad to have you back =) I'll cherrypick the first commit into the amepd PR and the other into the development branch then? |
@zeehio: I can write such a script and add it to the unittest directory if you want. |
@forslund Or we could merge both PR into development? The amepd is to me an upgrade of the cmulex dictionary, with more entries and being properly maintained. Why not merge it? |
This is applying various fixes to the way apostrophe-based words are handled, especially in archaic texts (e.g. Shakespeare and Milton) where
'd
is used as a variation of-ed
(FLOW'D
,WITNESS'D
).This follows the rules for pronunciation of
-ed
forms, except for adjectives ("Bless'd be thy name." which is an adjective and should use/ih d/
, vs "I am bless'd." which should use the unvoiced/t/
form). The other place where this does not work properly is for words like "MOV'D" (MOVED) and "OPPOS'D" (OPPOSED) where the vowel takes "long" forms (/uw/ and /ow/ respectively for MOV'D and OPPOS'D).I have tried to support
'st
for -est words like "MAK'ST" (MAKEST) by first adding'st
to the list atlang/usenglish/us_text.c
(line 684), but using the schwa code to add an/ih/
vowel results in/ih s ih t/
instead of/ih s t/
.