Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

apostrophe fixes #39

Merged
merged 2 commits into from
May 18, 2016
Merged

apostrophe fixes #39

merged 2 commits into from
May 18, 2016

Conversation

rhdunn
Copy link
Contributor

@rhdunn rhdunn commented May 11, 2016

This is applying various fixes to the way apostrophe-based words are handled, especially in archaic texts (e.g. Shakespeare and Milton) where 'd is used as a variation of -ed (FLOW'D, WITNESS'D).

This follows the rules for pronunciation of -ed forms, except for adjectives ("Bless'd be thy name." which is an adjective and should use /ih d/, vs "I am bless'd." which should use the unvoiced /t/ form). The other place where this does not work properly is for words like "MOV'D" (MOVED) and "OPPOS'D" (OPPOSED) where the vowel takes "long" forms (/uw/ and /ow/ respectively for MOV'D and OPPOS'D).

I have tried to support 'st for -est words like "MAK'ST" (MAKEST) by first adding 'st to the list at lang/usenglish/us_text.c (line 684), but using the schwa code to add an /ih/ vowel results in /ih s ih t/ instead of /ih s t/.

@forslund
Copy link
Collaborator

This looks good to me. I guess the first commit should only be pulled together with the amepd dictionary? (as I understand the first commit message)

All in all it looks good to me I'll download some shakespear to test it further =)

@zeehio do you have any opinion?

@rhdunn
Copy link
Contributor Author

rhdunn commented May 12, 2016

Re: the first commit -- yes, that is correct.

Re: Shakespeare -- currently, it will spell out words like "giv'st" and "mak'st" as it does not have the rule for 'st as discussed above. You will also find other words where the e has been elided with an apostrophe (e.g. "temp'rance") and other uses not covered by the postlex rules.

@zeehio
Copy link
Contributor

zeehio commented May 16, 2016

To me it makes sense to improve the handling of apostrophes to cover more cases. @rhdunn language skills are much better than mine, so I trust him.

My main concern when extending rules is to minimize possible regressions. In this case the patch seems local and targeted so I would not worry. However, it would be awesome to have a tool that uses mimic to transcribe a list of words into a list of phonetic transcriptions, so we can run it and check if any word has changed (and manually then we can confirm that the new transcription is better than before).

(Sorry for the delay replying, I am trying to catch up with everything)

@forslund
Copy link
Collaborator

@zeehio the late reply is quite alright, we're just glad to have you back =)

I'll cherrypick the first commit into the amepd PR and the other into the development branch then?

@forslund
Copy link
Collaborator

@zeehio: I can write such a script and add it to the unittest directory if you want.

@zeehio
Copy link
Contributor

zeehio commented May 17, 2016

@forslund Or we could merge both PR into development? The amepd is to me an upgrade of the cmulex dictionary, with more entries and being properly maintained. Why not merge it?

@forslund
Copy link
Collaborator

@zeehio: I'm all for merging them both! I've been hesitant on merging the amepd PR since I created it. Would be nice to close #15 after all this time =)

@zeehio zeehio closed this May 18, 2016
@zeehio zeehio reopened this May 18, 2016
@zeehio zeehio merged commit 97a9b76 into MycroftAI:development May 18, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants