Feature Request: Proper Tagging of Names with Possessive Apostrophes #1059

Closed
MarketingPip opened this issue Nov 4, 2023 · 6 comments

@MarketingPip
Contributor

MarketingPip commented Nov 4, 2023

Description:
I would like to propose a new feature / rule for Compromise.js that handles names with possessive apostrophes (e.g., "Steve's") by tagging only the name before the apostrophe as a person / proper noun. Currently, Compromise.js does not provide a straightforward way to handle such cases (that I know of).

Why is this feature valuable?
Many texts and documents contain names in possessive form, and it's essential to extract and process these names correctly. With this feature, Compromise.js could improve the tagging process and extract more names correctly.

And as far as I know, we humans only add the possessive "'s" at the end of a name. Correct me if I am wrong.

Example:

Consider the following text:

"George Lucas's Lucasfilm"

should become:

George Lucas

Proposed Implementation:

  • Update .people() to strip the possessive from names, or split the text at the first instance of "'s" via a rule. (The second, rule-based solution would be ideal.)
  • Identify possessive forms (e.g., "'s") on words tagged as people.
  • Keep the original text up to the split, i.e. the first possessive found for the person (e.g. "Bill's"), in the stored JSON value.
  • Split each possessive form (of a person tag) to isolate the name.
  • Provide a way for users to access the extracted names with the possessive removed (i.e. in doc.people().json()).
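
The steps above can be sketched roughly in plain JavaScript. This is only an illustration of the proposed behaviour, not Compromise.js code: `splitAtFirstPossessive` and the returned `text` / `name` / `possessive` fields are hypothetical names, and the sketch assumes whitespace-separated words and a straight or curly apostrophe.

```javascript
// Hypothetical helper, NOT part of Compromise.js.
// Cuts a matched person span at its first possessive word and
// returns both the original text up to the split and the bare name.
function splitAtFirstPossessive(span) {
  const words = span.trim().split(/\s+/);
  // A possessive word ends in 's, or in a bare apostrophe (as in "James'").
  const possessive = /^(.+?)['’]s?$/;
  const index = words.findIndex((word) => possessive.test(word));
  if (index === -1) {
    // No possessive found: leave the span untouched.
    return { text: span, name: span, possessive: false };
  }
  const kept = words.slice(0, index + 1);
  const text = kept.join(' '); // original text, possessive kept
  kept[index] = kept[index].match(possessive)[1];
  const name = kept.join(' '); // name with the possessive removed
  return { text, name, possessive: true };
}

console.log(splitAtFirstPossessive("George Lucas's Lucasfilm"));
// { text: "George Lucas's", name: 'George Lucas', possessive: true }
```

Keeping both `text` and `name` mirrors the idea of storing the original possessive form alongside the isolated name.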
@MarketingPip changed the title from "Feature Request: Splitting Names with Possessive Apostrophes" to "Feature Request: Proper Tagging of Names with Possessive Apostrophes" on Nov 4, 2023
@spencermountain
Owner

Hey Jared, I believe that already exists:

.possessives().strip() - "Spencer's" -> "Spencer"

Cheers

@MarketingPip
Contributor Author

> Hey Jared, I believe that already exists:
>
> .possessives().strip() - "Spencer's" -> "Spencer"
>
> Cheers

I think I might have phrased this issue wrongly.

When calling .people() on "George Lucas's Lucasfilm" it will return

 ["George Lucas's Lucasfilm"] 

when expected output should be

 ["George Lucas"] or  ["George Lucas's"]

There should be a rule for ALL people, so that when people are tagged, the tagger stops right after the possessive, like this:

George Lucas's film club
             ^ stop tagger here
Spencer's awesome library
        ^ stop tagger here

Again, as far as I know, a possessive "'s" only ever appears at the end of a name, so it marks a stop / split point for the tagger. That should also help the tagging process, so that the words that follow, such as "Lucasfilm", can be tagged properly.

Hoping that made more sense & hoping you're having an awesome weekend. :cheers:

ps; look AT that compression issue you closed on me - still think I am onto something!

@MarketingPip
Contributor Author

@spencermountain I was trying to think of any names that this rule might not work with - unless Elon Musk spawns more kids, I think it should work. 😂

ps: I want to add their names / weird names like those to the people lexicon. But I got some cool stuff coming up for Compromise.js - plus a much better way for you to get data to populate other versions. Nouns, verbs etc - in the same format you like them / need them. 👌

@spencermountain
Owner

ah, yeah of course.
yeah - George Lucas's Lucasfilm is definitely mis-tagged, and a .splitAfter a possessive would work great.
Oh man, I didn't know the .people() match logic was this bad. You're welcome to improve it.
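
For illustration only, here is a library-free sketch of what splitting after a possessive would do to that phrase. It does not call Compromise's own `.splitAfter`; `splitAfterPossessive` is a hypothetical name, and the apostrophe check is a simplifying assumption rather than the tagger's real possessive logic.

```javascript
// Hypothetical helper, NOT Compromise.js code.
// Breaks a phrase into segments, ending a segment after every word
// that looks possessive (ends in 's or in a bare apostrophe).
function splitAfterPossessive(text) {
  const segments = [];
  let current = [];
  for (const word of text.trim().split(/\s+/).filter(Boolean)) {
    current.push(word);
    if (/['’]s?$/.test(word)) {
      // Possessive found: close the current segment here.
      segments.push(current.join(' '));
      current = [];
    }
  }
  // Flush whatever is left after the last possessive.
  if (current.length > 0) segments.push(current.join(' '));
  return segments;
}

console.log(splitAfterPossessive("George Lucas's Lucasfilm"));
// [ "George Lucas's", 'Lucasfilm' ]
```

With the phrase split this way, a person match can end at "George Lucas's" and "Lucasfilm" can be tagged separately.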

I actually have been working on the same thing - I've added a couple hundred missing names on the dev branch, from a Wikipedia analysis, which I think is similar to what you're doing.

I hope to get dev stable for a release this week. There are a few tests failing. If you make a PR before then, please do it off the dev branch. cheers

@MarketingPip
Contributor Author

MarketingPip commented Nov 7, 2023

@spencermountain - will do! And that's exactly what I was looking for, but I'm not sure how to properly write that in tag rules. If you wanna drop an example for reference - feel free.

And jeez this looked like my half baked idea on determining names from locations.

Food for thought too - save yourself some time by querying Wikidata! And as a last resort - start making data pulls from Wikipedia (I was foolishly doing this before).

Tho I will make a gist with a preview on some basic tool I made for Compromise and see where you wanna shove it. I am hoping to package it as a separate library under MP & you can use it via import & build etc. As I was thinking it will be good for the library instead of a plugin - but that choice will be yours!

It ironically has to do with names too, i.e. diminutives: it checks if a human name is known as something else, such as Steve / Steven. So I got you covered with lots of first names etc. Plus I got a huge DB of Hispanic names etc of inmates that I was planning on making a PR for, to add to this and other versions of compromise. I have just been waiting to slowly make PRs to not piss you off. lol

So look for that notification soon enough & drop me some CONTACT info soon enough lol 👍

ps; apologies - didn't mean to close on comment lol

@spencermountain
Owner

this should be fixed in 14.10.1
cheers
