How to recognize something immediately after a #Value, when the #Value is not followed by a space #801

Closed
numberslk opened this issue Dec 30, 2020 · 3 comments · Fixed by #808


numberslk commented Dec 30, 2020

Is there a way to match something immediately after a #Value, when the #Value is not followed by a space?

let nlp = require('compromise');

let doc1 = nlp('An 80 year old from Panadura has died')
let doc2 = nlp('A 79-years-old woman from Colombo 13 has died in her r')
let doc3 = nlp('A 79 years old woman from Colombo 13 has died in her r')

console.log(doc1.match('[(years|-years-|year)]').lookBehind('#Value$').text()) //80
console.log(doc2.match('[(years|-years-|year)]').lookBehind('#Value$').text()) //79
console.log(doc3.match('[(years|-years-|year)]').lookBehind('#Value$').text()) // ?

Is there a way to get this working for the third one?

@spencermountain
Owner

Hey @numberslk, good question. Admittedly, there are two things going wrong here in compromise:

First, nlp('79-years-old').debug() shows the whole string being treated as one word. I think that's because of the number-letter combo, as though it were a long ID number or hash. I can look at improving the tokenization regex for this; it's a good example.

Second, there is a @hasDash match function that doesn't seem to be working: nlp('foo-bar').match('@hasDash').debug(). I can take a look at that one, too.

Whoops!
Good issue! Thanks.

@numberslk
Author

Great. Instead of going from behind, I ended up dealing with it from the front. :) Don't thank me; thank you for the awesome library.
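
For readers who hit the same problem on a version without the tokenization fix, one possible reading of "from the front" is to find the number first and then check what follows it. The sketch below is an assumption, not the approach the author actually used: it relies on a plain regular expression rather than compromise, and `extractAge` is a hypothetical helper name.

```javascript
// Hypothetical helper (not part of compromise): pull an age out of
// "80 year old", "79-years-old" or "79 years old" with a plain regex.
function extractAge(text) {
  // digits, then spaces or hyphens, then "year" or "years",
  // then a space, a hyphen, or the end of the string
  const match = /(\d+)[\s-]+years?(?=[\s-]|$)/i.exec(text);
  return match ? Number(match[1]) : null;
}

const samples = [
  'An 80 year old from Panadura has died',
  'A 79-years-old woman from Colombo 13 has died in her r',
  'A 79 years old woman from Colombo 13 has died in her r',
];

for (const s of samples) {
  console.log(extractAge(s)); // 80, 79, 79
}
```

Because the pattern requires "year" or "years" right after the number, the "13" in "Colombo 13" is not picked up. This trades the part-of-speech awareness of #Value for independence from how the tokenizer splits hyphenated words.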

spencermountain added a commit that referenced this issue Feb 4, 2021
@spencermountain spencermountain mentioned this issue Feb 4, 2021
Merged
@spencermountain
Owner

Got a fix for the 79-years-old tokenization in v13.9.

The other thing: it's not @hasDash, it's @hasHyphen:

nlp('foo-bar').match('@hasHyphen').debug()

cheers
