[BUG]: nlp-sentencize wrongly splits sentences with multiple punctuation marks #3013

Open
2 tasks done
Pupix opened this issue Oct 16, 2024 · 3 comments
Labels
Bug Something isn't working.

Comments

@Pupix

Pupix commented Oct 16, 2024

Description

Hello! Not sure if this is the right place, but can't post in the other repo.

Using @stdlib/nlp-sentencize@0.2.2 with phrases like 'HAPPY BIRTHDAY!!!' will incorrectly return a sentence for every punctuation mark:

console.log(sentencize('HAPPY BIRTHDAY!!!'));
> ['HAPPY BIRTHDAY!', '!', '!']

console.log(sentencize('what??'));
> ['what?', '?']

console.log(sentencize('HOW DARE YOU?!?!'));
> ['HOW DARE YOU?', '!', '?', '!']

Each of the above examples should be treated as a single sentence.
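Until this is fixed, one possible stopgap is to post-process the returned array. The sketch below is not part of @stdlib; `mergePunctuationFragments` is a hypothetical helper name. It assumes that any fragment made up only of `.`, `!`, or `?` belongs to the sentence before it, and that no whitespace sat between them in the original text:

```javascript
// Hypothetical workaround, not an @stdlib API: glue fragments that consist
// solely of sentence-ending punctuation back onto the preceding sentence.
function mergePunctuationFragments(sentences) {
  const onlyPunctuation = /^[.!?]+$/;
  const merged = [];
  for (const fragment of sentences) {
    if (merged.length > 0 && onlyPunctuation.test(fragment)) {
      // Assumption: the fragment directly followed the previous one,
      // so plain concatenation restores the original text.
      merged[merged.length - 1] += fragment;
    } else {
      merged.push(fragment);
    }
  }
  return merged;
}

// Inputs are the arrays reported above.
console.log(mergePunctuationFragments(['HAPPY BIRTHDAY!', '!', '!']));
console.log(mergePunctuationFragments(['what?', '?']));
console.log(mergePunctuationFragments(['HOW DARE YOU?', '!', '?', '!']));
```

In practice this would wrap the call, e.g. `mergePunctuationFragments(sentencize(text))`. It only hides the symptom; the tokenization itself would still need a fix in the package.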

Weirdly enough, it works well with ellipses and with phrases ending in !!!1!!11!!! and the like, such as:

console.log(sentencize('Yeah, about that...'));
> ['Yeah, about that...']

console.log(sentencize('OH EM GEE!!!1!!11!one!!1'));
> ['OH EM GEE!!!1!!11!one!!1']

This one is fine.

Cheers!

Related Issues

No response

Questions

No response

Demo

No response

Reproduction

const sentencize = require('@stdlib/nlp-sentencize');
console.log(sentencize('SURPRISE!!!'));

Expected Results

['SURPRISE!!!']

Actual Results

['SURPRISE!', '!', '!']

Version

0.2.2

Environments

Node.js

Browser Version

No response

Node.js / npm Version

v22.9.0

Platform

Windows 11

Checklist

  • Read and understood the Code of Conduct.
  • Searched for existing issues and pull requests.
@stdlib-bot
Contributor

👋 Hi there! 👋

And thank you for opening your first issue! We will get back to you shortly. 🏃 💨

@kgryte kgryte added the Bug Something isn't working. label Oct 16, 2024
@Pupix
Author

Pupix commented Oct 17, 2024

Punctuation is broken with prefixes/suffixes as well. I can make a new issue if need be.

console.log(sentencize('I said "Look out" right before he banged his head'));
> [ 'I said "Look out" right before he banged his head' ] // This is correct

console.log(sentencize('I said "Look out!" right before he banged his head'));
> ['I said "Look out!"', 'right before he banged his head'] // This should be one sentence
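A rough stopgap for the quoted case, again as a sketch rather than anything @stdlib provides: `mergeQuotedContinuations` is a hypothetical helper that rejoins a fragment when the previous one ends in terminal punctuation followed by a closing quote and the fragment starts with a lowercase letter. It assumes a single space separated the two pieces, and the lowercase check is only a heuristic (it will miss continuations that start with a proper noun):

```javascript
// Hypothetical workaround, not an @stdlib API: rejoin text that was split
// right after a quoted exclamation or question inside a longer sentence.
function mergeQuotedContinuations(sentences) {
  const endsWithQuotedStop = /[.!?]["'”’]$/; // e.g. ...out!"
  const startsLowercase = /^[a-z]/;          // heuristic for "same sentence"
  const merged = [];
  for (const fragment of sentences) {
    const previous = merged[merged.length - 1];
    if (
      previous !== undefined &&
      endsWithQuotedStop.test(previous) &&
      startsLowercase.test(fragment)
    ) {
      // Assumption: exactly one space separated the two pieces.
      merged[merged.length - 1] = previous + ' ' + fragment;
    } else {
      merged.push(fragment);
    }
  }
  return merged;
}

// Input is the array reported above.
console.log(mergeQuotedContinuations([
  'I said "Look out!"',
  'right before he banged his head'
]));
```

A fragment that starts with a capital letter is left alone, so `'I said "Look out!"'` followed by `'He ducked.'` stays as two sentences.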

@Planeshifter
Member

@Pupix Thanks for flagging these issues! A separate issue would be a good idea for that. I will be looking into these shortly.

@kgryte kgryte changed the title nlp-sentencize wrongly splits sentences with multiple punctuation marks [BUG]: nlp-sentencize wrongly splits sentences with multiple punctuation marks Oct 18, 2024
4 participants