Punctuation following abbreviations causes sentences to merge #1061

Closed
Fdawgs opened this issue Nov 15, 2023 · 3 comments

@Fdawgs
Contributor

Fdawgs commented Nov 15, 2023

Node version: 18.18.2
Compromise version: 14.10

As the title states, full stops and other sentence-ending punctuation (?, !, etc.) that occur after an abbreviation cause the following sentence to be treated as part of the original sentence.

Reproduction:

const nlp = require('compromise');

const text = "Dr. Hibbert has advised starting Homer on morphine 400 mg. I have copied this letter to his general practitioner.";
const sentences = nlp(text).sentences().out('array');
console.log(sentences);
/**
 * outputs: 
 * [
 *    'Dr. Hibbert has advised starting Homer on morphine 400 mg. I have copied this letter to his general practitioner.'
 * ]
 */

Comparison without using an abbreviation:

const nlp = require('compromise');

const text = "Dr. Hibbert has advised starting Homer on morphine 400 milligrams. I have copied this letter to his general practitioner.";
const sentences = nlp(text).sentences().out('array');
console.log(sentences);
/**
 * outputs: 
 * [
 *    'Dr. Hibbert has advised starting Homer on morphine 400 milligrams.',
 *    'I have copied this letter to his general practitioner.'
 * ]
 */
@spencermountain
Owner

hey Frazer, with periods this is the expected behaviour for abbreviations, as in '400 mg. of THC' and 'a sr. in high-school'.
but yeah, '12 mg!' and '12 mg?' should end the sentence.
will add this one to the list. Good catch
cheers

@spencermountain
Owner

spencermountain commented Nov 16, 2023

fixed in 14.10.1, thanks for the help

@thegoatherder
Contributor

@spencermountain this is half-fixed. I think the problem is when an abbreviation is used in text and then followed by a genuine new sentence.

I prescribed him 400mg. He went to the pharmacy.

As I think about this, I guess there's no easy fix for it. We could detect an uppercase next word, but I imagine that will produce a lot of false positives.

FWIW we have a large body of clinical dialogue in text and we would rarely see the "." after a unit abbreviation. It's not common at all. Usually it's presented without the ".", e.g. "He was injected with 400mg of morphine."
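
For illustration, the "uppercase next word" idea above could be sketched roughly as follows. This is a dependency-free toy, not compromise's implementation: `splitAfterUnitAbbreviation` and the `UNIT_ABBREVIATIONS` list are hypothetical names made up for this example, and the sketch only handles a number followed by a unit abbreviation and a period.

```javascript
// Hypothetical helper, NOT part of compromise. A naive splitter that only
// illustrates the "uppercase next word" heuristic for unit abbreviations.
const UNIT_ABBREVIATIONS = new Set(['mg', 'ml', 'kg']); // assumed sample list

function splitAfterUnitAbbreviation(text) {
  const sentences = [];
  let start = 0;
  // digit, optional space, lowercase letters, a period, whitespace,
  // then (lookahead) an uppercase letter starting the next word
  const pattern = /(\d\s?)([a-z]+)\.\s+(?=[A-Z])/g;
  let match;
  while ((match = pattern.exec(text)) !== null) {
    // Only treat known unit abbreviations as candidate sentence ends
    if (!UNIT_ABBREVIATIONS.has(match[2])) continue;
    // End of the sentence is just after the period
    const end = match.index + match[1].length + match[2].length + 1;
    sentences.push(text.slice(start, end).trim());
    start = pattern.lastIndex; // next sentence begins after the whitespace
  }
  const rest = text.slice(start).trim();
  if (rest) sentences.push(rest);
  return sentences;
}

// Genuine new sentence: split as hoped
console.log(splitAfterUnitAbbreviation('I prescribed him 400mg. He went to the pharmacy.'));
// → [ 'I prescribed him 400mg.', 'He went to the pharmacy.' ]

// Lowercase continuation: left as one sentence
console.log(splitAfterUnitAbbreviation('He smoked 400 mg. of THC.'));
// → [ 'He smoked 400 mg. of THC.' ]

// False positive: a capitalised proper noun after the abbreviation
console.log(splitAfterUnitAbbreviation('Give 400 mg. Tylenol as needed.'));
// → [ 'Give 400 mg.', 'Tylenol as needed.' ]
```

The last call shows the false-positive concern: any capitalised word after the abbreviation (a drug name, a person, "I") triggers a split, whether or not a new sentence actually starts there.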
