Skip to content

medesto/onset_pipeline

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

9 Commits
 
 
 
 
 
 

Repository files navigation

onset_pipeline

Annotation guidelines and methodological approach for the paper:

Viani N, Botelle R, Kerwin J, Yin L, Patel R, Stewart R, Velupillai S. "When Did It Start? A Natural Language Processing Approach for Determining Disease Onset from Mental Health Text", submitted.

Pipeline steps:

Extract first referral documents for patients with a diagnosis of schizophrenia:

  • documents written within 3 months of first referral date
  • first referral date after 1st Jan 2012
  • diagnosis like %f20% or %schizophrenia%

Data filtering

  • calculate document length and average line length
  • calculate number of symptom keywords per document (from previously developed list)
  • calculate number of time expressions per document (from SUTimeMentalHealth)
  • apply filters (character length >= 50th percentile, average line length >= 25th percentile, at least 1 symptom keyword, more than 5 time expressions)

Manual annotation

  • randomly extract patients
  • manually annotate documents for onset mentions, with type (psychosis simptom, diagnosis, non-specific) and time information (past anchored, past not anchored, current)

NLP development

  • divide documents in paragraphs
  • apply filters: remove shorter paragraphs and paragraphs without time expressions (from SUTimeMentalHealth)
  • binary classification problem: label paragraphs with 1 (if containing a past anchored mention) or 0 (otherwise)
  • model selection with 5-fold cross validation on training set, experimenting with different features (TF-IDF counts vs. embedding features), and supervised machine learning classifiers (LR, RF and SVM)

Patient-level ranked list of likely onset dates

  • for each patient, look at all predicted relevant paragraphs
  • for each paragraph, extract earliest date or age
  • rank dates according to paragraph probability

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published