Learning Entities from Narratives of Skin Cancer (LENS) is a Python library for Named Entity Recognition (NER) tailored to narratives about skin cancer. It recognizes and categorizes important entities in these narratives using 24 distinct tags (see file annotation_guidelines.pdf), which allow key information to be extracted from unstructured text. This information can be linked to biomedical concepts through SNOMED-CT and MedCAT, facilitating structured data analysis in clinical and research settings.
The primary objective of LENS is to process input text, such as online narratives from platforms like Reddit, and return the corresponding LENS tags for the entities mentioned in it.
To install the latest version of LENS, please run the following command:
pip install https://huggingface.co/4DPicture/OncoNER/Lens/resolve/main/onco_lens_ner-0.1.0-py3-none-any.whl
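After installation, it can be useful to confirm that the package is importable before running any examples. The following is a minimal sketch using only the Python standard library. The helper name `is_installed` is hypothetical, and the module name `onco_lens_ner` is taken from the import statement used in this README.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module with this name can be located."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # "onco_lens_ner" is the import name used in this README's examples.
    if is_installed("onco_lens_ner"):
        print("LENS is installed.")
    else:
        print("LENS was not found; re-run the pip command above.")
```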
Below is an example of how to use LENS to extract entities from a skin cancer narrative:
import onco_lens_ner as lens
text = "I was diagnosed with melanoma last year. I'm currently undergoing immunotherapy and sometimes feel nauseous."
entities = lens.get_entities(text)
print(entities)
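Narratives often arrive in bulk, for example as a collection of forum posts, so the single-text call above can be wrapped in a small loop. The sketch below is an illustration under stated assumptions, not part of the LENS API: `extract_many` is a hypothetical helper, and it receives the extraction function as an argument so that it assumes nothing about what `lens.get_entities` returns.

```python
from typing import Any, Callable, Iterable, List, Optional, Tuple


def extract_many(
    texts: Iterable[str],
    extract: Callable[..., Any],
    tag_list: Optional[List[str]] = None,
) -> List[Tuple[str, Any]]:
    """Run an extraction function over several texts.

    Returns (text, result) pairs so each result stays attached to its
    source text. Blank texts are skipped.
    """
    results = []
    for text in texts:
        text = text.strip()
        if not text:
            continue  # nothing to annotate in an empty post
        if tag_list is None:
            results.append((text, extract(text)))
        else:
            # tag_list mirrors the keyword shown in this README's examples
            results.append((text, extract(text, tag_list=tag_list)))
    return results


# Intended usage (requires LENS to be installed):
#   import onco_lens_ner as lens
#   pairs = extract_many(posts, lens.get_entities, tag_list=["TRT", "SYM"])
```

Passing the function in, rather than importing LENS inside the helper, also makes the loop easy to try out with a stand-in function.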
LENS provides a range of functionalities to meet diverse user needs:
- Extract all LENS entities: Identify and extract all recognized entities from a given text.
entities = lens.get_entities(text)
print(entities)
- Display all entities: Output the extracted entities with their corresponding tags.
lens.display_entities(text)
- Extract entities for a specific label: Extract entities corresponding to a specific tag, such as `INV` (Investigation).
entities = lens.get_entities(text, tag_list=['INV'])
print(entities)
- Extract entities for a subset of labels: Focus on a subset of tags, for example, `TRT` and `INV`.
entities = lens.get_entities(text, tag_list=['TRT', 'INV'])
print(entities)
- Display entities for a subset of labels: Output entities for specific tags, such as `TRT`, `SYM`, and `INV`.
lens.display_entities(text, tag_list=['TRT', 'SYM', 'INV'])
- Extract all MedCAT mappings: Link recognized entities to MedCAT biomedical concepts.
lens2medcat = lens.lens2medcat(text)
print(lens2medcat)
- Extract all SNOMED-CT mappings: Link recognized entities to SNOMED-CT concepts.
lens2snomedct = lens.lens2snomedct(text)
print(lens2snomedct)
- Save the annotations in JSON format: Save the extracted entities and mappings in a structured JSON file for further analysis.
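For the JSON export mentioned in the list above, one possible approach is to serialize whatever the extraction calls return with Python's standard `json` module. This is a sketch under the assumption that the returned objects are made of JSON-friendly values (lists, dicts, strings, numbers). The helper `save_annotations` is hypothetical and is not a LENS function.

```python
import json
from pathlib import Path
from typing import Any


def save_annotations(annotations: Any, path: str) -> Path:
    """Write annotations to a UTF-8 JSON file and return the file path."""
    out = Path(path)
    with out.open("w", encoding="utf-8") as fh:
        # default=str is a fallback (assumption): values the json module
        # cannot serialize directly are stored as their string form.
        json.dump(annotations, fh, ensure_ascii=False, indent=2, default=str)
    return out


# Intended usage (requires LENS to be installed):
#   import onco_lens_ner as lens
#   save_annotations(
#       {"text": text, "entities": lens.get_entities(text)},
#       "annotations.json",
#   )
```

Keeping the source text alongside the entities in the same record makes the file easier to review later.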
A comprehensive tutorial on how to use LENS, including advanced features, is available here.
LENS is licensed under the MIT License. Please see the LICENSE file for further information.