Home
David Campos edited this page Oct 13, 2016 · 11 revisions
Neji is a flexible and powerful platform for biomedical information extraction from scientific texts, such as patents, publications and electronic health records.
Please use the right menu to access further documentation.
- Gimli for machine learning NER training
- Support for multiple linguistic parsers, covering general text and multiple languages
- Support for additional input and output formats, including BioC
- SDK usability improvements
- Performance improvements
- Stability improvements
With Neji you can build processing pipelines for:
- Concept recognition:
  - Dictionary-based, Machine learning-based and Rule-based
- Train machine learning models for NER (Named Entity Recognition):
  - Normalization with dictionary matching and Stopword filtering
- Linguistic parsing:
  - Sentence splitting, Tokenisation, Lemmatisation, Chunking and Dependency parsing
- Convert between corpora formats:
  - Input formats: BioC, XML, HTML and Text
  - Output formats: JSON, A1, BC2, Base64, BioC, CoNLL, IeXML, Pipe and PipeExtended
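To make dictionary-based concept recognition with stopword filtering more concrete, here is a minimal Python sketch. It is not Neji's API: the `DICTIONARY` entries, the identifiers, the `STOPWORDS` set and the `recognize` function are all hypothetical, and a real system would use far more efficient matching than one regular expression per dictionary name.

```python
import re

# Hypothetical toy dictionary: surface name -> concept identifier.
DICTIONARY = {
    "breast cancer": "DISO:0001",
    "brca1": "GENE:0001",
    "can": "GENE:0002",  # short, ambiguous name that collides with a common word
}

# Hypothetical stopword list used to discard ambiguous dictionary names.
STOPWORDS = {"can", "was", "the"}


def recognize(text):
    """Return (start, end, matched text, identifier) tuples sorted by offset."""
    annotations = []
    for name, identifier in DICTIONARY.items():
        if name in STOPWORDS:
            continue  # stopword filtering: skip names that are common words
        pattern = r"\b" + re.escape(name) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            annotations.append(
                (match.start(), match.end(), match.group(0), identifier)
            )
    return sorted(annotations)


text = "BRCA1 mutations can increase breast cancer risk."
annotations = recognize(text)
for start, end, surface, identifier in annotations:
    print(start, end, surface, identifier)
```

Without the stopword filter, the word "can" in the sentence would be annotated as a gene, which is the kind of false positive the filtering step is meant to avoid.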
- Read documents
  - Raw, XML and BioC formats, supporting PubMed and BioMed Central articles.
- Process target data
  - Modules for sentence splitting, tokenization, dependency parsing, concept recognition (dictionary and machine learning), and more.
- Get concept tree
  - Innovative concept tree with nested and intersected annotations supporting multiple identifiers.
- Store information
  - Various known output formats: XML, A1, CoNLL, JSON, and BioC.
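The four stages above (read, process, collect concepts, store) can be sketched as a chain of small functions. This is an illustrative Python sketch only, not Neji's API: every function name, the document layout, the dictionary and its identifiers are assumptions made for the example, and the sentence splitter is deliberately naive.

```python
import json
import re


def read_raw(text):
    """Reader stage: wrap raw text in a simple document record."""
    return {"text": text, "sentences": []}


def split_sentences(doc):
    """Processing stage: naive split after '.', '!' or '?' followed by whitespace."""
    text, start = doc["text"], 0
    for match in re.finditer(r"[.!?]\s+", text):
        doc["sentences"].append(
            {"start": start, "end": match.start() + 1, "concepts": []}
        )
        start = match.end()
    if start < len(text):
        doc["sentences"].append({"start": start, "end": len(text), "concepts": []})
    return doc


def tag_concepts(doc, dictionary):
    """Processing stage: dictionary matching that keeps nested annotations."""
    for sentence in doc["sentences"]:
        sentence_text = doc["text"][sentence["start"]:sentence["end"]]
        for name, identifiers in dictionary.items():
            pattern = r"\b" + re.escape(name) + r"\b"
            for match in re.finditer(pattern, sentence_text, re.IGNORECASE):
                sentence["concepts"].append({
                    "start": sentence["start"] + match.start(),
                    "end": sentence["start"] + match.end(),
                    "text": match.group(0),
                    "ids": identifiers,  # one annotation may carry several identifiers
                })
        # Outer annotations first, so nested ones follow the span that contains them.
        sentence["concepts"].sort(key=lambda c: (c["start"], -c["end"]))
    return doc


def write_json(doc):
    """Writer stage: serialise the annotated document as JSON."""
    return json.dumps(doc, indent=2)


def run_pipeline(text, dictionary):
    doc = read_raw(text)
    doc = split_sentences(doc)
    doc = tag_concepts(doc, dictionary)
    return write_json(doc)


# Hypothetical dictionary: "cancer" is nested inside "breast cancer".
DICTIONARY = {
    "breast cancer": ["DISO:0001"],
    "cancer": ["DISO:0002", "DISO:0003"],
}
output = run_pipeline("BRCA1 is a gene. It is linked to breast cancer.", DICTIONARY)
print(output)
```

In this sketch the second sentence ends up with two annotations, "breast cancer" and the nested "cancer", the latter with two identifiers. A flat list per sentence is the simplest stand-in for a concept tree; a fuller version would attach nested annotations as children of the span that contains them.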
Please contact BMD Software for further support and consulting services.
Copyright (C) 2016 BMD Software and University of Aveiro
Neji is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/3.0/.