Adding new kernel metrics to the ctakes-ytex concept similarity service. #14
This patch adds additional kernel metrics to the ctakes-ytex concept similarity service.
These include:
The algorithms for most of them can be found either in the original Perl UMLS::Similarity package or in the paper by Sanchez and Batet: https://www.sciencedirect.com/science/article/pii/S1532046411000645
Examples were computed, compared with the output of the Perl UMLS::Similarity package, and verified to be the same. Note that when testing against the Perl package you must pass --intrinsic sanchez, because the cTAKES YTEX implementation of the information content (IC) uses ONLY the Sanchez intrinsic IC. If you omit this option when calling the Perl scripts, they default to the corpus-based IC, which produces different numbers. Once the Perl package is forced to use the Sanchez IC, the distance metrics match exactly when both run against the same installed UMLS database.
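To illustrate why the choice of IC matters, here is a minimal Python sketch of an intrinsic IC in the style of Sanchez et al., together with a Resnik-style similarity built on it. It is not the cTAKES YTEX or UMLS::Similarity code: the toy taxonomy, the function names, and the use of the natural logarithm are assumptions made for illustration, and the formula is written as I understand it from the literature, IC(c) = -log((|leaves(c)| / |subsumers(c)| + 1) / (max_leaves + 1)).

```python
import math

# Hypothetical toy taxonomy (child -> parents); this is not UMLS data.
PARENTS = {
    "root": [],
    "disease": ["root"],
    "finding": ["root"],
    "infection": ["disease"],
    "neoplasm": ["disease"],
    "fever": ["finding"],
}

# Leaves are concepts that never appear as a parent.
ALL_PARENTS = {p for parents in PARENTS.values() for p in parents}
ALL_LEAVES = {c for c in PARENTS if c not in ALL_PARENTS}


def subsumers(concept):
    """Return the concept itself plus all of its ancestors."""
    seen = {concept}
    stack = [concept]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def leaves(concept):
    """Return the leaf concepts strictly below `concept`."""
    return {l for l in ALL_LEAVES if l != concept and concept in subsumers(l)}


def intrinsic_ic(concept):
    """Intrinsic IC computed from the taxonomy alone (assumed natural log)."""
    ratio = len(leaves(concept)) / len(subsumers(concept)) + 1
    return -math.log(ratio / (len(ALL_LEAVES) + 1))


def resnik(a, b):
    """Resnik-style similarity: IC of the most informative common subsumer."""
    common = subsumers(a) & subsumers(b)
    return max(intrinsic_ic(c) for c in common)


if __name__ == "__main__":
    print(resnik("infection", "neoplasm"))
    print(resnik("infection", "fever"))
```

Because the IC here depends only on the taxonomy structure, two implementations reading the same hierarchy should agree. A corpus-based IC instead depends on concept frequencies, so the same similarity measure yields different values under it.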