really incorrect match_affil results #11

Open
simonatdrg opened this issue Mar 15, 2018 · 5 comments
Comments

@simonatdrg

I created a set of affiliation strings from PubMed abstracts, all of which include 'Harvard' (University, Medical School, etc.), and ran them through match_affil after downloading the most recent grid.csv dataset (which has many entries for Harvard, including 'Harvard Medical School'). The script, input file, and results are attached.

You'll see that there are very few matches, and of those quite a few are incorrect. I'm not an expert on the machine learning techniques involved. Can you explain the results and possibly suggest ways to improve them?
Zip file with script, input data, and outputs attached.

matchaffil_test.zip

@titipata
Owner

Hi @simonatdrg, yeah, I totally agree. I was hard-coding Harvard when parsing it.

I'll take a look and fix it in the coming weeks! (Sorry, I'm a little busy this week.)

@simonatdrg
Author

simonatdrg commented Mar 15, 2018 via email

@titipata
Owner

Hi @simonatdrg, sorry for the really late reply. I will work on this issue over the weekend. Hopefully that will resolve most of the problems here.

@fangzhou-xie

fangzhou-xie commented Dec 16, 2019

Hi, I'm currently seeing a similar problem. For example, when matching "Stanford University", it returns:

(Pdb++) match_affil('Stanford University')
OrderedDict([('ID', 'grid.440952.e'), ('Name', 'University of Belize'), ('City', 'Belmopan'), ('State', ''), ('Country', 'Belize')])

I have tried some other well-known universities (Harvard, Princeton, NYU, Columbia, Caltech, etc.), and match_affil works just fine for them. I wonder if this result could be improved somehow?

Thank you!

@titipata
Owner

Hi @mark-fangzhou-xie, yeah, I wrote this library a while ago and I really need to update the code in this repo. Currently, matching is based on a nearest-neighbor algorithm. I will try to improve it over the next month if I have time.
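
For readers unfamiliar with nearest-neighbor matching, here is a minimal sketch of the general idea in plain Python. It is not this library's actual implementation: the character-trigram features, the cosine similarity, the `min_score` cutoff, the function names, and the tiny `GRID_NAMES` list are all assumptions made for illustration. It does show why a nearest-neighbor matcher always returns some candidate unless a score threshold is used to reject weak matches.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Count character n-grams of a lowercased, space-padded string."""
    padded = f" {text.lower().strip()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def nearest_institution(query, names, min_score=0.5):
    """Return (best_name, score); best_name is None below min_score.

    min_score is a hypothetical cutoff, not a parameter of match_affil.
    """
    query_vec = char_ngrams(query)
    best_name, best_score = None, 0.0
    for name in names:
        score = cosine(query_vec, char_ngrams(name))
        if score > best_score:
            best_name, best_score = name, score
    if best_score < min_score:
        return None, best_score
    return best_name, best_score


# Hypothetical stand-in for the institution names in grid.csv.
GRID_NAMES = [
    "Stanford University",
    "University of Belize",
    "Harvard Medical School",
    "Harvard University",
]

if __name__ == "__main__":
    for affiliation in [
        "Stanford University",
        "Dept. of Genetics, Harvard Medical School, Boston",
        "Acme Widgets Ltd",
    ]:
        print(affiliation, "->", nearest_institution(affiliation, GRID_NAMES))
```

Returning the similarity score alongside the name makes it possible to flag low-confidence matches for review instead of silently accepting the nearest candidate.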
