Add cui2vec embeddings #25
Comments
Nice find! |
Additional information:
|
Hey this is my paper, how cool! I'd be happy to contribute these, let me know if they need any clean up first. |
Oh, hi @beamandrew, glad to see you here! Please follow the instruction https://github.com/RaRe-Technologies/gensim-data#want-to-add-a-new-corpus-or-model |
Will do! It might be a couple weeks until I can get it together. I'm teaching a deep learning class right now that won't end until May which keeps me pretty busy. I'm actually having them use the embeddings from this repo in class to build an RNN (which is how I ended up finding this issue). You can check it out here if you're interested: |
@beamandrew can you give read access for |
Oops, try this link which should let you view: https://drive.google.com/file/d/1WuoHWf1KyFsNiilbVa7qnKkSDALfch01/view?usp=sharing |
Last I checked, the actual concept names aren't included in this dataset and/or aren't under the same license, but they are available from a different source which looks legitimately released. I have, in fact, a task to correlate them. Without this correlation, the embeddings discussed here are keyed by arbitrary codes instead of the original (concept) words that you see in the online demo. |
I currently have some data that will allow for this mapping as @matanster describes from the author of this publication (Section 2). If anyone is interested I can upload a link to this as I sit next to the author and he has given his permission @jimmyoentung. |
Thanks guys. What we want is for users who download this dataset to be able to use it easily. If the dataset requires users to jump through hoops, it's not a good fit for gensim-data. The experience of applying / using a dataset has to be streamlined and intuitive, including access and code (not just data). That is why we created this repo, and it's a mandatory part of each new contribution. @hscells and @matanster what does this extra step mean for users? Can we somehow integrate it directly, so it's transparent to people who want to use cui2vec? Is it necessary? |
The CUI in cui2vec stands for Concept Unique Identifier. A CUI is a single identifier shared by all of the synonyms for a particular medical concept. The dataset I described in my comment is a mapping from each CUI to its most commonly used string in the UMLS Metathesaurus. One may simply replace the CUIs in the pre-trained vector file with terms from this mapping file (although I believe not all CUIs are mapped, because the semantic types of the strings were filtered in this particular dataset). Going the other way, one may use QuickUMLS or MetaMap to map a term to a CUI, and then use the mapping described above to get from the CUI to the most commonly used term in UMLS. I'm not exactly sure how the demo in the OP maps CUIs to strings, but I believe this is most likely how it would be done. In terms of how it could be integrated, @piskvorky: the original data could be modified, or the mapping could be performed in a separate step. However, as I said, because the relationship between a CUI and the strings associated with that concept is one-to-many, the mapping would preferably be performed as two separate steps. |
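A minimal sketch of the relabeling step described above, assuming the mapping is available as a two-column CSV of `CUI,preferred term`. That layout, the function names, and the placeholder CUIs are hypothetical and are not taken from the actual mapping file. CUIs without a mapped term keep their code as the label, since not all CUIs are expected to be covered.

```python
import csv
import io


def load_cui_to_term(handle):
    """Read (CUI, preferred term) rows into a dict.

    Assumes a two-column CSV with no header row; this layout is a guess,
    not the format of the real mapping file.
    """
    reader = csv.reader(handle)
    return {row[0]: row[1] for row in reader if len(row) >= 2}


def relabel_vectors(vectors, cui_to_term):
    """Return (label, cui, vector) triples.

    The CUI is kept alongside the label because several strings can share
    one concept; unmapped CUIs fall back to the CUI itself as the label.
    """
    relabeled = []
    for cui, vector in vectors.items():
        label = cui_to_term.get(cui, cui)
        relabeled.append((label, cui, vector))
    return relabeled


# Toy data with made-up placeholder CUIs and terms.
mapping_csv = io.StringIO("C0000001,example term a\nC0000002,example term b\n")
vectors = {"C0000001": [0.1, 0.2], "C0000003": [0.3, 0.4]}

cui_to_term = load_cui_to_term(mapping_csv)
labeled = relabel_vectors(vectors, cui_to_term)
for label, cui, vector in labeled:
    print(label, cui, vector)
```

Keeping the CUI next to the human-readable label means the term-to-CUI lookup (for example via QuickUMLS or MetaMap) can stay a separate step.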
No problem, as long as the process is clearly described to users, and the dataset ready-to-use out of the box. |
Just curious, any progress on this issue? |
Hi, does anybody know if the 'cui2vec' dataset is available? |
@souravsingh can I load the CSV in gensim? Can you tell me how to do that? |
Hi everyone, I am lead author on this paper. Apologies for the radio silence on this request. We are currently working on a revision to the paper/approach that we hope to release this month. I will check back in and try to make it gensim compatible at that time. |
@juancq @andresrosso sorry for waiting, I can't say when this will be added |
@beamandrew great, thanks! |
Is there any model using SNOMED CT data? |
Please share the source code for the evaluation metrics used in this work. I would like to evaluate my own embeddings trained on EHRs. Thanks in advance. |
@andresrosso
Source: https://stackoverflow.com/questions/46297740/how-to-turn-embeddings-loaded-in-a-pandas-dataframe-into-a-gensim-model (Ken Syme's answer) |
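One possible way to get the CSV into gensim, sketched under assumptions rather than copied from the linked answer: rewrite the CSV as a word2vec-style text file (a `count dim` header line, then one `key v1 v2 ...` line per entry), which gensim's `KeyedVectors.load_word2vec_format(path, binary=False)` can read. The sketch assumes the CSV has one header row and the CUI in the first column; the function name is hypothetical.

```python
import csv
import io


def csv_to_word2vec_text(csv_handle, out_handle):
    """Convert an embeddings CSV into word2vec text format.

    Assumes one header row, then rows of: key, v1, v2, ..., vN.
    Returns (number of vectors, dimension).
    """
    reader = csv.reader(csv_handle)
    next(reader)  # skip the header row
    rows = [row for row in reader if row]
    if not rows:
        raise ValueError("no embedding rows found")
    dim = len(rows[0]) - 1
    for row in rows:
        if len(row) - 1 != dim:
            raise ValueError(f"inconsistent vector length for key {row[0]!r}")
    # word2vec text format starts with "<count> <dimension>"
    out_handle.write(f"{len(rows)} {dim}\n")
    for row in rows:
        out_handle.write(" ".join(row) + "\n")
    return len(rows), dim


# Toy input with made-up placeholder CUIs and two dimensions.
sample_csv = io.StringIO('"","V1","V2"\n"C0000001",0.1,0.2\n"C0000002",0.3,0.4\n')
converted = io.StringIO()
shape = csv_to_word2vec_text(sample_csv, converted)
print(converted.getvalue())
```

With real files, the output would be written to disk and then loaded with `gensim.models.KeyedVectors.load_word2vec_format("cui2vec.txt", binary=False)`, where the file name is only an example. Keys must not contain spaces in this format, which holds for CUIs.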
Great work, thanks a lot. |
Embeddings for over 100k medical concepts, built using data from 60 million patients, 1.7 million journal articles and 20 million notes, are up and available here: https://figshare.com/s/00d69861786cd0156d81
Explorer available here: http://ec2-52-14-191-192.us-east-2.compute.amazonaws.com:1234/