This repository contains source code for asynchronously scraping scientific papers from bioRxiv and PubMed. The scraped text is then converted into Word2Vec word embeddings, which are used to explore semantic and syntactic similarity between words and their relations to other words. Finally, the high-dimensional Word2Vec embeddings are visualized using t-SNE.
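As a rough illustration of the asynchronous scraping step, the sketch below fetches several search pages concurrently with `asyncio`. It is not the repository's actual scraper: `fetch_page` is a hypothetical stub that simulates a request instead of touching the network, and `scrape_all` and `max_concurrent` are names assumed for this example.

```python
import asyncio


async def fetch_page(url: str) -> str:
    # Hypothetical stand-in for a real HTTP request; no network access happens here.
    await asyncio.sleep(0.01)  # simulate network latency
    return f"<html>stub content for {url}</html>"


async def scrape_all(urls, max_concurrent=5, fetch=fetch_page):
    # Cap the number of requests that are in flight at the same time.
    semaphore = asyncio.Semaphore(max_concurrent)

    async def bounded_fetch(url):
        async with semaphore:
            return await fetch(url)

    # gather() returns results in the same order as the input URLs.
    return await asyncio.gather(*(bounded_fetch(u) for u in urls))


urls = [
    "https://www.ncbi.nlm.nih.gov/pubmed/?term=connectome",
    "https://www.biorxiv.org/search/Connectome",
]
pages = asyncio.run(scrape_all(urls))
```

In a real scraper the stub would be replaced by an asynchronous HTTP client call passed in through the `fetch` parameter; the semaphore keeps the load on the remote servers bounded.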
Expected Input - links to articles in one of the following formats:
PubMed
bioRxiv
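The search links used throughout this README follow two URL patterns, and the sketch below shows one way to build them from a plain search term. The helper names `pubmed_url` and `biorxiv_url` are hypothetical and not part of the repository. The double percent-encoding of `+` for bioRxiv is inferred only from the example links in this README, so treat it as an assumption.

```python
from urllib.parse import quote, quote_plus

PUBMED_SEARCH = "https://www.ncbi.nlm.nih.gov/pubmed/?term="
BIORXIV_SEARCH = "https://www.biorxiv.org/search/"


def pubmed_url(term: str) -> str:
    # PubMed links in this README join words with "+", which quote_plus does for spaces.
    return PUBMED_SEARCH + quote_plus(term)


def biorxiv_url(term: str) -> str:
    # bioRxiv links in this README show "+" encoded twice ("+" -> "%2B" -> "%252B").
    plus_joined = term.replace(" ", "+")
    encoded_once = quote(plus_joined, safe="")
    return BIORXIV_SEARCH + quote(encoded_once, safe="")


print(pubmed_url("raspberry pi"))
print(biorxiv_url("raspberry pi"))
```

For single-word terms such as `Schizophrenia` both helpers simply append the word to the base URL.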
Expected Output - images with the visualizations, plus the scraped data saved in PDF format.
demo.ipynb
EDA-NLP-PubMed-and-bioRxiv.ipynb
PubMed - https://www.ncbi.nlm.nih.gov/pubmed/?term=connectome
2,372 Results.
bioRxiv - https://www.biorxiv.org/search/Connectome
4,568 Results.
Figures (PubMed and bioRxiv): Word Cloud, Similar Words
PubMed - https://www.ncbi.nlm.nih.gov/pubmed/?term=coronavirus+2019-nCoV
42 Results.
bioRxiv - https://www.biorxiv.org/search/coronavirus%252B2019-nCoV
72 Results.
Figures (PubMed and bioRxiv): Word Cloud, Similar Words
PubMed - https://www.ncbi.nlm.nih.gov/pubmed/?term=Schizophrenia
5,000 out of 141,489 Results.
bioRxiv - https://www.biorxiv.org/search/Schizophrenia
4,102 Results.
Figures (PubMed and bioRxiv): Word Cloud, Similar Words
PubMed - https://www.ncbi.nlm.nih.gov/pubmed/?term=raspberry+pi
143 Results.
bioRxiv - https://www.biorxiv.org/search/raspberry%252Bpi
184 Results.
Figures (PubMed and bioRxiv): Word Cloud, Similar Words
PubMed - https://www.ncbi.nlm.nih.gov/pubmed/?term=Electroactive+Polymers
860 Results.
bioRxiv - https://www.biorxiv.org/search/Electroactive%252BPolymers%252B
13 Results.
Figures (PubMed and bioRxiv): Word Cloud, Similar Words