WebScraping_TextMining

This repository contains some of my practice notebooks on web scraping and text mining of patent citations and abstracts.

'WebscrapingAndAPI_PatentCitationData': Uses web scraping and APIs (pandas, BeautifulSoup, google-patent-api) to collect citation data for GAA patents.
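The citation-collection step can be sketched with Python's standard-library `html.parser` standing in for BeautifulSoup, so the example runs without third-party packages. This is a minimal sketch under assumptions. The markup in `SAMPLE_HTML`, the `itemprop` values, the patent numbers, and the helper names (`CitationParser`, `citation_edges`) are all hypothetical. They are not the real Google Patents page structure or the notebook's actual code. The output is a (citing, cited) edge list, the usual raw material for a citation network.

```python
from html.parser import HTMLParser

# Hypothetical markup: the real Google Patents page structure may differ.
SAMPLE_HTML = """
<table>
  <tr itemprop="backwardReferences">
    <td><span itemprop="publicationNumber">US1111111B2</span></td>
  </tr>
  <tr itemprop="backwardReferences">
    <td><span itemprop="publicationNumber">US2222222A1</span></td>
  </tr>
  <tr itemprop="forwardReferences">
    <td><span itemprop="publicationNumber">US3333333B1</span></td>
  </tr>
</table>
"""


class CitationParser(HTMLParser):
    """Collect publication numbers found inside (assumed) backward-citation rows."""

    def __init__(self):
        super().__init__()
        self.in_backward_row = False
        self.in_pub_number = False
        self.cited = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "tr" and attrs.get("itemprop") == "backwardReferences":
            self.in_backward_row = True
        elif self.in_backward_row and attrs.get("itemprop") == "publicationNumber":
            self.in_pub_number = True

    def handle_endtag(self, tag):
        if tag == "tr":
            self.in_backward_row = False
        elif tag == "span":
            self.in_pub_number = False

    def handle_data(self, data):
        if self.in_pub_number and data.strip():
            self.cited.append(data.strip())


def citation_edges(citing_id, html):
    """Return (citing, cited) pairs, i.e. an edge list for a citation network."""
    parser = CitationParser()
    parser.feed(html)
    return [(citing_id, cited) for cited in parser.cited]


edges = citation_edges("US9999999B2", SAMPLE_HTML)
print(edges)
# [('US9999999B2', 'US1111111B2'), ('US9999999B2', 'US2222222A1')]
```

With BeautifulSoup, the `CitationParser` class would shrink to a single `select()` call over the same assumed attributes. The edge-list output stays the same shape.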

'WebScraping_PatentAbstracts_CPCcodes': Uses web scraping to collect patent abstracts and CPC codes for patents related to 'renewable energy'.
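The abstract/CPC extraction step might look roughly like the sketch below. It again uses the standard-library `html.parser` in place of BeautifulSoup. The page markup and the codes in `SAMPLE_HTML` are made up for illustration, and `parse_patent_page` and `cpc_subclass` are hypothetical helpers, not functions from the notebook.

The subclass helper does rely on one real fact about CPC codes: a code begins with a section letter (A-H or Y), a two-digit class, and a subclass letter. So `Y02E10/50` belongs to subclass `Y02E`. That subclass is a convenient label when predicting CPC codes later.

```python
import re
from html.parser import HTMLParser

# Made-up sample markup and codes; real pages may be structured differently.
SAMPLE_HTML = """
<section itemprop="abstract">
  <div class="abstract">A photovoltaic module with improved cell spacing.</div>
</section>
<ul>
  <li><span itemprop="Code">H01L31/042</span></li>
  <li><span itemprop="Code">Y02E10/50</span></li>
</ul>
"""

# Section letter + two-digit class + subclass letter, then main group / subgroup.
CPC_PATTERN = re.compile(r"^([A-HY])(\d{2})([A-Z])\s*(\d+)/(\d+)$")


class AbstractCpcParser(HTMLParser):
    """Collect abstract text and CPC codes from the (assumed) itemprop markup."""

    def __init__(self):
        super().__init__()
        self.mode = None  # "abstract", "cpc", or None
        self.abstract_parts = []
        self.cpc_codes = []

    def handle_starttag(self, tag, attrs):
        itemprop = dict(attrs).get("itemprop")
        if itemprop == "abstract":
            self.mode = "abstract"
        elif itemprop == "Code":
            self.mode = "cpc"

    def handle_endtag(self, tag):
        if (self.mode == "abstract" and tag == "section") or (
            self.mode == "cpc" and tag == "span"
        ):
            self.mode = None

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self.mode == "abstract":
            self.abstract_parts.append(text)
        elif self.mode == "cpc":
            self.cpc_codes.append(text)


def cpc_subclass(code):
    """Return the four-character CPC subclass, e.g. 'Y02E10/50' -> 'Y02E'."""
    m = CPC_PATTERN.match(code)
    if m is None:
        raise ValueError(f"unrecognised CPC code: {code!r}")
    return m.group(1) + m.group(2) + m.group(3)


def parse_patent_page(html):
    """Return the abstract, the raw CPC codes, and their distinct subclasses."""
    parser = AbstractCpcParser()
    parser.feed(html)
    return {
        "abstract": " ".join(parser.abstract_parts),
        "cpc": parser.cpc_codes,
        "subclasses": sorted({cpc_subclass(c) for c in parser.cpc_codes}),
    }


page = parse_patent_page(SAMPLE_HTML)
print(page["subclasses"])  # ['H01L', 'Y02E']
```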

'TextMining_spaCyTokenizer_TopicModeling_TextClassification': Builds a custom tokenizer with spaCy and plugs it into scikit-learn's CountVectorizer to create bag-of-words (BoW) and tf-idf representations. Performs topic modeling on the tf-idf features. Trains Logistic Regression and Support Vector Classifier models to predict patent CPC codes, and compares test accuracy between models that use BoW and models that use tf-idf as input features.
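Below is a dependency-free sketch of the tokenize, BoW and tf-idf pipeline. It is not the notebook's implementation. A simple regex tokenizer with a tiny hypothetical stop-word list stands in for the spaCy tokenizer. Hand-written `bag_of_words` and `tf_idf` functions stand in for scikit-learn's CountVectorizer and TfidfTransformer. The tf-idf weighting follows scikit-learn's documented defaults: smoothed idf, ln((1 + n) / (1 + df)) + 1, followed by L2 row normalisation. The three sample documents are invented.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; spaCy ships a much larger one.
STOP_WORDS = {"a", "an", "and", "the", "of", "for", "to", "with", "is"}
TOKEN_RE = re.compile(r"[a-z]{2,}")  # alphabetic tokens of length >= 2


def tokenize(text):
    """Lower-case, keep alphabetic tokens, drop stop words (stand-in for spaCy)."""
    return [t for t in TOKEN_RE.findall(text.lower()) if t not in STOP_WORDS]


def bag_of_words(docs):
    """Return (sorted vocabulary, document-term count matrix)."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    index = {t: i for i, t in enumerate(vocab)}
    matrix = []
    for toks in tokenized:
        row = [0] * len(vocab)
        for tok, n in Counter(toks).items():
            row[index[tok]] = n
        matrix.append(row)
    return vocab, matrix


def tf_idf(counts):
    """Weight counts by smoothed idf, then L2-normalise each document row."""
    n_docs = len(counts)
    n_terms = len(counts[0])
    df = [sum(1 for row in counts if row[j] > 0) for j in range(n_terms)]
    idf = [math.log((1 + n_docs) / (1 + d)) + 1 for d in df]
    weighted = []
    for row in counts:
        w = [c * idf[j] for j, c in enumerate(row)]
        norm = math.sqrt(sum(x * x for x in w)) or 1.0
        weighted.append([x / norm for x in w])
    return weighted


docs = [
    "A solar panel with a tracking mount.",
    "Wind turbine blade and the turbine tower.",
    "Solar cell for a hybrid wind system.",
]
vocab, bow = bag_of_words(docs)
tfidf = tf_idf(bow)
print(vocab)
```

In scikit-learn itself, a callable such as `tokenize` can be passed directly as `CountVectorizer(tokenizer=tokenize)`. The resulting count matrix or tf-idf matrix can then be fed to a classifier such as `LogisticRegression` or `SVC`.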

'TextGraph_Token&Document': Uses bag-of-words and tf-idf features to build a token-token graph and a document-document graph.
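One common way to derive both graphs from a document-term matrix is sketched below. This is an assumption about the approach, not necessarily what the notebook does.

- **Document-document graph.** Documents are linked when the cosine similarity of their BoW or tf-idf rows exceeds a threshold.
- **Token-token graph.** Two tokens are linked, with a weight equal to the number of documents in which they co-occur.

The vocabulary, matrix, threshold, and function names are hypothetical.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def document_graph(vectors, threshold=0.1):
    """Edges (i, j, weight) between documents whose similarity exceeds threshold."""
    edges = []
    for i, j in combinations(range(len(vectors)), 2):
        sim = cosine(vectors[i], vectors[j])
        if sim > threshold:
            edges.append((i, j, round(sim, 3)))
    return edges


def token_graph(vocab, bow):
    """Edges {(token_a, token_b): number of documents containing both tokens}."""
    edges = {}
    for row in bow:
        present = [vocab[k] for k, c in enumerate(row) if c > 0]
        for a, b in combinations(present, 2):
            edges[(a, b)] = edges.get((a, b), 0) + 1
    return edges


# Hypothetical toy BoW matrix: rows are documents, columns follow `vocab`.
vocab = ["panel", "solar", "turbine", "wind"]
bow = [
    [1, 1, 0, 0],
    [0, 0, 2, 1],
    [0, 1, 0, 1],
]
print(document_graph(bow))  # [(0, 2, 0.5), (1, 2, 0.316)]
print(token_graph(vocab, bow))
```

The same `document_graph` call works on tf-idf rows. Only the edge weights change. The edge lists can be loaded into a graph library such as NetworkX for layout and analysis.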

Files:
GAA_Patent_Inventor_Network.ipynb
GAA_patents_clean.csv
README.md
RenewableEnergy_patents_2023_abstract_cpc.csv
RenewableEnergy_patents_clean.csv
TextGraph_Token&Document.ipynb
TextMining_spaCyTokenizer_TopicModeling_TextClassification.ipynb
WebScraping_PatentAbstracts_CPCcodes.ipynb
WebscrapingAndAPI_PatentCitationData.ipynb

AidenJiang01/WebScraping_TextMining

Folders and files

Latest commit

History

Repository files navigation

WebScraping_TextMining

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages