Topic_Modelling

LDA and NMF applied to two different data sets.

The two datasets in question are:

Task overview: Apply LDA and NMF to both datasets and compare the results on a large dataset (around 100K entries) with those on a smaller one (around 1K entries).

Data Extraction: Web scraping for the first dataset and database access for the second.
Data Analysis: Latent Dirichlet Allocation (LDA) and Non-negative Matrix Factorization (NMF) are easy to apply with scikit-learn; more effort goes into extracting and cleaning the data.
Data Visualization: Results are presented in table form, since they are meant to describe probability distributions.

One caveat is the lack of a simple inference technique for NMF.
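To illustrate the caveat, here is one possible workaround, offered as a sketch and not as the approach taken in this repository. scikit-learn's `NMF.transform` keeps the learned topic matrix fixed and solves for non-negative weights of an unseen document. Those weights are not probabilities; dividing each row by its sum, as done below, is only a heuristic for reading them as a mixture. The training and new documents are hypothetical.

```python
import numpy as np
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical training corpus and one unseen document.
train_docs = [
    "the cat sat on the mat with another cat",
    "dogs and cats are popular household pets",
    "my dog chased the neighbour's cat",
    "stock markets fell as investors sold shares",
    "the bank raised interest rates for investors",
    "shares and bonds are traded on financial markets",
]
new_docs = ["investors bought shares in the bank"]

vec = TfidfVectorizer(stop_words="english")
X_train = vec.fit_transform(train_docs)

nmf = NMF(n_components=2, init="nndsvda", random_state=0, max_iter=500)
nmf.fit(X_train)

# Fold the unseen document into the fitted topic space (topics stay fixed).
W_new = nmf.transform(vec.transform(new_docs))

# Heuristic normalisation: rows that sum to zero are left as all zeros.
row_sums = W_new.sum(axis=1, keepdims=True)
mix = np.divide(W_new, row_sums, out=np.zeros_like(W_new), where=row_sums > 0)

print("raw NMF weights:   ", W_new[0])
print("normalised weights:", mix[0])
```

By contrast, LDA's `transform` returns a normalized topic distribution for new documents directly, which is part of why inference is simpler there.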

.ipynb_checkpoints
ScrapyFiles
.DS_Store
README.md
Topic_Modelling_LDA_NMF.ipynb
random_wiki_article.txt
strat_articles.json
