This Twitter analyzer groups tweets into 20 topic clusters, each described by its most popular terms. Latent Dirichlet Allocation (LDA) is used as the topic model.
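
To make the idea concrete, here is a minimal sketch of LDA topic extraction with Spark ML. It is not the code from this repository: the object name `LdaTopicsSketch`, the helper `topTerms`, the sample sentences, and the choice of the DataFrame-based `org.apache.spark.ml` API are all assumptions made for illustration. It assumes Spark SQL and MLlib are on the classpath.

```scala
import org.apache.spark.ml.clustering.LDA
import org.apache.spark.ml.feature.{CountVectorizer, RegexTokenizer, StopWordsRemover}
import org.apache.spark.sql.SparkSession

// Hypothetical helper object, not part of this repository.
object LdaTopicsSketch {

  // Fits LDA on the given texts and returns the top terms of every topic.
  def topTerms(spark: SparkSession, texts: Seq[String],
               numTopics: Int, termsPerTopic: Int): Seq[Seq[String]] = {
    import spark.implicits._
    val docs = texts.toDF("text")

    // Split on non-word characters (RegexTokenizer lowercases by default).
    val tokens = new RegexTokenizer()
      .setInputCol("text").setOutputCol("tokens").setPattern("\\W+")
      .transform(docs)

    // Drop common English stop words.
    val filtered = new StopWordsRemover()
      .setInputCol("tokens").setOutputCol("filtered")
      .transform(tokens)

    // Turn token lists into term-count vectors; keep the vocabulary for lookup.
    val cvModel = new CountVectorizer()
      .setInputCol("filtered").setOutputCol("features")
      .fit(filtered)
    val vectors = cvModel.transform(filtered)

    val ldaModel = new LDA()
      .setK(numTopics).setMaxIter(10).setFeaturesCol("features")
      .fit(vectors)

    // describeTopics yields one row per topic with the indices of its top terms.
    val vocabulary = cvModel.vocabulary
    ldaModel.describeTopics(termsPerTopic).collect().toSeq.map { row =>
      row.getSeq[Int](row.fieldIndex("termIndices")).map(i => vocabulary(i)).toSeq
    }
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("lda-sketch").master("local[*]").getOrCreate()

    // Made-up sample input; the real project reads tweets from Cassandra.
    val sample = Seq(
      "spark streaming makes tweet processing simple",
      "cassandra stores the collected tweets",
      "topic models group tweets by shared terms"
    )

    topTerms(spark, sample, numTopics = 20, termsPerTopic = 5)
      .zipWithIndex
      .foreach { case (terms, i) => println(s"topic $i: ${terms.mkString(", ")}") }

    spark.stop()
  }
}
```

With only a handful of sample sentences the 20 topics are not meaningful; the sketch only shows the shape of the pipeline (tokenize, vectorize, fit, describe topics).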
- Install Cassandra
- Launch Cassandra
- Open a terminal
- Type
cassandra -f
Tweets are collected for one hour. This duration can be changed on the streamingContext in TwitterRunnner.scala.
git clone https://github.com/KKhanda/twitter-analyser.git
cd twitter-analyser
sbt run
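
The one-hour collection window mentioned above could be enforced roughly as follows. This is a sketch under assumptions, not the contents of TwitterRunnner.scala: the object `TimedCollectionSketch`, the helper `collectFor`, the 10-second batch interval, and the empty queue stream standing in for the Twitter source are all hypothetical. It relies on `StreamingContext.awaitTerminationOrTimeout`, which blocks until the context stops or the given number of milliseconds elapses.

```scala
import scala.collection.mutable

import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Duration, Minutes, Seconds, StreamingContext}

// Hypothetical helper object, not part of this repository.
object TimedCollectionSketch {

  // Starts the streaming job, waits at most `window`, then shuts it down.
  // Returns true if the context stopped by itself before the window elapsed.
  def collectFor(ssc: StreamingContext, window: Duration): Boolean = {
    ssc.start()
    val stoppedEarly = ssc.awaitTerminationOrTimeout(window.milliseconds)
    // Graceful stop lets already received batches finish processing.
    ssc.stop(stopSparkContext = true, stopGracefully = true)
    stoppedEarly
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("timed-collection-sketch")
      .setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(10)) // assumed batch interval

    // Placeholder input: an empty queue stream stands in for the Twitter stream.
    val input = ssc.queueStream(mutable.Queue.empty[RDD[String]])
    input.print() // at least one output operation is required before start()

    // Change Minutes(60) to collect for a different amount of time.
    collectFor(ssc, Minutes(60))
  }
}
```

Keeping the window as a `Duration` argument means the collection time can be changed in one place without touching the stream definition.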
The repository includes a dump of about 11 thousand tweets.
To use it, load the dump into Cassandra.
Run the following command in cqlsh,
started from the project root folder so that the relative path resolves:
COPY twits.message FROM './data/twits-data.csv' WITH DELIMITER = ',' AND QUOTE = '"' AND NULL = '';