Creation of data processing infrastructure based on CoreOS & Docker

mserrate/CoreOS-BigData

This is essentially a copy of https://github.com/endocode/CoreOS-Kafka-Storm-Cassandra-cluster-demo, adapted to my needs.

##CoreOS instances

Install Vagrant (>= 1.6) and VirtualBox, then run

git clone https://github.com/mserrate/CoreOS-BigData
cd CoreOS-BigData/coreos-vagrant
vagrant up

Vagrant will create three CoreOS VMs with the following IPs: 172.17.8.101, 172.17.8.102, 172.17.8.103

Then log in to your first Vagrant instance and submit the fleet units:

vagrant ssh core-01 -- -A
fleetctl submit share/*.service share/*.mount

##Load the persisted data storage

fleetctl load media-storage.mount

##Run Zookeeper cluster

fleetctl start zookeeper@{1..3}.service
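The `{1..3}` range used in these `fleetctl start` commands is a bash brace expansion. As a minimal sketch, assuming a shell without brace expansion, the same three unit names can be built with a plain loop (the `units` variable is only illustrative):

```shell
# Build the unit names that zookeeper@{1..3}.service expands to in bash.
units=""
for i in 1 2 3; do
  units="$units zookeeper@$i.service"
done
units=${units# }   # drop the leading space

# Print the resulting command instead of running it, so it can be reviewed first.
echo "fleetctl start $units"
```

The same loop applies to the `kafka@`, `cassandra@` and `storm-supervisor@` units by swapping the name prefix.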

##Run Kafka cluster

fleetctl start kafka@{1..3}.service

##Run Cassandra cluster

fleetctl start cassandra@{1..3}.service

##Run Storm cluster

fleetctl start storm-nimbus.service
# storm-ui (not required) will listen on http://172.17.8.101:8080
fleetctl start storm-ui.service
fleetctl start storm-supervisor@{1..3}.service

##Run development container inside CoreOS VM (storm, kafka, maven, scala, python, zookeeper, cassandra, etc)

docker run --rm -ti -v /home/core/share:/root/share -e BROKER_LIST=`fleetctl list-machines -no-legend=true -fields=ip | sed 's/$/:9092/' | paste -s -d ','` -e NIMBUS_HOST=`etcdctl get /storm-nimbus` -e ZK=`fleetctl list-machines -no-legend=true -fields=ip | paste -s -d ','` mserrate/devel-env start-shell.sh bash
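To see what the `BROKER_LIST` and `ZK` values passed to the container look like, here is a small sketch of the same `sed` and `paste` pipeline. It assumes the three Vagrant IPs listed earlier; `machine_ips` is a hypothetical stand-in for the `fleetctl list-machines -no-legend=true -fields=ip` output, so the snippet runs without a cluster.

```shell
# Hypothetical stand-in for: fleetctl list-machines -no-legend=true -fields=ip
machine_ips() {
  printf '%s\n' 172.17.8.101 172.17.8.102 172.17.8.103
}

# Append the Kafka port to every IP, then join the lines with commas.
BROKER_LIST=$(machine_ips | sed 's/$/:9092/' | paste -s -d ',' -)

# The ZK value is the plain comma-separated IP list.
ZK=$(machine_ips | paste -s -d ',' -)

echo "BROKER_LIST=$BROKER_LIST"
echo "ZK=$ZK"
```

The trailing `-` tells `paste` to read standard input explicitly, which some non-GNU implementations require.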

###Test Kafka cluster

Run these commands in the development container to test your Kafka cluster.
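The commands in this section rely on `$KAFKA_HOME`, `$ZK` and `$BROKER_LIST` being set inside the container. A quick check like the following sketch can confirm that before you start; `check_env` is a hypothetical helper, not part of the image.

```shell
# check_env (hypothetical helper): report any named variable that is unset or empty.
check_env() {
  missing=""
  for name in "$@"; do
    # Indirectly read the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      missing="$missing $name"
    fi
  done
  missing=${missing# }
  if [ -n "$missing" ]; then
    echo "missing variables: $missing" >&2
    return 1
  fi
}

check_env KAFKA_HOME ZK BROKER_LIST || echo "environment incomplete, check the docker run command"
```

If a variable is reported as missing, the container was most likely started without the corresponding `-e` option.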

Create topic

$KAFKA_HOME/bin/kafka-topics.sh --create --topic test --partitions 3 --zookeeper $ZK --replication-factor 2

Show topic info

$KAFKA_HOME/bin/kafka-topics.sh --describe --topic test --zookeeper $ZK

Send some data to topic

$KAFKA_HOME/bin/kafka-console-producer.sh --topic test --broker-list="$BROKER_LIST"

Get some data from topic

$KAFKA_HOME/bin/kafka-console-consumer.sh --zookeeper $ZK --topic test --from-beginning

Remove topic (works only when the KAFKA_DELETE_TOPIC_ENABLE=true environment variable is set)

$KAFKA_HOME/bin/kafka-topics.sh --zookeeper $ZK --delete --topic test

###Cassandra

####Cassandra cluster CLI

cqlsh 172.17.8.101

####Cassandra queries

You can view and manage your Cassandra table's content with the following queries:

SELECT * FROM testkeyspace.meter_data;
-- Delete all content from the table
TRUNCATE testkeyspace.meter_data;
SELECT COUNT(*) FROM testkeyspace.meter_data LIMIT 1000000;

###Storm topology

Take a look at https://github.com/mserrate/twitter-streaming-app for a sample topology.


##Troubleshooting: To be able to use fleetctl ssh

#start the SSH agent by typing:
eval $(ssh-agent)
#add the private key to the agent:
ssh-add

#if it's vagrant:
ssh-add ~/.vagrant.d/insecure_private_key
vagrant ssh core-01 -- -A

To open a shell session in a running container:

#in this case container cassandra-1
sudo docker exec -i -t cassandra-1 bash

Restart unit:

sudo systemctl start [email protected]

To avoid typing your password each time Vagrant sets up shared folders:

#On MacOS
sudo visudo
#Place the following at the bottom of the file
Cmnd_Alias VAGRANT_EXPORTS_ADD = /usr/bin/tee -a /etc/exports
Cmnd_Alias VAGRANT_NFSD = /sbin/nfsd restart
Cmnd_Alias VAGRANT_EXPORTS_REMOVE = /usr/bin/sed -E -e /*/ d -ibak /etc/exports
%admin ALL=(root) NOPASSWD: VAGRANT_EXPORTS_ADD, VAGRANT_NFSD, VAGRANT_EXPORTS_REMOVE
