Kafka-Spark_Streaming

Scala program for Spark Streaming from Kafka topics

This is a Scala program that performs Spark Streaming from a Kafka topic using the KafkaUtils createStream API. It collects data every 5 seconds, and each RDD returned from KafkaUtils is saved as JSON files in an HDFS location. These files can then be exposed through Hive external tables so the data can be viewed as tables.
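The streaming job described above could look roughly like the following. This is a minimal sketch, not the repository's actual code. It assumes Spark 1.x/2.x with the `spark-streaming-kafka-0-8` artifact (which provides the receiver-based `KafkaUtils.createStream`), Kafka message values that are already JSON strings, and hypothetical ZooKeeper, consumer-group, topic, and HDFS values.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext, Time}
import org.apache.spark.streaming.kafka.KafkaUtils

// Minimal sketch, assuming spark-streaming-kafka-0-8 is on the classpath
// (it provides the receiver-based KafkaUtils.createStream API).
object KafkaToHdfsJson {
  // Hypothetical connection settings and output path; replace for your cluster.
  val ZkQuorum   = "zk-host:2181"
  val GroupId    = "spark-streaming-consumer"
  val Topic      = "bookings"
  val OutputBase = "hdfs:///data/kafka_json/batch"

  // Output directory for one micro-batch, e.g. ".../batch-1500000000000".
  def batchDir(base: String, batchTimeMs: Long): String = s"$base-$batchTimeMs"

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("Kafka-Spark_Streaming")
    // Collect data in 5-second micro-batches.
    val ssc = new StreamingContext(conf, Seconds(5))

    // Receiver-based stream of (key, message) pairs, one consumer thread for the topic.
    val messages = KafkaUtils.createStream(ssc, ZkQuorum, GroupId, Map(Topic -> 1))

    // Keep only the message values, assumed to already be JSON documents.
    val json = messages.map { case (_, value) => value }

    // Save each non-empty batch RDD as text files, one message per line;
    // empty batches are skipped so no empty directories are created.
    json.foreachRDD { (rdd: RDD[String], time: Time) =>
      if (!rdd.isEmpty()) {
        rdd.saveAsTextFile(batchDir(OutputBase, time.milliseconds))
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Because each batch lands in its own directory, a Hive external table pointed at the parent directory may need Hive's recursive-input settings (such as `hive.mapred.supports.subdirectories` and `mapreduce.input.fileinputformat.input.dir.recursive`) to see the files; writing batches into Hive partition directories is another option.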

If this data is to be fed to visualization tools, the Hive HCatalog JAR file has to be added manually in every session. Some visualization tools, such as Power BI, lack the "Initial SQL" feature that Tableau offers for running statements after a connection to Hive has been established through an ODBC connector.

To load custom or standard JAR files automatically whenever a session is opened, follow the instructions in the Cloudera documentation linked below.

https://www.cloudera.com/documentation/enterprise/5-6-x/topics/cm_mc_hive_udf.html#concept_nc3_mms_lr
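To make the HCatalog requirement concrete, here is a hypothetical Scala helper that generates the HiveQL a session would otherwise need to run: an `ADD JAR` for the HCatalog core JAR, followed by an external table over the JSON files using `org.apache.hive.hcatalog.data.JsonSerDe`, the SerDe class shipped in that JAR. The table name, columns, JAR path, and HDFS location are placeholders, not values from this repository.

```scala
// Hypothetical helper: builds HiveQL for exposing JSON files in HDFS as a Hive
// external table. JsonSerDe ships in the hive-hcatalog-core JAR, which is why
// that JAR must be loaded per session unless it is configured globally.
object HiveJsonTable {
  // Statement a client would otherwise have to run at the start of every session.
  def addJarStatement(hcatalogJarPath: String): String =
    s"ADD JAR $hcatalogJarPath"

  // External table DDL; columns are (name, Hive type) pairs.
  def createTableDdl(table: String, columns: Seq[(String, String)], location: String): String = {
    val cols = columns.map { case (name, tpe) => s"  `$name` $tpe" }.mkString(",\n")
    s"""CREATE EXTERNAL TABLE IF NOT EXISTS $table (
       |$cols
       |)
       |ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
       |STORED AS TEXTFILE
       |LOCATION '$location'""".stripMargin
  }

  def main(args: Array[String]): Unit = {
    // Placeholder path, table, columns, and location.
    println(addJarStatement("/path/to/hive-hcatalog-core.jar"))
    println(createTableDdl("events_json", Seq("id" -> "STRING", "count" -> "INT"), "hdfs:///data/kafka_json"))
  }
}
```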

NithK45/Kafka-Spark_Streaming
