GitHub - chaithra-yenikapati/BigData-CASM

Branches Tags

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
Bombayrelated		Bombayrelated
London Related		London Related
correlations		correlations
nyrelated		nyrelated
src		src
readme.txt		readme.txt

Repository files navigation

The data of three stock exchanges is stores in three folders, London related, NY related and Bombay related. These folders have the input datasets, map reduce code we wrote for profiling, output we got after profiling, hive output and index values of the respective stock exchanges.

The complete source code of the project is in the src folder.

Output of the final analytic, the values of correlations is in the folder named correlations.

Steps of execution:

Profiling:

The input data of three cities should be given as input to respective profiling codes. after this step, we have three folders with filtered data.

Hive for computing market capitalization:

Run the code for hive, HiveCreateTable.java by giving the profiling output as input path to this file and add all the jars that are provided in the jars for hive folder. This creates tables in hive. Then run HiveWriteTable.java. This writes table data to files. Put all the files in a single directory.

Computing Index:

This directory is given as input to the next map reduce code that is in index calculation folder. The output part-r-00000 file contains index values of the stock market for each input day.

Correlation:
For Correlation, we used pearson’s correlation coefficient.
Put the index files of any given pair of stock exchanges in a directory and give this as input to correlation mapreduce job. Create a project with 5 files in the correlation folder to run this map reduce code. This comprises of 2 map reduce jobs. The output1 folder contains the values of correlation of those stock exchanges for each year in the dataset.