Hagidoop

Hagidoop is a lightweight version of Apache Hadoop developed by Jules Aubry and Robin Augereau (myself) as part of our Computer Science Engineering Degree at École Nationale Supérieure d'Électrotechnique, d'Électronique, d'Informatique, d'Hydraulique et des Télécommunications (ENSEEIHT).

User Manual

The codes HdfsClient, HdfsServer, JobLauncher, Worker can be used independently.

For testing purposes, the scripts directory contains 2 scripts:

aio.sh: launches an instance of HdfsServer and WorkerImpl on each machine described in the configuration file located in src/config/main.cfg, after copying the Java project and compiling it remotely on one of the machines.
down.sh: stops the instances of WorkerImpl and HdfsServer on each machine and cleans the /tmp/data directory.

NB: These scripts only work with machines from ENSEEIHT for now.

The configuration file can be modified as follows:

The first line contains the names of the machines.
The second line contains the HDFS ports of the servers.
The third line contains the RMI ports of the workers.
The fourth line contains the size of the fragments in characters.

On each line, values must be separated by commas.

It is necessary to modify the name of the main machine in JobLauncher.java line 25 with the one from which you launch the usage scripts.

Performance Evaluations

To evaluate the performance of this data distribution system, we conducted a series of tests:

Variations in the number of machines

This figure shows the variation of the execution time of Hagidoop based on the number of machines used for fragment distribution. The size of the files and fragments was kept constant. It can be observed that distributing the workload across a greater number of computers results in faster execution. This confirms the relevance of the adage "divide and conquer."

Variations in file size

This graph represents the evolution of the size of the main files sent to HDFS. A linear trend is observable, which seems logical given that the execution time evolves linearly with the size of the files. The larger the file, the longer the execution time. The slope of the curve, calculated at 1.303, confirms this linear relationship between file size and execution time.

Variations in fragment size

This graphical representation examines the variation in the size of the fragments sent for processing to the machines. It can be observed that the larger the size of the fragments, the faster the execution. This can be interpreted by considering that increasing the size of the fragments leads to a decrease in the number of threads on the different machines, thereby freeing up more resources. However, it is interesting to note that simply creating more threads of smaller size does not necessarily accelerate execution. It is crucial to find an optimal balance between the number of threads and the size of the fragments.

Name		Name	Last commit message	Last commit date
Latest commit History 121 Commits
.idea		.idea
data		data
doc		doc
scripts		scripts
src		src
.classpath		.classpath
.gitignore		.gitignore
.project		.project
README.md		README.md
hagidoop.iml		hagidoop.iml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Hagidoop

User Manual

Performance Evaluations

Variations in the number of machines

Variations in file size

Variations in fragment size

About

Releases

Packages

Languages

SGDkklo/hagidoop

Folders and files

Latest commit

History

Repository files navigation

Hagidoop

User Manual

Performance Evaluations

Variations in the number of machines

Variations in file size

Variations in fragment size

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages