Skip to content

pivie/voting

 
 

Repository files navigation

#Vote_Counter

Rscript Usage

use example/graph.R to draw graphs with the data generated by benchmark

Benchmark Usage

data_generator.py

''' data_generator.py zipf_argument(>=1) vote_number file_number file_zipf_distribution_argument file_prefix '''

bench_worker.py

''' bench_worker.py execution_times log_file custom_column costom_value arguments_pass_to_vote_counter

'''

benchmark.py

inside are 3 measurements, try edit it and then you could get other measurements here.

Compile

	make

Usage

	vote_counter [-dupvote] [filename1] ... [filenameN]

params:

* -dupvote	If someone votes more than once, all votes made by him/her will be invalid.

Design & Details

The bottle-neck of this program is, even you use multi-threads to do the counting, you need to merge the results generated by each thread, and that need mutex and shared memory. The easiest way is using only one lock for the entire shared memory. With this approach, when a thread is writing, all other threads need to wait until it finish writing, and that makes the performance of multi-thread version just like the single-thread one.

Hence, if the programmer could eliminate or at least reduce the waiting time, or collision, of each thread, there would be a great improvement in performance. This program does not use a single hash map to store all the vote results, instead, it breaks the entire results into small, unrealted hash maps, and use a simple hash function to map vote and the result chunk it belongs to.

A simple example is as follows:

  • the input data is
	1,2
	2,3
	3,4
	4,5
  • the hash-function is vote & HASH_MASK (this HASH_MASK is a magic number and can be adjusted by editing constants.h)
  • the result hashmap is an array with HASH_NUM hash map elements, each element stores a part of the result.

if there is 2 thread, one deal with 1,2,2,22 and one deal with 3,4,4,5, then thread 1 will access hashmap_list[2] and thread2 will access hashmap_list[4],hashmap_list[5]. Because each element in hashmap list got a lock, so there will be less waiting time than previous implementation. According to the test, when the data-set is 7MB, the "hashmap_list" is twice as fast as the single hash map implementation.

Malformed Input Detection

In this program, we are dealing with the following errors:

  • No input paramater
  • Unsupported arguments
  • Invalid file/directory
  • Invalid data in the file (lacking of comma, or the int32 out-of-range)

Libraries

Reference

Releases

No releases published

Packages

No packages published

Languages

  • C 86.7%
  • Python 6.5%
  • R 6.2%
  • C++ 0.6%