#Vote_Counter
use example/graph.R
to draw graphs with the data generated by benchmark
''' data_generator.py zipf_argument(>=1) vote_number file_number file_zipf_distribution_argument file_prefix '''
''' bench_worker.py execution_times log_file custom_column costom_value arguments_pass_to_vote_counter
'''
inside are 3 measurements, try edit it and then you could get other measurements here.
make
vote_counter [-dupvote] [filename1] ... [filenameN]
params:
* -dupvote If someone votes more than once, all votes made by him/her will be invalid.
The bottle-neck of this program is, even you use multi-threads to do the counting, you need to merge the results generated by each thread, and that need mutex and shared memory. The easiest way is using only one lock for the entire shared memory. With this approach, when a thread is writing, all other threads need to wait until it finish writing, and that makes the performance of multi-thread version just like the single-thread one.
Hence, if the programmer could eliminate or at least reduce the waiting time, or collision, of each thread, there would be a great improvement in performance. This program does not use a single hash map to store all the vote results, instead, it breaks the entire results into small, unrealted hash maps, and use a simple hash function to map vote and the result chunk it belongs to.
A simple example is as follows:
- the input data is
1,2
2,3
3,4
4,5
- the hash-function is
vote & HASH_MASK
(this HASH_MASK is a magic number and can be adjusted by editingconstants.h
) - the result hashmap is an array with
HASH_NUM
hash map elements, each element stores a part of the result.
if there is 2 thread, one deal with 1,2
,2,22
and one deal with 3,4
,4,5
, then thread 1 will access hashmap_list[2]
and thread2 will access hashmap_list[4]
,hashmap_list[5]
. Because each element in hashmap list got a lock, so there will be less waiting time than previous implementation. According to the test, when the data-set is 7MB, the "hashmap_list" is twice as fast as the single hash map implementation.
In this program, we are dealing with the following errors:
- No input paramater
- Unsupported arguments
- Invalid file/directory
- Invalid data in the file (lacking of comma, or the int32 out-of-range)
- The GNU C Library Reference Manual
- POSIX Threads Programming
- C++ Reference
- Advanced Programming in Unix Environment, 2nd