Name: Sahil Nagpal
Email: [email protected]
python word_count.py <predefined words file path> <sample text file path>
python word_count_parallel.py <predefined words path> <sample file path> <number of processes>
- Used the Google 10000 English Words List for the predefined words (words.txt).
- Used the Kaggle Random Sentences Dataset for the sample text (sample.txt), repeating its 724 sentences to create a file size of 20 MB.
- Executed the Python code (both serial and parallel) on these files to get the word count.
- Compared the resulting word count with "Find" results in Sublime Text.
- Compared the results of the different execution methods by exporting each run to a text file and checking that the outputs match.
- Benchmarked the performance of the code by measuring the execution time with the time module.
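A minimal sketch of how that timing measurement could look with the time module; the benchmark helper and the count_words function it wraps are illustrative assumptions, not the actual names in the scripts:

    import time

    def benchmark(func, *args):
        """Run func, report its wall-clock execution time, and return its result."""
        start = time.perf_counter()
        result = func(*args)
        elapsed = time.perf_counter() - start
        print(f"{func.__name__} finished in {elapsed:.3f} s")
        return result

    # Hypothetical usage: count_words is assumed to accept the two file paths.
    # counts = benchmark(count_words, "words.txt", "sample.txt")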
Serial execution performs better due to the small size of the inputs. The cost of aggregating the per-process dictionaries and the overhead of spawning processes cause the parallel method to perform poorly. However, with larger inputs, parallel execution is expected to perform better.
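For context, a minimal sketch of one way this chunk-and-merge approach could look with multiprocessing; the function names, line-based chunking, and lowercase matching are illustrative assumptions, not the actual implementation in word_count_parallel.py:

    import re
    from collections import Counter
    from multiprocessing import Pool

    def count_chunk(args):
        """Count occurrences of the predefined words within one chunk of lines."""
        lines, predefined = args
        tokens = re.findall(r"[A-Za-z]+", " ".join(lines).lower())
        return Counter(t for t in tokens if t in predefined)

    def count_parallel(lines, predefined, processes):
        """Split the input lines across worker processes and merge the partial counts."""
        size = max(1, len(lines) // processes)
        chunks = [(lines[i:i + size], predefined) for i in range(0, len(lines), size)]
        with Pool(processes) as pool:
            partial_counts = pool.map(count_chunk, chunks)
        total = Counter()
        for partial in partial_counts:  # dictionary aggregation step mentioned above
            total.update(partial)
        return total

    if __name__ == "__main__":
        predefined = {"the", "quick", "fox"}  # placeholder word set
        sample = ["The quick brown fox", "jumps over the lazy dog"] * 1000
        print(count_parallel(sample, predefined, processes=4).most_common(3))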
- The predefined words are proper English words containing only alphabetic characters.
- Each line in the predefined words file contains only one word.
- The pandas library is installed (it is used to display the final word count; a small example follows this list).
- The final result prints the words in the same casing as the predefined words file.
- The sample text file does not contain contractions (e.g., "it's" for "it is").
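As an illustration of the display step, a small pandas sketch with placeholder counts; the real dictionary comes from the counting scripts:

    import pandas as pd

    # Placeholder result: each predefined word, in its original casing,
    # mapped to its frequency in the sample text.
    counts = {"the": 1024, "project": 87, "data": 42}

    df = pd.DataFrame(sorted(counts.items()), columns=["Word", "Count"])
    print(df.to_string(index=False))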
- Use MapReduce for larger files.
- Include contractions by modifying the regular expression to handle single quotes.
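One possible way to extend the tokenizing regular expression, assuming the current pattern matches plain alphabetic runs (the exact pattern used in the scripts may differ):

    import re

    text = "It's a sample sentence, and the regex shouldn't split contractions."

    # Alphabetic runs only: "It's" is split into "It" and "s".
    print(re.findall(r"[A-Za-z]+", text))

    # Allowing one internal apostrophe keeps "It's" and "shouldn't" intact.
    print(re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text))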