Skip to content

A library to analyze text in relation to well-known novels from Project Gutenberg.

License

Notifications You must be signed in to change notification settings

anlandu/conText

Repository files navigation

HistText

Objective

A library dedicated to downloading, analyzing, and comparing well-known texts on Project Gutenberg using natural language processing.

Use

pip install git+https://github.com/anlandu/histtext

Run scripts/GetGut.py to get the top 100 books on Project Gutenberg from the past 30 days as text files.

Run scripts/GetBookFromURL.py to get the text file of a specific book from Project Gutenberg. Takes string as command-line argument: the URL of the root directory of the book. e.g. http://www.gutenberg.org/ebooks/4300

Import src.AvgWordLen, src.AvgSentLen, or src.FreqPOS to use their methods. AvgWordLen enables you to get the average word length of a given file, analyze all files in the resources folder, and/or compare all stored word lengths to an inputted one. AvgSentLen does the same for sentence lengths. FreqPOS counts the number of times each part of speech is used in a text.

(Note that these three files assume the top 100 books are already downloaded using GetGut.py; alternatively, you can simply download your own texts of choice into the resources folder.)

Contributing

Contributions are more than welcome! Please see the CONTRIBUTING.md to learn how to contribute. Check out issues for some starting points!

LICENSE

This project is licensed under the MIT License.

About

A library to analyze text in relation to well-known novels from Project Gutenberg.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published