TextBench

A simple text-processing benchmark that compares the speeds of Julia, Go, C, and Python, in particular to report on this Julia issue. In all cases, the programs read in text8 and print the words that occur at least 10 times, in order of number of occurrences.
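The task every program performs can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not one of the benchmarked implementations: the function names, the `word count` output format, and the alphabetical tie-breaking are assumptions, and the repository's programs may tokenize, sort, or print differently.

```python
import sys
from collections import Counter

MIN_COUNT = 10  # threshold from the README: keep words seen at least 10 times


def count_words(lines):
    """Count whitespace-separated tokens across an iterable of lines."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def vocab(lines, min_count=MIN_COUNT):
    """Return (word, count) pairs with count >= min_count, most frequent first."""
    counts = count_words(lines)
    kept = [(w, c) for w, c in counts.items() if c >= min_count]
    # Descending count; ties broken alphabetically (an assumption) for determinism.
    kept.sort(key=lambda wc: (-wc[1], wc[0]))
    return kept


if __name__ == "__main__" and len(sys.argv) == 2:
    # Hypothetical usage: python vocab_sketch.py text8
    with open(sys.argv[1]) as f:
        for word, count in vocab(f):
            print(word, count)
```

Note that iterating over a file line by line means the single-line text8 arrives as one huge string, which is exactly the case the two test variants are meant to expose.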

The C code is taken from the GloVe project and is covered by the Apache License, a copy of which is included in the C code.

The example corpus is Matt Mahoney's text8 corpus [more info].

As downloaded, text8 is a single line, so I run two versions of each test: one on the original text8 and one on a copy reformatted with fmt so that the text is broken into lines of reasonable width. Results from my machine are available in results.txt.
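The line-broken input was made with the Unix fmt utility. As a rough stand-in where fmt is unavailable, the sketch below greedily packs words into lines of at most 75 columns; the width, the greedy packing, and the function names are assumptions, and real fmt applies its own line-breaking rules, so the output will not match byte for byte. It reads the input in fixed-size chunks so the single-line original never has to be held as one string.

```python
import sys

WIDTH = 75  # assumed target width; the README only says "reasonable line widths"


def iter_words(f, chunk_size=1 << 16):
    """Yield whitespace-separated words from a file read in fixed-size chunks."""
    tail = ""
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            break
        chunk = tail + chunk
        parts = chunk.split()
        # If the chunk does not end in whitespace, its last token may continue
        # in the next chunk, so hold it back and prepend it next time.
        if parts and not chunk[-1].isspace():
            tail = parts.pop()
        else:
            tail = ""
        yield from parts
    if tail:
        yield tail


def rewrap(words, width=WIDTH):
    """Greedily pack words into lines of at most `width` characters.

    A single word longer than `width` is emitted on a line of its own.
    """
    line, length = [], 0
    for word in words:
        # Length the current line would have after appending this word.
        added = len(word) if not line else length + 1 + len(word)
        if line and added > width:
            yield " ".join(line)
            line, length = [word], len(word)
        else:
            line.append(word)
            length = added
    if line:
        yield " ".join(line)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python rewrap_sketch.py text8 text8_wrapped
    with open(sys.argv[1]) as src, open(sys.argv[2], "w") as dst:
        for out_line in rewrap(iter_words(src)):
            dst.write(out_line + "\n")
```

Rewrapping only changes where the newlines fall, so the word sequence, and therefore the vocabulary counts, should be identical for both inputs.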

To run the benchmarks, it should be enough to issue make.

The results in results.txt were generated on a quad core x86 machine running Ubuntu 13.04.

Thanks to @remusao for the Haskell, C++, and Python 3 versions, as well as general improvements.
