Rosetta Code

Word Count

Pig

tweets = LOAD 'tweets.tsv' AS (text:chararray);
words = FOREACH tweets GENERATE FLATTEN(TOKENIZE(text)) AS word;
word_groups = GROUP words BY word;
word_counts = FOREACH word_groups GENERATE COUNT(words) AS count, group AS word;

STORE word_counts INTO 'word_counts.tsv';

Scalding

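import com.twitter.scalding._
// WordCountJob is an illustrative name: split each line on whitespace, then count rows per word.
class WordCountJob(args: Args) extends Job(args) {
  Tsv("tweets.tsv", 'text)
    .flatMap('text -> 'word) { line : String => line.split("\\s+") }
    .groupBy('word) { _.size }
    .write(Tsv("word_counts.tsv"))
}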
