EclairJS Examples

jtbarbetta edited this page Nov 12, 2015 · 3 revisions

The following examples use what is currently available in the EclairJS project. Consult the API Documentation for more information; it is updated frequently.

Wordcount Example

Counting how many times each word appears in a data source is one of the more basic uses of Spark, but it can still be quite useful.

This first example is the basic "hello world" of Spark: it counts the words in a static text file. It makes a one-time "batch" call to Spark, in which the file is read, the transformations are applied once, and the results are returned. EclairJS can also run in live "streaming" mode (see Streaming Example).

// Path to a static text file on your system
var file = "src/test/resources/dream.txt";
// Create an instance of a Spark configuration for your application
var conf = new SparkConf().setAppName("JavaScript word count")
                          .setMaster("local[*]");
// Create your Spark context
var sparkContext = new SparkContext(conf);
// Load the file into your SparkContext (this creates an RDD)
var rdd = sparkContext.textFile(file).cache();
// Split each line into its individual words
var rdd2 = rdd.flatMap(function(sentence) {
    return sentence.split(" ");
});
// Filter out any empty strings
var rdd3 = rdd2.filter(function(word) {
    return word.trim().length > 0;
});
// Pair each word with an initial count of 1; the counts are
// summed per word in the next step
var rdd4 = rdd3.mapToPair(function(word) {
    return [word, 1];
});
// Merge duplicate words by adding their counts together
var rdd5 = rdd4.reduceByKey(function(a, b) {
    return a + b;
});
// Swap each pair to [count, word] so the results can be sorted by count
var rdd6 = rdd5.mapToPair(function(tuple) {
    return [tuple[1]+0.0, tuple[0]];
});
// Sort the results so those with the most occurrences float to the top
var rdd7 = rdd6.sortByKey(false);
// Print out the top 10 most occurring words found
print("top 10 words = " + rdd7.take(10));
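
To see what each transformation contributes without a running Spark instance, the same steps can be mimicked on an in-memory array in plain JavaScript. This is only a sketch: countWords, lines, and topN are hypothetical names that are not part of the EclairJS API, and ordinary array methods stand in for the RDD operations.

```javascript
// Plain-JavaScript imitation of the word-count steps (no Spark involved).
// countWords, lines and topN are hypothetical names used for illustration.
function countWords(lines, topN) {
    // Stand-in for flatMap: split every line into words
    var words = [];
    lines.forEach(function(sentence) {
        words = words.concat(sentence.split(" "));
    });
    // Stand-in for filter: drop empty strings
    words = words.filter(function(word) {
        return word.trim().length > 0;
    });
    // Stand-in for mapToPair + reduceByKey: add 1 for each occurrence.
    // Object.create(null) avoids clashes with inherited property names.
    var counts = Object.create(null);
    words.forEach(function(word) {
        counts[word] = (counts[word] || 0) + 1;
    });
    // Stand-in for the second mapToPair: swap to [count, word]
    var pairs = Object.keys(counts).map(function(word) {
        return [counts[word], word];
    });
    // Stand-in for sortByKey(false): highest count first
    pairs.sort(function(a, b) {
        return b[0] - a[0];
    });
    // Stand-in for take(n)
    return pairs.slice(0, topN);
}

var sample = ["dream big dream", "dream big", "today"];
var top = countWords(sample, 2);
console.log(JSON.stringify(top)); // prints [[3,"dream"],[2,"big"]]
```

Unlike this sketch, the Spark version distributes each step across the cluster and evaluates the transformations lazily, only when an action such as take() is called.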

Wordcount Example (Jupyter Notebook)

Running the above example in the Notebook environment requires some minor changes to the code. You can run all of the code in one Notebook cell or split it across several cells.

// Path to a static text file on your system
var file = "src/test/resources/dream.txt";
// Create your Spark context - no need for a configuration in the Notebook
var sparkContext = new SparkContext();
// Load the file into your SparkContext (this creates an RDD)
var rdd = sparkContext.textFile(file).cache();
// Split each line into its individual words
var rdd2 = rdd.flatMap(function(sentence) {
    return sentence.split(" ");
});
// Filter out any empty strings
var rdd3 = rdd2.filter(function(word) {
    return word.trim().length > 0;
});
// Pair each word with an initial count of 1; the counts are
// summed per word in the next step
var rdd4 = rdd3.mapToPair(function(word) {
    return [word, 1];
});
// Merge duplicate words by adding their counts together
var rdd5 = rdd4.reduceByKey(function(a, b) {
    return a + b;
});
// Swap each pair to [count, word] so the results can be sorted by count
var rdd6 = rdd5.mapToPair(function(tuple) {
    return [tuple[1]+0.0, tuple[0]];
});
// Sort the results so those with the most occurrences float to the top
var rdd7 = rdd6.sortByKey(false);
// Show the top 10 most occurring words found.
// Currently print() writes to the console where you
// started the notebook, so we just evaluate the result instead:
JSON.stringify(rdd7.take(10))
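
The string produced by JSON.stringify is compact but not especially readable. As a rough sketch, assuming take() hands back a plain array of [count, word] pairs (an assumption about the returned shape, not something confirmed here), the pairs could be turned into one line per word. The topWords array below is made-up sample data standing in for the real result.

```javascript
// Hypothetical stand-in for the value returned by rdd7.take(10)
var topWords = [[3, "dream"], [2, "big"], [1, "today"]];

// Compact form, as in the notebook cell above
var compact = JSON.stringify(topWords);

// One "word: count" line per pair, easier to scan in cell output
var readable = topWords.map(function(pair) {
    return pair[1] + ": " + pair[0];
}).join("\n");

readable;
```

As with the JSON.stringify call, ending the cell with the bare expression lets the notebook display its value.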

Streaming Example

Coming Soon!