EclairJS Examples
The following examples use what is currently available in the EclairJS project. Consult the API Documentation for more information and frequent updates.
Counting how many times each word appears in a data source is one of the more basic uses of Spark, but it can be quite useful.
This first example is a basic "hello world" Spark example that counts the words in a static text file. It makes a one-time "batch" call to Spark: the file is read, the transformations are applied once, and the results are returned. EclairJS can also run in live "streaming" mode (see Streaming Example).
// Static file should be some file on your system
var file = "src/test/resources/dream.txt";
// Create an instance of a Spark configuration for your application
var conf = new SparkConf().setAppName("JavaScript word count")
    .setMaster("local[*]");
// Create your Spark context
var sparkContext = new SparkContext(conf);
// Load the file into your SparkContext (this creates an RDD)
var rdd = sparkContext.textFile(file).cache();
// Split each line into words (this creates an RDD of words)
var rdd2 = rdd.flatMap(function(sentence) {
    return sentence.split(" ");
});
// Filter out any empty strings
var rdd3 = rdd2.filter(function(word) {
    return word.trim().length > 0;
});
// Pair each word with an initial count of 1, which will be used to count
// the number of times it occurs
var rdd4 = rdd3.mapToPair(function(word) {
    return [word, 1];
});
// Combine duplicate words by adding their counts together
var rdd5 = rdd4.reduceByKey(function(a, b) {
    return a + b;
});
// Swap each pair to [count, word] so the results can be sorted by count
var rdd6 = rdd5.mapToPair(function(tuple) {
    return [tuple[1]+0.0, tuple[0]];
});
// Sort the results so those with the most occurrences float to the top
var rdd7 = rdd6.sortByKey(false);
// Print out the top 10 most occurring words found
print("top 10 words = " + rdd7.take(10));
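To see what each transformation contributes, the same pipeline can be sketched in plain JavaScript on an in-memory array. This sketch does not use Spark or EclairJS; the function name wordCount and the sample lines are made up for illustration, and the array methods only approximate what flatMap, filter, mapToPair, reduceByKey and sortByKey do on an RDD.

```javascript
// Plain-JavaScript sketch of the word count steps; no Spark involved.
// wordCount is a hypothetical helper, not part of EclairJS.
function wordCount(lines, n) {
    // flatMap: split every line into words and flatten into one array
    var words = lines.reduce(function(acc, sentence) {
        return acc.concat(sentence.split(" "));
    }, []);

    // filter: drop the empty strings produced by repeated spaces
    var nonEmpty = words.filter(function(word) {
        return word.trim().length > 0;
    });

    // mapToPair + reduceByKey: add 1 to the running count of each word
    var counts = Object.create(null);
    nonEmpty.forEach(function(word) {
        counts[word] = (counts[word] || 0) + 1;
    });

    // Swap to [count, word] and sort by count, highest first
    // (the role played by sortByKey(false) above)
    var sorted = Object.keys(counts).map(function(word) {
        return [counts[word], word];
    }).sort(function(a, b) {
        return b[0] - a[0];
    });

    // take(n): keep only the first n pairs
    return sorted.slice(0, n);
}

// Made-up sample input; note the double space in the second line
var sampleLines = ["I have a dream", "a dream  today", "I have a plan"];
console.log("top 3 words = " + JSON.stringify(wordCount(sampleLines, 3)));
```

With this sample input the most frequent word is "a" with a count of 3. The order of words that share the same count is not something the sketch tries to control.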
Running the above example in the Notebook environment requires some minor changes to the code. You can run the entire code in one Notebook cell or split it across several cells.
// Static file should be some file on your system
var file = "src/test/resources/dream.txt";
// Create your Spark context - no need for a configuration in the Notebook
var sparkContext = new SparkContext();
// Load the file into your SparkContext (this creates an RDD)
var rdd = sparkContext.textFile(file).cache();
// Split each line into words (this creates an RDD of words)
var rdd2 = rdd.flatMap(function(sentence) {
    return sentence.split(" ");
});
// Filter out any empty strings
var rdd3 = rdd2.filter(function(word) {
    return word.trim().length > 0;
});
// Pair each word with an initial count of 1, which will be used to count
// the number of times it occurs
var rdd4 = rdd3.mapToPair(function(word) {
    return [word, 1];
});
// Combine duplicate words by adding their counts together
var rdd5 = rdd4.reduceByKey(function(a, b) {
    return a + b;
});
// Swap each pair to [count, word] so the results can be sorted by count
var rdd6 = rdd5.mapToPair(function(tuple) {
    return [tuple[1]+0.0, tuple[0]];
});
// Sort the results so those with the most occurrences float to the top
var rdd7 = rdd6.sortByKey(false);
// Print out the top 10 most occurring words found.
// Currently print() writes to the console where you
// started the notebook, so we just evaluate the results instead:
JSON.stringify(rdd7.take(10))
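If the raw JSON string is hard to read, the pairs can be turned into one line per word before they are evaluated as the cell result. The following is a minimal sketch in plain JavaScript, assuming the result of take(10) behaves like an array of [count, word] pairs as the code above suggests; formatCounts and the sample values are hypothetical and not part of EclairJS.

```javascript
// Hypothetical sample shaped like the pairs produced above: [count, word].
// The values are made up for illustration.
var sample = [[3, "a"], [2, "dream"], [1, "today"]];

// formatCounts is an illustrative helper, not an EclairJS function.
// It builds one "word: count" line per pair.
function formatCounts(pairs) {
    return pairs.map(function(pair) {
        return pair[1] + ": " + pair[0];
    }).join("\n");
}

// As the last expression in a Notebook cell, this would be the cell result
formatCounts(sample);
```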