Working With Lambda Functions
Several Spark APIs take lambda (also called anonymous) functions as arguments, which are executed on the Spark worker nodes.
Let's say we want to filter a specific word out of an RDD, and that word can't be hardcoded in the lambda function. For this case, EclairJS adds an extra argument called bindArgs: an array of values that are appended to the lambda function's arguments:
var wordToFilterOut = 'foo';
var filteredRDD = rdd.filter(function(word, wordToFilterOut) {
  return word.trim() !== wordToFilterOut;
}, [wordToFilterOut]);
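Because the bindArgs values are simply appended to the lambda's argument list, more than one value can be bound at a time. The following is a minimal sketch (not from the original example) that binds two values; the names stopWord and minLength are hypothetical, and the RDD is assumed to contain plain strings:
// Bound values arrive in the same order as in the bindArgs array.
var stopWord = 'foo';
var minLength = 3;
var filteredRDD = rdd.filter(function(word, stopWord, minLength) {
  return word.trim() !== stopWord && word.trim().length >= minLength;
}, [stopWord, minLength]);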
As of EclairJS 0.4, if a lambda function needs to create an instance of a Spark class, that class must be passed in via the bindArgs array.
For example, in RDD.mapToPair, we need to create a Tuple2 to produce each pair, like this:
var eclairjs = require('eclairjs');
...
var pairRDD = rdd.mapToPair(function(word, Tuple2) {
  return new Tuple2(word.toLowerCase(), 1);
}, [eclairjs.Tuple2]);
This is done for performance reasons.
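As a follow-up sketch (not part of the original page), the resulting pairRDD can be fed into the usual pair-RDD operations, for example to build a word count. This assumes reduceByKey follows the standard Spark API shape and that collect() resolves asynchronously via a promise, as EclairJS client actions generally do:
// Hedged sketch: sum the counts per word, then collect the results.
var counts = pairRDD.reduceByKey(function(a, b) {
  return a + b;
});
counts.collect().then(function(results) {
  console.log(results);
});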