
Working With Lambda Functions

Doron Rosenberg edited this page Aug 5, 2016 · 3 revisions

Several Spark APIs take lambda (also called anonymous) functions as arguments. These functions are executed on the Spark worker nodes.
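The sketch below is a simplified local model of why a plain closure is not enough. It assumes that the Lambda reaches the worker only as its source text and is rebuilt there. `shipToWorker` and `makeLambda` are hypothetical helpers for illustration, not EclairJS code. The rebuilt copy no longer sees the variable that the original function captured on the driver.

```javascript
// Hypothetical stand-in for shipping a lambda to a worker (not EclairJS code).
// Assumption: only the function's source text travels, so the copy is
// rebuilt in the global scope and loses its closure.
function shipToWorker(fn) {
  return new Function('return (' + fn.toString() + ');')();
}

function makeLambda() {
  const wordToFilterOut = 'foo'; // captured by a closure on the driver
  return function (word) {
    return word.trim() !== wordToFilterOut;
  };
}

const local = makeLambda();
console.log(local(' foo ')); // false: the closure still holds 'foo'

const remote = shipToWorker(local);
try {
  remote(' foo ');
} catch (e) {
  // The rebuilt function has no wordToFilterOut in scope.
  console.log(e instanceof ReferenceError); // true
}
```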

Passing variables to Lambdas

Let's say we want to filter a specific word out of an RDD, but that word can't be hardcoded in the Lambda function. For this case, EclairJS adds an extra argument called bindArgs. bindArgs is an array of values that are appended to the arguments passed to the Lambda function:

var wordToFilterOut = 'foo';

// The second parameter is filled from the first element of bindArgs.
var filteredRDD = rdd.filter(function(word, wordToFilterOut) {
  return word.trim() !== wordToFilterOut;
}, [wordToFilterOut]);
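The following is a local sketch of the bindArgs behaviour described above. It rests on the simplifying assumption that the worker rebuilds the Lambda from its source text and then calls it with the element followed by the bindArgs values. `simulateFilter` is a hypothetical helper, and a plain array stands in for the RDD. Neither is part of EclairJS.

```javascript
// Hypothetical helper, not the EclairJS implementation: rebuild the lambda
// from its source (no closure survives), then call it with the element
// followed by the bindArgs values.
function simulateFilter(elements, fn, bindArgs) {
  const rebuilt = new Function('return (' + fn.toString() + ');')();
  // Wrap the call so Array#filter's (index, array) arguments are not passed on.
  return elements.filter((el) => rebuilt(el, ...bindArgs));
}

const wordToFilterOut = 'foo';
const words = ['foo', ' bar', 'baz ', 'foo '];

const filtered = simulateFilter(words, function (word, wordToFilterOut) {
  return word.trim() !== wordToFilterOut;
}, [wordToFilterOut]);

console.log(filtered); // [ ' bar', 'baz ' ]
```

Under this assumption, the bound value arrives as an ordinary argument. Because of that, it is available inside the rebuilt function even though the outer variable is not.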

Creating Spark class instances in Lambdas

As of version 0.4 of EclairJS, if a Lambda function needs to create an instance of a Spark class, that class must be passed in via the bindArgs array.

For example, the Lambda passed to RDD.mapToPair must return a Tuple2 to produce a pair, so Tuple2 is passed in through bindArgs:

var eclairjs = require('eclairjs');

// ...

// Tuple2 is received from bindArgs rather than referenced from the outer scope.
var pairRDD = rdd.mapToPair(function(word, Tuple2) {
  return new Tuple2(word.toLowerCase(), 1);
}, [eclairjs.Tuple2]);
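This local sketch shows a class arriving through bindArgs, like the mapToPair example above. It assumes the worker rebuilds the Lambda from its source text. The `Tuple2` class here is a hypothetical mock so the code runs without EclairJS, and its `_1`/`_2` fields are illustrative rather than the EclairJS API. `simulateMapToPair` is likewise a hypothetical helper.

```javascript
// Hypothetical mock of eclairjs.Tuple2, used only so this runs locally.
class Tuple2 {
  constructor(first, second) {
    this._1 = first;
    this._2 = second;
  }
}

// Hypothetical helper, not EclairJS code: rebuild the lambda from its source
// text, then call it with each element followed by the bindArgs values.
function simulateMapToPair(elements, fn, bindArgs) {
  const rebuilt = new Function('return (' + fn.toString() + ');')();
  return elements.map((el) => rebuilt(el, ...bindArgs));
}

const pairs = simulateMapToPair(['Spark', 'spark', 'JS'], function (word, Tuple2) {
  // Tuple2 is the parameter filled from bindArgs, not a closure variable.
  return new Tuple2(word.toLowerCase(), 1);
}, [Tuple2]);

console.log(pairs.map((t) => [t._1, t._2])); // [ [ 'spark', 1 ], [ 'spark', 1 ], [ 'js', 1 ] ]
```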

EclairJS requires Spark classes to be passed this way for performance reasons.