Luke Bakken edited this page Oct 5, 2016 · 1 revision

Map/Reducing Luwak File Data

Contributed By: Bryan Fink

Source File on GitHub

Description and Usage

The primary tool in this module is luwak_mr:file/3, a function that conforms to the interface for "dynamic map/reduce inputs." It lets you set up a map/reduce process that runs a computation across the blocks of a Luwak file.

To use the function via the Erlang client:

C:mapred({modfun, luwak_mr, file, <<"my_file_name">>},
         [... your query ...]).

Over HTTP, structure your JSON query like:

{"inputs":{"module":"luwak_mr",
           "function":"file",
           "arg":"my_file_name"},
 "query":[... your query ...]}

The luwak_mr:file/3 function will send an input to the map/reduce query for each block in the file. The "KeyData" for the block will be its offset in the file. As a trivial example, you might use this to get an ordered list of the first byte of each block like so:

F = fun(B, O, _) ->
       <<Y, _/binary>> = luwak_block:data(B),
       [{Y, O}]
    end,
{ok, Bytes} = C:mapred({modfun,luwak_mr,file,<<"name">>},
                       [{map, {qfun, F}, none, true}]),
OrderedBytes = lists:keysort(2, Bytes),
[ Y || {Y, _} <- OrderedBytes ].
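
The same pattern extends to a reduce phase. As a minimal sketch (not part of luwak_mr itself), the module below computes a file's total size by emitting each block's byte count in the map phase and summing the counts in a reduce phase. The module name luwak_size_example and its function names are hypothetical. The sketch assumes, as in the example above, that luwak_block:data/1 returns the block's binary, and that the compiled module is on the code path of every Riak node.

```erlang
-module(luwak_size_example).
-export([map_block_size/3, reduce_sum/2, phases/0]).

%% Map phase: receives a Luwak block, its offset (the KeyData), and the
%% phase argument. Emits the number of bytes stored in the block.
map_block_size(Block, _Offset, _Arg) ->
    [byte_size(luwak_block:data(Block))].

%% Reduce phase: sums the sizes seen so far. Returning a one-element list
%% keeps the function safe to re-run on its own partial results.
reduce_sum(Sizes, _Arg) ->
    [lists:sum(Sizes)].

%% Query phases to pass as the second argument of C:mapred/2.
%% Only the reduce phase keeps its output.
phases() ->
    [{map, {modfun, ?MODULE, map_block_size}, none, false},
     {reduce, {modfun, ?MODULE, reduce_sum}, none, true}].
```

With this module loaded, a call such as C:mapred({modfun, luwak_mr, file, <<"my_file_name">>}, luwak_size_example:phases()) should return {ok, [TotalBytes]}. Named module functions are used here instead of qfun closures so that the reduce logic can be checked without a running cluster.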

Installation

Before using the luwak_mr module, you'll need to build it, and add it to the code path on your Riak nodes.

To build luwak_mr with Rebar, add the module to your project, and then tell Rebar to add Luwak's include path to the Erlang compiler options. In rebar.config, add:

{erl_opts, [{i, "/path/to/riak/lib/luwak-1.0.0/include"}]}.

To build luwak_mr with straight erlc commands, add luwak's include path with the -I option:

erlc -I/path/to/riak/lib/luwak-1.0.0/include luwak_mr.erl

Once luwak_mr.beam is built, add the directory it's in to Riak's code path. Edit Riak's app.config, find the riak_kv section, and add an add_paths parameter. For example, if you compiled luwak_mr.beam to /foo/luwak_mr/ebin/luwak_mr.beam, then your app.config should read:

{riak_kv, [
  %% ... other options ...
  {add_paths, ["/foo/luwak_mr/ebin"]}
]}

Or, if your Riak nodes are already running, connect to each node's console, and execute the following:

code:add_path("/foo/luwak_mr/ebin").
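
Either way, it can be worth confirming that a node can actually load the module before running a query. The following is a minimal sketch built on the standard code:ensure_loaded/1 call. The helper module name luwak_mr_check is hypothetical and not part of Riak or Luwak.

```erlang
-module(luwak_mr_check).
-export([available/1]).

%% Returns true if the given module is loaded or can be loaded
%% from the current code path on this node, false otherwise.
available(Module) ->
    case code:ensure_loaded(Module) of
        {module, Module} -> true;
        {error, _Reason} -> false
    end.
```

From a node's console, calling code:ensure_loaded(luwak_mr) directly gives the same information. The wrapper is only a convenience for scripted checks across several nodes.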