Analyze Data using Hadoop tools from within the R environment (e.g. MapReduce)
Documentation: http://saptarshiguha.github.com/RHIPE/
Binary downloads: http://ml.stat.purdue.edu/rhipebin/
Many! And better documentation to come soon. Very importantly, rhmr
has gone and is now replaced by rhwatch
.
Though the warning says ‘use the same arguments’, this is not true. Here is a quick conversion
The parameters ifolder
, ofolder
and inout
have gone.
- For sequence input and sequence output,
rhwatch(, input=path-to-input, output=path-to-output)
- For text input and sequence output,
rhwatch(, input=rhfmt(path-to-input,type
‘text’), output=path-to-output= - For text output, sequence input,
rhwatch(, input=path-to-input, output=rhfmt(path-to-output, type
‘text’)= - For the mapreduce equivalent of
lapply(1:N, F)
, dorhwatch(map=rhmap({ rhcollect(k, F(k) )}), input=N)
for documentation regarding these , ask the google groups mailing list.
And there is support for reading from HBase, writing to HBase too(experimental).