This R package enables the R user to perform common data manipulation operations, as found in popular packages such as `plyr` and `reshape2`, on very large data sets stored on Hadoop. Like rmr, it relies on Hadoop mapreduce to perform its tasks, but it provides a familiar plyr-like interface while hiding many of the mapreduce details. `plyrmr` provides:
- Hadoop-capable equivalents of well known data.frame functions: `transmute` and `bind.cols` generalize over `transform` and `summarize`; `select` from `dplyr`; `melt` and `dcast` from `reshape2`; sampling, quantiles, counting and more.
- Simple but powerful ways of applying many functions operating on data frames to Hadoop data sets: `gapply` and `magic.wand`.
- Simple but powerful ways to group data: `group`, `group.f`, `gather` and `ungroup`.
- All of the above can be combined by normal functional composition: delayed evaluation helps mitigate any performance penalty of doing so by minimizing the number of Hadoop jobs launched to evaluate an expression.
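As a rough illustration of composing these operations, here is a minimal sketch. It assumes rmr2's local backend, a hypothetical path `/tmp/mtcars`, and that `input` and `as.data.frame` behave as described in the package tutorial. `group` and `transmute` are the functions listed above. Treat it as a sketch under those assumptions and not as the reference usage.

```r
library(rmr2)
library(plyrmr)

# Assumption: use rmr2's local backend so the sketch can be tried without a cluster.
rmr.options(backend = "local")

# Assumption: "/tmp/mtcars" is a hypothetical location chosen for illustration.
to.dfs(mtcars, output = "/tmp/mtcars")

# Wrap the stored data set, group it by number of cylinders,
# and compute the mean mpg within each group.
big.mtcars <- input("/tmp/mtcars")
mean.mpg.by.cyl <- transmute(group(big.mtcars, cyl), mean.mpg = mean(mpg))

# Requesting an in-memory data frame is what asks for the result to be computed.
as.data.frame(mean.mpg.by.cyl)
```

The grouping and the summary are written as ordinary nested function calls. With delayed evaluation, the intent is that they are evaluated together when the result is requested, so composing them need not launch a separate Hadoop job for each step.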
The current version has a major release number of zero (0.x.y). As the numbering suggests, the package should be considered a work in progress and the API is not cast in stone yet. We seek feedback at an early stage to drive further development. This package has a GitHub repo; please feel free to open an issue there to discuss problems, existing or missing features and whatnot (anything that requires an answer from the developers). For general discussions, head to the RHadoop forum.
- rmr 3.2.0 or higher.
- `plyrmr` installed on each node of a Hadoop cluster together with its dependencies (see the DESCRIPTION file, `depends:` and `imports:` lines). The package `memoise` requires special instructions. First load the package `devtools`. For memoise, issue this command at the R prompt: `install_github("RevolutionAnalytics/memoise")`. The reason is that its maintainer, the excellent @hadley, would not accept our pull request, for no particular reason, nor does he plan to submit to CRAN in the foreseeable future. Hence we were forced into a, hopefully temporary, fork.
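The memoise steps above can be sketched as a short R session. This assumes `devtools` is installed from CRAN when it is not already present; that step is not part of the original instructions.

```r
# Assumption: install devtools from CRAN if it is not already available.
if (!requireNamespace("devtools", quietly = TRUE)) {
  install.packages("devtools")
}

# Load devtools, which provides install_github().
library(devtools)

# Install the forked memoise package named in the instructions above.
install_github("RevolutionAnalytics/memoise")
```

Since the package and its dependencies are needed on each node of the cluster, these commands would be repeated on every node.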
To download `plyrmr` see Releases.
- A Tutorial