diff --git a/NEWS.md b/NEWS.md
index 55a7340090..659ebbf119 100644
--- a/NEWS.md
+++ b/NEWS.md
@@ -5,7 +5,8 @@
 * the rules for transformations passed to `select`/`select!`, `transform`/`transform!`,
   and `combine` have been made more flexible; in particular now it is allowed to
   return multiple columns from a transformation function
-  [#2461](https://github.com/JuliaData/DataFrames.jl/pull/2461)
+  ([#2461](https://github.com/JuliaData/DataFrames.jl/pull/2461) and
+  [#2481](https://github.com/JuliaData/DataFrames.jl/pull/2481))
 * CategoricalArrays.jl is no longer reexported: call `using CategoricalArrays`
   to use it [#2404]((https://github.com/JuliaData/DataFrames.jl/pull/2404)).
   In the same vein, the `categorical` and `categorical!` functions
diff --git a/docs/src/man/split_apply_combine.md b/docs/src/man/split_apply_combine.md
index 3bdc94bf62..e7a129009f 100644
--- a/docs/src/man/split_apply_combine.md
+++ b/docs/src/man/split_apply_combine.md
@@ -1,13 +1,29 @@
 # The Split-Apply-Combine Strategy
 
-Many data analysis tasks involve splitting a data set into groups, applying some
-functions to each of the groups and then combining the results. A standardized
-framework for handling this sort of computation is described in the paper
-"[The Split-Apply-Combine Strategy for Data Analysis](http://www.jstatsoft.org/v40/i01)",
-written by Hadley Wickham.
+Many data analysis tasks involve three steps:
+1. splitting a data set into groups,
+2. applying some functions to each of the groups,
+3. combining the results.
+
+Note that steps 1 and 3 of this general procedure can be dropped,
+in which case we just transform a data frame without grouping it and later
+combining the result.
+
+A standardized framework for handling this sort of computation is described in
+the paper "[The Split-Apply-Combine Strategy for Data
+Analysis](http://www.jstatsoft.org/v40/i01)", written by Hadley Wickham.
 
 The DataFrames package supports the split-apply-combine strategy through the
-`groupby` function followed by `combine`, `select`/`select!` or `transform`/`transform!`.
+`groupby` function that creates a `GroupedDataFrame`,
+followed by `combine`, `select`/`select!` or `transform`/`transform!`.
+
+All operations described in this section of the manual are supported both for
+`AbstractDataFrame` (when split and combine steps are skipped) and
+`GroupedDataFrame`. Technically, `AbstractDataFrame` is just considered as being
+grouped on no columns (meaning it has a single group, or zero groups if it is
+empty). The only difference is that in this case the `keepkeys` and `ungroup`
+keyword arguments (described below) are not supported and a data frame is always
+returned, as there are no split and combine steps in this case.
 
 In order to perform operations by groups you first need to create a `GroupedDataFrame`
 object from your data frame using the `groupby` function that takes two arguments:
@@ -26,59 +42,107 @@ Operations can then be applied on each group using one of the following function
 
 All these functions take a specification of one or more functions to apply to
 each subset of the `DataFrame`. This specification can be of the following forms:
-1. standard column selectors (integers, symbols, vectors of integers, vectors of symbols,
+1. standard column selectors (integers, `Symbol`s, strings, vectors of integers,
+   vectors of `Symbol`s, vectors of strings,
    `All`, `Cols`, `:`, `Between`, `Not` and regular expressions)
 2. a `cols => function` pair indicating that `function` should be called with
-   positional arguments holding columns `cols`, which can be a any valid column selector
-3. a `cols => function => target_col` form additionally
-   specifying the name of the target column (this assumes that `function` returns a single
-   value or a vector)
-4. a `col => target_col` pair, which renames the column `col` to `target_col`
-5. a `nrow` or `nrow => target_col` form which efficiently computes the number of rows
-   in a group (without `target_col` the new column is called `:nrow`)
-6. several arguments of the forms given above, or vectors thereof
-7. a function which will be called with a `SubDataFrame` corresponding to each group;
+   positional arguments holding columns `cols`, which can be any valid column selector;
+   in this case the target column name is automatically generated and it is assumed that
+   `function` returns a single value or a vector; the generated name is created by
+   concatenating the source column name and `function` name by default (see examples below).
+3. a `cols => function => target_cols` form additionally explicitly specifying
+   the target column or columns.
+4. a `col => target_cols` pair, which renames the column `col` to `target_cols`, which
+   must be a single name (as a `Symbol` or a string).
+5. a `nrow` or `nrow => target_cols` form which efficiently computes the number of rows
+   in a group; without `target_cols` the new column is called `:nrow`, otherwise
+   it must be a single name (as a `Symbol` or a string).
+6. vectors or matrices containing transformations specified by the `Pair` syntax
+   described in points 2 to 5
+7. a function which will be called with a `SubDataFrame` corresponding to each group;
    this form should be avoided due to its poor performance unless a very large
    number of columns are processed (in which case `SubDataFrame` avoids excessive
    compilation)
 
-As a special rule that applies to `cols => function` syntax, if `cols` is wrapped
-in an `AsTable` object then a `NamedTuple` containing columns selected by `cols` is
-passed to `function`.
-
-In all of these cases, `function` can return either a single row or multiple rows.
-`function` can always generate a single column by returning a single value or a vector.
-Additionally, if `combine` is passed exactly one `function`, `cols => function`,
-or `cols => function => outcol` as a first argument
-and `target_col` is not specified,
-`function` can return multiple columns in the form of an `AbstractDataFrame`,
-`AbstractMatrix`, `NamedTuple` or `DataFrameRow`.
+All functions have two types of signatures. One of them takes a `GroupedDataFrame`
+as the first argument and an arbitrary number of transformations described above
+as following arguments. The second type of signature is when a `Function` or a `Type`
+is passed as the first argument and a `GroupedDataFrame` as the second argument
+(similar to `map`).
+
+As a special rule, with the `cols => function` and `cols => function =>
+target_cols` syntaxes, if `cols` is wrapped in an `AsTable`
+object then a `NamedTuple` containing columns selected by `cols` is passed to
+`function`.
+
+What is allowed for `function` to return is determined by the `target_cols` value:
+1. If both `cols` and `target_cols` are omitted (so only a `function` is passed),
+   then returning a data frame, a matrix, a `NamedTuple`, or a `DataFrameRow` will
+   produce multiple columns in the result. Returning any other value produces
+   a single column.
+2. If `target_cols` is a `Symbol` or a string then the function is assumed to return
+   a single column. In this case returning a data frame, a matrix, a `NamedTuple`,
+   or a `DataFrameRow` raises an error.
+3. If `target_cols` is a vector of `Symbol`s or strings or `AsTable` it is assumed
+   that `function` returns multiple columns.
+   If `function` returns one of `AbstractDataFrame`, `NamedTuple`, `DataFrameRow`,
+   `AbstractMatrix` then rules described in point 1 above apply.
+   If `function` returns an `AbstractVector` then each element of this vector must
+   support the `keys` function, which must return a collection of `Symbol`s, strings
+   or integers; the return value of `keys` must be identical for all elements.
+   Then as many columns are created as there are elements in the return value
+   of the `keys` function. If `target_cols` is `AsTable` then their names
+   are set to be equal to the key names except if `keys` returns integers, in
+   which case they are prefixed by `x` (so the column names are e.g. `x1`,
+   `x2`, ...). If `target_cols` is a vector of `Symbol`s or strings then
+   column names produced using the rules above are ignored and replaced by
+   `target_cols` (the number of columns must be the same as the length of
+   `target_cols` in this case).
+   If `function` returns a value of any other type then it is assumed that it is a
+   table conforming to the Tables.jl API and the `Tables.columntable` function
+   is called on it to get the resulting columns and their names. The names are
+   retained when `target_cols` is `AsTable` and are replaced if
+   `target_cols` is a vector of `Symbol`s or strings.
+
+In all of these cases, `function` can return either a single row or multiple
+rows. As a particular rule, values wrapped in a `Ref` or a `0`-dimensional
+`AbstractArray` are unwrapped and then treated as a single row.
 
 `select`/`select!` and `transform`/`transform!` always return a `DataFrame`
-with the same number of rows as the source.
-For `combine`, the shape of the resulting `DataFrame` is determined
-according to the following rules:
-- a single value produces a single row and column per group
-- a named tuple or `DataFrameRow` produces a single row and one column per field
-- a vector produces a single column with one row per entry
-- a named tuple of vectors produces one column per field with one row per entry in the vectors
-- a `DataFrame` or a matrix produces as many rows and columns as it contains;
-  note that this option should be avoided due to its poor performance when the number
-  of groups is large
-
-The kind of return value and the number and names of columns must be the same for all groups.
+with the same number and order of rows as the source (even if `GroupedDataFrame`
+had its groups reordered).
+
+For `combine`, rows in the returned object appear in the order of groups in the
+`GroupedDataFrame`. The functions can return an arbitrary number of rows for
+each group, but the kind of returned object and the number and names of columns
+must be the same for all groups, except when a `DataFrame()` or `NamedTuple()`
+is returned, in which case a given group is skipped.
 
 It is allowed to mix single values and vectors if multiple transformations
-are requested. In this case single value will be broadcasted to match the length
+are requested. In this case a single value will be repeated to match the length
 of columns specified by returned vectors.
-As a particular rule, values wrapped in a `Ref` or a `0`-dimensional `AbstractArray`
-are unwrapped and then broadcasted.
-
-If a single value or a vector is returned by the `function` and `target_col` is not
-provided, it is generated automatically, by concatenating source column name and
-`function` name where possible (see examples below).
 
-We show several examples of the `by` function applied to the `iris` dataset below:
+To apply `function` to each row instead of whole columns, it can be wrapped in a
+`ByRow` struct. `cols` can be any column indexing syntax, in which case
+`function` will be passed one argument for each of the columns specified by
+`cols` or a `NamedTuple` of them if specified columns are wrapped in `AsTable`.
+If `ByRow` is used it is allowed for `cols` to select an empty set of columns,
+in which case `function` is called for each row without any arguments and an
+empty `NamedTuple` is passed if an empty set of columns is wrapped in `AsTable`.
+
+The following keyword arguments are supported by the transformation functions
+(not all keyword arguments are supported in all cases; in general they are allowed
+in situations when they are meaningful, see the documentation of the specific functions
+for details):
+- `keepkeys` : whether grouping columns should be kept in the returned data frame.
+- `ungroup` : whether the return value of the operation should be a data frame or a
+  `GroupedDataFrame`.
+- `copycols` : whether columns of the source data frame should be copied if no
+  transformation is applied to them.
+- `renamecols` : whether in the `cols => function` form automatically generated
+  column names should include the name of transformation functions or not.
+
+We show several examples of these functions applied to the `iris` dataset below:
 
 ```jldoctest sac
 julia> using DataFrames, CSV, Statistics
@@ -176,8 +240,8 @@ julia> combine(gdf, nrow, :PetalLength => mean => :mean)
 │ 2   │ Iris-versicolor │ 50    │ 4.26    │
 │ 3   │ Iris-virginica  │ 50    │ 5.552   │
 
-julia> combine([:PetalLength, :SepalLength] => (p, s) -> (a=mean(p)/mean(s), b=sum(p)),
-               gdf) # multiple columns are passed as arguments
+julia> combine(gdf, [:PetalLength, :SepalLength] => ((p, s) -> (a=mean(p)/mean(s), b=sum(p))) =>
+               AsTable) # multiple columns are passed as arguments
 3×3 DataFrame
 │ Row │ Species         │ a        │ b       │
 │     │ String          │ Float64  │ Float64 │
@@ -215,6 +279,14 @@ julia> combine(gdf, 1:2 => cor, nrow)
 │ 2   │ Iris-versicolor │ 0.525911                   │ 50    │
 │ 3   │ Iris-virginica  │ 0.457228                   │ 50    │
 
+julia> combine(gdf, :PetalLength => (x -> [extrema(x)]) => [:min, :max])
+3×3 DataFrame
+│ Row │ Species         │ min     │ max     │
+│     │ String          │ Float64 │ Float64 │
+├─────┼─────────────────┼─────────┼─────────┤
+│ 1   │ Iris-setosa     │ 1.0     │ 1.9     │
+│ 2   │ Iris-versicolor │ 3.0     │ 5.1     │
+│ 3   │ Iris-virginica  │ 4.5     │ 6.9     │
 ```
 
 Contrary to `combine`, the `select` and `transform` functions always return
@@ -268,7 +340,7 @@ julia> transform(gdf, :Species => x -> chop.(x, head=5, tail=0))
 │ 150 │ Iris-virginica │ 5.9         │ 3.0        │ 5.1         │ 1.8        │ virginica        │
 ```
 
-The `combine` function also supports the `do` block form. However, as noted above,
+All functions also support the `do` block form. However, as noted above,
 this form is slow and should therefore be avoided when performance matters.
 
 ```jldoctest sac
@@ -385,7 +457,7 @@ julia> combine(gd, valuecols(gd) .=> mean)
 │ 2   │ Iris-versicolor │ 5.936            │ 2.77            │ 4.26             │ 1.326           │
 │ 3   │ Iris-virginica  │ 6.588            │ 2.974           │ 5.552            │ 2.026           │
 
-julia> combine(gd, valuecols(gd) .=> (x -> (x .- mean(x)) ./ std(x)) .=> valuecols(gd))
+julia> combine(gd, valuecols(gd) .=> (x -> (x .- mean(x)) ./ std(x)), renamecols=false)
 150×5 DataFrame
 │ Row │ Species        │ SepalLength │ SepalWidth │ PetalLength │ PetalWidth │
 │     │ String         │ Float64     │ Float64    │ Float64     │ Float64    │
diff --git a/src/DataFrames.jl b/src/DataFrames.jl
index 408561bece..f3722f964d 100644
--- a/src/DataFrames.jl
+++ b/src/DataFrames.jl
@@ -107,6 +107,9 @@ include("abstractdataframe/join.jl")
 include("abstractdataframe/reshape.jl")
 
 include("groupeddataframe/splitapplycombine.jl")
+include("groupeddataframe/callprocessing.jl")
+include("groupeddataframe/fastaggregates.jl")
+include("groupeddataframe/complextransforms.jl")
 
 include("abstractdataframe/show.jl")
 include("groupeddataframe/show.jl")
diff --git a/src/abstractdataframe/selection.jl b/src/abstractdataframe/selection.jl
index ea66b52898..ba772786e4 100644
--- a/src/abstractdataframe/selection.jl
+++ b/src/abstractdataframe/selection.jl
@@ -10,6 +10,145 @@
 # 4) Pair{AsTable, <:Pair{<:Base.Callable, <:Union{Symbol, Vector{Symbol}, Type{AsTable}}}}
 # 5) Callable
 
+const TRANSFORMATION_COMMON_RULES =
+    """
+    The common rules for all transformation functions supported by
+    DataFrames.jl are explained in detail and compared below.
+
+    All these operations are supported both for
+    `AbstractDataFrame` (when split and combine steps are skipped) and
+    `GroupedDataFrame`. Technically, `AbstractDataFrame` is just considered as being
+    grouped on no columns (meaning it has a single group, or zero groups if it is
+    empty). The only difference is that in this case the `keepkeys` and `ungroup`
+    keyword arguments (described below) are not supported and a data frame is always
+    returned, as there are no split and combine steps in this case.
+
+    In order to perform operations by groups you first need to create a `GroupedDataFrame`
+    object from your data frame using the `groupby` function that takes two arguments:
+    (1) a data frame to be grouped, and (2) a set of columns to group by.
+
+    Operations can then be applied on each group using one of the following functions:
+    * `combine`: does not put restrictions on number of rows returned, the order of rows
+      is specified by the order of groups in `GroupedDataFrame`; it is typically used
+      to compute summary statistics by group;
+    * `select`: return a data frame with the number and order of rows exactly the same
+      as the source data frame, including only new calculated columns;
+      `select!` is an in-place version of `select`;
+    * `transform`: return a data frame with the number and order of rows exactly the same
+      as the source data frame, including all columns from the source and new calculated columns;
+      `transform!` is an in-place version of `transform`.
+
+    All these functions take a specification of one or more functions to apply to
+    each subset of the `DataFrame`. This specification can be of the following forms:
+    1. standard column selectors (integers, `Symbol`s, strings, vectors of integers,
+       vectors of `Symbol`s, vectors of strings,
+       `All`, `Cols`, `:`, `Between`, `Not` and regular expressions)
+    2. a `cols => function` pair indicating that `function` should be called with
+       positional arguments holding columns `cols`, which can be any valid column selector;
+       in this case the target column name is automatically generated and it is assumed that
+       `function` returns a single value or a vector; the generated name is created by
+       concatenating the source column name and `function` name by default (see examples below).
+    3. a `cols => function => target_cols` form additionally explicitly specifying
+       the target column or columns.
+    4. a `col => target_cols` pair, which renames the column `col` to `target_cols`, which
+       must be a single name (as a `Symbol` or a string).
+    5. a `nrow` or `nrow => target_cols` form which efficiently computes the number of rows
+       in a group; without `target_cols` the new column is called `:nrow`, otherwise
+       it must be a single name (as a `Symbol` or a string).
+    6. vectors or matrices containing transformations specified by the `Pair` syntax
+       described in points 2 to 5
+    7. a function which will be called with a `SubDataFrame` corresponding to each group;
+       this form should be avoided due to its poor performance unless a very large
+       number of columns are processed (in which case `SubDataFrame` avoids excessive
+       compilation)
+
+    All functions have two types of signatures. One of them takes a `GroupedDataFrame`
+    as the first argument and an arbitrary number of transformations described above
+    as following arguments. The second type of signature is when a `Function` or a `Type`
+    is passed as the first argument and a `GroupedDataFrame` as the second argument
+    (similar to `map`).
+
+    As a special rule, with the `cols => function` and `cols => function =>
+    target_cols` syntaxes, if `cols` is wrapped in an `AsTable`
+    object then a `NamedTuple` containing columns selected by `cols` is passed to
+    `function`.
+
+    What is allowed for `function` to return is determined by the `target_cols` value:
+    1. If both `cols` and `target_cols` are omitted (so only a `function` is passed),
+       then returning a data frame, a matrix, a `NamedTuple`, or a `DataFrameRow` will
+       produce multiple columns in the result. Returning any other value produces
+       a single column.
+    2. If `target_cols` is a `Symbol` or a string then the function is assumed to return
+       a single column. In this case returning a data frame, a matrix, a `NamedTuple`,
+       or a `DataFrameRow` raises an error.
+    3. If `target_cols` is a vector of `Symbol`s or strings or `AsTable` it is assumed
+       that `function` returns multiple columns.
+       If `function` returns one of `AbstractDataFrame`, `NamedTuple`, `DataFrameRow`,
+       `AbstractMatrix` then rules described in point 1 above apply.
+       If `function` returns an `AbstractVector` then each element of this vector must
+       support the `keys` function, which must return a collection of `Symbol`s, strings
+       or integers; the return value of `keys` must be identical for all elements.
+       Then as many columns are created as there are elements in the return value
+       of the `keys` function. If `target_cols` is `AsTable` then their names
+       are set to be equal to the key names except if `keys` returns integers, in
+       which case they are prefixed by `x` (so the column names are e.g. `x1`,
+       `x2`, ...). If `target_cols` is a vector of `Symbol`s or strings then
+       column names produced using the rules above are ignored and replaced by
+       `target_cols` (the number of columns must be the same as the length of
+       `target_cols` in this case).
+       If `function` returns a value of any other type then it is assumed that it is a
+       table conforming to the Tables.jl API and the `Tables.columntable` function
+       is called on it to get the resulting columns and their names. The names are
+       retained when `target_cols` is `AsTable` and are replaced if
+       `target_cols` is a vector of `Symbol`s or strings.
+
+    In all of these cases, `function` can return either a single row or multiple
+    rows. As a particular rule, values wrapped in a `Ref` or a `0`-dimensional
+    `AbstractArray` are unwrapped and then treated as a single row.
+
+    `select`/`select!` and `transform`/`transform!` always return a `DataFrame`
+    with the same number and order of rows as the source (even if `GroupedDataFrame`
+    had its groups reordered).
+
+    For `combine`, rows in the returned object appear in the order of groups in the
+    `GroupedDataFrame`. The functions can return an arbitrary number of rows for
+    each group, but the kind of returned object and the number and names of columns
+    must be the same for all groups, except when a `DataFrame()` or `NamedTuple()`
+    is returned, in which case a given group is skipped.
+
+    It is allowed to mix single values and vectors if multiple transformations
+    are requested. In this case a single value will be repeated to match the length
+    of columns specified by returned vectors.
+
+    To apply `function` to each row instead of whole columns, it can be wrapped in a
+    `ByRow` struct. `cols` can be any column indexing syntax, in which case
+    `function` will be passed one argument for each of the columns specified by
+    `cols` or a `NamedTuple` of them if specified columns are wrapped in `AsTable`.
+    If `ByRow` is used it is allowed for `cols` to select an empty set of columns,
+    in which case `function` is called for each row without any arguments and an
+    empty `NamedTuple` is passed if an empty set of columns is wrapped in `AsTable`.
+
+    If a collection of column names is passed then requesting duplicate column
+    names in the target data frame is accepted (e.g. `select!(df, [:a], :, r"a")`
+    is allowed) and only the first occurrence is used. In particular a syntax to
+    move column `:col` to the first position in the data frame is
+    `select!(df, :col, :)`. On the contrary, output column names of renaming,
+    transformation and single column selection operations must be unique, so e.g.
+    `select!(df, :a, :a => :a)` or `select!(df, :a, :a => ByRow(sin) => :a)` are not allowed.
+
+    As a general rule if `copycols=true` columns are copied and when
+    `copycols=false` columns are reused if possible. Note, however, that
+    including the same column several times in the data frame via renaming or
+    transformations that return the same object without copying may create
+    column aliases even if `copycols=true`. An example of such a situation is
+    `select!(df, :a, :a => :b, :a => identity => :c)`.
+
+    If `df` is a `SubDataFrame` and `copycols=true` then a `DataFrame` is
+    returned and the same copying rules apply as for a `DataFrame` input: this
+    means in particular that selected columns will be copied. If
+    `copycols=false`, a `SubDataFrame` is returned without copying columns.
+    """
+
 """
     ByRow
 
@@ -434,233 +573,30 @@ function select_transform!(@nospecialize(nc::Union{Base.Callable, Pair{<:Union{I
     end
 end
 
-SELECT_ARG_RULES =
-    """
-    Arguments passed as `args...` can be:
-
-    * Any index that is allowed for column indexing
-      ($COLUMNINDEX_STR; $MULTICOLUMNINDEX_STR).
-    * A function or a type
-    * Column transformation operations using the `Pair` notation that is
-      described below and vectors or matrices of such pairs.
-
-    Columns can be renamed using the `old_column => new_column_name` syntax, and
-    transformed using the `old_column => fun => new_column_name` syntax.
-    `new_column_name` must be a `Symbol` or a string, a vector of `Symbol`s or
-    strings, or `AsTable`. `fun` must be a function or a type. If `old_column` is a
-    `Symbol`, a string, or an integer then `fun` is applied to the corresponding
-    column vector. Otherwise `old_column` can be any column indexing syntax, in
-    which case `fun` will be passed the column vectors specified by `old_column`
-    as separate arguments. The only exception is when `old_column` is an
-    `AsTable` type wrapping a selector, in which case `fun` is passed a
-    `NamedTuple` containing the selected columns.
-
-    Column renaming and transformation operations can be passed wrapped in
-    vectors or matrices (this is useful when combined with broadcasting).
-
-    # Rules when `new_column_name` is a `Symbol` or a string or is absent
-
-    If `fun` returns a value of type other than `AbstractVector` then it will be
-    repeated in a vector matching the target number of rows in the data
-    frame, unless its type is one of `AbstractDataFrame`, `NamedTuple`,
-    `DataFrameRow`, `AbstractMatrix`, in which case an error is thrown. As a
-    particular rule, values wrapped in a `Ref` or a `0`-dimensional
-    `AbstractArray` are unwrapped and then repeated.
-
-    To apply `fun` to each row instead of whole columns, it can be wrapped in a
-    `ByRow` struct. In this case if `old_column` is a `Symbol`, a string, or an
-    integer then `fun` is applied to each element (row) of `old_column` using
-    broadcasting. Otherwise `old_column` can be any column indexing syntax, in
-    which case `fun` will be passed one argument for each of the columns
-    specified by `old_column`. If `ByRow` is used it is allowed for
-    `old_column` to select an empty set of columns, in which case `fun`
-     is called for each row without any arguments.
-
-    Column transformation can also be specified using the short `old_column =>
-    fun` form. In this case, `new_column_name` is automatically generated as
-    `\$(old_column)_\$(fun)` if `renamecols=true` and `\$(old_column)` if
-    `renamecols=false`. Up to three column names are used for multiple input
-    columns and they are joined using `_`; if more than three columns are passed
-    then the name consists of the first two names and `etc` suffix then, e.g.
-    `[:a,:b,:c,:d] => fun` produces the new column name `:a_b_etc_fun` if
-    `renamecols=true` and ``:a_b_etc` if `renamecols=false`.
-    It is not allowed to pass `renamecols=false` if `old_column` is empty
-    as it would generate an empty column name.
-
-    # Rules when `new_column_name` is a vector of `Symbol`s or strings or is `AsTable`
-
-    In this case it is assumed that `fun` returns multiple columns.
-
-    If `fun` returns one of `AbstractDataFrame`, `NamedTuple`, `DataFrameRow`,
-    `AbstractMatrix` then rules described in the section describing the case
-    when `args` is a function or a type apply.
-
-    If `fun` returns an `AbstractVector` then each element of this vector must
-    support the `keys` function, which must return a collection of `Symbol`s, strings
-    or integers; the return value of `keys` must be identical for all elements.
-    Then as many columns are created as there are elements in the return value
-    of the `keys` function. If `new_column_name` is `AsTable` then their names
-    are set to be equal to the key names except if `keys` returns integers, in
-    which case they are prefixed by `x` (so the column names are e.g. `x1`,
-    `x2`, ...). If `new_column_name` is a vector of `Symbol`s or strings then
-    column names produced using the rules above are ignored and replaced by
-    `new_column_name` (the number of columns must be the same as the length of
-    `new_column_name` in this case).
-
-    If `fun` returns a value of any other type then it is assumed that it is a
-    table conforming to the Tables.jl API and the `Tables.columntable` function
-    is called on it to get the resulting columns and their names. The names are
-    retained when `new_column_name` is `AsTable` and are replaced if
-    `new_column_name` is a vector of `Symbol`s or strings.
-
-    # Rules when element of `args` is a function or a type
-
-    In this case the function or type is called with `df` as a single argument.
-
-    If the return value of the transformation is one of `AbstractDataFrame`,
-    `NamedTuple`, `DataFrameRow` or `AbstractMatrix` then it is treated as
-    containing multiple columns. For `AbstractMatrix` column names are generated
-    as `x1`, `x2`, etc. For `AbstractDataFrame`, `NamedTuple` of vectors and
-    `AbstractMatrix` the columns are taken as is from the returned value. For
-    `DataFrameRow` and` NamedTuple` not containing any vectors the returned
-    value is broadcasted to a vector matching the target number of rows in the data
-    frame.
-
-    If the return value is an `AbstractVector` then it is used as-is. The resulting
-    column gets the name `x1`.
-
-    In all other cases the return value is repeated in a vector matching
-    the target number of rows in the data frame. As a particular rule, values
-    wrapped in a `Ref` or a `0`-dimensional `AbstractArray` are unwrapped and
-    then repeated. The resulting column gets the name `x1`.
-
-    # Special rules
-
-    As a special rule passing `nrow` without specifying `old_column` creates a
-    column named `:nrow` containing a number of rows in a source data frame, and
-    passing `nrow => new_column_name` stores the number of rows in source data
-    frame in `new_column_name` column.
-
-    If a collection of column names is passed to `select!` or `select` then
-    requesting duplicate column names in target data frame are accepted (e.g.
-    `select!(df, [:a], :, r"a")` is allowed) and only the first occurrence is
-    used. In particular a syntax to move column `:col` to the first position in
-    the data frame is `select!(df, :col, :)`. On the contrary, output column
-    names of renaming, transformation and single column selection operations
-    must be unique, so e.g. `select!(df, :a, :a => :a)` or
-    `select!(df, :a, :a => ByRow(sin) => :a)` are not allowed.
-    """
-
 """
     select!(df::DataFrame, args...; renamecols::Bool=true)
-    select!(args::Callable, df::DataFrame; renamecols::Bool=true)
-
-Mutate `df` in place to retain only columns specified by `args...` and return it.
-The result is guaranteed to have the same number of rows as `df`, except when no
-columns are selected (in which case the result has zero rows).
-
-$SELECT_ARG_RULES
-
-Note that including the same column several times in the data frame via renaming
-or transformations that return the same object without copying will create
-column aliases. An example of such a situation is
-`select!(df, :a, :a => :b, :a => identity => :c)`.
-
-# Examples
-```jldoctest
-julia> df = DataFrame(a=1:3, b=4:6)
-3×2 DataFrame
-│ Row │ a     │ b     │
-│     │ Int64 │ Int64 │
-├─────┼───────┼───────┤
-│ 1   │ 1     │ 4     │
-│ 2   │ 2     │ 5     │
-│ 3   │ 3     │ 6     │
-
-julia> select!(df, 2)
-3×1 DataFrame
-│ Row │ b     │
-│     │ Int64 │
-├─────┼───────┤
-│ 1   │ 4     │
-│ 2   │ 5     │
-│ 3   │ 6     │
-
-julia> df = DataFrame(a=1:3, b=4:6);
-
-julia> select!(df, :a => ByRow(sin) => :c, :b)
-3×2 DataFrame
-│ Row │ c        │ b     │
-│     │ Float64  │ Int64 │
-├─────┼──────────┼───────┤
-│ 1   │ 0.841471 │ 4     │
-│ 2   │ 0.909297 │ 5     │
-│ 3   │ 0.14112  │ 6     │
+    select!(args::Base.Callable, df::DataFrame; renamecols::Bool=true)
+    select!(gd::GroupedDataFrame{DataFrame}, args...; ungroup::Bool=true, renamecols::Bool=true)
+    select!(f::Base.Callable, gd::GroupedDataFrame; ungroup::Bool=true, renamecols::Bool=true)
 
-julia> select!(df, :, [:c, :b] => (c,b) -> c .+ b .- sum(b)/length(b))
-3×3 DataFrame
-│ Row │ c        │ b     │ c_b_function │
-│     │ Float64  │ Int64 │ Float64      │
-├─────┼──────────┼───────┼──────────────┤
-│ 1   │ 0.841471 │ 4     │ -0.158529    │
-│ 2   │ 0.909297 │ 5     │ 0.909297     │
-│ 3   │ 0.14112  │ 6     │ 1.14112      │
+Mutate `df` or `gd` in place to retain only columns or transformations specified by `args...` and
+return it. The result is guaranteed to have the same number of rows as `df` or
+parent of `gd`, except when no columns are selected (in which case the result
+has zero rows).
 
-julia> df = DataFrame(a=1:3, b=4:6);
-
-julia> select!(df, names(df) .=> [minimum maximum]);
-
-julia> df
-3×4 DataFrame
-│ Row │ a_minimum │ b_minimum │ a_maximum │ b_maximum │
-│     │ Int64     │ Int64     │ Int64     │ Int64     │
-├─────┼───────────┼───────────┼───────────┼───────────┤
-│ 1   │ 1         │ 4         │ 3         │ 6         │
-│ 2   │ 1         │ 4         │ 3         │ 6         │
-│ 3   │ 1         │ 4         │ 3         │ 6         │
-
-julia> df = DataFrame(a=1:3, b=4:6);
-
-julia> using Statistics
+If `gd` is passed then it is updated to reflect the new rows of its updated
+parent. Any independent `GroupedDataFrame` objects constructed using the same
+parent data frame might get corrupted.
 
-julia> select!(df, AsTable(:) => ByRow(mean), renamecols=false)
-3×1 DataFrame
-│ Row │ a_b     │
-│     │ Float64 │
-├─────┼─────────┤
-│ 1   │ 2.5     │
-│ 2   │ 3.5     │
-│ 3   │ 4.5     │
-
-julia> df = DataFrame(a=1:3, b=4:6);
-
-julia> select!(first, df)
-3×2 DataFrame
-│ Row │ a     │ b     │
-│     │ Int64 │ Int64 │
-├─────┼───────┼───────┤
-│ 1   │ 1     │ 4     │
-│ 2   │ 1     │ 4     │
-│ 3   │ 1     │ 4     │
+$TRANSFORMATION_COMMON_RULES
 
-julia> df = DataFrame(a=1:3, b=4:6, c=7:9)
-3×3 DataFrame
-│ Row │ a     │ b     │ c     │
-│     │ Int64 │ Int64 │ Int64 │
-├─────┼───────┼───────┼───────┤
-│ 1   │ 1     │ 4     │ 7     │
-│ 2   │ 2     │ 5     │ 8     │
-│ 3   │ 3     │ 6     │ 9     │
+# Keyword arguments
+- `renamecols::Bool=true` : whether in the `cols => function` form automatically generated
+  column names should include the name of transformation functions or not.
+- `ungroup::Bool=true` : whether the return value of the operation on `gd` should be a data
+  frame or a `GroupedDataFrame`.
 
-julia> select!(df, AsTable(:) => ByRow(x -> (mean=mean(x), std=std(x))) => :stats,
-               AsTable(:) => ByRow(x -> (mean=mean(x), std=std(x))) => AsTable)
-3×3 DataFrame
-│ Row │ stats                   │ mean    │ std     │
-│     │ NamedTuple…             │ Float64 │ Float64 │
-├─────┼─────────────────────────┼─────────┼─────────┤
-│ 1   │ (mean = 4.0, std = 3.0) │ 4.0     │ 3.0     │
-│ 2   │ (mean = 5.0, std = 3.0) │ 5.0     │ 3.0     │
-│ 3   │ (mean = 6.0, std = 3.0) │ 6.0     │ 3.0     │
+See [`select`](@ref) for examples.
 ```
 
 """
@@ -677,12 +613,22 @@ end
 """
     transform!(df::DataFrame, args...; renamecols::Bool=true)
     transform!(args::Callable, df::DataFrame; renamecols::Bool=true)
+    transform!(gd::GroupedDataFrame{DataFrame}, args...; ungroup::Bool=true, renamecols::Bool=true)
+    transform!(f::Base.Callable, gd::GroupedDataFrame; ungroup::Bool=true, renamecols::Bool=true)
 
-Mutate `df` in place to add columns specified by `args...` and return it.
+Mutate `df` or `gd` in place to add columns specified by `args...` and return it.
 The result is guaranteed to have the same number of rows as `df`.
-Equivalent to `select!(df, :, args...)`.
+Equivalent to `select!(df, :, args...)` or `select!(gd, :, args...)`.
+
+$TRANSFORMATION_COMMON_RULES
 
-See [`select!`](@ref) for detailed rules regarding accepted values for `args`.
+# Keyword arguments
+- `renamecols::Bool=true` : whether in the `cols => function` form automatically generated
+  column names should include the name of transformation functions or not.
+- `ungroup::Bool=true` : whether the return value of the operation on `gd` should be a data
+  frame or a `GroupedDataFrame`.
+
+See [`select`](@ref) for examples.
 """
 transform!(df::DataFrame, @nospecialize(args...); renamecols::Bool=true) =
     select!(df, :, args..., renamecols=renamecols)
@@ -697,38 +643,27 @@ end
 """
     select(df::AbstractDataFrame, args...; copycols::Bool=true, renamecols::Bool=true)
     select(args::Callable, df::DataFrame; renamecols::Bool=true)
-
-Create a new data frame that contains columns from `df` specified by `args` and
-return it. The result is guaranteed to have the same number of rows as `df`,
-except when no columns are selected (in which case the result has zero rows)..
-
-If `df` is a `DataFrame` or `copycols=true` then column renaming and transformations
-are supported.
-
-$SELECT_ARG_RULES
-
-If `df` is a `DataFrame` a new `DataFrame` is returned.
-If `copycols=false`, then the returned `DataFrame` shares column vectors with `df`
-where possible.
-If `copycols=true` (the default), then the returned `DataFrame` will not share
-columns with `df`.
-The only exception for this rule is the `old_column => fun => new_column`
-transformation when `fun` returns a vector that is not allocated by `fun` but is
-neither a `SubArray` nor one of the input vectors.
-In such a case a new `DataFrame` might contain aliases. Such a situation can
-only happen with transformations which returns vectors other than their inputs,
-e.g. with `select(df, :a => (x -> c) => :c1, :b => (x -> c) => :c2)`  when `c`
-is a vector object or with `select(df, :a => (x -> df.c) => :c2)`.
-
-If `df` is a `SubDataFrame` and `copycols=true` then a `DataFrame` is returned
-and the same copying rules apply as for a `DataFrame` input:
-this means in particular that selected columns will be copied.
-If `copycols=false`, a `SubDataFrame` is returned without copying columns.
-
-Note that including the same column several times in the data frame via renaming
-or transformations that return the same object when `copycols=false` will create
-column aliases. An example of such a situation is
-`select(df, :a, :a => :b, :a => identity => :c, copycols=false)`.
+    select(gd::GroupedDataFrame, args...; copycols::Bool=true, keepkeys::Bool=true,
+           ungroup::Bool=true, renamecols::Bool=true)
+    select(f::Base.Callable, gd::GroupedDataFrame; copycols::Bool=true,
+           keepkeys::Bool=true, ungroup::Bool=true, renamecols::Bool=true)
+
+Create a new data frame that contains columns from `df` or `gd` specified by
+`args` and return it. The result is guaranteed to have the same number of rows
+as `df`, except when no columns are selected (in which case the result has zero
+rows).
+
+$TRANSFORMATION_COMMON_RULES
+
+# Keyword arguments
+- `copycols::Bool=true` : whether columns of the source data frame should be copied if
+  no transformation is applied to them.
+- `renamecols::Bool=true` : whether in the `cols => function` form automatically generated
+  column names should include the name of transformation functions or not.
+- `keepkeys::Bool=true` : whether grouping columns of `gd` should be kept in the returned
+  data frame.
+- `ungroup::Bool=true` : whether the return value of the operation on `gd` should be a data
+  frame or a `GroupedDataFrame`.
 
 # Examples
 ```jldoctest
@@ -815,6 +750,131 @@ julia> select(df, AsTable(:) => ByRow(x -> (mean=mean(x), std=std(x))) => :stats
 │ 1   │ (mean = 4.0, std = 3.0) │ 4.0     │ 3.0     │
 │ 2   │ (mean = 5.0, std = 3.0) │ 5.0     │ 3.0     │
 │ 3   │ (mean = 6.0, std = 3.0) │ 6.0     │ 3.0     │
+
+julia> df = DataFrame(a = [1, 1, 1, 2, 2, 1, 1, 2],
+                      b = repeat([2, 1], outer=[4]),
+                      c = 1:8)
+8×3 DataFrame
+│ Row │ a     │ b     │ c     │
+│     │ Int64 │ Int64 │ Int64 │
+├─────┼───────┼───────┼───────┤
+│ 1   │ 1     │ 2     │ 1     │
+│ 2   │ 1     │ 1     │ 2     │
+│ 3   │ 1     │ 2     │ 3     │
+│ 4   │ 2     │ 1     │ 4     │
+│ 5   │ 2     │ 2     │ 5     │
+│ 6   │ 1     │ 1     │ 6     │
+│ 7   │ 1     │ 2     │ 7     │
+│ 8   │ 2     │ 1     │ 8     │
+
+julia> gd = groupby(df, :a);
+
+julia> select(gd, :c => sum, nrow)
+8×3 DataFrame
+│ Row │ a     │ c_sum │ nrow  │
+│     │ Int64 │ Int64 │ Int64 │
+├─────┼───────┼───────┼───────┤
+│ 1   │ 1     │ 19    │ 5     │
+│ 2   │ 1     │ 19    │ 5     │
+│ 3   │ 1     │ 19    │ 5     │
+│ 4   │ 2     │ 17    │ 3     │
+│ 5   │ 2     │ 17    │ 3     │
+│ 6   │ 1     │ 19    │ 5     │
+│ 7   │ 1     │ 19    │ 5     │
+│ 8   │ 2     │ 17    │ 3     │
+
+julia> select(gd, :c => sum, nrow, ungroup=false)
+GroupedDataFrame with 2 groups based on key: a
+First Group (5 rows): a = 1
+│ Row │ a     │ c_sum │ nrow  │
+│     │ Int64 │ Int64 │ Int64 │
+├─────┼───────┼───────┼───────┤
+│ 1   │ 1     │ 19    │ 5     │
+│ 2   │ 1     │ 19    │ 5     │
+│ 3   │ 1     │ 19    │ 5     │
+│ 4   │ 1     │ 19    │ 5     │
+│ 5   │ 1     │ 19    │ 5     │
+⋮
+Last Group (3 rows): a = 2
+│ Row │ a     │ c_sum │ nrow  │
+│     │ Int64 │ Int64 │ Int64 │
+├─────┼───────┼───────┼───────┤
+│ 1   │ 2     │ 17    │ 3     │
+│ 2   │ 2     │ 17    │ 3     │
+│ 3   │ 2     │ 17    │ 3     │
+
+# specifying a name for target column
+julia> select(gd, :c => (x -> sum(log, x)) => :sum_log_c)
+8×2 DataFrame
+│ Row │ a     │ sum_log_c │
+│     │ Int64 │ Float64   │
+├─────┼───────┼───────────┤
+│ 1   │ 1     │ 5.52943   │
+│ 2   │ 1     │ 5.52943   │
+│ 3   │ 1     │ 5.52943   │
+│ 4   │ 2     │ 5.07517   │
+│ 5   │ 2     │ 5.07517   │
+│ 6   │ 1     │ 5.52943   │
+│ 7   │ 1     │ 5.52943   │
+│ 8   │ 2     │ 5.07517   │
+
+julia> select(gd, [:b, :c] .=> sum) # passing a vector of pairs
+8×3 DataFrame
+│ Row │ a     │ b_sum │ c_sum │
+│     │ Int64 │ Int64 │ Int64 │
+├─────┼───────┼───────┼───────┤
+│ 1   │ 1     │ 8     │ 19    │
+│ 2   │ 1     │ 8     │ 19    │
+│ 3   │ 1     │ 8     │ 19    │
+│ 4   │ 2     │ 4     │ 17    │
+│ 5   │ 2     │ 4     │ 17    │
+│ 6   │ 1     │ 8     │ 19    │
+│ 7   │ 1     │ 8     │ 19    │
+│ 8   │ 2     │ 4     │ 17    │
+
+# multiple arguments, renaming and keepkeys
+julia> select(gd, :b => :b1, :c => :c1, [:b, :c] => +, keepkeys=false)
+8×3 DataFrame
+│ Row │ b1    │ c1    │ b_c_+ │
+│     │ Int64 │ Int64 │ Int64 │
+├─────┼───────┼───────┼───────┤
+│ 1   │ 2     │ 1     │ 3     │
+│ 2   │ 1     │ 2     │ 3     │
+│ 3   │ 2     │ 3     │ 5     │
+│ 4   │ 1     │ 4     │ 5     │
+│ 5   │ 2     │ 5     │ 7     │
+│ 6   │ 1     │ 6     │ 7     │
+│ 7   │ 2     │ 7     │ 9     │
+│ 8   │ 1     │ 8     │ 9     │
+
+# broadcasting and column expansion
+julia> select(gd, :b, AsTable([:b, :c]) => ByRow(extrema) => [:min, :max])
+8×4 DataFrame
+│ Row │ a     │ b     │ min   │ max   │
+│     │ Int64 │ Int64 │ Int64 │ Int64 │
+├─────┼───────┼───────┼───────┼───────┤
+│ 1   │ 1     │ 2     │ 1     │ 2     │
+│ 2   │ 1     │ 1     │ 1     │ 2     │
+│ 3   │ 1     │ 2     │ 2     │ 3     │
+│ 4   │ 2     │ 1     │ 1     │ 4     │
+│ 5   │ 2     │ 2     │ 2     │ 5     │
+│ 6   │ 1     │ 1     │ 1     │ 6     │
+│ 7   │ 1     │ 2     │ 2     │ 7     │
+│ 8   │ 2     │ 1     │ 1     │ 8     │
+
+julia> select(gd, :, AsTable(Not(:a)) => sum, renamecols=false)
+8×4 DataFrame
+│ Row │ a     │ b     │ c     │ b_c   │
+│     │ Int64 │ Int64 │ Int64 │ Int64 │
+├─────┼───────┼───────┼───────┼───────┤
+│ 1   │ 1     │ 2     │ 1     │ 3     │
+│ 2   │ 1     │ 1     │ 2     │ 3     │
+│ 3   │ 1     │ 2     │ 3     │ 5     │
+│ 4   │ 2     │ 1     │ 4     │ 5     │
+│ 5   │ 2     │ 2     │ 5     │ 7     │
+│ 6   │ 1     │ 1     │ 6     │ 7     │
+│ 7   │ 1     │ 2     │ 7     │ 9     │
+│ 8   │ 2     │ 1     │ 8     │ 9     │
 ```
 
 """
@@ -830,14 +890,59 @@ end
 
 """
     transform(df::AbstractDataFrame, args...; copycols::Bool=true, renamecols::Bool=true)
-    transform(args::Callable, df::DataFrame; renamecols::Bool=true)
+    transform(f::Callable, df::DataFrame; renamecols::Bool=true)
+    transform(gd::GroupedDataFrame, args...; copycols::Bool=true,
+              keepkeys::Bool=true, ungroup::Bool=true, renamecols::Bool=true)
+    transform(f::Base.Callable, gd::GroupedDataFrame; copycols::Bool=true,
+              keepkeys::Bool=true, ungroup::Bool=true, renamecols::Bool=true)
+
+Create a new data frame that contains columns from `df` or `gd` plus columns
+specified by `args` and return it. The result is guaranteed to have the same
+number of rows as `df`. Equivalent to `select(df, :, args...)` or `select(gd, :, args...)`.
+
+$TRANSFORMATION_COMMON_RULES
+
+# Keyword arguments
+- `copycols::Bool=true` : whether columns of the source data frame should be copied if
+  no transformation is applied to them.
+- `renamecols::Bool=true` : whether in the `cols => function` form automatically generated
+  column names should include the name of transformation functions or not.
+- `keepkeys::Bool=true` : whether grouping columns of `gd` should be kept in the returned
+  data frame.
+- `ungroup::Bool=true` : whether the return value of the operation on `gd` should be a data
+  frame or a `GroupedDataFrame`.
+
+Note that when the first argument is a `GroupedDataFrame`, `keepkeys=false`
+is needed to be able to return a different value for the grouping column:
 
-Create a new data frame that contains columns from `df` and adds columns
-specified by `args` and return it.
-The result is guaranteed to have the same number of rows as `df`.
-Equivalent to `select(df, :, args..., copycols=copycols)`.
+```
+julia> gdf = groupby(DataFrame(x=1:2), :x)
+GroupedDataFrame with 2 groups based on key: x
+First Group (1 row): x = 1
+│ Row │ x     │
+│     │ Int64 │
+├─────┼───────┤
+│ 1   │ 1     │
+⋮
+Last Group (1 row): x = 2
+│ Row │ x     │
+│     │ Int64 │
+├─────┼───────┤
+│ 1   │ 2     │
+
+julia> transform(gdf, x -> (x=10,), keepkeys=false)
+2×1 DataFrame
+│ Row │ x     │
+│     │ Int64 │
+├─────┼───────┤
+│ 1   │ 10    │
+│ 2   │ 10    │
 
-See [`select`](@ref) for detailed rules regarding accepted values for `args`.
+julia> transform(gdf, x -> (x=10,), keepkeys=true)
+ERROR: ArgumentError: column :x in returned data frame is not equal to grouping key :x
+```
+
+See [`select`](@ref) for more examples.
 """
 transform(df::AbstractDataFrame, @nospecialize(args...); copycols::Bool=true, renamecols::Bool=true) =
     select(df, :, args..., copycols=copycols, renamecols=renamecols)
@@ -851,16 +956,25 @@ end
 
 """
     combine(df::AbstractDataFrame, args...; renamecols::Bool=true)
-    combine(args::Callable, df::AbstractDataFrame; renamecols::Bool=true)
-
-Create a new data frame that contains columns from `df` specified by `args` and
-return it. The result can have any number of rows that is determined by the
-values returned by passed transformations.
-
-See [`select`](@ref) for detailed rules regarding accepted values for `args` in
-`combine(df, args...)` form. For `combine(arg, df)` the same rules as for
-`combine` on `GroupedDataFrame` apply except that a `df` with zero rows is
-currently not allowed.
+    combine(f::Callable, df::AbstractDataFrame; renamecols::Bool=true)
+    combine(gd::GroupedDataFrame, args...;
+            keepkeys::Bool=true, ungroup::Bool=true, renamecols::Bool=true)
+    combine(f::Base.Callable, gd::GroupedDataFrame;
+            keepkeys::Bool=true, ungroup::Bool=true, renamecols::Bool=true)
+
+Create a new data frame that contains columns from `df` or `gd` specified by
+`args` and return it. The number of rows in the result is determined by the
+values returned by the passed transformations.
+
+$TRANSFORMATION_COMMON_RULES
+
+# Keyword arguments
+- `renamecols::Bool=true` : whether in the `cols => function` form automatically generated
+  column names should include the name of transformation functions or not.
+- `keepkeys::Bool=true` : whether grouping columns of `gd` should be kept in the returned
+  data frame.
+- `ungroup::Bool=true` : whether the return value of the operation on `gd` should be a data
+  frame or a `GroupedDataFrame`.
 
 # Examples
 ```jldoctest
@@ -941,6 +1055,148 @@ julia> combine(df, AsTable(:) => ByRow(x -> (mean=mean(x), std=std(x))) => :stat
 │ 1   │ (mean = 4.0, std = 3.0) │ 4.0     │ 3.0     │
 │ 2   │ (mean = 5.0, std = 3.0) │ 5.0     │ 3.0     │
 │ 3   │ (mean = 6.0, std = 3.0) │ 6.0     │ 3.0     │
+
+julia> df = DataFrame(a = repeat([1, 2, 3, 4], outer=[2]),
+                      b = repeat([2, 1], outer=[4]),
+                      c = 1:8);
+
+julia> gd = groupby(df, :a);
+
+julia> combine(gd, :c => sum, nrow)
+4×3 DataFrame
+│ Row │ a     │ c_sum │ nrow  │
+│     │ Int64 │ Int64 │ Int64 │
+├─────┼───────┼───────┼───────┤
+│ 1   │ 1     │ 6     │ 2     │
+│ 2   │ 2     │ 8     │ 2     │
+│ 3   │ 3     │ 10    │ 2     │
+│ 4   │ 4     │ 12    │ 2     │
+
+julia> combine(gd, :c => sum, nrow, ungroup=false)
+GroupedDataFrame with 4 groups based on key: a
+First Group (1 row): a = 1
+│ Row │ a     │ c_sum │ nrow  │
+│     │ Int64 │ Int64 │ Int64 │
+├─────┼───────┼───────┼───────┤
+│ 1   │ 1     │ 6     │ 2     │
+⋮
+Last Group (1 row): a = 4
+│ Row │ a     │ c_sum │ nrow  │
+│     │ Int64 │ Int64 │ Int64 │
+├─────┼───────┼───────┼───────┤
+│ 1   │ 4     │ 12    │ 2     │
+
+julia> combine(gd) do d # do syntax for the slower variant
+           sum(d.c)
+       end
+4×2 DataFrame
+│ Row │ a     │ x1    │
+│     │ Int64 │ Int64 │
+├─────┼───────┼───────┤
+│ 1   │ 1     │ 6     │
+│ 2   │ 2     │ 8     │
+│ 3   │ 3     │ 10    │
+│ 4   │ 4     │ 12    │
+
+# specifying a name for target column
+julia> combine(gd, :c => (x -> sum(log, x)) => :sum_log_c)
+4×2 DataFrame
+│ Row │ a     │ sum_log_c │
+│     │ Int64 │ Float64   │
+├─────┼───────┼───────────┤
+│ 1   │ 1     │ 1.60944   │
+│ 2   │ 2     │ 2.48491   │
+│ 3   │ 3     │ 3.04452   │
+│ 4   │ 4     │ 3.46574   │
+
+julia> combine(gd, [:b, :c] .=> sum) # passing a vector of pairs
+4×3 DataFrame
+│ Row │ a     │ b_sum │ c_sum │
+│     │ Int64 │ Int64 │ Int64 │
+├─────┼───────┼───────┼───────┤
+│ 1   │ 1     │ 4     │ 6     │
+│ 2   │ 2     │ 2     │ 8     │
+│ 3   │ 3     │ 4     │ 10    │
+│ 4   │ 4     │ 2     │ 12    │
+
+julia> combine(gd) do sdf # dropping group when DataFrame() is returned
+           sdf.c[1] != 1 ? sdf : DataFrame()
+       end
+6×3 DataFrame
+│ Row │ a     │ b     │ c     │
+│     │ Int64 │ Int64 │ Int64 │
+├─────┼───────┼───────┼───────┤
+│ 1   │ 2     │ 1     │ 2     │
+│ 2   │ 2     │ 1     │ 6     │
+│ 3   │ 3     │ 2     │ 3     │
+│ 4   │ 3     │ 2     │ 7     │
+│ 5   │ 4     │ 1     │ 4     │
+│ 6   │ 4     │ 1     │ 8     │
+
+# auto-splatting, renaming and keepkeys
+julia> combine(gd, :b => :b1, :c => :c1, [:b, :c] => +, keepkeys=false)
+8×3 DataFrame
+│ Row │ b1    │ c1    │ b_c_+ │
+│     │ Int64 │ Int64 │ Int64 │
+├─────┼───────┼───────┼───────┤
+│ 1   │ 2     │ 1     │ 3     │
+│ 2   │ 2     │ 5     │ 7     │
+│ 3   │ 1     │ 2     │ 3     │
+│ 4   │ 1     │ 6     │ 7     │
+│ 5   │ 2     │ 3     │ 5     │
+│ 6   │ 2     │ 7     │ 9     │
+│ 7   │ 1     │ 4     │ 5     │
+│ 8   │ 1     │ 8     │ 9     │
+
+# broadcasting and column expansion
+julia> combine(gd, :b, AsTable([:b, :c]) => ByRow(extrema) => [:min, :max])
+8×4 DataFrame
+│ Row │ a     │ b     │ min   │ max   │
+│     │ Int64 │ Int64 │ Int64 │ Int64 │
+├─────┼───────┼───────┼───────┼───────┤
+│ 1   │ 1     │ 2     │ 1     │ 2     │
+│ 2   │ 1     │ 2     │ 2     │ 5     │
+│ 3   │ 2     │ 1     │ 1     │ 2     │
+│ 4   │ 2     │ 1     │ 1     │ 6     │
+│ 5   │ 3     │ 2     │ 2     │ 3     │
+│ 6   │ 3     │ 2     │ 2     │ 7     │
+│ 7   │ 4     │ 1     │ 1     │ 4     │
+│ 8   │ 4     │ 1     │ 1     │ 8     │
+
+# preventing vector from being spread across multiple rows
+julia> combine(gd, [:b, :c] .=> Ref)
+4×3 DataFrame
+│ Row │ a     │ b_Ref    │ c_Ref    │
+│     │ Int64 │ SubArra… │ SubArra… │
+├─────┼───────┼──────────┼──────────┤
+│ 1   │ 1     │ [2, 2]   │ [1, 5]   │
+│ 2   │ 2     │ [1, 1]   │ [2, 6]   │
+│ 3   │ 3     │ [2, 2]   │ [3, 7]   │
+│ 4   │ 4     │ [1, 1]   │ [4, 8]   │
+
+julia> combine(gd, AsTable(:) => Ref) # protecting result
+4×2 DataFrame
+│ Row │ a     │ a_b_c_Ref                            │
+│     │ Int64 │ NamedTuple…                          │
+├─────┼───────┼──────────────────────────────────────┤
+│ 1   │ 1     │ (a = [1, 1], b = [2, 2], c = [1, 5]) │
+│ 2   │ 2     │ (a = [2, 2], b = [1, 1], c = [2, 6]) │
+│ 3   │ 3     │ (a = [3, 3], b = [2, 2], c = [3, 7]) │
+│ 4   │ 4     │ (a = [4, 4], b = [1, 1], c = [4, 8]) │
+
+julia> combine(gd, :, AsTable(Not(:a)) => sum, renamecols=false)
+8×4 DataFrame
+│ Row │ a     │ b     │ c     │ b_c   │
+│     │ Int64 │ Int64 │ Int64 │ Int64 │
+├─────┼───────┼───────┼───────┼───────┤
+│ 1   │ 1     │ 2     │ 1     │ 3     │
+│ 2   │ 1     │ 2     │ 5     │ 7     │
+│ 3   │ 2     │ 1     │ 2     │ 3     │
+│ 4   │ 2     │ 1     │ 6     │ 7     │
+│ 5   │ 3     │ 2     │ 3     │ 5     │
+│ 6   │ 3     │ 2     │ 7     │ 9     │
+│ 7   │ 4     │ 1     │ 4     │ 5     │
+│ 8   │ 4     │ 1     │ 8     │ 9     │
 ```
 """
 combine(df::AbstractDataFrame, @nospecialize(args...); renamecols::Bool=true) =
@@ -953,6 +1209,11 @@ function combine(arg::Base.Callable, df::AbstractDataFrame; renamecols::Bool=tru
     return combine(df, arg)
 end
 
+combine(f::Pair, df::AbstractDataFrame; renamecols::Bool=true) =
+    throw(ArgumentError("First argument must be a transformation if the second argument is a data frame. " *
+                        "You can pass a `Pair` as the second argument of the transformation. If you want the return " *
+                        "value to be processed as having multiple columns add `=> AsTable` suffix to the pair."))
+
 manipulate(df::DataFrame, args::AbstractVector{Int}; copycols::Bool, keeprows::Bool,
            renamecols::Bool) =
     DataFrame(_columns(df)[args], Index(_names(df)[args]), copycols=copycols)
diff --git a/src/groupeddataframe/callprocessing.jl b/src/groupeddataframe/callprocessing.jl
new file mode 100644
index 0000000000..859987d83d
--- /dev/null
+++ b/src/groupeddataframe/callprocessing.jl
@@ -0,0 +1,143 @@
+# Wrapping automatically adds column names when the value returned
+# by the user-provided function lacks them
+wrap(x::Union{AbstractDataFrame, DataFrameRow}) = x
+wrap(x::NamedTuple) = x
+function wrap(x::NamedTuple{<:Any, <:Tuple{Vararg{AbstractVector}}})
+    if !isempty(x)
+        len1 = length(x[1])
+        for i in 2:length(x)
+            length(x[i]) == len1 || throw(DimensionMismatch("all vectors returned in a " *
+                                                            "NamedTuple must have the same length"))
+        end
+    end
+    return x
+end
+wrap(x::AbstractMatrix) =
+    NamedTuple{Tuple(gennames(size(x, 2)))}(Tuple(view(x, :, i) for i in 1:size(x, 2)))
+wrap(x::Any) = (x1=x,)
+
+const ERROR_ROW_COUNT = "return value must not change its kind " *
+                        "(single row or variable number of rows) across groups"
+
+const ERROR_COL_COUNT = "function must return only single-column values, " *
+                        "or only multiple-column values"
+
+wrap_table(x::Any, ::Val) =
+    throw(ArgumentError(ERROR_ROW_COUNT))
+function wrap_table(x::Union{NamedTuple{<:Any, <:Tuple{Vararg{AbstractVector}}},
+                             AbstractDataFrame, AbstractMatrix},
+                             ::Val{firstmulticol}) where firstmulticol
+    if !firstmulticol
+        throw(ArgumentError(ERROR_COL_COUNT))
+    end
+    return wrap(x)
+end
+
+function wrap_table(x::AbstractVector, ::Val{firstmulticol}) where firstmulticol
+    if firstmulticol
+        throw(ArgumentError(ERROR_COL_COUNT))
+    end
+    return wrap(x)
+end
+
+function wrap_row(x::Any, ::Val{firstmulticol}) where firstmulticol
+    # NamedTuple is not possible in this branch
+    if (x isa DataFrameRow) ⊻ firstmulticol
+        throw(ArgumentError(ERROR_COL_COUNT))
+    end
+    return wrap(x)
+end
+
+function wrap_row(x::Union{AbstractArray{<:Any, 0}, Ref},
+                  ::Val{firstmulticol}) where firstmulticol
+    if firstmulticol
+        throw(ArgumentError(ERROR_COL_COUNT))
+    end
+    return (x1 = x[],)
+end
+
+# note that NamedTuple() is also correctly captured by this definition
+# as it is more specific than the one below
+wrap_row(::Union{AbstractVecOrMat, AbstractDataFrame,
+                 NamedTuple{<:Any, <:Tuple{Vararg{AbstractVector}}}}, ::Val) =
+    throw(ArgumentError(ERROR_ROW_COUNT))
+
+function wrap_row(x::NamedTuple, ::Val{firstmulticol}) where firstmulticol
+    if any(v -> v isa AbstractVector, x)
+        throw(ArgumentError("mixing single values and vectors in a named tuple is not allowed"))
+    end
+    if !firstmulticol
+        throw(ArgumentError(ERROR_COL_COUNT))
+    end
+    return x
+end
+
+# idx, starts and ends are passed separately to avoid the cost of field access in a tight loop.
+# Manual unrolling of Tuple is used as it turned out to be more efficient than @generated
+# for a small number of columns passed.
+# For more than 4 columns `map` is slower than @generated,
+# but this case is probably rare, and if a huge number of columns is passed @generated
+# has a very high compilation cost
+function do_call(f::Base.Callable, idx::AbstractVector{<:Integer},
+                 starts::AbstractVector{<:Integer}, ends::AbstractVector{<:Integer},
+                 gd::GroupedDataFrame, incols::Tuple{}, i::Integer)
+    if f isa ByRow
+        return [f.fun() for _ in 1:(ends[i] - starts[i] + 1)]
+    else
+        return f()
+    end
+end
+
+function do_call(f::Base.Callable, idx::AbstractVector{<:Integer},
+                 starts::AbstractVector{<:Integer}, ends::AbstractVector{<:Integer},
+                 gd::GroupedDataFrame, incols::Tuple{AbstractVector}, i::Integer)
+    idx = idx[starts[i]:ends[i]]
+    return f(view(incols[1], idx))
+end
+
+function do_call(f::Base.Callable, idx::AbstractVector{<:Integer},
+                 starts::AbstractVector{<:Integer}, ends::AbstractVector{<:Integer},
+                 gd::GroupedDataFrame, incols::NTuple{2, AbstractVector}, i::Integer)
+    idx = idx[starts[i]:ends[i]]
+    return f(view(incols[1], idx), view(incols[2], idx))
+end
+
+function do_call(f::Base.Callable, idx::AbstractVector{<:Integer},
+                 starts::AbstractVector{<:Integer}, ends::AbstractVector{<:Integer},
+                 gd::GroupedDataFrame, incols::NTuple{3, AbstractVector}, i::Integer)
+    idx = idx[starts[i]:ends[i]]
+    return f(view(incols[1], idx), view(incols[2], idx), view(incols[3], idx))
+end
+
+function do_call(f::Base.Callable, idx::AbstractVector{<:Integer},
+                 starts::AbstractVector{<:Integer}, ends::AbstractVector{<:Integer},
+                 gd::GroupedDataFrame, incols::NTuple{4, AbstractVector}, i::Integer)
+    idx = idx[starts[i]:ends[i]]
+    return f(view(incols[1], idx), view(incols[2], idx), view(incols[3], idx),
+             view(incols[4], idx))
+end
+
+function do_call(f::Base.Callable, idx::AbstractVector{<:Integer},
+                 starts::AbstractVector{<:Integer}, ends::AbstractVector{<:Integer},
+                 gd::GroupedDataFrame, incols::Tuple, i::Integer)
+    idx = idx[starts[i]:ends[i]]
+    return f(map(c -> view(c, idx), incols)...)
+end
+
+function do_call(f::Base.Callable, idx::AbstractVector{<:Integer},
+                 starts::AbstractVector{<:Integer}, ends::AbstractVector{<:Integer},
+                 gd::GroupedDataFrame, incols::NamedTuple, i::Integer)
+    if f isa ByRow && isempty(incols)
+        return [f.fun(NamedTuple()) for _ in 1:(ends[i] - starts[i] + 1)]
+    else
+        idx = idx[starts[i]:ends[i]]
+        return f(map(c -> view(c, idx), incols))
+    end
+end
+
+function do_call(f::Base.Callable, idx::AbstractVector{<:Integer},
+                 starts::AbstractVector{<:Integer}, ends::AbstractVector{<:Integer},
+                 gd::GroupedDataFrame, incols::Nothing, i::Integer)
+    idx = idx[starts[i]:ends[i]]
+    return f(view(parent(gd), idx, :))
+end
diff --git a/src/groupeddataframe/complextransforms.jl b/src/groupeddataframe/complextransforms.jl
new file mode 100644
index 0000000000..8db068c398
--- /dev/null
+++ b/src/groupeddataframe/complextransforms.jl
@@ -0,0 +1,236 @@
+_nrow(df::AbstractDataFrame) = nrow(df)
+_nrow(x::NamedTuple{<:Any, <:Tuple{Vararg{AbstractVector}}}) =
+    isempty(x) ? 0 : length(x[1])
+_ncol(df::AbstractDataFrame) = ncol(df)
+_ncol(x::Union{NamedTuple, DataFrameRow}) = length(x)
+
+function _combine_multicol(firstres, fun::Base.Callable, gd::GroupedDataFrame,
+                           incols::Union{Nothing, AbstractVector, Tuple, NamedTuple})
+    firstmulticol = firstres isa MULTI_COLS_TYPE
+    if !(firstres isa Union{AbstractVecOrMat, AbstractDataFrame,
+                            NamedTuple{<:Any, <:Tuple{Vararg{AbstractVector}}}})
+        idx_agg = Vector{Int}(undef, length(gd))
+        fillfirst!(nothing, idx_agg, 1:length(gd.groups), gd)
+    else
+        idx_agg = nothing
+    end
+    return _combine_with_first(wrap(firstres), fun, gd, incols,
+                               Val(firstmulticol), idx_agg)
+end
+
+function _combine_with_first(first::Union{NamedTuple, DataFrameRow, AbstractDataFrame},
+                             f::Base.Callable, gd::GroupedDataFrame,
+                             incols::Union{Nothing, AbstractVector, Tuple, NamedTuple},
+                             firstmulticol::Val, idx_agg::Union{Nothing, AbstractVector{<:Integer}})
+    extrude = false
+
+    if first isa AbstractDataFrame
+        n = 0
+        eltys = eltype.(eachcol(first))
+    elseif first isa NamedTuple{<:Any, <:Tuple{Vararg{AbstractVector}}}
+        n = 0
+        eltys = map(eltype, first)
+    elseif first isa DataFrameRow
+        n = length(gd)
+        eltys = [eltype(parent(first)[!, i]) for i in parentcols(index(first))]
+    elseif firstmulticol == Val(false) && first[1] isa Union{AbstractArray{<:Any, 0}, Ref}
+        extrude = true
+        first = wrap_row(first[1], firstmulticol)
+        n = length(gd)
+        eltys = (typeof(first[1]),)
+    else # other NamedTuple giving a single row
+        n = length(gd)
+        eltys = map(typeof, first)
+        if any(x -> x <: AbstractVector, eltys)
+            throw(ArgumentError("mixing single values and vectors in a named tuple is not allowed"))
+        end
+    end
+    idx = isnothing(idx_agg) ? Vector{Int}(undef, n) : idx_agg
+    local initialcols
+    let eltys=eltys, n=n # Workaround for julia#15276
+        initialcols = ntuple(i -> Tables.allocatecolumn(eltys[i], n), _ncol(first))
+    end
+    targetcolnames = tuple(propertynames(first)...)
+    if !extrude && first isa Union{AbstractDataFrame,
+                                   NamedTuple{<:Any, <:Tuple{Vararg{AbstractVector}}}}
+        outcols, finalcolnames = _combine_tables_with_first!(first, initialcols, idx, 1, 1,
+                                                             f, gd, incols, targetcolnames,
+                                                             firstmulticol)
+    else
+        outcols, finalcolnames = _combine_rows_with_first!(first, initialcols, 1, 1,
+                                                           f, gd, incols, targetcolnames,
+                                                           firstmulticol)
+    end
+    return idx, outcols, collect(Symbol, finalcolnames)
+end
+
+function fill_row!(row, outcols::NTuple{N, AbstractVector},
+                   i::Integer, colstart::Integer,
+                   colnames::NTuple{N, Symbol}) where N
+    if _ncol(row) != N
+        throw(ArgumentError("return value must have the same number of columns " *
+                            "for all groups (got $N and $(_ncol(row)))"))
+    end
+    @inbounds for j in colstart:length(outcols)
+        col = outcols[j]
+        cn = colnames[j]
+        local val
+        try
+            val = row[cn]
+        catch
+            throw(ArgumentError("return value must have the same column names " *
+                                "for all groups (got $colnames and $(propertynames(row)))"))
+        end
+        S = typeof(val)
+        T = eltype(col)
+        if S <: T || promote_type(S, T) <: T
+            col[i] = val
+        else
+            return j
+        end
+    end
+    return nothing
+end
+
+function _combine_rows_with_first!(first::Union{NamedTuple, DataFrameRow},
+                                   outcols::NTuple{N, AbstractVector},
+                                   rowstart::Integer, colstart::Integer,
+                                   f::Base.Callable, gd::GroupedDataFrame,
+                                   incols::Union{Nothing, AbstractVector, Tuple, NamedTuple},
+                                   colnames::NTuple{N, Symbol},
+                                   firstmulticol::Val) where N
+    len = length(gd)
+    gdidx = gd.idx
+    starts = gd.starts
+    ends = gd.ends
+
+    # handle empty GroupedDataFrame
+    len == 0 && return outcols, colnames
+
+    # Handle first group
+    j = fill_row!(first, outcols, rowstart, colstart, colnames)
+    @assert j === nothing # eltype is guaranteed to match
+    # Handle remaining groups
+    @inbounds for i in rowstart+1:len
+        row = wrap_row(do_call(f, gdidx, starts, ends, gd, incols, i), firstmulticol)
+        j = fill_row!(row, outcols, i, 1, colnames)
+        if j !== nothing # Need to widen column type
+            local newcols
+            let i = i, j = j, outcols=outcols, row=row # Workaround for julia#15276
+                newcols = ntuple(length(outcols)) do k
+                    S = typeof(row[k])
+                    T = eltype(outcols[k])
+                    U = promote_type(S, T)
+                    if S <: T || U <: T
+                        outcols[k]
+                    else
+                        copyto!(Tables.allocatecolumn(U, length(outcols[k])),
+                                1, outcols[k], 1, k >= j ? i-1 : i)
+                    end
+                end
+            end
+            return _combine_rows_with_first!(row, newcols, i, j,
+                                             f, gd, incols, colnames, firstmulticol)
+        end
+    end
+    return outcols, colnames
+end
+
+# This needs to be in a separate function
+# to work around a crash due to JuliaLang/julia#29430
+if VERSION >= v"1.1.0-DEV.723"
+    @inline function do_append!(do_it, col, vals)
+        do_it && append!(col, vals)
+        return do_it
+    end
+else
+    @noinline function do_append!(do_it, col, vals)
+        do_it && append!(col, vals)
+        return do_it
+    end
+end
+
+function append_rows!(rows, outcols::NTuple{N, AbstractVector},
+                      colstart::Integer, colnames::NTuple{N, Symbol}) where N
+    if !isa(rows, Union{AbstractDataFrame, NamedTuple{<:Any, <:Tuple{Vararg{AbstractVector}}}})
+        throw(ArgumentError(ERROR_ROW_COUNT))
+    elseif _ncol(rows) != N
+        throw(ArgumentError("return value must have the same number of columns " *
+                            "for all groups (got $N and $(_ncol(rows)))"))
+    end
+    @inbounds for j in colstart:length(outcols)
+        col = outcols[j]
+        cn = colnames[j]
+        local vals
+        try
+            vals = getproperty(rows, cn)
+        catch
+            throw(ArgumentError("return value must have the same column names " *
+                                "for all groups (got $colnames and $(propertynames(rows)))"))
+        end
+        S = eltype(vals)
+        T = eltype(col)
+        if !do_append!(S <: T || promote_type(S, T) <: T, col, vals)
+            return j
+        end
+    end
+    return nothing
+end
+
+function _combine_tables_with_first!(first::Union{AbstractDataFrame,
+                                     NamedTuple{<:Any, <:Tuple{Vararg{AbstractVector}}}},
+                                     outcols::NTuple{N, AbstractVector},
+                                     idx::Vector{Int}, rowstart::Integer, colstart::Integer,
+                                     f::Base.Callable, gd::GroupedDataFrame,
+                                     incols::Union{Nothing, AbstractVector, Tuple, NamedTuple},
+                                     colnames::NTuple{N, Symbol},
+                                     firstmulticol::Val) where N
+    len = length(gd)
+    gdidx = gd.idx
+    starts = gd.starts
+    ends = gd.ends
+    # Handle first group
+
+    @assert _ncol(first) == N
+    if !isempty(colnames) && length(gd) > 0
+        j = append_rows!(first, outcols, colstart, colnames)
+        @assert j === nothing # eltype is guaranteed to match
+        append!(idx, Iterators.repeated(gdidx[starts[rowstart]], _nrow(first)))
+    end
+    # Handle remaining groups
+    @inbounds for i in rowstart+1:len
+        rows = wrap_table(do_call(f, gdidx, starts, ends, gd, incols, i), firstmulticol)
+        _ncol(rows) == 0 && continue
+        if isempty(colnames)
+            newcolnames = tuple(propertynames(rows)...)
+            if rows isa AbstractDataFrame
+                eltys = eltype.(eachcol(rows))
+            else
+                eltys = map(eltype, rows)
+            end
+            initialcols = ntuple(i -> Tables.allocatecolumn(eltys[i], 0), _ncol(rows))
+            return _combine_tables_with_first!(rows, initialcols, idx, i, 1,
+                                               f, gd, incols, newcolnames, firstmulticol)
+        end
+        j = append_rows!(rows, outcols, 1, colnames)
+        if j !== nothing # Need to widen column type
+            local newcols
+            let i = i, j = j, outcols=outcols, rows=rows # Workaround for julia#15276
+                newcols = ntuple(length(outcols)) do k
+                    S = eltype(rows isa AbstractDataFrame ? rows[!, k] : rows[k])
+                    T = eltype(outcols[k])
+                    U = promote_type(S, T)
+                    if S <: T || U <: T
+                        outcols[k]
+                    else
+                        copyto!(Tables.allocatecolumn(U, length(outcols[k])), outcols[k])
+                    end
+                end
+            end
+            return _combine_tables_with_first!(rows, newcols, idx, i, j,
+                                               f, gd, incols, colnames, firstmulticol)
+        end
+        append!(idx, Iterators.repeated(gdidx[starts[i]], _nrow(rows)))
+    end
+    return outcols, colnames
+end
diff --git a/src/groupeddataframe/fastaggregates.jl b/src/groupeddataframe/fastaggregates.jl
new file mode 100644
index 0000000000..9d0d6e8cd4
--- /dev/null
+++ b/src/groupeddataframe/fastaggregates.jl
@@ -0,0 +1,284 @@
+abstract type AbstractAggregate end
+
+struct Reduce{O, C, A} <: AbstractAggregate
+    op::O
+    condf::C
+    adjust::A
+    checkempty::Bool
+end
+Reduce(f, condf=nothing, adjust=nothing) = Reduce(f, condf, adjust, false)
+
+check_aggregate(f::Any, ::AbstractVector) = f
+check_aggregate(f::typeof(sum), ::AbstractVector{<:Union{Missing, Number}}) =
+    Reduce(Base.add_sum)
+check_aggregate(f::typeof(sum∘skipmissing), ::AbstractVector{<:Union{Missing, Number}}) =
+    Reduce(Base.add_sum, !ismissing)
+check_aggregate(f::typeof(prod), ::AbstractVector{<:Union{Missing, Number}}) =
+    Reduce(Base.mul_prod)
+check_aggregate(f::typeof(prod∘skipmissing), ::AbstractVector{<:Union{Missing, Number}}) =
+    Reduce(Base.mul_prod, !ismissing)
+check_aggregate(f::typeof(maximum),
+                ::AbstractVector{<:Union{Missing, MULTI_COLS_TYPE, AbstractVector}}) = f
+check_aggregate(f::typeof(maximum), v::AbstractVector{<:Union{Missing, Real}}) =
+    eltype(v) === Any ? f : Reduce(max)
+check_aggregate(f::typeof(maximum∘skipmissing),
+                ::AbstractVector{<:Union{Missing, MULTI_COLS_TYPE, AbstractVector}}) = f
+check_aggregate(f::typeof(maximum∘skipmissing), v::AbstractVector{<:Union{Missing, Real}}) =
+    eltype(v) === Any ? f : Reduce(max, !ismissing, nothing, true)
+check_aggregate(f::typeof(minimum),
+                ::AbstractVector{<:Union{Missing, MULTI_COLS_TYPE, AbstractVector}}) = f
+check_aggregate(f::typeof(minimum), v::AbstractVector{<:Union{Missing, Real}}) =
+    eltype(v) === Any ? f : Reduce(min)
+check_aggregate(f::typeof(minimum∘skipmissing),
+                ::AbstractVector{<:Union{Missing, MULTI_COLS_TYPE, AbstractVector}}) = f
+check_aggregate(f::typeof(minimum∘skipmissing), v::AbstractVector{<:Union{Missing, Real}}) =
+    eltype(v) === Any ? f : Reduce(min, !ismissing, nothing, true)
+check_aggregate(f::typeof(mean), ::AbstractVector{<:Union{Missing, Number}}) =
+    Reduce(Base.add_sum, nothing, /)
+check_aggregate(f::typeof(mean∘skipmissing), ::AbstractVector{<:Union{Missing, Number}}) =
+    Reduce(Base.add_sum, !ismissing, /)
+
+# Other aggregate functions which are not strictly reductions
+struct Aggregate{F, C} <: AbstractAggregate
+    f::F
+    condf::C
+end
+Aggregate(f) = Aggregate(f, nothing)
+
+check_aggregate(f::typeof(var), ::AbstractVector{<:Union{Missing, Number}}) =
+    Aggregate(var)
+check_aggregate(f::typeof(var∘skipmissing), ::AbstractVector{<:Union{Missing, Number}}) =
+    Aggregate(var, !ismissing)
+check_aggregate(f::typeof(std), ::AbstractVector{<:Union{Missing, Number}}) =
+    Aggregate(std)
+check_aggregate(f::typeof(std∘skipmissing), ::AbstractVector{<:Union{Missing, Number}}) =
+    Aggregate(std, !ismissing)
+check_aggregate(f::typeof(first), v::AbstractVector) =
+    eltype(v) === Any ? f : Aggregate(first)
+check_aggregate(f::typeof(first),
+                ::AbstractVector{<:Union{Missing, MULTI_COLS_TYPE, AbstractVector}}) = f
+check_aggregate(f::typeof(first∘skipmissing), v::AbstractVector) =
+    eltype(v) === Any ? f : Aggregate(first, !ismissing)
+check_aggregate(f::typeof(first∘skipmissing),
+                ::AbstractVector{<:Union{Missing, MULTI_COLS_TYPE, AbstractVector}}) = f
+check_aggregate(f::typeof(last), v::AbstractVector) =
+    eltype(v) === Any ? f : Aggregate(last)
+check_aggregate(f::typeof(last),
+                ::AbstractVector{<:Union{Missing, MULTI_COLS_TYPE, AbstractVector}}) = f
+check_aggregate(f::typeof(last∘skipmissing), v::AbstractVector) =
+    eltype(v) === Any ? f : Aggregate(last, !ismissing)
+check_aggregate(f::typeof(last∘skipmissing),
+                ::AbstractVector{<:Union{Missing, MULTI_COLS_TYPE, AbstractVector}}) = f
+check_aggregate(f::typeof(length), ::AbstractVector) = Aggregate(length)
+
+# SkipMissing does not support length
+
+# Use a strategy similar to reducedim_init from Base to get the vector of the right type
+function groupreduce_init(op, condf, adjust,
+                          incol::AbstractVector{U}, gd::GroupedDataFrame) where U
+    T = Base.promote_union(U)
+
+    if op === Base.add_sum
+        initf = zero
+    elseif op === Base.mul_prod
+        initf = one
+    else
+        throw(ErrorException("Unrecognized op $op"))
+    end
+
+    Tnm = nonmissingtype(T)
+    if isconcretetype(Tnm) && applicable(initf, Tnm)
+        tmpv = initf(Tnm)
+        initv = op(tmpv, tmpv)
+        if adjust isa Nothing
+            x = Tnm <: AbstractIrrational ? float(initv) : initv
+        else
+            x = adjust(initv, 1)
+        end
+        if condf === !ismissing
+            V = typeof(x)
+        else
+            V = U >: Missing ? Union{typeof(x), Missing} : typeof(x)
+        end
+        v = similar(incol, V, length(gd))
+        fill!(v, x)
+        return v
+    else
+        # do not try to determine the narrowest possible type nor starting value
+        # as this is not possible to do correctly in general without processing
+        # groups; it will get fixed later in groupreduce!; later we
+        # will make use of the fact that this vector is filled with #undef
+        # while above the vector is filled with a concrete value
+        return Vector{Any}(undef, length(gd))
+    end
+end
+
+for (op, initf) in ((:max, :typemin), (:min, :typemax))
+    @eval begin
+        function groupreduce_init(::typeof($op), condf, adjust,
+                                  incol::AbstractVector{T}, gd::GroupedDataFrame) where T
+            @assert isnothing(adjust)
+            S = nonmissingtype(T)
+            # !ismissing check is purely an optimization to avoid a copy later
+            outcol = similar(incol, condf === !ismissing ? S : T, length(gd))
+            # Comparison is possible only between CatValues from the same pool
+            if incol isa CategoricalVector
+                U = Union{CategoricalArrays.leveltype(outcol),
+                          eltype(outcol) >: Missing ? Missing : Union{}}
+                outcol = CategoricalArray{U, 1}(outcol.refs, incol.pool)
+            end
+            # It is safe to use a non-missing init value
+            # since missing will poison the result if present
+            # we assume here that groups are non-empty (current design assures this)
+            # + workaround for https://github.com/JuliaLang/julia/issues/36978
+            if isconcretetype(S) && hasmethod($initf, Tuple{S}) && !(S <: Irrational)
+                fill!(outcol, $initf(S))
+            else
+                fillfirst!(condf, outcol, incol, gd)
+            end
+            return outcol
+        end
+    end
+end
+
+function copyto_widen!(res::AbstractVector{T}, x::AbstractVector) where T
+    @inbounds for i in eachindex(res, x)
+        val = x[i]
+        S = typeof(val)
+        if S <: T || promote_type(S, T) <: T
+            res[i] = val
+        else
+            newres = Tables.allocatecolumn(promote_type(S, T), length(x))
+            return copyto_widen!(newres, x)
+        end
+    end
+    return res
+end
+
+function groupreduce!(res::AbstractVector, f, op, condf, adjust, checkempty::Bool,
+                      incol::AbstractVector, gd::GroupedDataFrame)
+    n = length(gd)
+    if adjust !== nothing || checkempty
+        counts = zeros(Int, n)
+    end
+    groups = gd.groups
+    @inbounds for i in eachindex(incol, groups)
+        gix = groups[i]
+        x = incol[i]
+        if gix > 0 && (condf === nothing || condf(x))
+            # this check should be optimized out if eltype(res) is not Any
+            if eltype(res) === Any && !isassigned(res, gix)
+                res[gix] = f(x, gix)
+            else
+                res[gix] = op(res[gix], f(x, gix))
+            end
+            if adjust !== nothing || checkempty
+                counts[gix] += 1
+            end
+        end
+    end
+    # handle the case of an uninitialized reduction
+    if eltype(res) === Any
+        if op === Base.add_sum
+            initf = zero
+        elseif op === Base.mul_prod
+            initf = one
+        else
+            initf = x -> throw(ErrorException("Unrecognized op $op"))
+        end
+        @inbounds for gix in eachindex(res)
+            if !isassigned(res, gix)
+                res[gix] = initf(nonmissingtype(eltype(incol)))
+            end
+        end
+    end
+    if adjust !== nothing
+        res .= adjust.(res, counts)
+    end
+    if checkempty && any(iszero, counts)
+        throw(ArgumentError("some groups contain only missing values"))
+    end
+    # Undo pool sharing done by groupreduce_init
+    if res isa CategoricalVector && res.pool === incol.pool
+        V = Union{CategoricalArrays.leveltype(res),
+                  eltype(res) >: Missing ? Missing : Union{}}
+        res = CategoricalArray{V, 1}(res.refs, copy(res.pool))
+    end
+    if isconcretetype(eltype(res))
+        return res
+    else
+        return copyto_widen!(Tables.allocatecolumn(typeof(first(res)), n), res)
+    end
+end
+
+# function barrier works around type instability of groupreduce_init due to applicable
+groupreduce(f, op, condf, adjust, checkempty::Bool,
+            incol::AbstractVector, gd::GroupedDataFrame) =
+    groupreduce!(groupreduce_init(op, condf, adjust, incol, gd),
+                 f, op, condf, adjust, checkempty, incol, gd)
+# Avoids the overhead due to Missing when computing reduction
+groupreduce(f, op, condf::typeof(!ismissing), adjust, checkempty::Bool,
+            incol::AbstractVector, gd::GroupedDataFrame) =
+    groupreduce!(disallowmissing(groupreduce_init(op, condf, adjust, incol, gd)),
+                 f, op, condf, adjust, checkempty, incol, gd)
+
+(r::Reduce)(incol::AbstractVector, gd::GroupedDataFrame) =
+    groupreduce((x, i) -> x, r.op, r.condf, r.adjust, r.checkempty, incol, gd)
+
+# this definition is missing in Julia 1.0 LTS and is required by aggregation for var
+# TODO: remove this when we drop 1.0 support
+if VERSION < v"1.1"
+    Base.zero(::Type{Missing}) = missing
+end
+
+function (agg::Aggregate{typeof(var)})(incol::AbstractVector, gd::GroupedDataFrame)
+    means = groupreduce((x, i) -> x, Base.add_sum, agg.condf, /, false, incol, gd)
+    # !ismissing check is purely an optimization to avoid a copy later
+    if eltype(means) >: Missing && agg.condf !== !ismissing
+        T = Union{Missing, real(eltype(means))}
+    else
+        T = real(eltype(means))
+    end
+    res = zeros(T, length(gd))
+    return groupreduce!(res, (x, i) -> @inbounds(abs2(x - means[i])), +, agg.condf,
+                        (x, l) -> l <= 1 ? oftype(x / (l-1), NaN) : x / (l-1),
+                        false, incol, gd)
+end
+
+function (agg::Aggregate{typeof(std)})(incol::AbstractVector, gd::GroupedDataFrame)
+    outcol = Aggregate(var, agg.condf)(incol, gd)
+    if eltype(outcol) <: Union{Missing, Rational}
+        return sqrt.(outcol)
+    else
+        return map!(sqrt, outcol, outcol)
+    end
+end
+
+for f in (first, last)
+    function (agg::Aggregate{typeof(f)})(incol::AbstractVector, gd::GroupedDataFrame)
+        n = length(gd)
+        outcol = similar(incol, n)
+        fillfirst!(agg.condf, outcol, incol, gd, rev=agg.f === last)
+        if isconcretetype(eltype(outcol))
+            return outcol
+        else
+            return copyto_widen!(Tables.allocatecolumn(typeof(first(outcol)), n), outcol)
+        end
+    end
+end
+
+function (agg::Aggregate{typeof(length)})(incol::AbstractVector, gd::GroupedDataFrame)
+    if getfield(gd, :idx) === nothing
+        lens = zeros(Int, length(gd))
+        @inbounds for gix in gd.groups
+            gix > 0 && (lens[gix] += 1)
+        end
+        return lens
+    else
+        return gd.ends .- gd.starts .+ 1
+    end
+end
+
+isagg((col, (fun, outcol))::Pair{<:ColumnIndex, <:Pair{<:Any, <:SymbolOrString}}, gdf::GroupedDataFrame) =
+    check_aggregate(fun, parent(gdf)[!, col]) isa AbstractAggregate
+isagg(::Any, gdf::GroupedDataFrame) = false
diff --git a/src/groupeddataframe/groupeddataframe.jl b/src/groupeddataframe/groupeddataframe.jl
index 6b97bcae3d..b46338293e 100644
--- a/src/groupeddataframe/groupeddataframe.jl
+++ b/src/groupeddataframe/groupeddataframe.jl
@@ -33,6 +33,192 @@ mutable struct GroupedDataFrame{T<:AbstractDataFrame}
                                          # thread safe
 end
 
+"""
+    groupby(df::AbstractDataFrame, cols; sort=false, skipmissing=false)
+
+Return a `GroupedDataFrame` representing a view of an `AbstractDataFrame` split
+into row groups.
+
+# Arguments
+- `df` : an `AbstractDataFrame` to split
+- `cols` : data frame columns to group by. Can be any column selector
+  ($COLUMNINDEX_STR; $MULTICOLUMNINDEX_STR).
+- `sort` : whether to sort groups according to the values of the grouping columns
+  `cols`; if `sort=false` then the order of groups in the result is undefined
+  and may change in future releases. In the current implementation
+  groups are ordered following the order of appearance of values in the grouping
+  columns, except when all grouping columns provide non-`nothing`
+  `DataAPI.refpool` in which case the order of groups follows the order of
+  values returned by `DataAPI.refpool`. As a particular application of this rule,
+  if all `cols` are `CategoricalVector`s then groups are always sorted
+  irrespective of the value of `sort`.
+- `skipmissing` : whether to skip groups with `missing` values in one of the
+  grouping columns `cols`
+
+# Details
+Iterating over a `GroupedDataFrame` yields a `SubDataFrame` view of `df`
+for each group.
+Within each group, the order of rows in `df` is preserved.
+
+`cols` can be any valid data frame indexing expression.
+In particular if it is an empty vector then a single-group `GroupedDataFrame`
+is created.
+
+A `GroupedDataFrame` also supports
+indexing by groups, `map` (which applies a function to each group)
+and `combine` (which applies a function to each group
+and combines the result into a data frame).
+
+`GroupedDataFrame` also supports the dictionary interface. The keys are
+[`GroupKey`](@ref) objects returned by [`keys(::GroupedDataFrame)`](@ref),
+which can also be used to get the values of the grouping columns for each group.
+`Tuple`s and `NamedTuple`s containing the values of the grouping columns (in the
+same order as the `cols` argument) are also accepted as indices. Finally,
+an `AbstractDict` can be used to index into a grouped data frame where
+the keys are column names of the data frame. The order of the keys does
+not matter in this case.
+
+# See also
+
+[`combine`](@ref), [`select`](@ref), [`select!`](@ref), [`transform`](@ref), [`transform!`](@ref)
+
+# Examples
+```julia
+julia> df = DataFrame(a = repeat([1, 2, 3, 4], outer=[2]),
+                      b = repeat([2, 1], outer=[4]),
+                      c = 1:8);
+
+julia> gd = groupby(df, :a)
+GroupedDataFrame with 4 groups based on key: a
+First Group (2 rows): a = 1
+│ Row │ a     │ b     │ c     │
+│     │ Int64 │ Int64 │ Int64 │
+├─────┼───────┼───────┼───────┤
+│ 1   │ 1     │ 2     │ 1     │
+│ 2   │ 1     │ 2     │ 5     │
+⋮
+Last Group (2 rows): a = 4
+│ Row │ a     │ b     │ c     │
+│     │ Int64 │ Int64 │ Int64 │
+├─────┼───────┼───────┼───────┤
+│ 1   │ 4     │ 1     │ 4     │
+│ 2   │ 4     │ 1     │ 8     │
+
+julia> gd[1]
+2×3 SubDataFrame
+│ Row │ a     │ b     │ c     │
+│     │ Int64 │ Int64 │ Int64 │
+├─────┼───────┼───────┼───────┤
+│ 1   │ 1     │ 2     │ 1     │
+│ 2   │ 1     │ 2     │ 5     │
+
+julia> last(gd)
+2×3 SubDataFrame
+│ Row │ a     │ b     │ c     │
+│     │ Int64 │ Int64 │ Int64 │
+├─────┼───────┼───────┼───────┤
+│ 1   │ 4     │ 1     │ 4     │
+│ 2   │ 4     │ 1     │ 8     │
+
+julia> gd[(a=3,)]
+2×3 SubDataFrame
+│ Row │ a     │ b     │ c     │
+│     │ Int64 │ Int64 │ Int64 │
+├─────┼───────┼───────┼───────┤
+│ 1   │ 3     │ 2     │ 3     │
+│ 2   │ 3     │ 2     │ 7     │
+
+julia> gd[Dict("a" => 3)]
+2×3 SubDataFrame
+│ Row │ a     │ b     │ c     │
+│     │ Int64 │ Int64 │ Int64 │
+├─────┼───────┼───────┼───────┤
+│ 1   │ 3     │ 2     │ 3     │
+│ 2   │ 3     │ 2     │ 7     │
+
+julia> gd[(3,)]
+2×3 SubDataFrame
+│ Row │ a     │ b     │ c     │
+│     │ Int64 │ Int64 │ Int64 │
+├─────┼───────┼───────┼───────┤
+│ 1   │ 3     │ 2     │ 3     │
+│ 2   │ 3     │ 2     │ 7     │
+
+julia> k = first(keys(gd))
+GroupKey: (a = 1)
+
+julia> gd[k]
+2×3 SubDataFrame
+│ Row │ a     │ b     │ c     │
+│     │ Int64 │ Int64 │ Int64 │
+├─────┼───────┼───────┼───────┤
+│ 1   │ 1     │ 2     │ 1     │
+│ 2   │ 1     │ 2     │ 5     │
+
+julia> for g in gd
+           println(g)
+       end
+2×3 SubDataFrame
+│ Row │ a     │ b     │ c     │
+│     │ Int64 │ Int64 │ Int64 │
+├─────┼───────┼───────┼───────┤
+│ 1   │ 1     │ 2     │ 1     │
+│ 2   │ 1     │ 2     │ 5     │
+2×3 SubDataFrame
+│ Row │ a     │ b     │ c     │
+│     │ Int64 │ Int64 │ Int64 │
+├─────┼───────┼───────┼───────┤
+│ 1   │ 2     │ 1     │ 2     │
+│ 2   │ 2     │ 1     │ 6     │
+2×3 SubDataFrame
+│ Row │ a     │ b     │ c     │
+│     │ Int64 │ Int64 │ Int64 │
+├─────┼───────┼───────┼───────┤
+│ 1   │ 3     │ 2     │ 3     │
+│ 2   │ 3     │ 2     │ 7     │
+2×3 SubDataFrame
+│ Row │ a     │ b     │ c     │
+│     │ Int64 │ Int64 │ Int64 │
+├─────┼───────┼───────┼───────┤
+│ 1   │ 4     │ 1     │ 4     │
+│ 2   │ 4     │ 1     │ 8     │
+```
+"""
+function groupby(df::AbstractDataFrame, cols;
+                 sort::Bool=false, skipmissing::Bool=false)
+    _check_consistency(df)
+    idxcols = index(df)[cols]
+    if isempty(idxcols)
+        return GroupedDataFrame(df, Symbol[], ones(Int, nrow(df)),
+                                nothing, nothing, nothing, nrow(df) == 0 ? 0 : 1,
+                                nothing, Threads.ReentrantLock())
+    end
+    sdf = select(df, idxcols, copycols=false)
+
+    groups = Vector{Int}(undef, nrow(df))
+    ngroups, rhashes, gslots, sorted =
+        row_group_slots(ntuple(i -> sdf[!, i], ncol(sdf)), Val(false),
+                        groups, skipmissing, sort)
+
+    gd = GroupedDataFrame(df, copy(_names(sdf)), groups, nothing, nothing, nothing, ngroups, nothing,
+                          Threads.ReentrantLock())
+
+    # sort groups if row_group_slots hasn't already done that
+    if sort && !sorted
+        # Find index of representative row for each group
+        idx = Vector{Int}(undef, length(gd))
+        fillfirst!(nothing, idx, 1:nrow(parent(gd)), gd)
+        group_invperm = invperm(sortperm(view(parent(gd)[!, gd.cols], idx, :)))
+        groups = gd.groups
+        @inbounds for i in eachindex(groups)
+            gix = groups[i]
+            groups[i] = gix == 0 ? 0 : group_invperm[gix]
+        end
+    end
+
+    return gd
+end
+
 function genkeymap(gd, cols)
     # currently we use Dict{Any,Int} because then field :keymap in GroupedDataFrame
     # has a concrete type which makes the access to it faster as we do not have a dynamic
diff --git a/src/groupeddataframe/splitapplycombine.jl b/src/groupeddataframe/splitapplycombine.jl
index 5664e7449e..0b848b3d81 100644
--- a/src/groupeddataframe/splitapplycombine.jl
+++ b/src/groupeddataframe/splitapplycombine.jl
@@ -1,504 +1,32 @@
+# in this file we use cs and cs_i variable names that mean "target columns specification"
+
 # this constant defines which types of values returned by aggregation function
 # in combine are considered to produce multiple columns in the resulting data frame
 const MULTI_COLS_TYPE = Union{AbstractDataFrame, NamedTuple, DataFrameRow, AbstractMatrix}
 
-"""
-    groupby(d::AbstractDataFrame, cols; sort=false, skipmissing=false)
-
-Return a `GroupedDataFrame` representing a view of an `AbstractDataFrame` split
-into row groups.
-
-# Arguments
-- `df` : an `AbstractDataFrame` to split
-- `cols` : data frame columns to group by. Can be any column selector
-  ($COLUMNINDEX_STR; $MULTICOLUMNINDEX_STR).
-- `sort` : whether to sort groups according to the values of the grouping columns
-  `cols`; if all `cols` are `CategoricalVector`s then groups are always sorted
-  irrespective of the value of `sort`
-- `skipmissing` : whether to skip groups with `missing` values in one of the
-  grouping columns `cols`
-
-# Details
-An iterator over a `GroupedDataFrame` returns a `SubDataFrame` view
-for each grouping into `df`.
-Within each group, the order of rows in `df` is preserved.
-
-`cols` can be any valid data frame indexing expression.
-In particular if it is an empty vector then a single-group `GroupedDataFrame`
-is created.
-
-A `GroupedDataFrame` also supports
-indexing by groups, `map` (which applies a function to each group)
-and `combine` (which applies a function to each group
-and combines the result into a data frame).
-
-`GroupedDataFrame` also supports the dictionary interface. The keys are
-[`GroupKey`](@ref) objects returned by [`keys(::GroupedDataFrame)`](@ref),
-which can also be used to get the values of the grouping columns for each group.
-`Tuples` and `NamedTuple`s containing the values of the grouping columns (in the
-same order as the `cols` argument) are also accepted as indices. Finally,
-an `AbstractDict` can be used to index into a grouped data frame where
-the keys are column names of the data frame. The order of the keys does
-not matter in this case.
-
-# See also
-
-[`combine`](@ref), [`select`](@ref), [`select!`](@ref), [`transform`](@ref), [`transform!`](@ref)
-
-# Examples
-```julia
-julia> df = DataFrame(a = repeat([1, 2, 3, 4], outer=[2]),
-                      b = repeat([2, 1], outer=[4]),
-                      c = 1:8);
-
-julia> gd = groupby(df, :a)
-GroupedDataFrame with 4 groups based on key: a
-First Group (2 rows): a = 1
-│ Row │ a     │ b     │ c     │
-│     │ Int64 │ Int64 │ Int64 │
-├─────┼───────┼───────┼───────┤
-│ 1   │ 1     │ 2     │ 1     │
-│ 2   │ 1     │ 2     │ 5     │
-⋮
-Last Group (2 rows): a = 4
-│ Row │ a     │ b     │ c     │
-│     │ Int64 │ Int64 │ Int64 │
-├─────┼───────┼───────┼───────┤
-│ 1   │ 4     │ 1     │ 4     │
-│ 2   │ 4     │ 1     │ 8     │
-
-julia> gd[1]
-2×3 SubDataFrame
-│ Row │ a     │ b     │ c     │
-│     │ Int64 │ Int64 │ Int64 │
-├─────┼───────┼───────┼───────┤
-│ 1   │ 1     │ 2     │ 1     │
-│ 2   │ 1     │ 2     │ 5     │
-
-julia> last(gd)
-2×3 SubDataFrame
-│ Row │ a     │ b     │ c     │
-│     │ Int64 │ Int64 │ Int64 │
-├─────┼───────┼───────┼───────┤
-│ 1   │ 4     │ 1     │ 4     │
-│ 2   │ 4     │ 1     │ 8     │
-
-julia> gd[(a=3,)]
-2×3 SubDataFrame
-│ Row │ a     │ b     │ c     │
-│     │ Int64 │ Int64 │ Int64 │
-├─────┼───────┼───────┼───────┤
-│ 1   │ 3     │ 2     │ 3     │
-│ 2   │ 3     │ 2     │ 7     │
-
-julia> gd[Dict("a" => 3)]
-2×3 SubDataFrame
-│ Row │ a     │ b     │ c     │
-│     │ Int64 │ Int64 │ Int64 │
-├─────┼───────┼───────┼───────┤
-│ 1   │ 3     │ 2     │ 3     │
-│ 2   │ 3     │ 2     │ 7     │
-
-julia> gd[(3,)]
-2×3 SubDataFrame
-│ Row │ a     │ b     │ c     │
-│     │ Int64 │ Int64 │ Int64 │
-├─────┼───────┼───────┼───────┤
-│ 1   │ 3     │ 2     │ 3     │
-│ 2   │ 3     │ 2     │ 7     │
-
-julia> k = first(keys(gd))
-GroupKey: (a = 3)
-
-julia> gd[k]
-2×3 SubDataFrame
-│ Row │ a     │ b     │ c     │
-│     │ Int64 │ Int64 │ Int64 │
-├─────┼───────┼───────┼───────┤
-│ 1   │ 3     │ 2     │ 3     │
-│ 2   │ 3     │ 2     │ 7     │
-
-julia> for g in gd
-           println(g)
-       end
-2×3 SubDataFrame
-│ Row │ a     │ b     │ c     │
-│     │ Int64 │ Int64 │ Int64 │
-├─────┼───────┼───────┼───────┤
-│ 1   │ 1     │ 2     │ 1     │
-│ 2   │ 1     │ 2     │ 5     │
-2×3 SubDataFrame
-│ Row │ a     │ b     │ c     │
-│     │ Int64 │ Int64 │ Int64 │
-├─────┼───────┼───────┼───────┤
-│ 1   │ 2     │ 1     │ 2     │
-│ 2   │ 2     │ 1     │ 6     │
-2×3 SubDataFrame
-│ Row │ a     │ b     │ c     │
-│     │ Int64 │ Int64 │ Int64 │
-├─────┼───────┼───────┼───────┤
-│ 1   │ 3     │ 2     │ 3     │
-│ 2   │ 3     │ 2     │ 7     │
-2×3 SubDataFrame
-│ Row │ a     │ b     │ c     │
-│     │ Int64 │ Int64 │ Int64 │
-├─────┼───────┼───────┼───────┤
-│ 1   │ 4     │ 1     │ 4     │
-│ 2   │ 4     │ 1     │ 8     │
-```
-"""
-function groupby(df::AbstractDataFrame, cols;
-                 sort::Bool=false, skipmissing::Bool=false)
-    _check_consistency(df)
-    idxcols = index(df)[cols]
-    if isempty(idxcols)
-        return GroupedDataFrame(df, Symbol[], ones(Int, nrow(df)),
-                                nothing, nothing, nothing, nrow(df) == 0 ? 0 : 1,
-                                nothing, Threads.ReentrantLock())
-    end
-    sdf = select(df, idxcols, copycols=false)
-
-    groups = Vector{Int}(undef, nrow(df))
-    ngroups, rhashes, gslots, sorted =
-        row_group_slots(ntuple(i -> sdf[!, i], ncol(sdf)), Val(false),
-                        groups, skipmissing, sort)
-
-    gd = GroupedDataFrame(df, copy(_names(sdf)), groups, nothing, nothing, nothing, ngroups, nothing,
-                          Threads.ReentrantLock())
-
-    # sort groups if row_group_slots hasn't already done that
-    if sort && !sorted
-        # Find index of representative row for each group
-        idx = Vector{Int}(undef, length(gd))
-        fillfirst!(nothing, idx, 1:nrow(parent(gd)), gd)
-        group_invperm = invperm(sortperm(view(parent(gd)[!, gd.cols], idx, :)))
-        groups = gd.groups
-        @inbounds for i in eachindex(groups)
-            gix = groups[i]
-            groups[i] = gix == 0 ? 0 : group_invperm[gix]
-        end
-    end
-
-    return gd
-end
-
-const F_TYPE_RULES =
-    """
-    `fun` can return a single value, a row, a vector, or multiple rows.
-    The type of the returned value determines the shape of the resulting `DataFrame`.
-    There are four kind of return values allowed:
-    - A single value gives a `DataFrame` with a single additional column and one row
-      per group.
-    - A named tuple of single values or a [`DataFrameRow`](@ref) gives a `DataFrame`
-      with one additional column for each field and one row per group (returning a
-      named tuple will be faster). It is not allowed to mix single values and vectors
-      if a named tuple is returned.
-    - A vector gives a `DataFrame` with a single additional column and as many rows
-      for each group as the length of the returned vector for that group.
-    - A data frame, a named tuple of vectors or a matrix gives a `DataFrame` with
-      the same additional columns and as many rows for each group as the rows
-      returned for that group (returning a named tuple is the fastest option).
-      Returning a table with zero columns is allowed, whatever the number of columns
-      returned for other groups.
-
-    `fun` must always return the same kind of object (out of four
-    kinds defined above) for all groups, and with the same column names.
-
-    Optimized methods are used when standard summary functions (`sum`, `prod`,
-    `minimum`, `maximum`, `mean`, `var`, `std`, `first`, `last` and `length`)
-    are specified using the `Pair` syntax (e.g. `:col => sum`).
-    When computing the `sum` or `mean` over floating point columns, results will be
-    less accurate than the standard `sum` function (which uses pairwise
-    summation). Use `col => x -> sum(x)` to avoid the optimized method and use the
-    slower, more accurate one.
-
-    Column names are automatically generated when necessary using the rules defined
-    in [`select`](@ref) if the `Pair` syntax is used and `fun` returns a single
-    value or a vector (e.g. for `:col => sum` the column name is `col_sum`); otherwise
-    (if `fun` is a function or a return value is an `AbstractMatrix`) columns are
-    named `x1`, `x2` and so on.
-    """
-
-const F_ARGUMENT_RULES =
-    """
-
-    Arguments passed as `args...` can be:
-
-    * Any index that is allowed for column indexing ($COLUMNINDEX_STR, $MULTICOLUMNINDEX_STR).
-    * Column transformation operations using the `Pair` notation that is described below
-      and vectors of such pairs.
-
-    Transformations allowed using `Pair`s follow the rules specified for
-    [`select`](@ref) and have the form `source_cols => fun`, `source_cols => fun
-    => target_col`, or `source_col => target_col`. Function `fun` is passed
-    `SubArray` views as positional arguments for each column specified to be
-    selected, or a `NamedTuple` containing these `SubArray`s if `source_cols` is
-    an `AsTable` selector. It can return a vector or a single value (defined
-    precisely below). If automatic generation of target column
-    name is required it respects the `renamecols` keyword argument following the
-    rules described in [`select`](@ref).
-
-    As a special case `nrow` or `nrow => target_col` can be passed without specifying
-    input columns to efficiently calculate number of rows in each group.
-    If `nrow` is passed the resulting column name is `:nrow`.
-
-    If multiple `args` are passed then return values of different `fun`s are allowed
-    to mix single values and vectors. In this case single values will be
-    broadcasted to match the length of columns specified by returned vectors.
-    As a particular rule, values wrapped in a `Ref` or a `0`-dimensional `AbstractArray`
-    are unwrapped and then broadcasted.
-
-    If the first or last argument is `pair` then it must be a `Pair` following the
-    rules for pairs described above, except that in this case function defined
-    by `fun` can return any return value defined below.
-
-    If the first or last argument is a function `fun`, it is passed a [`SubDataFrame`](@ref)
-    view for each group and can return any return value defined below.
-    Note that this form is slower than `pair` or `args` due to type instability.
-
-    If `gd` has zero groups then no transformations are applied.
-    """
-
-const KWARG_PROCESSING_RULES =
-    """
-    If `keepkeys=true`, the resulting `DataFrame` contains all the grouping columns
-    in addition to those generated. In this case if the returned
-    value contains columns with the same names as the grouping columns, they are
-    required to be equal.
-    If `keepkeys=false` and some generated columns have the same name as grouping columns,
-    they are kept and are not required to be equal to grouping columns.
-
-    If `ungroup=true` (the default) a `DataFrame` is returned.
-    If `ungroup=false` a `GroupedDataFrame` grouped using `keycols(gdf)` is returned.
-
-    If `gd` has zero groups then transformations are applied to vectors of zero length.
-    """
-
-"""
-    combine(gd::GroupedDataFrame, args...; keepkeys::Bool=true, ungroup::Bool=true,
-            renamecols::Bool=true)
-    combine(fun::Union{Function, Type}, gd::GroupedDataFrame;
-            keepkeys::Bool=true, ungroup::Bool=true, renamecols::Bool=true)
-    combine(pair::Pair, gd::GroupedDataFrame; keepkeys::Bool=true, ungroup::Bool=true,
-            renamecols::Bool=true)
-
-Apply operations to each group in a [`GroupedDataFrame`](@ref) and return the combined
-result as a `DataFrame` if `ungroup=true` or `GroupedDataFrame` if `ungroup=false`.
-
-If an `AbstractDataFrame` is passed, apply operations to the data frame as a whole
-and a `DataFrame` is always returend.
-
-$F_ARGUMENT_RULES
-
-$F_TYPE_RULES
-
-$KWARG_PROCESSING_RULES
-
-Ordering of rows follows the order of groups in `gdf`.
-
-# See also
-
-[`groupby`](@ref), [`select`](@ref), [`select!`](@ref), [`transform`](@ref), [`transform!`](@ref)
-
-# Examples
-```jldoctest
-julia> df = DataFrame(a = repeat([1, 2, 3, 4], outer=[2]),
-                      b = repeat([2, 1], outer=[4]),
-                      c = 1:8);
-
-julia> gd = groupby(df, :a);
-
-julia> combine(gd, :c => sum, nrow)
-4×3 DataFrame
-│ Row │ a     │ c_sum │ nrow  │
-│     │ Int64 │ Int64 │ Int64 │
-├─────┼───────┼───────┼───────┤
-│ 1   │ 1     │ 6     │ 2     │
-│ 2   │ 2     │ 8     │ 2     │
-│ 3   │ 3     │ 10    │ 2     │
-│ 4   │ 4     │ 12    │ 2     │
-
-julia> combine(gd, :c => sum, nrow, ungroup=false)
-GroupedDataFrame with 4 groups based on key: a
-First Group (1 row): a = 1
-│ Row │ a     │ c_sum │ nrow  │
-│     │ Int64 │ Int64 │ Int64 │
-├─────┼───────┼───────┼───────┤
-│ 1   │ 1     │ 6     │ 2     │
-⋮
-Last Group (1 row): a = 4
-│ Row │ a     │ c_sum │ nrow  │
-│     │ Int64 │ Int64 │ Int64 │
-├─────┼───────┼───────┼───────┤
-│ 1   │ 4     │ 12    │ 2     │
-
-julia> combine(sdf -> sum(sdf.c), gd) # Slower variant
-4×2 DataFrame
-│ Row │ a     │ x1    │
-│     │ Int64 │ Int64 │
-├─────┼───────┼───────┤
-│ 1   │ 1     │ 6     │
-│ 2   │ 2     │ 8     │
-│ 3   │ 3     │ 10    │
-│ 4   │ 4     │ 12    │
-
-julia> combine(gdf) do d # do syntax for the slower variant
-           sum(d.c)
-       end
-4×2 DataFrame
-│ Row │ a     │ x1    │
-│     │ Int64 │ Int64 │
-├─────┼───────┼───────┤
-│ 1   │ 1     │ 6     │
-│ 2   │ 2     │ 8     │
-│ 3   │ 3     │ 10    │
-│ 4   │ 4     │ 12    │
-
-julia> combine(gd, :c => (x -> sum(log, x)) => :sum_log_c) # specifying a name for target column
-4×2 DataFrame
-│ Row │ a     │ sum_log_c │
-│     │ Int64 │ Float64   │
-├─────┼───────┼───────────┤
-│ 1   │ 1     │ 1.60944   │
-│ 2   │ 2     │ 2.48491   │
-│ 3   │ 3     │ 3.04452   │
-│ 4   │ 4     │ 3.46574   │
-
-
-julia> combine(gd, [:b, :c] .=> sum) # passing a vector of pairs
-4×3 DataFrame
-│ Row │ a     │ b_sum │ c_sum │
-│     │ Int64 │ Int64 │ Int64 │
-├─────┼───────┼───────┼───────┤
-│ 1   │ 1     │ 4     │ 6     │
-│ 2   │ 2     │ 2     │ 8     │
-│ 3   │ 3     │ 4     │ 10    │
-│ 4   │ 4     │ 2     │ 12    │
-
-julia> combine(gd) do sdf # dropping group when DataFrame() is returned
-          sdf.c[1] != 1 ? sdf : DataFrame()
-       end
-6×3 DataFrame
-│ Row │ a     │ b     │ c     │
-│     │ Int64 │ Int64 │ Int64 │
-├─────┼───────┼───────┼───────┤
-│ 1   │ 2     │ 1     │ 2     │
-│ 2   │ 2     │ 1     │ 6     │
-│ 3   │ 3     │ 2     │ 3     │
-│ 4   │ 3     │ 2     │ 7     │
-│ 5   │ 4     │ 1     │ 4     │
-│ 6   │ 4     │ 1     │ 8     │
-
-julia> combine(gd, :b => :b1, :c => :c1,
-               [:b, :c] => +, keepkeys=false) # auto-splatting, renaming and keepkeys
-8×3 DataFrame
-│ Row │ b1    │ c1    │ b_c_+ │
-│     │ Int64 │ Int64 │ Int64 │
-├─────┼───────┼───────┼───────┤
-│ 1   │ 2     │ 1     │ 3     │
-│ 2   │ 2     │ 5     │ 7     │
-│ 3   │ 1     │ 2     │ 3     │
-│ 4   │ 1     │ 6     │ 7     │
-│ 5   │ 2     │ 3     │ 5     │
-│ 6   │ 2     │ 7     │ 9     │
-│ 7   │ 1     │ 4     │ 5     │
-│ 8   │ 1     │ 8     │ 9     │
-
-julia> combine(gd, :b, :c => sum) # passing columns and broadcasting
-8×3 DataFrame
-│ Row │ a     │ b     │ c_sum │
-│     │ Int64 │ Int64 │ Int64 │
-├─────┼───────┼───────┼───────┤
-│ 1   │ 1     │ 2     │ 6     │
-│ 2   │ 1     │ 2     │ 6     │
-│ 3   │ 2     │ 1     │ 8     │
-│ 4   │ 2     │ 1     │ 8     │
-│ 5   │ 3     │ 2     │ 10    │
-│ 6   │ 3     │ 2     │ 10    │
-│ 7   │ 4     │ 1     │ 12    │
-│ 8   │ 4     │ 1     │ 12    │
-
-julia> combine(gd, [:b, :c] .=> Ref)
-4×3 DataFrame
-│ Row │ a     │ b_Ref    │ c_Ref    │
-│     │ Int64 │ SubArra… │ SubArra… │
-├─────┼───────┼──────────┼──────────┤
-│ 1   │ 1     │ [2, 2]   │ [1, 5]   │
-│ 2   │ 2     │ [1, 1]   │ [2, 6]   │
-│ 3   │ 3     │ [2, 2]   │ [3, 7]   │
-│ 4   │ 4     │ [1, 1]   │ [4, 8]   │
-
-julia> combine(gd, AsTable(:) => Ref)
-4×2 DataFrame
-│ Row │ a     │ a_b_c_Ref                            │
-│     │ Int64 │ NamedTuple…                          │
-├─────┼───────┼──────────────────────────────────────┤
-│ 1   │ 1     │ (a = [1, 1], b = [2, 2], c = [1, 5]) │
-│ 2   │ 2     │ (a = [2, 2], b = [1, 1], c = [2, 6]) │
-│ 3   │ 3     │ (a = [3, 3], b = [2, 2], c = [3, 7]) │
-│ 4   │ 4     │ (a = [4, 4], b = [1, 1], c = [4, 8]) │
-
-julia> combine(gd, :, AsTable(Not(:a)) => sum, renamecols=false)
-8×4 DataFrame
-│ Row │ a     │ b     │ c     │ b_c   │
-│     │ Int64 │ Int64 │ Int64 │ Int64 │
-├─────┼───────┼───────┼───────┼───────┤
-│ 1   │ 1     │ 2     │ 1     │ 3     │
-│ 2   │ 1     │ 2     │ 5     │ 7     │
-│ 3   │ 2     │ 1     │ 2     │ 3     │
-│ 4   │ 2     │ 1     │ 6     │ 7     │
-│ 5   │ 3     │ 2     │ 3     │ 5     │
-│ 6   │ 3     │ 2     │ 7     │ 9     │
-│ 7   │ 4     │ 1     │ 4     │ 5     │
-│ 8   │ 4     │ 1     │ 8     │ 9     │
-```
-"""
-function combine(f::Base.Callable, gd::GroupedDataFrame;
-                 keepkeys::Bool=true, ungroup::Bool=true, renamecols::Bool=true)
-    return combine_helper(f, gd, keepkeys=keepkeys, ungroup=ungroup,
-                          copycols=true, keeprows=false, renamecols=renamecols)
-end
-
-combine(f::typeof(nrow), gd::GroupedDataFrame;
-        keepkeys::Bool=true, ungroup::Bool=true, renamecols::Bool=true) =
-    combine(gd, [nrow => :nrow], keepkeys=keepkeys, ungroup=ungroup,
-            renamecols=renamecols)
-
-function combine(p::Pair, gd::GroupedDataFrame;
-                 keepkeys::Bool=true, ungroup::Bool=true, renamecols::Bool=true)
-    # move handling of aggregate to specialized combine
-    p_from, p_to = p
-
-    # verify if it is not better to use a fast path, which we achieve
-    # by moving to combine(::GroupedDataFrame, ::AbstractVector) method
-    # note that even if length(gd) == 0 we can do this step
-    if isagg(p_from => (p_to isa Pair ? first(p_to) : p_to), gd) || p_from === nrow
-        return combine(gd, [p], keepkeys=keepkeys, ungroup=ungroup, renamecols=renamecols)
-    end
-
-    if p_from isa Tuple
-        cs = collect(p_from)
-        # an explicit error is thrown as this was allowed in the past
-        throw(ArgumentError("passing a Tuple $p_from as column selector is not supported" *
-                            ", use a vector $cs instead"))
-    else
-        cs = p_from
+function gen_groups(idx::Vector{Int})
+    groups = zeros(Int, length(idx))
+    groups[1] = 1
+    j = 1
+    last_idx = idx[1]
+    @inbounds for i in 2:length(idx)
+        cur_idx = idx[i]
+        j += cur_idx != last_idx
+        last_idx = cur_idx
+        groups[i] = j
     end
-    return combine_helper(cs => p_to, gd, keepkeys=keepkeys, ungroup=ungroup,
-                          copycols=true, keeprows=false, renamecols=renamecols)
+    return groups
 end
 
-combine(gd::GroupedDataFrame,
-        cs::Union{Pair, typeof(nrow), ColumnIndex, MultiColumnIndex}...;
-        keepkeys::Bool=true, ungroup::Bool=true, renamecols::Bool=true) =
-    _combine_prepare(gd, cs..., keepkeys=keepkeys, ungroup=ungroup,
-                     copycols=true, keeprows=false, renamecols=renamecols)
-
 function _combine_prepare(gd::GroupedDataFrame,
-                          @nospecialize(cs::Union{Pair, typeof(nrow),
+                          @nospecialize(cs::Union{Pair, Base.Callable,
                                         ColumnIndex, MultiColumnIndex}...);
                           keepkeys::Bool, ungroup::Bool, copycols::Bool,
                           keeprows::Bool, renamecols::Bool)
+    if !ungroup && !keepkeys
+        throw(ArgumentError("keepkeys=false when ungroup=false is not allowed"))
+    end
+
     cs_vec = []
     for p in cs
         if p === nrow
@@ -514,91 +42,33 @@ function _combine_prepare(gd::GroupedDataFrame,
         # an explicit error is thrown as this was allowed in the past
         throw(ArgumentError("passing a Tuple $(first(x)) as column selector is not supported" *
                             ", use a vector $(collect(first(x))) instead"))
-        for (i, v) in enumerate(cs_vec)
-            if first(v) isa Tuple
-                cs_vec[i] = collect(first(v)) => last(v)
-            end
-        end
     end
-    cs_norm_pre = [normalize_selection(index(parent(gd)), c, renamecols) for c in cs_vec]
-    seen_cols = Set{Symbol}()
-    process_vectors = false
-    for v in cs_norm_pre
-        if v isa Pair
-            out_col = last(last(v))
-            if out_col in seen_cols
-                throw(ArgumentError("Duplicate output column name $out_col requested"))
+
+    cs_norm = []
+    optional_transform = Bool[]
+    for c in cs_vec
+        arg = normalize_selection(index(parent(gd)), c, renamecols)
+        if arg isa AbstractVector{Int}
+            for col_idx in arg
+                push!(cs_norm, col_idx => identity => _names(gd)[col_idx])
+                push!(optional_transform, true)
             end
-            push!(seen_cols, out_col)
         else
-            @assert v isa AbstractVector{Int}
-            process_vectors = true
-        end
-    end
-    processed_cols = Set{Symbol}()
-    if process_vectors
-        cs_norm = Pair[]
-        for (i, v) in enumerate(cs_norm_pre)
-            if v isa Pair
-                push!(cs_norm, v)
-                push!(processed_cols, last(last(v)))
-            else
-                @assert v isa AbstractVector{Int}
-                for col_idx in v
-                    col_name = _names(gd)[col_idx]
-                    if !(col_name in processed_cols)
-                        push!(processed_cols, col_name)
-                        if col_name in seen_cols
-                            trans_idx = findfirst(cs_norm_pre) do p
-                                p isa Pair || return false
-                                last(last(p)) == col_name
-                            end
-                            @assert !isnothing(trans_idx) && trans_idx > i
-                            push!(cs_norm, cs_norm_pre[trans_idx])
-                            # it is safe to delete from cs_norm_pre
-                            # as we have not reached trans_idx index yet
-                            deleteat!(cs_norm_pre, trans_idx)
-                        else
-                            push!(cs_norm, col_idx => identity => col_name)
-                        end
-                    end
-                end
-            end
+            push!(cs_norm, arg)
+            push!(optional_transform, false)
         end
-    else
-        cs_norm = collect(Pair, cs_norm_pre)
     end
-    f = Pair[first(x) => first(last(x)) for x in cs_norm]
-    nms = Symbol[last(last(x)) for x in cs_norm]
-    return combine_helper(f, gd, nms, keepkeys=keepkeys, ungroup=ungroup,
-                          copycols=copycols, keeprows=keeprows, renamecols=renamecols)
-end
 
-function gen_groups(idx::Vector{Int})
-    groups = zeros(Int, length(idx))
-    groups[1] = 1
-    j = 1
-    last_idx = idx[1]
-    @inbounds for i in 2:length(idx)
-        cur_idx = idx[i]
-        j += cur_idx != last_idx
-        last_idx = cur_idx
-        groups[i] = j
-    end
-    return groups
-end
+    # cs_norm now holds either src => fun => dst or just fun
+    # if optional_transform[i] is true then the transformation will be skipped
+    # if a column with the same name was created earlier
+
+    idx, valscat = _combine(gd, cs_norm, optional_transform, copycols, keeprows, renamecols)
 
-function combine_helper(f, gd::GroupedDataFrame,
-                        nms::Union{AbstractVector{Symbol},Nothing}=nothing;
-                        keepkeys::Bool, ungroup::Bool,
-                        copycols::Bool, keeprows::Bool, renamecols::Bool)
-    if !ungroup && !keepkeys
-        throw(ArgumentError("keepkeys=false when ungroup=false is not allowed"))
-    end
-    idx, valscat = _combine(f, gd, nms, copycols, keeprows, renamecols)
     !keepkeys && ungroup && return valscat
-    keys = groupcols(gd)
-    for key in keys
+
+    gd_keys = groupcols(gd)
+    for key in gd_keys
         if hasproperty(valscat, key)
             if (keeprows && !isequal(valscat[!, key], parent(gd)[!, key])) ||
                 (!keeprows && !isequal(valscat[!, key], view(parent(gd)[!, key], idx)))
@@ -612,17 +82,17 @@ function combine_helper(f, gd::GroupedDataFrame,
     else
         newparent = length(gd) > 0 ? parent(gd)[idx, gd.cols] : parent(gd)[1:0, gd.cols]
     end
-    added_cols = select(valscat, Not(intersect(keys, _names(valscat))), copycols=false)
+    added_cols = select(valscat, Not(intersect(gd_keys, _names(valscat))), copycols=false)
     hcat!(newparent, length(gd) > 0 ? added_cols : similar(added_cols, 0), copycols=false)
     ungroup && return newparent
 
-    if length(idx) == 0 && !(keeprows && length(keys) > 0)
+    if length(idx) == 0 && !(keeprows && length(gd_keys) > 0)
         @assert nrow(newparent) == 0
         return GroupedDataFrame(newparent, copy(gd.cols), Int[],
                                 Int[], Int[], Int[], 0, Dict{Any,Int}(),
                                 Threads.ReentrantLock())
     elseif keeprows
-        @assert length(keys) > 0 || idx == gd.idx
+        @assert length(gd_keys) > 0 || idx == gd.idx
         # in this case we are sure that the result GroupedDataFrame has the
         # same structure as the source except that grouping columns are at the start
         return Threads.lock(gd.lazy_lock) do
@@ -640,220 +110,6 @@ function combine_helper(f, gd::GroupedDataFrame,
     end
 end
 
-# Wrapping automatically adds column names when the value returned
-# by the user-provided function lacks them
-wrap(x::Union{AbstractDataFrame, NamedTuple, DataFrameRow}) = x
-wrap(x::AbstractMatrix) =
-    NamedTuple{Tuple(gennames(size(x, 2)))}(Tuple(view(x, :, i) for i in 1:size(x, 2)))
-wrap(x::Any) = (x1=x,)
-
-const ERROR_ROW_COUNT = "return value must not change its kind " *
-                        "(single row or variable number of rows) across groups"
-
-const ERROR_COL_COUNT = "function must return only single-column values, " *
-                        "or only multiple-column values"
-
-wrap_table(x::Any, ::Val) =
-    throw(ArgumentError(ERROR_ROW_COUNT))
-function wrap_table(x::Union{NamedTuple{<:Any, <:Tuple{Vararg{AbstractVector}}},
-                             AbstractDataFrame, AbstractMatrix},
-                             ::Val{firstmulticol}) where firstmulticol
-    if !firstmulticol
-        throw(ArgumentError(ERROR_COL_COUNT))
-    end
-    return wrap(x)
-end
-
-function wrap_table(x::AbstractVector, ::Val{firstmulticol}) where firstmulticol
-    if firstmulticol
-        throw(ArgumentError(ERROR_COL_COUNT))
-    end
-    return wrap(x)
-end
-
-function wrap_row(x::Any, ::Val{firstmulticol}) where firstmulticol
-    # NamedTuple is not possible in this branch
-    if (x isa DataFrameRow) ⊻ firstmulticol
-        throw(ArgumentError(ERROR_COL_COUNT))
-    end
-    return wrap(x)
-end
-
-function wrap_row(x::Union{AbstractArray{<:Any, 0}, Ref},
-                  ::Val{firstmulticol}) where firstmulticol
-    if firstmulticol
-        throw(ArgumentError(ERROR_COL_COUNT))
-    end
-    return (x1 = x[],)
-end
-
-# note that also NamedTuple() is correctly captured by this definition
-# as it is more specific than the one below
-wrap_row(::Union{AbstractVecOrMat, AbstractDataFrame,
-                 NamedTuple{<:Any, <:Tuple{Vararg{AbstractVector}}}}, ::Val) =
-    throw(ArgumentError(ERROR_ROW_COUNT))
-
-function wrap_row(x::NamedTuple, ::Val{firstmulticol}) where firstmulticol
-    if any(v -> v isa AbstractVector, x)
-        throw(ArgumentError("mixing single values and vectors in a named tuple is not allowed"))
-    end
-    if !firstmulticol
-        throw(ArgumentError(ERROR_COL_COUNT))
-    end
-    return x
-end
-
-# idx, starts and ends are passed separately to avoid cost of field access in tight loop
-# Manual unrolling of Tuple is used as it turned out more efficient than @generated
-# for small number of columns passed.
-# For more than 4 columns `map` is slower than @generated
-# but this case is probably rare and if huge number of columns is passed @generated
-# has very high compilation cost
-function do_call(f::Any, idx::AbstractVector{<:Integer},
-                 starts::AbstractVector{<:Integer}, ends::AbstractVector{<:Integer},
-                 gd::GroupedDataFrame, incols::Tuple{}, i::Integer)
-    if f isa ByRow
-        return [f.fun() for _ in 1:(ends[i] - starts[i] + 1)]
-    else
-        return f()
-    end
-end
-
-function do_call(f::Any, idx::AbstractVector{<:Integer},
-                 starts::AbstractVector{<:Integer}, ends::AbstractVector{<:Integer},
-                 gd::GroupedDataFrame, incols::Tuple{AbstractVector}, i::Integer)
-    idx = idx[starts[i]:ends[i]]
-    return f(view(incols[1], idx))
-end
-
-function do_call(f::Any, idx::AbstractVector{<:Integer},
-                 starts::AbstractVector{<:Integer}, ends::AbstractVector{<:Integer},
-                 gd::GroupedDataFrame, incols::NTuple{2, AbstractVector}, i::Integer)
-    idx = idx[starts[i]:ends[i]]
-    return f(view(incols[1], idx), view(incols[2], idx))
-end
-
-function do_call(f::Any, idx::AbstractVector{<:Integer},
-                 starts::AbstractVector{<:Integer}, ends::AbstractVector{<:Integer},
-                 gd::GroupedDataFrame, incols::NTuple{3, AbstractVector}, i::Integer)
-    idx = idx[starts[i]:ends[i]]
-    return f(view(incols[1], idx), view(incols[2], idx), view(incols[3], idx))
-end
-
-function do_call(f::Any, idx::AbstractVector{<:Integer},
-                 starts::AbstractVector{<:Integer}, ends::AbstractVector{<:Integer},
-                 gd::GroupedDataFrame, incols::NTuple{4, AbstractVector}, i::Integer)
-    idx = idx[starts[i]:ends[i]]
-    return f(view(incols[1], idx), view(incols[2], idx), view(incols[3], idx),
-             view(incols[4], idx))
-end
-
-function do_call(f::Any, idx::AbstractVector{<:Integer},
-                 starts::AbstractVector{<:Integer}, ends::AbstractVector{<:Integer},
-                 gd::GroupedDataFrame, incols::Tuple, i::Integer)
-    idx = idx[starts[i]:ends[i]]
-    return f(map(c -> view(c, idx), incols)...)
-end
-
-function do_call(f::Any, idx::AbstractVector{<:Integer},
-                 starts::AbstractVector{<:Integer}, ends::AbstractVector{<:Integer},
-                 gd::GroupedDataFrame, incols::NamedTuple, i::Integer)
-    if f isa ByRow && isempty(incols)
-        return [f.fun(NamedTuple()) for _ in 1:(ends[i] - starts[i] + 1)]
-    else
-        idx = idx[starts[i]:ends[i]]
-        return f(map(c -> view(c, idx), incols))
-    end
-end
-
-function do_call(f::Any, idx::AbstractVector{<:Integer},
-                 starts::AbstractVector{<:Integer}, ends::AbstractVector{<:Integer},
-                 gd::GroupedDataFrame, incols::Nothing, i::Integer)
-    idx = idx[starts[i]:ends[i]]
-    return f(view(parent(gd), idx, :))
-end
-
-_nrow(df::AbstractDataFrame) = nrow(df)
-_nrow(x::NamedTuple{<:Any, <:Tuple{Vararg{AbstractVector}}}) =
-    isempty(x) ? 0 : length(x[1])
-_ncol(df::AbstractDataFrame) = ncol(df)
-_ncol(x::Union{NamedTuple, DataFrameRow}) = length(x)
-
-abstract type AbstractAggregate end
-
-struct Reduce{O, C, A} <: AbstractAggregate
-    op::O
-    condf::C
-    adjust::A
-    checkempty::Bool
-end
-Reduce(f, condf=nothing, adjust=nothing) = Reduce(f, condf, adjust, false)
-
-check_aggregate(f::Any, ::AbstractVector) = f
-check_aggregate(f::typeof(sum), ::AbstractVector{<:Union{Missing, Number}}) =
-    Reduce(Base.add_sum)
-check_aggregate(f::typeof(sum∘skipmissing), ::AbstractVector{<:Union{Missing, Number}}) =
-    Reduce(Base.add_sum, !ismissing)
-check_aggregate(f::typeof(prod), ::AbstractVector{<:Union{Missing, Number}}) =
-    Reduce(Base.mul_prod)
-check_aggregate(f::typeof(prod∘skipmissing), ::AbstractVector{<:Union{Missing, Number}}) =
-    Reduce(Base.mul_prod, !ismissing)
-check_aggregate(f::typeof(maximum),
-                ::AbstractVector{<:Union{Missing, MULTI_COLS_TYPE, AbstractVector}}) = f
-check_aggregate(f::typeof(maximum), v::AbstractVector{<:Union{Missing, Real}}) =
-    eltype(v) === Any ? f : Reduce(max)
-check_aggregate(f::typeof(maximum∘skipmissing),
-                ::AbstractVector{<:Union{Missing, MULTI_COLS_TYPE, AbstractVector}}) = f
-check_aggregate(f::typeof(maximum∘skipmissing), v::AbstractVector{<:Union{Missing, Real}}) =
-    eltype(v) === Any ? f : Reduce(max, !ismissing, nothing, true)
-check_aggregate(f::typeof(minimum),
-                ::AbstractVector{<:Union{Missing, MULTI_COLS_TYPE, AbstractVector}}) = f
-check_aggregate(f::typeof(minimum), v::AbstractVector{<:Union{Missing, Real}}) =
-    eltype(v) === Any ? f : Reduce(min)
-check_aggregate(f::typeof(minimum∘skipmissing),
-                ::AbstractVector{<:Union{Missing, MULTI_COLS_TYPE, AbstractVector}}) = f
-check_aggregate(f::typeof(minimum∘skipmissing), v::AbstractVector{<:Union{Missing, Real}}) =
-    eltype(v) === Any ? f : Reduce(min, !ismissing, nothing, true)
-check_aggregate(f::typeof(mean), ::AbstractVector{<:Union{Missing, Number}}) =
-    Reduce(Base.add_sum, nothing, /)
-check_aggregate(f::typeof(mean∘skipmissing), ::AbstractVector{<:Union{Missing, Number}}) =
-    Reduce(Base.add_sum, !ismissing, /)
-
-# Other aggregate functions which are not strictly reductions
-struct Aggregate{F, C} <: AbstractAggregate
-    f::F
-    condf::C
-end
-Aggregate(f) = Aggregate(f, nothing)
-
-check_aggregate(f::typeof(var), ::AbstractVector{<:Union{Missing, Number}}) =
-    Aggregate(var)
-check_aggregate(f::typeof(var∘skipmissing), ::AbstractVector{<:Union{Missing, Number}}) =
-    Aggregate(var, !ismissing)
-check_aggregate(f::typeof(std), ::AbstractVector{<:Union{Missing, Number}}) =
-    Aggregate(std)
-check_aggregate(f::typeof(std∘skipmissing), ::AbstractVector{<:Union{Missing, Number}}) =
-    Aggregate(std, !ismissing)
-check_aggregate(f::typeof(first), v::AbstractVector) =
-    eltype(v) === Any ? f : Aggregate(first)
-check_aggregate(f::typeof(first),
-                ::AbstractVector{<:Union{Missing, MULTI_COLS_TYPE, AbstractVector}}) = f
-check_aggregate(f::typeof(first∘skipmissing), v::AbstractVector) =
-    eltype(v) === Any ? f : Aggregate(first, !ismissing)
-check_aggregate(f::typeof(first∘skipmissing),
-                ::AbstractVector{<:Union{Missing, MULTI_COLS_TYPE, AbstractVector}}) = f
-check_aggregate(f::typeof(last), v::AbstractVector) =
-    eltype(v) === Any ? f : Aggregate(last)
-check_aggregate(f::typeof(last),
-                ::AbstractVector{<:Union{Missing, MULTI_COLS_TYPE, AbstractVector}}) = f
-check_aggregate(f::typeof(last∘skipmissing), v::AbstractVector) =
-    eltype(v) === Any ? f : Aggregate(last, !ismissing)
-check_aggregate(f::typeof(last∘skipmissing),
-                ::AbstractVector{<:Union{Missing, MULTI_COLS_TYPE, AbstractVector}}) = f
-check_aggregate(f::typeof(length), ::AbstractVector) = Aggregate(length)
-
-# SkipMissing does not support length
-
 # Find first value matching condition for each group
 # Optimized for situations where a matching value is typically encountered
 # among the first rows for each group
@@ -911,226 +167,303 @@ function fillfirst!(condf, outcol::AbstractVector, incol::AbstractVector,
     outcol
 end
 
-# Use a strategy similar to reducedim_init from Base to get the vector of the right type
-function groupreduce_init(op, condf, adjust,
-                          incol::AbstractVector{U}, gd::GroupedDataFrame) where U
-    T = Base.promote_union(U)
-
-    if op === Base.add_sum
-        initf = zero
-    elseif op === Base.mul_prod
-        initf = one
-    else
-        throw(ErrorException("Unrecognized op $op"))
+function _agg2idx_map_helper(idx::AbstractVector, idx_agg::AbstractVector)
+    agg2idx_map = fill(-1, length(idx))
+    aggj = 1
+    @inbounds for (j, idxj) in enumerate(idx)
+        while idx_agg[aggj] != idxj
+            aggj += 1
+            @assert aggj <= length(idx_agg)
+        end
+        agg2idx_map[j] = aggj
     end
+    return agg2idx_map
+end
 
-    Tnm = nonmissingtype(T)
-    if isconcretetype(Tnm) && applicable(initf, Tnm)
-        tmpv = initf(Tnm)
-        initv = op(tmpv, tmpv)
-        if adjust isa Nothing
-            x = Tnm <: AbstractIrrational ? float(initv) : initv
-        else
-            x = adjust(initv, 1)
-        end
-        if condf === !ismissing
-            V = typeof(x)
+struct TransformationResult
+    col_idx::Vector{Int} # index for a column
+    col::AbstractVector # computed value of a column
+    name::Symbol # name of a column
+    optional::Bool # whether a column is allowed to be replaced in the future
+end
+
+# the transformation is an aggregation for which we have the fast path
+function _combine_process_agg(@nospecialize(cs_i::Pair{Int, <:Pair{<:Function, Symbol}}),
+                              optional_i::Bool,
+                              parentdf::AbstractDataFrame,
+                              gd::GroupedDataFrame,
+                              seen_cols::Dict{Symbol, Tuple{Bool, Int}},
+                              trans_res::Vector{TransformationResult},
+                              idx_agg::Union{Nothing, AbstractVector{Int}})
+    @assert isagg(cs_i, gd)
+    @assert !optional_i
+    out_col_name = last(last(cs_i))
+    incol = parentdf[!, first(cs_i)]
+    agg = check_aggregate(first(last(cs_i)), incol)
+    outcol = agg(incol, gd)
+
+    if haskey(seen_cols, out_col_name)
+        optional, loc = seen_cols[out_col_name]
+        # we have seen this col but it is not allowed to replace it
+        optional || throw(ArgumentError("duplicate output column name: :$out_col_name"))
+        @assert trans_res[loc].optional && trans_res[loc].name == out_col_name
+        trans_res[loc] = TransformationResult(idx_agg, outcol, out_col_name, optional_i)
+        seen_cols[out_col_name] = (optional_i, loc)
+    else
+        push!(trans_res, TransformationResult(idx_agg, outcol, out_col_name, optional_i))
+        seen_cols[out_col_name] = (optional_i, length(trans_res))
+    end
+end
+
+# move one column without transforming it
+function _combine_process_noop(cs_i::Pair{<:Union{Int, AbstractVector{Int}}, Pair{typeof(identity), Symbol}},
+                               optional_i::Bool,
+                               parentdf::AbstractDataFrame,
+                               seen_cols::Dict{Symbol, Tuple{Bool, Int}},
+                               trans_res::Vector{TransformationResult},
+                               idx_keeprows::AbstractVector{Int},
+                               copycols::Bool)
+    source_cols = first(cs_i)
+    out_col_name = last(last(cs_i))
+    if length(source_cols) != 1
+        throw(ArgumentError("Exactly one column can be transformed to one output column" *
+                            " when using identity transformation"))
+    end
+    outcol = parentdf[!, first(source_cols)]
+
+    if haskey(seen_cols, out_col_name)
+        optional, loc = seen_cols[out_col_name]
+        @assert trans_res[loc].name == out_col_name
+        if optional
+            if !optional_i
+                @assert trans_res[loc].optional
+                trans_res[loc] = TransformationResult(idx_keeprows, copycols ? copy(outcol) : outcol,
+                                                      out_col_name, optional_i)
+                seen_cols[out_col_name] = (optional_i, loc)
+            end
         else
-            V = U >: Missing ? Union{typeof(x), Missing} : typeof(x)
+            # if optional_i is true, then we ignore processing this column
+            optional_i || throw(ArgumentError("duplicate output column name: :$out_col_name"))
         end
-        v = similar(incol, V, length(gd))
-        fill!(v, x)
-        return v
     else
-        # do not try to determine the narrowest possible type nor starting value
-        # as this is not possible to do correctly in general without processing
-        # groups; it will get fixed later in groupreduce!; later we
-        # will make use of the fact that this vector is filled with #undef
-        # while above the vector is filled with a concrete value
-        return Vector{Any}(undef, length(gd))
+        push!(trans_res, TransformationResult(idx_keeprows, copycols ? copy(outcol) : outcol,
+                                              out_col_name, optional_i))
+        seen_cols[out_col_name] = (optional_i, length(trans_res))
     end
 end
 
-for (op, initf) in ((:max, :typemin), (:min, :typemax))
-    @eval begin
-        function groupreduce_init(::typeof($op), condf, adjust,
-                                  incol::AbstractVector{T}, gd::GroupedDataFrame) where T
-            @assert isnothing(adjust)
-            S = nonmissingtype(T)
-            # !ismissing check is purely an optimization to avoid a copy later
-            outcol = similar(incol, condf === !ismissing ? S : T, length(gd))
-            # Comparison is possible only between CatValues from the same pool
-            if incol isa CategoricalVector
-                U = Union{CategoricalArrays.leveltype(outcol),
-                          eltype(outcol) >: Missing ? Missing : Union{}}
-                outcol = CategoricalArray{U, 1}(outcol.refs, incol.pool)
-            end
-            # It is safe to use a non-missing init value
-            # since missing will poison the result if present
-            # we assume here that groups are non-empty (current design assures this)
-            # + workaround for https://github.com/JuliaLang/julia/issues/36978
-            if isconcretetype(S) && hasmethod($initf, Tuple{S}) && !(S <: Irrational)
-                fill!(outcol, $initf(S))
-            else
-                fillfirst!(condf, outcol, incol, gd)
-            end
-            return outcol
+# perform a transformation taking SubDataFrame as an input
+function _combine_process_callable(@nospecialize(cs_i::Base.Callable),
+                                   optional_i::Bool,
+                                   parentdf::AbstractDataFrame,
+                                   gd::GroupedDataFrame,
+                                   seen_cols::Dict{Symbol, Tuple{Bool, Int}},
+                                   trans_res::Vector{TransformationResult},
+                                   idx_agg::Union{Nothing, AbstractVector{Int}})
+    firstres = length(gd) > 0 ? cs_i(gd[1]) : cs_i(similar(parentdf, 0))
+    idx, outcols, nms = _combine_multicol(firstres, cs_i, gd, nothing)
+
+    if !(firstres isa Union{AbstractVecOrMat, AbstractDataFrame,
+                            NamedTuple{<:Any, <:Tuple{Vararg{AbstractVector}}}})
+        # if idx_agg was not computed yet it is nothing
+        # in this case if we are not passed a vector compute it.
+        if isnothing(idx_agg)
+            idx_agg = Vector{Int}(undef, length(gd))
+            fillfirst!(nothing, idx_agg, 1:length(gd.groups), gd)
         end
+        @assert idx == idx_agg
+        idx = idx_agg
     end
-end
-
-function copyto_widen!(res::AbstractVector{T}, x::AbstractVector) where T
-    @inbounds for i in eachindex(res, x)
-        val = x[i]
-        S = typeof(val)
-        if S <: T || promote_type(S, T) <: T
-            res[i] = val
+    @assert length(outcols) == length(nms)
+    for j in eachindex(outcols)
+        outcol = outcols[j]
+        out_col_name = nms[j]
+        if haskey(seen_cols, out_col_name)
+            optional, loc = seen_cols[out_col_name]
+            # if column was seen and it is optional now ignore it
+            if !optional_i
+                optional, loc = seen_cols[out_col_name]
+                # we have seen this col but it is not allowed to replace it
+                optional || throw(ArgumentError("duplicate output column name: :$out_col_name"))
+                @assert trans_res[loc].optional && trans_res[loc].name == out_col_name
+                trans_res[loc] = TransformationResult(idx, outcol, out_col_name, optional_i)
+                seen_cols[out_col_name] = (optional_i, loc)
+            end
         else
-            newres = Tables.allocatecolumn(promote_type(S, T), length(x))
-            return copyto_widen!(newres, x)
+            push!(trans_res, TransformationResult(idx, outcol, out_col_name, optional_i))
+            seen_cols[out_col_name] = (optional_i, length(trans_res))
         end
     end
-    return res
+    return idx_agg
 end
 
-function groupreduce!(res::AbstractVector, f, op, condf, adjust, checkempty::Bool,
-                      incol::AbstractVector, gd::GroupedDataFrame)
-    n = length(gd)
-    if adjust !== nothing || checkempty
-        counts = zeros(Int, n)
+# perform a transformation specified using the Pair notation with a single output column
+function _combine_process_pair_symbol(optional_i::Bool,
+                                      gd::GroupedDataFrame,
+                                      seen_cols::Dict{Symbol, Tuple{Bool, Int}},
+                                      trans_res::Vector{TransformationResult},
+                                      idx_agg::Union{Nothing, AbstractVector{Int}},
+                                      out_col_name::Symbol,
+                                      firstmulticol::Bool,
+                                      firstres::Any,
+                                      @nospecialize(fun::Base.Callable),
+                                      incols::Union{Tuple, NamedTuple})
+    if firstmulticol
+        throw(ArgumentError("a single value or vector result is required (got $(typeof(firstres)))"))
     end
-    groups = gd.groups
-    @inbounds for i in eachindex(incol, groups)
-        gix = groups[i]
-        x = incol[i]
-        if gix > 0 && (condf === nothing || condf(x))
-            # this check should be optimized out if U is not Any
-            if eltype(res) === Any && !isassigned(res, gix)
-                res[gix] = f(x, gix)
-            else
-                res[gix] = op(res[gix], f(x, gix))
-            end
-            if adjust !== nothing || checkempty
-                counts[gix] += 1
-            end
-        end
+    # if idx_agg was not computed yet it is nothing
+    # in this case if we are not passed a vector compute it.
+    if !(firstres isa AbstractVector) && isnothing(idx_agg)
+        idx_agg = Vector{Int}(undef, length(gd))
+        fillfirst!(nothing, idx_agg, 1:length(gd.groups), gd)
     end
-    # handle the case of an unitialized reduction
-    if eltype(res) === Any
-        if op === Base.add_sum
-            initf = zero
-        elseif op === Base.mul_prod
-            initf = one
-        else
-            initf = x -> throw(ErrorException("Unrecognized op $op"))
+    # TODO: if firstres is a vector we recompute idx for every function
+    # this could be avoided - it could be computed only the first time
+    # and later we could just check if lengths of groups match this first idx
+
+    # the last argument passed to _combine_with_first informs it about precomputed
+    # idx. Currently we do it only for single-row return values otherwise we pass
+    # nothing to signal that idx has to be computed in _combine_with_first
+    idx, outcols, _ = _combine_with_first(wrap(firstres), fun, gd, incols,
+                                          Val(firstmulticol),
+                                          firstres isa AbstractVector ? nothing : idx_agg)
+    @assert length(outcols) == 1
+    outcol = outcols[1]
+
+    if haskey(seen_cols, out_col_name)
+        # if column was seen and it is optional now ignore it
+        if !optional_i
+            optional, loc = seen_cols[out_col_name]
+            # we have seen this col but it is not allowed to replace it
+            optional || throw(ArgumentError("duplicate output column name: :$out_col_name"))
+            @assert trans_res[loc].optional && trans_res[loc].name == out_col_name
+            trans_res[loc] = TransformationResult(idx, outcol, out_col_name, optional_i)
+            seen_cols[out_col_name] = (optional_i, loc)
         end
-        @inbounds for gix in eachindex(res)
-            if !isassigned(res, gix)
-                res[gix] = initf(nonmissingtype(eltype(incol)))
-            end
-        end
-    end
-    if adjust !== nothing
-        res .= adjust.(res, counts)
-    end
-    if checkempty && any(iszero, counts)
-        throw(ArgumentError("some groups contain only missing values"))
-    end
-    # Undo pool sharing done by groupreduce_init
-    if res isa CategoricalVector && res.pool === incol.pool
-        V = Union{CategoricalArrays.leveltype(res),
-                  eltype(res) >: Missing ? Missing : Union{}}
-        res = CategoricalArray{V, 1}(res.refs, copy(res.pool))
-    end
-    if isconcretetype(eltype(res))
-        return res
     else
-        return copyto_widen!(Tables.allocatecolumn(typeof(first(res)), n), res)
-    end
-end
-
-# function barrier works around type instability of groupreduce_init due to applicable
-groupreduce(f, op, condf, adjust, checkempty::Bool,
-            incol::AbstractVector, gd::GroupedDataFrame) =
-    groupreduce!(groupreduce_init(op, condf, adjust, incol, gd),
-                 f, op, condf, adjust, checkempty, incol, gd)
-# Avoids the overhead due to Missing when computing reduction
-groupreduce(f, op, condf::typeof(!ismissing), adjust, checkempty::Bool,
-            incol::AbstractVector, gd::GroupedDataFrame) =
-    groupreduce!(disallowmissing(groupreduce_init(op, condf, adjust, incol, gd)),
-                 f, op, condf, adjust, checkempty, incol, gd)
-
-(r::Reduce)(incol::AbstractVector, gd::GroupedDataFrame) =
-    groupreduce((x, i) -> x, r.op, r.condf, r.adjust, r.checkempty, incol, gd)
-
-# this definition is missing in Julia 1.0 LTS and is required by aggregation for var
-# TODO: remove this when we drop 1.0 support
-if VERSION < v"1.1"
-    Base.zero(::Type{Missing}) = missing
-end
-
-function (agg::Aggregate{typeof(var)})(incol::AbstractVector, gd::GroupedDataFrame)
-    means = groupreduce((x, i) -> x, Base.add_sum, agg.condf, /, false, incol, gd)
-    # !ismissing check is purely an optimization to avoid a copy later
-    if eltype(means) >: Missing && agg.condf !== !ismissing
-        T = Union{Missing, real(eltype(means))}
+        push!(trans_res, TransformationResult(idx, outcol, out_col_name, optional_i))
+        seen_cols[out_col_name] = (optional_i, length(trans_res))
+    end
+    return idx_agg
+end
+
+# perform a transformation specified using the Pair notation with multiple output columns
+function _combine_process_pair_astable(optional_i::Bool,
+                                       gd::GroupedDataFrame,
+                                       seen_cols::Dict{Symbol, Tuple{Bool, Int}},
+                                       trans_res::Vector{TransformationResult},
+                                       idx_agg::Union{Nothing, AbstractVector{Int}},
+                                       out_col_name::Union{Type{AsTable}, AbstractVector{Symbol}},
+                                       firstmulticol::Bool,
+                                       firstres::Any,
+                                       @nospecialize(fun::Base.Callable),
+                                       incols::Union{Tuple, NamedTuple})
+    if firstres isa AbstractVector
+        idx, outcol_vec, _ = _combine_with_first(wrap(firstres), fun, gd, incols,
+                                              Val(firstmulticol), nothing)
+        @assert length(outcol_vec) == 1
+        res = outcol_vec[1]
+        @assert length(res) > 0
+
+        kp1 = keys(res[1])
+        prepend = all(x -> x isa Integer, kp1)
+        if !(prepend || all(x -> x isa Symbol, kp1) || all(x -> x isa AbstractString, kp1))
+            throw(ArgumentError("keys of the returned elements must be " *
+                                "`Symbol`s, strings or integers"))
+        end
+        if any(x -> !isequal(keys(x), kp1), res)
+            throw(ArgumentError("keys of the returned elements must be identical"))
+        end
+        outcols = [[x[n] for x in res] for n in kp1]
+        nms = [prepend ? Symbol("x", n) : Symbol(n) for n in kp1]
     else
-        T = real(eltype(means))
-    end
-    res = zeros(T, length(gd))
-    return groupreduce!(res, (x, i) -> @inbounds(abs2(x - means[i])), +, agg.condf,
-                        (x, l) -> l <= 1 ? oftype(x / (l-1), NaN) : x / (l-1),
-                        false, incol, gd)
-end
+        if !firstmulticol
+            firstres = Tables.columntable(firstres)
+            oldfun = fun
+            fun = (x...) -> Tables.columntable(oldfun(x...))
+        end
+        idx, outcols, nms = _combine_multicol(firstres, fun, gd, incols)
 
-function (agg::Aggregate{typeof(std)})(incol::AbstractVector, gd::GroupedDataFrame)
-    outcol = Aggregate(var, agg.condf)(incol, gd)
-    if eltype(outcol) <: Union{Missing, Rational}
-        return sqrt.(outcol)
-    else
-        return map!(sqrt, outcol, outcol)
+        if !(firstres isa Union{AbstractVecOrMat, AbstractDataFrame,
+            NamedTuple{<:Any, <:Tuple{Vararg{AbstractVector}}}})
+            # if idx_agg was not computed yet it is nothing
+            # in this case if we are not passed a vector compute it.
+            if isnothing(idx_agg)
+                idx_agg = Vector{Int}(undef, length(gd))
+                fillfirst!(nothing, idx_agg, 1:length(gd.groups), gd)
+            end
+            @assert idx == idx_agg
+            idx = idx_agg
+        end
+        @assert length(outcols) == length(nms)
     end
-end
-
-for f in (first, last)
-    function (agg::Aggregate{typeof(f)})(incol::AbstractVector, gd::GroupedDataFrame)
-        n = length(gd)
-        outcol = similar(incol, n)
-        fillfirst!(agg.condf, outcol, incol, gd, rev=agg.f === last)
-        if isconcretetype(eltype(outcol))
-            return outcol
+    if out_col_name isa AbstractVector{Symbol}
+        if length(out_col_name) != length(nms)
+            throw(ArgumentError("Number of returned columns does not " *
+                                "match the length of requested output"))
+        else
+            nms = out_col_name
+        end
+    end
+    for j in eachindex(outcols)
+        outcol = outcols[j]
+        out_col_name = nms[j]
+        if haskey(seen_cols, out_col_name)
+            optional, loc = seen_cols[out_col_name]
+            # if column was seen and it is optional now ignore it
+            if !optional_i
+                optional, loc = seen_cols[out_col_name]
+                # we have seen this col but it is not allowed to replace it
+                optional || throw(ArgumentError("duplicate output column name: :$out_col_name"))
+                @assert trans_res[loc].optional && trans_res[loc].name == out_col_name
+                trans_res[loc] = TransformationResult(idx, outcol, out_col_name, optional_i)
+                seen_cols[out_col_name] = (optional_i, loc)
+            end
         else
-            return copyto_widen!(Tables.allocatecolumn(typeof(first(outcol)), n), outcol)
+            push!(trans_res, TransformationResult(idx, outcol, out_col_name, optional_i))
+            seen_cols[out_col_name] = (optional_i, length(trans_res))
         end
     end
+    return idx_agg
 end
 
-function (agg::Aggregate{typeof(length)})(incol::AbstractVector, gd::GroupedDataFrame)
-    if getfield(gd, :idx) === nothing
-        lens = zeros(Int, length(gd))
-        @inbounds for gix in gd.groups
-            gix > 0 && (lens[gix] += 1)
-        end
-        return lens
+# perform a transformation specified using the Pair notation
+# cs_i is a Pair that has many possible forms so this function is used to dispatch
+# to an appropriate more specialized function
+function _combine_process_pair(@nospecialize(cs_i::Pair),
+                               optional_i::Bool,
+                               parentdf::AbstractDataFrame,
+                               gd::GroupedDataFrame,
+                               seen_cols::Dict{Symbol, Tuple{Bool, Int}},
+                               trans_res::Vector{TransformationResult},
+                               idx_agg::Union{Nothing, AbstractVector{Int}})
+    source_cols, (fun, out_col_name) = cs_i
+
+    if source_cols isa Int
+        incols = (parentdf[!, source_cols],)
+    elseif source_cols isa AsTable
+        incols = Tables.columntable(select(parentdf,
+                                           source_cols.cols,
+                                           copycols=false))
     else
-        return gd.ends .- gd.starts .+ 1
+        @assert source_cols isa AbstractVector{Int}
+        incols = ntuple(i -> parentdf[!, source_cols[i]], length(source_cols))
     end
-end
 
-isagg((col, fun)::Pair, gdf::GroupedDataFrame) =
-    col isa ColumnIndex && check_aggregate(fun, parent(gdf)[!, col]) isa AbstractAggregate
+    firstres = length(gd) > 0 ?
+               do_call(fun, gd.idx, gd.starts, gd.ends, gd, incols, 1) :
+               do_call(fun, Int[], 1:1, 0:0, gd, incols, 1)
+    firstmulticol = firstres isa MULTI_COLS_TYPE
 
-function _agg2idx_map_helper(idx, idx_agg)
-    agg2idx_map = fill(-1, length(idx))
-    aggj = 1
-    @inbounds for (j, idxj) in enumerate(idx)
-        while idx_agg[aggj] != idxj
-            aggj += 1
-            @assert aggj <= length(idx_agg)
-        end
-        agg2idx_map[j] = aggj
+    if out_col_name isa Symbol
+        return _combine_process_pair_symbol(optional_i, gd, seen_cols, trans_res, idx_agg,
+                                           out_col_name, firstmulticol, firstres, fun, incols)
     end
-    return agg2idx_map
+    if out_col_name == AsTable || out_col_name isa AbstractVector{Symbol}
+        return _combine_process_pair_astable(optional_i, gd, seen_cols, trans_res, idx_agg,
+                                             out_col_name, firstmulticol, firstres, fun, incols)
+    end
+    throw(ArgumentError("unsupported target column name specifier $out_col_name"))
 end
 
 function prepare_idx_keeprows(idx::AbstractVector{<:Integer},
@@ -1150,14 +483,10 @@ function prepare_idx_keeprows(idx::AbstractVector{<:Integer},
     return idx_keeprows
 end
 
-function _combine(f::AbstractVector{<:Pair},
-                  gd::GroupedDataFrame, nms::AbstractVector{Symbol},
+function _combine(gd::GroupedDataFrame,
+                  @nospecialize(cs_norm::Vector{Any}), optional_transform::Vector{Bool},
                   copycols::Bool, keeprows::Bool, renamecols::Bool)
-    # here f should be normalized and in a form of source_cols => fun
-    @assert all(x -> first(x) isa Union{Int, AbstractVector{Int}, AsTable}, f)
-    @assert all(x -> last(x) isa Base.Callable, f)
-
-    if isempty(f)
+    if isempty(cs_norm)
         if keeprows && nrow(parent(gd)) > 0 && minimum(gd.groups) == 0
             throw(ArgumentError("select and transform do not support " *
                                 "`GroupedDataFrame`s from which some groups have "*
@@ -1178,87 +507,76 @@ function _combine(f::AbstractVector{<:Pair},
     end
 
     idx_agg = nothing
-    if length(gd) > 0 && any(x -> isagg(x, gd), f)
+    if length(gd) > 0 && any(x -> isagg(x, gd), cs_norm)
         # Compute indices of representative rows only once for all AbstractAggregates
         idx_agg = Vector{Int}(undef, length(gd))
         fillfirst!(nothing, idx_agg, 1:length(gd.groups), gd)
-    elseif length(gd) == 0 || !all(x -> isagg(x, gd), f)
+    elseif length(gd) == 0 || !all(x -> isagg(x, gd), cs_norm)
         # Trigger computation of indices
         # This can speed up some aggregates that would not trigger this on their own
         @assert gd.idx !== nothing
     end
-    res = Vector{Any}(undef, length(f))
+
+    trans_res = Vector{TransformationResult}()
+
+    # seen_cols keeps information about the location of columns already processed
+    # and whether a given column can be replaced in the future
+    seen_cols = Dict{Symbol, Tuple{Bool, Int}}()
+
     parentdf = parent(gd)
-    for (i, p) in enumerate(f)
-        source_cols, fun = p
-        if length(gd) > 0 && isagg(p, gd)
-            incol = parentdf[!, source_cols]
-            agg = check_aggregate(last(p), incol)
-            outcol = agg(incol, gd)
-            res[i] = idx_agg, outcol
-        elseif keeprows && fun === identity && !(source_cols isa AsTable)
-            @assert source_cols isa Union{Int, AbstractVector{Int}}
-            @assert length(source_cols) == 1
-            outcol = parentdf[!, first(source_cols)]
-            res[i] = idx_keeprows, copycols ? copy(outcol) : outcol
-        else
-            if source_cols isa Int
-                incols = (parentdf[!, source_cols],)
-            elseif source_cols isa AsTable
-                incols = Tables.columntable(select(parentdf,
-                                                   source_cols.cols,
-                                                   copycols=false))
-            else
-                @assert source_cols isa AbstractVector{Int}
-                incols = ntuple(i -> parentdf[!, source_cols[i]], length(source_cols))
-            end
-            firstres = length(gd) > 0 ?
-                       do_call(fun, gd.idx, gd.starts, gd.ends, gd, incols, 1) :
-                       do_call(fun, Int[], 1:1, 0:0, gd, incols, 1)
-            firstmulticol = firstres isa MULTI_COLS_TYPE
-            if firstmulticol
-                throw(ArgumentError("a single value or vector result is required when " *
-                                    "passing multiple functions (got $(typeof(res)))"))
+    for i in eachindex(cs_norm, optional_transform)
+        cs_i = cs_norm[i]
+        optional_i = optional_transform[i]
+
+        if length(gd) > 0 && isagg(cs_i, gd)
+            _combine_process_agg(cs_i, optional_i, parentdf, gd, seen_cols, trans_res, idx_agg)
+        elseif keeprows && cs_i isa Pair && first(last(cs_i)) === identity &&
+               !(first(cs_i) isa AsTable) && (last(last(cs_i)) isa Symbol)
+            # this is a fast path used when we pass a column or rename a column in select or transform
+            _combine_process_noop(cs_i, optional_i, parentdf, seen_cols, trans_res, idx_keeprows, copycols)
+        elseif cs_i isa Base.Callable
+            idx_callable = _combine_process_callable(cs_i, optional_i, parentdf, gd, seen_cols, trans_res, idx_agg)
+            if idx_callable !== nothing
+                if idx_agg === nothing
+                    idx_agg = idx_callable
+                else
+                    @assert idx_agg === idx_callable
+                end
             end
-            # if idx_agg was not computed yet it is nothing
-            # in this case if we are not passed a vector compute it.
-            if !(firstres isa AbstractVector) && isnothing(idx_agg)
-                idx_agg = Vector{Int}(undef, length(gd))
-                fillfirst!(nothing, idx_agg, 1:length(gd.groups), gd)
+        else
+            @assert cs_i isa Pair
+            idx_pair = _combine_process_pair(cs_i, optional_i, parentdf, gd, seen_cols, trans_res, idx_agg)
+            if idx_pair !== nothing
+                if idx_agg === nothing
+                    idx_agg = idx_pair
+                else
+                    @assert idx_agg === idx_pair
+                end
             end
-            # TODO: if firstres is a vector we recompute idx for every function
-            # this could be avoided - it could be computed only the first time
-            # and later we could just check if lengths of groups match this first idx
-
-            # the last argument passed to _combine_with_first informs it about precomputed
-            # idx. Currently we do it only for single-row return values otherwise we pass
-            # nothing to signal that idx has to be computed in _combine_with_first
-            idx, outcols, _ = _combine_with_first(wrap(firstres), fun, gd, incols,
-                                                  Val(firstmulticol),
-                                                  firstres isa AbstractVector ? nothing : idx_agg)
-            @assert length(outcols) == 1
-            res[i] = idx, outcols[1]
         end
     end
+
+    isempty(trans_res) && return Int[], DataFrame()
     # idx_agg === nothing then we have only functions that
     # returned multiple rows and idx_loc = 1
-    idx_loc = findfirst(x -> x[1] !== idx_agg, res)
+    idx_loc = findfirst(x -> x.col_idx !== idx_agg, trans_res)
     if !keeprows && isnothing(idx_loc)
         @assert !isnothing(idx_agg)
         idx = idx_agg
     else
-        idx = keeprows ? idx_keeprows : res[idx_loc][1]
+        idx = keeprows ? idx_keeprows : trans_res[idx_loc].col_idx
         agg2idx_map = nothing
-        for i in 1:length(res)
-            if res[i][1] !== idx && res[i][1] != idx
-                if res[i][1] === idx_agg
+        for i in 1:length(trans_res)
+            if trans_res[i].col_idx !== idx
+                if trans_res[i].col_idx === idx_agg
                     # we perform pseudo broadcasting here
                     # keep -1 as a sentinel for errors
                     if isnothing(agg2idx_map)
                         agg2idx_map = _agg2idx_map_helper(idx, idx_agg)
                     end
-                    res[i] = idx_agg, res[i][2][agg2idx_map]
-                elseif idx != res[i][1]
+                    trans_res[i] = TransformationResult(idx_agg, trans_res[i].col[agg2idx_map],
+                                                        trans_res[i].name, trans_res[i].optional)
+                elseif idx != trans_res[i].col_idx
                     if keeprows
                         throw(ArgumentError("all functions must return vectors with " *
                                             "as many values as rows in each group"))
@@ -1270,469 +588,79 @@ function _combine(f::AbstractVector{<:Pair},
         end
     end
 
-    # here first field in res[i] is used to keep track how the column was generated
+    # here the first field in trans_res[i] is used to keep track of how the column was generated
     # a correct index is stored in idx variable
 
-    for (i, (col_idx, col)) in enumerate(res)
-        if keeprows && res[i][1] !== idx_keeprows # we need to reorder the column
+    for i in eachindex(trans_res)
+        col_idx = trans_res[i].col_idx
+        col = trans_res[i].col
+        if keeprows && col_idx !== idx_keeprows # we need to reorder the column
             newcol = similar(col)
             # we can probably make it more efficient, but I leave it as an optimization for the future
             gd_idx = gd.idx
-            for j in eachindex(gd.idx, col)
-                newcol[gd_idx[j]] = col[j]
+            k = 0
+            # consider adding @inbounds later
+            for (s, e) in zip(gd.starts, gd.ends)
+                for j in s:e
+                    k += 1
+                    newcol[gd_idx[j]] = col[k]
+                end
             end
-            res[i] = (col_idx, newcol)
+            @assert k == length(gd_idx)
+            trans_res[i] = TransformationResult(col_idx, newcol, trans_res[i].name, trans_res[i].optional)
         end
     end
-    outcols = map(x -> x[2], res)
+
+    outcols = AbstractVector[x.col for x in trans_res]
+    nms = Symbol[x.name for x in trans_res]
     # this check is redundant given we check idx above
     # but it is safer to double check and it is cheap
     @assert all(x -> length(x) == length(outcols[1]), outcols)
-    return idx, DataFrame(collect(AbstractVector, outcols), nms, copycols=false)
-end
-
-function _combine(fun::Base.Callable, gd::GroupedDataFrame, ::Nothing,
-                  copycols::Bool, keeprows::Bool, renamecols::Bool)
-    @assert copycols && !keeprows
-    # use `similar` as `gd` might have been subsetted
-    firstres = length(gd) > 0 ? fun(gd[1]) : fun(similar(parent(gd), 0))
-    idx, outcols, nms = _combine_multicol(firstres, fun, gd, nothing)
-    valscat = DataFrame(collect(AbstractVector, outcols), nms)
-    return idx, valscat
-end
-
-function _combine(p::Pair, gd::GroupedDataFrame, ::Nothing,
-                  copycols::Bool, keeprows::Bool, renamecols::Bool)
-    # here p should not be normalized as we allow tabular return value from fun
-    # map and combine should not dispatch here if p is isagg
-    @assert copycols && !keeprows
-    source_cols, (fun, out_col) = normalize_selection(index(parent(gd)), p, renamecols)
-    parentdf = parent(gd)
-    if source_cols isa Int
-        incols = (parent(gd)[!, source_cols],)
-    elseif source_cols isa AsTable
-        incols = Tables.columntable(select(parentdf,
-                                           source_cols.cols,
-                                           copycols=false))
-    else
-        @assert source_cols isa AbstractVector{Int}
-        incols = ntuple(i -> parent(gd)[!, source_cols[i]], length(source_cols))
-    end
-    firstres = length(gd) > 0 ?
-               do_call(fun, gd.idx, gd.starts, gd.ends, gd, incols, 1) :
-               do_call(fun, Int[], 1:1, 0:0, gd, incols, 1)
-    idx, outcols, nms = _combine_multicol(firstres, fun, gd, incols)
-    # disallow passing target column name to genuine tables
-    if firstres isa MULTI_COLS_TYPE
-        if p isa Pair{<:Any, <:Pair{<:Any, <:SymbolOrString}}
-            throw(ArgumentError("setting column name for tabular return value is disallowed"))
-        end
-    else
-        # fetch auto generated or passed target column name to nms overwritting
-        # what _combine_with_first produced
-        nms = [out_col]
-    end
-    valscat = DataFrame(collect(AbstractVector, outcols), nms)
-    return idx, valscat
-end
-
-function _combine_multicol(firstres, fun::Any, gd::GroupedDataFrame,
-                           incols::Union{Nothing, AbstractVector, Tuple, NamedTuple})
-    firstmulticol = firstres isa MULTI_COLS_TYPE
-    if !(firstres isa Union{AbstractVecOrMat, AbstractDataFrame,
-                            NamedTuple{<:Any, <:Tuple{Vararg{AbstractVector}}}})
-        idx_agg = Vector{Int}(undef, length(gd))
-        fillfirst!(nothing, idx_agg, 1:length(gd.groups), gd)
-    else
-        idx_agg = nothing
-    end
-    return _combine_with_first(wrap(firstres), fun, gd, incols,
-                               Val(firstmulticol), idx_agg)
-end
-
-function _combine_with_first(first::Union{NamedTuple, DataFrameRow, AbstractDataFrame},
-                             f::Any, gd::GroupedDataFrame,
-                             incols::Union{Nothing, AbstractVector, Tuple, NamedTuple},
-                             firstmulticol::Val, idx_agg::Union{Nothing, AbstractVector{<:Integer}})
-    extrude = false
-
-    if first isa AbstractDataFrame
-        n = 0
-        eltys = eltype.(eachcol(first))
-    elseif first isa NamedTuple{<:Any, <:Tuple{Vararg{AbstractVector}}}
-        n = 0
-        eltys = map(eltype, first)
-    elseif first isa DataFrameRow
-        n = length(gd)
-        eltys = [eltype(parent(first)[!, i]) for i in parentcols(index(first))]
-    elseif firstmulticol == Val(false) && first[1] isa Union{AbstractArray{<:Any, 0}, Ref}
-        extrude = true
-        first = wrap_row(first[1], firstmulticol)
-        n = length(gd)
-        eltys = (typeof(first[1]),)
-    else # other NamedTuple giving a single row
-        n = length(gd)
-        eltys = map(typeof, first)
-        if any(x -> x <: AbstractVector, eltys)
-            throw(ArgumentError("mixing single values and vectors in a named tuple is not allowed"))
-        end
-    end
-    idx = isnothing(idx_agg) ? Vector{Int}(undef, n) : idx_agg
-    local initialcols
-    let eltys=eltys, n=n # Workaround for julia#15276
-        initialcols = ntuple(i -> Tables.allocatecolumn(eltys[i], n), _ncol(first))
-    end
-    targetcolnames = tuple(propertynames(first)...)
-    if !extrude && first isa Union{AbstractDataFrame,
-                                   NamedTuple{<:Any, <:Tuple{Vararg{AbstractVector}}}}
-        outcols, finalcolnames = _combine_tables_with_first!(first, initialcols, idx, 1, 1,
-                                                             f, gd, incols, targetcolnames,
-                                                             firstmulticol)
-    else
-        outcols, finalcolnames = _combine_rows_with_first!(first, initialcols, 1, 1,
-                                                           f, gd, incols, targetcolnames,
-                                                           firstmulticol)
-    end
-    return idx, outcols, collect(Symbol, finalcolnames)
-end
-
-function fill_row!(row, outcols::NTuple{N, AbstractVector},
-                   i::Integer, colstart::Integer,
-                   colnames::NTuple{N, Symbol}) where N
-    if _ncol(row) != N
-        throw(ArgumentError("return value must have the same number of columns " *
-                            "for all groups (got $N and $(length(row)))"))
-    end
-    @inbounds for j in colstart:length(outcols)
-        col = outcols[j]
-        cn = colnames[j]
-        local val
-        try
-            val = row[cn]
-        catch
-            throw(ArgumentError("return value must have the same column names " *
-                                "for all groups (got $colnames and $(propertynames(row)))"))
-        end
-        S = typeof(val)
-        T = eltype(col)
-        if S <: T || promote_type(S, T) <: T
-            col[i] = val
-        else
-            return j
-        end
-    end
-    return nothing
-end
-
-function _combine_rows_with_first!(first::Union{NamedTuple, DataFrameRow},
-                                   outcols::NTuple{N, AbstractVector},
-                                   rowstart::Integer, colstart::Integer,
-                                   f::Any, gd::GroupedDataFrame,
-                                   incols::Union{Nothing, AbstractVector, Tuple, NamedTuple},
-                                   colnames::NTuple{N, Symbol},
-                                   firstmulticol::Val) where N
-    len = length(gd)
-    gdidx = gd.idx
-    starts = gd.starts
-    ends = gd.ends
-
-    # handle empty GroupedDataFrame
-    len == 0 && return outcols, colnames
-
-    # Handle first group
-    j = fill_row!(first, outcols, rowstart, colstart, colnames)
-    @assert j === nothing # eltype is guaranteed to match
-    # Handle remaining groups
-    @inbounds for i in rowstart+1:len
-        row = wrap_row(do_call(f, gdidx, starts, ends, gd, incols, i), firstmulticol)
-        j = fill_row!(row, outcols, i, 1, colnames)
-        if j !== nothing # Need to widen column type
-            local newcols
-            let i = i, j = j, outcols=outcols, row=row # Workaround for julia#15276
-                newcols = ntuple(length(outcols)) do k
-                    S = typeof(row[k])
-                    T = eltype(outcols[k])
-                    U = promote_type(S, T)
-                    if S <: T || U <: T
-                        outcols[k]
-                    else
-                        copyto!(Tables.allocatecolumn(U, length(outcols[k])),
-                                1, outcols[k], 1, k >= j ? i-1 : i)
-                    end
-                end
-            end
-            return _combine_rows_with_first!(row, newcols, i, j,
-                                             f, gd, incols, colnames, firstmulticol)
-        end
-    end
-    return outcols, colnames
+    return idx, DataFrame(outcols, nms, copycols=false)
 end
 
-# This needs to be in a separate function
-# to work around a crash due to JuliaLang/julia#29430
-if VERSION >= v"1.1.0-DEV.723"
-    @inline function do_append!(do_it, col, vals)
-        do_it && append!(col, vals)
-        return do_it
-    end
-else
-    @noinline function do_append!(do_it, col, vals)
-        do_it && append!(col, vals)
-        return do_it
+function combine(f::Base.Callable, gd::GroupedDataFrame;
+                 keepkeys::Bool=true, ungroup::Bool=true, renamecols::Bool=true)
+    if f isa Colon
+        throw(ArgumentError("First argument must be a transformation if the second argument is a GroupedDataFrame"))
     end
+    return combine(gd, f, keepkeys=keepkeys, ungroup=ungroup, renamecols=renamecols)
 end
 
-function append_rows!(rows, outcols::NTuple{N, AbstractVector},
-                      colstart::Integer, colnames::NTuple{N, Symbol}) where N
-    if !isa(rows, Union{AbstractDataFrame, NamedTuple{<:Any, <:Tuple{Vararg{AbstractVector}}}})
-        throw(ArgumentError(ERROR_ROW_COUNT))
-    elseif _ncol(rows) != N
-        throw(ArgumentError("return value must have the same number of columns " *
-                            "for all groups (got $N and $(_ncol(rows)))"))
-    end
-    @inbounds for j in colstart:length(outcols)
-        col = outcols[j]
-        cn = colnames[j]
-        local vals
-        try
-            vals = getproperty(rows, cn)
-        catch
-            throw(ArgumentError("return value must have the same column names " *
-                                "for all groups (got $colnames and $(propertynames(rows)))"))
-        end
-        S = eltype(vals)
-        T = eltype(col)
-        if !do_append!(S <: T || promote_type(S, T) <: T, col, vals)
-            return j
-        end
-    end
-    return nothing
-end
+combine(f::Pair, gd::GroupedDataFrame;
+        keepkeys::Bool=true, ungroup::Bool=true, renamecols::Bool=true) =
+    throw(ArgumentError("First argument must be a transformation if the second argument is a GroupedDataFrame. " *
+                        "You can pass a `Pair` as the second argument of `combine` instead. If you want the return " *
+                        "value to be processed as having multiple columns, add a `=> AsTable` suffix to the pair."))
 
-function _combine_tables_with_first!(first::Union{AbstractDataFrame,
-                                     NamedTuple{<:Any, <:Tuple{Vararg{AbstractVector}}}},
-                                     outcols::NTuple{N, AbstractVector},
-                                     idx::Vector{Int}, rowstart::Integer, colstart::Integer,
-                                     f::Any, gd::GroupedDataFrame,
-                                     incols::Union{Nothing, AbstractVector, Tuple, NamedTuple},
-                                     colnames::NTuple{N, Symbol},
-                                     firstmulticol::Val) where N
-    len = length(gd)
-    gdidx = gd.idx
-    starts = gd.starts
-    ends = gd.ends
-    # Handle first group
+combine(gd::GroupedDataFrame,
+        cs::Union{Pair, Base.Callable, ColumnIndex, MultiColumnIndex}...;
+        keepkeys::Bool=true, ungroup::Bool=true, renamecols::Bool=true) =
+    _combine_prepare(gd, cs..., keepkeys=keepkeys, ungroup=ungroup,
+                     copycols=true, keeprows=false, renamecols=renamecols)
 
-    @assert _ncol(first) == N
-    if !isempty(colnames) && length(gd) > 0
-        j = append_rows!(first, outcols, colstart, colnames)
-        @assert j === nothing # eltype is guaranteed to match
-        append!(idx, Iterators.repeated(gdidx[starts[rowstart]], _nrow(first)))
+function select(f::Base.Callable, gd::GroupedDataFrame; copycols::Bool=true,
+                keepkeys::Bool=true, ungroup::Bool=true, renamecols::Bool=true)
+    if f isa Colon
+        throw(ArgumentError("First argument must be a transformation if the second argument is a grouped data frame"))
     end
-    # Handle remaining groups
-    @inbounds for i in rowstart+1:len
-        rows = wrap_table(do_call(f, gdidx, starts, ends, gd, incols, i), firstmulticol)
-        _ncol(rows) == 0 && continue
-        if isempty(colnames)
-            newcolnames = tuple(propertynames(rows)...)
-            if rows isa AbstractDataFrame
-                eltys = eltype.(eachcol(rows))
-            else
-                eltys = map(eltype, rows)
-            end
-            initialcols = ntuple(i -> Tables.allocatecolumn(eltys[i], 0), _ncol(rows))
-            return _combine_tables_with_first!(rows, initialcols, idx, i, 1,
-                                               f, gd, incols, newcolnames, firstmulticol)
-        end
-        j = append_rows!(rows, outcols, 1, colnames)
-        if j !== nothing # Need to widen column type
-            local newcols
-            let i = i, j = j, outcols=outcols, rows=rows # Workaround for julia#15276
-                newcols = ntuple(length(outcols)) do k
-                    S = eltype(rows isa AbstractDataFrame ? rows[!, k] : rows[k])
-                    T = eltype(outcols[k])
-                    U = promote_type(S, T)
-                    if S <: T || U <: T
-                        outcols[k]
-                    else
-                        copyto!(Tables.allocatecolumn(U, length(outcols[k])), outcols[k])
-                    end
-                end
-            end
-            return _combine_tables_with_first!(rows, newcols, idx, i, j,
-                                               f, gd, incols, colnames, firstmulticol)
-        end
-        append!(idx, Iterators.repeated(gdidx[starts[i]], _nrow(rows)))
-    end
-    return outcols, colnames
+    return select(gd, f, copycols=copycols, keepkeys=keepkeys, ungroup=ungroup, renamecols=renamecols)
 end
 
-"""
-    select(gd::GroupedDataFrame, args...; copycols::Bool=true, keepkeys::Bool=true,
-           ungroup::Bool=true, renamecols::Bool=true)
-
-Apply `args` to `gd` following the rules described in [`combine`](@ref).
-
-If `ungroup=true` the result is a `DataFrame`.
-If  `ungroup=false` the result is a `GroupedDataFrame`
-(in this case the returned value retains the order of groups of `gd`).
-
-The `parent` of the returned value has as many rows as `parent(gd)` and
-in the same order, except when the returned value has no columns
-(in which case it has zero rows). If an operation in `args` returns
-a single value it is always broadcasted to have this number of rows.
-
-If `copycols=false` then do not perform copying of columns that are not transformed.
 
-$KWARG_PROCESSING_RULES
-
-# See also
-
-[`groupby`](@ref), [`combine`](@ref), [`select!`](@ref), [`transform`](@ref), [`transform!`](@ref)
-
-# Examples
-```jldoctest
-julia> df = DataFrame(a = [1, 1, 1, 2, 2, 1, 1, 2],
-                      b = repeat([2, 1], outer=[4]),
-                      c = 1:8)
-8×3 DataFrame
-│ Row │ a     │ b     │ c     │
-│     │ Int64 │ Int64 │ Int64 │
-├─────┼───────┼───────┼───────┤
-│ 1   │ 1     │ 2     │ 1     │
-│ 2   │ 1     │ 1     │ 2     │
-│ 3   │ 1     │ 2     │ 3     │
-│ 4   │ 2     │ 1     │ 4     │
-│ 5   │ 2     │ 2     │ 5     │
-│ 6   │ 1     │ 1     │ 6     │
-│ 7   │ 1     │ 2     │ 7     │
-│ 8   │ 2     │ 1     │ 8     │
-
-julia> gd = groupby(df, :a);
-
-julia> select(gd, :c => sum, nrow)
-8×3 DataFrame
-│ Row │ a     │ c_sum │ nrow  │
-│     │ Int64 │ Int64 │ Int64 │
-├─────┼───────┼───────┼───────┤
-│ 1   │ 1     │ 19    │ 5     │
-│ 2   │ 1     │ 19    │ 5     │
-│ 3   │ 1     │ 19    │ 5     │
-│ 4   │ 2     │ 17    │ 3     │
-│ 5   │ 2     │ 17    │ 3     │
-│ 6   │ 1     │ 19    │ 5     │
-│ 7   │ 1     │ 19    │ 5     │
-│ 8   │ 2     │ 17    │ 3     │
-
-julia> select(gd, :c => sum, nrow, ungroup=false)
-GroupedDataFrame with 2 groups based on key: a
-First Group (5 rows): a = 1
-│ Row │ a     │ c_sum │ nrow  │
-│     │ Int64 │ Int64 │ Int64 │
-├─────┼───────┼───────┼───────┤
-│ 1   │ 1     │ 19    │ 5     │
-│ 2   │ 1     │ 19    │ 5     │
-│ 3   │ 1     │ 19    │ 5     │
-│ 4   │ 1     │ 19    │ 5     │
-│ 5   │ 1     │ 19    │ 5     │
-⋮
-Last Group (3 rows): a = 2
-│ Row │ a     │ c_sum │ nrow  │
-│     │ Int64 │ Int64 │ Int64 │
-├─────┼───────┼───────┼───────┤
-│ 1   │ 2     │ 17    │ 3     │
-│ 2   │ 2     │ 17    │ 3     │
-│ 3   │ 2     │ 17    │ 3     │
-
-julia> select(gd, :c => (x -> sum(log, x)) => :sum_log_c) # specifying a name for target column
-8×2 DataFrame
-│ Row │ a     │ sum_log_c │
-│     │ Int64 │ Float64   │
-├─────┼───────┼───────────┤
-│ 1   │ 1     │ 5.52943   │
-│ 2   │ 1     │ 5.52943   │
-│ 3   │ 1     │ 5.52943   │
-│ 4   │ 2     │ 5.07517   │
-│ 5   │ 2     │ 5.07517   │
-│ 6   │ 1     │ 5.52943   │
-│ 7   │ 1     │ 5.52943   │
-│ 8   │ 2     │ 5.07517   │
-
-julia> select(gd, [:b, :c] .=> sum) # passing a vector of pairs
-8×3 DataFrame
-│ Row │ a     │ b_sum │ c_sum │
-│     │ Int64 │ Int64 │ Int64 │
-├─────┼───────┼───────┼───────┤
-│ 1   │ 1     │ 8     │ 19    │
-│ 2   │ 1     │ 8     │ 19    │
-│ 3   │ 1     │ 8     │ 19    │
-│ 4   │ 2     │ 4     │ 17    │
-│ 5   │ 2     │ 4     │ 17    │
-│ 6   │ 1     │ 8     │ 19    │
-│ 7   │ 1     │ 8     │ 19    │
-│ 8   │ 2     │ 4     │ 17    │
-
-julia> select(gd, :b => :b1, :c => :c1,
-              [:b, :c] => +, keepkeys=false) # multiple arguments, renaming and keepkeys
-8×3 DataFrame
-│ Row │ b1    │ c1    │ b_c_+ │
-│     │ Int64 │ Int64 │ Int64 │
-├─────┼───────┼───────┼───────┤
-│ 1   │ 2     │ 1     │ 3     │
-│ 2   │ 1     │ 2     │ 3     │
-│ 3   │ 2     │ 3     │ 5     │
-│ 4   │ 1     │ 4     │ 5     │
-│ 5   │ 2     │ 5     │ 7     │
-│ 6   │ 1     │ 6     │ 7     │
-│ 7   │ 2     │ 7     │ 9     │
-│ 8   │ 1     │ 8     │ 9     │
-
-julia> select(gd, :b, :c => sum) # passing columns and broadcasting
-8×3 DataFrame
-│ Row │ a     │ b     │ c_sum │
-│     │ Int64 │ Int64 │ Int64 │
-├─────┼───────┼───────┼───────┤
-│ 1   │ 1     │ 2     │ 19    │
-│ 2   │ 1     │ 1     │ 19    │
-│ 3   │ 1     │ 2     │ 19    │
-│ 4   │ 2     │ 1     │ 17    │
-│ 5   │ 2     │ 2     │ 17    │
-│ 6   │ 1     │ 1     │ 19    │
-│ 7   │ 1     │ 2     │ 19    │
-│ 8   │ 2     │ 1     │ 17    │
-
-julia> select(gd, :, AsTable(Not(:a)) => sum, renamecols=false)
-8×4 DataFrame
-│ Row │ a     │ b     │ c     │ b_c   │
-│     │ Int64 │ Int64 │ Int64 │ Int64 │
-├─────┼───────┼───────┼───────┼───────┤
-│ 1   │ 1     │ 2     │ 1     │ 3     │
-│ 2   │ 1     │ 1     │ 2     │ 3     │
-│ 3   │ 1     │ 2     │ 3     │ 5     │
-│ 4   │ 2     │ 1     │ 4     │ 5     │
-│ 5   │ 2     │ 2     │ 5     │ 7     │
-│ 6   │ 1     │ 1     │ 6     │ 7     │
-│ 7   │ 1     │ 2     │ 7     │ 9     │
-│ 8   │ 2     │ 1     │ 8     │ 9     │
-```
-"""
 select(gd::GroupedDataFrame, args...; copycols::Bool=true, keepkeys::Bool=true,
        ungroup::Bool=true, renamecols::Bool=true) =
     _combine_prepare(gd, args..., copycols=copycols, keepkeys=keepkeys,
                      ungroup=ungroup, keeprows=true, renamecols=renamecols)
 
-"""
-    transform(gd::GroupedDataFrame, args...;
-              copycols::Bool=true, keepkeys::Bool=true, ungroup::Bool=true)
-
-An equivalent of
-`select(gd, :, args..., copycols=copycols, keepkeys=keepkeys, ungroup=ungroup, renamecols=renamecols)`
-but keeps the columns of `parent(gd)` in their original order.
-
-# See also
+function transform(f::Base.Callable, gd::GroupedDataFrame; copycols::Bool=true,
+                   keepkeys::Bool=true, ungroup::Bool=true, renamecols::Bool=true)
+    if f isa Colon
+        throw(ArgumentError("First argument must be a transformation if the second argument is a grouped data frame"))
+    end
+    return transform(gd, f, copycols=copycols, keepkeys=keepkeys, ungroup=ungroup, renamecols=renamecols)
+end
 
-[`groupby`](@ref), [`combine`](@ref), [`select`](@ref), [`select!`](@ref), [`transform!`](@ref)
-"""
 function transform(gd::GroupedDataFrame, args...; copycols::Bool=true,
                    keepkeys::Bool=true, ungroup::Bool=true, renamecols::Bool=true)
     res = select(gd, :, args..., copycols=copycols, keepkeys=keepkeys,
@@ -1743,21 +671,13 @@ function transform(gd::GroupedDataFrame, args...; copycols::Bool=true,
     return res
 end
 
-"""
-    select!(gd::GroupedDataFrame{DataFrame}, args...; ungroup::Bool=true, renamecols::Bool=true)
-
-An equivalent of
-`select(gd, args..., copycols=false, keepkeys=true, ungroup=ungroup, renamecols=renamecols)`
-but updates `parent(gd)` in place.
-
-`gd` is updated to reflect the new rows of its updated parent.
-If there are independent `GroupedDataFrame` objects constructed
-using the same parent data frame they might get corrupt.
-
-# See also
+function select!(f::Base.Callable, gd::GroupedDataFrame; ungroup::Bool=true, renamecols::Bool=true)
+    if f isa Colon
+        throw(ArgumentError("First argument must be a transformation if the second argument is a grouped data frame"))
+    end
+    return select!(gd, f, ungroup=ungroup, renamecols=renamecols)
+end
 
-[`groupby`](@ref), [`combine`](@ref), [`select`](@ref), [`transform`](@ref), [`transform!`](@ref)
-"""
 function select!(gd::GroupedDataFrame{DataFrame}, args...;
                  ungroup::Bool=true, renamecols::Bool=true)
     newdf = select(gd, args..., copycols=false, renamecols=renamecols)
@@ -1766,18 +686,13 @@ function select!(gd::GroupedDataFrame{DataFrame}, args...;
     return ungroup ? df : gd
 end
 
-"""
-    transform!(gd::GroupedDataFrame{DataFrame}, args...; ungroup::Bool=true, renamecols::Bool=true)
-
-An equivalent of
-`transform(gd, args..., copycols=false, keepkeys=true, ungroup=ungroup, renamecols=renamecols)`
-but updates `parent(gd)` in place
-and keeps the columns of `parent(gd)` in their original order.
-
-# See also
+function transform!(f::Base.Callable, gd::GroupedDataFrame; ungroup::Bool=true, renamecols::Bool=true)
+    if f isa Colon
+        throw(ArgumentError("First argument must be a transformation if the second argument is a grouped data frame"))
+    end
+    return transform!(gd, f, ungroup=ungroup, renamecols=renamecols)
+end
 
-[`groupby`](@ref), [`combine`](@ref), [`select`](@ref), [`select!`](@ref), [`transform`](@ref)
-"""
 function transform!(gd::GroupedDataFrame{DataFrame}, args...;
                     ungroup::Bool=true, renamecols::Bool=true)
     newdf = select(gd, :, args..., copycols=false, renamecols=renamecols)
diff --git a/test/grouping.jl b/test/grouping.jl
index 12d71e3068..ef7de6fb7a 100644
--- a/test/grouping.jl
+++ b/test/grouping.jl
@@ -492,6 +492,12 @@ end
     @test isempty(gd2.starts)
     @test isempty(gd2.ends)
     @test isequal_typed(parent(gd2), DataFrame(A=Int[], X=Int[]))
+
+    @test_throws ArgumentError combine(:x => identity, groupby_checked(DataFrame(x=[1,2,3]), :x))
+    @test_throws ArgumentError select(groupby_checked(DataFrame(x=[1,2,3], y=1), :x), [] => identity)
+    @test_throws ArgumentError select(groupby_checked(DataFrame(x=[1,2,3], y=1), :x), [:x, :y] => identity)
+    @test_throws ArgumentError select(groupby_checked(DataFrame(x=[1,2,3], y=1), :x), [] => identity => :z)
+    @test_throws ArgumentError select(groupby_checked(DataFrame(x=[1,2,3], y=1), :x), [:x, :y] => identity => :z)
 end
 
 @testset "grouping with missings" begin
@@ -770,61 +776,66 @@ end
     # Only test that different combine syntaxes work,
     # and rely on tests below for deeper checks
     @test combine(gd, :c => sum) ==
-        combine(:c => sum, gd) ==
         combine(gd, :c => sum => :c_sum) ==
-        combine(:c => sum => :c_sum, gd) ==
         combine(gd, [:c => sum]) ==
         combine(gd, [:c => sum => :c_sum]) ==
-        combine(d -> (c_sum=sum(d.c),), gd)
-    @test_throws MethodError combine(gd, d -> (c_sum=sum(d.c),))
+        combine(d -> (c_sum=sum(d.c),), gd) ==
+        combine(gd, d -> (c_sum=sum(d.c),)) ==
+        combine(gd, d -> (c_sum=[sum(d.c)],)) ==
+        combine(gd, d -> DataFrame(c_sum=sum(d.c))) ==
+        combine(gd, :c => (x -> [sum(x)]) => [:c_sum]) ==
+        combine(gd, :c => (x -> [(c_sum=sum(x),)]) => AsTable) ==
+        combine(gd, :c => (x -> fill(sum(x),1,1)) => [:c_sum]) ==
+        combine(gd, :c => (x -> [Dict(:c_sum => sum(x))]) => AsTable)
+    @test_throws ArgumentError combine(:c => sum, gd)
+    @test_throws ArgumentError combine(:, gd)
 
     @test combine(gd, :c => vexp) ==
-        combine(:c => vexp, gd) ==
         combine(gd, :c => vexp => :c_function) ==
-        combine(:c => vexp => :c_function, gd) ==
-        combine(:c => c -> (c_function = vexp(c),), gd) ==
         combine(gd, [:c => vexp]) ==
         combine(gd, [:c => vexp => :c_function]) ==
-        combine(d -> (c_function=exp.(d.c),), gd)
+        combine(d -> (c_function=exp.(d.c),), gd) ==
+        combine(gd, d -> (c_function=exp.(d.c),)) ==
+        combine(gd, :c => (x -> (c_function=exp.(x),)) => AsTable) ==
+        combine(gd, :c => ByRow(exp) => :c_function) ==
+        combine(gd, :c => ByRow(x -> [exp(x)]) => [:c_function])
     @test_throws ArgumentError combine(gd, :c => c -> (c_function = vexp(c),))
-    @test_throws MethodError combine(gd, d -> (c_function=exp.(d.c),))
 
     @test combine(gd, :b => sum, :c => sum) ==
         combine(gd, :b => sum => :b_sum, :c => sum => :c_sum) ==
         combine(gd, [:b => sum, :c => sum]) ==
         combine(gd, [:b => sum => :b_sum, :c => sum => :c_sum]) ==
-        combine(d -> (b_sum=sum(d.b), c_sum=sum(d.c)), gd)
-    @test_throws MethodError combine(gd, d -> (b_sum=sum(d.b), c_sum=sum(d.c)))
+        combine(d -> (b_sum=sum(d.b), c_sum=sum(d.c)), gd) ==
+        combine(gd, d -> (b_sum=sum(d.b), c_sum=sum(d.c))) ==
+        combine(gd, d -> (b_sum=sum(d.b),), d -> (c_sum=sum(d.c),))
 
     @test combine(gd, :b => vexp, :c => identity) ==
         combine(gd, :b => vexp => :b_function, :c => identity => :c_identity) ==
         combine(gd, [:b => vexp, :c => identity]) ==
         combine(gd, [:b => vexp => :b_function, :c => identity => :c_identity]) ==
         combine(d -> (b_function=vexp(d.b), c_identity=d.c), gd) ==
-        combine([:b, :c] => (b, c) -> (b_function=vexp(b), c_identity=c), gd)
-    @test_throws MethodError combine(gd, d -> (b_function=vexp(d.b), c_identity=d.c))
+        combine(gd, [:b, :c] => ((b, c) -> (b_function=vexp(b), c_identity=c)) => AsTable) ==
+        combine(gd, d -> (b_function=vexp(d.b), c_identity=d.c))
     @test_throws ArgumentError combine(gd, [:b, :c] => (b, c) -> (b_function=vexp(b), c_identity=c))
 
-    @test combine(x -> extrema(x.c), gd) == combine(:c => (x -> extrema(x)) => :x1, gd)
-    @test combine(x -> x.b+x.c, gd) == combine([:b,:c] => (+) => :x1, gd)
-    @test combine(x -> (p=x.b, q=x.c), gd) ==
-          combine([:b,:c] => (b,c) -> (p=b,q=c), gd)
-    @test_throws MethodError combine(gd, x -> (p=x.b, q=x.c))
+    @test combine(x -> extrema(x.c), gd) == combine(gd, :c => (x -> extrema(x)) => :x1)
+    @test combine(x -> hcat(extrema(x.c)...), gd) == combine(gd, :c => (x -> [extrema(x)]) => AsTable)
+    @test combine(x -> x.b+x.c, gd) == combine(gd, [:b,:c] => (+) => :x1)
+    @test combine(x -> (p=x.b, q=x.c), gd) == combine(gd, [:b,:c] => ((b,c) -> (p=b,q=c)) => AsTable)
     @test_throws ArgumentError combine(gd, [:b,:c] => (b,c) -> (p=b,q=c))
 
     @test combine(x -> DataFrame(p=x.b, q=x.c), gd) ==
-          combine([:b,:c] => (b,c) -> DataFrame(p=b,q=c), gd)
-    @test_throws MethodError combine(gd, x -> DataFrame(p=x.b, q=x.c))
+          combine(gd, [:b,:c] => ((b,c) -> DataFrame(p=b,q=c)) => AsTable) ==
+          combine(gd, x -> DataFrame(p=x.b, q=x.c))
     @test_throws ArgumentError combine(gd, [:b,:c] => (b,c) -> DataFrame(p=b,q=c))
 
     @test combine(x -> [1 2; 3 4], gd) ==
-          combine([:b,:c] => (b,c) -> [1 2; 3 4], gd)
-    @test_throws MethodError combine(gd, x -> [1 2; 3 4])
+          combine(gd, [:b,:c] => ((b,c) -> [1 2; 3 4]) => AsTable)
     @test_throws ArgumentError combine(gd, [:b,:c] => (b,c) -> [1 2; 3 4])
 
     @test combine(nrow, gd) == combine(gd, nrow) == combine(gd, [nrow => :nrow]) ==
           combine(gd, 1 => length => :nrow)
-    @test combine(nrow => :res, gd) == combine(gd, nrow => :res) ==
+    @test combine(gd, nrow => :res) ==
           combine(gd, [nrow => :res]) == combine(gd, 1 => length => :res)
     @test combine(gd, nrow => :res, nrow, [nrow => :res2]) ==
           combine(gd, 1 => length => :res, 1 => length => :nrow, 1 => length => :res2)
@@ -834,64 +845,54 @@ end
     @test_throws ArgumentError combine(gd, [nrow])
 
     for col in (:c, 3)
-        @test combine(col => sum, gd) == combine(d -> (c_sum=sum(d.c),), gd)
-        @test combine(col => x -> sum(x), gd) == combine(d -> (c_function=sum(d.c),), gd)
-        @test combine(col => x -> (z=sum(x),), gd) == combine(d -> (z=sum(d.c),), gd)
-        @test combine(col => x -> DataFrame(z=sum(x),), gd) == combine(d -> (z=sum(d.c),), gd)
-        @test combine(col => identity, gd) == combine(d -> (c_identity=d.c,), gd)
-        @test combine(col => x -> (z=x,), gd) == combine(d -> (z=d.c,), gd)
-
-        @test combine(col => sum => :xyz, gd) ==
-            combine(d -> (xyz=sum(d.c),), gd)
-        @test combine(col => (x -> sum(x)) => :xyz, gd) ==
-            combine(d -> (xyz=sum(d.c),), gd)
-        @test combine(col => (x -> (sum(x),)) => :xyz, gd) ==
-            combine(d -> (xyz=(sum(d.c),),), gd)
+        @test combine(gd, col => sum) == combine(d -> (c_sum=sum(d.c),), gd)
+        @test combine(gd, col => x -> sum(x)) == combine(d -> (c_function=sum(d.c),), gd)
+        @test combine(gd, col => (x -> (z=sum(x),)) => AsTable) == combine(d -> (z=sum(d.c),), gd)
+        @test combine(gd, col => (x -> DataFrame(z=sum(x),)) => AsTable) == combine(d -> (z=sum(d.c),), gd)
+        @test combine(gd, col => identity) == combine(d -> (c_identity=d.c,), gd)
+        @test combine(gd, col => (x -> (z=x,)) => AsTable) == combine(d -> (z=d.c,), gd)
+
+        @test combine(gd, col => sum => :xyz) == combine(d -> (xyz=sum(d.c),), gd)
+        @test combine(gd, col => (x -> sum(x)) => :xyz) == combine(d -> (xyz=sum(d.c),), gd)
+        @test combine(gd, col => (x -> (sum(x),)) => :xyz) == combine(d -> (xyz=(sum(d.c),),), gd)
         @test combine(nrow, gd) == combine(d -> (nrow=length(d.c),), gd)
-        @test combine(nrow => :res, gd) == combine(d -> (res=length(d.c),), gd)
-        @test combine(col => sum => :res, gd) == combine(d -> (res=sum(d.c),), gd)
-        @test combine(col => (x -> sum(x)) => :res, gd) == combine(d -> (res=sum(d.c),), gd)
-        @test_throws ArgumentError combine(col => (x -> (z=sum(x),)) => :xyz, gd)
-        @test_throws ArgumentError combine(col => (x -> DataFrame(z=sum(x),)) => :xyz, gd)
-        @test_throws ArgumentError combine(col => (x -> (z=x,)) => :xyz, gd)
-        @test_throws ArgumentError combine(col => x -> (z=1, xzz=[1]), gd)
+        @test combine(gd, nrow => :res) == combine(d -> (res=length(d.c),), gd)
+        @test combine(gd, col => sum => :res) == combine(d -> (res=sum(d.c),), gd)
+        @test combine(gd, col => (x -> sum(x)) => :res) == combine(d -> (res=sum(d.c),), gd)
+
+        @test_throws ArgumentError combine(gd, col => (x -> (z=sum(x),)) => :xyz)
+        @test_throws ArgumentError combine(gd, col => (x -> DataFrame(z=sum(x),)) => :xyz)
+        @test_throws ArgumentError combine(gd, col => (x -> (z=x,)) => :xyz)
+        @test_throws ArgumentError combine(gd, col => x -> (z=1, xzz=[1]))
     end
+
     for cols in ([:b, :c], 2:3, [2, 3], [false, true, true]), ungroup in (true, false)
-        @test combine(cols => (b,c) -> (y=exp.(b), z=c), gd, ungroup=ungroup) ==
-            combine(d -> (y=exp.(d.b), z=d.c), gd, ungroup=ungroup)
-        @test combine(cols => (b,c) -> [exp.(b) c], gd, ungroup=ungroup) ==
+        @test combine(gd, cols => ((b,c) -> (y=exp.(b), z=c)) => AsTable, ungroup=ungroup) ==
+            combine(gd, d -> (y=exp.(d.b), z=d.c), ungroup=ungroup)
+        @test combine(gd, cols => ((b,c) -> [exp.(b) c]) => AsTable, ungroup=ungroup) ==
             combine(d -> [exp.(d.b) d.c], gd, ungroup=ungroup)
-        @test combine(cols => ((b,c) -> sum(b) + sum(c)) => :xyz, gd, ungroup=ungroup) ==
+        @test combine(gd, cols => ((b,c) -> sum(b) + sum(c)) => :xyz, ungroup=ungroup) ==
             combine(d -> (xyz=sum(d.b) + sum(d.c),), gd, ungroup=ungroup)
-        if eltype(cols) === Bool
-            cols2 = [[false, true, false], [false, false, true]]
-            @test_throws MethodError combine((xyz = cols[1] => sum, xzz = cols2[2] => sum),
-                                             gd, ungroup=ungroup)
-            @test_throws MethodError combine((xyz = cols[1] => sum, xzz = cols2[1] => sum),
-                                             gd, ungroup=ungroup)
-            @test_throws MethodError combine((xyz = cols[1] => sum, xzz = cols2[2] => x -> first(x)),
-                                             gd, ungroup=ungroup)
-        else
-            cols2 = cols
-            @test combine(gd, cols2[1] => sum => :xyz, cols2[2] => sum => :xzz, ungroup=ungroup) ==
+        if eltype(cols) !== Bool
+            @test combine(gd, cols[1] => sum => :xyz, cols[2] => sum => :xzz, ungroup=ungroup) ==
                 combine(d -> (xyz=sum(d.b), xzz=sum(d.c)), gd, ungroup=ungroup)
-            @test combine(gd, cols2[1] => sum => :xyz, cols2[1] => sum => :xzz, ungroup=ungroup) ==
+            @test combine(gd, cols[1] => sum => :xyz, cols[1] => sum => :xzz, ungroup=ungroup) ==
                 combine(d -> (xyz=sum(d.b), xzz=sum(d.b)), gd, ungroup=ungroup)
-            @test combine(gd, cols2[1] => sum => :xyz,
-                    cols2[2] => (x -> first(x)) => :xzz, ungroup=ungroup) ==
+            @test combine(gd, cols[1] => sum => :xyz,
+                    cols[2] => (x -> first(x)) => :xzz, ungroup=ungroup) ==
                 combine(d -> (xyz=sum(d.b), xzz=first(d.c)), gd, ungroup=ungroup)
-            @test combine(gd, cols2[1] => vexp => :xyz,
-                    cols2[2] => sum => :xzz, ungroup=ungroup) ==
+            @test combine(gd, cols[1] => vexp => :xyz,
+                    cols[2] => sum => :xzz, ungroup=ungroup) ==
                 combine(d -> (xyz=vexp(d.b), xzz=fill(sum(d.c), length(vexp(d.b)))),
                         gd, ungroup=ungroup)
         end
 
-        @test_throws ArgumentError combine(cols => (b,c) -> (y=exp.(b), z=sum(c)),
-                                           gd, ungroup=ungroup)
-        @test_throws ArgumentError combine(cols2 => ((b,c) -> DataFrame(y=exp.(b),
-                                           z=sum(c))) => :xyz, gd, ungroup=ungroup)
-        @test_throws ArgumentError combine(cols2 => ((b,c) -> [exp.(b) c]) => :xyz,
-                                           gd, ungroup=ungroup)
+        @test_throws ArgumentError combine(gd, cols => (b,c) -> (y=exp.(b), z=sum(c)),
+                                           ungroup=ungroup)
+        @test_throws ArgumentError combine(gd, cols => ((b,c) -> DataFrame(y=exp.(b),
+                                           z=sum(c))) => :xyz, ungroup=ungroup)
+        @test_throws ArgumentError combine(gd, cols => ((b,c) -> [exp.(b) c]) => :xyz,
+                                           ungroup=ungroup)
     end
 end
 
@@ -1441,9 +1442,9 @@ end
     @test gdf[:] == gdf
     @test gdf[1:1] == gdf
 
-    @test validate_gdf(combine(nrow => :x1, gdf, ungroup=false)) ==
+    @test validate_gdf(combine(gdf, nrow => :x1, ungroup=false)) ==
           groupby_checked(DataFrame(x1=3), [])
-    @test validate_gdf(combine(:x2 => identity => :x2_identity, gdf, ungroup=false)) ==
+    @test validate_gdf(combine(gdf, :x2 => identity => :x2_identity, ungroup=false)) ==
           groupby_checked(DataFrame(x2_identity=[1,1,2]), [])
     @test isequal_typed(DataFrame(gdf), df)
 
@@ -1838,9 +1839,9 @@ end
         @test res == DataFrame(validate_gdf(combine(sdf -> sdf.x1[1] ? fr : er,
                                                     groupby_checked(df, :a), ungroup=false)))
         if fr isa AbstractVector && df.x1[1]
-            @test res == combine(:x1 => (x1 -> x1[1] ? fr : er) => :x1, gdf)
+            @test res == combine(gdf, :x1 => (x1 -> x1[1] ? fr : er) => :x1)
         else
-            @test res == combine(:x1 => x1 -> x1[1] ? fr : er, gdf)
+            @test res == combine(gdf, :x1 => (x1 -> x1[1] ? fr : er) => AsTable)
         end
         if nrow(res) == 0 && length(propertynames(er)) == 0 && er != rand(0, 1)
             @test res == DataFrame(a=[])
@@ -1867,9 +1868,8 @@ end
     @test combine(gdf, r"x" => cor) == DataFrame(g=[1,2], x1_x2_cor = [1.0, 1.0])
     @test combine(gdf, Not(:g) => ByRow(/)) == DataFrame(:g => [1,1,1,2,2,2], Symbol("x1_x2_/") => 1.0)
     @test combine(gdf, Between(:x2, :x1) => () -> 1) == DataFrame(:g => 1:2, Symbol("function") => 1)
-    @test combine(gdf, :x1 => :z) == combine(gdf, [:x1 => :z]) == combine(:x1 => :z, gdf) ==
-          DataFrame(g=[1,1,1,2,2,2], z=1:6)
-    @test validate_gdf(combine(:x1 => :z, groupby_checked(df, :g), ungroup=false)) ==
+    @test combine(gdf, :x1 => :z) == combine(gdf, [:x1 => :z]) == DataFrame(g=[1,1,1,2,2,2], z=1:6)
+    @test validate_gdf(combine(groupby_checked(df, :g), :x1 => :z, ungroup=false)) ==
           groupby_checked(DataFrame(g=[1,1,1,2,2,2], z=1:6), :g)
 end
 
@@ -1879,10 +1879,10 @@ end
     gdf = groupby_checked(df, :b)
     res = combine(sdf -> sdf.x[1:2], gdf)
     @test names(res) == ["b", "x1"]
-    res2 = combine(:x => x -> x[1:2], gdf)
+    res2 = combine(gdf, :x => x -> x[1:2])
     @test names(res2) == ["b", "x_function"]
     @test Matrix(res) == Matrix(res2)
-    res2 = combine(:x => (x -> x[1:2]) => :z, gdf)
+    res2 = combine(gdf, :x => (x -> x[1:2]) => :z)
     @test names(res2) == ["b", "z"]
     @test Matrix(res) == Matrix(res2)
 
@@ -1916,8 +1916,8 @@ end
     end
 
     for i in 1:2, v1 in [1, 1:2], v2 in [1, 1:2]
-        @test_throws ArgumentError combine([:b, :x] => ((b,x) -> b[1] == i ? x[v1] : (c=x[v2],)) => :v, gdf)
-        @test_throws ArgumentError combine([:b, :x] => ((b,x) -> b[1] == i ? x[v1] : (v=x[v2],)) => :v, gdf)
+        @test_throws ArgumentError combine(gdf, [:b, :x] => ((b,x) -> b[1] == i ? x[v1] : (c=x[v2],)) => :v)
+        @test_throws ArgumentError combine(gdf, [:b, :x] => ((b,x) -> b[1] == i ? x[v1] : (v=x[v2],)) => :v)
     end
 end
 
@@ -1927,8 +1927,8 @@ end
     @test_throws ArgumentError combine(gdf, :x1 => x -> DataFrame())
     @test_throws ArgumentError combine(gdf, :x1 => x -> (x=1, y=2))
     @test_throws ArgumentError combine(gdf, :x1 => x -> (x=[1], y=[2]))
-    @test_throws ArgumentError combine(gdf, :x1 => x -> (x=[1],y=2))
-    @test_throws ArgumentError combine(:x1 => x -> (x=[1], y=2), gdf)
+    @test_throws ArgumentError combine(gdf, :x1 => (x -> (x=[1],y=2)) => AsTable)
+    @test_throws ArgumentError combine(gdf, :x1 => x -> (x=[1], y=2))
     @test_throws ArgumentError combine(gdf, :x1 => x -> ones(2, 2))
     @test_throws ArgumentError combine(gdf, :x1 => x -> df[1, Not(:g)])
 end
@@ -2070,9 +2070,9 @@ end
 
     # whole column 4 options of single pair passed
     @test combine(gdf , AsTable([:x, :y]) => Ref) ==
-          combine(AsTable([:x, :y]) => Ref, gdf) ==
+          combine(gdf, AsTable([:x, :y]) => Ref) ==
           DataFrame(g=1:2, x_y_Ref=[(x=[1,2,3], y=[6,7,8]), (x=[4,5], y=[9,10])])
-    @test validate_gdf(combine(AsTable([:x, :y]) => Ref, gdf, ungroup=false)) ==
+    @test validate_gdf(combine(gdf, AsTable([:x, :y]) => Ref, ungroup=false)) ==
           groupby_checked(combine(gdf, AsTable([:x, :y]) => Ref), :g)
 
     @test combine(gdf, AsTable(1) => Ref) ==
@@ -2081,10 +2081,10 @@ end
 
     # ByRow 4 options of single pair passed
     @test combine(gdf, AsTable([:x, :y]) => ByRow(x -> [x])) ==
-          combine(AsTable([:x, :y]) => ByRow(x -> [x]), gdf) ==
+          combine(gdf, AsTable([:x, :y]) => ByRow(x -> [x])) ==
           DataFrame(g=[1,1,1,2,2],
                     x_y_function=[[(x=1,y=6)], [(x=2,y=7)], [(x=3,y=8)], [(x=4,y=9)], [(x=5,y=10)]])
-    @test validate_gdf(combine(AsTable([:x, :y]) => ByRow(x -> [x]), gdf, ungroup=false)) ==
+    @test validate_gdf(combine(gdf, AsTable([:x, :y]) => ByRow(x -> [x]), ungroup=false)) ==
           groupby_checked(combine(gdf, AsTable([:x, :y]) => ByRow(x -> [x])), :g)
 
     # whole column and ByRow test for multiple pairs passed
@@ -2967,7 +2967,7 @@ end
           DataFrame(a=1:3, b=4:6, c=7:9, d=10:12, a_b=5:2:9, a_b_etc=22:4:30)
     @test combine(gdf, :a => +, [:a, :b] => +, All() => +, renamecols=false) ==
           DataFrame(a=1:3, a_b=5:2:9, a_b_etc=22:4:30)
-    @test combine([:a, :b] => +, gdf, renamecols=false) == DataFrame(a=1:3, a_b=5:2:9)
+    @test combine(gdf, [:a, :b] => +, renamecols=false) == DataFrame(a=1:3, a_b=5:2:9)
     @test combine(identity, gdf, renamecols=false) == df
 
     df = DataFrame(a=1:3, b=4:6, c=7:9, d=10:12)
@@ -3022,4 +3022,154 @@ end
     @test_throws MethodError select(gdf, AsTable([]) => ByRow(inc0) => :bin)
 end
 
+@testset "aggregation of reordered groups" begin
+    df = DataFrame(id=[1, 2, 3, 1, 3, 2], x=1:6)
+    gdf = groupby(df, :id)
+    @test select(df, :id, :x => x -> 2x) == select(gdf, :x => x -> 2x)
+    @test select(df, identity) == select(gdf, identity)
+    @test select(df, :id, x -> (a=x.x, b=x.x)) == select(gdf, x -> (a=x.x, b=x.x))
+    @test transform(df, :x => x -> 2x) == transform(gdf, :x => x -> 2x)
+    @test transform(df, identity) == transform(gdf, identity)
+    @test transform(df, x -> (a=x.x, b=x.x)) == transform(gdf, x -> (a=x.x, b=x.x))
+    @test combine(gdf, :x => x -> 2x) ==
+          DataFrame(id=[1, 1, 2, 2, 3, 3], x_function=[2, 8, 4, 12, 6, 10])
+    @test combine(gdf, identity) == DataFrame(gdf)
+    @test combine(gdf, x -> (a=x.x, b=x.x)) ==
+          DataFrame(id=[1, 1, 2, 2, 3, 3], a=[1, 4, 2, 6, 3, 5], b=[1, 4, 2, 6, 3, 5])
+    gdf = groupby(df, :id)[[3, 1, 2]]
+    @test select(df, :id, :x => x -> 2x) == select(gdf, :x => x -> 2x)
+    @test select(df, identity) == select(gdf, identity)
+    @test select(df, :id, x -> (a=x.x, b=x.x)) == select(gdf, x -> (a=x.x, b=x.x))
+    @test transform(df, :x => x -> 2x) == transform(gdf, :x => x -> 2x)
+    @test transform(df, identity) == transform(gdf, identity)
+    @test transform(df, x -> (a=x.x, b=x.x)) == transform(gdf, x -> (a=x.x, b=x.x))
+    @test combine(gdf, :x => x -> 2x) ==
+          DataFrame(id=[3, 3, 1, 1, 2, 2], x_function=[6, 10, 2, 8, 4, 12])
+    @test combine(gdf, identity) == df[[3, 5, 1, 4, 2, 6], :]
+    @test combine(gdf, x -> (a=x.x, b=x.x)) ==
+          DataFrame(id=[3, 3, 1, 1, 2, 2], a=[3, 5, 1, 4, 2, 6], b=[3, 5, 1, 4, 2, 6])
+
+    df = DataFrame(id = [3, 2, 1, 3, 1, 2], x=1:6)
+    gdf = groupby(df, :id, sort=true)
+    @test select(df, :id, :x => x -> 2x) == select(gdf, :x => x -> 2x)
+    @test select(df, identity) == select(gdf, identity)
+    @test select(df, :id, x -> (a=x.x, b=x.x)) == select(gdf, x -> (a=x.x, b=x.x))
+    @test transform(df, :x => x -> 2x) == transform(gdf, :x => x -> 2x)
+    @test transform(df, identity) == transform(gdf, identity)
+    @test transform(df, x -> (a=x.x, b=x.x)) == transform(gdf, x -> (a=x.x, b=x.x))
+    @test combine(gdf, :x => x -> 2x) ==
+          DataFrame(id=[1, 1, 2, 2, 3, 3], x_function=[6, 10, 4, 12, 2, 8])
+    @test combine(gdf, identity) == DataFrame(id=[1, 1, 2, 2, 3, 3], x=[3, 5, 2, 6, 1, 4])
+    @test combine(gdf, x -> (a=x.x, b=x.x)) ==
+          DataFrame(id=[1, 1, 2, 2, 3, 3], a=[3, 5, 2, 6, 1, 4], b=[3, 5, 2, 6, 1, 4])
+
+    gdf = groupby(df, :id)[[3, 1, 2]]
+    @test select(df, :id, :x => x -> 2x) == select(gdf, :x => x -> 2x)
+    @test select(df, identity) == select(gdf, identity)
+    @test select(df, :id, x -> (a=x.x, b=x.x)) == select(gdf, x -> (a=x.x, b=x.x))
+    @test transform(df, :x => x -> 2x) == transform(gdf, :x => x -> 2x)
+    @test transform(df, identity) == transform(gdf, identity)
+    @test transform(df, x -> (a=x.x, b=x.x)) == transform(gdf, x -> (a=x.x, b=x.x))
+    @test combine(gdf, :x => x -> 2x) ==
+          DataFrame(id=[1, 1, 3, 3, 2, 2], x_function=[6, 10, 2, 8, 4, 12])
+    @test combine(gdf, identity) == DataFrame(id=[1, 1, 3, 3, 2, 2], x=[3, 5, 1, 4, 2, 6])
+    @test combine(gdf, x -> (a=x.x, b=x.x)) ==
+          DataFrame(id=[1, 1, 3, 3, 2, 2], a=[3, 5, 1, 4, 2, 6], b=[3, 5, 1, 4, 2, 6])
+end
+
+@testset "basic tests of advanced rules with multicolumn output" begin
+    df = DataFrame(id=[1, 2, 3, 1, 3, 2], x=1:6)
+    gdf = groupby(df, :id)
+
+    @test combine(gdf, x -> reshape(1:4, 2, 2)) ==
+          DataFrame(id=[1,1,2,2,3,3], x1=[1,2,1,2,1,2], x2=[3,4,3,4,3,4])
+    @test combine(gdf, x -> DataFrame(a=1:2, b=3:4)) ==
+          DataFrame(id=[1,1,2,2,3,3], a=[1,2,1,2,1,2], b=[3,4,3,4,3,4])
+    @test combine(gdf, x -> DataFrame(a=1:2, b=3:4)[1, :]) ==
+          DataFrame(id=[1,2,3], a=[1,1,1], b=[3,3,3])
+    @test combine(gdf, x -> (a=1, b=3)) ==
+          DataFrame(id=[1,2,3], a=[1,1,1], b=[3,3,3])
+    @test combine(gdf, x -> (a=1:2, b=3:4)) ==
+          DataFrame(id=[1,1,2,2,3,3], a=[1,2,1,2,1,2], b=[3,4,3,4,3,4])
+    @test combine(gdf, :x => (x -> Dict(:a => 1:2, :b => 3:4)) => AsTable) ==
+          DataFrame(id=[1,1,2,2,3,3], a=[1,2,1,2,1,2], b=[3,4,3,4,3,4])
+    @test combine(gdf, :x => ByRow(x -> [x,x+1,x+2]) => AsTable) ==
+          DataFrame(id=[1,1,2,2,3,3], x1=[1,4,2,6,3,5], x2=[2,5,3,7,4,6], x3=[3,6,4,8,5,7])
+    @test combine(gdf, :x => ByRow(x -> (x,x+1,x+2)) => AsTable) ==
+          DataFrame(id=[1,1,2,2,3,3], x1=[1,4,2,6,3,5], x2=[2,5,3,7,4,6], x3=[3,6,4,8,5,7])
+    @test combine(gdf, :x => ByRow(x -> (a=x,b=x+1,c=x+2)) => AsTable) ==
+          DataFrame(id=[1,1,2,2,3,3], a=[1,4,2,6,3,5], b=[2,5,3,7,4,6], c=[3,6,4,8,5,7])
+    @test combine(gdf, :x => ByRow(x -> [x,x+1,x+2]) => [:p, :q, :r]) ==
+          DataFrame(id=[1,1,2,2,3,3], p=[1,4,2,6,3,5], q=[2,5,3,7,4,6], r=[3,6,4,8,5,7])
+    @test combine(gdf, :x => ByRow(x -> (x,x+1,x+2)) => [:p, :q, :r]) ==
+          DataFrame(id=[1,1,2,2,3,3], p=[1,4,2,6,3,5], q=[2,5,3,7,4,6], r=[3,6,4,8,5,7])
+    @test combine(gdf, :x => ByRow(x -> (a=x,b=x+1,c=x+2)) => [:p, :q, :r]) ==
+          DataFrame(id=[1,1,2,2,3,3], p=[1,4,2,6,3,5], q=[2,5,3,7,4,6], r=[3,6,4,8,5,7])
+    @test combine(gdf, :x => ByRow(x -> 1) => [:p]) == DataFrame(id=[1,1,2,2,3,3], p=1)
+    @test_throws ArgumentError combine(gdf, :x => (x -> 1) => [:p])
+
+    @test select(gdf, x -> reshape(1:4, 2, 2)) ==
+          DataFrame(id=[1,2,3,1,3,2], x1=[1,1,1,2,2,2], x2=[3,3,3,4,4,4])
+    @test select(gdf, x -> DataFrame(a=1:2, b=3:4)) ==
+          DataFrame(id=[1,2,3,1,3,2], a=[1,1,1,2,2,2], b=[3,3,3,4,4,4])
+    @test select(gdf, x -> DataFrame(a=1:2, b=3:4)[1, :]) ==
+          DataFrame(id=[1,2,3,1,3,2], a=[1,1,1,1,1,1], b=[3,3,3,3,3,3])
+    @test select(gdf, x -> (a=1, b=3)) ==
+          DataFrame(id=[1,2,3,1,3,2], a=[1,1,1,1,1,1], b=[3,3,3,3,3,3])
+    @test select(gdf, x -> (a=1:2, b=3:4)) ==
+          DataFrame(id=[1,2,3,1,3,2], a=[1,1,1,2,2,2], b=[3,3,3,4,4,4])
+    @test select(gdf, :x => (x -> Dict(:a => 1:2, :b => 3:4)) => AsTable) ==
+          DataFrame(id=[1,2,3,1,3,2], a=[1,1,1,2,2,2], b=[3,3,3,4,4,4])
+    @test select(gdf, :x => ByRow(x -> [x,x+1,x+2]) => AsTable) ==
+          DataFrame(id=[1,2,3,1,3,2], x1=[1,2,3,4,5,6], x2=[2,3,4,5,6,7], x3=[3,4,5,6,7,8])
+    @test select(gdf, :x => ByRow(x -> (x,x+1,x+2)) => AsTable) ==
+          DataFrame(id=[1,2,3,1,3,2], x1=[1,2,3,4,5,6], x2=[2,3,4,5,6,7], x3=[3,4,5,6,7,8])
+    @test select(gdf, :x => ByRow(x -> (a=x,b=x+1,c=x+2)) => AsTable) ==
+          DataFrame(id=[1,2,3,1,3,2], a=[1,2,3,4,5,6], b=[2,3,4,5,6,7], c=[3,4,5,6,7,8])
+    @test select(gdf, :x => ByRow(x -> [x,x+1,x+2]) => [:p, :q, :r]) ==
+          DataFrame(id=[1,2,3,1,3,2], p=[1,2,3,4,5,6], q=[2,3,4,5,6,7], r=[3,4,5,6,7,8])
+    @test select(gdf, :x => ByRow(x -> (x,x+1,x+2)) => [:p, :q, :r]) ==
+          DataFrame(id=[1,2,3,1,3,2], p=[1,2,3,4,5,6], q=[2,3,4,5,6,7], r=[3,4,5,6,7,8])
+    @test select(gdf, :x => ByRow(x -> (a=x,b=x+1,c=x+2)) => [:p, :q, :r]) ==
+          DataFrame(id=[1,2,3,1,3,2], p=[1,2,3,4,5,6], q=[2,3,4,5,6,7], r=[3,4,5,6,7,8])
+    @test select(gdf, :x => ByRow(x -> 1) => [:p]) == DataFrame(id=[1,2,3,1,3,2], p=1)
+    @test_throws ArgumentError select(gdf, :x => (x -> 1) => [:p])
+end
+
+@testset "tests of invariants of transformation functions" begin
+    Random.seed!(1234)
+    df = DataFrame(x=rand(1000), id=rand(1:20, 1000), y=rand(1000), z=rand(1000))
+    gdf = groupby_checked(df, :id)
+    gdf2 = gdf[20:-1:1]
+    @test transform(df, x -> sum(df.x), x -> (p=2x.x, q=2x.y), :id => :id2, :z => :x,
+                    [:x, :y, :z] => +, [:y, :z] => ByRow(minmax) => [:min, :max], :y) ==
+          transform(gdf, x -> sum(parent(x).x), x -> (p=2x.x, q=2x.y), :id => :id2, :z => :x,
+                    [:x, :y, :z] => +, [:y, :z] => ByRow(minmax) => [:min, :max], :y) ==
+          transform(gdf2, x -> sum(parent(x).x), x -> (p=2x.x, q=2x.y), :id => :id2, :z => :x,
+                    [:x, :y, :z] => +, [:y, :z] => ByRow(minmax) => [:min, :max], :y) ==
+          DataFrame(:x => df.z, :id => df.id, :y => df.y, :z => df.z, :x1 => sum(df.x),
+                    :p => 2df.x, :q => 2df.y, :id2 => df.id, Symbol("x_y_z_+") => df.x+df.y+df.z,
+                    :min => min.(df.y, df.z), :max => max.(df.y, df.z))
+
+    @test select(df, x -> sum(df.x), x -> (p=2x.x, q=2x.y), :id => :id2, :z => :x,
+                [:x, :y, :z] => +, [:y, :z] => ByRow(minmax) => [:min, :max], :y) ==
+          select(gdf, x -> sum(parent(x).x), x -> (p=2x.x, q=2x.y), :id => :id2, :z => :x,
+                [:x, :y, :z] => +, [:y, :z] => ByRow(minmax) => [:min, :max], :y, keepkeys=false) ==
+          select(gdf2, x -> sum(parent(x).x), x -> (p=2x.x, q=2x.y), :id => :id2, :z => :x,
+                [:x, :y, :z] => +, [:y, :z] => ByRow(minmax) => [:min, :max], :y, keepkeys=false) ==
+          DataFrame(:x1 => sum(df.x), :p => 2df.x, :q => 2df.y, :id2 => df.id,
+                    :x => df.z, Symbol("x_y_z_+") => df.x+df.y+df.z,
+                    :min => min.(df.y, df.z), :max => max.(df.y, df.z), :y => df.y)
+
+    @test combine(df, x -> sum(df.x), x -> (p=2x.x, q=2x.y), :id => :id2, :z => :x,
+                  [:x, :y, :z] => +, [:y, :z] => ByRow(minmax) => [:min, :max], :y) |> sort ==
+          combine(gdf, x -> sum(parent(x).x), x -> (p=2x.x, q=2x.y), :id => :id2, :z => :x,
+                  [:x, :y, :z] => +, [:y, :z] => ByRow(minmax) => [:min, :max], :y, keepkeys=false) |> sort ==
+          combine(gdf2, x -> sum(parent(x).x), x -> (p=2x.x, q=2x.y), :id => :id2, :z => :x,
+                  [:x, :y, :z] => +, [:y, :z] => ByRow(minmax) => [:min, :max], :y, keepkeys=false) |> sort ==
+          DataFrame(:x1 => sum(df.x), :p => 2df.x, :q => 2df.y, :id2 => df.id,
+                    :x => df.z, Symbol("x_y_z_+") => df.x+df.y+df.z,
+                    :min => min.(df.y, df.z), :max => max.(df.y, df.z), :y => df.y) |> sort
+end
+
 end # module
diff --git a/test/select.jl b/test/select.jl
index fa9b4143e6..fe612794de 100644
--- a/test/select.jl
+++ b/test/select.jl
@@ -1342,178 +1342,178 @@ end
     @test df == DataFrame(a=1:3, b=4:6, c=7:9, d=10:12, a_b=5:2:9, a_b_etc=22:4:30)
 end
 
-@testset "additional tests for new rules" begin
-    @testset "transformation function with a function as first argument" begin
-        for df in (DataFrame(a=1:2, b=3:4, c=5:6), view(DataFrame(a=1:3, b=3:5, c=5:7, d=11:13), 1:2, 1:3))
-            @test select(sdf -> sdf.b, df) == DataFrame(x1=3:4)
-            @test select(sdf -> (b = 2sdf.b,), df) == DataFrame(b=[6,8])
-            @test select(sdf -> (b = 1,), df) == DataFrame(b=[1, 1])
-            @test_throws ArgumentError select(sdf -> (b = [1],), df)
-            @test select(sdf -> (b = [1, 5],), df) == DataFrame(b=[1, 5])
-            @test select(sdf -> 1, df) == DataFrame(x1=[1, 1])
-            @test select(sdf -> fill([1]), df) == DataFrame(x1=[[1], [1]])
-            @test select(sdf -> Ref([1]), df) == DataFrame(x1=[[1], [1]])
-            @test select(sdf -> "x", df) == DataFrame(x1=["x", "x"])
-            @test select(sdf -> [[1,2],[3,4]], df) == DataFrame(x1=[[1,2],[3,4]])
-            for ret in (DataFrame(), NamedTuple(), zeros(0,0), DataFrame(t=1)[1, 1:0])
-                @test select(sdf -> ret, df) == DataFrame()
-            end
-            @test_throws ArgumentError select(sdf -> DataFrame(a=10), df)
-            @test_throws ArgumentError select(sdf -> zeros(1, 2), df)
-            @test select(sdf -> DataFrame(a=[10, 11]), df) == DataFrame(a=[10, 11])
-            @test select(sdf -> [10 11; 12 13], df) == DataFrame(x1=[10, 12], x2=[11, 13])
-            @test select(sdf -> DataFrame(a=10)[1, :], df) == DataFrame(a=[10, 10])
-
-            @test transform(sdf -> sdf.b, df) == [df DataFrame(x1=3:4)]
-            @test transform(sdf -> (b = 2sdf.b,), df) == DataFrame(a=1:2, b=[6,8], c=5:6)
-            @test transform(sdf -> (b = 1,), df) == DataFrame(a=[1,2], b=[1, 1], c=[5,6])
-            @test_throws ArgumentError transform(sdf -> (b = [1],), df)
-            @test transform(sdf -> (b = [1, 5],), df) == DataFrame(a=[1,2], b=[1, 5], c=[5,6])
-            @test transform(sdf -> 1, df) == DataFrame(a=1:2, b=3:4, c=5:6, x1=1)
-            @test transform(sdf -> fill([1]), df) == DataFrame(a=1:2, b=3:4, c=5:6, x1=[[1],[1]])
-            @test transform(sdf -> Ref([1]), df) == DataFrame(a=1:2, b=3:4, c=5:6, x1=[[1],[1]])
-            @test transform(sdf -> "x", df) == DataFrame(a=1:2, b=3:4, c=5:6, x1="x")
-            @test transform(sdf -> [[1,2],[3,4]], df) == DataFrame(a=1:2, b=3:4, c=5:6, x1=[[1,2],[3,4]])
-            for ret in (DataFrame(), NamedTuple(), zeros(0,0), DataFrame(t=1)[1, 1:0])
-                @test transform(sdf -> ret, df) == df
-            end
-            @test_throws ArgumentError transform(sdf -> DataFrame(a=10), df)
-            @test_throws ArgumentError transform(sdf -> zeros(1, 2), df)
-            @test transform(sdf -> DataFrame(a=[10, 11]), df) == DataFrame(a=[10, 11], b=3:4, c=5:6)
-            @test transform(sdf -> [10 11; 12 13], df) == DataFrame(a=1:2, b=3:4, c=5:6, x1=[10, 12], x2=[11, 13])
-            @test transform(sdf -> DataFrame(a=10)[1, :], df) == DataFrame(a=[10, 10], b=3:4, c=5:6)
-
-            @test combine(sdf -> sdf.b, df) == DataFrame(x1=3:4)
-            @test combine(sdf -> (b = 2sdf.b,), df) == DataFrame(b=[6,8])
-            @test combine(sdf -> (b = 1,), df) == DataFrame(b=[1])
-            @test combine(sdf -> (b = [1],), df) == DataFrame(b=[1])
-            @test combine(sdf -> (b = [1, 5],), df) == DataFrame(b=[1, 5])
-            @test combine(sdf -> 1, df) == DataFrame(x1=[1])
-            @test combine(sdf -> fill([1]), df) == DataFrame(x1=[[1]])
-            @test combine(sdf -> Ref([1]), df) == DataFrame(x1=[[1]])
-            @test combine(sdf -> "x", df) == DataFrame(x1=["x"])
-            @test combine(sdf -> [[1,2],[3,4]], df) == DataFrame(x1=[[1,2],[3,4]])
-            for ret in (DataFrame(), NamedTuple(), zeros(0,0), DataFrame(t=1)[1, 1:0])
-                @test combine(sdf -> ret, df) == DataFrame()
-            end
-            @test combine(sdf -> DataFrame(a=10), df) == DataFrame(a=10)
-            @test combine(sdf -> zeros(1, 2), df) == DataFrame(x1=0, x2=0)
-            @test combine(sdf -> DataFrame(a=[10, 11]), df) == DataFrame(a=[10, 11])
-            @test combine(sdf -> [10 11; 12 13], df) == DataFrame(x1=[10, 12], x2=[11, 13])
-            @test combine(sdf -> DataFrame(a=10)[1, :], df) == DataFrame(a=[10])
+@testset "transformation function with a function as first argument" begin
+    for df in (DataFrame(a=1:2, b=3:4, c=5:6), view(DataFrame(a=1:3, b=3:5, c=5:7, d=11:13), 1:2, 1:3))
+        @test select(sdf -> sdf.b, df) == DataFrame(x1=3:4)
+        @test select(sdf -> (b = 2sdf.b,), df) == DataFrame(b=[6,8])
+        @test select(sdf -> (b = 1,), df) == DataFrame(b=[1, 1])
+        @test_throws ArgumentError select(sdf -> (b = [1],), df)
+        @test select(sdf -> (b = [1, 5],), df) == DataFrame(b=[1, 5])
+        @test select(sdf -> 1, df) == DataFrame(x1=[1, 1])
+        @test select(sdf -> fill([1]), df) == DataFrame(x1=[[1], [1]])
+        @test select(sdf -> Ref([1]), df) == DataFrame(x1=[[1], [1]])
+        @test select(sdf -> "x", df) == DataFrame(x1=["x", "x"])
+        @test select(sdf -> [[1,2],[3,4]], df) == DataFrame(x1=[[1,2],[3,4]])
+        for ret in (DataFrame(), NamedTuple(), zeros(0,0), DataFrame(t=1)[1, 1:0])
+            @test select(sdf -> ret, df) == DataFrame()
         end
-
-        df = DataFrame(a=1:2, b=3:4, c=5:6)
-        @test select!(sdf -> sdf.b, copy(df)) == DataFrame(x1=3:4)
-        @test select!(sdf -> (b = 2sdf.b,), copy(df)) == DataFrame(b=[6,8])
-        @test select!(sdf -> (b = 1,), copy(df)) == DataFrame(b=[1, 1])
-        @test_throws ArgumentError select!(sdf -> (b = [1],), copy(df))
-        @test select!(sdf -> (b = [1, 5],), copy(df)) == DataFrame(b=[1, 5])
-        @test select!(sdf -> 1, copy(df)) == DataFrame(x1=[1, 1])
-        @test select!(sdf -> fill([1]), copy(df)) == DataFrame(x1=[[1], [1]])
-        @test select!(sdf -> Ref([1]), copy(df)) == DataFrame(x1=[[1], [1]])
-        @test select!(sdf -> "x", copy(df)) == DataFrame(x1=["x", "x"])
-        @test select!(sdf -> [[1,2],[3,4]], copy(df)) == DataFrame(x1=[[1,2],[3,4]])
+        @test_throws ArgumentError select(sdf -> DataFrame(a=10), df)
+        @test_throws ArgumentError select(sdf -> zeros(1, 2), df)
+        @test select(sdf -> DataFrame(a=[10, 11]), df) == DataFrame(a=[10, 11])
+        @test select(sdf -> [10 11; 12 13], df) == DataFrame(x1=[10, 12], x2=[11, 13])
+        @test select(sdf -> DataFrame(a=10)[1, :], df) == DataFrame(a=[10, 10])
+
+        @test transform(sdf -> sdf.b, df) == [df DataFrame(x1=3:4)]
+        @test transform(sdf -> (b = 2sdf.b,), df) == DataFrame(a=1:2, b=[6,8], c=5:6)
+        @test transform(sdf -> (b = 1,), df) == DataFrame(a=[1,2], b=[1, 1], c=[5,6])
+        @test_throws ArgumentError transform(sdf -> (b = [1],), df)
+        @test transform(sdf -> (b = [1, 5],), df) == DataFrame(a=[1,2], b=[1, 5], c=[5,6])
+        @test transform(sdf -> 1, df) == DataFrame(a=1:2, b=3:4, c=5:6, x1=1)
+        @test transform(sdf -> fill([1]), df) == DataFrame(a=1:2, b=3:4, c=5:6, x1=[[1],[1]])
+        @test transform(sdf -> Ref([1]), df) == DataFrame(a=1:2, b=3:4, c=5:6, x1=[[1],[1]])
+        @test transform(sdf -> "x", df) == DataFrame(a=1:2, b=3:4, c=5:6, x1="x")
+        @test transform(sdf -> [[1,2],[3,4]], df) == DataFrame(a=1:2, b=3:4, c=5:6, x1=[[1,2],[3,4]])
         for ret in (DataFrame(), NamedTuple(), zeros(0,0), DataFrame(t=1)[1, 1:0])
-            @test select!(sdf -> ret, copy(df)) == DataFrame()
+            @test transform(sdf -> ret, df) == df
         end
-        @test_throws ArgumentError select!(sdf -> DataFrame(a=10), copy(df))
-        @test_throws ArgumentError select!(sdf -> zeros(1, 2), copy(df))
-        @test select!(sdf -> DataFrame(a=[10, 11]), copy(df)) == DataFrame(a=[10, 11])
-        @test select!(sdf -> [10 11; 12 13], copy(df)) == DataFrame(x1=[10, 12], x2=[11, 13])
-        @test select!(sdf -> DataFrame(a=10)[1, :], copy(df)) == DataFrame(a=[10, 10])
-
-        @test transform!(sdf -> sdf.b, copy(df)) == [df DataFrame(x1=3:4)]
-        @test transform!(sdf -> (b = 2sdf.b,), copy(df)) == DataFrame(a=1:2, b=[6,8], c=5:6)
-        @test transform!(sdf -> (b = 1,), copy(df)) == DataFrame(a=[1,2], b=[1, 1], c=[5,6])
-        @test_throws ArgumentError transform!(sdf -> (b = [1],), copy(df))
-        @test transform!(sdf -> (b = [1, 5],), copy(df)) == DataFrame(a=[1,2], b=[1, 5], c=[5,6])
-        @test transform!(sdf -> 1, copy(df)) == DataFrame(a=1:2, b=3:4, c=5:6, x1=1)
-        @test transform!(sdf -> fill([1]), copy(df)) == DataFrame(a=1:2, b=3:4, c=5:6, x1=[[1],[1]])
-        @test transform!(sdf -> Ref([1]), copy(df)) == DataFrame(a=1:2, b=3:4, c=5:6, x1=[[1],[1]])
-        @test transform!(sdf -> "x", copy(df)) == DataFrame(a=1:2, b=3:4, c=5:6, x1="x")
-        @test transform!(sdf -> [[1,2],[3,4]], copy(df)) == DataFrame(a=1:2, b=3:4, c=5:6, x1=[[1,2],[3,4]])
+        @test_throws ArgumentError transform(sdf -> DataFrame(a=10), df)
+        @test_throws ArgumentError transform(sdf -> zeros(1, 2), df)
+        @test transform(sdf -> DataFrame(a=[10, 11]), df) == DataFrame(a=[10, 11], b=3:4, c=5:6)
+        @test transform(sdf -> [10 11; 12 13], df) == DataFrame(a=1:2, b=3:4, c=5:6, x1=[10, 12], x2=[11, 13])
+        @test transform(sdf -> DataFrame(a=10)[1, :], df) == DataFrame(a=[10, 10], b=3:4, c=5:6)
+
+        @test combine(sdf -> sdf.b, df) == DataFrame(x1=3:4)
+        @test combine(sdf -> (b = 2sdf.b,), df) == DataFrame(b=[6,8])
+        @test combine(sdf -> (b = 1,), df) == DataFrame(b=[1])
+        @test combine(sdf -> (b = [1],), df) == DataFrame(b=[1])
+        @test combine(sdf -> (b = [1, 5],), df) == DataFrame(b=[1, 5])
+        @test combine(sdf -> 1, df) == DataFrame(x1=[1])
+        @test combine(sdf -> fill([1]), df) == DataFrame(x1=[[1]])
+        @test combine(sdf -> Ref([1]), df) == DataFrame(x1=[[1]])
+        @test combine(sdf -> "x", df) == DataFrame(x1=["x"])
+        @test combine(sdf -> [[1,2],[3,4]], df) == DataFrame(x1=[[1,2],[3,4]])
         for ret in (DataFrame(), NamedTuple(), zeros(0,0), DataFrame(t=1)[1, 1:0])
-            @test transform!(sdf -> ret, copy(df)) == df
+            @test combine(sdf -> ret, df) == DataFrame()
         end
-        @test_throws ArgumentError transform!(sdf -> DataFrame(a=10), copy(df))
-        @test_throws ArgumentError transform!(sdf -> zeros(1, 2), copy(df))
-        @test transform!(sdf -> DataFrame(a=[10, 11]), copy(df)) == DataFrame(a=[10, 11], b=3:4, c=5:6)
-        @test transform!(sdf -> [10 11; 12 13], copy(df)) == DataFrame(a=1:2, b=3:4, c=5:6, x1=[10, 12], x2=[11, 13])
-        @test transform!(sdf -> DataFrame(a=10)[1, :], copy(df)) == DataFrame(a=[10, 10], b=3:4, c=5:6)
+        @test combine(sdf -> DataFrame(a=10), df) == DataFrame(a=10)
+        @test combine(sdf -> zeros(1, 2), df) == DataFrame(x1=0, x2=0)
+        @test combine(sdf -> DataFrame(a=[10, 11]), df) == DataFrame(a=[10, 11])
+        @test combine(sdf -> [10 11; 12 13], df) == DataFrame(x1=[10, 12], x2=[11, 13])
+        @test combine(sdf -> DataFrame(a=10)[1, :], df) == DataFrame(a=[10])
     end
 
-    @testset "transformation function with multiple columns as destination" begin
-        for df in (DataFrame(a=1:2, b=3:4, c=5:6), view(DataFrame(a=1:3, b=3:5, c=5:7, d=11:13), 1:2, 1:3))
-            for fun in (select, combine, transform),
-                res in (DataFrame(), DataFrame(a=1,b=2)[1, :], ones(1,1),
-                        (a=1,b=2), (a=[1], b=[2]), (a=1, b=[2]))
-                @test_throws ArgumentError fun(df, :a => x -> res)
-                @test_throws ArgumentError fun(df, :a => (x -> res) => :z)
-            end
-            for res in (DataFrame(x1=1, x2=2)[1, :], (x1=1,x2=2))
-                @test select(df, :a => (x -> res) => AsTable) == DataFrame(x1=[1,1], x2=[2,2])
-                @test transform(df, :a => (x -> res) => AsTable) == [df DataFrame(x1=[1,1], x2=[2,2])]
-                @test combine(df, :a => (x -> res) => AsTable) == DataFrame(x1=[1], x2=[2])
-                @test select(df, :a => (x -> res) => [:p, :q]) == DataFrame(p=[1,1], q=[2,2])
-                @test transform(df, :a => (x -> res) => [:p, :q]) == [df DataFrame(p=[1,1], q=[2,2])]
-                @test combine(df, :a => (x -> res) => [:p, :q]) == DataFrame(p=[1], q=[2])
-                @test_throws ArgumentError select(df, :a => (x -> res) => [:p, :q, :r])
-                @test_throws ArgumentError select(df, :a => (x -> res) => [:p])
-            end
-            for res in (DataFrame(x1=1, x2=2), [1 2], Tables.table([1 2], header=[:x1, :x2]),
-                        (x1=[1], x2=[2]))
-                @test combine(df, :a => (x -> res) => AsTable) == DataFrame(x1=1, x2=2)
-                @test combine(df, :a => (x -> res) => [:p, :q]) == DataFrame(p=1, q=2)
-                @test_throws ArgumentError combine(df, :a => (x -> res) => [:p])
-                @test_throws ArgumentError select(df, :a => (x -> res) => AsTable)
-                @test_throws ArgumentError transform(df, :a => (x -> res) => AsTable)
-            end
-            @test combine(df, :a => ByRow(x -> [x,x+1]),
-                          :a => ByRow(x -> [x, x+1]) => AsTable,
-                          :a => ByRow(x -> [x, x+1]) => [:p, :q],
-                          :a => ByRow(x -> (s=x, t=x+1)) => AsTable,
-                          :a => (x -> (k=x, l=x.+1)) => AsTable,
-                          :a => ByRow(x -> (s=x, t=x+1)) => :z) ==
-                  DataFrame(a_function=[[1, 2], [2, 3]], x1=[1, 2], x2=[2, 3],
-                            p=[1, 2], q=[2, 3], s=[1, 2], t=[2, 3], k=[1, 2], l=[2, 3],
-                            z=[(s=1, t=2), (s=2, t=3)])
-            @test select(df, :a => ByRow(x -> [x,x+1]),
-                         :a => ByRow(x -> [x, x+1]) => AsTable,
-                         :a => ByRow(x -> [x, x+1]) => [:p, :q],
-                         :a => ByRow(x -> (s=x, t=x+1)) => AsTable,
-                         :a => (x -> (k=x, l=x.+1)) => AsTable,
-                         :a => ByRow(x -> (s=x, t=x+1)) => :z) ==
-                  DataFrame(a_function=[[1, 2], [2, 3]], x1=[1, 2], x2=[2, 3],
-                            p=[1, 2], q=[2, 3], s=[1, 2], t=[2, 3], k=[1, 2], l=[2, 3],
-                            z=[(s=1, t=2), (s=2, t=3)])
-            @test transform(df, :a => ByRow(x -> [x,x+1]),
-                            :a => ByRow(x -> [x, x+1]) => AsTable,
-                            :a => ByRow(x -> [x, x+1]) => [:p, :q],
-                            :a => ByRow(x -> (s=x, t=x+1)) => AsTable,
-                            :a => (x -> (k=x, l=x.+1)) => AsTable,
-                            :a => ByRow(x -> (s=x, t=x+1)) => :z) ==
-                  [df DataFrame(a_function=[[1, 2], [2, 3]], x1=[1, 2], x2=[2, 3],
-                                p=[1, 2], q=[2, 3], s=[1, 2], t=[2, 3], k=[1, 2], l=[2, 3],
-                                z=[(s=1, t=2), (s=2, t=3)])]
-            @test_throws ArgumentError select(df, :a => (x -> [(a=1,b=2), (a=1, b=2, c=3)]) => AsTable)
-            @test_throws ArgumentError select(df, :a => (x -> [(a=1,b=2), (a=1, c=3)]) => AsTable)
-            @test_throws ArgumentError combine(df, :a => (x -> (a=1,b=2)) => :x)
-        end
+    df = DataFrame(a=1:2, b=3:4, c=5:6)
+    @test select!(sdf -> sdf.b, copy(df)) == DataFrame(x1=3:4)
+    @test select!(sdf -> (b = 2sdf.b,), copy(df)) == DataFrame(b=[6,8])
+    @test select!(sdf -> (b = 1,), copy(df)) == DataFrame(b=[1, 1])
+    @test_throws ArgumentError select!(sdf -> (b = [1],), copy(df))
+    @test select!(sdf -> (b = [1, 5],), copy(df)) == DataFrame(b=[1, 5])
+    @test select!(sdf -> 1, copy(df)) == DataFrame(x1=[1, 1])
+    @test select!(sdf -> fill([1]), copy(df)) == DataFrame(x1=[[1], [1]])
+    @test select!(sdf -> Ref([1]), copy(df)) == DataFrame(x1=[[1], [1]])
+    @test select!(sdf -> "x", copy(df)) == DataFrame(x1=["x", "x"])
+    @test select!(sdf -> [[1,2],[3,4]], copy(df)) == DataFrame(x1=[[1,2],[3,4]])
+    for ret in (DataFrame(), NamedTuple(), zeros(0,0), DataFrame(t=1)[1, 1:0])
+        @test select!(sdf -> ret, copy(df)) == DataFrame()
+    end
+    @test_throws ArgumentError select!(sdf -> DataFrame(a=10), copy(df))
+    @test_throws ArgumentError select!(sdf -> zeros(1, 2), copy(df))
+    @test select!(sdf -> DataFrame(a=[10, 11]), copy(df)) == DataFrame(a=[10, 11])
+    @test select!(sdf -> [10 11; 12 13], copy(df)) == DataFrame(x1=[10, 12], x2=[11, 13])
+    @test select!(sdf -> DataFrame(a=10)[1, :], copy(df)) == DataFrame(a=[10, 10])
+
+    @test transform!(sdf -> sdf.b, copy(df)) == [df DataFrame(x1=3:4)]
+    @test transform!(sdf -> (b = 2sdf.b,), copy(df)) == DataFrame(a=1:2, b=[6,8], c=5:6)
+    @test transform!(sdf -> (b = 1,), copy(df)) == DataFrame(a=[1,2], b=[1, 1], c=[5,6])
+    @test_throws ArgumentError transform!(sdf -> (b = [1],), copy(df))
+    @test transform!(sdf -> (b = [1, 5],), copy(df)) == DataFrame(a=[1,2], b=[1, 5], c=[5,6])
+    @test transform!(sdf -> 1, copy(df)) == DataFrame(a=1:2, b=3:4, c=5:6, x1=1)
+    @test transform!(sdf -> fill([1]), copy(df)) == DataFrame(a=1:2, b=3:4, c=5:6, x1=[[1],[1]])
+    @test transform!(sdf -> Ref([1]), copy(df)) == DataFrame(a=1:2, b=3:4, c=5:6, x1=[[1],[1]])
+    @test transform!(sdf -> "x", copy(df)) == DataFrame(a=1:2, b=3:4, c=5:6, x1="x")
+    @test transform!(sdf -> [[1,2],[3,4]], copy(df)) == DataFrame(a=1:2, b=3:4, c=5:6, x1=[[1,2],[3,4]])
+    for ret in (DataFrame(), NamedTuple(), zeros(0,0), DataFrame(t=1)[1, 1:0])
+        @test transform!(sdf -> ret, copy(df)) == df
     end
+    @test_throws ArgumentError transform!(sdf -> DataFrame(a=10), copy(df))
+    @test_throws ArgumentError transform!(sdf -> zeros(1, 2), copy(df))
+    @test transform!(sdf -> DataFrame(a=[10, 11]), copy(df)) == DataFrame(a=[10, 11], b=3:4, c=5:6)
+    @test transform!(sdf -> [10 11; 12 13], copy(df)) == DataFrame(a=1:2, b=3:4, c=5:6, x1=[10, 12], x2=[11, 13])
+    @test transform!(sdf -> DataFrame(a=10)[1, :], copy(df)) == DataFrame(a=[10, 10], b=3:4, c=5:6)
+
+    @test_throws ArgumentError combine(:x => identity, DataFrame(x=[1,2,3]))
+end
 
-    @testset "check correctness of duplicate column names" begin
-        for df in (DataFrame(a=1:2, b=3:4, c=5:6), view(DataFrame(a=1:3, b=3:5, c=5:7, d=11:13), 1:2, 1:3))
-            @test select(df, :b, :) == DataFrame(b=3:4, a=1:2, c=5:6)
-            @test select(df, :b => :c, :) == DataFrame(c=3:4, a=1:2, b=3:4)
-            @test_throws ArgumentError select(df, :b => [:c, :d], :)
-            @test_throws ArgumentError select(df, :a, :a => x -> (a=[1,2], b=[3,4]))
-            @test_throws ArgumentError select(df, :a, :a => (x -> (a=[1,2], b=[3,4])) => AsTable)
-            @test select(df, [:b, :a], :a => (x -> (a=[11,12], b=[13,14])) => AsTable, :) ==
-                  DataFrame(b=[13, 14], a=[11, 12], c=[5, 6])
-            @test select(df, [:b, :a], :a => (x -> (a=[11,12], b=[13,14])) => [:b, :a], :) ==
-                  DataFrame(b=[11, 12], a=[13, 14], c=[5, 6])
+@testset "transformation function with multiple columns as destination" begin
+    for df in (DataFrame(a=1:2, b=3:4, c=5:6), view(DataFrame(a=1:3, b=3:5, c=5:7, d=11:13), 1:2, 1:3))
+        for fun in (select, combine, transform),
+            res in (DataFrame(), DataFrame(a=1,b=2)[1, :], ones(1,1),
+                    (a=1,b=2), (a=[1], b=[2]), (a=1, b=[2]))
+            @test_throws ArgumentError fun(df, :a => x -> res)
+            @test_throws ArgumentError fun(df, :a => (x -> res) => :z)
+        end
+        for res in (DataFrame(x1=1, x2=2)[1, :], (x1=1,x2=2))
+            @test select(df, :a => (x -> res) => AsTable) == DataFrame(x1=[1,1], x2=[2,2])
+            @test transform(df, :a => (x -> res) => AsTable) == [df DataFrame(x1=[1,1], x2=[2,2])]
+            @test combine(df, :a => (x -> res) => AsTable) == DataFrame(x1=[1], x2=[2])
+            @test select(df, :a => (x -> res) => [:p, :q]) == DataFrame(p=[1,1], q=[2,2])
+            @test transform(df, :a => (x -> res) => [:p, :q]) == [df DataFrame(p=[1,1], q=[2,2])]
+            @test combine(df, :a => (x -> res) => [:p, :q]) == DataFrame(p=[1], q=[2])
+            @test_throws ArgumentError select(df, :a => (x -> res) => [:p, :q, :r])
+            @test_throws ArgumentError select(df, :a => (x -> res) => [:p])
         end
+        for res in (DataFrame(x1=1, x2=2), [1 2], Tables.table([1 2], header=[:x1, :x2]),
+                    (x1=[1], x2=[2]))
+            @test combine(df, :a => (x -> res) => AsTable) == DataFrame(x1=1, x2=2)
+            @test combine(df, :a => (x -> res) => [:p, :q]) == DataFrame(p=1, q=2)
+            @test_throws ArgumentError combine(df, :a => (x -> res) => [:p])
+            @test_throws ArgumentError select(df, :a => (x -> res) => AsTable)
+            @test_throws ArgumentError transform(df, :a => (x -> res) => AsTable)
+        end
+        @test combine(df, :a => ByRow(x -> [x,x+1]),
+                      :a => ByRow(x -> [x, x+1]) => AsTable,
+                      :a => ByRow(x -> [x, x+1]) => [:p, :q],
+                      :a => ByRow(x -> (s=x, t=x+1)) => AsTable,
+                      :a => (x -> (k=x, l=x.+1)) => AsTable,
+                      :a => ByRow(x -> (s=x, t=x+1)) => :z) ==
+              DataFrame(a_function=[[1, 2], [2, 3]], x1=[1, 2], x2=[2, 3],
+                        p=[1, 2], q=[2, 3], s=[1, 2], t=[2, 3], k=[1, 2], l=[2, 3],
+                        z=[(s=1, t=2), (s=2, t=3)])
+        @test select(df, :a => ByRow(x -> [x,x+1]),
+                     :a => ByRow(x -> [x, x+1]) => AsTable,
+                     :a => ByRow(x -> [x, x+1]) => [:p, :q],
+                     :a => ByRow(x -> (s=x, t=x+1)) => AsTable,
+                     :a => (x -> (k=x, l=x.+1)) => AsTable,
+                     :a => ByRow(x -> (s=x, t=x+1)) => :z) ==
+              DataFrame(a_function=[[1, 2], [2, 3]], x1=[1, 2], x2=[2, 3],
+                        p=[1, 2], q=[2, 3], s=[1, 2], t=[2, 3], k=[1, 2], l=[2, 3],
+                        z=[(s=1, t=2), (s=2, t=3)])
+        @test transform(df, :a => ByRow(x -> [x,x+1]),
+                        :a => ByRow(x -> [x, x+1]) => AsTable,
+                        :a => ByRow(x -> [x, x+1]) => [:p, :q],
+                        :a => ByRow(x -> (s=x, t=x+1)) => AsTable,
+                        :a => (x -> (k=x, l=x.+1)) => AsTable,
+                        :a => ByRow(x -> (s=x, t=x+1)) => :z) ==
+              [df DataFrame(a_function=[[1, 2], [2, 3]], x1=[1, 2], x2=[2, 3],
+                            p=[1, 2], q=[2, 3], s=[1, 2], t=[2, 3], k=[1, 2], l=[2, 3],
+                            z=[(s=1, t=2), (s=2, t=3)])]
+        @test_throws ArgumentError select(df, :a => (x -> [(a=1,b=2), (a=1, b=2, c=3)]) => AsTable)
+        @test_throws ArgumentError select(df, :a => (x -> [(a=1,b=2), (a=1, c=3)]) => AsTable)
+        @test_throws ArgumentError combine(df, :a => (x -> (a=1,b=2)) => :x)
+    end
+end
+
+@testset "check correctness of duplicate column names" begin
+    for df in (DataFrame(a=1:2, b=3:4, c=5:6), view(DataFrame(a=1:3, b=3:5, c=5:7, d=11:13), 1:2, 1:3))
+        @test select(df, :b, :) == DataFrame(b=3:4, a=1:2, c=5:6)
+        @test select(df, :b => :c, :) == DataFrame(c=3:4, a=1:2, b=3:4)
+        @test_throws ArgumentError select(df, :b => [:c, :d], :)
+        @test_throws ArgumentError select(df, :a, :a => x -> (a=[1,2], b=[3,4]))
+        @test_throws ArgumentError select(df, :a, :a => (x -> (a=[1,2], b=[3,4])) => AsTable)
+        @test select(df, [:b, :a], :a => (x -> (a=[11,12], b=[13,14])) => AsTable, :) ==
+              DataFrame(b=[13, 14], a=[11, 12], c=[5, 6])
+        @test select(df, [:b, :a], :a => (x -> (a=[11,12], b=[13,14])) => [:b, :a], :) ==
+              DataFrame(b=[11, 12], a=[13, 14], c=[5, 6])
     end
 end
 
diff --git a/test/string.jl b/test/string.jl
index ea2e9b222a..2fd8b98dcc 100644
--- a/test/string.jl
+++ b/test/string.jl
@@ -169,19 +169,16 @@ end
     @test combine(gdf, :a) == combine(gdf, "a") ==
           combine(gdf, [:a]) == combine(gdf, ["a"])
 
-    @test combine("a" => identity, gdf, ungroup=false) ==
-          combine(:a => identity, gdf, ungroup=false)
-    @test combine(["a"] => identity, gdf, ungroup=false) ==
-          combine([:a] => identity, gdf, ungroup=false)
-    @test combine(nrow => :n, gdf, ungroup=false) ==
-          combine(nrow => "n", gdf, ungroup=false)
-
-    @test combine("a" => identity, gdf) == combine(:a => identity, gdf) ==
-          combine(gdf, "a" => identity) == combine(gdf, :a => identity)
-    @test combine(["a"] => identity, gdf) == combine([:a] => identity, gdf) ==
-          combine(gdf, ["a"] => identity) == combine(gdf, [:a] => identity)
-    @test combine(nrow => :n, gdf) == combine(nrow => "n", gdf) ==
-          combine(gdf, nrow => :n) == combine(gdf, nrow => "n")
+    @test combine(gdf, "a" => identity, ungroup=false) ==
+          combine(gdf, :a => identity, ungroup=false)
+    @test combine(gdf, ["a"] => identity, ungroup=false) ==
+          combine(gdf, [:a] => identity, ungroup=false)
+    @test combine(gdf, nrow => :n, ungroup=false) ==
+          combine(gdf, nrow => "n", ungroup=false)
+
+    @test combine(gdf, "a" => identity) == combine(gdf, :a => identity)
+    @test combine(gdf, ["a"] => identity) == combine(gdf, [:a] => identity)
+    @test combine(gdf, nrow => :n) == combine(gdf, nrow => "n")
 end
 
 @testset "DataFrameRow" begin