styling etc ...
romainfrancois committed Jan 3, 2019
1 parent f66fa80 commit 9e1897f
Showing 10 changed files with 115 additions and 47 deletions.
1 change: 1 addition & 0 deletions r/DESCRIPTION
Original file line number Diff line number Diff line change
@@ -61,6 +61,7 @@ Collate:
'memory_pool.R'
'message.R'
'on_exit.R'
'parquet.R'
'read_record_batch.R'
'read_table.R'
'reexports-bit64.R'
1 change: 1 addition & 0 deletions r/NAMESPACE
@@ -113,6 +113,7 @@ export(print.integer64)
export(read_arrow)
export(read_feather)
export(read_message)
export(read_parquet)
export(read_record_batch)
export(read_schema)
export(read_table)
4 changes: 4 additions & 0 deletions r/R/RcppExports.R

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

27 changes: 27 additions & 0 deletions r/R/parquet.R
@@ -0,0 +1,27 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

#' Read parquet file from disk
#'
#' @param files a vector of filenames
#'
#' @importFrom purrr map_dfr
#'
#' @export
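read_parquet <- function(files) {
  # Read each file into an arrow::Table, convert it to a tibble,
  # and row-bind the results; `.x` is the current element of `files`.
  map_dfr(files, ~as_tibble(shared_ptr(`arrow::Table`, read_parquet_file(.x))))
}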
2 changes: 1 addition & 1 deletion r/README.Rmd
@@ -25,7 +25,7 @@ git clone https://github.com/apache/arrow.git
cd arrow/cpp && mkdir release && cd release

# It is important to statically link to boost libraries
-cmake .. -DCMAKE_BUILD_TYPE=Release -DARROW_BOOST_USE_SHARED:BOOL=Off
+cmake .. -DARROW_PARQUET=ON -DCMAKE_BUILD_TYPE=Release -DARROW_BOOST_USE_SHARED:BOOL=Off
make install
```

61 changes: 16 additions & 45 deletions r/README.md
@@ -14,7 +14,7 @@ git clone https://github.com/apache/arrow.git
cd arrow/cpp && mkdir release && cd release

# It is important to statically link to boost libraries
-cmake .. -DCMAKE_BUILD_TYPE=Release -DARROW_BOOST_USE_SHARED:BOOL=Off
+cmake .. -DARROW_PARQUET=ON -DCMAKE_BUILD_TYPE=Release -DARROW_BOOST_USE_SHARED:BOOL=Off
make install
```

@@ -38,48 +38,19 @@ tf <- tempfile()
#> # A tibble: 10 x 2
#> x y
#> <int> <dbl>
#> 1 1 -0.255
#> 2 2 -0.162
#> 3 3 -0.614
#> 4 4 -0.322
#> 5 5 0.0693
#> 6 6 -0.920
#> 7 7 -1.08
#> 8 8 0.658
#> 9 9 0.821
#> 10 10 0.539
arrow::write_arrow(tib, tf)

# read it back with pyarrow
pa <- import("pyarrow")
as_tibble(pa$open_file(tf)$read_pandas())
#> # A tibble: 10 x 2
#> x y
#> <int> <dbl>
#> 1 1 -0.255
#> 2 2 -0.162
#> 3 3 -0.614
#> 4 4 -0.322
#> 5 5 0.0693
#> 6 6 -0.920
#> 7 7 -1.08
#> 8 8 0.658
#> 9 9 0.821
#> 10 10 0.539
```

## Development

### Code style

We use Google C++ style in our C++ code. Check for style errors with

```
./lint.sh
```

You can fix the style issues with

#> 1 1 0.0855
#> 2 2 -1.68
#> 3 3 -0.0294
#> 4 4 -0.124
#> 5 5 0.0675
#> 6 6 1.64
#> 7 7 1.54
#> 8 8 -0.0209
#> 9 9 -0.982
#> 10 10 0.349
# arrow::write_arrow(tib, tf)

# # read it back with pyarrow
# pa <- import("pyarrow")
# as_tibble(pa$open_file(tf)$read_pandas())
```
./lint.sh --fix
```
2 changes: 1 addition & 1 deletion r/configure
@@ -26,7 +26,7 @@
# R CMD INSTALL --configure-vars='INCLUDE_DIR=/.../include LIB_DIR=/.../lib'

# Library settings
-PKG_CONFIG_NAME="arrow"
+PKG_CONFIG_NAME="arrow parquet"
PKG_DEB_NAME="arrow"
PKG_RPM_NAME="arrow"
PKG_CSW_NAME="arrow"
14 changes: 14 additions & 0 deletions r/man/read_parquet.Rd

12 changes: 12 additions & 0 deletions r/src/RcppExports.cpp

38 changes: 38 additions & 0 deletions r/src/parquetfilereader.cpp
@@ -0,0 +1,38 @@
// // Licensed to the Apache Software Foundation (ASF) under one
// // or more contributor license agreements. See the NOTICE file
// // distributed with this work for additional information
// // regarding copyright ownership. The ASF licenses this file
// // to you under the Apache License, Version 2.0 (the
// // "License"); you may not use this file except in compliance
// // with the License. You may obtain a copy of the License at
// //
// // http://www.apache.org/licenses/LICENSE-2.0
// //
// // Unless required by applicable law or agreed to in writing,
// // software distributed under the License is distributed on an
// // "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// // KIND, either express or implied. See the License for the
// // specific language governing permissions and limitations
// // under the License.
//
//
#include <arrow/api.h>
#include <arrow/io/api.h>
#include <parquet/arrow/reader.h>
#include <parquet/arrow/writer.h>
#include <parquet/exception.h>

// [[Rcpp::export]]
std::shared_ptr<arrow::Table> read_parquet_file(std::string filename) {
  std::shared_ptr<arrow::io::ReadableFile> infile;
  PARQUET_THROW_NOT_OK(
      arrow::io::ReadableFile::Open(filename, arrow::default_memory_pool(), &infile));

  std::unique_ptr<parquet::arrow::FileReader> reader;
  PARQUET_THROW_NOT_OK(
      parquet::arrow::OpenFile(infile, arrow::default_memory_pool(), &reader));
  std::shared_ptr<arrow::Table> table;
  PARQUET_THROW_NOT_OK(reader->ReadTable(&table));

  return table;
}
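
Note on the error handling in `read_parquet_file`: every Arrow/Parquet call is wrapped in `PARQUET_THROW_NOT_OK`, so a failed status surfaces as a C++ exception instead of an unchecked return value, and the first failing step aborts the read. The sketch below illustrates that status-to-exception pattern in isolation. `Status`, `THROW_NOT_OK`, `open_file`, and `read_file` are hypothetical stand-ins written for this illustration; they are not the Arrow or Parquet definitions, and the real macro may differ in the exception type it throws.

```cpp
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for a status type: either OK, or an error with a message.
class Status {
 public:
  static Status OK() { return Status(true, ""); }
  static Status Error(std::string msg) { return Status(false, std::move(msg)); }
  bool ok() const { return ok_; }
  const std::string& message() const { return message_; }

 private:
  Status(bool ok, std::string msg) : ok_(ok), message_(std::move(msg)) {}
  bool ok_;
  std::string message_;
};

// Hypothetical macro: evaluate the expression exactly once and throw if it failed.
#define THROW_NOT_OK(expr)                                    \
  do {                                                        \
    Status throw_not_ok_status = (expr);                      \
    if (!throw_not_ok_status.ok()) {                          \
      throw std::runtime_error(throw_not_ok_status.message()); \
    }                                                         \
  } while (0)

// Hypothetical fallible operation: reports failure through Status and
// hands its result back through an out-parameter.
Status open_file(const std::string& filename, std::string* handle) {
  if (filename.empty()) {
    return Status::Error("empty filename");
  }
  *handle = "handle:" + filename;
  return Status::OK();
}

// Same shape as read_parquet_file: each step is checked, and the first
// failure leaves the function by throwing.
std::string read_file(const std::string& filename) {
  std::string handle;
  THROW_NOT_OK(open_file(filename, &handle));
  return handle;
}
```

In this sketch, `read_file("")` throws `std::runtime_error` with the message `empty filename`, while a non-empty name returns normally. In an Rcpp-exported function, an exception derived from `std::exception` is typically reported to the caller as an R error.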
