Create release version 1.1.0 (#22)
bichht0608 authored Dec 20, 2023
1 parent c263a28 commit 50f3e02
Showing 306 changed files with 78,858 additions and 91,417 deletions.
21 changes: 21 additions & 0 deletions Dockerfile
@@ -0,0 +1,21 @@
ARG PG_MAJOR
FROM postgres:${PG_MAJOR}-bullseye

ENV LANG=C.UTF-8 PGDATA=/pgdata

COPY . /src
WORKDIR /src

# Prepare the environment
RUN apt-get update && \
    apt-get install -y gcc make g++ postgresql-server-dev-${PG_MAJOR} && \
    mkdir ${PGDATA} && \
    chown postgres:postgres ${PGDATA} && \
    chown -R postgres:postgres /src && \
    chmod a+rwx -R /usr/share/postgresql/$PG_MAJOR/extension && \
    chmod a+rwx -R /usr/lib/postgresql/$PG_MAJOR/lib && \
    bash /src/install_arrow.sh

USER postgres

ENTRYPOINT PGDATA=${PGDATA} bash /src/test/run_tests.sh
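
The image is parameterized entirely by `PG_MAJOR`, so a build must supply it. A minimal usage sketch (the tag and major version are illustrative, not part of the commit):

```sh
# Build against PostgreSQL 16 and run the bundled test entrypoint.
# Tag name and PG_MAJOR value are illustrative assumptions.
docker build --build-arg PG_MAJOR=16 -t parquet_s3_fdw-test .
docker run --rm parquet_s3_fdw-test
```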
12 changes: 8 additions & 4 deletions Makefile
@@ -12,10 +12,10 @@ SHLIB_LINK += -laws-cpp-sdk-core -laws-cpp-sdk-s3
EXTENSION = parquet_s3_fdw
DATA = parquet_s3_fdw--0.1.sql parquet_s3_fdw--0.1--0.2.sql parquet_s3_fdw--0.2--0.3.sql parquet_s3_fdw--0.3.sql

REGRESS = import_local import_server parquet_s3_fdw_local parquet_s3_fdw_server parquet_s3_fdw_post_local parquet_s3_fdw_post_server parquet_s3_fdw2 parquet_s3_fdw_modify_local parquet_s3_fdw_modify_server schemaless/schemaless_local schemaless/schemaless_server schemaless/import_local schemaless/import_server schemaless/parquet_s3_fdw_local schemaless/parquet_s3_fdw_server schemaless/parquet_s3_fdw_post_local schemaless/parquet_s3_fdw_post_server schemaless/parquet_s3_fdw2 schemaless/parquet_s3_fdw_modify_local schemaless/parquet_s3_fdw_modify_server
REGRESS = import_local import_server parquet_s3_fdw_local parquet_s3_fdw_server parquet_s3_fdw_post_local parquet_s3_fdw_post_server parquet_s3_fdw2 parquet_s3_fdw_modify_local parquet_s3_fdw_modify_server partition_local partition_server schemaless/schemaless_local schemaless/schemaless_server schemaless/import_local schemaless/import_server schemaless/parquet_s3_fdw_local schemaless/parquet_s3_fdw_server schemaless/parquet_s3_fdw_post_local schemaless/parquet_s3_fdw_post_server schemaless/parquet_s3_fdw2 schemaless/parquet_s3_fdw_modify_local schemaless/parquet_s3_fdw_modify_server schemaless/partition_local schemaless/partition_server

# parquet_impl.cpp requires C++ 11.
override PG_CXXFLAGS += -std=c++11 -O3
# parquet_impl.cpp requires C++ 11 and libarrow 10+ requires C++ 17
override PG_CXXFLAGS += -std=c++17 -O3

# pass CCFLAGS (when defined) to both C and C++ compilers.
ifdef CCFLAGS
@@ -39,7 +39,11 @@ top_builddir = ../..

# PostgreSQL uses link time optimization option which may break compilation
# (this happens on travis-ci). Redefine COMPILE.cxx.bc without this option.
COMPILE.cxx.bc = $(CLANG) -xc++ -Wno-ignored-attributes $(BITCODE_CXXFLAGS) $(CPPFLAGS) -emit-llvm -c
#
# We need to use -Wno-register since C++17 raises an error if "register" keyword
# is used. PostgreSQL headers still uses the keyword, particularly:
# src/include/storage/s_lock.h.
COMPILE.cxx.bc = $(CLANG) -xc++ -Wno-ignored-attributes -Wno-register $(BITCODE_CXXFLAGS) $(CPPFLAGS) -emit-llvm -c

include $(top_builddir)/src/Makefile.global
include $(top_srcdir)/contrib/contrib-global.mk
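The enlarged `REGRESS` list above (now covering `partition_local`, `partition_server` and their schemaless counterparts) runs through the standard regression target. A sketch, assuming an in-tree contrib build with a running server:

```sh
# Run the full regression list defined by REGRESS above.
make installcheck
# A make command-line variable overrides the Makefile value, so a
# subset can be selected ad hoc (illustrative):
make installcheck REGRESS="partition_local partition_server"
```
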
21 changes: 15 additions & 6 deletions README.md
@@ -1,21 +1,30 @@
# Parquet S3 Foreign Data Wrapper for PostgreSQL

This PostgreSQL extension is a Foreign Data Wrapper (FDW) for accessing Parquet file on local file system and [Amazon S3][2].
This version of parquet_s3_fdw can work for PostgreSQL 13, 14 and 15.
This version of parquet_s3_fdw can work for PostgreSQL 13, 14, 15 and 16.0.

Read-only Apache Parquet foreign data wrapper supporting S3 access for PostgreSQL.


## Installation
### 1. Install dependent libraries
`parquet_s3_fdw` requires `libarrow` and `libparquet` installed in your system (requires version 0.15+, for previous versions use branch [arrow-0.14](https://github.com/adjust/parquet_fdw/tree/arrow-0.14)). Please refer to [building guide](https://github.com/apache/arrow/blob/master/docs/source/developers/cpp/building.rst).

`AWS SDK for C++ (libaws-cpp-sdk-core libaws-cpp-sdk-s3)` is also required (Confirmed version is 1.9.263).
### 1. Build requirements
* CMake 3.26.3+
* C++17 compiler
* libcurl-devel
* openssl-devel
* libuuid-devel
* pulseaudio-libs-devel
### 2. Install dependent libraries
* `libarrow` and `libparquet`: Confirmed version is 12.0.0 (required).
Please refer to [building guide](https://github.com/apache/arrow/blob/master/docs/source/developers/cpp/building.rst).

* `AWS SDK for C++ (libaws-cpp-sdk-core libaws-cpp-sdk-s3)`: Confirmed version is 1.11.91 (required).
Please refer to [building guide](https://docs.aws.amazon.com/sdk-for-cpp/v1/developer-guide/setup-linux.html).

Attention!
We recommend building `libarrow`, `libparquet` and the `AWS SDK for C++` from source. Linking failed when pre-compiled binaries were used, because the gcc versions behind Arrow and the AWS SDK differ.
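
A minimal from-source sketch for the confirmed Arrow version; the tag, build directory and flags follow upstream Arrow conventions and are not prescribed by this README:

```sh
# Build libarrow and libparquet 12.0.0 from source (sketch; see the
# Arrow building guide linked above for the authoritative steps).
git clone --branch apache-arrow-12.0.0 https://github.com/apache/arrow.git
cd arrow/cpp
cmake -S . -B build -DARROW_PARQUET=ON -DCMAKE_BUILD_TYPE=Release
cmake --build build -j"$(nproc)"
sudo cmake --install build
```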

### 2. Build and install parquet_s3_fdw
### 3. Build and install parquet_s3_fdw
```sh
make install
```
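
If several PostgreSQL versions are installed, the usual PGXS convention is to point the build at one of them; whether this Makefile honours the standard `USE_PGXS`/`PG_CONFIG` switches is an assumption, since only the in-tree include lines are visible in this diff:

```sh
# Assumption: standard PGXS-style selection; the path is illustrative.
make install USE_PGXS=1 PG_CONFIG=/usr/lib/postgresql/16/bin/pg_config
```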
Binary file modified data/complex/example3.parquet
Binary file not shown.
Binary file added data/ddlcommand/ddlcommand4/tbl1.parquet
Binary file not shown.
Binary file added data/ddlcommand/ddlcommand5/tbl2.parquet
Binary file not shown.
Binary file added data/ddlcommand/ddlcommand7/tbl.parquet
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file added data/ddlcommand/ddlcommand8/tbl.parquet
Binary file not shown.
83 changes: 0 additions & 83 deletions data/delete_first_parquet_row.py

This file was deleted.

29 changes: 25 additions & 4 deletions data/generate.py
@@ -5,7 +5,7 @@
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq
from datetime import datetime, date
from datetime import datetime, date, timedelta

# example1.parquet file
df1 = pd.DataFrame({'one': [1, 2, 3],
@@ -24,9 +24,9 @@
df2 = pd.DataFrame({'one': [4, 5, 6],
                    'two': [[10, 11, 12], [13, 14, 15], [16, 17, 18]],
                    'three': ['uno', 'dos', 'tres'],
                    'four': [datetime(2018, 1, 4),
                             datetime(2018, 1, 5),
                             datetime(2018, 1, 6)],
                    'four': [datetime(2018, 1, 4) + timedelta(seconds=10),
                             datetime(2018, 1, 5) + timedelta(milliseconds=10),
                             datetime(2018, 1, 6) + timedelta(microseconds=10)],
                    'five': [date(2018, 1, 4),
                             date(2018, 1, 5),
                             date(2018, 1, 6)],
@@ -88,6 +88,27 @@
with pq.ParquetWriter('complex/example3.parquet', table.schema) as writer:
    writer.write_table(table)

# Parquet files for partitions
df_part1 = pd.DataFrame({'id': [1, 1, 2],
                         'date': [datetime(2018, 1, 1),
                                  datetime(2018, 1, 2),
                                  datetime(2018, 1, 3)],
                         'num': [10, 23, 9]})
table_part1 = pa.Table.from_pandas(df_part1)

with pq.ParquetWriter('partition/example_part1.parquet', table_part1.schema) as writer:
    writer.write_table(table_part1)

df_part2 = pd.DataFrame({'id': [1, 2, 2],
                         'date': [datetime(2018, 2, 1),
                                  datetime(2018, 2, 2),
                                  datetime(2018, 2, 3)],
                         'num': [59, 1, 32]})
table_part2 = pa.Table.from_pandas(df_part2)

with pq.ParquetWriter('partition/example_part2.parquet', table_part2.schema) as writer:
    writer.write_table(table_part2)

# example4.parquet file
mdt1 = pa.map_(pa.int32(), pa.string())
mdt2 = pa.list_(pa.int32())
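The two files generated above back the new `partition_local`/`partition_server` regression tests. A hypothetical sketch of the pattern they exercise, attaching each Parquet file as a partition of one partitioned table; the server name and file paths are placeholders, not taken from this commit:

```sh
# Hypothetical psql session; parquet_s3_srv and the paths are placeholders.
psql <<'SQL'
CREATE TABLE example_part (id int, "date" timestamp, num int)
    PARTITION BY RANGE ("date");
CREATE FOREIGN TABLE example_part1 PARTITION OF example_part
    FOR VALUES FROM ('2018-01-01') TO ('2018-02-01')
    SERVER parquet_s3_srv
    OPTIONS (filename '/tmp/data/partition/example_part1.parquet');
CREATE FOREIGN TABLE example_part2 PARTITION OF example_part
    FOR VALUES FROM ('2018-02-01') TO ('2018-03-01')
    SERVER parquet_s3_srv
    OPTIONS (filename '/tmp/data/partition/example_part2.parquet');
SELECT * FROM example_part ORDER BY id, "date";
SQL
```
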
Binary file added data/partition/example_part1.parquet
Binary file not shown.
Binary file added data/partition/example_part2.parquet
Binary file not shown.
Binary file modified data/simple/example1.parquet
Binary file not shown.
26 changes: 13 additions & 13 deletions expected/14.0/import_local.out → expected/13.12/import_local.out
@@ -68,19 +68,19 @@ SELECT import_parquet_s3(

--Testcase 10:
SELECT * FROM example_import ORDER BY one, three;
one | two | three | four | five | six | seven
-----+------------+-------+---------------------+------------+-----+-------
1 | {19,20} | eins | 2018-01-01 00:00:00 | 2018-01-01 | t |
1 | {1,2,3} | foo | 2018-01-01 00:00:00 | 2018-01-01 | t | 0.5
2 | {NULL,5,6} | bar | 2018-01-02 00:00:00 | 2018-01-02 | f |
3 | {7,8,9} | baz | 2018-01-03 00:00:00 | 2018-01-03 | t | 1
3 | {21,22} | zwei | 2018-01-03 00:00:00 | 2018-01-03 | f |
4 | {10,11,12} | uno | 2018-01-04 00:00:00 | 2018-01-04 | f | 1.5
5 | {13,14,15} | dos | 2018-01-05 00:00:00 | 2018-01-05 | f |
5 | {23,24} | drei | 2018-01-05 00:00:00 | 2018-01-05 | t |
6 | {16,17,18} | tres | 2018-01-06 00:00:00 | 2018-01-06 | f | 2
7 | {25,26} | vier | 2018-01-07 00:00:00 | 2018-01-07 | f |
9 | {27,28} | fünf | 2018-01-09 00:00:00 | 2018-01-09 | t |
one | two | three | four | five | six | seven
-----+------------+-------+---------------------------+------------+-----+-------
1 | {19,20} | eins | 2018-01-01 00:00:00 | 2018-01-01 | t |
1 | {1,2,3} | foo | 2018-01-01 00:00:00 | 2018-01-01 | t | 0.5
2 | {NULL,5,6} | bar | 2018-01-02 00:00:00 | 2018-01-02 | f |
3 | {7,8,9} | baz | 2018-01-03 00:00:00 | 2018-01-03 | t | 1
3 | {21,22} | zwei | 2018-01-03 00:00:00 | 2018-01-03 | f |
4 | {10,11,12} | uno | 2018-01-04 00:00:10 | 2018-01-04 | f | 1.5
5 | {13,14,15} | dos | 2018-01-05 00:00:00.01 | 2018-01-05 | f |
5 | {23,24} | drei | 2018-01-05 00:00:00 | 2018-01-05 | t |
6 | {16,17,18} | tres | 2018-01-06 00:00:00.00001 | 2018-01-06 | f | 2
7 | {25,26} | vier | 2018-01-07 00:00:00 | 2018-01-07 | f |
9 | {27,28} | fünf | 2018-01-09 00:00:00 | 2018-01-09 | t |
(11 rows)

--Testcase 11:
@@ -68,19 +68,19 @@ SELECT import_parquet_s3(

--Testcase 10:
SELECT * FROM example_import ORDER BY one, three;
one | two | three | four | five | six | seven
-----+------------+-------+---------------------+------------+-----+-------
1 | {19,20} | eins | 2018-01-01 00:00:00 | 2018-01-01 | t |
1 | {1,2,3} | foo | 2018-01-01 00:00:00 | 2018-01-01 | t | 0.5
2 | {NULL,5,6} | bar | 2018-01-02 00:00:00 | 2018-01-02 | f |
3 | {7,8,9} | baz | 2018-01-03 00:00:00 | 2018-01-03 | t | 1
3 | {21,22} | zwei | 2018-01-03 00:00:00 | 2018-01-03 | f |
4 | {10,11,12} | uno | 2018-01-04 00:00:00 | 2018-01-04 | f | 1.5
5 | {13,14,15} | dos | 2018-01-05 00:00:00 | 2018-01-05 | f |
5 | {23,24} | drei | 2018-01-05 00:00:00 | 2018-01-05 | t |
6 | {16,17,18} | tres | 2018-01-06 00:00:00 | 2018-01-06 | f | 2
7 | {25,26} | vier | 2018-01-07 00:00:00 | 2018-01-07 | f |
9 | {27,28} | fünf | 2018-01-09 00:00:00 | 2018-01-09 | t |
one | two | three | four | five | six | seven
-----+------------+-------+---------------------------+------------+-----+-------
1 | {19,20} | eins | 2018-01-01 00:00:00 | 2018-01-01 | t |
1 | {1,2,3} | foo | 2018-01-01 00:00:00 | 2018-01-01 | t | 0.5
2 | {NULL,5,6} | bar | 2018-01-02 00:00:00 | 2018-01-02 | f |
3 | {7,8,9} | baz | 2018-01-03 00:00:00 | 2018-01-03 | t | 1
3 | {21,22} | zwei | 2018-01-03 00:00:00 | 2018-01-03 | f |
4 | {10,11,12} | uno | 2018-01-04 00:00:10 | 2018-01-04 | f | 1.5
5 | {13,14,15} | dos | 2018-01-05 00:00:00.01 | 2018-01-05 | f |
5 | {23,24} | drei | 2018-01-05 00:00:00 | 2018-01-05 | t |
6 | {16,17,18} | tres | 2018-01-06 00:00:00.00001 | 2018-01-06 | f | 2
7 | {25,26} | vier | 2018-01-07 00:00:00 | 2018-01-07 | f |
9 | {27,28} | fünf | 2018-01-09 00:00:00 | 2018-01-09 | t |
(11 rows)

--Testcase 11:
@@ -148,7 +148,7 @@ SELECT * FROM dir11;
CREATE FOREIGN TABLE dummyfile (timestamp timestamp, col1 text, col2 bigint, col3 double precision) SERVER parquet_s3_srv OPTIONS (filename 's3://test-bucket/dummy-file.parquet');
--Testcase 21:
SELECT * FROM dummyfile;
ERROR: parquet_s3_fdw: failed to exctract row groups from Parquet file: failed to open Parquet file HeadObject failed
ERROR: parquet_s3_fdw: failed to extract row groups from Parquet file: failed to open Parquet file HeadObject failed ('s3://test-bucket/dummy-file.parquet')
-- Bucket does not exist
--Testcase 22:
CREATE FOREIGN TABLE dummybucket (timestamp timestamp, col1 text, col2 bigint, col3 double precision) SERVER parquet_s3_srv OPTIONS (dirname 's3://dummy-bucket');