
🎉 New destination: S3 #3672

Merged
merged 28 commits into from
Jun 3, 2021

Conversation

@tuliren (Contributor) commented May 27, 2021

What

  • This PR addresses S3 as destination #573.
  • The MVP supports the CSV format. More formats (e.g. Apache Parquet) will be added soon.
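
Concretely, a CSV-formatted record might be serialized along these lines. This is a hypothetical sketch only: the class name, helper names, and column layout (generated id, emission timestamp, raw JSON payload) are illustrative, not the connector's actual schema.

```java
import java.util.UUID;

// Illustrative sketch only: the real connector's CSV schema may differ.
public class CsvRowSketch {

  // Quote a field and double any embedded quotes, per RFC 4180.
  static String escape(String field) {
    return "\"" + field.replace("\"", "\"\"") + "\"";
  }

  // One CSV line per record: generated id, emission timestamp, raw JSON.
  static String toCsvRow(UUID id, long emittedAt, String jsonData) {
    return String.join(",",
        escape(id.toString()),
        escape(Long.toString(emittedAt)),
        escape(jsonData));
  }

  public static void main(String[] args) {
    UUID id = UUID.fromString("00000000-0000-0000-0000-000000000001");
    System.out.println(toCsvRow(id, 1622505600000L, "{\"a\":1}"));
  }
}
```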

How

  • This destination is similar to S3StreamCopier, but without DB operations.

Pre-merge Checklist

  • Run integration tests
  • Publish Docker images

Recommended reading order

  1. destination-s3/spec.json
  2. S3Destination.java
  3. S3Consumer.java
  4. S3OutputFormatter.java
  5. S3CsvOutputFormatter.java
  6. The rest
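
To make the reading order easier to follow, here is a rough, self-contained sketch of the consumer lifecycle implied by the classes above. Airbyte's real FailureTrackingAirbyteMessageConsumer base class and the S3 upload plumbing are stubbed out, so every name and signature here is illustrative, not the PR's actual code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the consumer lifecycle; not the PR's actual code.
public class S3ConsumerSketch {

  // Stand-in for the failure-tracking base class: remembers whether any call
  // failed so shutdown() can decide between completing and aborting the upload.
  abstract static class TrackingConsumer {
    private boolean hasFailed = false;

    final void accept(String record) {
      try {
        acceptTracked(record);
      } catch (RuntimeException e) {
        hasFailed = true;
        throw e;
      }
    }

    final void shutdown() {
      close(hasFailed);
    }

    abstract void acceptTracked(String record);

    abstract void close(boolean hasFailed);
  }

  // Stand-in for S3Consumer: buffers records (here, in memory) and, on close,
  // either completes or aborts the upload depending on failure state.
  static class Consumer extends TrackingConsumer {
    final List<String> uploaded = new ArrayList<>();
    boolean completed = false;

    @Override
    void acceptTracked(String record) {
      uploaded.add(record); // real code would hand the record to a CSV writer
    }

    @Override
    void close(boolean hasFailed) {
      // complete the multipart upload on success, abort it on failure
      completed = !hasFailed;
    }
  }
}
```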

@tuliren requested a review from sherifnada May 27, 2021 20:13
@tuliren self-assigned this May 27, 2021
@tuliren linked an issue May 27, 2021 that may be closed by this pull request
@sherifnada (Contributor) left a comment


having looked into StreamTransferManager I think it's unlikely that it will do the kind of partitioning we want (uploading a stream of data into multiple files). It seems that it's trying to solve the use case of uploading a file of unknown size to S3. The way it happens to do that is to split it up into multiple parts of known size in memory, upload them, then concatenate them via a native S3 operation: MultipartUpload.
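
As a toy illustration of that mechanism (no S3 or StreamTransferManager involved; the class and method names below are made up for the sketch): buffer data of unknown total size into parts of a known size, and rely on concatenation to restore the original, which is what S3 does server-side when a multipart upload completes.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of multipart upload: split into known-size parts, then rejoin.
public class MultipartSketch {

  // Split the input into partSize-byte chunks; the last part may be shorter.
  static List<byte[]> splitIntoParts(byte[] data, int partSize) {
    List<byte[]> parts = new ArrayList<>();
    for (int offset = 0; offset < data.length; offset += partSize) {
      int len = Math.min(partSize, data.length - offset);
      byte[] part = new byte[len];
      System.arraycopy(data, offset, part, 0, len);
      parts.add(part);
    }
    return parts;
  }

  // Concatenating the parts restores the original byte stream, mirroring
  // what S3 does natively when a multipart upload completes.
  static byte[] concatenate(List<byte[]> parts) {
    int total = parts.stream().mapToInt(p -> p.length).sum();
    byte[] out = new byte[total];
    int offset = 0;
    for (byte[] p : parts) {
      System.arraycopy(p, 0, out, offset, p.length);
      offset += p.length;
    }
    return out;
  }
}
```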

Given this, I think we have a few options:

  1. don't partition output within a stream
  2. partition by counting how many bytes have been uploaded on the current upload manager. Once we exceed the desired size, complete that manager's work, then create a new manager to upload a new part file. Repeat this process until done.
  3. Load a single file, then split it after uploading. I don't like this approach because it has many moving parts, plus not all file formats are easily splittable (CSV or JSON, for example) -- it's possible to split those by reading N lines and counting bytes, then saving that offset, but it feels similar to approach 2.

I suspect we should do approach 1, then approach 2 in a follow-up release. I think this sequencing does not introduce any backwards incompatibilities and is congruent with delivering value incrementally. WDYT?
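
A minimal sketch of approach 2, with the actual StreamTransferManager calls stubbed out (the class name, the in-memory closedFiles list, and the part-file naming are all made up for illustration): count bytes written to the current part file and, once the configured limit would be exceeded, complete it and start a new one.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of approach 2: rotate part files by counting bytes.
// The S3 upload itself is stubbed out with an in-memory list.
public class PartFileRotator {

  private final long maxBytesPerFile;
  private long bytesInCurrentFile = 0;
  private int currentFileIndex = 0;
  final List<String> closedFiles = new ArrayList<>();

  PartFileRotator(long maxBytesPerFile) {
    this.maxBytesPerFile = maxBytesPerFile;
  }

  void write(byte[] record) {
    // If this record would push the current file past the limit,
    // complete the current "upload manager" and open a new part file.
    if (bytesInCurrentFile + record.length > maxBytesPerFile && bytesInCurrentFile > 0) {
      rotate();
    }
    bytesInCurrentFile += record.length;
    // real code: hand the bytes to the current manager's output stream
  }

  void close() {
    if (bytesInCurrentFile > 0) {
      rotate(); // flush the final, possibly short, part file
    }
  }

  private void rotate() {
    closedFiles.add("part_" + currentFileIndex + " (" + bytesInCurrentFile + " bytes)");
    currentFileIndex++;
    bytesInCurrentFile = 0;
  }
}
```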

Review threads on docs/integrations/destinations/s3.md (resolved)
import java.util.Map;
import java.util.UUID;

public class S3Consumer extends FailureTrackingAirbyteMessageConsumer {
Contributor


why is this a distinct class from the one in JDBC? should it reuse that class?

Contributor Author


The JDBC one has lots of database operations in it. I tried to reuse that one at the beginning of last week, but it was unnecessarily complicated. So I decided to create a separate one that only deals with S3 logic.

@tuliren changed the title from "Implement s3 destination (csv only mvp)" to "🎉 New destination: S3" Jun 1, 2021
@tuliren (Contributor Author) commented Jun 1, 2021

/test connector=connectors/destination-s3

🕑 connectors/destination-s3 https://github.com/airbytehq/airbyte/actions/runs/897761847
✅ connectors/destination-s3 https://github.com/airbytehq/airbyte/actions/runs/897761847

@tuliren marked this pull request as ready for review June 1, 2021 23:57
@auto-assign bot requested review from cgardens and davinchia June 1, 2021 23:57
@tuliren requested a review from sherifnada June 1, 2021 23:57
@tuliren (Contributor Author) commented Jun 2, 2021

/publish connector=connectors/destination-s3

🕑 connectors/destination-s3 https://github.com/airbytehq/airbyte/actions/runs/897829029
❌ connectors/destination-s3 https://github.com/airbytehq/airbyte/actions/runs/897829029

@sherifnada (Contributor) left a comment


few comments but we are almost there!

@tuliren (Contributor Author) commented Jun 2, 2021

/test connector=connectors/destination-s3

🕑 connectors/destination-s3 https://github.com/airbytehq/airbyte/actions/runs/898964579
✅ connectors/destination-s3 https://github.com/airbytehq/airbyte/actions/runs/898964579

@tuliren requested a review from sherifnada June 2, 2021 10:10
@tuliren mentioned this pull request Jun 2, 2021
@tuliren (Contributor Author) commented Jun 3, 2021

/publish connector=connectors/destination-s3

🕑 connectors/destination-s3 https://github.com/airbytehq/airbyte/actions/runs/903551907
✅ connectors/destination-s3 https://github.com/airbytehq/airbyte/actions/runs/903551907

@tuliren merged commit c13b988 into master Jun 3, 2021
@tuliren deleted the liren/s3-destination-mvp branch June 3, 2021 16:40
Development

Successfully merging this pull request may close these issues.

S3 as destination
2 participants