🎉 Source S3 - memory & performance optimisations + advanced CSV options #6615

Merged
merged 14 commits into master from george/fix-s3-oom
Oct 19, 2021

Conversation

Phlair
Contributor

@Phlair Phlair commented Oct 1, 2021

What

closes #6606

I'm fairly confident I've been able to make some small but impactful improvements here:

  • significantly reduced the memory footprint of the connector, so the out-of-memory error from the linked issue shouldn't reappear until the bucket holds far more files.
  • made incremental reads parse the schemas of new files only, rather than all files, which should greatly improve runtimes once the bucket holds a non-trivial number of files.

How

  • Changed our time-ordered iterator to hold just filepath strings rather than StorageFile objects, which consumes much less memory per file in the bucket (I saw about a 4-5x reduction in my testing).
  • Added a min_datetime parameter to _get_master_schema() so we can pass in the state on incremental runs and use only new files to build the master schema (previously we used all of them). Since the schema is saved in state from a previous run, schema inference only needs to run during read, where the state is available to pass in.
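
To illustrate the second point, here is a minimal sketch of schema merging limited by min_datetime. It is not the connector's actual _get_master_schema(); the function name get_master_schema, the infer_schema callback, the stored_schema argument and the "at or after" cutoff are all assumptions made for the example.

```python
from datetime import datetime
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Schema = Dict[str, str]


def get_master_schema(
    files: Iterable[Tuple[datetime, str]],
    infer_schema: Callable[[str], Schema],
    stored_schema: Optional[Schema] = None,
    min_datetime: Optional[datetime] = None,
) -> Schema:
    """Merge schemas, inspecting only files modified at or after min_datetime.

    Hypothetical helper: the cutoff semantics are an assumption for this sketch.
    """
    # Start from the schema saved in state by a previous run, if any.
    master: Schema = dict(stored_schema or {})
    for last_modified, filepath in files:
        if min_datetime is not None and last_modified < min_datetime:
            continue  # older file: assumed already covered by the stored schema
        for column, dtype in infer_schema(filepath).items():
            master.setdefault(column, dtype)
    return master


# Usage with a fake inference function that records which files it opened.
inspected: List[str] = []


def fake_infer(path: str) -> Schema:
    inspected.append(path)
    if path == "a.csv":
        return {"id": "integer"}
    return {"id": "integer", "name": "string"}


files = [(datetime(2021, 10, 1), "a.csv"), (datetime(2021, 10, 15), "b.csv")]
schema = get_master_schema(
    files,
    fake_infer,
    stored_schema={"id": "integer"},
    min_datetime=datetime(2021, 10, 10),
)
```

Only b.csv is passed to the inference function here, which is the runtime saving described above: the cost of schema inference follows the number of new files rather than the size of the bucket.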

Additionally

@github-actions github-actions bot added the area/connectors Connector related issues label Oct 1, 2021
@Phlair Phlair self-assigned this Oct 1, 2021
@Phlair Phlair requested a review from antixar October 1, 2021 15:21
@Phlair
Contributor Author

Phlair commented Oct 1, 2021

/test connector=connectors/source-s3

🕑 connectors/source-s3 https://github.com/airbytehq/airbyte/actions/runs/1295630422
✅ connectors/source-s3 https://github.com/airbytehq/airbyte/actions/runs/1295630422
Python tests coverage:

	 ---------- coverage: platform linux, python 3.8.10-final-0 -----------
	 Name                                                 Stmts   Miss  Cover
	 ------------------------------------------------------------------------
	 source_acceptance_test/__init__.py                       2      0   100%
	 source_acceptance_test/base.py                          10      4    60%
	 source_acceptance_test/config.py                        74      8    89%
	 source_acceptance_test/conftest.py                     108    108     0%
	 source_acceptance_test/plugin.py                        47     47     0%
	 source_acceptance_test/tests/__init__.py                 4      0   100%
	 source_acceptance_test/tests/test_core.py              200     94    53%
	 source_acceptance_test/tests/test_full_refresh.py       18     11    39%
	 source_acceptance_test/tests/test_incremental.py        69     38    45%
	 source_acceptance_test/utils/__init__.py                 6      0   100%
	 source_acceptance_test/utils/asserts.py                 37      2    95%
	 source_acceptance_test/utils/common.py                  41     24    41%
	 source_acceptance_test/utils/compare.py                 47     20    57%
	 source_acceptance_test/utils/connector_runner.py        82     49    40%
	 source_acceptance_test/utils/json_schema_helper.py     111     11    90%
	 ------------------------------------------------------------------------
	 TOTAL                                                  856    416    51%
	 ---------- coverage: platform linux, python 3.8.10-final-0 -----------
	 Name                                       Stmts   Miss  Cover
	 --------------------------------------------------------------
	 base_python/__init__.py                       13      0   100%
	 base_python/catalog_helpers.py                10      6    40%
	 base_python/cdk/__init__.py                    0      0   100%
	 base_python/cdk/abstract_source.py            83     59    29%
	 base_python/cdk/streams/__init__.py            0      0   100%
	 base_python/cdk/streams/auth/__init__.py       0      0   100%
	 base_python/cdk/streams/auth/core.py           8      1    88%
	 base_python/cdk/streams/auth/jwt.py            5      5     0%
	 base_python/cdk/streams/auth/oauth.py         37     26    30%
	 base_python/cdk/streams/auth/token.py          9      4    56%
	 base_python/cdk/streams/core.py               63     32    49%
	 base_python/cdk/streams/exceptions.py         10      2    80%
	 base_python/cdk/streams/http.py               67     33    51%
	 base_python/cdk/streams/rate_limiting.py      30     14    53%
	 base_python/cdk/utils/__init__.py              0      0   100%
	 base_python/cdk/utils/casing.py                4      0   100%
	 base_python/client.py                         56     33    41%
	 base_python/entrypoint.py                     70     56    20%
	 base_python/integration.py                    52     25    52%
	 base_python/logger.py                         33     19    42%
	 base_python/schema_helpers.py                 56     41    27%
	 base_python/source.py                         51     34    33%
	 main_dev.py                                    3      3     0%
	 --------------------------------------------------------------
	 TOTAL                                        660    393    40%
	 ---------- coverage: platform linux, python 3.8.10-final-0 -----------
	 Name                                                              Stmts   Miss  Cover
	 -------------------------------------------------------------------------------------
	 source_s3/__init__.py                                                 2      0   100%
	 source_s3/s3_utils.py                                                20      3    85%
	 source_s3/s3file.py                                                  49      3    94%
	 source_s3/source.py                                                  23      0   100%
	 source_s3/source_files_abstract/__init__.py                           0      0   100%
	 source_s3/source_files_abstract/formats/abstract_file_parser.py      37      2    95%
	 source_s3/source_files_abstract/formats/csv_parser.py                71     20    72%
	 source_s3/source_files_abstract/formats/csv_spec.py                  14      0   100%
	 source_s3/source_files_abstract/formats/parquet_parser.py            41     28    32%
	 source_s3/source_files_abstract/formats/parquet_spec.py               9      0   100%
	 source_s3/source_files_abstract/source.py                            40     18    55%
	 source_s3/source_files_abstract/spec.py                              42     22    48%
	 source_s3/source_files_abstract/storagefile.py                       16      0   100%
	 source_s3/source_files_abstract/stream.py                           197     11    94%
	 source_s3/stream.py                                                  43      3    93%
	 -------------------------------------------------------------------------------------
	 TOTAL                                                               604    110    82%
	 ---------- coverage: platform linux, python 3.8.10-final-0 -----------
	 Name                                                              Stmts   Miss  Cover
	 -------------------------------------------------------------------------------------
	 source_s3/__init__.py                                                 2      0   100%
	 source_s3/s3_utils.py                                                20     13    35%
	 source_s3/s3file.py                                                  49     26    47%
	 source_s3/source.py                                                  23      0   100%
	 source_s3/source_files_abstract/__init__.py                           0      0   100%
	 source_s3/source_files_abstract/formats/abstract_file_parser.py      37      0   100%
	 source_s3/source_files_abstract/formats/csv_parser.py                71     19    73%
	 source_s3/source_files_abstract/formats/csv_spec.py                  14      0   100%
	 source_s3/source_files_abstract/formats/parquet_parser.py            41      0   100%
	 source_s3/source_files_abstract/formats/parquet_spec.py               9      0   100%
	 source_s3/source_files_abstract/source.py                            40     18    55%
	 source_s3/source_files_abstract/spec.py                              42     22    48%
	 source_s3/source_files_abstract/storagefile.py                       16      3    81%
	 source_s3/source_files_abstract/stream.py                           197    106    46%
	 source_s3/stream.py                                                  43     31    28%
	 -------------------------------------------------------------------------------------
	 TOTAL                                                               604    238    61%

@jrhizor jrhizor temporarily deployed to more-secrets October 1, 2021 15:25 Inactive
Contributor

@antixar antixar left a comment

I guess the main cause of the performance issues is loading file objects for filtering/sorting (by last_modified).

    def last_modified(self) -> datetime:
        """
        Using decorator set up boto3 session & s3 resource.
        Note: slight nuance for grabbing this when we have no credentials.

        :return: last_modified property of the blob/file
        """
        bucket = self._provider.get("bucket")
        try:
            obj = self._boto_s3_resource.Object(bucket, self.url)
            return obj.last_modified

And thus your code tries to download all files (including ones that are not relevant in incremental mode).
I propose to move this filtering/sorting logic to the bucket listing function:

   def _list_bucket(self, accept_key=lambda k: True) -> Iterator[str]:
       ....
                content = response["Contents"]
                # =======
                # content["LastModified"]
                # =======
                raise Exception(content)
            except KeyError:
                pass
       ....

This function should return only the relevant filepath values, already sorted.
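
A rough sketch of what this proposal could look like, assuming the listing responses are plain dicts shaped like S3 ListObjectsV2 pages (a "Contents" list whose entries carry "Key" and "LastModified"). The function name list_relevant_keys and its min_datetime argument are hypothetical, and the pages below are hand-written stand-ins rather than real API output.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Iterable, List, Optional


def list_relevant_keys(
    pages: Iterable[Dict[str, Any]],
    min_datetime: Optional[datetime] = None,
) -> List[str]:
    """Return keys modified after min_datetime, oldest first.

    Uses only the metadata already present in each listing page, so no
    per-object request is needed to learn last_modified.
    """
    entries = []
    for page in pages:
        # A page for an empty prefix has no "Contents" key at all.
        for item in page.get("Contents", []):
            if min_datetime is None or item["LastModified"] > min_datetime:
                entries.append((item["LastModified"], item["Key"]))
    entries.sort()  # by timestamp, ties broken by key
    return [key for _, key in entries]


# Hand-written pages standing in for paginated listing responses.
pages = [
    {"Contents": [
        {"Key": "data/b.csv", "LastModified": datetime(2021, 10, 15, tzinfo=timezone.utc)},
        {"Key": "data/a.csv", "LastModified": datetime(2021, 10, 1, tzinfo=timezone.utc)},
    ]},
    {},  # empty page
    {"Contents": [
        {"Key": "data/c.csv", "LastModified": datetime(2021, 10, 12, tzinfo=timezone.utc)},
    ]},
]
state = datetime(2021, 10, 10, tzinfo=timezone.utc)
new_keys = list_relevant_keys(pages, min_datetime=state)
all_keys = list_relevant_keys(pages)
```

With boto3, pages of this shape could come from the list_objects_v2 paginator; wiring the saved state timestamp into the listing function is left out of the sketch.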

@Phlair
Contributor Author

Phlair commented Oct 15, 2021

I propose to move this filtering/sorting logic to the bucket listing function:

I like the thinking, but there are a few reasons not to make that change:

  • the filter/sort logic currently lives at the abstract level, which makes adding new sources like GCS / Azure easier because they don't each have to implement it.
  • we don't have the state (the timestamp to filter on) available there without some sizeable refactoring.
  • we only use generators until we do the sort, but memory usage that eventually scales with the number of files seems unavoidable to me (without getting into overkill sorting solutions).

I've greatly reduced memory usage (4-5x) by storing just the filepaths rather than objects; that seemed to be the problem there.
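
For illustration, a minimal sketch of the "generators until the sort, then keep only filepaths" idea. The StorageFile class here is a simplified stand-in and iter_files / time_ordered_filepaths are hypothetical names, not the connector's real code.

```python
from datetime import datetime
from typing import Iterator, List, Tuple


class StorageFile:
    """Simplified stand-in for the connector's per-file object."""

    def __init__(self, url: str, last_modified: datetime):
        self.url = url
        self.last_modified = last_modified
        self.provider = {"bucket": "example-bucket"}  # extra per-object state


def iter_files(count: int) -> Iterator[StorageFile]:
    # Generator: files are produced lazily, one at a time.
    for i in range(count):
        yield StorageFile(f"data/file_{i}.csv", datetime(2021, 10, 1 + i % 28))


def time_ordered_filepaths(files: Iterator[StorageFile]) -> List[Tuple[datetime, str]]:
    # Sorting needs every entry in memory, so keep each entry small:
    # only (last_modified, url) is retained, never the StorageFile itself.
    return sorted((f.last_modified, f.url) for f in files)


ordered = time_ordered_filepaths(iter_files(100))
```

The sorted list still grows with the number of files, as noted above, but each entry is a timestamp and a string instead of a full object.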

@Phlair
Contributor Author

Phlair commented Oct 15, 2021

/test connector=connectors/source-s3

🕑 connectors/source-s3 https://github.com/airbytehq/airbyte/actions/runs/1346062203
✅ connectors/source-s3 https://github.com/airbytehq/airbyte/actions/runs/1346062203
Python tests coverage:

	 ---------- coverage: platform linux, python 3.8.10-final-0 -----------
	 Name                                                 Stmts   Miss  Cover
	 ------------------------------------------------------------------------
	 source_acceptance_test/__init__.py                       2      0   100%
	 source_acceptance_test/base.py                          10      4    60%
	 source_acceptance_test/config.py                        74      8    89%
	 source_acceptance_test/conftest.py                     108    108     0%
	 source_acceptance_test/plugin.py                        47     47     0%
	 source_acceptance_test/tests/__init__.py                 4      0   100%
	 source_acceptance_test/tests/test_core.py              200     94    53%
	 source_acceptance_test/tests/test_full_refresh.py       18     11    39%
	 source_acceptance_test/tests/test_incremental.py        69     38    45%
	 source_acceptance_test/utils/__init__.py                 6      0   100%
	 source_acceptance_test/utils/asserts.py                 37      2    95%
	 source_acceptance_test/utils/common.py                  41     24    41%
	 source_acceptance_test/utils/compare.py                 47     20    57%
	 source_acceptance_test/utils/connector_runner.py        82     49    40%
	 source_acceptance_test/utils/json_schema_helper.py     115     14    88%
	 ------------------------------------------------------------------------
	 TOTAL                                                  860    419    51%
	 ---------- coverage: platform linux, python 3.8.10-final-0 -----------
	 Name                                       Stmts   Miss  Cover
	 --------------------------------------------------------------
	 base_python/__init__.py                       13      0   100%
	 base_python/catalog_helpers.py                10      6    40%
	 base_python/cdk/__init__.py                    0      0   100%
	 base_python/cdk/abstract_source.py            83     59    29%
	 base_python/cdk/streams/__init__.py            0      0   100%
	 base_python/cdk/streams/auth/__init__.py       0      0   100%
	 base_python/cdk/streams/auth/core.py           8      1    88%
	 base_python/cdk/streams/auth/jwt.py            5      5     0%
	 base_python/cdk/streams/auth/oauth.py         37     26    30%
	 base_python/cdk/streams/auth/token.py          9      4    56%
	 base_python/cdk/streams/core.py               63     32    49%
	 base_python/cdk/streams/exceptions.py         10      2    80%
	 base_python/cdk/streams/http.py               67     33    51%
	 base_python/cdk/streams/rate_limiting.py      30     14    53%
	 base_python/cdk/utils/__init__.py              0      0   100%
	 base_python/cdk/utils/casing.py                4      0   100%
	 base_python/client.py                         56     33    41%
	 base_python/entrypoint.py                     70     56    20%
	 base_python/integration.py                    52     25    52%
	 base_python/logger.py                         33     19    42%
	 base_python/schema_helpers.py                 56     41    27%
	 base_python/source.py                         51     34    33%
	 main_dev.py                                    3      3     0%
	 --------------------------------------------------------------
	 TOTAL                                        660    393    40%
	 ---------- coverage: platform linux, python 3.8.10-final-0 -----------
	 Name                                                              Stmts   Miss  Cover
	 -------------------------------------------------------------------------------------
	 source_s3/__init__.py                                                 2      0   100%
	 source_s3/s3_utils.py                                                20      3    85%
	 source_s3/s3file.py                                                  49      3    94%
	 source_s3/source.py                                                  23      0   100%
	 source_s3/source_files_abstract/__init__.py                           0      0   100%
	 source_s3/source_files_abstract/formats/abstract_file_parser.py      37      2    95%
	 source_s3/source_files_abstract/formats/csv_parser.py                71     20    72%
	 source_s3/source_files_abstract/formats/csv_spec.py                  14      0   100%
	 source_s3/source_files_abstract/formats/parquet_parser.py            61     44    28%
	 source_s3/source_files_abstract/formats/parquet_spec.py               9      0   100%
	 source_s3/source_files_abstract/source.py                            40     18    55%
	 source_s3/source_files_abstract/spec.py                              42     22    48%
	 source_s3/source_files_abstract/storagefile.py                       16      0   100%
	 source_s3/source_files_abstract/stream.py                           195     10    95%
	 source_s3/stream.py                                                  43      3    93%
	 -------------------------------------------------------------------------------------
	 TOTAL                                                               622    125    80%
	 ---------- coverage: platform linux, python 3.8.10-final-0 -----------
	 Name                                                              Stmts   Miss  Cover
	 -------------------------------------------------------------------------------------
	 source_s3/__init__.py                                                 2      0   100%
	 source_s3/s3_utils.py                                                20     13    35%
	 source_s3/s3file.py                                                  49     26    47%
	 source_s3/source.py                                                  23      0   100%
	 source_s3/source_files_abstract/__init__.py                           0      0   100%
	 source_s3/source_files_abstract/formats/abstract_file_parser.py      37      0   100%
	 source_s3/source_files_abstract/formats/csv_parser.py                71     19    73%
	 source_s3/source_files_abstract/formats/csv_spec.py                  14      0   100%
	 source_s3/source_files_abstract/formats/parquet_parser.py            61      3    95%
	 source_s3/source_files_abstract/formats/parquet_spec.py               9      0   100%
	 source_s3/source_files_abstract/source.py                            40     18    55%
	 source_s3/source_files_abstract/spec.py                              42     22    48%
	 source_s3/source_files_abstract/storagefile.py                       16      3    81%
	 source_s3/source_files_abstract/stream.py                           195    103    47%
	 source_s3/stream.py                                                  43     31    28%
	 -------------------------------------------------------------------------------------
	 TOTAL                                                               622    238    62%

@Phlair Phlair requested review from sherifnada and tuliren October 15, 2021 13:17
@Phlair Phlair temporarily deployed to more-secrets October 15, 2021 13:18 Inactive
@jrhizor jrhizor temporarily deployed to more-secrets October 15, 2021 13:18 Inactive
@github-actions github-actions bot added the area/documentation Improvements or additions to documentation label Oct 15, 2021
@Phlair Phlair temporarily deployed to more-secrets October 15, 2021 13:24 Inactive
jzhuan-icims and others added 5 commits October 15, 2021 10:46
…3-csv-advanced-options' into george/fix-s3-oom

# Conflicts:
#	airbyte-integrations/connectors/source-s3/setup.py
#	airbyte-integrations/connectors/source-s3/source_s3/source_files_abstract/stream.py
#	docs/integrations/sources/s3.md
@Phlair Phlair temporarily deployed to more-secrets October 15, 2021 16:30 Inactive
@Phlair
Contributor Author

Phlair commented Oct 15, 2021

/test connector=connectors/source-s3

🕑 connectors/source-s3 https://github.com/airbytehq/airbyte/actions/runs/1346754425
✅ connectors/source-s3 https://github.com/airbytehq/airbyte/actions/runs/1346754425
Python tests coverage:

	 ---------- coverage: platform linux, python 3.8.10-final-0 -----------
	 Name                                                 Stmts   Miss  Cover
	 ------------------------------------------------------------------------
	 source_acceptance_test/__init__.py                       2      0   100%
	 source_acceptance_test/base.py                          10      4    60%
	 source_acceptance_test/config.py                        74      8    89%
	 source_acceptance_test/conftest.py                     108    108     0%
	 source_acceptance_test/plugin.py                        47     47     0%
	 source_acceptance_test/tests/__init__.py                 4      0   100%
	 source_acceptance_test/tests/test_core.py              200     94    53%
	 source_acceptance_test/tests/test_full_refresh.py       18     11    39%
	 source_acceptance_test/tests/test_incremental.py        69     38    45%
	 source_acceptance_test/utils/__init__.py                 6      0   100%
	 source_acceptance_test/utils/asserts.py                 37      2    95%
	 source_acceptance_test/utils/common.py                  41     24    41%
	 source_acceptance_test/utils/compare.py                 47     20    57%
	 source_acceptance_test/utils/connector_runner.py        82     49    40%
	 source_acceptance_test/utils/json_schema_helper.py     115     14    88%
	 ------------------------------------------------------------------------
	 TOTAL                                                  860    419    51%
	 ---------- coverage: platform linux, python 3.8.10-final-0 -----------
	 Name                                       Stmts   Miss  Cover
	 --------------------------------------------------------------
	 base_python/__init__.py                       13      0   100%
	 base_python/catalog_helpers.py                10      6    40%
	 base_python/cdk/__init__.py                    0      0   100%
	 base_python/cdk/abstract_source.py            83     59    29%
	 base_python/cdk/streams/__init__.py            0      0   100%
	 base_python/cdk/streams/auth/__init__.py       0      0   100%
	 base_python/cdk/streams/auth/core.py           8      1    88%
	 base_python/cdk/streams/auth/jwt.py            5      5     0%
	 base_python/cdk/streams/auth/oauth.py         37     26    30%
	 base_python/cdk/streams/auth/token.py          9      4    56%
	 base_python/cdk/streams/core.py               63     32    49%
	 base_python/cdk/streams/exceptions.py         10      2    80%
	 base_python/cdk/streams/http.py               67     33    51%
	 base_python/cdk/streams/rate_limiting.py      30     14    53%
	 base_python/cdk/utils/__init__.py              0      0   100%
	 base_python/cdk/utils/casing.py                4      0   100%
	 base_python/client.py                         56     33    41%
	 base_python/entrypoint.py                     70     56    20%
	 base_python/integration.py                    52     25    52%
	 base_python/logger.py                         33     19    42%
	 base_python/schema_helpers.py                 56     41    27%
	 base_python/source.py                         51     34    33%
	 main_dev.py                                    3      3     0%
	 --------------------------------------------------------------
	 TOTAL                                        660    393    40%
	 ---------- coverage: platform linux, python 3.8.10-final-0 -----------
	 Name                                                              Stmts   Miss  Cover
	 -------------------------------------------------------------------------------------
	 source_s3/__init__.py                                                 2      0   100%
	 source_s3/s3_utils.py                                                20      3    85%
	 source_s3/s3file.py                                                  49      3    94%
	 source_s3/source.py                                                  23      0   100%
	 source_s3/source_files_abstract/__init__.py                           0      0   100%
	 source_s3/source_files_abstract/formats/abstract_file_parser.py      37      2    95%
	 source_s3/source_files_abstract/formats/csv_parser.py                71     20    72%
	 source_s3/source_files_abstract/formats/csv_spec.py                  15      0   100%
	 source_s3/source_files_abstract/formats/parquet_parser.py            61     44    28%
	 source_s3/source_files_abstract/formats/parquet_spec.py               9      0   100%
	 source_s3/source_files_abstract/source.py                            40     18    55%
	 source_s3/source_files_abstract/spec.py                              42     22    48%
	 source_s3/source_files_abstract/storagefile.py                       16      0   100%
	 source_s3/source_files_abstract/stream.py                           195     10    95%
	 source_s3/stream.py                                                  43      3    93%
	 -------------------------------------------------------------------------------------
	 TOTAL                                                               623    125    80%
	 ---------- coverage: platform linux, python 3.8.10-final-0 -----------
	 Name                                                              Stmts   Miss  Cover
	 -------------------------------------------------------------------------------------
	 source_s3/__init__.py                                                 2      0   100%
	 source_s3/s3_utils.py                                                20     13    35%
	 source_s3/s3file.py                                                  49     26    47%
	 source_s3/source.py                                                  23      0   100%
	 source_s3/source_files_abstract/__init__.py                           0      0   100%
	 source_s3/source_files_abstract/formats/abstract_file_parser.py      37      0   100%
	 source_s3/source_files_abstract/formats/csv_parser.py                71     19    73%
	 source_s3/source_files_abstract/formats/csv_spec.py                  15      0   100%
	 source_s3/source_files_abstract/formats/parquet_parser.py            61      3    95%
	 source_s3/source_files_abstract/formats/parquet_spec.py               9      0   100%
	 source_s3/source_files_abstract/source.py                            40     18    55%
	 source_s3/source_files_abstract/spec.py                              42     22    48%
	 source_s3/source_files_abstract/storagefile.py                       16      3    81%
	 source_s3/source_files_abstract/stream.py                           195    103    47%
	 source_s3/stream.py                                                  43     31    28%
	 -------------------------------------------------------------------------------------
	 TOTAL                                                               623    238    62%

@jrhizor jrhizor temporarily deployed to more-secrets October 15, 2021 16:48 Inactive
Contributor

@tuliren tuliren left a comment

I'm afraid that this python code is too fancy for me to provide any useful review comments.

@Phlair Phlair temporarily deployed to more-secrets October 18, 2021 11:14 Inactive
# Conflicts:
#	airbyte-integrations/connectors/source-s3/source_s3/source_files_abstract/stream.py
@Phlair Phlair temporarily deployed to more-secrets October 19, 2021 10:58 Inactive
@Phlair Phlair changed the title 🐛 Source S3 - memory & performance optimisations 🐛 Source S3 - memory & performance optimisations + advanced CSV options Oct 19, 2021
@Phlair Phlair changed the title 🐛 Source S3 - memory & performance optimisations + advanced CSV options 🐛 Source S3 - memory & performance optimisations + advanced CSV options Oct 19, 2021
@Phlair Phlair changed the title 🐛 Source S3 - memory & performance optimisations + advanced CSV options 🎉 Source S3 - memory & performance optimisations + advanced CSV options Oct 19, 2021
@Phlair
Contributor Author

Phlair commented Oct 19, 2021

/publish connector=connectors/source-s3

🕑 connectors/source-s3 https://github.com/airbytehq/airbyte/actions/runs/1358910144
❌ connectors/source-s3 https://github.com/airbytehq/airbyte/actions/runs/1358910144

@jrhizor jrhizor temporarily deployed to more-secrets October 19, 2021 11:32 Inactive
@Phlair
Contributor Author

Phlair commented Oct 19, 2021

/publish connector=connectors/source-s3

🕑 connectors/source-s3 https://github.com/airbytehq/airbyte/actions/runs/1358926853
❌ connectors/source-s3 https://github.com/airbytehq/airbyte/actions/runs/1358926853

@jrhizor jrhizor temporarily deployed to more-secrets October 19, 2021 11:36 Inactive
@davinchia davinchia temporarily deployed to more-secrets October 19, 2021 12:38 Inactive
@Phlair
Contributor Author

Phlair commented Oct 19, 2021

/publish connector=connectors/source-s3

🕑 connectors/source-s3 https://github.com/airbytehq/airbyte/actions/runs/1359321992
❌ connectors/source-s3 https://github.com/airbytehq/airbyte/actions/runs/1359321992

@jrhizor jrhizor temporarily deployed to more-secrets October 19, 2021 13:24 Inactive
@iamzjk
Contributor

iamzjk commented Oct 19, 2021

/publish connector=connectors/source-s3

🕑 connectors/source-s3 https://github.com/airbytehq/airbyte/actions/runs/1359321992
❌ connectors/source-s3 https://github.com/airbytehq/airbyte/actions/runs/1359321992

Looks like the EC2 instance we are using for deployment is missing pip.

@jrhizor jrhizor temporarily deployed to more-secrets October 19, 2021 13:35 Inactive
@Phlair Phlair temporarily deployed to more-secrets October 19, 2021 13:42 Inactive
@Phlair
Contributor Author

Phlair commented Oct 19, 2021

/publish connector=connectors/source-s3

🕑 connectors/source-s3 https://github.com/airbytehq/airbyte/actions/runs/1359415616
❌ connectors/source-s3 https://github.com/airbytehq/airbyte/actions/runs/1359415616

@Phlair Phlair temporarily deployed to more-secrets October 19, 2021 13:46 Inactive
@jrhizor jrhizor temporarily deployed to more-secrets October 19, 2021 13:48 Inactive
@Phlair
Contributor Author

Phlair commented Oct 19, 2021

/publish connector=connectors/source-s3

🕑 connectors/source-s3 https://github.com/airbytehq/airbyte/actions/runs/1359631426
✅ connectors/source-s3 https://github.com/airbytehq/airbyte/actions/runs/1359631426

@jrhizor jrhizor temporarily deployed to more-secrets October 19, 2021 14:40 Inactive
@Phlair Phlair temporarily deployed to more-secrets October 19, 2021 15:01 Inactive
@Phlair Phlair merged commit 1d3a17a into master Oct 19, 2021
@Phlair Phlair deleted the george/fix-s3-oom branch October 19, 2021 15:50
schlattk pushed a commit to schlattk/airbyte that referenced this pull request Jan 4, 2022
…ns (airbytehq#6615)

* memory & performance optimisations

* address comments

* version bump

* added advanced_options for reading csv without header, and more custom pyarrow ReadOptions

* updated to use the latest airbyte-cdk

* updated docs

* bump source-s3 to 0.1.6

* remove unneeded lines

* Use the all dep ami for python builds.

* ec2-instance-id should be ec2-image-id

* ec2-instance-id should be ec2-image-id

Co-authored-by: Jingkun Zhuang <[email protected]>
Co-authored-by: Davin Chia <[email protected]>
Labels
area/connectors Connector related issues area/documentation Improvements or additions to documentation
Development

Successfully merging this pull request may close these issues.

Source S3: Cannot allocate memory
8 participants