Implement transfer for BigQuery - read/write #1829

Merged (38 commits) on Mar 16, 2023

Conversation

rajaths010494 (Contributor) commented Mar 6, 2023

Please describe the feature you'd like to see

  • Add DataProvider for BigQuery with read/write methods
  • Add non-native transfer implementation for GCS to BigQuery
  • Add non-native transfer implementation for S3 to BigQuery
  • Add non-native transfer example DAG for BigQuery to Sqlite
  • Add non-native transfer example DAG for BigQuery to Snowflake
  • Add example DAG
  • Add tests with 90% coverage

Acceptance Criteria

  • All checks and tests in the CI should pass
  • Unit tests (90% code coverage or more, once available)
  • Integration tests (if the feature relates to a new database or external service)
  • Example DAG
  • Docstrings in reStructuredText for each method, class, function, and module-level attribute (including an example DAG showing how it should be used)
  • Exception handling in case of errors
  • Logging (are we exposing useful information to the user? e.g. source and destination)
  • Improve the documentation (README, Sphinx, and any other relevant)
  • How-to guide for the feature (with an example)

closes: #1732
closes: #1785
closes: #1730

@rajaths010494 rajaths010494 changed the title from "Bigquery dataprovider" to "Implement transfer for BigQuery - read/write" on Mar 6, 2023
codecov bot commented Mar 6, 2023

Codecov Report

Patch coverage: 78.37% and project coverage change: +0.93 🎉

Comparison is base (bf189de) 85.72% compared to head (daaace8) 86.66%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1829      +/-   ##
==========================================
+ Coverage   85.72%   86.66%   +0.93%     
==========================================
  Files         124      125       +1     
  Lines        6485     6612     +127     
  Branches      643      648       +5     
==========================================
+ Hits         5559     5730     +171     
+ Misses        791      741      -50     
- Partials      135      141       +6     
Flag Coverage Δ
UTO 65.96% <78.37%> (+5.82%) ⬆️
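
The headline percentages in the coverage diff can be reproduced from the hit, miss, and partial counts. The sketch below assumes line coverage is computed as hits divided by the sum of hits, misses, and partials; the function name `line_coverage` is hypothetical and only serves the illustration.

```python
def line_coverage(hits: int, misses: int, partials: int) -> float:
    """Return line coverage as a percentage of all tracked lines.

    Assumption: partially covered lines count toward the total but not as hits.
    """
    total = hits + misses + partials
    return 100 * hits / total


# Counts taken from the coverage diff for the base and head commits.
base = line_coverage(hits=5559, misses=791, partials=135)
head = line_coverage(hits=5730, misses=741, partials=141)

print(f"base: {base:.2f}%  head: {head:.2f}%")
```

In both cases the three counts add up to the reported line totals (6485 and 6612).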

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
...ersal_transfer_operator/data_providers/__init__.py 100.00% <ø> (ø)
...sfer_operator/data_providers/database/snowflake.py 80.00% <0.00%> (+5.00%) ⬆️
...universal_transfer_operator/data_providers/base.py 57.14% <50.00%> (-0.31%) ⬇️
...perator/data_providers/database/google/bigquery.py 71.25% <71.25%> (ø)
...ransfer_operator/data_providers/database/sqlite.py 80.00% <75.00%> (+5.00%) ⬆️
..._transfer_operator/data_providers/database/base.py 72.27% <77.77%> (+3.69%) ⬆️
...ransfer_operator/data_providers/filesystem/base.py 75.47% <80.00%> (-1.98%) ⬇️
...ansfer_operator/data_providers/filesystem/local.py 82.92% <93.75%> (+6.00%) ⬆️
...nsfer_operator/data_providers/filesystem/aws/s3.py 68.68% <100.00%> (+2.02%) ⬆️
...ator/data_providers/filesystem/google/cloud/gcs.py 68.36% <100.00%> (+4.15%) ⬆️
... and 2 more

... and 8 files with indirect coverage changes


@utkarsharma2 utkarsharma2 mentioned this pull request Mar 9, 2023
2 tasks
@utkarsharma2 utkarsharma2 marked this pull request as ready for review March 9, 2023 09:11
@utkarsharma2 utkarsharma2 merged commit d06d3e7 into main Mar 16, 2023
@utkarsharma2 utkarsharma2 deleted the bigquery_dataprovider branch March 16, 2023 09:31
sunank200 added a commit that referenced this pull request Mar 24, 2023
# Description
## What is the current behavior?
[PR 1829](#1829) introduced a bug: it does not handle the case where
the file path is a file pattern or a directory.
<img width="1680" alt="Screenshot 2023-03-24 at 12 23 45 AM"
src="https://user-images.githubusercontent.com/8670962/227315288-b11bc232-65a0-458f-a217-b74c9b882b07.png">


When the destination dataset is a folder, the filename from the source
is not passed along at all here:
https://github.com/astronomer/astro-sdk/blame/1b35bb9443d57ef7834259a3cd3c26695ea5fcf8/universal_transfer_operator/src/universal_transfer_operator/data_providers/filesystem/base.py#L131
unlike the previous code at:
https://github.com/astronomer/astro-sdk/blame/bf189de90e4ad9f2db77c7ac46a7193bde9a9c92/universal_transfer_operator/src/universal_transfer_operator/data_providers/filesystem/base.py#L123


closes: #1870


## What is the new behavior?
- Check whether the provided path is a directory or a file pattern and,
if so, append the actual file name of the source dataset, taken from
`FileStream`

```python
# Fragment from a provider method; needs `import os` at module level.
destination_file = self.dataset.path
# If the destination dataset is a folder or file pattern, append the source file name.
if self.dataset.is_pattern():
    destination_file = os.path.join(self.dataset.path, os.path.basename(source_ref.actual_filename))
```
- Add the test
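
The behavior described above can be sketched as a standalone function, which makes it easy to check without a provider instance. This is a minimal illustration, not the library's code: `looks_like_pattern` is a hypothetical stand-in for `dataset.is_pattern()` that assumes a path without a file extension is a folder or pattern, and `resolve_destination` is a hypothetical helper name. `posixpath` is used in place of `os.path` so the result is the same on every platform.

```python
import posixpath


def looks_like_pattern(path: str) -> bool:
    """Hypothetical stand-in for ``dataset.is_pattern()``.

    Assumption: a path with no file extension is a folder or file pattern.
    """
    return not posixpath.splitext(path)[1]


def resolve_destination(destination_path: str, source_filename: str) -> str:
    """Return the path to write to for a single source file."""
    if looks_like_pattern(destination_path):
        # Folder or pattern: keep the source file's own name under the destination.
        return posixpath.join(destination_path, posixpath.basename(source_filename))
    # Explicit file path: write exactly where the caller asked.
    return destination_path
```

With these assumptions, a folder destination such as `/tmp/out/` combined with the source `s3://bucket/in/data.csv` resolves to `/tmp/out/data.csv`, while an explicit destination file path is returned unchanged.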

<img width="1719" alt="Screenshot 2023-03-24 at 12 44 01 AM"
src="https://user-images.githubusercontent.com/8670962/227320470-48bfc6b7-54d9-49eb-b6d5-17efd129f9d1.png">


## Does this introduce a breaking change?
No

### Checklist
- [x] Created tests which fail without the change (if possible)
- [x] Extended the README / documentation, if necessary

---------

Co-authored-by: utkarsh sharma <[email protected]>
sunank200 added a commit to astronomer/apache-airflow-providers-transfers that referenced this pull request Mar 24, 2023