Fix GCSToGCSOperator copying list of objects without wildcard #28111

Merged: 1 commit, Dec 5, 2022


moiseenkov (Contributor) commented Dec 5, 2022

Changes:

  • fixed GCSToGCSOperator for the case of copying a list of objects without a wildcard
  • fixed and slightly refactored the unit tests
  • fixed the example DAG

The changes to GCSToGCSOperator cover the following cases:

  1. Copy a list of files into a folder.
copy_files = GCSToGCSOperator(
    task_id='copy_files_without_wildcard',
    source_bucket=SOURCE_BUCKET,
    source_objects=['src/file_1.txt', 'src/file_2.csv'],
    destination_bucket=TARGET_BUCKET,
    destination_object='new_folder/'
)

The previous implementation did not actually copy the files; it only created an empty destination folder. This fix performs the actual copy of the listed files into the specified destination folder.
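To make the intended result concrete, here is a minimal sketch of where each listed file might be expected to land. It assumes each file is placed directly under the destination folder under its base name. The description above does not spell out the resulting object names, so that mapping and the helper name `expected_destination` are illustrative assumptions, not the operator's actual code.

```python
import posixpath


def expected_destination(source_object: str, destination_folder: str) -> str:
    """Hypothetical helper: place a file under a folder by its base name."""
    # GCS object names use "/" as a separator, so posixpath fits here.
    file_name = posixpath.basename(source_object)
    # Normalize so exactly one "/" separates the folder from the file name.
    return destination_folder.rstrip("/") + "/" + file_name


# Assumed mapping for the two files from the example above.
for src in ["src/file_1.txt", "src/file_2.csv"]:
    print(src, "->", expected_destination(src, "new_folder/"))
```

Checking the destination bucket for these object names after the task runs is one way to confirm that files were copied rather than only an empty folder being created.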

  2. Copy a folder without a trailing slash.
copy_files_from_folder = GCSToGCSOperator(
    task_id='copy_folder_without_trailing_slash',
    source_bucket=SOURCE_BUCKET,
    source_objects=['test_folder'],
    destination_bucket=TARGET_BUCKET,
    destination_object='new_folder/'
)

For example, suppose there is a folder test_folder/ containing a file test_folder/file.txt. If the trailing slash is omitted from the source folder name, the previous implementation did not copy file.txt; instead, it created two files, test_folder and new_folderfile.txt. There appear to be two bugs here:
a) a file new_folder was created instead of a folder new_folder/;
b) the wrong path new_folderfile.txt was generated for the copied file instead of new_folder/file.txt.
This fix resolves both problems.
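The path handling described in case 2 can be sketched as follows. This is a minimal illustration, not the operator's implementation: the helper name `rebase_object` and its parameters are hypothetical. It shows one way to treat the source prefix as a folder whether or not it carries a trailing slash, so the copied file ends up at new_folder/file.txt.

```python
def rebase_object(source_object: str, source_prefix: str, destination_folder: str) -> str:
    """Hypothetical helper: move an object path from a source folder to a destination folder."""
    # Treat the prefix as a folder whether or not the trailing slash was written.
    folder_prefix = source_prefix.rstrip("/") + "/"
    if not source_object.startswith(folder_prefix):
        raise ValueError(f"{source_object!r} is not inside {folder_prefix!r}")
    # Path of the object relative to the source folder.
    relative_path = source_object[len(folder_prefix):]
    # Re-join with exactly one "/" so the folder and file names never fuse.
    return destination_folder.rstrip("/") + "/" + relative_path


# Both spellings of the source folder give the same destination path.
print(rebase_object("test_folder/file.txt", "test_folder", "new_folder/"))
print(rebase_object("test_folder/file.txt", "test_folder/", "new_folder/"))
```

Normalizing both the source prefix and the destination folder before joining keeps the result independent of how the caller wrote the slashes.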

moiseenkov requested a review from turbaszek as a code owner December 5, 2022 09:21
boring-cyborg bot added the area:providers, area:system-tests, and provider:google (Google (including GCP) related issues) labels Dec 5, 2022
potiuk merged commit 3fef462 into apache:main Dec 5, 2022
potiuk (Member) commented Dec 5, 2022

Thanks!
