
GH-42134: [C++][FS][Azure] Validate AzureOptions::{blob,dfs}_storage_scheme #42135

Merged
2 commits merged into apache:main on Jun 14, 2024

Conversation

@kou (Member) commented on Jun 13, 2024

Rationale for this change

This is for showing a user-friendly error message for an invalid {blob,dfs}_storage_scheme.

What changes are included in this PR?

Validate {blob,dfs}_storage_scheme before we use them in Make{Blob,DataLake}ServiceClient().
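
As a minimal sketch of the user-facing effect (assuming PyArrow is built with the Azure filesystem enabled; the fake account values are placeholders and the error text is taken from the CI log further down in this conversation):

```python
import pyarrow as pa
from pyarrow.fs import AzureFileSystem

# Valid schemes are accepted as before.
fs = AzureFileSystem(account_name='fake-account-name',
                     account_key='fakeaccountkey',
                     blob_storage_scheme='https',
                     dfs_storage_scheme='https')

# An invalid scheme is now rejected up front, at construction time,
# instead of surfacing later inside Make{Blob,DataLake}ServiceClient().
try:
    AzureFileSystem(account_name='fake-account-name',
                    account_key='fakeaccountkey',
                    blob_storage_scheme='fake-blob-scheme')
except pa.ArrowInvalid as e:
    # e.g. "AzureOptions::blob_storage_scheme must be http or https: fake-blob-scheme"
    print(e)
```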

Are these changes tested?

Yes.

Are there any user-facing changes?

Yes.


⚠️ GitHub issue #42134 has been automatically assigned in GitHub to PR creator.

@github-actions bot added the "awaiting merge" label and removed the "awaiting committer review" label on Jun 13, 2024
@kou (Member, Author) commented on Jun 14, 2024

Oh, we use an invalid scheme in the PyArrow tests. I'll fix the tests.

https://github.com/apache/arrow/actions/runs/9497207784/job/26173266311?pr=42135#step:6:5657

=================================== FAILURES ===================================
_____________________ test_azurefs_options[builtin_pickle] _____________________

pickle_module = <module 'pickle' from '/opt/conda/envs/arrow/lib/python3.9/pickle.py'>

    @pytest.mark.azure
    def test_azurefs_options(pickle_module):
        from pyarrow.fs import AzureFileSystem
    
        fs1 = AzureFileSystem(account_name='fake-account-name')
        assert isinstance(fs1, AzureFileSystem)
        assert pickle_module.loads(pickle_module.dumps(fs1)) == fs1
    
        fs2 = AzureFileSystem(account_name='fake-account-name',
                              account_key='fakeaccountkey')
        assert isinstance(fs2, AzureFileSystem)
        assert pickle_module.loads(pickle_module.dumps(fs2)) == fs2
        assert fs2 != fs1
    
>       fs3 = AzureFileSystem(account_name='fake-account', account_key='fakeaccount',
                              blob_storage_authority='fake-blob-authority',
                              dfs_storage_authority='fake-dfs-authority',
                              blob_storage_scheme='fake-blob-scheme',
                              dfs_storage_scheme='fake-dfs-scheme')

opt/conda/envs/arrow/lib/python3.9/site-packages/pyarrow/tests/test_fs.py:1450: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
pyarrow/_azurefs.pyx:109: in pyarrow._azurefs.AzureFileSystem.__init__
    ???
pyarrow/error.pxi:154: in pyarrow.lib.pyarrow_internal_check_status
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

>   ???
E   pyarrow.lib.ArrowInvalid: AzureOptions::blob_storage_scheme must be http or https: fake-blob-scheme

pyarrow/error.pxi:91: ArrowInvalid
______________________ test_azurefs_options[cloudpickle] _______________________

pickle_module = <module 'cloudpickle' from '/opt/conda/envs/arrow/lib/python3.9/site-packages/cloudpickle/__init__.py'>

    @pytest.mark.azure
    def test_azurefs_options(pickle_module):
        from pyarrow.fs import AzureFileSystem
    
        fs1 = AzureFileSystem(account_name='fake-account-name')
        assert isinstance(fs1, AzureFileSystem)
        assert pickle_module.loads(pickle_module.dumps(fs1)) == fs1
    
        fs2 = AzureFileSystem(account_name='fake-account-name',
                              account_key='fakeaccountkey')
        assert isinstance(fs2, AzureFileSystem)
        assert pickle_module.loads(pickle_module.dumps(fs2)) == fs2
        assert fs2 != fs1
    
>       fs3 = AzureFileSystem(account_name='fake-account', account_key='fakeaccount',
                              blob_storage_authority='fake-blob-authority',
                              dfs_storage_authority='fake-dfs-authority',
                              blob_storage_scheme='fake-blob-scheme',
                              dfs_storage_scheme='fake-dfs-scheme')

opt/conda/envs/arrow/lib/python3.9/site-packages/pyarrow/tests/test_fs.py:1450: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
pyarrow/_azurefs.pyx:109: in pyarrow._azurefs.AzureFileSystem.__init__
    ???
pyarrow/error.pxi:154: in pyarrow.lib.pyarrow_internal_check_status
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

>   ???
E   pyarrow.lib.ArrowInvalid: AzureOptions::blob_storage_scheme must be http or https: fake-blob-scheme

pyarrow/error.pxi:91: ArrowInvalid
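
A possible adjustment to the failing test, assuming the simplest fix of switching the fake schemes to values the new validation accepts while keeping the fake authorities (the actual change applied in this PR may differ in detail):

```python
# Hypothetical fix for test_azurefs_options in pyarrow/tests/test_fs.py.
from pyarrow.fs import AzureFileSystem

fs3 = AzureFileSystem(account_name='fake-account', account_key='fakeaccount',
                      blob_storage_authority='fake-blob-authority',
                      dfs_storage_authority='fake-dfs-authority',
                      blob_storage_scheme='http',  # was 'fake-blob-scheme'
                      dfs_storage_scheme='http')   # was 'fake-dfs-scheme'
assert isinstance(fs3, AzureFileSystem)
```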

@kou (Member, Author) commented on Jun 14, 2024

+1

Failures are unrelated.

@kou merged commit d078d5c into apache:main on Jun 14, 2024
34 of 38 checks passed
@kou deleted the cpp-azurefs-validate-scheme branch on June 14, 2024 at 01:34
@kou removed the "awaiting merge" label on Jun 14, 2024
@felipecrv (Contributor) commented, quoting the above:

"Oh, we use an invalid scheme in the PyArrow tests. I'll fix the tests."

Impressive. @pitrou was right about always validating :)


After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit d078d5c.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 28 possible false positives for unstable benchmarks that are known to sometimes produce them.
