Run smart_open[s3] without installing dependencies for other providers. #840

Open
3 tasks done
ODudek opened this issue Oct 11, 2024 · 6 comments

Comments

ODudek commented Oct 11, 2024

Problem description

Be sure your description clearly answers the following questions:

  • What are you trying to achieve?
    I want to use smart_open[s3] without having to install the dependencies for other providers (GCS/Azure).
  • What is the expected result?
    The application should start and work normally.
  • What are you seeing instead?
    pkg_resources.DistributionNotFound: The 'google-cloud-storage' distribution was not found and is required by the application

Steps/code to reproduce the problem

requirements.txt

smart_open[s3]==7.0.4

Application code:

import boto3
from smart_open import open

# config holds the S3 credentials and region; it is loaded elsewhere.
client = boto3.client(service_name='s3',
                      endpoint_url='xxx',
                      aws_access_key_id=config['access_key'],
                      region_name=config['region'],
                      aws_secret_access_key=config['secret_key'])


def open_stream(url: str, mode: str, args: dict):
    # args is forwarded as the S3 client_kwargs transport parameter.
    return open(url,
                mode=mode,
                transport_params={
                    'client': client,
                    'client_kwargs': args,
                })

Simply running the application is enough: right after start-up it fails with pkg_resources.DistributionNotFound: The 'google-cloud-storage' distribution was not found and is required by the application

Versions

Please provide the output of:

import platform, sys, smart_open
print(platform.platform())
print("Python", sys.version)
print("smart_open", smart_open.__version__)

Output:

Linux-5.10.225-213.878.amzn2.x86_64-x86_64-with-debian-bullseye-sid
Python 3.7.17 (default, Sep 19 2023, 14:13:00)
[GCC 9.4.0]
smart_open 7.0.4

Checklist

Before you create the issue, please make sure you have:

  • Described the problem clearly
  • Provided a minimal reproducible example, including any required data
  • Provided the version numbers of the relevant software
ddelange (Contributor)

Can you provide the full error traceback?

ODudek (Author) commented Oct 11, 2024

Sure:

  File "test.py", line 1, in <module>
    from smart_open import open
  File "/app/lib/smart_open/__init__.py", line 34, in <module>
    from .smart_open_lib import open, parse_uri, smart_open, register_compressor  # noqa: E402
  File "/app/lib/smart_open/smart_open_lib.py", line 35, in <module>
    from smart_open import doctools
  File "/app/lib/smart_open/doctools.py", line 21, in <module>
    from . import transport
  File "/app/lib/smart_open/transport.py", line 101, in <module>
    register_transport("smart_open.gcs")
  File "/app/lib/smart_open/transport.py", line 49, in register_transport
    submodule = importlib.import_module(submodule)
  File "/opt/sdk/python_3.7.17.3_x86_64/lib/python3.7/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "/app/lib/smart_open/gcs.py", line 15, in <module>
    import google.cloud.storage
  File "/app/lib/google/cloud/storage/__init__.py", line 36, in <module>
    __version__ = get_distribution("google-cloud-storage").version
  File "/opt/sdk/python_3.7.17.3_x86_64/lib/python3.7/site-packages/pkg_resources/__init__.py", line 482, in get_distribution
    dist = get_provider(dist)
  File "/opt/sdk/python_3.7.17.3_x86_64/lib/python3.7/site-packages/pkg_resources/__init__.py", line 358, in get_provider
    return working_set.find(moduleOrReq) or require(str(moduleOrReq))[0]
  File "/opt/sdk/python_3.7.17.3_x86_64/lib/python3.7/site-packages/pkg_resources/__init__.py", line 901, in require
    needed = self.resolve(parse_requirements(requirements))
  File "/opt/sdk/python_3.7.17.3_x86_64/lib/python3.7/site-packages/pkg_resources/__init__.py", line 787, in resolve
    raise DistributionNotFound(req, requirers)
pkg_resources.DistributionNotFound: The 'google-cloud-storage' distribution was not found and is required by the application

ddelange (Contributor) commented Oct 11, 2024

It looks like /app/lib/google/cloud/storage/__init__.py is present on your system, but the package was probably not installed via pip, or its distribution metadata otherwise cannot be discovered by pkg_resources.

Can you try pip uninstall google-cloud-storage? Does it uninstall anything? Is the file still present on your system afterwards? If it is gone, does your snippet start working?
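One way to check this hypothesis without triggering the failing import is to compare two things: whether Python can locate the google.cloud.storage package files, and whether any installed distribution metadata exists for google-cloud-storage. The helper names below (module_on_path, has_distribution, diagnose) are hypothetical; this is a sketch assuming the standard importlib.util.find_spec and importlib.metadata APIs, with a pkg_resources fallback for Python 3.7, which lacks importlib.metadata.

```python
import importlib.util


def module_on_path(module_name):
    """True if Python can locate module_name's files (does not execute it)."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except ImportError:
        # find_spec imports parent packages first; a missing parent lands here.
        return False


def has_distribution(dist_name):
    """True if installed metadata (.dist-info / .egg-info) exists for dist_name."""
    try:
        from importlib.metadata import PackageNotFoundError, distribution  # Python 3.8+
    except ImportError:
        # Python 3.7 (as in this report): fall back to pkg_resources from setuptools.
        import pkg_resources

        try:
            pkg_resources.get_distribution(dist_name)
            return True
        except pkg_resources.DistributionNotFound:
            return False
    try:
        distribution(dist_name)
        return True
    except PackageNotFoundError:
        return False


def diagnose(module_name, dist_name):
    """Return (module found, metadata found) and flag the mismatch from this issue."""
    found_module = module_on_path(module_name)
    found_dist = has_distribution(dist_name)
    if found_module and not found_dist:
        print(f"{module_name} is importable, but no metadata for {dist_name!r} was found")
    return found_module, found_dist


if __name__ == "__main__":
    print(diagnose("google.cloud.storage", "google-cloud-storage"))
```

A result of (True, False) would match the state suspected above: the package files sit on sys.path (here under /app/lib), but no pip-style metadata accompanies them.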

ddelange (Contributor) commented Oct 11, 2024

smart_open only catches ImportError to skip libraries that are not installed. On your system, however, google.cloud.storage can be (partially) imported without being properly installed: during import, google-cloud-storage checks its own installation via pkg_resources, which raises DistributionNotFound. That is not an ImportError, so it is not caught and the import fails hard.

ODudek (Author) commented Oct 11, 2024

I’ll try it later, but I’m curious why smart_open looks for those packages when I only want to use S3.

ddelange (Contributor) commented Oct 11, 2024

It does so to populate smart_open.transport.SUPPORTED_SCHEMES (and the underlying _REGISTRY; both are used by get_transport), which has been part of the public API since v1.11.0.
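The sketch below suggests why a scheme list like this has to be computed by attempting every backend import up front: only imports that succeed end up in the registry, and the tuple is built before any caller says which scheme it needs. The candidate map is hypothetical, and standard-library modules merely stand in for real backends; "no_such_gcs_backend" plays the role of a missing extra.

```python
import importlib

# Hypothetical scheme -> backend-module map. "os" and "json" are stand-ins
# for working backends; "no_such_gcs_backend" stands in for a missing extra.
_CANDIDATES = {
    "file": "os",
    "s3": "json",
    "gs": "no_such_gcs_backend",
}

_REGISTRY = {}

for _scheme, _module_name in _CANDIDATES.items():
    try:
        _REGISTRY[_scheme] = importlib.import_module(_module_name)
    except ImportError:
        pass  # dependency not installed: the scheme is simply left out

# Computed once, at import time, from whatever imported successfully.
SUPPORTED_SCHEMES = tuple(sorted(_REGISTRY))


def get_transport(scheme):
    """Return the backend registered for scheme, or explain why it is missing."""
    try:
        return _REGISTRY[scheme]
    except KeyError:
        raise NotImplementedError(
            f"scheme {scheme!r} is not supported; supported: {SUPPORTED_SCHEMES}"
        ) from None


if __name__ == "__main__":
    print(SUPPORTED_SCHEMES)  # ('file', 's3')
```

Because the registry is filled while the module itself is being imported, a backend that is importable but raises something other than ImportError aborts the whole import, even for callers who only ever use S3.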
