Don't package all cloud dependencies at once #443
Comments
I was just about to open this exact same issue, what are the odds!
@Tyrannas @mwozniczak Out of interest, what is the problem with installing all the dependencies? (It's a genuine question. It's obvious that this is an issue for some people, and I'd like to understand why.)
@mpenkov speaking from my experience, there are two main factors: one is that there's a somewhat hard limit on AWS Lambda package size, and when you've got some heavy libs like numpy + pandas, it starts to add up. The second one is that, for some reason, I needed to add an explicit reference to …
@mpenkov same remarks for us: we generally try to keep the number of requirements as low as possible. On bigger projects it reduces the likelihood of version conflicts. Other concerns are build time and Docker image size: fewer requirements mean smaller Docker images and shorter builds, which makes the whole process more efficient.
@Tyrannas The PR has been merged, please go ahead.
Alright, I'll fork and create a WIP PR |
Update on this, @mwozniczak @mpenkov: if I modify the setup to allow `pip install smart_open[boto3]` or `pip install smart_open[gcs]`, then `pip install smart_open` with no extras would only install requests. This means that users who don't pin a version in their dependencies and who use S3 or GCS would then have to either change their requirement to `smart_open[boto3]` or pin a previous version. Are you OK with that?
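For concreteness, a minimal sketch of what that setup.py change could look like (the per-extra dependency lists here are assumptions for illustration, not smart_open's actual ones):

```python
from setuptools import setup

setup(
    name='smart_open',
    # a bare `pip install smart_open` would only pull in requests
    install_requires=['requests'],
    extras_require={
        'boto3': ['boto3'],               # pip install smart_open[boto3]
        'gcs': ['google-cloud-storage'],  # pip install smart_open[gcs]
    },
)
```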
@piskvorky WDYT? Seems like it will inconvenience more people than it would help. @Tyrannas Is there a way to disable certain extras explicitly? That way, people who need a lean install can request one.
A hacky way would be to use an environment variable to specify extras explicitly. When empty, assume the user wants all the extras. So, if you just want S3, you'd do `SMART_OPEN_EXTRAS=s3 pip install smart_open` and then handle that in setup.py with something like:

```python
if os.environ.get('SMART_OPEN_EXTRAS') == 's3':
    setup_requires = [..., 'boto3']
```

This is un-Pythonic, but satisfies the use case without breaking default behavior.
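Fleshing that idea out a little, here is a sketch of how a setup.py could honour the variable, with "empty means everything" as described above (the variable handling and the dependency lists are hypothetical):

```python
import os

from setuptools import setup

# hypothetical mapping of extra names to their dependencies
EXTRA_DEPS = {
    's3': ['boto3'],
    'gcs': ['google-cloud-storage'],
}

requested = os.environ.get('SMART_OPEN_EXTRAS', '')
if requested:
    # lean install: the base dependency plus only the requested extra
    deps = ['requests'] + EXTRA_DEPS.get(requested, [])
else:
    # unset or empty: keep today's behaviour and install everything
    deps = ['requests'] + [d for group in EXTRA_DEPS.values() for d in group]

setup(
    name='smart_open',
    install_requires=deps,
)
```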
Hmm, I'm not a big fan of the env variable option, but I guess it could work. So: …
Yes, I'd prefer the default to remain "use everything" = the current behaviour = the most common use-case (most people don't care). But I like the idea of a lean install too, it's a valid use-case. Not sure what's the best way to achieve that. How about …
Yeah, …
I would avoid letting the user specify which specific packages to install. That's really a task for pip, not for us, because it could involve dependency-management problems. We could do: …

and then let people … @piskvorky's suggestion also looks good: …

but that is also open to dependency problems (e.g. conflicts between the packages required by S3 and GCS).
Well, I'm afraid those solutions would not be user-friendly at all, and they are not easily packageable (which is the real problem here). What I suggest is to hope for an update on the setuptools issue before changing smart_open's packaging, if that suits everyone.
@Tyrannas not sure about that. If the default is "install everything", then I think I personally would be okay with @mpenkov's idea, as it's more of a power-user feature anyway. Side note: are the extras-related values magic in some way in how setuptools works with them? I mean, a lot of the …
@mwozniczak yeah, I know, but what I was saying is that it is not an easily packageable solution. You can indeed write `--install-option` in a requirements.txt, but it seems that it can lead to some side effects: pypa/pip#5795 (comment)
OK, so it sounds like there are no perfect solutions here. …

Out of the above, I think (3) is the way to go, because: …

Thoughts?
I'm not a big fan of it. There's some more discussion about this here: https://stackoverflow.com/questions/8874638/how-to-maintain-pip-install-options-in-requirements-file-made-by-pip-freeze; we'd need to test whether such install options survive pip freeze. Another thing to test: conda environment files, e.g.:

```yaml
name: stats2
channels:
  - javascript
dependencies:
  - python=3.6 # or 2.7
  - bokeh=0.9.2
  - numpy=1.9.*
  - nodejs=0.10.*
  - flask
  - pip:
    - Flask-Testing
```

We should test whether adding a requirement with extras (e.g. `smart_open[s3]`) under the `pip:` section of such a file works. My last thought is: if (when?) pypa/setuptools fixes the issue, will we break compatibility (by removing …)?
Is it really necessary to answer that question now? Personally, I'd deal with it when and if that happens.
@mpenkov if my opinion carries any weight, option 3 would be perfectly fine for me :)
I would suggest using a guarded import:

```python
import sys

try:
    import boto3
except ImportError:
    sys.stderr.write("Do pip install smart_open[aws] to use this module\n")
    sys.exit(1)
```

It would obviously be a breaking change, but to my knowledge … The envvar solution is something I have yet to see in the wild, probably for good reason, and would be a non-starter, at least IMO.
If you provide a bad value for `extras_require` it will give you an error. Try …
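As a side note, for library code raising an exception is often friendlier than exiting the interpreter; the same guard could be written as follows (just a variant sketch, not what the thread settled on):

```python
try:
    import boto3
except ImportError:
    # raise instead of sys.exit(), so callers can catch and handle it
    raise ImportError(
        "smart_open's S3 support requires boto3; "
        "install it with: pip install smart_open[aws]"
    )
```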
This would seem to go well with #428 and be like a 2.0 release, or however you're versioning.
@menshikh-iv @piskvorky WDYT?
@mpenkov I'm OK with …
Just as an FYI, this is the requirements.txt generated from a requirements.in file with just this package as a dependency: …

It would only get worse with Azure support.
Just a suggestion, but might it make some sense to extract the core functionality into a separate "core" package? That repo would have the flavors solely as extras, as was suggested, and this current project would just have …
Well @mpenkov, your call. I'm fine with these last two solutions!
It seems like pulling them out into a "core" package would cause developer toil: every addition or deletion of functionality would require two releases just for the sake of backwards compatibility. Given this doesn't really have to happen now, you could provide the `extras_require` keys in advance, so people can prep their code bases for the drop:

```python
core_deps = []
aws_deps = []
gcp_deps = []
azure_deps = []

all_deps = core_deps + aws_deps + gcp_deps + azure_deps

setup(
    install_requires=core_deps + gcp_deps + aws_deps,
    extras_require={
        "aws": aws_deps,
        "azure": azure_deps,
        "gcp": gcp_deps,
        "all": all_deps,
    },
)
```

The first PR to introduce this could be the #228 PR, as it will introduce the Azure dependencies. If you look at … So with this solution I could change my package today to depend on `smart_open[aws]` and be ready for the eventual drop.
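For illustration, a downstream package could then opt in today; a hypothetical consumer's setup.py might look like this:

```python
from setuptools import setup

setup(
    name='my_downstream_package',  # hypothetical name
    # the [aws] extra pulls in boto3 and friends via smart_open
    install_requires=['smart_open[aws]'],
)
```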
The fix to the backwards-compatibility issue seems trivial compared to having to manage two PyPI packages. Every time you added a new feature you'd have to mimic its import path in the wrapping package, same with deletions, and then you've got to version and deploy it. To fix the backwards-compatibility issue with dependencies, all you'd have to do is change your install to `smart_open[all]`. As far as my last suggestion goes, that is something I'd release sooner rather than later. You wouldn't even have to include the Azure packages: just make it a solo release and clearly explain the upcoming breaking change in the changelog.
@Amertz08 well, I'm no packaging expert, so I'll just have to defer to other people's expertise on the matter.
Problem description
smart_open is a really useful library, but I find it a bit annoying to have all the dependencies packaged with it. Most of the time one doesn't need to manipulate both S3 and GCS (and maybe Azure storage, if that were to be integrated into smart_open).
What I wish I could do:

- `pip install smart-open`: the same behaviour as now
- `pip install smart-open[s3]`: would only install the boto3 dependencies
- `pip install smart-open[gcs]`: same for GCS
- …
Note: if you find this interesting, I can assign the issue to myself and work on it.

BTW: I think it's the same behaviour for gensim; it packages boto3, but that is not needed for common NLP tasks.