Up to date monolithic gem in addition to existing per-service gems #19202

Closed
mattwelke opened this issue Sep 20, 2022 · 2 comments

Comments

mattwelke commented Sep 20, 2022

Is your feature request related to a problem? Please describe.
I want to put together an Apache OpenWhisk runtime with gems for Google Cloud built in. Because the Google Cloud client libraries for Ruby are published as individual gems, this means in order to create a runtime image that includes all client libraries, I'd need to include each gem in the Gemfile. If I wanted to avoid having to manually review which gems are available in the future, I'd need to put together a script that scrapes public info like the contents of this GitHub repo to automatically keep the list of gems up to date, and use this script in the runtime image's build process.
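As a rough illustration of the build-time script mentioned above, here is a minimal sketch that turns a list of names into Gemfile text. It assumes the names have already been fetched somehow (for example, from the repo's top-level directory listing); the helper `render_gemfile` and the `google-cloud-` prefix filter are hypothetical and only for illustration.

```ruby
# Hypothetical helper: turn a list of entry names into Gemfile text.
# Assumption: gem directories can be recognized by a "google-cloud-" prefix.
def render_gemfile(entries, prefix: "google-cloud-")
  gems = entries.select { |name| name.start_with?(prefix) }.uniq.sort
  lines = ['source "https://rubygems.org"', ""]
  lines.concat(gems.map { |name| %(gem "#{name}") })
  lines.join("\n") + "\n"
end

# Sample input; a real build step would fetch this list instead.
entries = ["google-cloud-storage", "README.md", "google-cloud-bigquery", ".github"]
puts render_gemfile(entries)
```

The generated text could be written to the runtime image's Gemfile before running `bundle install`. How the list of names is obtained is the part that would need ongoing maintenance.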

Describe the solution you'd like
Taking inspiration from AWS, with their aws-sdk gem, I'd like to see a "gcp-sdk" or "google-cloud" gem. Funnily enough, I did find such a gem already called google-cloud, but it appears to be out of date, with its last update being in 2019. This gem appears to have been described (using the alias "gcloud gem") in #341, where it was stated:

I think we should follow what rspec does and have the main gcloud gem have dependencies on all the service gems.

So this idea may have been considered before.

Describe alternatives you've considered
The approaches described above, where I either maintain the list of gems myself or create build automation tooling on my side to determine the list of gems automatically on each build.

Additional context
To help explain where this need comes from, I should explain how OpenWhisk custom runtime images generally work.

If an OpenWhisk user wants to create an action (aka FaaS function) that interacts with a Google Cloud service, they can specify just the gems they need in their Gemfile and include the installed gems in the ZIP file they upload to OpenWhisk as part of creating or updating their action.

However, there is a limit of 48MB of code for an action, and a long list of dependencies can put a user over this limit. They can make a custom runtime image with the dependencies they need, but this can be a lot of work for OpenWhisk users who aren't familiar with Docker and for users who don't want to maintain that.

In the middle, between the built in runtime images that try to appeal to every user and the custom runtime images made by one user to appeal to only them, there can be runtime images maintained by the community. For example, a runtime image could target OpenWhisk users who also use GCP. This example is the use case I'm trying to fulfill with this feature, should it be accepted. I have use for this kind of runtime myself right now (for interacting with BigQuery from an OpenWhisk action deployed to non-GCP infra), and I could include all Google Cloud service gems in an image that I share with the community so that others can use it too.

mattwelke changed the title from "Up to date monolithic gem in addition to exist per-service gems" to "Up to date monolithic gem in addition to existing per-service gems" on Sep 20, 2022
Member

dazuma commented Sep 28, 2022

Hi @mattwelke, thanks for the feedback and idea!

This is a really interesting discussion because, as you noticed, we did previously maintain a monolithic gem called google-cloud, but discontinued it. Originally it was an "sdk" type gem that directly contained the code for all Google Cloud clients. Later, after we split it into separate gems for each client, we changed google-cloud to an empty gem that included all the individual client gems as dependencies.
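For readers unfamiliar with the "empty gem that depends on everything" pattern, a minimal sketch of such a gemspec follows. The gem name `google-cloud-everything` and the three-item dependency list are hypothetical and chosen for illustration; this is not the actual google-cloud gemspec.

```ruby
require "rubygems"

# Hypothetical, shortened list; a real meta-gem would list every client gem.
CLIENT_GEMS = %w[google-cloud-bigquery google-cloud-pubsub google-cloud-storage].freeze

spec = Gem::Specification.new do |s|
  s.name    = "google-cloud-everything" # hypothetical name
  s.version = "0.1.0"
  s.summary = "Meta-gem that only pulls in other gems"
  s.authors = ["example"]
  s.files   = [] # the gem ships no code of its own
  CLIENT_GEMS.each { |name| s.add_dependency(name) }
end

# Installing the meta-gem would install each of these dependencies.
puts spec.runtime_dependencies.map(&:name)
```

Every new service adds one more entry to the dependency list, which is where the scaling concern comes from.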

But later we discontinued this gem for one simple reason: scale. It became far too unwieldy to be useful. As a "contain all the code" gem, it was so large that it would take a long time to download and install, and it would break documentation generators due to its sheer size. As a "depend on all the gems" gem, it would take even longer to install because of the sheer number of APIs: the dependency list today would be around 300 and growing rapidly. And the APIs now being added are getting more and more special-purpose, and we'd have to answer the question of whether to make the gem bigger for everyone to meet the needs of the smaller and smaller percentage of users who want that next special-purpose service. In a world with literally hundreds of services, a "do everything" gem was becoming less and less useful for people. Does your BigQuery image really need the bloat of 300 additional clients for government assured workloads, contact center AI, and private certificate authorities?

For your purposes, I think you have the additional challenge of gem versions. Your users may have different requirements for different services, and I don't see a good way to satisfy all of them with a single set of gems or a single monolithic gem. Maybe the latest release of every semver-major version of every service, but even that can blow up in size pretty quickly.
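To make "the latest release of every semver-major version" concrete, here is a small sketch using `Gem::Version` from RubyGems. The helper `latest_per_major` and the sample version strings are made up for illustration; where the list of published versions comes from is left out.

```ruby
require "rubygems"

# Returns the newest stable release within each major version.
# Prereleases are skipped; that choice is an assumption of this sketch.
def latest_per_major(version_strings)
  version_strings
    .map { |v| Gem::Version.new(v) }
    .reject(&:prerelease?)
    .group_by { |v| v.segments.first }
    .map { |_major, versions| versions.max }
    .sort
    .map(&:to_s)
end

# Made-up version history for a single gem.
p latest_per_major(%w[1.2.0 1.10.1 2.0.0 2.3.4 2.3.4.pre 0.9.0])
```

Each client gem would contribute one entry per major version, so with around 300 gems even two majors apiece would mean roughly 600 installed gems.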

One way forward might be to choose a subset of common services and preinstall only those clients. That list might include bigquery, storage, pubsub, maybe some of the databases, some of the ops/telemetry tools. But choosing that subset for your particular users/community is probably something you'd be better suited to do than Google would.

Any thoughts?

Author

mattwelke commented Sep 28, 2022

Thanks for that background. That definitely explains why you chose to abandon that strategy. It appears AWS does things a bit differently. There are far fewer dependencies in their "depend on all the gems" gem compared to the number of GCP APIs that exist according to your explanation. Perhaps that makes it more feasible for them to do it. Or perhaps they also have a huge number of APIs and they've chosen a few popular ones as a subset of APIs to include in their monolithic gem.

One thing I'll mention is that for my use case, it's actually not a problem for the list of dependencies to be so huge that it takes a long time for them to download and inflates build artifact size. OpenWhisk runtime images are normally on the large side. They include popular dependencies (ex. numpy in Python) and also include build tools so that users can deploy source code and the runtime can compile their source code during a cold start for the function. The build tools thing is for things like Java and Go, not things like Ruby, though.

And because most users are using the same runtime, which uses the same Docker image, to deploy functions, the Docker images are cached by the nodes, not downloaded often.

So what some people might consider "too big" (ex. 1GB of dependencies that take 5 mins to download) might be completely fine for this use case.

But overall, I agree with your assessment. It makes sense to shift the responsibility for putting together packages of gems to the community. In my use case, an OW image to support OW users who want a multi-cloud architecture with GCP (ex. want to use BigQuery as their data warehouse), I'd probably pick the gems for the most popular services, or go for a complete list of gems as long as it wouldn't cause too much major version chaos (bumping the image's major version every time one of the gems had its major version bumped).
