-
Notifications
You must be signed in to change notification settings - Fork 247
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[docker, scorecard] add docker base image, use in scorecard #5623
Conversation
docker/Dockerfile.base
Outdated
pip decorator pylint pytest flake8 \ | ||
requests \ | ||
jinja2 \ | ||
aiohttp aiodns aiohttp_jinja2 uvloop>=0.12 \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add asyncinit
? https://github.com/kchmck/pyasyncinit I'm going to use this for the MySQL PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
<?xml version="1.0"?> | ||
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?> | ||
|
||
<configuration> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is this file used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's used in the spark-base image to configure the Google hadoop connector.
I agree about conda and having a set of docker images for the whole repo. I ran into other problems with conda and inter-project dependencies last week. The solution was to not rely on conda to specify the environment and just use |
Agreed. We just need to have the discipline to have all projects work with a fixed set of versions. What features of 3.7 does ci rely on? |
Not sure. @danking Do you know why |
It uses it because 3.7 was the newest when I started writing CI. Apparently asyncio got a lot better in 3.7. I think we should use 3.7 across the board for our non-user-distributed projects. https://docs.python.org/3/whatsnew/3.7.html#whatsnew37-asyncio |
OK, I'll upgrade to 3.7. There is a Python ppa with the latest version. |
I am (temporarily) defeated. Getting a working python 3.7 with pip on a standard distribution turns out to be non-trivial. Since ci doesn't specifically rely on 3.7 yet, I say we start with 3.6 and upgrade when it isn't too painful. It seems everyone is onboard with this. I am happy to move forward if you are, @jigold. |
We spoke about this in person, but I had this mostly typed up in this buffer so I'll leave it here for future us. Some context on the docker build cache'ing situation. Originally, Docker would use any image layer it had as a cache source. This was noted as a severe security vulnerability because I could make an image that claims to be the result of In response, Docker banned the use of non-locally-built (i.e. from the Internet) image layers as cache sources. I believe there may have been other motivations as well, but I did not carefully investigate. moby/moby#26065 documents the desire for a way to use non-locally-built images as a cache source. moby/moby#26839 implements this. Unfortunately, and I cannot find documentation on this, Another subtlety of Docker seems to be in maintenance mode and all new development is going into Moby. The replacement for |
@danking img: https://github.com/genuinetools/img "Standalone, daemon-less, unprivileged Dockerfile and OCI compatible container image builder. img is more cache-efficient than Docker and can also execute multiple build stages concurrently, as it internally uses BuildKit's DAG solver. The commands/UX are the same as docker {build,tag,push,pull,login,logout,save} so all you have to do is replace docker with img in your scripts, command line, and/or life." Oops, seems it doesn't quite work unprivileged yet, see: https://github.com/genuinetools/img#running-with-docker. Waiting on an upstream docker change, no movement in two months. Hrm. |
@cseed that seems good enough for our purposes. We can run it, unprivileged, on a VM outside of k8s. We have a thin service in k8s that has privilege to talk to that VM. It streams build context and docker file to said VM. That VM invokes |
FYI @danking @jigold @akotlar This might be contentious. The wisdom is to make docker images small, but I think this will greatly improve our docker pull/build/push times.
Watching some recent builds and deploys, it seems insane, the time we spend and the number of images we build with (mostly) duplicate contents. The goal here is to build a common base image for all containers in our cluster. I built on Ubuntu instead of Debian because it has Python 3.6 by default. It contains useful stuff I often find myself installing when debugging inside the cluster (htop, emacs, others should feel free to add tools they like), python3 and pip packages needed by any service (no conda, thank you very much), jvm and the google-cloud-sdk with kubectl. I also build a spark-base (built on base) with spark and the google hadoop connector.
Currently, I just use the base in scorecard. If we like this, I'll convert the other services (and pr-builder, which is hopefully not long for this world) on top of this. I will also make image fetcher fetch the base image as well as the notebook images.
The current scorecard image is ~900MB. The new base image is ~2GB, and scorecard adds an additional 15KB to that.