Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update docker personality to accept API versions > 1.23 #3415

Closed
sflxn opened this issue Nov 30, 2016 · 32 comments · Fixed by #3756
Closed

Update docker personality to accept API versions > 1.23 #3415

sflxn opened this issue Nov 30, 2016 · 32 comments · Fixed by #3756
Assignees
Labels
area/docker Support for the Docker operations component/persona/docker kind/investigation A scoped effort to learn the answers to a set of questions which may include prototyping priority/p2
Milestone

Comments

@sflxn
Copy link
Contributor

sflxn commented Nov 30, 2016

As a user of VIC, I want to use the latest version of the docker CLI and use it with the latest version of VIC without getting a version mismatch error like the following,

Error response from daemon: client is newer than server (client API version: 1.24, server API version: 1.23)

As a user of VIC, I should also not have to set DOCKER_API_VERSION environment variable just to use VIC.

One option:

We need to update the docker personality server to support the upcoming API version 1.25.

  1. Update vendored docker sources to 1.13
  2. Remove vendored engine-api as it is now deprecated and moved into the main docker sources
  3. Update the docker personality's backend code
  4. Update the docker personality's main.go code
@sflxn sflxn added area/docker Support for the Docker operations component/persona/docker labels Nov 30, 2016
@sflxn sflxn self-assigned this Nov 30, 2016
@sflxn
Copy link
Contributor Author

sflxn commented Dec 1, 2016

Docker has added a few more backends that we'll have to return not implemented messages.

checkpoint
plugin
swarm

@sflxn
Copy link
Contributor Author

sflxn commented Dec 1, 2016

They've added context to a few previous calls, such as container logs and stats. They've also added new parameters to a lot of previous calls. After I get everything to build again, I'll need to revisit the functions that have change and see what we'll need to do to support the new parameters.

@stuclem
Copy link
Contributor

stuclem commented Dec 1, 2016

@sflxn I take it that this is not for 0.8?

@mdubya66 mdubya66 added this to the v1.0 milestone Dec 1, 2016
@mdubya66
Copy link
Contributor

mdubya66 commented Dec 1, 2016

updated the milestone. We will take this sooner if it is ready but it is required for 1.0

@sflxn
Copy link
Contributor Author

sflxn commented Dec 2, 2016

@stuclem Yeah, this is post GA.

@sflxn
Copy link
Contributor Author

sflxn commented Dec 2, 2016

There's quite a bit of changes. I've had to update the vendor for the following:

  1. github.com/docker/docker to tag v1.13.0-rc2. This will likely get updated again to the full 1.13 release. I won't do a PR till they do the official release.
  2. github.com/docker/libnetwork to master. Unfortunately, the docker 1.13RC3 does not seem to compile well with the latest libnetwork release. I have a feeling once docker 1.13 is released, libnetwork will have a release at the same time.
  3. Remove github.com/docker/engine-api
  4. github.com/vishvananda/netlink to latest. Unfortunately, they have no official release tags.

Currently, going through the code to update the type changes docker has made. As the personality code gets more and more fleshed out, this job of keeping up with the API version will become increasingly harder. It's still possible to do with one person, but it is a more work than when I updated us to 1.23.

I've noticed they've vendored github.com/docker/distribution into the main docker repo. I may have to find out which version and then sync up our vendored version with theirs.

@sflxn
Copy link
Contributor Author

sflxn commented Dec 2, 2016

They've renamed types.Image to types.ImageSummary and added Containers and SharedSize field. We should be ok with renaming ours, but we should look into what these Contaienrs and SharedSize field refers to. I'll proceed with the first pass and then share the work with others to look into these changes. Most seem like additional features.

@mdubya66
Copy link
Contributor

mdubya66 commented Dec 2, 2016

Should we take a more holistic view of this. We have three things bearing down on us around this work.

  1. Port Layer refactor Refactor Port Layer #1043
  2. Move to new version of goswagger Upgrade go-swagger to latest #929
  3. keeping up with docker Update docker personality to accept API versions > 1.23 #3415

I suggest that @sflxn @vburenin @corrieb and @hickeng meet and let's put together a plan that works to get there for 1.0.

@sflxn
Copy link
Contributor Author

sflxn commented Dec 5, 2016

Docker has started vendoring a lot of projects we were pulling in separately (and they used to refer to externally vs vendoring), and the go compiler is enforcing namespaces around those projects so we are going to get conflicts. For example, we vendor github.com/docker/go-connections/nat. Docker vendors that into the main docker repo so the go compiler does not like that.

@sflxn
Copy link
Contributor Author

sflxn commented Dec 6, 2016

I found this discussion pertaining to the comment I added above.

https://groups.google.com/forum/#!topic/golang-nuts/liKQ6gtU4hk

They came to the same conclusion I did. However, it's not as simple as they discuss. The ideal solution would take into consideration the version of the repo. If we vendor two repos that themselves vendor a third common repo from different versions, rolling up vendored sources to the top level vendor folder is not enough. This has been solved in other installation systems, but golang did not consider this scenario when they added vendoring. This graph problem is actually a very common interview question in SV.

It might be that vendoring and versioning headaches was what drove Docker to roll engine-api back into the main docker engine repo. It makes their lives easier and our's much harder.

@fdawg4l
Copy link
Contributor

fdawg4l commented Dec 6, 2016

For example, we vendor github.com/docker/go-connections/nat. Docker vendors that into the main docker repo so the go compiler does not like that.

I tried just such a scenario when I added /vendor to our repo and couldn't make the compiler behave like you're describing. It was always bottom up and namespaces were full and complete relative to their own /vendor root. Can you give me an example I can play with?

PS. Our /vendor folder has lots of redundant pkgs but things seem to work. Could we be that lucky that they're all referring to the same or similar versions?

@sflxn
Copy link
Contributor Author

sflxn commented Dec 6, 2016

The problem occurs this way.

We import github.com/docker/. Docker then now vendors the same repo so we have

vic/vendor/github.com/docker/
vic/vendor/github.com/docker/docker/vendor/github.com/docker/

Some repo defines type foo. Docker code uses type foo (from their vendored repo) and we declare type foo and pass it into the docker code. The Go compiler will complain about type mismatch. For example,

cannot use hostconfig.PortBindings (type "github.com/vmware/vic/vendor/github.com/docker/docker/vendor/github.com/docker/go-connections/nat".PortMap) as type "github.com/vmware/vic/vendor/github.com/docker/go-connections/nat".PortMap in argument to unrollPortMap

Go will not allow us to import vendored src's vendored packages so it isn't possible to import the second line. We'll end up getting an error like this,

imports github.com/docker/docker/vendor/github.com/docker/go-connections/nat: use of vendored package not allowed

@sflxn
Copy link
Contributor Author

sflxn commented Dec 9, 2016

Added LinkLocalAddresses() to endpoint.go. Also added ResolveService(), EnableService(), and DisableService() to sandbox.go.

@sflxn
Copy link
Contributor Author

sflxn commented Dec 9, 2016

DetachKeys for attach is now a string.

@sflxn
Copy link
Contributor Author

sflxn commented Dec 13, 2016

Might have hit a roadblock. Docker assume swarm integration. They don't check if swarm's clustermanager is nil or use interfaces. This might be a problem. This might be why Red Hat forked Docker. I'll see if we can make it work, but even if I can, I'm afraid they might update the code again for Docker 1.14 that will be problematic again.

Alternatively, maybe we should do a PR for docker to make it more flexible (hmm... wonder if Redhat already tried that).

@mlh78750
Copy link
Contributor

cc @corrieb see above.

@sflxn sflxn removed this from the v1.0 milestone Jan 18, 2017
@sflxn sflxn added the Spike label Jan 18, 2017
@sflxn sflxn changed the title Update docker personality to docker API 1.25 Update docker personality to accept API versions > 1.23 Jan 18, 2017
@sflxn
Copy link
Contributor Author

sflxn commented Jan 18, 2017

Just putting this out there, if there is a willingness to modify vendored code and break API version contract, there are only two changes to two files that will disable this check.

github.com/docker/docker/api/common.go
github.com/docker/docker/api/server/middleware/version.go

@sflxn sflxn self-assigned this Jan 18, 2017
@sflxn sflxn added kind/investigation A scoped effort to learn the answers to a set of questions which may include prototyping and removed Spike labels Jan 18, 2017
@mhagen-vmware mhagen-vmware added this to the Sprint 1 milestone Jan 18, 2017
@sflxn
Copy link
Contributor Author

sflxn commented Jan 21, 2017

I pulled in the official 1.13 release today and found they are vendoring a non official release of docker/distribution. They are really sloppy about vendoring their own official releases, and their vendoring tool does not specify which commit sha that it vendors.

Also, I've concluded that it's way too difficult disentangle their tightly coupled Swarm integration. Swarm research spike isn't even in our plan for the next few sprints. The Swarm integration implies a lot of changes to our networking code. Previously, they had a networking backend that their rest server called. Instead of putting conditional swarm support in that backend, they pushed the swarm integration directly in the rest front end. Not completely sure, but one reason to not plugin swarm support cleanly in the router backend is to force Swarm integration with Docker... or other reasons that I can only guess.

At this point, I'm concluding that the path of integrating their 1.13 daemon code is not possible. Even if we continued and were able to get it VIC to build and run, we open ourselves to a lot of potential problem as 1.13 is meant to have Swarm support.

@sflxn
Copy link
Contributor Author

sflxn commented Jan 21, 2017

@vburenin
Copy link
Contributor

@sflxn https://github.com/docker/docker/blob/master/vendor.conf - this file contains commit SHAs, is it what you were looking for?

@sflxn
Copy link
Contributor Author

sflxn commented Jan 23, 2017

@vburenin Thanks, but that wasn't a road block for me. The Swarm integration at the rest layer with no use of golang's interfaces is the major roadblock. I can certainly hook up their daemon and cluster manager in our personality main code without understanding the implication of Swam on VIC, but that opens us up for a boatload of bugs. We'll revisit this issue once we commit to a Swarm research spike and understand all the implications.

@sflxn
Copy link
Contributor Author

sflxn commented Jan 23, 2017

To finish up this research spike, here are info on the last remaining issues:

  1. To include 1.13, we must install btrfs and devmapper dev libraries and C header files
  2. They are still using github.com/golang/x/net/context
  3. github.com/docker/go-connections must be rolled up so that their vendored repo doesn't collide with ours
  4. Network API calls affected by the Swarm integration:
    GetNetworks
    GetNetwork
    GetNetworkByName
    IsManager
    RemoveNetwork
    CreateNetwork
  5. I've only gone as far as to try to make the personality server code compile with the 1.13 code base, but not yet sure how the method's parameter changes affect the actual functionality. There were a few methods, such as one related to pulling images that I had to completely comment out to get it built. Basically, we'll need to review every method whose parameters changed to determine how the change affects the method's functionality.

@mhagen-vmware
Copy link
Contributor

@sflxn Not clear what the deliverable outcome of this was, did you create any new stories for the work that needs to be eventually done?

@sflxn
Copy link
Contributor Author

sflxn commented Jan 23, 2017

@mhagen-vmware
I have an update. Docker API version isn't an issue anymore with docker 1.13 CLI. This is one issue they said they resolved in 1.13. I just tested it, and it no longer gives me a warning with VIC. We no longer need to pull in the 1.13 sources.

We should add a message to upgrade to Docker CLI 1.13 or add DOCKER_API_VERSION=1.23 to vic-machine.

@sflxn
Copy link
Contributor Author

sflxn commented Jan 23, 2017

Created #3697 as the result of this research spike.

@sflxn sflxn reopened this Jan 24, 2017
@sflxn
Copy link
Contributor Author

sflxn commented Jan 24, 2017

Docker 1.13 doesn't solve our problem after all. I made a mistake with my tests yesterday. I was using a one off VCH that had a personality server that bypasses the version check. After using master to create a new VCH, I noticed it no longer worked.

After some research, I found these two threads from Docker related to the versioning.
moby/moby#27745
moby/moby#21930

Apparently, they only handle the version mismatch starting with API 1.24. Their solution looks like a half measure. They had a good proposal in in 21930 and another good proposal in the PR but decided to go another direction. Versioning is just a headache in many systems, esp one where they are constantly changing the API calls and protocol.

We're back to needing to do a Swarm research spike just to complete the API version work. It's not that we have to fully support Swarm. We just need to understand what adding all of Docker's new Swarm connection implies for VIC, what new edge cases it presents.

@sflxn
Copy link
Contributor Author

sflxn commented Jan 26, 2017

Had to update more vendoring after pulling in the released 1.13 repo:

github.com/docker/distribution  28602af35aceda2f8d571bad7ca37a54cf0250bc
github.com/docker/libtrust      9cbd2a1374f46905c68a4eb3694a130610adc62a
    -- rolled up github.com/docker/libtrust from docker/docker and docker/distribution
github.com/docker/swarmkit      335561b66a44bf214224afe879e4368204e7fa45
    -- rolled up swarmkit from docker/docker
github.com/Sirupsen/logrus      v0.11.0
    -- rolled up logrus from docker/docker
    -- rolled up logrus from docker/swarmkit
google.golang.org/grpc          v1.0.2
    -- rolled up from docker/docker
    -- rolled up from docker/swarmkit
github.com/golang/protobuf      1f49d83d9aa00e6ce4fc8258c71cc7786aec968a
github.com/vishvananda/netns    604eaf189ee867d8c147fafc28def2394e878d25
    -- rolled up from docker/docker
github.com/vishvananda/netlink  482f7a52b758233521878cb6c5904b6bd63f3457
    -- rolled up from docker/docker


rolled up golang.org/x/net/context in docker/swarmkit

These revision and tag corresponds to the same one used by Docker.

@mhagen-vmware mhagen-vmware removed this from the Sprint 1 milestone Jan 31, 2017
@sflxn
Copy link
Contributor Author

sflxn commented Feb 1, 2017

#3756 is a WIP for the implementation of this.

I know this was a research spike, but I can't really research it without actually attempting to implement it and try to get it to build and run to see where the problems resides.

I've gotten the WIP to build and run. It does the basics fine. I didn't fully hook up the cluster manager, but I noticed compose works with it so I ran a full ci cycle on this, and as expected, network is where it fails.

I need to still need robot scripts to verify swarm, checkpoint, build, and plugin calls returns not implemented.

@sflxn sflxn self-assigned this Feb 1, 2017
@sflxn
Copy link
Contributor Author

sflxn commented Feb 1, 2017

There's SEVERAL dozen new API calls that I need to write robot scripts to verify we at least return a not implemented message.

Secrets, Checkpoint, Deploy, Node, Plugin, Service, Stack, Swarm

Each of those have multiple new CLI and API calls.

@mhagen-vmware mhagen-vmware added this to the Sprint 2 milestone Feb 1, 2017
@sflxn
Copy link
Contributor Author

sflxn commented Feb 2, 2017

Docker CLI will output "Error response from daemon: This experimental feature is disabled by default. Start the Docker daemon with --experimental in order to enable it" for new APIs that are experimental and are not enabled.

To prevent such message in VIC, I am enabling the backend routers' experimental flag. This will result in VIC outputting a "not yet implemented" message instead. Enabling the experimental flag only opens up access to the API router in question so it should not have any adverse effect on VIC as we're implementing the API routers ourselves.

@sflxn
Copy link
Contributor Author

sflxn commented Feb 2, 2017

VIC will respond to docker deploy with "only supported with experimental daemon." That error does not originate from our code. Just stating here for the record.

@sflxn
Copy link
Contributor Author

sflxn commented Feb 3, 2017

The PR needs update to our CI to add docker 1.11, 1.12, and 1.13 CLI. #3844

At the point, it's pending full test suite before I remove the WIP label.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/docker Support for the Docker operations component/persona/docker kind/investigation A scoped effort to learn the answers to a set of questions which may include prototyping priority/p2
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants