Skip to content
This repository has been archived by the owner on Mar 5, 2024. It is now read-only.

v3 API and refactorings #109

Merged
merged 27 commits into from
Jul 12, 2018
Merged

v3 API and refactorings #109

merged 27 commits into from
Jul 12, 2018

Conversation

pingles
Copy link
Contributor

@pingles pingles commented Jul 4, 2018

This change was focused on changes to move to a new Agent/Server interaction ahead of v3. As part of doing this I also refactored/changed a few other interfaces to streamline stuff a little.

API Changes

This adds a new Server API call: GetPodCredentials that allows credentials to be obtained by the server given an IP address. Previously the agent would make three separate calls:

  1. Find the Pod (and it's role)
  2. Check the Pod can assume the role
  3. Get credentials for the role

This collapses calls together inside the server. This makes it easier to deliver #14, as well as potentially moving to other mechanisms for agents to request credentials (like the service account token introduced in #104.

Testing

I've also broken apart the metadata server tests and moved all tests into the package directories, the Drone build also now runs with coverage stats to make it easier to figure out what parts of the system need tests added.

End to End Testing

Although the tests are in better shape there's still things that aren't covered and would be nice to have some end-to-end tests to prove the whole- for now I've parked that and created #108 to track separately.

Review and Request for Comments

I'd like some review/thoughts on the protos in particular. For now I've added new API calls and left the existing, so that the same server could support both new (3.x) and old (2.x) agents. However, given Kubernetes makes it easy to run servers side-by-side (only the agents have a hard requirement on running one per node) another option is to remove the API calls and have agents talk to different server services.

pingles added 23 commits June 29, 2018 16:35
* created additional GetPodCredentials operation
* use Client interface in place of PodFinder and other interfaces
* remove telemetry wrappers for gateway and server; will be replaced
  with prometheus soon
* added more tests and refactored to tidy
* added coverage makefile target to track progress

Acked-by: Paul Ingles <[email protected]>
Signed-off-by: Paul Ingles <[email protected]>
* create unit test for credentials handler
* test happy path: role and credentials are found
* added test for server error when requesting credentials
* remove empty role- not possible for this handler to be called without
  a role in the vars
* add credentials handler test for retry
* role handler install
* remove old metadata server tests, replaced with handler unit tests
* add server test
Makefile Outdated
test: $(SOURCES)
go test test/unit/*_test.go
go test test/functional/*_test.go
test: $(shell find . -name '*.go')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

$(SOURCES)!

@pingles
Copy link
Contributor Author

pingles commented Jul 4, 2018

@DewaldV @Sambooo @Joseph-Irving @tombooth I added you all so you could take a look.

@moofish32 I know you were also keen to see where this branch/change got to so be great to hear what you think also.

credentialsProvider sts.CredentialsProvider
clientIP clientIPFunc
policy server.AssumeRolePolicy
clientIP clientIPFunc
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name clientIP is a bit misleading in a few places. We're not passing around an ip address but rather an ip resolution function. Not massively important though.

client server.Client
}

func (c *credentialsHandler) Install(router *mux.Router) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this approach to splitting up the handlers 👍

}

credentials, err := c.credentialsForRole(ctx, requestedRole)
credentials, err := c.fetchCredentials(ctx, ip, requestedRole)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no more code that can be deleted from this handler without breaking it. Good stuff.

@codecov
Copy link

codecov bot commented Jul 5, 2018

Codecov Report

❗ No coverage uploaded for pull request base (master@60c82f6). Click here to learn what that means.
The diff coverage is 61.84%.

Impacted file tree graph

@@            Coverage Diff            @@
##             master     #109   +/-   ##
=========================================
  Coverage          ?   37.79%           
=========================================
  Files             ?       21           
  Lines             ?      799           
  Branches          ?        0           
=========================================
  Hits              ?      302           
  Misses            ?      469           
  Partials          ?       28
Impacted Files Coverage Δ
pkg/aws/metadata/server.go 5.45% <0%> (ø)
pkg/server/gateway.go 0% <0%> (ø)
pkg/server/policy.go 51.78% <100%> (ø)
pkg/k8s/pod_cache.go 54.8% <100%> (ø)
pkg/prefetch/manager.go 53.48% <100%> (ø)
pkg/aws/metadata/handler_role_name.go 90% <100%> (ø)
pkg/aws/metadata/handler_credentials.go 83.33% <100%> (ø)
pkg/server/server.go 15.55% <54.54%> (ø)

Copy link
Contributor

@samb1729 samb1729 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a fan of the new testing approach here @pingles. I haven't tried to figure out whether we can avoid supporting two api versions easily but otherwise I'm happy with these changes.

t.Error("expected json result", content)
}

expected := `{"Code":"","Type":"","AccessKeyId":"A1","SecretAccessKey":"S1","Token":"","Expiration":"","LastUpdated":""}`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably unmarshal this into a struct with these fields as encoding/json implements RFC 7159 so there's no guarantee the keys will be marshaled in this order.

t.Error("unexpected status", rr.Code)
}
if !strings.Contains(rr.Body.String(), "error fetching credentials: no pod found") {
t.Error("unexpected error", rr.Body.String())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unexpected lack of error, no?

)

// StubClient returns fake server responses
type StubClient struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 to this idea

Copy link
Contributor

@moofish32 moofish32 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Major +1 for the testing improvements and refactoring of the auth being controlled by the server and not split with the client. I provided some comments as interested party that hopefully help represent at least one outside potential customer.

@@ -90,15 +66,14 @@ func (c *credentialsHandler) Handle(ctx context.Context, w http.ResponseWriter,
return http.StatusOK, nil
}

func (c *credentialsHandler) credentialsForRole(ctx context.Context, role string) (*sts.Credentials, error) {
func (c *credentialsHandler) fetchCredentials(ctx context.Context, ip, requestedRole string) (*sts.Credentials, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a great improvement and reduces the ability of compromised client to just arbitrarily request roles.

)

func TestReturnRoleWhenClientResponds(t *testing.T) {
r, _ := http.NewRequest("GET", "/latest/meta-data/iam/security-credentials/", nil)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of these tests use /latest/meta-data/iam/security-credentials/. However, when you curl on an ec2 instance the /latest/meta-data/iam/security-credentials in at least some situations you don't get 200:

curl -v 169.254.169.254/latest/meta-data/iam/security-credentials
* About to connect() to 169.254.169.254 port 80 (#0)
*   Trying 169.254.169.254...
* Connected to 169.254.169.254 (169.254.169.254) port 80 (#0)
> GET /latest/meta-data/iam/security-credentials HTTP/1.1
> User-Agent: curl/7.29.0
> Host: 169.254.169.254
> Accept: */*
>
* HTTP 1.0, assume close after body
< HTTP/1.0 301 Moved Permanently
< Location: http://169.254.169.254/latest/meta-data/iam/security-credentials/
< Content-Length: 0
< Connection: close
< Date: Mon, 09 Jul 2018 21:12:44 GMT
< Server: EC2ws

I don't think the SDK actually uses this endpoint, but should we make this match?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe you're getting a 301 rather than 200 because you didn't including the trailing slash. See this Google blog post:

don’t both return a 200 response code, but that one version redirects to the other.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think @moofish32 was pointing out that the metadata server they tested on 301'ed, rather than returning credentials. Kiam currently supports either form and serves 200s on both.

@moofish32 I seem to remember others mentioning this stuff varied between instance generations (so it changed between m4 and m5 for example).

Do you remember the instance type you tested this on?

Either way I suspect we should probably add an issue to change it so that it behaves the same way- 301'ing rather than successfully working on both?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've now added #116 to track this issue separately.

@@ -11,17 +11,16 @@
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package kiam
package future
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there something really unique about this future package? I know Capital One has this one: https://github.com/capitalone/go-future-context, I'm sure there are others. I just don't know if there is a specific reason the team wants to maintain this one. I agree it's a small amount of code right now.

@@ -107,7 +107,7 @@ func (f *forbidden) Explanation() string {
}

func (p *RequestingAnnotatedRolePolicy) IsAllowedAssumeRole(ctx context.Context, role, podIP string) (Decision, error) {
pod, err := p.pods.GetPodByIP(ctx, podIP)
pod, err := p.pods.GetPodByIP(podIP)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a point where I'm curious if we should validate the client IP (kiam client making the request) and the HostIp in the PodStatus are a match? This would prevent being able to use a compromised host to provide IAM credentials for pods that could not be scheduled on the host. Probably a future enhancement but wanted to pose the question.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's only the agent that directly receives requests from clients, after which point the ip address forwarded by the agent is used to identify the pod. Are you suggesting the agent also pass its host ip to the server in this api call?

I suppose it does make sense to check that as a pod shouldn't hit another host's agent. @pingles would supporting this be another major bump or just an extension of the current request struct with graceful fallback?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@moofish32 I like the suggestion as an additional security enhancement (I assume this means check that the pod's host matches the agent's client IP?)

If so- could you open a new issue to track it please? I'm not sure if we'd commit to adding for v3 but I don't think it'd change the API as-is (the Pod IP is passed to GetPodCredentials which would allow for additional policy to check against the PodStatus).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Sambooo -- we have both actually. The agent passes the IP of the pod and GRPC enables us to track the agent IP. The issue is tracked as #105

return nil, ErrPolicyForbidden
}

creds, err := k.credentialsProvider.CredentialsForRole(ctx, req.Role)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So another potentially breaking change will come here when we want to specify the session name by pod ID. I don't know if that is important enough to roll in here or wait until later. I think the usefulness of a constant session name is fairly low.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a separate issue open for being able to name sessions #38.

The signature for CredentialsForRole will change but the only published API is the gRPC server's GetPodCredentials which should tolerate changes to reference pods by IP/ID etc.

So for now I'd leave this and pick it up as part of #38.

@@ -181,7 +224,7 @@ func NewServer(config *Config) (*KiamServer, error) {
arnResolver,
)
server.credentialsProvider = credentialsCache
server.manager = prefetch.NewManager(credentialsCache, server.pods, server.pods)
server.manager = prefetch.NewManager(credentialsCache, server.pods)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more potential big change to consider is moving the credential cache into the pod cache. Logically if you want unique credentials per pod, then maintaining two caches is less useful. If this is potentially a major version change to implement and it's high enough priority, should we consider grouping the change here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd defer picking it up until we do #38. I don't think the change would need a major version change to implement. I'd like us to sort #38 before we release v3 though- so we can still review the design for different pod and credential caches, the way sessions are named etc.

@pingles pingles merged commit 47c4922 into master Jul 12, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants