This repository has been archived by the owner on Oct 12, 2023. It is now read-only.

Trying to use Pod Identity fails a number of times, before finally succeeding #181

Closed
woutervanvliet opened this issue Apr 1, 2019 · 6 comments

@woutervanvliet

woutervanvliet commented Apr 1, 2019

Intro: I'm trying to get Azure Pod Identity to work in our cluster, and am mostly succeeding (so far so good). For the time being, we have two key vaults, two AzureIdentity resources, two AzureIdentityBinding resources, and two pods, each using its own key vault.

While testing, both pods are identical; the only differences are their aadpodidbinding label and an environment variable indicating which key vault to use. At startup, the pod connects to the key vault, reads two values and prints them with Console.WriteLine. If the connection fails, the pod crashes and k8s restarts it.

The problem: One pod may start up and read from the key vault immediately, while the other crashes and restarts, fairly consistently 5 times, before it is able to get an access token.

When it fails, the following Exception is thrown:

Unhandled Exception: Microsoft.Azure.Services.AppAuthentication.AzureServiceTokenProviderException: Parameters: Connection String: [No connection string specified], Resource: https://vault.azure.net, Authority: https://login.windows.net/******************. Exception Message: Tried the following 3 methods to get an access token, but none of them worked.
Parameters: Connection String: [No connection string specified], Resource: https://vault.azure.net, Authority: https://login.windows.net/******************. Exception Message: Tried to get token using Managed Service Identity. Access token could not be acquired. MSI ResponseCode: Forbidden, Response: no AzureAssignedIdentity found for pod:default/kv-test-be

Parameters: Connection String: [No connection string specified], Resource: https://vault.azure.net, Authority: https://login.windows.net/******************. Exception Message: Tried to get token using Visual Studio. Access token could not be acquired. Environment variable LOCALAPPDATA not set.
Parameters: Connection String: [No connection string specified], Resource: https://vault.azure.net, Authority: https://login.windows.net/******************. Exception Message: Tried to get token using Azure CLI. Access token could not be acquired. No such file or directory

   at Microsoft.Azure.Services.AppAuthentication.AzureServiceTokenProvider.GetAuthResultAsyncImpl(String authority, String resource, String scope)
   at Microsoft.Azure.Services.AppAuthentication.AzureServiceTokenProvider.<get_KeyVaultTokenCallback>b__8_0(String authority, String resource, String scope)
   at Microsoft.Azure.KeyVault.KeyVaultCredential.PostAuthenticate(HttpResponseMessage response)
   at Microsoft.Azure.KeyVault.KeyVaultCredential.ProcessHttpRequestAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at Microsoft.Azure.KeyVault.KeyVaultClient.GetSecretsWithHttpMessagesAsync(String vaultBaseUrl, Nullable`1 maxresults, Dictionary`2 customHeaders, CancellationToken cancellationToken)
   at Microsoft.Azure.KeyVault.KeyVaultClientExtensions.GetSecretsAsync(IKeyVaultClient operations, String vaultBaseUrl, Nullable`1 maxresults, CancellationToken cancellationToken)
   at Microsoft.Extensions.Configuration.AzureKeyVault.AzureKeyVaultConfigurationProvider.LoadAsync()
   at Microsoft.Extensions.Configuration.AzureKeyVault.AzureKeyVaultConfigurationProvider.Load()
   at Microsoft.Extensions.Configuration.ConfigurationRoot..ctor(IList`1 providers)
   at Microsoft.Extensions.Configuration.ConfigurationBuilder.Build()
   at KeyvaultTest.Program.Main(String[] args) in /app/src/Program.cs:line 16

The behaviour is similar when using FlexVolume (which one group of our pods will eventually use in production), but I find it easier to reason about the error with two identical pods.

While waiting for the pod to succeed, I'm seeing both "binding removed" and "binding applied" messages in mic's log.

My questions:

  • Is this behaviour "as intended" and perhaps documented somewhere?
  • Is there a setting I can apply to make the "remove - apply" cycle faster?
  • Is there anything else that can be done to reduce the time between pod creation and the identity binding being applied? Is this issue perhaps related to "Identity assignment performance issues resulting in slow scale out" (#145)?

Source code:
Program.cs

using System;
using System.IO;
using System.Threading;
using Microsoft.AspNetCore;
using Microsoft.AspNetCore.Hosting;
using Microsoft.Extensions.Configuration;

namespace KeyvaultTest
{
    class Program
    {
        static void Main(string[] args)
        {
            Console.WriteLine("Starting Keyvault read");

            var configuration = new ConfigurationBuilder()
                .AddAzureKeyVault()
                .Build();

            var test1 = configuration.GetValue<string>("jtest");
            Console.WriteLine(test1);
            var test2 = configuration.GetValue<string>("jtest:jtest");

            Console.WriteLine(test2);
            Console.WriteLine("Finished Keyvault read");
        }
    }
}

KeyVaultConfiguration.cs

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Text;
using System.Threading;
using Microsoft.AspNetCore.Hosting;
using Microsoft.Azure.KeyVault;
using Microsoft.Azure.Services.AppAuthentication;
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.Configuration.AzureKeyVault;

namespace KeyvaultTest
{
    public static class KeyVaultConfiguration
    {
        public static IConfigurationBuilder AddAzureKeyVault(this IConfigurationBuilder builder)
        {
            var builtConfig = builder.Build();
            var keyVaultName = Environment.GetEnvironmentVariable("KV_NAME");

            if (string.IsNullOrWhiteSpace(keyVaultName))
            {
                throw new Exception("KV_NAME is not defined");
            }

            Console.WriteLine($"Using KV_NAME = {keyVaultName}");

            var azureServiceTokenProvider = new AzureServiceTokenProvider();
            var keyVaultClient = new KeyVaultClient(
                new KeyVaultClient.AuthenticationCallback(
                    azureServiceTokenProvider.KeyVaultTokenCallback));

            builder.AddAzureKeyVault(
                $"https://{keyVaultName}.vault.azure.net/",
                keyVaultClient,
                new DefaultKeyVaultSecretManager());

            return builder;
        }
    }
}

Any help, hints or ideas are much appreciated.

Note: I've asked this same question on Stack Overflow https://stackoverflow.com/questions/55451111/trying-to-use-azure-pod-identity-fails-a-number-of-times-before-finally-succeed

@kkmsft
Contributor

kkmsft commented Apr 9, 2019

Can you please confirm that you are using the latest release: https://github.com/Azure/aad-pod-identity/releases/tag/1.3.0-mic-1.4.0-nmi ?

@kkmsft
Contributor

kkmsft commented Apr 15, 2019

@woutervanvliet - there were bug fixes that went into the latest release - https://github.com/Azure/aad-pod-identity/releases/tag/1.3.0-mic-1.4.0-nmi - which avoid a condition where identities would get deleted and then re-added even though they did not need to be deleted. Hence I want to first confirm that you are using the latest release before debugging this further.

@woutervanvliet
Author

@kkmsft Sorry for the delay - I've been off work.

Yes, it appears that I'm using the latest version. My mic pod looks like this:

Name:               mic-774f7ccc4-rj2n9
Namespace:          default
Priority:           0
PriorityClassName:  <none>
Node:               aks-nodepool1-30679781-0/10.10.3.35
Start Time:         Fri, 29 Mar 2019 15:18:11 +0100
Labels:             component=mic
                    pod-template-hash=774f7ccc4
Annotations:        <none>
Status:             Running
IP:                 10.10.3.49
Controlled By:      ReplicaSet/mic-774f7ccc4
Containers:
  mic:
    Container ID:  docker://d2d737b743f186370bd1881c8c1c2ac24b537d18f94adf84059bb8267410a7cf
    Image:         mcr.microsoft.com/k8s/aad-pod-identity/mic:1.3
    Image ID:      docker-pullable://mcr.microsoft.com/k8s/aad-pod-identity/mic@sha256:51b97f8c16fcfbdb66db8716c3047640c13e015fd34f040178932607945ec3b5
    Port:          <none>
    Host Port:     <none>
    Args:
      mic
      --cloudconfig=/etc/kubernetes/azure.json
      --logtostderr
    State:          Running
      Started:      Fri, 29 Mar 2019 15:18:19 +0100
    Ready:          True
    Restart Count:  0
    Environment:

And nmi

Name:               nmi-9t6d2
Namespace:          default
Priority:           0
PriorityClassName:  <none>
Node:               aks-nodepool1-30679781-0/10.10.3.35
Start Time:         Fri, 29 Mar 2019 15:18:11 +0100
Labels:             component=nmi
                    controller-revision-hash=c9cc58f97
                    pod-template-generation=1
                    tier=node
Annotations:        <none>
Status:             Running
IP:                 10.10.3.35
Controlled By:      DaemonSet/nmi
Containers:
  nmi:
    Container ID:  docker://20a8ea3aea3c9ee922f6f4ce056749db1c80cab2b0fa0ae6ac0c8c420234de14
    Image:         mcr.microsoft.com/k8s/aad-pod-identity/nmi:1.4
    Image ID:      docker-pullable://mcr.microsoft.com/k8s/aad-pod-identity/nmi@sha256:1fa2ea67967f64772cfb8ae4b90b54f1fa885fd0784c7743b9978ac4ec1010b8
    Port:          <none>
    Host Port:     <none>
    Args:
      nmi
      --host-ip=$(HOST_IP)
      --node=$(NODE_NAME)
    State:          Running
      Started:      Fri, 29 Mar 2019 15:18:16 +0100
    Ready:          True
    Restart Count:  0

@khenidak
Contributor

khenidak commented Apr 22, 2019

There is a bit of an unavoidable race condition: if you create your identities, bindings, and pods at the same time, mic will still be churning through the assignment when the pod has already started, so nmi has no assignment for it. This situation is not avoidable, not without adding initializers, which would overcomplicate the entire thing. It should not occur every time a pod restarts, unless the node that runs mic is oversubscribed or overutilized. One of the things we are thinking of doing is wrapping this in a small retry loop: https://github.com/Azure/aad-pod-identity/blob/master/pkg/nmi/server/server.go#L154

I will add it to our next release work.

@keikumata

I am hitting this issue as well: the pod restarts 5~6 times before successfully getting an access token. I see that the race condition is unavoidable and that there have been perf improvements - how can I use the latest code to test this out? We are currently using https://github.com/Azure/aad-pod-identity/releases/tag/1.3.0-mic-1.4.0-nmi

@kkmsft
Contributor

kkmsft commented Aug 13, 2019
