
Failed to create controlplane machine, User data is limit exceeded #705

Closed
tahsinrahman opened this issue Apr 5, 2019 · 5 comments · Fixed by #710
Labels
kind/bug Categorizes issue or PR as related to a bug. lifecycle/active Indicates that an issue or PR is actively being worked on by a contributor. priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now.
Comments

@tahsinrahman
Contributor

/kind bug

What steps did you take and what happened:

  • started minikube

  • deployed provider components (except the controller StatefulSet)

  • deployed cluster.yaml

  • deployed controlplane machine.yaml

  • cluster.yaml

apiVersion: cluster.k8s.io/v1alpha1
kind: Cluster
metadata:
  name: aws-5
  namespace: default
spec:
  clusterNetwork:
    pods:
      cidrBlocks:
      - 192.168.0.0/16
    serviceDomain: cluster.local
    services:
      cidrBlocks:
      - 10.96.0.0/12
  providerSpec:
    value:
      apiVersion: awsprovider/v1alpha1
      kind: AWSClusterProviderSpec
      metadata:
        name: aws-5
      region: us-east-1
      caKeyPair:
        cert: <cert>
        key: <key>
      etcdCAKeyPair:
        cert: <cert>
        key: <key>
      frontProxyCAKeyPair:
        cert: <cert>
        key: <key>
      saKeyPair:
        cert: <cert>
        key: <key>
      sshKeyName: <key-name>
  • machine.yaml
apiVersion: cluster.k8s.io/v1alpha1
kind: Machine
metadata:
  labels:
    cluster.k8s.io/cluster-name: aws-5
    node-role.kubernetes.io/master: ""
    set: controlplane
  name: aws-5-master-1
  namespace: default
spec:
  providerSpec:
    value:
      apiVersion: awsprovider/v1alpha1
      kind: AWSMachineProviderSpec
      iamInstanceProfile: control-plane.cluster-api-provider-aws.sigs.k8s.io
      instanceType: m3.large
      keyName: <key-name>
  versions:
    controlPlane: v1.13.3
    kubelet: v1.13.3
  • Then I run go run cmd/manager/main.go --v=5
...
...
I0405 11:06:02.889365   18706 bastion.go:75] Reconcile bastion completed successfully
I0405 11:06:02.889407   18706 loadbalancer.go:39] Reconciling load balancers
I0405 11:06:05.052209   18706 loadbalancer.go:66] Control plane load balancer: &{Name:aws-5-apiserver DNSName:aws-5-apiserver-474485241.us-east-1.elb.amazonaws.com Scheme:internet-facing SubnetIDs:[subnet-0505022378329b34e] SecurityGroupIDs:[sg-0885327e36be47bb3] Listeners:[] HealthCheck:<nil> Attributes:{IdleTimeout:10m0s} Tags:map[]}
I0405 11:06:05.052251   18706 loadbalancer.go:68] Reconcile load balancers completed successfully
I0405 11:06:23.488878   18706 controller.go:113] Reconciling Machine "aws-5-master-1"
I0405 11:06:23.488964   18706 actuator.go:391] Checking if machine aws-5-master-1 for cluster aws-5 exists
I0405 11:06:23.494698   18706 machine_scope.go:143] Decoding ProviderConfig from Value
I0405 11:06:23.529492   18706 controller.go:222] Reconciling machine object aws-5-master-1 triggers idempotent create.
I0405 11:06:23.529544   18706 actuator.go:130] Creating machine aws-5-master-1 for cluster aws-5
I0405 11:06:23.533925   18706 machine_scope.go:143] Decoding ProviderConfig from Value
I0405 11:06:23.541774   18706 machine_scope.go:143] Decoding ProviderConfig from Value
I0405 11:06:23.542050   18706 instances.go:40] Looking for existing instance for machine "aws-5-master-1" in cluster "aws-5"
I0405 11:06:23.865077   18706 actuator.go:115] Machine "aws-5-master-1" should join the controlplane: false
I0405 11:06:24.044088   18706 instances.go:333] Attempting to create or get machine "aws-5-master-1"
I0405 11:06:24.044106   18706 instances.go:347] Looking up machine "aws-5-master-1" by tags
I0405 11:06:24.044111   18706 instances.go:40] Looking for existing instance for machine "aws-5-master-1" in cluster "aws-5"
I0405 11:06:24.393001   18706 instances.go:105] Creating a new instance for machine "aws-5-master-1"
I0405 11:06:25.669541   18706 ami.go:89] Using AMI: "ami-02b8a5477f3faca0c"
I0405 11:06:25.669804   18706 instances.go:196] Machine "aws-5-master-1" is the first controlplane machine for cluster "aws-5"
W0405 11:06:26.573686   18706 controller.go:229] Failed to create machine "aws-5-master-1": failed to create or get machine: InvalidParameterValue: User data is limited to 16384 bytes
        status code: 400, request id: 924458a9-91dc-446e-9edc-7144bf8d2ace
failed to run instance: &{  m3.large subnet-0a16b9a9e135431d4 ami-02b8a5477f3faca0c 0xc00027dc20 [sg-0885327e36be47bb3 sg-027f8e554d950dfb4] 0xc00027dc00 control-plane.cluster-api-provider-aws.sigs.k8s.io <nil> <nil> <nil> <nil> map[sigs.k8s.io/cluster-api-provider-aws/managed:true sigs.k8s.io/cluster-api-provider-aws/role:controlplane Name:aws-5-master-1 kubernetes.io/cluster/aws-5:owned]}
sigs.k8s.io/cluster-api-provider-aws/pkg/cloud/aws/services/ec2.(*Service).runInstance
        /home/tahsin/go/src/sigs.k8s.io/cluster-api-provider-aws/pkg/cloud/aws/services/ec2/instances.go:398
sigs.k8s.io/cluster-api-provider-aws/pkg/cloud/aws/services/ec2.(*Service).createInstance
        /home/tahsin/go/src/sigs.k8s.io/cluster-api-provider-aws/pkg/cloud/aws/services/ec2/instances.go:265
sigs.k8s.io/cluster-api-provider-aws/pkg/cloud/aws/services/ec2.(*Service).CreateOrGetMachine
        /home/tahsin/go/src/sigs.k8s.io/cluster-api-provider-aws/pkg/cloud/aws/services/ec2/instances.go:355
sigs.k8s.io/cluster-api-provider-aws/pkg/cloud/aws/actuators/machine.(*Actuator).Create
        /home/tahsin/go/src/sigs.k8s.io/cluster-api-provider-aws/pkg/cloud/aws/actuators/machine/actuator.go:171
sigs.k8s.io/cluster-api-provider-aws/vendor/sigs.k8s.io/cluster-api/pkg/controller/machine.(*ReconcileMachine).Reconcile
        /home/tahsin/go/src/sigs.k8s.io/cluster-api-provider-aws/vendor/sigs.k8s.io/cluster-api/pkg/controller/machine/controller.go:223
sigs.k8s.io/cluster-api-provider-aws/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
        /home/tahsin/go/src/sigs.k8s.io/cluster-api-provider-aws/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:215
sigs.k8s.io/cluster-api-provider-aws/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1
        /home/tahsin/go/src/sigs.k8s.io/cluster-api-provider-aws/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:158
sigs.k8s.io/cluster-api-provider-aws/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1
        /home/tahsin/go/src/sigs.k8s.io/cluster-api-provider-aws/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133
sigs.k8s.io/cluster-api-provider-aws/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil
        /home/tahsin/go/src/sigs.k8s.io/cluster-api-provider-aws/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134
sigs.k8s.io/cluster-api-provider-aws/vendor/k8s.io/apimachinery/pkg/util/wait.Until
        /home/tahsin/go/src/sigs.k8s.io/cluster-api-provider-aws/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88
runtime.goexit
        /usr/local/go/src/runtime/asm_amd64.s:1333

Anything else you would like to add:

Environment:

  • Cluster-api-provider-aws version: master
  • Kubernetes version: (use kubectl version):
Server Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.4", GitCommit:"c27b913fddd1a6c480c229191a087698aa92f0b1", GitTreeState:"clean", BuildDate:"2019-02-28T13:30:26Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"}
  • OS (e.g. from /etc/os-release):
VERSION="18.04.2 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.2 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic
@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Apr 5, 2019
@randomvariable
Member

Makes sense. We'll need to gzip the userdata.

/priority critical-urgent
/milestone Next

@k8s-ci-robot k8s-ci-robot added this to the Next milestone Apr 5, 2019
@k8s-ci-robot k8s-ci-robot added the priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. label Apr 5, 2019
@chuckha
Contributor

chuckha commented Apr 8, 2019

I don't know if gzip is the right solution here, but I'm happy to hear why it might be. My thoughts are that if we've already hit a limit on user data size then we are probably doing something in a way we shouldn't be doing it.

I think the problem could also be solved by something like uploading certs to S3 and sending their location in the user data, rather than sending the certs directly. Or if AWS has a built-in certs service, maybe we could use that to generate and distribute certs.
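The S3 alternative sketched above would shrink user data to a fetch instruction rather than the PKI material itself; roughly, the cloud-config would look something like this (bucket and key paths are hypothetical, and the instance profile would need s3:GetObject on those objects):

```yaml
#cloud-config
runcmd:
# Pull the PKI material at boot instead of embedding it in user data.
- aws s3 cp s3://<cluster-bucket>/aws-5/pki/ /etc/kubernetes/pki/ --recursive
- kubeadm init --config /etc/kubernetes/kubeadm-config.yaml
```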

gzip feels like it's fixing a symptom and not the problem.

@detiber
Member

detiber commented Apr 8, 2019

I think longer term we definitely want to leverage some better mechanism for distributing the secrets, but I don't necessarily think we should wait for that to resolve the issue that is being seen here.

@vincepri
Member

vincepri commented Apr 9, 2019

/lifecycle active

@k8s-ci-robot k8s-ci-robot added the lifecycle/active Indicates that an issue or PR is actively being worked on by a contributor. label Apr 9, 2019
@vincepri vincepri mentioned this issue Apr 9, 2019
@randomvariable
Member

gzip feels like it's fixing a symptom and not the problem.

Yes, you're right, but this doesn't preclude us from utilising gzip by default, which, depending on how we allow other customisations into cloud-init, we are likely going to need at some point.

The certs may still cause issues in some circumstances due to the lack of compressibility :(
