[WIP] [DNR] ✨ Running e2e tests in AWS / GCP #8135

Closed · wants to merge 5 commits

hack/boskos.py (new file, +126 lines)

#!/usr/bin/env python3

# Copyright 2021 The Kubernetes Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import argparse
import json
import os
import time

import requests

BOSKOS_HOST = os.environ.get("BOSKOS_HOST", "boskos")
BOSKOS_RESOURCE_NAME = os.environ.get('BOSKOS_RESOURCE_NAME')


def checkout_account_request(resource_type, user, input_state):
    url = f'http://{BOSKOS_HOST}/acquire?type={resource_type}&state={input_state}&dest=busy&owner={user}'
    r = requests.post(url)
    status = r.status_code
    reason = r.reason
    result = ""

    if status == 200:
        content = r.content.decode()
        result = json.loads(content)

    return status, reason, result


def checkout_account(resource_type, user):
    status, reason, result = checkout_account_request(resource_type, user, "clean")
    # TODO(sbueringer): find out if we still need this
    # replicated the implementation of cluster-api-provider-gcp
    # we're working around an issue with the data in boskos.
    # We'll remove the code that tries both free and clean once all the data is good.
    # Afterwards we should just check for free
    if status == 404:
        status, reason, result = checkout_account_request(resource_type, user, "free")

    if status != 200:
        raise Exception(f"Got invalid response {status}: {reason}")

    print(f"export BOSKOS_RESOURCE_NAME={result['name']}")
    print(f"export GCP_PROJECT={result['name']}")


def release_account(user):
    url = f'http://{BOSKOS_HOST}/release?name={BOSKOS_RESOURCE_NAME}&dest=dirty&owner={user}'

    r = requests.post(url)

    if r.status_code != 200:
        raise Exception(f"Got invalid response {r.status_code}: {r.reason}")


def send_heartbeat(user):
    url = f'http://{BOSKOS_HOST}/update?name={BOSKOS_RESOURCE_NAME}&state=busy&owner={user}'

    while True:
        print(f"POST-ing heartbeat for resource {BOSKOS_RESOURCE_NAME} to {BOSKOS_HOST}")
        r = requests.post(url)

        if r.status_code == 200:
            print(f"response status: {r.status_code}")
        else:
            print(f"Got invalid response {r.status_code}: {r.reason}")

        time.sleep(60)


def main():
    parser = argparse.ArgumentParser(description='Boskos GCP Account Management')

    parser.add_argument(
        '--get', dest='checkout_account', action="store_true",
        help='Check out a Boskos GCP Account'
    )

    parser.add_argument(
        '--release', dest='release_account', action="store_true",
        help='Release a Boskos GCP Account'
    )

    parser.add_argument(
        '--heartbeat', dest='send_heartbeat', action="store_true",
        help='Send a heartbeat for the checked-out Boskos GCP Account'
    )

    parser.add_argument(
        '--resource-type', dest="resource_type", type=str,
        default="gce-project",
        help="Type of Boskos resource to manage"
    )

    parser.add_argument(
        '--user', dest="user", type=str,
        default="cluster-api",
        help="username"
    )

    args = parser.parse_args()

    if args.checkout_account:
        checkout_account(args.resource_type, args.user)

    elif args.release_account:
        release_account(args.user)

    elif args.send_heartbeat:
        send_heartbeat(args.user)


if __name__ == "__main__":
    main()
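
For reference, a possible invocation sequence from a shell or CI step could look like this. This is a sketch: the `BOSKOS_HOST` value and the background-heartbeat pattern are assumptions, only the flags and the exported variables come from the script above.

```bash
# Point the script at a reachable Boskos instance (example value, adjust as needed).
export BOSKOS_HOST=boskos.example.svc.cluster.local

# Acquire a GCP project; --get prints export statements for BOSKOS_RESOURCE_NAME and GCP_PROJECT.
eval "$(python3 hack/boskos.py --get --resource-type gce-project --user cluster-api)"

# Keep the lease alive in the background while the tests run.
python3 hack/boskos.py --heartbeat --user cluster-api &
HEARTBEAT_PID=$!

# ... run the e2e tests ...

# Stop the heartbeat and release the project again.
kill "${HEARTBEAT_PID}"
python3 hack/boskos.py --release --user cluster-api
```
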
hack/remote/README.md (new file, +162 lines)

# TODOs:

* Test on macOS
* Go over all files in the diff, finalize them, and resolve the remaining FIXMEs/TODOs
* Get it to work on Prow
* Test & fix up the GCP script

Backlog:
* Optimize scripting / automation
* Implement in Go?
* macOS: debug why it crashes the local Docker Desktop
  * Try without IPv6: `docker network create -d=bridge -o com.docker.network.bridge.enable_ip_masquerade=true -o com.docker.network.driver.mtu=1500 --subnet=172.24.4.0/24 --gateway=172.24.4.1 kind`
  * Then re-add IPv6

# Setting up a Docker engine on AWS:

Prerequisites:
* AWS CLI must be installed & configured with credentials

Set up a server on AWS with a Docker engine:
```bash
./hack/remote/setup-docker-on-aws-account.sh
```

Note: The script can also be run repeatedly, e.g. to re-create the SSH tunnel when the server already exists.

# Use remote Docker engine

## Docker CLI

```bash
export DOCKER_HOST=tcp://10.0.3.15:2375
docker version
docker info
```

## Local management cluster

### e2e tests via IDE

Prerequisites:
```bash
make generate-e2e-templates
make docker-build-e2e
```

Run configuration:
* Add to environment: `CAPD_DOCKER_HOST=tcp://10.0.3.15:2375`

### Tilt

tilt-settings.yaml:
```yaml
kustomize_substitutions:
# Use remote Docker host in CAPD.
CAPD_DOCKER_HOST: "tcp://10.0.3.15:2375"
```

```bash
tilt up
```

### Quickstart

```bash
export CAPD_DOCKER_HOST="tcp://10.0.3.15:2375"
```
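
With the variable exported, the regular quickstart flow can then be followed against the remote Docker engine, for example (a sketch; the flavor, Kubernetes version and machine counts are illustrative, not prescribed by this PR):

```bash
# The development flavor uses ClusterClass, so enabling the topology feature
# before init may be required (see the quickstart docs).
export CLUSTER_TOPOLOGY=true
clusterctl init --infrastructure docker

clusterctl generate cluster capi-quickstart \
  --flavor development \
  --kubernetes-version v1.26.0 \
  --control-plane-machine-count=1 \
  --worker-machine-count=1 \
  > capi-quickstart.yaml
kubectl apply -f capi-quickstart.yaml
```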

## Remote management cluster

Create remote kind cluster:
```bash
# SSH to server
ssh-add ~/.ssh/aws-capi-docker
ssh cloud@${SERVER_PUBLIC_IP}
sudo su

# Note: this has to be run on the server.
# Running it locally will fail because 10.0.3.15 is not a valid IP there (see the config sketch below).
kind create cluster --name=capi-test --config=${HOME}/kind.yaml
```
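
For reference, a kind config along the following lines is what makes the cluster reachable on the tunnel IP. This is a sketch under the assumption that the setup script writes something similar to `${HOME}/kind.yaml`; the actual file is not reproduced here.

```bash
# Hypothetical content of ${HOME}/kind.yaml: the key point is advertising the
# API server on the server's private IP so it is reachable through the SSH tunnel.
cat <<EOF > ${HOME}/kind.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
networking:
  apiServerAddress: 10.0.3.15
EOF
```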

### e2e tests via IDE

Prerequisites:
```bash
make generate-e2e-templates

# If local images are required (e.g. because code has changed)
export DOCKER_HOST=tcp://10.0.3.15:2375
make docker-build-e2e
kind load docker-image --name=capi-test gcr.io/k8s-staging-cluster-api/cluster-api-controller-amd64:dev
kind load docker-image --name=capi-test gcr.io/k8s-staging-cluster-api/kubeadm-bootstrap-controller-amd64:dev
kind load docker-image --name=capi-test gcr.io/k8s-staging-cluster-api/kubeadm-control-plane-controller-amd64:dev
kind load docker-image --name=capi-test gcr.io/k8s-staging-cluster-api/capd-manager-amd64:dev
kind load docker-image --name=capi-test gcr.io/k8s-staging-cluster-api/test-extension-amd64:dev
```

Run configuration:
* Add to environment: `DOCKER_HOST=tcp://10.0.3.15:2375;CAPD_DOCKER_HOST=tcp://10.0.3.15:2375`
* Add to program arguments: `-e2e.use-existing-cluster=true` (a rough shell equivalent is sketched below)
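
For reference, a rough shell equivalent of this run configuration could look as follows. This is a sketch; the exact test invocation, the e2e config path and any additional `-e2e.*` flags are assumptions that depend on your local setup.

```bash
export DOCKER_HOST=tcp://10.0.3.15:2375
export CAPD_DOCKER_HOST=tcp://10.0.3.15:2375

# Run the e2e suite against the existing remote kind cluster; adjust the config
# path, timeout and any further flags to whatever you normally use.
go test ./test/e2e -run TestE2E -timeout 2h -args \
  -e2e.config=test/e2e/config/docker.yaml \
  -e2e.use-existing-cluster=true
```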

### Tilt

tilt-settings.yaml:
```yaml
kustomize_substitutions:
# Use remote Docker host in CAPD.
CAPD_DOCKER_HOST: "tcp://10.0.3.15:2375"
```

```bash
export DOCKER_HOST=tcp://10.0.3.15:2375
tilt up
```

FIXME(sbueringer): enable local registry
* Check if it is faster (as redeploy also just copies the binary over)
* Copy the kind-install-for-capd.sh script over (already done => just test it)
* Ensure the registry is reachable from the local machine

## Getting access to workload clusters

Retrieve kubeconfig for workload clusters via:
```bash
clusterctl get kubeconfig capi-quickstart > /tmp/kubeconfig
kubectl --kubeconfig /tmp/kubeconfig get no,po -A
```
Note: The kubeconfigs returned by `kind get kubeconfig` don't work.

# Troubleshooting

Verify connectivity:

```bash
# SSH to server
ssh-add ~/.ssh/aws-capi-docker
ssh cloud@${SERVER_PUBLIC_IP}

# On the server:
nc -l 10.0.3.15 8005

# Locally:
nc 10.0.3.15 8005
```

# Tested scenarios

* Local mgmt cluster:
  * Tilt:
    * works well
  * e2e tests (via IntelliJ):
    * works well
* Remote mgmt cluster:
  * Tilt:
    * loading images via `kind load` is slow
  * e2e tests (via IntelliJ):
    * building the e2e images with make is quick
    * loading images via `kind load` is slow