Refactor: reliability, retries, reduced k8s apiserver load #14

Closed

Changes from all commits (63 commits, all by artem-zinnatullin):
- 64d0054 Fix logs duplication (May 21, 2020)
- df92276 Run kubectl logs --follow in a loop, it actually fails sometimes… (May 21, 2020)
- 2142fe6 Redirect err to /dev/null, add error example to comment (May 21, 2020)
- bb3fdf9 improve comments (May 21, 2020)
- e628984 Add timeout to the check (May 22, 2020)
- 7cf0ad5 do not tail (May 22, 2020)
- e86fe72 Add check for log snapshot content (May 23, 2020)
- 1a8f2e6 Add debug (May 23, 2020)
- a7e7b62 fix contents check (May 23, 2020)
- 391faaa print debug (May 23, 2020)
- 27af568 print more debug (May 23, 2020)
- b55eaf8 print more debug (May 23, 2020)
- 2f581ba Remove timeout (May 26, 2020)
- df7444b add set -x, dont use file (May 26, 2020)
- 54583a9 set +e (May 26, 2020)
- 81e99da Remove -xg (May 26, 2020)
- 75cbe9c Add race fix (May 26, 2020)
- 97ae246 Remove empty print, move out local var (May 26, 2020)
- 724410e Increase log loop interval to avoid k8s API endpoint overload, allow … (Jun 2, 2020)
- c4735dc Retry all kubectl commands (Jun 2, 2020)
- 5f03990 Add retries to pre-exit (Jun 2, 2020)
- 0ba78db Add ttlSecondsAfterFinished of 1 day, same as cleanup (Jun 2, 2020)
- d63009d rely only on ttlSecondsAfterFinished (Jun 2, 2020)
- 7328036 Reduce TTL to ease load on apiserver (Jun 2, 2020)
- 5f9f582 print env vars in pre-exit, catch exit code (Jun 3, 2020)
- 8fe3ce5 also print cmd args in pre-exit (Jun 3, 2020)
- d6b7b52 Skip job cleanup if command successful, saves 3 kubectl calls (Jun 3, 2020)
- df4880b Reduce ttl to 600 (Jun 3, 2020)
- 2104237 Add job ttl property with old value as default (Jun 3, 2020)
- 3bc537a Parse ttl as int (Jun 3, 2020)
- b0ef161 debug BUILDKITE_ENV_FILE encoding (Jun 3, 2020)
- e28e51a Strip " from env vars (Jun 3, 2020)
- 0ea00ea Add comment, remove print (Jun 3, 2020)
- e1b42c2 Remove quote handling for now (Jun 4, 2020)
- 7f0ba66 Limit label values to 63 as per k8s spec (Jun 5, 2020)
- 1b34a6d Fix default TTL parseInt (Jun 5, 2020)
- 6493b5d Remove build/message from annotation due to k8s annotations length limit (Jul 6, 2020)
- b494d88 Add BUILDKITE_BUILD_ID to allowedEnvs (Jul 21, 2020)
- 9c8e6c4 Add more variables to allow overrides (Jul 21, 2020)
- 154765b Add comment about job cleanup (Jul 21, 2020)
- a08ee4c Dont write jobspec to a file (Jul 22, 2020)
- 0873b04 Add debug echo for job spec (Jul 23, 2020)
- 98c92c8 Fix var name (Jul 23, 2020)
- 8dc6865 Remove Job TTL from plugin.yaml for now (Jul 23, 2020)
- 259cd3b WIP propagate exit code (Sep 15, 2020)
- 6519220 Wrap in loop (Sep 15, 2020)
- 32b9e28 Inline log_loop_snapshot_limit_bytes (Sep 30, 2020)
- e8e9cfc Use one convention for ENV var names (Sep 30, 2020)
- 8603721 Add docs for new env variables (Oct 1, 2020)
- dffecae Return old jobs cleanup back (Oct 1, 2020)
- 5daa8a1 Check job_apply_exit_code (Oct 1, 2020)
- 941e86b Dont sleep if jobstatus has been resolved (Oct 1, 2020)
- 2085c88 Cleanup finished jobs by default (Oct 1, 2020)
- 1cef288 Document BUILDKITE_PLUGIN_K8S_JOB_TTL_SECONDS_AFTER_FINISHED (Oct 1, 2020)
- f89209b Move some of the properties to YAML (Oct 2, 2020)
- 071bb47 Add jobs-cleanup-via-plugin and job-cleanup-after-finished-via-plugin… (Oct 6, 2020)
- b5a6969 Unify retry interval and timeout env var names (Oct 6, 2020)
- 3d1589a Update README with new env var names (Oct 7, 2020)
- f46dac5 Use init container exit code if it failed (Oct 9, 2020)
- d948c35 Add warning about init container failure (Oct 9, 2020)
- cb2f53e Print bootstrap container logs (Oct 9, 2020)
- 09dd224 fix func name (Oct 9, 2020)
- f7a867d Apply suggestions from code review (Oct 13, 2020)
1 change: 1 addition & 0 deletions .gitignore
@@ -0,0 +1 @@
.idea/
Contributor:

Let's not include a gitignore in the PR since it's possible to have git exclude these files locally: https://stackoverflow.com/questions/1753070/how-do-i-configure-git-to-ignore-some-files-locally
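
For reference, the local-exclude approach linked above keeps the entry out of the repository entirely; a minimal sketch, run from the clone root:

```bash
# .git/info/exclude works like .gitignore but is per-clone and never committed
echo '.idea/' >> .git/info/exclude
```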

86 changes: 86 additions & 0 deletions README.md
@@ -239,6 +239,92 @@ patch: |
}
```

### `print-resulting-job-spec` (optional, boolean)

If set to `true`, the resulting k8s job spec is printed to the log. This can be useful when debugging.

### `job-ttl-seconds-after-finished` (optional, integer)

Configures [`spec.ttlSecondsAfterFinished`](https://kubernetes.io/docs/concepts/workloads/controllers/ttlafterfinished/) on the k8s job. This requires the TTL controller to be enabled in the cluster; otherwise the field is ignored.
Default value: `86400`.

### `jobs-cleanup-via-plugin` (optional, boolean)

If set to `true`, the plugin cleans up k8s jobs older than 1 day even if they're still running.
Default value: `true`.

If you have the [TTL controller](https://kubernetes.io/docs/concepts/workloads/controllers/ttlafterfinished/) enabled or some other means to clean up finished jobs, it is recommended to set this value to `false` to reduce load on the k8s API servers.
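
Internally this cleanup is the label-filtered delete that `hooks/command` runs on exit (taken from this diff):

```bash
# Delete all jobs older than a day, matching the AGE column against an "Nd" pattern
kubectl delete job "$(kubectl get job -l buildkite/plugin=k8s | awk 'match($4,/[0-9]+d/) {print $1}')" 2>/dev/null || true
```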

### `job-cleanup-after-finished-via-plugin` (optional, boolean)

If set to `true`, the plugin cleans up the finished k8s job.
Default value: `true`.

If you have the TTL controller or [kube-cleanup-operator](https://github.com/lwolf/kube-cleanup-operator) running, it is highly recommended to set this value to `false` to reduce load on the k8s API servers.
Contributor:

Similar suggestion to the job-cleanup-via-plugin option


## Low Level Configuration via Environment Variables

Some of the plugin options can be configured via environment variables, as follows ([also see the Buildkite docs](https://buildkite.com/docs/pipelines/environment-variables#defining-your-own)):

```yaml
env:
BUILDKITE_PLUGIN_K8S_JOB_APPLY_RETRY_INTERVAL_SEC: "10"
```
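
Each of these variables maps to a shell default in `hooks/command`, so when a variable is unset the hardcoded default applies. For example:

```bash
readonly job_apply_retry_interval_sec="${BUILDKITE_PLUGIN_K8S_JOB_APPLY_RETRY_INTERVAL_SEC:-5}"
```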

### `BUILDKITE_PLUGIN_K8S_JOB_APPLY_RETRY_INTERVAL_SEC`

- Configures the interval between attempts to schedule the k8s job
- Default: `5`
- Unit: seconds (integer)

### `BUILDKITE_PLUGIN_K8S_JOB_APPLY_TIMEOUT_SEC`

- Configures the total time limit across all attempts to schedule the k8s job
- Default: `120`
- Unit: seconds (integer)

### `BUILDKITE_PLUGIN_K8S_JOB_STATUS_RETRY_INTERVAL_SEC`

- Configures the interval between attempts to get the k8s job status
- Default: `5`
- Unit: seconds (integer)

### `BUILDKITE_PLUGIN_K8S_LOG_COMPLETE_RETRY_INTERVAL_SEC`

- Configures the interval between attempts to verify that log streaming has ended
- Default: `1`
- Unit: seconds (integer)

### `BUILDKITE_PLUGIN_K8S_LOG_COMPLETE_TIMEOUT_SEC`

- Configures the total time limit across all attempts to verify that log streaming has ended
- Default: `30`
- Unit: seconds (integer)

### `BUILDKITE_PLUGIN_K8S_LOG_RETRY_INTERVAL_SEC`

- Configures the interval between attempts to stream job logs
- Default: `3`
- Unit: seconds (integer)

### `BUILDKITE_PLUGIN_K8S_LOG_ATTEMPT_TIMEOUT_SEC`

- Configures the time limit for a _single_ plugin attempt to stream job logs
Contributor:

I think we can skip this option and hardcode the default value in the script since it's even lower level but also doesn't seem very useful to change? Also it'll be one less option to configure ^^

- Default: `5`
- Unit: seconds (integer)

### `BUILDKITE_PLUGIN_K8S_JOB_CLEANUP_RETRY_INTERVAL_SEC`

- Configures the interval between attempts to clean up finished jobs
- Default: `5`
- Unit: seconds (integer)

### `BUILDKITE_PLUGIN_K8S_JOB_CLEANUP_TIMEOUT_SEC`

- Configures the total time limit across all attempts to clean up finished jobs
- Default: `60`
- Unit: seconds (integer)
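
All of the retry-interval/timeout pairs above drive the same bounded-retry shape used throughout `hooks/command`; a minimal sketch of the pattern (`kubectl_command`, `timeout_sec`, and `retry_interval_sec` are illustrative names):

```bash
start_time="$SECONDS"
exit_code=""
while (( SECONDS - start_time < timeout_sec )); do
  set +e
  kubectl_command   # whichever kubectl call is being retried
  exit_code="$?"
  set -e
  if [[ "$exit_code" == "0" ]]; then
    break           # success, stop retrying
  fi
  sleep "$retry_interval_sec"
done
```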

## Contributing

We welcome community contributions to this project.
177 changes: 154 additions & 23 deletions hooks/command
@@ -13,64 +13,195 @@ fi
((timeout=BUILDKITE_TIMEOUT*60))
export BUILDKITE_TIMEOUT

# Default values can be overridden by setting "BUILDKITE_PLUGIN_K8S_*" env vars as used below.
readonly job_apply_retry_interval_sec="${BUILDKITE_PLUGIN_K8S_JOB_APPLY_RETRY_INTERVAL_SEC:-5}"
readonly job_apply_timeout_sec="${BUILDKITE_PLUGIN_K8S_JOB_APPLY_TIMEOUT_SEC:-120}"
readonly log_loop_retry_interval_sec="${BUILDKITE_PLUGIN_K8S_LOG_RETRY_INTERVAL_SEC:-3}"
readonly log_attempt_timeout_sec="${BUILDKITE_PLUGIN_K8S_LOG_ATTEMPT_TIMEOUT_SEC:-5}"
readonly job_status_retry_interval_sec="${BUILDKITE_PLUGIN_K8S_JOB_STATUS_RETRY_INTERVAL_SEC:-5}"
readonly log_complete_retry_interval_sec="${BUILDKITE_PLUGIN_K8S_LOG_COMPLETE_RETRY_INTERVAL_SEC:-1}"
readonly log_complete_timeout_sec="${BUILDKITE_PLUGIN_K8S_LOG_COMPLETE_TIMEOUT_SEC:-30}"
readonly use_agent_node_affinity=${BUILDKITE_PLUGIN_K8S_USE_AGENT_NODE_AFFINITY:-false}
readonly print_resulting_job_spec=${BUILDKITE_PLUGIN_K8S_PRINT_RESULTING_JOB_SPEC:-false}
readonly jobs_cleanup_via_plugin=${BUILDKITE_PLUGIN_K8S_JOBS_CLEANUP_VIA_PLUGIN:-}

readonly bootstrap_container_log_complete_marker_file="$(mktemp)"
readonly step_container_log_complete_marker_file="$(mktemp)"

function cleanup {
# Delete all jobs older than a day
kubectl delete job "$(kubectl get job -l buildkite/plugin=k8s | awk 'match($4,/[0-9]+d/) {print $1}')" 2>/dev/null || true
rm -f "$bootstrap_container_log_complete_marker_file" "$step_container_log_complete_marker_file"

if [[ "$jobs_cleanup_via_plugin" == "true" ]]; then
# Delete all jobs older than a day.
kubectl delete job "$(kubectl get job -l buildkite/plugin=k8s | awk 'match($4,/[0-9]+d/) {print $1}')" 2>/dev/null || true
fi
}
trap cleanup EXIT

trap cleanup EXIT

function tail_logs {
while true ; do
stdbuf -o0 -e0 kubectl logs -f "job/${job_name}" 2>>/dev/null || true
sleep 0.2
local -r pod_name="$1"
local -r container_name="$2"
local -r log_completion_marker_file="$3"

local log_snapshot=""
# Once logs are not empty we start attempting to stream them.
# Keep looping otherwise since empty output means that there is no useful log to display so we're not losing information by looping.
while true;
Contributor:

Can we include some mechanism to exit if we are not able to successfully verify logs so that we don't loop forever otherwise?
I think we can hardcode how long we wait in this case since the pod should've been running by then, so that e.g. if we're 1 min in and still failing then something else is probably wrong and we can bail out?

do
set +e
log_snapshot="$(timeout "$log_attempt_timeout_sec" kubectl logs --limit-bytes "1024" "pod/$pod_name" --container "$container_name" 2>/dev/null)"
set -e
if [[ -n "$log_snapshot" ]]; then
break
fi
sleep "$log_loop_retry_interval_sec"
done

# Run kubectl logs --follow in a loop since it can fail:
# 1) It can fail due to pod not being initialized yet: "Error from server (BadRequest): container "step" in pod "somepod" is waiting to start: PodInitializing"
# 2) It can fail mid-streaming, in this case we unfortunately will display logs multiple times (partially).
# 3) It can hang not providing any result, that's why we check not only exit code but also contents in the loop above.
while ! kubectl logs --follow "pod/$pod_name" --container "$container_name" 2>/dev/null;
do
sleep "$log_loop_retry_interval_sec"
done

echo "0" > "$log_completion_marker_file"
}
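
A bounded version of the initial snapshot wait in `tail_logs`, along the lines of the reviewer's suggestion above, could look like this sketch (the 60-second cap is an illustrative hardcoded value, not part of this diff):

```bash
wait_start="$SECONDS"
while [[ -z "$log_snapshot" ]] && (( SECONDS - wait_start < 60 )); do
  # `|| true` keeps set -e happy when kubectl or timeout fails
  log_snapshot="$(timeout "$log_attempt_timeout_sec" kubectl logs --limit-bytes "1024" \
    "pod/$pod_name" --container "$container_name" 2>/dev/null || true)"
  if [[ -z "$log_snapshot" ]]; then
    sleep "$log_loop_retry_interval_sec"
  fi
done
# if log_snapshot is still empty here, bail out instead of looping forever
```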

echo "--- :kubernetes: Starting Kubernetes Job"

export patchFunc=${BUILDKITE_PLUGIN_K8S_PATCH:-"function(f) f"}

jobspec=$(jsonnet \
job_spec="$(jsonnet \
--tla-str "jobName=${job_name}" \
--tla-str-file "stepEnvFile=${BUILDKITE_ENV_FILE}" \
--tla-code "agentEnv=$(jq -c -n env)" \
--tla-code patchFunc \
"${basedir}/lib/job.jsonnet")
"${basedir}/lib/job.jsonnet")"

export BUILDKITE_PLUGIN_K8S_USE_AGENT_NODE_AFFINITY=${BUILDKITE_PLUGIN_K8S_USE_AGENT_NODE_AFFINITY:-'false'}
if [[ "$BUILDKITE_PLUGIN_K8S_USE_AGENT_NODE_AFFINITY" == 'true' ]]; then
if [[ "$use_agent_node_affinity" == "true" ]]; then
for field in affinity tolerations; do
buildkite_agent_value=$(kubectl get pod "$(cat /etc/hostname)" -o json | jq ".spec.$field")
jobspec=$(echo "$jobspec" | jq ".spec.template.spec.$field=$buildkite_agent_value")
buildkite_agent_value="$(kubectl get pod "$(cat /etc/hostname)" -o json | jq ".spec.$field")"
job_spec="$(echo "$job_spec" | jq ".spec.template.spec.$field=$buildkite_agent_value")"
done
fi

echo "$jobspec" | kubectl apply -f -
if [[ "$print_resulting_job_spec" == "true" ]]; then
echo -e "Resulting k8s job spec:\n$job_spec"
fi

readonly job_apply_start_time="$SECONDS"
job_apply_exit_code=""

while [[ "$((SECONDS - job_apply_start_time))" -lt "$job_apply_timeout_sec" ]]
do
set +e
echo "$job_spec" | kubectl apply -f -
job_apply_exit_code="$?"
set -e

if [[ "$job_apply_exit_code" == "0" ]]; then
break
else
echo "Attempt to apply the job failed, exit code '$job_apply_exit_code'"
sleep "$job_apply_retry_interval_sec"
fi
done

if [[ "$job_apply_exit_code" != "0" ]]; then
echo "Failed to apply job, exit code '$job_apply_exit_code'"
exit "$job_apply_exit_code"
fi

echo "Timeout: ${timeout}s"

echo "+++ :kubernetes: Running image: ${BUILDKITE_PLUGIN_K8S_IMAGE}"
echo "--- :kubernetes: Running image: ${BUILDKITE_PLUGIN_K8S_IMAGE}"

counter="$timeout"
while true
do
set +e
pod_name=$(kubectl get pod -l "job-name=$job_name" --output=jsonpath="{.items[*].metadata.name}")
set -e

if [[ -n "$pod_name" ]]; then
break
else
sleep "$job_status_retry_interval_sec"
if [[ $timeout -gt 0 ]]; then
(( counter -= job_status_retry_interval_sec ))
fi
fi
done

echo "Pod is running: $pod_name"

echo "--- :kubernetes: bootstrap container"
tail_logs "$pod_name" "bootstrap" "$bootstrap_container_log_complete_marker_file"

tail_logs &
echo "+++ :kubernetes: step container"
tail_logs "$pod_name" "step" "$step_container_log_complete_marker_file" &

sleeper=2
counter=${timeout}
counter="$timeout"
jobstatus=""
while [[ -z "$jobstatus" ]] ; do
set +e
jobstatus=$(kubectl get job "${job_name}" -o 'jsonpath={.status.conditions[].type}')
sleep $sleeper
set -e

if [[ -n "$jobstatus" ]]; then
break
else
sleep "$job_status_retry_interval_sec"
fi

if [[ $timeout -gt 0 ]]; then
(( counter -= sleeper )) || jobstatus="timeout"
(( counter -= job_status_retry_interval_sec )) || jobstatus="timed-out"
fi
done

echo
echo "--- :kubernetes: Job status: $jobstatus"
status=1

# Wait for logs to be fully printed, printing runs in a separate process and we're racing with it.
readonly log_complete_start_time="$SECONDS"
while [[ "$(cat "$step_container_log_complete_marker_file")" != "0" ]] && [[ "$((SECONDS - log_complete_start_time))" -lt "$log_complete_timeout_sec" ]]
do
sleep "$log_complete_retry_interval_sec"
done

status=""
Collaborator:

Suggested change: `status=""` → `status=1`

we see issues where somehow we end up exiting with this default value:

/buildkite/plugins/github-com-artem-zinnatullin-k8s-buildkite-plugin-git-65192202c691578f25d42d04d57fb4b12d5b49c6/hooks/command: line 149: exit: null: numeric argument required

so having this more conservative exit code is a bit nicer

Contributor:

The null is likely coming from setting the status to the output of jq, e.g. here: https://github.com/EmbarkStudios/k8s-buildkite-plugin/pull/14/files#diff-7a7c160f9b73cfeaf4d8b93aee769fc1R188

@artem-zinnatullin I think we'd want to only set status if jq finds a match? One way would be to enable the exit code with `jq -e` so that it exits non-zero if no match is found, and then only set the status if the exit code is 0.

Contributor (author):

jq -e sounds good; annoying that you have to do `set +e; jq -e; set -e;` or write to a file like `if jq -e > result` to handle both the error code and the output, but what can you do I guess 😅
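
A sketch of the `jq -e` approach discussed here (not part of this diff; `terminated_exit_code` is a hypothetical name, and `pod_json` is the variable from the loop below):

```bash
# jq -e exits non-zero when the result is null or false, so the assignment
# only happens on a real match; using it as the if-condition avoids tripping set -e
if terminated_exit_code="$(echo "$pod_json" | jq -e '.status.containerStatuses[0].state.terminated.exitCode')"; then
  status="$terminated_exit_code"
fi
```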

if [[ "$jobstatus" == "Complete" ]] ; then
echo "success"
status=0
status="0"
else
while true
Contributor:

This looks like we'll keep looping if we keep failing to get the init container status? Let's break out of the loop after a hardcoded number of attempts or timeout instead?

do
Contributor:

Let's also print a message that says we're checking init containers status

set +e
pod_json=$(kubectl get pod "$pod_name" -o json)
init_container_status="$(echo "$pod_json" | jq ".status.initContainerStatuses[0].state.terminated.exitCode")"
if [[ -n "$init_container_status" && "$init_container_status" != "0" ]]; then
echo "Warning: init container failed with exit code $init_container_status, this usually indicates plugin misconfiguration or infrastructure failure"
status="$init_container_status"
else
status="$(echo "$pod_json" | jq ".status.containerStatuses[0].state.terminated.exitCode")"
fi
set -e
if [[ -n "$status" ]]; then
break
else
sleep "$job_status_retry_interval_sec"
if [[ $timeout -gt 0 ]]; then
(( counter -= job_status_retry_interval_sec ))
fi
fi
done
Contributor (on lines +179 to +199):

Awesome work checking init containers too 🙌

fi

if [[ -z "$status" ]]; then
echo "Warning: could not get actual exit code of the job"
status="42"
fi

exit $status
exit "$status"
31 changes: 27 additions & 4 deletions hooks/pre-exit
@@ -6,10 +6,33 @@ job_name="$(cat /tmp/job_name)"

echo "--- :kubernetes: Cleanup"

pod=$(kubectl get pod --output=name -l "job-name=${job_name}")
if [[ -n "${pod}" ]] ; then
kubectl patch --patch '{"spec":{"activeDeadlineSeconds":1}}' "${pod}"
readonly job_cleanup_after_finished_via_plugin="${BUILDKITE_PLUGIN_K8S_JOB_CLEANUP_AFTER_FINISHED_VIA_PLUGIN:-}"
readonly job_cleanup_retry_sec="${BUILDKITE_PLUGIN_K8S_JOB_CLEANUP_RETRY_INTERVAL_SEC:-5}"
readonly job_cleanup_timeout_sec="${BUILDKITE_PLUGIN_K8S_JOB_CLEANUP_TIMEOUT_SEC:-60}"
readonly job_cleanup_start_time="$SECONDS"

if [[ "$job_cleanup_after_finished_via_plugin" != "true" ]]; then
exit 0
fi

kubectl patch --patch '{"spec":{"activeDeadlineSeconds":1}}' "job/${job_name}"
job_cleanup_exit_code=""
while [[ "$((SECONDS - job_cleanup_start_time))" -lt "$job_cleanup_timeout_sec" ]]
do
set +e
pod=$(kubectl get pod --output=name -l "job-name=${job_name}")
if [[ -n "${pod}" ]] ; then
kubectl patch --patch '{"spec":{"activeDeadlineSeconds":1}}' "${pod}"
fi
kubectl patch --patch '{"spec":{"activeDeadlineSeconds":1}}' "job/${job_name}"
job_cleanup_exit_code="$?"
set -e

if [[ "$job_cleanup_exit_code" == "0" ]]; then
break
else
echo "Attempt to cleanup the job failed, exit code '$job_cleanup_exit_code'"
sleep "$job_cleanup_retry_sec"
fi
done

echo "Job cleanup exit code '$job_cleanup_exit_code'"
7 changes: 5 additions & 2 deletions lib/job.jsonnet
@@ -8,6 +8,7 @@ local allowedEnvs = std.set(
'BUILDKITE_MESSAGE',
'BUILDKITE_BUILD_CREATOR',
'BUILDKITE_BUILD_CREATOR_EMAIL',
'BUILDKITE_BUILD_ID',
'BUILDKITE_BUILD_NUMBER',
'BUILDKITE_BUILD_PATH',
'BUILDKITE_BUILD_URL',
@@ -35,10 +36,11 @@ local numberSuffix(s) =

local labelChars = std.set(std.stringChars('abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_.'));
local labelValue(s) =
std.join('', [
local sanitizedValue = std.join('', [
if std.setMember(c, labelChars) then c else '_'
for c in std.stringChars(s)
]);
if std.length(sanitizedValue) < 63 then sanitizedValue else std.substr(sanitizedValue, 0, 63);

function(jobName, agentEnv={}, stepEnvFile='', patchFunc=identity) patchFunc({
local buildSubPath = std.join('/', [
@@ -69,6 +71,7 @@ function(jobName, agentEnv={}, stepEnvFile='', patchFunc=identity) patchFunc({
BUILDKITE_PLUGIN_K8S_RESOURCES_REQUEST_MEMORY: '',
BUILDKITE_PLUGIN_K8S_RESOURCES_LIMIT_MEMORY: '',
BUILDKITE_PLUGIN_K8S_WORKDIR: std.join('/', [env.BUILDKITE_BUILD_PATH, buildSubPath]),
BUILDKITE_PLUGIN_K8S_JOB_TTL_SECONDS_AFTER_FINISHED: '86400',
Contributor (author):

I'm really bad at Jsonnet, I've declared default value in YAML, but not sure how to keep this declaration in .jsonnet 😬

Contributor:

This looks correct! We have these as default values, then we override them with any values provided by the pipeline: https://github.com/EmbarkStudios/k8s-buildkite-plugin/pull/14/files#diff-d8106e0da5023a3b7eb0401c475f88b2R75

} + agentEnv,

local stepEnv =
@@ -141,7 +144,6 @@ function(jobName, agentEnv={}, stepEnvFile='', patchFunc=identity) patchFunc({
'build/creator-email': env.BUILDKITE_BUILD_CREATOR_EMAIL,
'build/id': env.BUILDKITE_BUILD_ID,
'build/url': env.BUILDKITE_BUILD_URL,
'build/message': env.BUILDKITE_MESSAGE,
'build/number': env.BUILDKITE_BUILD_NUMBER,
'build/organization': env.BUILDKITE_ORGANIZATION_SLUG,
'build/repo': env.BUILDKITE_REPO,
@@ -292,6 +294,7 @@ function(jobName, agentEnv={}, stepEnvFile='', patchFunc=identity) patchFunc({
backoffLimit: 0,
activeDeadlineSeconds: deadline,
completions: 1,
ttlSecondsAfterFinished: std.parseInt(env.BUILDKITE_PLUGIN_K8S_JOB_TTL_SECONDS_AFTER_FINISHED),
template: {
metadata: {
labels: labels,