Description

This module is used to create a Kubernetes job template file.

The job template file can be submitted as is or used as a template for further customization. Add the instructions output to a blueprint (as shown below) to get instructions on how to use kubectl to submit the job.

This module is designed to use one or more gke-node-pool modules. The job will be configured to run on any of the specified node pools.

NOTE: This is an experimental module; its functionality and documentation will likely be updated in the near future. This module has only been tested in a limited capacity.

Example

The following example creates a GKE job template file.

  - id: job-template
    source: community/modules/compute/gke-job-template
    use: [compute_pool]
    settings:
      node_count: 3
    outputs: [instructions]

Also see a full GKE example blueprint.

Storage Options

This module natively supports:

  • Filestore as a shared file system between pods/nodes.
  • Pod level ephemeral storage options:
    • memory backed emptyDir
    • local SSD backed emptyDir
    • SSD persistent disk backed ephemeral volume
    • balanced persistent disk backed ephemeral volume

See the storage-gke.yaml blueprint and the associated documentation for examples of how to use Filestore and ephemeral storage with this module.
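As a minimal sketch of the pod-level ephemeral storage options listed above, the fragment below sets the `ephemeral_volumes` input using the field names and type values from the Inputs table. The mount paths and sizes are hypothetical placeholders, and `compute_pool` is assumed to be the node pool module ID used in the example above.

```yaml
  - id: job-template
    source: community/modules/compute/gke-job-template
    use: [compute_pool]
    settings:
      ephemeral_volumes:
      # memory backed emptyDir (path and size are illustrative)
      - type: memory
        mount_path: /scratch-mem
        size_gb: 5
      # SSD persistent disk backed ephemeral volume (path and size are illustrative)
      - type: pd-ssd
        mount_path: /scratch-pd
        size_gb: 100
    outputs: [instructions]
```

The other supported `type` values, `local-ssd` and `pd-balanced`, take the same three fields.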

Requested Resources

When one or more gke-node-pool modules are referenced with the use field, the requested resources are populated to achieve a packing of one pod per node while still leaving some headroom for required system pods.

This behavior can be overridden by specifying the desired CPU request with the requested_cpu_per_pod setting.
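For illustration, the override might look like the following fragment. The value `2` is an arbitrary placeholder, not a recommendation, and `compute_pool` is assumed to be the node pool module ID from the earlier example.

```yaml
  - id: job-template
    source: community/modules/compute/gke-job-template
    use: [compute_pool]
    settings:
      node_count: 3
      # Illustrative value: request 2 CPUs per pod instead of claiming a whole node
      requested_cpu_per_pod: 2
    outputs: [instructions]
```

With a request smaller than a node's allocatable CPU, the scheduler is no longer constrained to one pod per node, so more than one pod may be placed on the same node.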

License

Copyright 2023 Google LLC

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

 http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Requirements

| Name | Version |
|------|---------|
| terraform | >= 1.2 |
| local | >= 2.0.0 |
| random | ~> 3.0 |

Providers

| Name | Version |
|------|---------|
| local | >= 2.0.0 |
| random | ~> 3.0 |

Modules

No modules.

Resources

| Name | Type |
|------|------|
| local_file.job_template | resource |
| random_id.resource_name_suffix | resource |

Inputs

| Name | Description | Type | Default | Required |
|------|-------------|------|---------|:--------:|
| allocatable_cpu_per_node | The allocatable CPU per node. Used to claim whole nodes. Generally populated from gke-node-pool via the use field. | `list(number)` | `[-1]` | no |
| backoff_limit | Controls the number of retries before considering a Job as failed. Set to zero for shared fate. | `number` | `0` | no |
| command | The command and arguments for the container that runs in the Pod. The command field corresponds to entrypoint in some container runtimes. | `list(string)` | `["hostname"]` | no |
| completion_mode | Sets the value of completionMode on the job. The default uses indexed jobs. See documentation for more information. | `string` | `"Indexed"` | no |
| ephemeral_volumes | Creates an emptyDir or ephemeral volume backed by the specified type: memory, local-ssd, pd-balanced, pd-ssd. size_gb is provided in GiB. | `list(object({ type = string, mount_path = string, size_gb = number }))` | `[]` | no |
| has_gpu | Indicates that the job should request nodes with GPUs. Typically supplied by a gke-node-pool module. | `list(bool)` | `[false]` | no |
| image | The container image the job should use. | `string` | `"debian"` | no |
| k8s_service_account_name | Kubernetes service account to run the job as. If null, no service account is specified. | `string` | `null` | no |
| labels | Labels to add to the GKE job template. Key-value pairs. | `map(string)` | n/a | yes |
| machine_family | The machine family to use in the node selector (example: n2). If null, machine family is not used as a selector criterion. | `string` | `null` | no |
| name | The name of the job. | `string` | `"my-job"` | no |
| node_count | The number of nodes the job should run on in parallel. | `number` | `1` | no |
| node_pool_name | A list of node pool names on which to run the job. Can be populated via the use field. | `list(string)` | `[]` | no |
| node_selectors | A list of node selectors to use to place the job. | `list(object({ key = string, value = string }))` | `[]` | no |
| persistent_volume_claims | A list of objects that describe a k8s PVC to be used and mounted on the job. Generally supplied by the gke-persistent-volume module. | `list(object({ name = string, mount_path = string, mount_options = string, is_gcs = bool }))` | `[]` | no |
| random_name_sufix | Appends a random suffix to the job name to avoid clashes. | `bool` | `true` | no |
| requested_cpu_per_pod | The requested CPU per pod. If null, allocatable_cpu_per_node will be used to claim whole nodes. If provided, overrides allocatable_cpu_per_node. | `number` | `-1` | no |
| restart_policy | Job restart policy. Only a RestartPolicy equal to Never or OnFailure is allowed. | `string` | `"Never"` | no |
| tolerations | Tolerations allow the scheduler to schedule pods with matching taints. Generally populated from gke-node-pool via the use field. | `list(object({ key = string, operator = string, value = string, effect = string }))` | `[{ "effect": "NoSchedule", "key": "user-workload", "operator": "Equal", "value": "true" }]` | no |
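
As a rough sketch of how several of these inputs combine, the fragment below customizes the job name, container image, command, and retry behavior. All values are illustrative placeholders chosen to match the types in the table above, and `compute_pool` is assumed to be a gke-node-pool module ID defined elsewhere in the blueprint.

```yaml
  - id: job-template
    source: community/modules/compute/gke-job-template
    use: [compute_pool]
    settings:
      name: hello-job              # illustrative job name
      image: debian                # same as the module default
      command: ["echo", "hello"]   # illustrative command and arguments
      node_count: 2
      restart_policy: OnFailure    # must be Never or OnFailure
      backoff_limit: 3             # allow up to 3 retries before the Job is marked failed
    outputs: [instructions]
```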

Outputs

| Name | Description |
|------|-------------|
| instructions | Instructions for submitting the GKE job. |