Add-DatabricksPythonJob
external help file: azure.databricks.cicd.tools-help.xml
Module Name: azure.databricks.cicd.tools
online version:
schema: 2.0.0
Creates a Python job in Databricks using the Jobs API 2.0 create endpoint (https://docs.azuredatabricks.net/api/latest/jobs.html#create).
Add-DatabricksPythonJob [[-BearerToken] <String>] [[-Region] <String>] [-JobName] <String>
[[-ClusterId] <String>] [[-SparkVersion] <String>] [[-NodeType] <String>] [[-DriverNodeType] <String>]
[[-MinNumberOfWorkers] <Int32>] [[-MaxNumberOfWorkers] <Int32>] [[-Timeout] <Int32>] [[-MaxRetries] <Int32>]
[[-ScheduleCronExpression] <String>] [[-Timezone] <String>] [-PythonPath] <String>
[[-PythonParameters] <String[]>] [[-Libraries] <String[]>] [[-PythonVersion] <String>]
[[-Spark_conf] <Hashtable>] [[-CustomTags] <Hashtable>] [[-InitScripts] <String[]>]
[[-SparkEnvVars] <Hashtable>] [-RunImmediate] [[-ClusterLogPath] <String>] [[-InstancePoolId] <String>]
[<CommonParameters>]
Creates a Python job in Databricks using the Jobs API 2.0 create endpoint (https://docs.azuredatabricks.net/api/latest/jobs.html#create). If a job with the same name already exists it will be updated instead of a new job being created.
Add-DatabricksPythonJob -BearerToken $BearerToken -Region $Region -JobName "Job1" -SparkVersion "5.3.x-scala2.11" -NodeType "Standard_D3_v2" -MinNumberOfWorkers 2 -MaxNumberOfWorkers 2 -Timeout 100 -MaxRetries 3 -ScheduleCronExpression "0 15 22 ? * *" -Timezone "UTC" -PythonPath "/Shared/TestPython.py" -PythonParameters "val1", "val2" -Libraries '{"pypi":{"package":"simplejson"}}', '{"jar": "DBFS:/mylibraries/test.jar"}'
The above example creates a job that runs on a new cluster.
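If a suitable interactive cluster is already running, the job can instead be attached to it by passing -ClusterId in place of the new-cluster settings. A minimal sketch (the cluster id and job name below are illustrative):
Add-DatabricksPythonJob -BearerToken $BearerToken -Region $Region -JobName "Job2" -ClusterId "0101-123456-abc123" -PythonPath "/Shared/TestPython.py" -PythonParameters "val1", "val2"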
Your Databricks Bearer token to authenticate to your workspace (see User Settings in the Databricks WebUI)
Type: String
Parameter Sets: (All)
Aliases:
Required: False
Position: 1
Default value: None
Accept pipeline input: False
Accept wildcard characters: False
Azure Region - must match the URL of your Databricks workspace, example: northeurope
Type: String
Parameter Sets: (All)
Aliases:
Required: False
Position: 2
Default value: None
Accept pipeline input: False
Accept wildcard characters: False
Name of the job that will appear in the Job list. If a job with this name exists it will be updated.
Type: String
Parameter Sets: (All)
Aliases:
Required: True
Position: 3
Default value: None
Accept pipeline input: False
Accept wildcard characters: False
The ClusterId of an existing cluster to use. Optional.
Type: String
Parameter Sets: (All)
Aliases:
Required: False
Position: 4
Default value: None
Accept pipeline input: False
Accept wildcard characters: False
Spark version for the cluster that will run the job. Example: 5.3.x-scala2.11. Note: Ignored if ClusterId is populated.
Type: String
Parameter Sets: (All)
Aliases:
Required: False
Position: 5
Default value: None
Accept pipeline input: False
Accept wildcard characters: False
Type of worker for cluster that will run the job. Example: Standard_D3_v2. Note: Ignored if ClusterId is populated.
Type: String
Parameter Sets: (All)
Aliases:
Required: False
Position: 6
Default value: None
Accept pipeline input: False
Accept wildcard characters: False
Type of driver for cluster that will run the job. Example: Standard_D3_v2. If not provided the NodeType will be used. Note: Ignored if ClusterId is populated.
Type: String
Parameter Sets: (All)
Aliases:
Required: False
Position: 7
Default value: None
Accept pipeline input: False
Accept wildcard characters: False
Minimum number of workers for the cluster that will run the job. Note: If Min & Max Workers are the same, autoscale is disabled. Note: Ignored if ClusterId is populated.
Type: Int32
Parameter Sets: (All)
Aliases:
Required: False
Position: 8
Default value: 0
Accept pipeline input: False
Accept wildcard characters: False
Maximum number of workers for the cluster that will run the job. Note: If Min & Max Workers are the same, autoscale is disabled. Note: Ignored if ClusterId is populated.
Type: Int32
Parameter Sets: (All)
Aliases:
Required: False
Position: 9
Default value: 0
Accept pipeline input: False
Accept wildcard characters: False
Timeout, in seconds, applied to each run of the job. If not set, there will be no timeout.
Type: Int32
Parameter Sets: (All)
Aliases:
Required: False
Position: 10
Default value: 0
Accept pipeline input: False
Accept wildcard characters: False
An optional maximum number of times to retry an unsuccessful run. A run is considered to be unsuccessful if it completes with a FAILED result_state or INTERNAL_ERROR life_cycle_state. The value -1 means to retry indefinitely and the value 0 means to never retry. If not set, the default behavior will be never retry.
Type: Int32
Parameter Sets: (All)
Aliases:
Required: False
Position: 11
Default value: 0
Accept pipeline input: False
Accept wildcard characters: False
By default, the job only runs when triggered via the Jobs UI or an API run request. You can provide a cron schedule expression for periodic runs. See http://www.quartz-scheduler.org/documentation/quartz-2.1.x/tutorials/tutorial-lesson-06.html for how to compose a cron schedule expression.
Type: String
Parameter Sets: (All)
Aliases:
Required: False
Position: 12
Default value: None
Accept pipeline input: False
Accept wildcard characters: False
Timezone for the Cron Schedule Expression. Required if ScheduleCronExpression is provided. See http://joda-time.sourceforge.net/timezones.html for all possible timezones. Example: UTC
Type: String
Parameter Sets: (All)
Aliases:
Required: False
Position: 13
Default value: None
Accept pipeline input: False
Accept wildcard characters: False
Path to the py script in Databricks that will be executed by this Job. Must be a DBFS location from root, example "dbfs:/folder/file.py".
Type: String
Parameter Sets: (All)
Aliases:
Required: True
Position: 14
Default value: None
Accept pipeline input: False
Accept wildcard characters: False
Optional parameters that will be provided to script when Job is executed. Example: "val1", "val2"
Type: String[]
Parameter Sets: (All)
Aliases:
Required: False
Position: 15
Default value: None
Accept pipeline input: False
Accept wildcard characters: False
Optional. Array of JSON strings. Example: '{"pypi":{"package":"simplejson"}}', '{"jar": "DBFS:/mylibraries/test.jar"}'
Type: String[]
Parameter Sets: (All)
Aliases:
Required: False
Position: 16
Default value: None
Accept pipeline input: False
Accept wildcard characters: False
Python version for the cluster: 2 or 3. Defaults to 3.
Type: String
Parameter Sets: (All)
Aliases:
Required: False
Position: 17
Default value: 3
Accept pipeline input: False
Accept wildcard characters: False
Hashtable of Spark configuration settings. Example: @{"spark.speculation"=$true; "spark.streaming.ui.retainedBatches"= 5}
Type: Hashtable
Parameter Sets: (All)
Aliases:
Required: False
Position: 18
Default value: None
Accept pipeline input: False
Accept wildcard characters: False
Custom Tags to set, provide hash table of tags. Example: @{CreatedBy="SimonDM";NumOfNodes=2;CanDelete=$true}
Type: Hashtable
Parameter Sets: (All)
Aliases:
Required: False
Position: 19
Default value: None
Accept pipeline input: False
Accept wildcard characters: False
Init scripts to run post creation. Example: "dbfs:/script/script1", "dbfs:/script/script2"
Type: String[]
Parameter Sets: (All)
Aliases:
Required: False
Position: 20
Default value: None
Accept pipeline input: False
Accept wildcard characters: False
An object containing a set of optional, user-specified environment variable key-value pairs. Key-value pairs of the form (X,Y) are exported as is (i.e., export X='Y') while launching the driver and workers. Example: @{SPARK_WORKER_MEMORY="29000m";SPARK_LOCAL_DIRS="/local_disk0"}
Type: Hashtable
Parameter Sets: (All)
Aliases:
Required: False
Position: 21
Default value: None
Accept pipeline input: False
Accept wildcard characters: False
Switch. Performs a Run Now instead of creating a job. The run starts immediately and executes asynchronously. Setting this option returns a RunId.
Type: SwitchParameter
Parameter Sets: (All)
Aliases:
Required: False
Position: Named
Default value: False
Accept pipeline input: False
Accept wildcard characters: False
DBFS location for cluster logs - must start with dbfs:/. Example: dbfs:/logs/mycluster
Type: String
Parameter Sets: (All)
Aliases:
Required: False
Position: 22
Default value: None
Accept pipeline input: False
Accept wildcard characters: False
Optional. The ID of an existing instance pool in the workspace to create the job cluster in.
Type: String
Parameter Sets: (All)
Aliases:
Required: False
Position: 23
Default value: None
Accept pipeline input: False
Accept wildcard characters: False
This cmdlet supports the common parameters: -Debug, -ErrorAction, -ErrorVariable, -InformationAction, -InformationVariable, -OutVariable, -OutBuffer, -PipelineVariable, -Verbose, -WarningAction, and -WarningVariable. For more information, see about_CommonParameters.
Author: Simon D'Morias / Data Thirst Ltd