-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix issue with long data source names #14620
Conversation
@@ -43,4 +45,10 @@ public static String convertTaskIdToK8sLabel(String taskId) | |||
return taskId == null ? "" : StringUtils.left(RegExUtils.replaceAll(taskId, K8S_TASK_ID_PATTERN, "") | |||
.toLowerCase(Locale.ENGLISH), 63); | |||
} | |||
|
|||
public static String convertTaskIdToJobName(String taskId) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
whats the limit on length of job name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/ (its 63 characters)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we still need to do this conversion if taskId is less than 63 chars? I mean whether it's valuable to keep taskId same as JobName if it's possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we do need to since there are some characters in task ids that are not valid in k8s labels/job names
@@ -43,4 +45,10 @@ public static String convertTaskIdToK8sLabel(String taskId) | |||
return taskId == null ? "" : StringUtils.left(RegExUtils.replaceAll(taskId, K8S_TASK_ID_PATTERN, "") | |||
.toLowerCase(Locale.ENGLISH), 63); | |||
} | |||
|
|||
public static String convertTaskIdToJobName(String taskId) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we still need to do this conversion if taskId is less than 63 chars? I mean whether it's valuable to keep taskId same as JobName if it's possible.
+ " not found"); | ||
}, | ||
DruidK8sConstants.IS_TRANSIENT, quietTries, maxTries | ||
); | ||
} | ||
catch (Exception e) { | ||
throw new KubernetesResourceNotFoundException("K8s pod with label: job-name=" + k8sTaskId + " not found"); | ||
throw new KubernetesResourceNotFoundException("K8s pod with label: job-name=" + jobName + " not found"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we pass jobName
instead of K8sTaskId
, we lose the originalTaskId
in the log, not sure if it's useful for troubleshooting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm i think it might be, although technically the k8sTaskId being logged right now is not really the real one b/c of the bug i mentioned above. maybe i can find a way to get the actual task id in there
public static String convertTaskIdToJobName(String taskId) | ||
{ | ||
return taskId == null ? "" : StringUtils.left(RegExUtils.replaceAll(taskId, K8S_TASK_ID_PATTERN, "") | ||
.toLowerCase(Locale.ENGLISH), 30) + "-" + Hashing.murmur3_128().hashString(taskId, StandardCharsets.UTF_8); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hashing.murmur3_128().hashString(taskId, StandardCharsets.UTF_8)
How many characters will this produce? I couldn't easily figure out if there is a limit on the length of the string returned from this function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
128 bits -> 32 characters
K8sTaskId taskId = new K8sTaskId(job.getMetadata().getName()); | ||
log.info("Successfully submitted job: %s ... waiting for job to launch", taskId); | ||
String jobName = job.getMetadata().getName(); | ||
log.info("Successfully submitted job: %s ... waiting for job to launch", jobName); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be good to report the taskId in this log line, to make it easy to map to the task_id that is being started.
Fixes a issue where having data source names (or more generally task ids) that are too long cause the K8sTaskRunner to throw errors.
Description
The K8sTaskRunner truncates task ids in order to get a valid K8s job name (less than 64 characters) and doing this can cause loss of uniqueness. I fixed this by appending a 128 bit hash onto the end of the task id which i think should be sufficient to ensure uniqueness. (Change is in KubernetesOverlordUtils).
I also had to fix a related issue that was covered up by the fact that our truncation function can be applied repeatedly wihtout any affect.
The issue is the line in KubernetesPeonClient
K8sTaskId taskId = new K8sTaskId(job.getMetadata().getName());.
This takes the job name from the k8s api (which has already been truncated by our truncation function since it is a job name), and then passes it into the K8sTaskId constructor as the originalTaskId. The K8sTaskId calls the truncation function a second time to generate the k8sTaskId. This is okay when our truncation function just grabs the first 63 characters of the task id since repeatedly applying this won't do anything, but with my latest changes this is no longer the case.
I updated this logic to just use the job name directly from the kubernetes api (that's all it needs anyways) and I had to change a lot of function signatures to do this. I also explicitly changed the k8sTaskId field to k8sJobName since that's the only thing that field is used for.
This does make it slightly more confusing to find out which task a job corresponds to from a k8s client like k8s, but the PodTemplateTaskAdapter also applies the whole unmodified task id as a annotation.
Release note
-->
Key changed/added classes in this PR
KubernetesOverlordUtils
KubernetesPeonClient
K8sTaskId
This PR has: