Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix issue with long data source names #14620

Merged
merged 4 commits into from
Jul 24, 2023

Conversation

georgew5656
Copy link
Contributor

@georgew5656 georgew5656 commented Jul 19, 2023

Fixes a issue where having data source names (or more generally task ids) that are too long cause the K8sTaskRunner to throw errors.

Description

The K8sTaskRunner truncates task ids in order to get a valid K8s job name (less than 64 characters) and doing this can cause loss of uniqueness. I fixed this by appending a 128 bit hash onto the end of the task id which i think should be sufficient to ensure uniqueness. (Change is in KubernetesOverlordUtils).

I also had to fix a related issue that was covered up by the fact that our truncation function can be applied repeatedly wihtout any affect.
The issue is the line in KubernetesPeonClient
K8sTaskId taskId = new K8sTaskId(job.getMetadata().getName());.
This takes the job name from the k8s api (which has already been truncated by our truncation function since it is a job name), and then passes it into the K8sTaskId constructor as the originalTaskId. The K8sTaskId calls the truncation function a second time to generate the k8sTaskId. This is okay when our truncation function just grabs the first 63 characters of the task id since repeatedly applying this won't do anything, but with my latest changes this is no longer the case.

I updated this logic to just use the job name directly from the kubernetes api (that's all it needs anyways) and I had to change a lot of function signatures to do this. I also explicitly changed the k8sTaskId field to k8sJobName since that's the only thing that field is used for.

This does make it slightly more confusing to find out which task a job corresponds to from a k8s client like k8s, but the PodTemplateTaskAdapter also applies the whole unmodified task id as a annotation.

Release note

  • Fixes a issue where having data source names (or more generally task ids) that are too long cause the K8sTaskRunner to throw errors.
    -->

Key changed/added classes in this PR
  • KubernetesOverlordUtils
  • KubernetesPeonClient
  • K8sTaskId

This PR has:

  • [ X] been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • [ X] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • [X ] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • [ X] been tested in a test Druid cluster.

@@ -43,4 +45,10 @@ public static String convertTaskIdToK8sLabel(String taskId)
return taskId == null ? "" : StringUtils.left(RegExUtils.replaceAll(taskId, K8S_TASK_ID_PATTERN, "")
.toLowerCase(Locale.ENGLISH), 63);
}

public static String convertTaskIdToJobName(String taskId)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whats the limit on length of job name?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still need to do this conversion if taskId is less than 63 chars? I mean whether it's valuable to keep taskId same as JobName if it's possible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we do need to since there are some characters in task ids that are not valid in k8s labels/job names

@@ -43,4 +45,10 @@ public static String convertTaskIdToK8sLabel(String taskId)
return taskId == null ? "" : StringUtils.left(RegExUtils.replaceAll(taskId, K8S_TASK_ID_PATTERN, "")
.toLowerCase(Locale.ENGLISH), 63);
}

public static String convertTaskIdToJobName(String taskId)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still need to do this conversion if taskId is less than 63 chars? I mean whether it's valuable to keep taskId same as JobName if it's possible.

+ " not found");
},
DruidK8sConstants.IS_TRANSIENT, quietTries, maxTries
);
}
catch (Exception e) {
throw new KubernetesResourceNotFoundException("K8s pod with label: job-name=" + k8sTaskId + " not found");
throw new KubernetesResourceNotFoundException("K8s pod with label: job-name=" + jobName + " not found");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we pass jobName instead of K8sTaskId, we lose the originalTaskId in the log, not sure if it's useful for troubleshooting.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm i think it might be, although technically the k8sTaskId being logged right now is not really the real one b/c of the bug i mentioned above. maybe i can find a way to get the actual task id in there

public static String convertTaskIdToJobName(String taskId)
{
return taskId == null ? "" : StringUtils.left(RegExUtils.replaceAll(taskId, K8S_TASK_ID_PATTERN, "")
.toLowerCase(Locale.ENGLISH), 30) + "-" + Hashing.murmur3_128().hashString(taskId, StandardCharsets.UTF_8);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hashing.murmur3_128().hashString(taskId, StandardCharsets.UTF_8)

How many characters will this produce? I couldn't easily figure out if there is a limit on the length of the string returned from this function

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

128 bits -> 32 characters

K8sTaskId taskId = new K8sTaskId(job.getMetadata().getName());
log.info("Successfully submitted job: %s ... waiting for job to launch", taskId);
String jobName = job.getMetadata().getName();
log.info("Successfully submitted job: %s ... waiting for job to launch", jobName);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to report the taskId in this log line, to make it easy to map to the task_id that is being started.

@suneet-s suneet-s merged commit 28914bb into apache:master Jul 24, 2023
@LakshSingla LakshSingla added this to the 28.0 milestone Oct 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants