-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Display the output column name in InvalidNullByteException #14780
Display the output column name in InvalidNullByteException #14780
Conversation
can "Consider sanitizing the string using" be changed to "Consider sanitizing the input string using" |
IntList outputColumnsForQueryColumn = columnMappings.getOutputColumnsForQueryColumn(columnName); | ||
|
||
// outputColumnsForQueryColumn.size should always be 1 due to hasUniqueOutputColumnNames check that is done | ||
if (outputColumnsForQueryColumn.size() >= 1) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Should we handle the case where it is more than 1 separately for neatness? That said, since this is code to handle a different type of exception, I am okay with the current code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it should be fine since even if there are multiple columns for a single query column, to surface the exception, we can only display one name, and displaying the first one seems okay.
If the code which throws InvalidNullByteException
instead showed all the columns with values \0 in them (which I think won't be possible), we can modify it, however as the code currently is, we throw on encountering the first value with \0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left one comment.
Rest all LGTM.
: null; | ||
final MSQErrorReport workerError = workerErrorRef.get(); | ||
MSQErrorReport workerError = mapQueryColumnNameToOutputColumnName(workerErrorRef.get()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMHO: this change should go into :
druid/extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/exec/ControllerImpl.java
Lines 739 to 747 in 3ddcaa0
public void workerError(MSQErrorReport errorReport) | |
{ | |
if (workerTaskLauncher.isTaskCanceledByController(errorReport.getTaskId()) || | |
!workerTaskLauncher.isTaskLatest(errorReport.getTaskId())) { | |
log.info("Ignoring task %s", errorReport.getTaskId()); | |
} else { | |
workerErrorRef.compareAndSet(null, errorReport); | |
} | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes much more sense, incorporated the feedback in my latest commit.
Missed out on the latest comments. Thanks for the reviews @abhishekagarwal87 @adarshsanjeev @cryptoe!! |
Description
InvalidNullByteFault for MSQ displayed the query column name (the one that is used internally while planning and executing the query) instead of the output column name (the one that is given by the user). Sometimes the two are similar, however, for columns involving expressions, it is not the case.
It can be misleading for someone who is not familiar with this quirk, and even then it would require reading the native plan to figure out the culprit column.
This PR maps the query column to the output column name while surfacing the fault since that is readily visible to the user while executing the query.
For a query like:
Original Behaviour:
InvalidNullByte: Invalid null byte at source [external input source: InlineInputSource{data='{"desc":"Normal row","text":"Hi I am a normal row"} {"desc":"Row with NULL","text":"There is a null in\u0000 here somewhere"} {"desc":"Anther normal row","text":"Nice"} '}], rowNumber [2], column[v0], value[There is A null in here somewhere], position[18]. Consider sanitizing the string using REPLACE("v0", U&'\0000', '') AS v0.
Notice it refers to the column as 'v0'
Modified Behaviour
InvalidNullByte: Invalid null byte at source [external input source: InlineInputSource{data='{"desc":"Normal row","text":"Hi I am a normal row"} {"desc":"Row with NULL","text":"There is a null in\u0000 here somewhere"} {"desc":"Anther normal row","text":"Nice"} '}], rowNumber [2], column[text], value[There is A null in here somewhere], position[18]. Consider sanitizing the string using REPLACE("text", U&'\0000', '') AS text.
Note, for columns without alias, it will display it as
EXPR$<number>
which is still more informative than something like v0.Modified Behaviour
InvalidNullByte: Invalid null byte at source [external input source: InlineInputSource{data='{"desc":"Normal row","text":"Hi I am a normal row"} {"desc":"Row with NULL","text":"There is a null in\u0000 here somewhere"} {"desc":"Anther normal row","text":"Nice"} '}], rowNumber [2], column[EXPR$1], value[There is A null in here somewhere], position[18]. Consider sanitizing the string using REPLACE("EXPR$1", U&'\0000', '') AS EXPR$1.
This PR has: