Display the output column name in InvalidNullByteException #14780

LakshSingla · 2023-08-08T20:22:00Z

Description

InvalidNullByteFault for MSQ displayed the query column name (the name used internally while planning and executing the query) instead of the output column name (the name given by the user). The two are often the same, but not for columns involving expressions.
This can mislead anyone unfamiliar with the quirk, and even someone who knows about it has to read the native plan to find the culprit column.

This PR maps the query column name to the output column name when surfacing the fault, since the output column name is what the user sees while executing the query.

For a query like:

WITH "ext" AS (SELECT *
FROM TABLE(
  EXTERN(
    '{"type":"inline","data":"{\"desc\":\"Normal row\",\"text\":\"Hi I am a normal row\"}\n{\"desc\":\"Row with NULL\",\"text\":\"There is a null in\\u0000 here somewhere\"}\n{\"desc\":\"Anther normal row\",\"text\":\"Nice\"}\n"}',
    '{"type":"json"}'
  )
) EXTEND ("desc" VARCHAR, "text" VARCHAR))
SELECT
  "desc",
  REPLACE("text", 'a', 'A') AS "text"
FROM "ext"

Original Behaviour:

InvalidNullByte: Invalid null byte at source [external input source: InlineInputSource{data='{"desc":"Normal row","text":"Hi I am a normal row"} {"desc":"Row with NULL","text":"There is a null in\u0000 here somewhere"} {"desc":"Anther normal row","text":"Nice"} '}], rowNumber [2], column[v0], value[There is A null in here somewhere], position[18]. Consider sanitizing the string using REPLACE("v0", U&'\0000', '') AS v0.

Notice that it refers to the column as 'v0'.

Modified Behaviour:

InvalidNullByte: Invalid null byte at source [external input source: InlineInputSource{data='{"desc":"Normal row","text":"Hi I am a normal row"} {"desc":"Row with NULL","text":"There is a null in\u0000 here somewhere"} {"desc":"Anther normal row","text":"Nice"} '}], rowNumber [2], column[text], value[There is A null in here somewhere], position[18]. Consider sanitizing the string using REPLACE("text", U&'\0000', '') AS text.

Note: for columns without an alias, the message displays the name as EXPR$<number>, which is still more informative than something like v0.

WITH "ext" AS (SELECT *
FROM TABLE(
  EXTERN(
    '{"type":"inline","data":"{\"desc\":\"Normal row\",\"text\":\"Hi I am a normal row\"}\n{\"desc\":\"Row with NULL\",\"text\":\"There is a null in\\u0000 here somewhere\"}\n{\"desc\":\"Anther normal row\",\"text\":\"Nice\"}\n"}',
    '{"type":"json"}'
  )
) EXTEND ("desc" VARCHAR, "text" VARCHAR))
SELECT
  "desc",
  REPLACE("text", 'a', 'A')
FROM "ext"

Modified Behaviour:

InvalidNullByte: Invalid null byte at source [external input source: InlineInputSource{data='{"desc":"Normal row","text":"Hi I am a normal row"} {"desc":"Row with NULL","text":"There is a null in\u0000 here somewhere"} {"desc":"Anther normal row","text":"Nice"} '}], rowNumber [2], column[EXPR$1], value[There is A null in here somewhere], position[18]. Consider sanitizing the string using REPLACE("EXPR$1", U&'\0000', '') AS EXPR$1.
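
To make the renaming concrete, here is a minimal sketch of the idea, not the code in this PR. It assumes the query-to-output mapping can be viewed as a plain `Map<String, String>`; the class `NullByteMessageSketch` and both of its methods are hypothetical names used only for illustration, and the hint text follows the wording shown in the examples above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for illustration only; these are not Druid classes.
final class NullByteMessageSketch
{
  // Returns the user-facing name for a query column, or the query column
  // itself when no mapping is known.
  static String outputNameFor(Map<String, String> queryToOutput, String queryColumn)
  {
    return queryToOutput.getOrDefault(queryColumn, queryColumn);
  }

  // Builds the sanitization hint in the shape shown in the examples above.
  static String sanitizeHint(String column)
  {
    return "Consider sanitizing the string using REPLACE(\"" + column
           + "\", U&'\\0000', '') AS " + column + ".";
  }

  public static void main(String[] args)
  {
    // Assumed mapping for the example query: "v0" is the planner's name for
    // the REPLACE(...) expression that the user aliased as "text".
    Map<String, String> queryToOutput = new LinkedHashMap<>();
    queryToOutput.put("desc", "desc");
    queryToOutput.put("v0", "text");

    System.out.println(sanitizeHint(outputNameFor(queryToOutput, "v0")));
  }
}
```

With the assumed mapping, the hint names `text` rather than `v0`. For the unaliased query, the same lookup would return `EXPR$1`.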


abhishekagarwal87 · 2023-08-10T04:16:54Z

Can "Consider sanitizing the string using" be changed to "Consider sanitizing the input string using"?

adarshsanjeev · 2023-08-10T09:59:36Z

...core/multi-stage-query/src/main/java/org/apache/druid/msq/indexing/error/MSQErrorReport.java

+          IntList outputColumnsForQueryColumn = columnMappings.getOutputColumnsForQueryColumn(columnName);
+
+          // outputColumnsForQueryColumn.size should always be 1 due to hasUniqueOutputColumnNames check that is done
+          if (outputColumnsForQueryColumn.size() >= 1) {


nit: Should we handle the case where it is more than 1 separately for neatness? That said, since this is code to handle a different type of exception, I am okay with the current code.

I think it should be fine: even if a single query column maps to multiple output columns, we can only display one name when surfacing the exception, and displaying the first one seems okay.

If the code that throws InvalidNullByteException instead reported every column containing a \0 value (which I don't think is possible), we could revisit this. As the code currently stands, we throw on encountering the first value with \0.
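
For readers following the thread, the "take the first output column" behaviour can be sketched as below. This is an illustration under assumptions, not the PR's implementation: `Mapping`, `outputPositionsFor`, and `displayName` are hypothetical, and a `List<Integer>` stands in for the `IntList` returned by `getOutputColumnsForQueryColumn` in the snippet above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for illustration only; these are not Druid classes.
final class FirstOutputColumnSketch
{
  // One entry per output position: which query column feeds which output column.
  static final class Mapping
  {
    final String queryColumn;
    final String outputColumn;

    Mapping(String queryColumn, String outputColumn)
    {
      this.queryColumn = queryColumn;
      this.outputColumn = outputColumn;
    }
  }

  // Positions of every output column fed by the given query column.
  static List<Integer> outputPositionsFor(List<Mapping> mappings, String queryColumn)
  {
    List<Integer> positions = new ArrayList<>();
    for (int i = 0; i < mappings.size(); i++) {
      if (mappings.get(i).queryColumn.equals(queryColumn)) {
        positions.add(i);
      }
    }
    return positions;
  }

  // Picks the first matching output column; keeps the query column name
  // when nothing matches, so the message never loses information.
  static String displayName(List<Mapping> mappings, String queryColumn)
  {
    List<Integer> positions = outputPositionsFor(mappings, queryColumn);
    if (positions.size() >= 1) {
      return mappings.get(positions.get(0)).outputColumn;
    }
    return queryColumn;
  }

  public static void main(String[] args)
  {
    List<Mapping> mappings = Arrays.asList(
        new Mapping("d0", "desc"),
        new Mapping("v0", "text"),
        new Mapping("v0", "text_copy")
    );
    System.out.println(displayName(mappings, "v0"));
    System.out.println(displayName(mappings, "v9"));
  }
}
```

Only one name can appear in the message, so a single `>= 1` branch covers both the expected one-to-one case and the multi-column case.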

cryptoe

Left one comment.
Rest all LGTM.

cryptoe · 2023-08-16T05:59:03Z

extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/exec/ControllerImpl.java

          : null;
-      final MSQErrorReport workerError = workerErrorRef.get();
+      MSQErrorReport workerError = mapQueryColumnNameToOutputColumnName(workerErrorRef.get());


IMHO, this change should go into:

druid/extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/exec/ControllerImpl.java

Lines 739 to 747 in 3ddcaa0

public void workerError(MSQErrorReport errorReport)
{
  if (workerTaskLauncher.isTaskCanceledByController(errorReport.getTaskId()) ||
      !workerTaskLauncher.isTaskLatest(errorReport.getTaskId())) {
    log.info("Ignoring task %s", errorReport.getTaskId());
  } else {
    workerErrorRef.compareAndSet(null, errorReport);
  }
}

Makes much more sense; I have incorporated the feedback in my latest commit.
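
The final code is not quoted in this thread, so the following is only a sketch of what moving the mapping into the error callback could look like. It assumes the error report can be modelled as a `String` and the rewrite as a `UnaryOperator<String>`; `FirstErrorWinsSketch` and `firstError` are hypothetical names. The point it illustrates is that `compareAndSet(null, ...)` keeps only the first reported error, so rewriting before the store means readers of the reference never see the internal column name.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

// Hypothetical stand-in for illustration only; this is not ControllerImpl.
final class FirstErrorWinsSketch
{
  private final AtomicReference<String> workerErrorRef = new AtomicReference<>();
  private final UnaryOperator<String> rewrite;

  FirstErrorWinsSketch(UnaryOperator<String> rewrite)
  {
    this.rewrite = rewrite;
  }

  // Rewrites the report first, then stores it only if no error was stored yet.
  void workerError(String errorReport)
  {
    workerErrorRef.compareAndSet(null, rewrite.apply(errorReport));
  }

  String firstError()
  {
    return workerErrorRef.get();
  }

  public static void main(String[] args)
  {
    FirstErrorWinsSketch controller =
        new FirstErrorWinsSketch(msg -> msg.replace("column[v0]", "column[text]"));

    controller.workerError("InvalidNullByte: column[v0]");
    controller.workerError("SomeOtherFault");

    // The first report wins, and it was rewritten before being stored.
    System.out.println(controller.firstError());
  }
}
```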

LakshSingla · 2023-08-23T20:22:41Z

Missed out on the latest comments. Thanks for the reviews @abhishekagarwal87 @adarshsanjeev @cryptoe!!

LakshSingla added 4 commits August 8, 2023 10:12

initial commit

556e2c9

refactor, add comments

aea51ec

fix test

8d92a14

fix npe

ecc8fe2

error message change

ac801a1

adarshsanjeev approved these changes Aug 10, 2023


cryptoe reviewed Aug 16, 2023


review

a60fc4d

LakshSingla added the Area - MSQ For multi stage queries - https://github.com/apache/druid/issues/12262 label Aug 23, 2023

LakshSingla merged commit f9f734c into apache:master Aug 24, 2023
46 checks passed

LakshSingla deleted the inbe-better-column-name branch August 24, 2023 04:24

LakshSingla added this to the 28.0 milestone Oct 12, 2023

LakshSingla mentioned this pull request Nov 4, 2023

[DRAFT] 28.0.0 release notes #15326

Closed
