Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-24366][SQL] Improving of error messages for type converting #21410

Closed
wants to merge 2 commits into from

Conversation

MaxGekk
Copy link
Member

@MaxGekk MaxGekk commented May 23, 2018

What changes were proposed in this pull request?

Currently, users are getting the following error messages on type conversions:

scala.MatchError: test (of class java.lang.String)

The message doesn't give any clues to the users where in the schema the error happened. In this PR, I would like to improve the error message like:

The value (test) of the type (java.lang.String) cannot be converted to struct<f1:int>

How was this patch tested?

Added tests for converting of wrong values to struct, map, array, string and decimal.

@SparkQA
Copy link

SparkQA commented May 23, 2018

Test build #91034 has finished for PR 21410 at commit 26cc2f8.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@ssimeonov
Copy link
Contributor

This is an excellent start and a worthy improvement.

Is there a way to identify where in the schema the issue is occurring? For example, when you have a schema with many nested fields, the failing value is helpful but the breadcrumb trail, e.g., a.b.c where this is happening, is required to easily isolate the issue in the input data and resolve it.

@MaxGekk
Copy link
Member Author

MaxGekk commented May 23, 2018

jenkins, retest this, please

@SparkQA
Copy link

SparkQA commented May 23, 2018

Test build #91062 has finished for PR 21410 at commit 26cc2f8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@MaxGekk
Copy link
Member Author

MaxGekk commented May 24, 2018

Is there a way to identify where in the schema the issue is occurring?

We can catch the exceptions on each level of schema tree traversal, and show sub-trees in each catch. For example: array<map<..., array<struct<f2:int>>>> , the first exception will point out struct<f2:int>, the second one array<struct<f2:int>> and up to the "root" schema.

e.g., a.b.c where this is happening, is required to easily isolate the issue in the input data and resolve it.

I guess in the case of arrays and maps, you want to see indexes and keys. Could you provide concrete example with values and a schema (array, struct, map), and what kind of info the error should contain.

Just in case, I would propose to make such improvements in a separate PR.

@MaxGekk
Copy link
Member Author

MaxGekk commented May 24, 2018

@gatorsmile Could you look at the PR, please. The changes should help us in trouble shooting of customer's issues.

@@ -309,6 +322,9 @@ object CatalystTypeConverters {
case d: JavaBigDecimal => Decimal(d)
case d: JavaBigInteger => Decimal(d)
case d: Decimal => d
case other => throw new IllegalArgumentException(
s"The value (${other.toString}) of the type (${other.getClass.getCanonicalName}) "
+ s"cannot be converted to ${dataType.simpleString}")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let us use catalogString here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All simpleStrings are replaced by catalogStrings

@gatorsmile
Copy link
Member

LGTM except one minor comment.

@SparkQA
Copy link

SparkQA commented May 25, 2018

Test build #91151 has finished for PR 21410 at commit ac76544.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Copy link
Member

Thanks! Merged to master.

@asfgit asfgit closed this in 1b1528a May 25, 2018
@MaxGekk MaxGekk deleted the type-conv-error branch August 17, 2019 13:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants