-
Notifications
You must be signed in to change notification settings - Fork 37
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Propagate Reason/Notes for operators disabled by default from plugin to Qualification tool unsupported operators csv file #850
Conversation
Signed-off-by: Niranjan Artal <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @nartal1 !
- I think we can simplify the code if we improve the enum class implementation. so we want to have to keep passing strings around.
- For the remaining part related on how to report the reason. The code changes here is again relying on passing a
Map[String,String]
all the way down to the report. This is going to take us back again to unstructured bridge between reporting and logic. We had this problem once before when we relied on strings to build the unsupported-CSV file instead of using the objects. For instance, if we later decide to extend that implementation to indicate whether a specific operator is not supported due to the dataTypes, then theMap[String,String]
won't work for us and we end up with two different implementations to fill in the reason details. - My suggestion is to make the changes in the SqlPlanParser while constructing the Execs. In that part of teh code we have access to the pluginTypeChecker, and all other required info to set the "ReasonID" into the exec. If you had already considered that approach and found blockers, then we can discuss it further offline.
|
||
def reportUnsupportedReason(unsupportedReason: UnsupportedReason, execValue: String): String = { | ||
def reportUnsupportedReason(unsupportedReason: UnsupportedReason, | ||
execValue: String, customReason: String): String = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can define a user-defined enum so that CUSTOM_REASON
can be initialized by the "reason".
And the enum can have a function to return the string.
In that case, we won't have to explicitly pass customReason as it becomes part of the object instant.
P.S: As a bonus we can cache the CUSTOM_REASON if they match in the same literal string so we won't allocate too many objects.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @amahussein ! Updated the code to use user-defined enum and cache it if is already visited. Removed passing customReason
@@ -55,6 +56,7 @@ object UnsupportedReasons extends Enumeration { | |||
case IS_UNSUPPORTED => "Unsupported" | |||
case CONTAINS_UNSUPPORTED_EXPR => "Contains unsupported expr" | |||
case UNSUPPORTED_IO_FORMAT => "Unsupported IO format" | |||
case CUSTOM_REASON => customReason |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we change the definition of the enum type, then we can get rid of the match-case statement and simoply call unsupportedReason.getReason()
@@ -66,7 +68,8 @@ case class UnsupportedExecSummary( | |||
opType: OpTypes.OpType, | |||
reason: UnsupportedReasons.UnsupportedReason, | |||
opAction: OpActions.OpAction, | |||
isExpression: Boolean = false) { | |||
isExpression: Boolean = false, | |||
customReason: String) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I suggested earlier, I am not a big fan of having reason, and customReason as argument
assert(exit == 0) | ||
// the code above that runs the Spark query stops the Sparksession | ||
// so create a new one to read in the csv file | ||
// createSparkSession() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove commented code
case UnsupportedReasons.CONTAINS_UDF => UnsupportedReasons.IS_UDF | ||
case UnsupportedReasons.CONTAINS_DATASET => UnsupportedReasons.IS_DATASET | ||
case UnsupportedReasons.UNSUPPORTED_IO_FORMAT => UnsupportedReasons.UNSUPPORTED_IO_FORMAT | ||
case _ => UnsupportedReasons.IS_UNSUPPORTED | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the above match-case missing custom_reason case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CUSTOM_REASON check is not required here since it is assigning reasons for expressions. In the subsequent lines, we check per expression if it has any custom reason and assign it accordingly.
data.write.parquet(s"$outParquetFile/person_info") | ||
val df = spark.read.parquet(s"$outParquetFile/person_info") | ||
val dfToJson = df.withColumn("person_json", to_json($"person")) | ||
dfToJson |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: do we need variable dfToJson
?
eventLog)) | ||
|
||
val (exit, _) = | ||
QualificationMain.mainInternal(appArgs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could we put val (exit, _) = QualificationMain.mainInternal(appArgs)
in one line?
def getCustomReason(operator: String, unsSupportedOpsReasons: Map[String, String]): String = | ||
unsSupportedOpsReasons.getOrElse(operator, "") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: typo extra s
in unsSupportedOpsReasons
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed this function altogether.
|
||
def reportUnsupportedReason(unsupportedReason: UnsupportedReason, execValue: String): String = { | ||
def reportUnsupportedReason(unsupportedReason: UnsupportedReason, | ||
execValue: String, customReason: String): String = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably this existed from before. But the argument execValue
is not used anywhere in the method. We should remove it.
* @return A Map containing the processed operators. | ||
*/ | ||
def readOperators(source: BufferedSource, operatorType: String, isSupported: Boolean, | ||
processLine: (Array[String], String, Boolean) => Seq[(String, String)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All use cases of this method call the same function for processLine
- processOperatorLine
. In this case, should we remove this as argument?
Signed-off-by: Niranjan Artal <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Thanks @nartal1 !
This fixes #769
Currently we have Enum with static values for specifying Reasons for unsupported operators. In this PR, added another value
customReason
which gets assigned dynamically from the unsupported operators. We save the operators which are disabled by default along with the notes/reasons in a Map. And when we encounter those operators while parsing unsupported operators, we assign the reason from the Map.Refactored the code so that we could use the common functions for getting both supported operators and operators that are disabled by default(Not supported). Added unit test to test
to_json
mentioned in the issue.