Propagate Reason/Notes for operators disabled by default from plugin to Qualification tool unsupported operators csv file #850

nartal1 · 2024-03-13T00:57:17Z

This fixes #769

Currently we have an Enum with static values for specifying the reasons for unsupported operators. This PR adds another value, customReason, which is assigned dynamically from the unsupported operators. We save the operators that are disabled by default, along with their notes/reasons, in a Map. When we encounter those operators while parsing unsupported operators, we assign the reason from the Map.

Refactored the code so that common functions can be used to get both the supported operators and the operators that are disabled by default (not supported). Added a unit test for to_json, which is mentioned in the issue.
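A minimal sketch of the lookup described above, assuming a simplified row shape of (operator, enabled by default, notes). The object name, the row layout, and the note text are hypothetical and are not taken from the plugin's files; only the fallback string "Unsupported" mirrors the existing enum text.

```scala
object DisabledOpsSketch {
  // Hypothetical rows: (operator name, enabled by default?, notes).
  // This is NOT the plugin's real file layout.
  val rows: Seq[(String, Boolean, String)] = Seq(
    ("to_json", false, "example note: disabled by default"),
    ("some_enabled_op", true, "")
  )

  // Keep only the operators that are disabled by default, keyed by name.
  val disabledOpsNotes: Map[String, String] =
    rows.collect { case (op, false, notes) => op -> notes }.toMap

  // While parsing unsupported operators, use the recorded note if there is
  // one; otherwise fall back to the generic text.
  def reasonFor(op: String): String =
    disabledOpsNotes.getOrElse(op, "Unsupported")
}
```

With these rows, only to_json carries a custom note; every other operator falls back to the generic reason.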

Signed-off-by: Niranjan Artal <[email protected]>


amahussein

Thanks @nartal1 !

I think we can simplify the code if we improve the enum class implementation, so we won't have to keep passing strings around.
For the remaining part, related to how the reason is reported: the code changes here again rely on passing a Map[String,String] all the way down to the report. This takes us back to an unstructured bridge between reporting and logic. We had this problem once before, when we relied on strings to build the unsupported-CSV file instead of using the objects. For instance, if we later decide to extend that implementation to indicate whether a specific operator is not supported due to the dataTypes, then the Map[String,String] won't work for us and we will end up with two different implementations to fill in the reason details.
My suggestion is to make the changes in the SqlPlanParser while constructing the Execs. In that part of the code we have access to the pluginTypeChecker and all the other info required to set the "ReasonID" on the exec. If you have already considered that approach and found blockers, then we can discuss it further offline.
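One way to picture this suggestion, as a rough sketch only: the reason is attached to the exec object when it is built, and the report reads it back from the object instead of receiving a separate Map[String,String]. Every name below (Reason, Checker, ExecInfo, buildExec, reportLine) is hypothetical and does not reflect the actual SQLPlanParser or PluginTypeChecker code.

```scala
object ExecReasonSketch {
  // Hypothetical structured reason type; the real code uses UnsupportedReasons.
  sealed trait Reason { def text: String }
  case object Unsupported extends Reason { val text = "Unsupported" }
  final case class Custom(text: String) extends Reason

  // Hypothetical stand-in for the checker that is available where execs are built.
  final class Checker(disabledNotes: Map[String, String]) {
    def reasonFor(exec: String): Reason =
      disabledNotes.get(exec).map(Custom(_)).getOrElse(Unsupported)
  }

  // The exec carries its reason, so nothing else has to be passed downstream.
  final case class ExecInfo(name: String, reason: Reason)

  def buildExec(name: String, checker: Checker): ExecInfo =
    ExecInfo(name, checker.reasonFor(name))

  // The report only needs the exec object.
  def reportLine(e: ExecInfo): String = s"${e.name},${e.reason.text}"
}
```

Extending Reason later (for example with a data-type-specific case) would then not require a second path into the report.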

amahussein · 2024-03-13T14:49:14Z

core/src/main/scala/com/nvidia/spark/rapids/tool/planparser/SQLPlanParser.scala


-  def reportUnsupportedReason(unsupportedReason: UnsupportedReason, execValue: String): String = {
+  def reportUnsupportedReason(unsupportedReason: UnsupportedReason,
+      execValue: String, customReason: String): String = {


We can define a user-defined enum so that CUSTOM_REASON can be initialized with the "reason".
The enum can also have a function that returns the string.
In that case, we won't have to pass customReason explicitly, as it becomes part of the object instance.

P.S.: As a bonus, we can cache CUSTOM_REASON instances that share the same literal string, so we won't allocate too many objects.

Thanks @amahussein! Updated the code to use a user-defined enum and to cache each custom reason once it has been seen. customReason is no longer passed around.
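For illustration, a sketch of what such a user-defined enum with cached custom reasons could look like. It uses a sealed trait rather than scala.Enumeration, which is an assumption of this sketch and not necessarily what the PR ended up with; the type and object names are hypothetical, while the three fixed strings come from the existing match-case.

```scala
import scala.collection.concurrent.TrieMap

object UnsupportedReasonSketch {
  // Each reason knows its own text, so callers need no match-case.
  sealed trait UnsupportedReason { def getReason: String }

  case object IsUnsupported extends UnsupportedReason {
    val getReason = "Unsupported"
  }
  case object ContainsUnsupportedExpr extends UnsupportedReason {
    val getReason = "Contains unsupported expr"
  }
  case object UnsupportedIoFormat extends UnsupportedReason {
    val getReason = "Unsupported IO format"
  }

  // A custom reason carries its text; the constructor is private so that
  // instances can only be obtained through the caching factory below.
  final class CustomReason private (val getReason: String) extends UnsupportedReason

  object CustomReason {
    private val cache = TrieMap.empty[String, CustomReason]

    // Reuse the instance already created for the same literal string.
    def apply(reason: String): CustomReason =
      cache.getOrElseUpdate(reason, new CustomReason(reason))
  }
}
```

Calling CustomReason("some note") twice yields the same object, and any reason can be rendered with getReason without the caller knowing which kind it is.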

amahussein · 2024-03-13T14:51:15Z

core/src/main/scala/com/nvidia/spark/rapids/tool/planparser/SQLPlanParser.scala

@@ -55,6 +56,7 @@ object UnsupportedReasons extends Enumeration {
      case IS_UNSUPPORTED => "Unsupported"
      case CONTAINS_UNSUPPORTED_EXPR => "Contains unsupported expr"
      case UNSUPPORTED_IO_FORMAT => "Unsupported IO format"
+      case CUSTOM_REASON => customReason


If we change the definition of the enum type, then we can get rid of the match-case statement and simply call unsupportedReason.getReason()

amahussein · 2024-03-13T14:51:17Z

core/src/main/scala/com/nvidia/spark/rapids/tool/planparser/SQLPlanParser.scala

@@ -66,7 +68,8 @@ case class UnsupportedExecSummary(
    opType: OpTypes.OpType,
    reason: UnsupportedReasons.UnsupportedReason,
    opAction: OpActions.OpAction,
-    isExpression: Boolean = false) {
+    isExpression: Boolean = false,
+    customReason: String) {


As I suggested earlier, I am not a big fan of having both reason and customReason as arguments.

amahussein · 2024-03-13T14:53:34Z

core/src/test/scala/com/nvidia/spark/rapids/tool/qualification/QualificationSuite.scala

+          assert(exit == 0)
+          // the code above that runs the Spark query stops the Sparksession
+          // so create a new one to read in the csv file
+//          createSparkSession()


Remove the commented-out code.

amahussein · 2024-03-13T14:58:38Z

core/src/main/scala/com/nvidia/spark/rapids/tool/planparser/SQLPlanParser.scala

        case UnsupportedReasons.CONTAINS_UDF => UnsupportedReasons.IS_UDF
        case UnsupportedReasons.CONTAINS_DATASET => UnsupportedReasons.IS_DATASET
        case UnsupportedReasons.UNSUPPORTED_IO_FORMAT => UnsupportedReasons.UNSUPPORTED_IO_FORMAT
        case _ => UnsupportedReasons.IS_UNSUPPORTED
      }
+


Is the match-case above missing a CUSTOM_REASON case?

The CUSTOM_REASON check is not required here, since this match assigns reasons for expressions. In the subsequent lines, we check each expression for a custom reason and assign it accordingly.


cindyyuanjiang · 2024-03-19T06:35:59Z

core/src/test/scala/com/nvidia/spark/rapids/tool/qualification/QualificationSuite.scala

+          data.write.parquet(s"$outParquetFile/person_info")
+          val df = spark.read.parquet(s"$outParquetFile/person_info")
+          val dfToJson = df.withColumn("person_json", to_json($"person"))
+          dfToJson


nit: do we need the variable dfToJson?

cindyyuanjiang · 2024-03-19T06:37:14Z

core/src/test/scala/com/nvidia/spark/rapids/tool/qualification/QualificationSuite.scala

+            eventLog))
+
+          val (exit, _) =
+            QualificationMain.mainInternal(appArgs)


Could we put val (exit, _) = QualificationMain.mainInternal(appArgs) on one line?

parthosa · 2024-03-19T17:15:12Z

core/src/main/scala/com/nvidia/spark/rapids/tool/planparser/SQLPlanParser.scala

+  def getCustomReason(operator: String, unsSupportedOpsReasons: Map[String, String]): String =
+    unsSupportedOpsReasons.getOrElse(operator, "")


nit: typo, extra "s" in unsSupportedOpsReasons

Removed this function altogether.

parthosa · 2024-03-19T17:29:13Z

core/src/main/scala/com/nvidia/spark/rapids/tool/planparser/SQLPlanParser.scala


-  def reportUnsupportedReason(unsupportedReason: UnsupportedReason, execValue: String): String = {
+  def reportUnsupportedReason(unsupportedReason: UnsupportedReason,
+      execValue: String, customReason: String): String = {


This probably predates the PR, but the argument execValue is not used anywhere in the method. We should remove it.

parthosa · 2024-03-19T18:03:19Z

core/src/main/scala/com/nvidia/spark/rapids/tool/qualification/PluginTypeChecker.scala

+   * @return A Map containing the processed operators.
+   */
+  def readOperators(source: BufferedSource, operatorType: String, isSupported: Boolean,
+      processLine: (Array[String], String, Boolean) => Seq[(String, String)]


Every call to this method passes the same function for processLine: processOperatorLine. In that case, should we remove it as an argument?
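A sketch of the simplification being asked about, in which readOperators calls processOperatorLine directly instead of taking it as a parameter. The four-column layout (type, name, flag, notes), the "S" flag value, and the bodies of both functions are assumptions made for illustration; scala.io.Source is used in place of BufferedSource so that the example can run on an in-memory string.

```scala
import scala.io.Source

object ReadOperatorsSketch {
  // Hypothetical line handler. Assumed columns: type, name, flag, notes,
  // where flag "S" means supported. Returns (name -> notes) for rows that match.
  private def processOperatorLine(cols: Array[String], operatorType: String,
      isSupported: Boolean): Seq[(String, String)] = {
    val matches = cols.length >= 4 &&
      cols(0) == operatorType &&
      ((cols(2) == "S") == isSupported)
    if (matches) Seq(cols(1) -> cols(3)) else Seq.empty
  }

  // No processLine parameter: the single handler is called directly.
  def readOperators(source: Source, operatorType: String,
      isSupported: Boolean): Map[String, String] =
    source.getLines().drop(1) // skip the header row
      .map(_.split(",", -1).map(_.trim))
      .flatMap(cols => processOperatorLine(cols, operatorType, isSupported))
      .toMap
}
```

If a second line handler is ever needed, the parameter can be reintroduced with processOperatorLine as its default value.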

Signed-off-by: Niranjan Artal <[email protected]>

amahussein

LGTM!
Thanks @nartal1 !

nartal1 added 2 commits March 12, 2024 17:43

Propagate reasons for operators disabled by default in the plugin

f9f2578

Signed-off-by: Niranjan Artal <[email protected]>

Merge branch 'dev' of github.com:NVIDIA/spark-rapids-tools into unsup…

365768a

…ported_notes

nartal1 added the bug Something isn't working label Mar 13, 2024

nartal1 requested review from parthosa, cindyyuanjiang and amahussein March 13, 2024 00:57

nartal1 self-assigned this Mar 13, 2024

amahussein requested changes Mar 13, 2024


nartal1 added the core_tools Scope the core module (scala) label Mar 13, 2024

Merge branch 'dev' of github.com:NVIDIA/spark-rapids-tools into unsup…

048cdb5

…ported_notes

cindyyuanjiang reviewed Mar 19, 2024


parthosa reviewed Mar 19, 2024


nartal1 added 2 commits March 19, 2024 18:05

addressed review comments

a2cb43b

Signed-off-by: Niranjan Artal <[email protected]>

address review comment

14f20af

nartal1 requested review from parthosa, cindyyuanjiang and amahussein March 20, 2024 01:21

amahussein approved these changes Mar 20, 2024


amahussein merged commit 1f477d1 into NVIDIA:dev Mar 20, 2024
13 checks passed
