Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Propagate Reason/Notes for operators disabled by default from plugin to Qualification tool unsupported operators csv file #850

Merged
merged 5 commits into from
Mar 20, 2024

Conversation

nartal1
Copy link
Collaborator

@nartal1 nartal1 commented Mar 13, 2024

This fixes #769

Currently we have Enum with static values for specifying Reasons for unsupported operators. In this PR, added another value customReason which gets assigned dynamically from the unsupported operators. We save the operators which are disabled by default along with the notes/reasons in a Map. And when we encounter those operators while parsing unsupported operators, we assign the reason from the Map.

Refactored the code so that we could use the common functions for getting both supported operators and operators that are disabled by default(Not supported). Added unit test to test to_json mentioned in the issue.

@nartal1 nartal1 added the bug Something isn't working label Mar 13, 2024
@nartal1 nartal1 self-assigned this Mar 13, 2024
Copy link
Collaborator

@amahussein amahussein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @nartal1 !

  • I think we can simplify the code if we improve the enum class implementation. so we want to have to keep passing strings around.
  • For the remaining part related on how to report the reason. The code changes here is again relying on passing a Map[String,String] all the way down to the report. This is going to take us back again to unstructured bridge between reporting and logic. We had this problem once before when we relied on strings to build the unsupported-CSV file instead of using the objects. For instance, if we later decide to extend that implementation to indicate whether a specific operator is not supported due to the dataTypes, then the Map[String,String] won't work for us and we end up with two different implementations to fill in the reason details.
  • My suggestion is to make the changes in the SqlPlanParser while constructing the Execs. In that part of teh code we have access to the pluginTypeChecker, and all other required info to set the "ReasonID" into the exec. If you had already considered that approach and found blockers, then we can discuss it further offline.


def reportUnsupportedReason(unsupportedReason: UnsupportedReason, execValue: String): String = {
def reportUnsupportedReason(unsupportedReason: UnsupportedReason,
execValue: String, customReason: String): String = {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can define a user-defined enum so that CUSTOM_REASON can be initialized by the "reason".
And the enum can have a function to return the string.
In that case, we won't have to explicitly pass customReason as it becomes part of the object instant.

P.S: As a bonus we can cache the CUSTOM_REASON if they match in the same literal string so we won't allocate too many objects.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @amahussein ! Updated the code to use user-defined enum and cache it if is already visited. Removed passing customReason

@@ -55,6 +56,7 @@ object UnsupportedReasons extends Enumeration {
case IS_UNSUPPORTED => "Unsupported"
case CONTAINS_UNSUPPORTED_EXPR => "Contains unsupported expr"
case UNSUPPORTED_IO_FORMAT => "Unsupported IO format"
case CUSTOM_REASON => customReason
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we change the definition of the enum type, then we can get rid of the match-case statement and simoply call unsupportedReason.getReason()

@@ -66,7 +68,8 @@ case class UnsupportedExecSummary(
opType: OpTypes.OpType,
reason: UnsupportedReasons.UnsupportedReason,
opAction: OpActions.OpAction,
isExpression: Boolean = false) {
isExpression: Boolean = false,
customReason: String) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I suggested earlier, I am not a big fan of having reason, and customReason as argument

assert(exit == 0)
// the code above that runs the Spark query stops the Sparksession
// so create a new one to read in the csv file
// createSparkSession()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove commented code

case UnsupportedReasons.CONTAINS_UDF => UnsupportedReasons.IS_UDF
case UnsupportedReasons.CONTAINS_DATASET => UnsupportedReasons.IS_DATASET
case UnsupportedReasons.UNSUPPORTED_IO_FORMAT => UnsupportedReasons.UNSUPPORTED_IO_FORMAT
case _ => UnsupportedReasons.IS_UNSUPPORTED
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the above match-case missing custom_reason case?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CUSTOM_REASON check is not required here since it is assigning reasons for expressions. In the subsequent lines, we check per expression if it has any custom reason and assign it accordingly.

@nartal1 nartal1 added the core_tools Scope the core module (scala) label Mar 13, 2024
data.write.parquet(s"$outParquetFile/person_info")
val df = spark.read.parquet(s"$outParquetFile/person_info")
val dfToJson = df.withColumn("person_json", to_json($"person"))
dfToJson
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: do we need variable dfToJson?

eventLog))

val (exit, _) =
QualificationMain.mainInternal(appArgs)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we put val (exit, _) = QualificationMain.mainInternal(appArgs) in one line?

Comment on lines 131 to 132
def getCustomReason(operator: String, unsSupportedOpsReasons: Map[String, String]): String =
unsSupportedOpsReasons.getOrElse(operator, "")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: typo extra s in unsSupportedOpsReasons

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed this function altogether.


def reportUnsupportedReason(unsupportedReason: UnsupportedReason, execValue: String): String = {
def reportUnsupportedReason(unsupportedReason: UnsupportedReason,
execValue: String, customReason: String): String = {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably this existed from before. But the argument execValue is not used anywhere in the method. We should remove it.

* @return A Map containing the processed operators.
*/
def readOperators(source: BufferedSource, operatorType: String, isSupported: Boolean,
processLine: (Array[String], String, Boolean) => Seq[(String, String)]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All use cases of this method call the same function for processLine - processOperatorLine. In this case, should we remove this as argument?

Copy link
Collaborator

@amahussein amahussein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!
Thanks @nartal1 !

@amahussein amahussein merged commit 1f477d1 into NVIDIA:dev Mar 20, 2024
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working core_tools Scope the core module (scala)
Projects
None yet
4 participants