
Improve sync plugin supported CSV python script #919

Merged: 5 commits into NVIDIA:dev on Apr 12, 2024

Conversation

cindyyuanjiang

Fixes #914

Changes

  1. Rows that exist in the tools CSV but not on the plugin side are kept in the final output.
  2. After generating the report, the dataframe results are post-processed so that rows with "S" in the "Supported" column have "None" in the "Notes" column of the final output.
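The two changes above can be sketched with pandas. This is a minimal illustration, not the script's actual code: the column names ("Exec", "Supported", "Notes"), the merge key, and the sample data are all hypothetical.

```python
import pandas as pd

# Hypothetical schema and data; the real supported-CSV files have more columns.
plugin_df = pd.DataFrame({
    "Exec": ["FilterExec", "ProjectExec"],
    "Supported": ["S", "PS"],
    "Notes": ["stale note", "partial support"],
})
tools_df = pd.DataFrame({
    "Exec": ["FilterExec", "ProjectExec", "CustomToolsExec"],
    "Supported": ["S", "PS", "S"],
    "Notes": ["stale note", "partial support", "added by tools"],
})

def sync(tools_df: pd.DataFrame, plugin_df: pd.DataFrame) -> pd.DataFrame:
    # Change 1: combine plugin rows over the tools rows so execs that exist
    # only on the tools side survive the sync; plugin values win where both
    # sides have the same exec.
    merged = (plugin_df.set_index("Exec")
              .combine_first(tools_df.set_index("Exec"))
              .reset_index())
    # Change 2: rows marked fully supported ("S") need no explanatory note,
    # so their "Notes" value is normalized to the literal string "None".
    merged.loc[merged["Supported"] == "S", "Notes"] = "None"
    return merged

result = sync(tools_df, plugin_df)
print(result)
```

Note that `combine_first` keeps the union of the two indexes, which is exactly why the tools-only row (`CustomToolsExec` here) is no longer dropped.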

@cindyyuanjiang cindyyuanjiang added bug Something isn't working core_tools Scope the core module (scala) labels Apr 9, 2024
@cindyyuanjiang cindyyuanjiang self-assigned this Apr 9, 2024
Signed-off-by: cindyyuanjiang <[email protected]>

@amahussein amahussein left a comment


Thanks @cindyyuanjiang

I have a question.
I tested the new changes on the most recent code and got the report below.

In the generated files, the "Notes" value for the InMemoryTableScanExec row actually became None, which is the correct value. Still, the report does not list it as one of the changes that were made.
Is this intentional, or was that corner case missed?

**supportedExecs.csv (FROM TOOLS TO PLUGIN)**
Row is removed: MapInArrowExec, S, None, Input/Output, S, S, S, S, S, S, S, S, PS, S, NS, NS, NS, NS, PS, NS, PS, NS, NS, NS

**supportedExprs.csv (FROM TOOLS TO PLUGIN)**
Row is removed: EphemeralSubstring, S, `substr`; `substring`, None, project, str, NA, NA, NA, NA, NA, NA, NA, NA, NA, S, NA, NA, NS, NA, NA, NA, NA, NA, NS, NS
Row is removed: EphemeralSubstring, S, `substr`; `substring`, None, project, pos, NA, NA, NA, S, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NS, NS
Row is removed: EphemeralSubstring, S, `substr`; `substring`, None, project, len, NA, NA, NA, S, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NS, NS
Row is removed: EphemeralSubstring, S, `substr`; `substring`, None, project, result, NA, NA, NA, NA, NA, NA, NA, NA, NA, S, NA, NA, NS, NA, NA, NA, NA, NA, NS, NS

@cindyyuanjiang

cindyyuanjiang commented Apr 10, 2024

> I have a question. I tested the new changes on the most recent code and got the report below.
>
> In the generated files, the "Notes" value for the InMemoryTableScanExec row actually became None, which is the correct value. Still, the report does not list it as one of the changes that were made. Is this intentional, or was that corner case missed?

Thanks @amahussein!

I didn't add this change to the report, but I added a note:

> 3. The "Notes" column for rows with "S" for "Supported" will be updated to "None" in the final output.

I was thinking of the report as documentation of changes "from tools to plugin", without our own post-processing info. I can update this if we want to include those changes in the report.

@amahussein

>> I have a question. I tested the new changes on the most recent code and got the report below.
>>
>> In the generated files, the "Notes" value for the InMemoryTableScanExec row actually became None, which is the correct value. Still, the report does not list it as one of the changes that were made. Is this intentional, or was that corner case missed?
>
> Thanks @amahussein!
>
> I didn't add this change to the report, but I added a note: 3. The "Notes" column for rows with "S" for "Supported" will be updated to "None" in the final output.
>
> I was thinking of the report as documentation of changes "from tools to plugin", without our own post-processing info. I can update this if we want to include those changes in the report.

Hmm, isn't the report supposed to document the changes going into tools?
In other words, it shows the difference between the newly generated files and the CSV files in tools.
That way, we can tell what changes are being introduced by the new sync. It is like a "diff", but a non-textual one.
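That "non-textual diff" idea can be sketched as follows: compare the old and new dataframes keyed on the operator name, and report removed, added, and changed rows. This is a hypothetical illustration; the column names and report wording are assumptions loosely modeled on the report excerpts in this thread.

```python
import pandas as pd

# Hypothetical before/after data, keyed by a made-up "Exec" column.
old = pd.DataFrame({"Exec": ["A", "B", "C"], "Notes": ["n1", "old", "n3"]})
new = pd.DataFrame({"Exec": ["A", "B", "D"], "Notes": ["n1", "None", "n4"]})

def diff_report(old: pd.DataFrame, new: pd.DataFrame, key: str = "Exec") -> list[str]:
    lines = []
    old_i, new_i = old.set_index(key), new.set_index(key)
    # Rows present before but gone after the sync.
    for k in old_i.index.difference(new_i.index):
        lines.append(f"Row is removed: {k}")
    # Rows introduced by the sync.
    for k in new_i.index.difference(old_i.index):
        lines.append(f"Row is added: {k}")
    # Rows present on both sides: report per-column value changes.
    for k in old_i.index.intersection(new_i.index):
        if not old_i.loc[k].equals(new_i.loc[k]):
            lines.append(f"Row is changed: {k}")
            for col in old_i.columns:
                if old_i.loc[k, col] != new_i.loc[k, col]:
                    lines.append(f"    {col}: {old_i.loc[k, col]} -> {new_i.loc[k, col]}")
    return lines

report = diff_report(old, new)
print("\n".join(report))
```

Comparing row values rather than raw text lines is what makes the report robust to column reordering or whitespace, unlike a plain textual `diff`.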

@cindyyuanjiang

> Hmm, isn't the report supposed to document the changes going into tools? In other words, it shows the difference between the newly generated files and the CSV files in tools. That way, we can tell what changes are being introduced by the new sync. It is like a "diff", but a non-textual one.

Thanks @amahussein! I have updated the script to reflect the changes to the report.


@amahussein amahussein left a comment


Thanks @cindyyuanjiang !

I like it better now.
It now reports several rows with updated "Notes" that we could not have seen before.

**supportedDataSource.csv (FROM TOOLS TO PLUGIN)**

**supportedExecs.csv (FROM TOOLS TO PLUGIN)**
Row is changed: InMemoryTableScanExec, S, This is disabled by default because there could be complications when using it with AQE with Spark-3.5.0 and Spark-3.5.1. For more details please check https://github.com/NVIDIA/spark-rapids/issues/10603, Input/Output, S, S, S, S, S, S, S, S, PS, S, S, NS, NS, NS, PS, PS, PS, NS, S, S
    Notes: This is disabled by default because there could be complications when using it with AQE with Spark-3.5.0 and Spark-3.5.1. For more details please check https://github.com/NVIDIA/spark-rapids/issues/10603 -> None
Row is removed: MapInArrowExec, S, None, Input/Output, S, S, S, S, S, S, S, S, PS, S, NS, NS, NS, NS, PS, NS, PS, NS, NS, NS

**supportedExprs.csv (FROM TOOLS TO PLUGIN)**
Row is changed: ArrayExcept, S, `array_except`, This is not 100% compatible with the Spark version because the GPU implementation treats -0.0 and 0.0 as equal; but the CPU implementation currently does not (see SPARK-39845). Also; Apache Spark 3.1.3 fixed issue SPARK-36741 where NaNs in these set like operators were not treated as being equal. We have chosen to break with compatibility for the older versions of Spark in this instance and handle NaNs the same as 3.1.3+, project, array1, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, PS, NA, NA, NA, NS, NS
    Notes: This is not 100% compatible with the Spark version because the GPU implementation treats -0.0 and 0.0 as equal; but the CPU implementation currently does not (see SPARK-39845). Also; Apache Spark 3.1.3 fixed issue SPARK-36741 where NaNs in these set like operators were not treated as being equal. We have chosen to break with compatibility for the older versions of Spark in this instance and handle NaNs the same as 3.1.3+ -> None
Row is changed: ArrayExcept, S, `array_except`, This is not 100% compatible with the Spark version because the GPU implementation treats -0.0 and 0.0 as equal; but the CPU implementation currently does not (see SPARK-39845). Also; Apache Spark 3.1.3 fixed issue SPARK-36741 where NaNs in these set like operators were not treated as being equal. We have chosen to break with compatibility for the older versions of Spark in this instance and handle NaNs the same as 3.1.3+, project, array2, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, PS, NA, NA, NA, NS, NS
    Notes: This is not 100% compatible with the Spark version because the GPU implementation treats -0.0 and 0.0 as equal; but the CPU implementation currently does not (see SPARK-39845). Also; Apache Spark 3.1.3 fixed issue SPARK-36741 where NaNs in these set like operators were not treated as being equal. We have chosen to break with compatibility for the older versions of Spark in this instance and handle NaNs the same as 3.1.3+ -> None
Row is changed: ArrayExcept, S, `array_except`, This is not 100% compatible with the Spark version because the GPU implementation treats -0.0 and 0.0 as equal; but the CPU implementation currently does not (see SPARK-39845). Also; Apache Spark 3.1.3 fixed issue SPARK-36741 where NaNs in these set like operators were not treated as being equal. We have chosen to break with compatibility for the older versions of Spark in this instance and handle NaNs the same as 3.1.3+, project, result, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, PS, NA, NA, NA, NS, NS
    Notes: This is not 100% compatible with the Spark version because the GPU implementation treats -0.0 and 0.0 as equal; but the CPU implementation currently does not (see SPARK-39845). Also; Apache Spark 3.1.3 fixed issue SPARK-36741 where NaNs in these set like operators were not treated as being equal. We have chosen to break with compatibility for the older versions of Spark in this instance and handle NaNs the same as 3.1.3+ -> None
Row is changed: ArrayIntersect, S, `array_intersect`, This is not 100% compatible with the Spark version because the GPU implementation treats -0.0 and 0.0 as equal; but the CPU implementation currently does not (see SPARK-39845). Also; Apache Spark 3.1.3 fixed issue SPARK-36741 where NaNs in these set like operators were not treated as being equal. We have chosen to break with compatibility for the older versions of Spark in this instance and handle NaNs the same as 3.1.3+, project, array1, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, PS, NA, NA, NA, NS, NS
    Notes: This is not 100% compatible with the Spark version because the GPU implementation treats -0.0 and 0.0 as equal; but the CPU implementation currently does not (see SPARK-39845). Also; Apache Spark 3.1.3 fixed issue SPARK-36741 where NaNs in these set like operators were not treated as being equal. We have chosen to break with compatibility for the older versions of Spark in this instance and handle NaNs the same as 3.1.3+ -> None
Row is changed: ArrayIntersect, S, `array_intersect`, This is not 100% compatible with the Spark version because the GPU implementation treats -0.0 and 0.0 as equal; but the CPU implementation currently does not (see SPARK-39845). Also; Apache Spark 3.1.3 fixed issue SPARK-36741 where NaNs in these set like operators were not treated as being equal. We have chosen to break with compatibility for the older versions of Spark in this instance and handle NaNs the same as 3.1.3+, project, array2, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, PS, NA, NA, NA, NS, NS
    Notes: This is not 100% compatible with the Spark version because the GPU implementation treats -0.0 and 0.0 as equal; but the CPU implementation currently does not (see SPARK-39845). Also; Apache Spark 3.1.3 fixed issue SPARK-36741 where NaNs in these set like operators were not treated as being equal. We have chosen to break with compatibility for the older versions of Spark in this instance and handle NaNs the same as 3.1.3+ -> None
Row is changed: ArrayIntersect, S, `array_intersect`, This is not 100% compatible with the Spark version because the GPU implementation treats -0.0 and 0.0 as equal; but the CPU implementation currently does not (see SPARK-39845). Also; Apache Spark 3.1.3 fixed issue SPARK-36741 where NaNs in these set like operators were not treated as being equal. We have chosen to break with compatibility for the older versions of Spark in this instance and handle NaNs the same as 3.1.3+, project, result, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, PS, NA, NA, NA, NS, NS
    Notes: This is not 100% compatible with the Spark version because the GPU implementation treats -0.0 and 0.0 as equal; but the CPU implementation currently does not (see SPARK-39845). Also; Apache Spark 3.1.3 fixed issue SPARK-36741 where NaNs in these set like operators were not treated as being equal. We have chosen to break with compatibility for the older versions of Spark in this instance and handle NaNs the same as 3.1.3+ -> None
Row is changed: ArrayUnion, S, `array_union`, This is not 100% compatible with the Spark version because the GPU implementation treats -0.0 and 0.0 as equal; but the CPU implementation currently does not (see SPARK-39845). Also; Apache Spark 3.1.3 fixed issue SPARK-36741 where NaNs in these set like operators were not treated as being equal. We have chosen to break with compatibility for the older versions of Spark in this instance and handle NaNs the same as 3.1.3+, project, array1, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, PS, NA, NA, NA, NS, NS
    Notes: This is not 100% compatible with the Spark version because the GPU implementation treats -0.0 and 0.0 as equal; but the CPU implementation currently does not (see SPARK-39845). Also; Apache Spark 3.1.3 fixed issue SPARK-36741 where NaNs in these set like operators were not treated as being equal. We have chosen to break with compatibility for the older versions of Spark in this instance and handle NaNs the same as 3.1.3+ -> None
Row is changed: ArrayUnion, S, `array_union`, This is not 100% compatible with the Spark version because the GPU implementation treats -0.0 and 0.0 as equal; but the CPU implementation currently does not (see SPARK-39845). Also; Apache Spark 3.1.3 fixed issue SPARK-36741 where NaNs in these set like operators were not treated as being equal. We have chosen to break with compatibility for the older versions of Spark in this instance and handle NaNs the same as 3.1.3+, project, array2, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, PS, NA, NA, NA, NS, NS
    Notes: This is not 100% compatible with the Spark version because the GPU implementation treats -0.0 and 0.0 as equal; but the CPU implementation currently does not (see SPARK-39845). Also; Apache Spark 3.1.3 fixed issue SPARK-36741 where NaNs in these set like operators were not treated as being equal. We have chosen to break with compatibility for the older versions of Spark in this instance and handle NaNs the same as 3.1.3+ -> None
Row is changed: ArrayUnion, S, `array_union`, This is not 100% compatible with the Spark version because the GPU implementation treats -0.0 and 0.0 as equal; but the CPU implementation currently does not (see SPARK-39845). Also; Apache Spark 3.1.3 fixed issue SPARK-36741 where NaNs in these set like operators were not treated as being equal. We have chosen to break with compatibility for the older versions of Spark in this instance and handle NaNs the same as 3.1.3+, project, result, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, PS, NA, NA, NA, NS, NS
    Notes: This is not 100% compatible with the Spark version because the GPU implementation treats -0.0 and 0.0 as equal; but the CPU implementation currently does not (see SPARK-39845). Also; Apache Spark 3.1.3 fixed issue SPARK-36741 where NaNs in these set like operators were not treated as being equal. We have chosen to break with compatibility for the older versions of Spark in this instance and handle NaNs the same as 3.1.3+ -> None
Row is changed: ArraysOverlap, S, `arrays_overlap`, This is not 100% compatible with the Spark version because the GPU implementation treats -0.0 and 0.0 as equal; but the CPU implementation currently does not (see SPARK-39845). Also; Apache Spark 3.1.3 fixed issue SPARK-36741 where NaNs in these set like operators were not treated as being equal. We have chosen to break with compatibility for the older versions of Spark in this instance and handle NaNs the same as 3.1.3+, project, array1, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, PS, NA, NA, NA, NS, NS
    Notes: This is not 100% compatible with the Spark version because the GPU implementation treats -0.0 and 0.0 as equal; but the CPU implementation currently does not (see SPARK-39845). Also; Apache Spark 3.1.3 fixed issue SPARK-36741 where NaNs in these set like operators were not treated as being equal. We have chosen to break with compatibility for the older versions of Spark in this instance and handle NaNs the same as 3.1.3+ -> None
Row is changed: ArraysOverlap, S, `arrays_overlap`, This is not 100% compatible with the Spark version because the GPU implementation treats -0.0 and 0.0 as equal; but the CPU implementation currently does not (see SPARK-39845). Also; Apache Spark 3.1.3 fixed issue SPARK-36741 where NaNs in these set like operators were not treated as being equal. We have chosen to break with compatibility for the older versions of Spark in this instance and handle NaNs the same as 3.1.3+, project, array2, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, PS, NA, NA, NA, NS, NS
    Notes: This is not 100% compatible with the Spark version because the GPU implementation treats -0.0 and 0.0 as equal; but the CPU implementation currently does not (see SPARK-39845). Also; Apache Spark 3.1.3 fixed issue SPARK-36741 where NaNs in these set like operators were not treated as being equal. We have chosen to break with compatibility for the older versions of Spark in this instance and handle NaNs the same as 3.1.3+ -> None
Row is changed: ArraysOverlap, S, `arrays_overlap`, This is not 100% compatible with the Spark version because the GPU implementation treats -0.0 and 0.0 as equal; but the CPU implementation currently does not (see SPARK-39845). Also; Apache Spark 3.1.3 fixed issue SPARK-36741 where NaNs in these set like operators were not treated as being equal. We have chosen to break with compatibility for the older versions of Spark in this instance and handle NaNs the same as 3.1.3+, project, result, S, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NS, NS
    Notes: This is not 100% compatible with the Spark version because the GPU implementation treats -0.0 and 0.0 as equal; but the CPU implementation currently does not (see SPARK-39845). Also; Apache Spark 3.1.3 fixed issue SPARK-36741 where NaNs in these set like operators were not treated as being equal. We have chosen to break with compatibility for the older versions of Spark in this instance and handle NaNs the same as 3.1.3+ -> None
Row is changed: InitCap, S, `initcap`, This is not 100% compatible with the Spark version because the Unicode version used by cuDF and the JVM may differ; resulting in some corner-case characters not changing case correctly., project, input, NA, NA, NA, NA, NA, NA, NA, NA, NA, S, NA, NA, NA, NA, NA, NA, NA, NA, NS, NS
    Notes: This is not 100% compatible with the Spark version because the Unicode version used by cuDF and the JVM may differ; resulting in some corner-case characters not changing case correctly. -> None
Row is changed: InitCap, S, `initcap`, This is not 100% compatible with the Spark version because the Unicode version used by cuDF and the JVM may differ; resulting in some corner-case characters not changing case correctly., project, result, NA, NA, NA, NA, NA, NA, NA, NA, NA, S, NA, NA, NA, NA, NA, NA, NA, NA, NS, NS
    Notes: This is not 100% compatible with the Spark version because the Unicode version used by cuDF and the JVM may differ; resulting in some corner-case characters not changing case correctly. -> None
Row is changed: Lower, S, `lcase`; `lower`, This is not 100% compatible with the Spark version because the Unicode version used by cuDF and the JVM may differ; resulting in some corner-case characters not changing case correctly., project, input, NA, NA, NA, NA, NA, NA, NA, NA, NA, S, NA, NA, NA, NA, NA, NA, NA, NA, NS, NS
    Notes: This is not 100% compatible with the Spark version because the Unicode version used by cuDF and the JVM may differ; resulting in some corner-case characters not changing case correctly. -> None
Row is changed: Lower, S, `lcase`; `lower`, This is not 100% compatible with the Spark version because the Unicode version used by cuDF and the JVM may differ; resulting in some corner-case characters not changing case correctly., project, result, NA, NA, NA, NA, NA, NA, NA, NA, NA, S, NA, NA, NA, NA, NA, NA, NA, NA, NS, NS
    Notes: This is not 100% compatible with the Spark version because the Unicode version used by cuDF and the JVM may differ; resulting in some corner-case characters not changing case correctly. -> None
Row is changed: StringTranslate, S, `translate`, This is not 100% compatible with the Spark version because the GPU implementation supports all unicode code points. In Spark versions < 3.2.0; translate() does not support unicode characters with code point >= U+10000 (See SPARK-34094), project, input, NA, NA, NA, NA, NA, NA, NA, NA, NA, S, NA, NA, NA, NA, NA, NA, NA, NA, NS, NS
    Notes: This is not 100% compatible with the Spark version because the GPU implementation supports all unicode code points. In Spark versions < 3.2.0; translate() does not support unicode characters with code point >= U+10000 (See SPARK-34094) -> None
Row is changed: StringTranslate, S, `translate`, This is not 100% compatible with the Spark version because the GPU implementation supports all unicode code points. In Spark versions < 3.2.0; translate() does not support unicode characters with code point >= U+10000 (See SPARK-34094), project, from, NA, NA, NA, NA, NA, NA, NA, NA, NA, PS, NA, NA, NA, NA, NA, NA, NA, NA, NS, NS
    Notes: This is not 100% compatible with the Spark version because the GPU implementation supports all unicode code points. In Spark versions < 3.2.0; translate() does not support unicode characters with code point >= U+10000 (See SPARK-34094) -> None
Row is changed: StringTranslate, S, `translate`, This is not 100% compatible with the Spark version because the GPU implementation supports all unicode code points. In Spark versions < 3.2.0; translate() does not support unicode characters with code point >= U+10000 (See SPARK-34094), project, to, NA, NA, NA, NA, NA, NA, NA, NA, NA, PS, NA, NA, NA, NA, NA, NA, NA, NA, NS, NS
    Notes: This is not 100% compatible with the Spark version because the GPU implementation supports all unicode code points. In Spark versions < 3.2.0; translate() does not support unicode characters with code point >= U+10000 (See SPARK-34094) -> None
Row is changed: StringTranslate, S, `translate`, This is not 100% compatible with the Spark version because the GPU implementation supports all unicode code points. In Spark versions < 3.2.0; translate() does not support unicode characters with code point >= U+10000 (See SPARK-34094), project, result, NA, NA, NA, NA, NA, NA, NA, NA, NA, S, NA, NA, NA, NA, NA, NA, NA, NA, NS, NS
    Notes: This is not 100% compatible with the Spark version because the GPU implementation supports all unicode code points. In Spark versions < 3.2.0; translate() does not support unicode characters with code point >= U+10000 (See SPARK-34094) -> None
Row is changed: Upper, S, `ucase`; `upper`, This is not 100% compatible with the Spark version because the Unicode version used by cuDF and the JVM may differ; resulting in some corner-case characters not changing case correctly., project, input, NA, NA, NA, NA, NA, NA, NA, NA, NA, S, NA, NA, NA, NA, NA, NA, NA, NA, NS, NS
    Notes: This is not 100% compatible with the Spark version because the Unicode version used by cuDF and the JVM may differ; resulting in some corner-case characters not changing case correctly. -> None
Row is changed: Upper, S, `ucase`; `upper`, This is not 100% compatible with the Spark version because the Unicode version used by cuDF and the JVM may differ; resulting in some corner-case characters not changing case correctly., project, result, NA, NA, NA, NA, NA, NA, NA, NA, NA, S, NA, NA, NA, NA, NA, NA, NA, NA, NS, NS
    Notes: This is not 100% compatible with the Spark version because the Unicode version used by cuDF and the JVM may differ; resulting in some corner-case characters not changing case correctly. -> None
Row is changed: ApproximatePercentile, S, `approx_percentile`; `percentile_approx`, This is not 100% compatible with the Spark version because the GPU implementation of approx_percentile is not bit-for-bit compatible with Apache Spark, aggregation, input, NA, S, S, S, S, S, S, NS, NS, NA, S, NA, NA, NA, NA, NA, NA, NA, NS, NS
    Notes: This is not 100% compatible with the Spark version because the GPU implementation of approx_percentile is not bit-for-bit compatible with Apache Spark -> None
Row is changed: ApproximatePercentile, S, `approx_percentile`; `percentile_approx`, This is not 100% compatible with the Spark version because the GPU implementation of approx_percentile is not bit-for-bit compatible with Apache Spark, aggregation, percentage, NA, NA, NA, NA, NA, NA, S, NA, NA, NA, NA, NA, NA, NA, S, NA, NA, NA, NS, NS
    Notes: This is not 100% compatible with the Spark version because the GPU implementation of approx_percentile is not bit-for-bit compatible with Apache Spark -> None
Row is changed: ApproximatePercentile, S, `approx_percentile`; `percentile_approx`, This is not 100% compatible with the Spark version because the GPU implementation of approx_percentile is not bit-for-bit compatible with Apache Spark, aggregation, accuracy, NA, NA, NA, S, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NS, NS
    Notes: This is not 100% compatible with the Spark version because the GPU implementation of approx_percentile is not bit-for-bit compatible with Apache Spark -> None
Row is changed: ApproximatePercentile, S, `approx_percentile`; `percentile_approx`, This is not 100% compatible with the Spark version because the GPU implementation of approx_percentile is not bit-for-bit compatible with Apache Spark, aggregation, result, NA, S, S, S, S, S, S, NS, NS, NA, S, NA, NA, NA, PS, NA, NA, NA, NS, NS
    Notes: This is not 100% compatible with the Spark version because the GPU implementation of approx_percentile is not bit-for-bit compatible with Apache Spark -> None
Row is changed: ApproximatePercentile, S, `approx_percentile`; `percentile_approx`, This is not 100% compatible with the Spark version because the GPU implementation of approx_percentile is not bit-for-bit compatible with Apache Spark, reduction, input, NA, S, S, S, S, S, S, NS, NS, NA, S, NA, NA, NA, NA, NA, NA, NA, NS, NS
    Notes: This is not 100% compatible with the Spark version because the GPU implementation of approx_percentile is not bit-for-bit compatible with Apache Spark -> None
Row is changed: ApproximatePercentile, S, `approx_percentile`; `percentile_approx`, This is not 100% compatible with the Spark version because the GPU implementation of approx_percentile is not bit-for-bit compatible with Apache Spark, reduction, percentage, NA, NA, NA, NA, NA, NA, S, NA, NA, NA, NA, NA, NA, NA, S, NA, NA, NA, NS, NS
    Notes: This is not 100% compatible with the Spark version because the GPU implementation of approx_percentile is not bit-for-bit compatible with Apache Spark -> None
Row is changed: ApproximatePercentile, S, `approx_percentile`; `percentile_approx`, This is not 100% compatible with the Spark version because the GPU implementation of approx_percentile is not bit-for-bit compatible with Apache Spark, reduction, accuracy, NA, NA, NA, S, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NS, NS
    Notes: This is not 100% compatible with the Spark version because the GPU implementation of approx_percentile is not bit-for-bit compatible with Apache Spark -> None
Row is changed: ApproximatePercentile, S, `approx_percentile`; `percentile_approx`, This is not 100% compatible with the Spark version because the GPU implementation of approx_percentile is not bit-for-bit compatible with Apache Spark, reduction, result, NA, S, S, S, S, S, S, NS, NS, NA, S, NA, NA, NA, PS, NA, NA, NA, NS, NS
    Notes: This is not 100% compatible with the Spark version because the GPU implementation of approx_percentile is not bit-for-bit compatible with Apache Spark -> None
Row is removed: EphemeralSubstring, S, `substr`; `substring`, None, project, str, NA, NA, NA, NA, NA, NA, NA, NA, NA, S, NA, NA, NS, NA, NA, NA, NA, NA, NS, NS
Row is removed: EphemeralSubstring, S, `substr`; `substring`, None, project, pos, NA, NA, NA, S, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NS, NS
Row is removed: EphemeralSubstring, S, `substr`; `substring`, None, project, len, NA, NA, NA, S, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NS, NS
Row is removed: EphemeralSubstring, S, `substr`; `substring`, None, project, result, NA, NA, NA, NA, NA, NA, NA, NA, NA, S, NA, NA, NS, NA, NA, NA, NA, NA, NS, NS

@amahussein amahussein merged commit 26f5f85 into NVIDIA:dev Apr 12, 2024
15 checks passed
@cindyyuanjiang cindyyuanjiang deleted the spark-rapids-tools-914 branch April 12, 2024 17:55
Successfully merging this pull request may close these issues.

[BUG] Sync supported CSV script should keep tools added operators in generated results