Test parquet predicate pushdown for basic types and fields having dots in names [databricks] #9128
Conversation
build
withTempPath { path =>
  withSQLConf(
    SQLConf.PARQUET_OUTPUT_TIMESTAMP_TYPE.key -> "TIMESTAMP_MICROS",
    "spark.rapids.sql.test.enabled" -> "false",
val TEST_CONF = conf("spark.rapids.sql.test.enabled")
.doc("Intended to be used by unit tests, if enabled all operations must run on the " +
"GPU or an error happens.")
.internal()
.booleanConf
.createWithDefault(false)
If the GPU is enabled, we should set this to true. Otherwise we may encounter this kind of case: the test passes, but it is silently running on the CPU while we expect it to run on the GPU.
The change would look like:
if (writeGpu || readGpu) {
spark.rapids.sql.test.enabled = true
}
Actually I think it should be
"spark.rapids.sql.test.enabled" -> (!writeGpu).toString
or else, if we are reading on the GPU but writing on the CPU, we will run into a problem where we get an error for having things not be on the GPU.
Updated to
"spark.rapids.sql.test.enabled" -> writeGpu.toString
and
"spark.rapids.sql.test.enabled" -> readGpu.toString
to check all operations running on GPU during GPU read and write separately.
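The resolved pattern can be sketched as a small pure helper (hypothetical object and method names; Spark's `withSQLConf` and the actual read/write blocks are elided):

```scala
// Hypothetical sketch, not the PR's exact code: the "all operations must
// run on the GPU" guard should track the phase being exercised, so the
// write block passes writeGpu and the read block passes readGpu.
object TestEnabledConfSketch {
  // Builds the conf key/value pair for one phase (write or read).
  def testEnabledPair(phaseOnGpu: Boolean): (String, String) =
    "spark.rapids.sql.test.enabled" -> phaseOnGpu.toString

  def main(args: Array[String]): Unit = {
    val writeGpu = true
    val readGpu = false
    // Write phase: enforce the GPU check only when writing on the GPU.
    println(testEnabledPair(writeGpu))
    // Read phase: enforce the GPU check only when reading on the GPU.
    println(testEnabledPair(readGpu))
  }
}
```

This keeps the enforcement scoped per phase, so a CPU write followed by a GPU read does not trip the "must be on GPU" error during the write.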
Also removed the binary test because we don't support BinaryType right now.
I also noticed that ParquetFilterSuite has tests for some additional operators. I'm not sure whether it's necessary to add those tests here as well.
Can we mark #9119 as invalid?
  }
}
def withAllParquetReaders(code: => Unit): Unit = {
nit: This only matters when reading on the CPU. The GPU ignores this.
withTempPath { path =>
  withSQLConf(
    SQLConf.PARQUET_OUTPUT_TIMESTAMP_TYPE.key -> "TIMESTAMP_MICROS",
    "spark.rapids.sql.test.enabled" -> "false",
    SQLConf.PARQUET_FILTER_PUSHDOWN_DATE_ENABLED.key -> "true",
    SQLConf.PARQUET_FILTER_PUSHDOWN_TIMESTAMP_ENABLED.key -> "true",
    SQLConf.PARQUET_FILTER_PUSHDOWN_DECIMAL_ENABLED.key -> "true",
    "spark.rapids.sql.test.enabled" -> "false",
Same here, we should update this one similarly to the comment above:
"spark.rapids.sql.test.enabled" -> (!readGpu).toString,
Added [databricks] in the title to also test on Databricks.
Signed-off-by: Haoyang Li <[email protected]>
build
Updated the timestamp test; the V2 source also needs it.
build
Closes #9127
Closes #9094
This PR adds tests for Parquet predicate pushdown covering basic types and fields with dots in their names. It is also a follow-on to the ORC PPD test, using `assume` instead of commenting out failing cases.
For PPD testing, these cases are tested:
Some context: #9119
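The `assume`-instead-of-commenting-out approach mentioned above can be sketched like this (hypothetical suite and test names; assumes ScalaTest, which Spark's test suites build on, and an arbitrary environment check as the precondition):

```scala
import org.scalatest.funsuite.AnyFunSuite

// Hypothetical sketch: assume() cancels (skips) the test instead of
// failing it when the precondition does not hold, so unsupported cases
// stay visible as "canceled" rather than being commented out and lost.
class ParquetPpdSketchSuite extends AnyFunSuite {
  test("timestamp predicate pushdown") {
    assume(sys.env.contains("SPARK_HOME"), "requires a Spark installation")
    // ... the actual predicate pushdown checks would go here ...
  }
}
```

Canceled tests still show up in the test report, which makes it much easier to notice when a skipped case can be re-enabled than a commented-out block would be.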