[GLUTEN-3582] Support PageIndex (#4634)
* Fix typo (cherry picked from commit c3fbf13)
* Use FutureSetFromTuple instead of FutureSetFromStorage: FutureSetFromTuple can call buildOrderedSetInplace automatically, while FutureSetFromStorage requires SizeLimits to be set manually. Support PageIndex and set spark.gluten.sql.columnar.backend.ch.runtime_config.use_local_format to true again. Remove the skipped test.
* Refactor gtest
* Fix build due to #4664
* v2 for finding the performance issue
* Refactor: add ParquetFileReaderExtBase and readColumnChunkPageBase; simplify build/read; remove redundant code; remove current_row_group_; change std::vector<int32_t> row_groups_ to std::deque<int32_t> row_groups_; change std::vector<std::unique_ptr<RowRanges>> row_group_row_ranges_ to std::unordered_map<int32_t, std::unique_ptr<RowRanges>> row_group_row_ranges_; change std::vector<std::unique_ptr<ColumnIndexStore>> row_group_column_index_stores_ to std::unordered_map<int32_t, std::unique_ptr<ColumnIndexStore>> row_group_column_index_stores_; remove std::vector<std::unique_ptr<parquet::RowGroupMetaData>> row_group_metas_ and std::vector<std::shared_ptr<parquet::RowGroupPageIndexReader>> row_group_index_readers_
* New loop
* Cleanup
* Cleanup
* Revert "fix build due to #4664"
* Support case_insensitive_column_matching for Parquet (cherry picked from commit bce0c6668d7bb397127eefeac1943d4c02cf79dc)
* Fix a case_insensitive_column_matching issue (a simple bug). Add testDataPath; rename getTpcdsDataPath() to tpcdsDataPath and getClickHouseLibPath() to clickHouseLibPath
* Add benchmark (cherry picked from commit bb0267135243ff8ad980b0521d8302e150a2c4e4)
* Lowercase the first letter of function names (cherry picked from commit 98dc9a79bf4f372ecabcac9b47aa06cd328f1aa4)
* Add comments (cherry picked from commit 2fb41831f4e338503ff620ce5eac9917bdb68f6a)
* Remove camel-case member variables (cherry picked from commit 1ace73205a033e14ca1659f063eb1df65c3e9969)
* Use Int32 instead of int32_t (cherry picked from commit e7d8fbe701fcd92fb6cb167686602561adc26ec4)
* Use camel case for function names (cherry picked from commit 1ee0516e2eadf045b4aec63de67cf5cb97810217)
* Add ColumnIndexFilterPtr alias (cherry picked from commit 1e9cdd3b08eb4e026a739ee558e9c2dd0c4c88fb)
* Add aliases: using RowRangesMap = absl::flat_hash_map<Int32, std::unique_ptr<RowRanges>>; using ColumnIndexStoreMap = absl::flat_hash_map<Int32, std::unique_ptr<ColumnIndexStore>> (cherry picked from commit 610fcd038d24d54fa30bcc40ab0d4d39f60dd0c4)
* Fix style (cherry picked from commit 8d85db48fe1c93dbc05404aa580b3f11de94c51d)
* Fix benchmark due to #4995
* Fix build due to ClickHouse/ClickHouse#61502
* Fix an assertion failure in Debug build
1 parent e977d79, commit 85c2d9d. Showing 50 changed files with 4,341 additions and 567 deletions.
77 changes: 77 additions & 0 deletions — ...se/src/test/scala/org/apache/spark/sql/gluten/parquet/GlutenParquetColumnIndexSuite.scala
```scala
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package org.apache.spark.sql.gluten.parquet

import io.glutenproject.execution.{FileSourceScanExecTransformer, GlutenClickHouseWholeStageTransformerSuite}
import io.glutenproject.utils.UTSystemParameters

import org.apache.spark.SparkConf
import org.apache.spark.internal.Logging
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.gluten.test.GlutenSQLTestUtils
import org.apache.spark.sql.internal.SQLConf

case class ParquetData(parquetDir: String, filter: String, scanOutput: Long)

class GlutenParquetColumnIndexSuite
  extends GlutenClickHouseWholeStageTransformerSuite
  with GlutenSQLTestUtils
  with Logging {

  override protected val fileFormat: String = "parquet"
  private val testPath: String = s"${UTSystemParameters.testDataPath}/$fileFormat"

  // TODO: we need refactor compareResultsAgainstVanillaSpark to make customCheck accept
  // both gluten and vanilla spark dataframe
  private val parquetData = Seq(
    ParquetData(
      "index/tpch/20003",
      "`27` <> '1-URGENT' and `9` >= '1995-01-01' and `9` < '1996-01-01' ",
      140000),
    ParquetData(
      "index/tpch/upper_case",
      "c_comment = '! requests wake. (...)ructions. furiousl'",
      12853)
  )

  parquetData.foreach {
    data =>
      test(s"${data.parquetDir}") {
        val parquetDir = s"$testPath/${data.parquetDir}"
        val sql1 = s"""|select count(*) from $fileFormat.`$parquetDir`
                       |where ${data.filter}
                       |""".stripMargin
        compareResultsAgainstVanillaSpark(
          sql1,
          compareResult = true,
          checkScanOutput(data.scanOutput, _))
      }
  }

  private def checkScanOutput(scanOutput: Long, df: DataFrame): Unit = {
    val chScanPlan = df.queryExecution.executedPlan.collect {
      case scan: FileSourceScanExecTransformer => scan
    }
    assertResult(1)(chScanPlan.length)
    val chFileScan = chScanPlan.head
    assertResult(scanOutput)(chFileScan.longMetric("numOutputRows").value)
  }

  override protected def sparkConf: SparkConf =
    super.sparkConf
      .set(SQLConf.ADAPTIVE_EXECUTION_ENABLED, false)
      .set("spark.gluten.sql.columnar.backend.ch.runtime_config.use_local_format", "true")
}
```
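Outside of a test suite, the same flag the sparkConf override sets programmatically could be supplied through spark-defaults.conf. A minimal, hedged sketch: only the use_local_format property name comes from this commit; the other lines are illustrative assumptions about a typical Gluten deployment.

```
# spark-defaults.conf (sketch; property names other than use_local_format are
# assumptions, not taken from this commit)
spark.plugins                                                          io.glutenproject.GlutenPlugin
spark.gluten.sql.columnar.backend.ch.runtime_config.use_local_format  true
# The test suite above also disables AQE; mirrored here for illustration.
spark.sql.adaptive.enabled                                             false
```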