[FEA] Explore ways to not use HadoopFileLinesReader for CSV parsing #6
Labels
feature request
New feature or request
P1
Nice to have for release
performance
A performance related task/issue
SQL
part of the SQL/Dataframe plugin
Comments
revans2 added the feature request, ? - Needs Triage (need team to review and classify), SQL, and performance labels on May 28, 2020
sameerz changed the title from "[FEA] explore ways not use HadoopFileLinesReader for CSV parseing" to "[FEA] Explore ways to not use HadoopFileLinesReader for CSV parsing" on Oct 13, 2020
I filed rapidsai/cudf#6572 in cudf to try and support this.
wjxiz1992
pushed a commit
to wjxiz1992/spark-rapids
that referenced
this issue
Oct 29, 2020
Update scala app version to 0.2.2
gerashegalov
pushed a commit
to gerashegalov/spark-rapids
that referenced
this issue
Nov 18, 2022
…tampNTZEnabled Fix errors caused by 340+ not working on DB
wjxiz1992
referenced
this issue
in nvliyuan/yuali-spark-rapids
Apr 26, 2024
* A hacky approach for regexpr rewrite Signed-off-by: Haoyang Li <[email protected]> * Use contains instead for that case Signed-off-by: Haoyang Li <[email protected]> * add config to switch Signed-off-by: Haoyang Li <[email protected]> * Rewrite some rlike expression to StartsWith/EndsWith/Contains Signed-off-by: Haoyang Li <[email protected]> * clean up Signed-off-by: Haoyang Li <[email protected]> * wip Signed-off-by: Haoyang Li <[email protected]> * wip Signed-off-by: Haoyang Li <[email protected]> * add tests and config Signed-off-by: Haoyang Li <[email protected]> --------- Signed-off-by: Haoyang Li <[email protected]>
sperlingxx
pushed a commit
to sperlingxx/spark-rapids
that referenced
this issue
May 16, 2024
Signed-off-by: Firestarman <[email protected]>
Is your feature request related to a problem? Please describe.
When parsing CSV, the CPU currently reads through the data using HadoopFileLinesReader and replaces the line endings. From a performance standpoint, it would be better to block-copy most of the data and skip the line-ending translation. This would require the cudf CSV reader to support line endings that are '\r', '\n', or '\r\n'. This is not a simple task, but it could reduce CPU utilization significantly.
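To make the requirement concrete, here is a minimal Python sketch of what recognizing all three line endings in a raw byte buffer could look like without rewriting the data. It is only an illustration: `line_spans` is a hypothetical helper, not part of cudf or the plugin, and it assumes there are no quoted fields containing embedded newlines, which a real CSV reader would have to handle.

```python
def line_spans(buf: bytes) -> list[tuple[int, int]]:
    """Return (start, end) offsets of each line's content in buf.

    Hypothetical helper for illustration only. Terminators ('\r', '\n',
    or '\r\n') are excluded from the spans, and the buffer itself is
    never modified or copied. Quoted fields are NOT handled.
    """
    spans = []
    start = 0
    i = 0
    n = len(buf)
    while i < n:
        b = buf[i]
        if b == 0x0A:  # '\n' on its own
            spans.append((start, i))
            i += 1
            start = i
        elif b == 0x0D:  # '\r', possibly followed by '\n'
            spans.append((start, i))
            # Treat '\r\n' as a single terminator.
            i += 2 if i + 1 < n and buf[i + 1] == 0x0A else 1
            start = i
        else:
            i += 1
    if start < n:  # final line without a terminator
        spans.append((start, n))
    return spans


# Mixed line endings in one buffer; the raw bytes stay untouched.
data = b"a,b\r\n1,2\n3,4\r5,6"
spans = line_spans(data)
lines = [data[s:e] for s, e in spans]
```

Because the function only records offsets, the underlying block could be copied as-is and the terminator handling left to the parser, which is the behavior this request asks of the cudf CSV reader.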