[FEA] Improve exception message when unknown Parquet page encoding detected #14209
Comments
Hi, I'd like to try doing this task. |
This has not yet been fully addressed. Right now the error code for unsupported option is set during header parsing, and available on the host side in decode_page_headers. All that needs to be done is to test the returned error code for the correct bit, then do a transform_reduce on the page encoding field (after turning the encoding into a mask), and then check the set bits to find the offending encoding(s). Soon the only unsupported encoding will be byte_stream_split, so this hasn't been a high priority for me. Good first issue. |
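A minimal host-side sketch of the steps described above, using plain C++17 rather than the device-side reduction libcudf would actually need. Everything named here is an assumption for illustration: `PageInfo`, `kUnsupportedEncodingBit`, `encoding_mask`, and `unsupported_encodings` are hypothetical and are not libcudf's types or API. Only the numeric encoding values follow the Parquet format's `Encoding` enum.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <numeric>
#include <vector>

// Encoding ids as defined by the Parquet format specification.
enum class Encoding : int32_t {
  PLAIN = 0,
  PLAIN_DICTIONARY = 2,
  RLE = 3,
  BIT_PACKED = 4,
  DELTA_BINARY_PACKED = 5,
  DELTA_LENGTH_BYTE_ARRAY = 6,
  DELTA_BYTE_ARRAY = 7,
  RLE_DICTIONARY = 8,
  BYTE_STREAM_SPLIT = 9
};

// Hypothetical stand-in for the per-page metadata produced by header parsing.
struct PageInfo {
  Encoding encoding;
};

// Hypothetical error bit; the real value lives in libcudf's error-code enum.
constexpr uint32_t kUnsupportedEncodingBit = 0x4;

// Turn each page's encoding into a one-hot mask and OR them all together.
uint32_t encoding_mask(const std::vector<PageInfo>& pages) {
  return std::transform_reduce(
      pages.begin(), pages.end(), uint32_t{0}, std::bit_or<uint32_t>{},
      [](const PageInfo& p) {
        auto const bit = static_cast<uint32_t>(p.encoding);
        assert(bit < 32);  // the mask only has room for ids 0..31
        return uint32_t{1} << bit;
      });
}

// Return the encoding ids that were seen but are not in supported_mask.
// Returns an empty list unless the "unsupported encoding" bit is set.
std::vector<int32_t> unsupported_encodings(uint32_t error_code,
                                           const std::vector<PageInfo>& pages,
                                           uint32_t supported_mask) {
  if ((error_code & kUnsupportedEncodingBit) == 0) { return {}; }
  uint32_t const offending = encoding_mask(pages) & ~supported_mask;
  std::vector<int32_t> ids;
  for (int32_t bit = 0; bit < 32; ++bit) {
    if (offending & (uint32_t{1} << bit)) { ids.push_back(bit); }
  }
  return ids;
}
```

In a real implementation the reduction would presumably run on the device over the page array (for example with `thrust::transform_reduce`), with only the resulting mask copied back to the host.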
I think @nvdbaranec's #14360 makes this easy to address; see cudf/cpp/src/io/parquet/reader_impl_preprocess.cu, lines 421 to 426 in 7cd3b83. |
That's zombie code that was removed in #14237 😅 (https://github.com/rapidsai/cudf/pull/14237/files#diff-ebcd42136ddf31f8097631e00ca84c03abf1fa38f1eb1afbc50d3d207c7a7cac). Looks like it was missed in a merge somewhere along the line. |
?? |
I was talking about the |
Hi, from what I can see |
@ZelboK the current state of affairs is that when an unsupported encoding is detected, an error message is printed with an error code, which is a mask comprising the union of all errors detected. The current list of errors is here. What is desired is to print a list of which encodings triggered the message. |
Ah I see. I was fixated on the old Thanks |
@vuule @etseidl Hey folks, just wanted to know if there are more tasks I could do in cuIO, now that this one is done? I imagine most of the team will be on vacation soon, if not already. I'm just looking to learn more over my holidays, and I don't mind more challenging tasks. Thanks! Also, my apologies, I am assuming you two are on the cuIO side of things. Forgive me if I am wrong 😅 |
…oding is detected (#14453) Per #14209 this will list out unsupported encodings that were found. Authors: - Danial Javady (https://github.com/ZelboK) - Vukasin Milovanovic (https://github.com/vuule) - Nghia Truong (https://github.com/ttnghia) Approvers: - Nghia Truong (https://github.com/ttnghia) - Vukasin Milovanovic (https://github.com/vuule) - Ed Seidl (https://github.com/etseidl) URL: #14453
Thank you for offering further help! |
Hm, would #14661 be a more challenging one? From what I see in that list, they are all labeled good first issues. I'd actually like to take on something more challenging and avoid first issues. Honestly, the harder the better. I'm hoping to learn as much as I can in the next few weeks, so I'm aiming to do a few more. |
None of the above are trivial, and many open issues are still open for a reason 😅 |
Thank you so much! I assumed that the good first issue label was for easier work. In that case, I'll do #14661 and depending on how that goes, hopefully I can pick a few more from the backlog. Thanks again :) |
Is your feature request related to a problem? Please describe.
A user of the RAPIDS Accelerator for Apache Spark reported the following exception:
The exception message from libcudf is not very helpful in that it says an unsupported page encoding was detected but not what that unexpected page encoding was (i.e.: the enum value). Without this information, we're left guessing what encoding was found in the file and usually have to request users to share a sample file to find out. Not all users are willing to share sample files.
Describe the solution you'd like
Exception messages for an unexpected/unsupported value should show the value as part of the exception message.
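As an illustration of the request only, here is a minimal sketch of an error message that carries the offending values. The helper names `unsupported_encoding_message` and `throw_unsupported_encoding`, and the message wording, are hypothetical and not taken from libcudf.

```cpp
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Build a message that lists every offending encoding id (hypothetical helper).
std::string unsupported_encoding_message(const std::vector<int32_t>& encodings) {
  std::ostringstream os;
  os << "Unsupported page encoding detected: ";
  for (std::size_t i = 0; i < encodings.size(); ++i) {
    if (i != 0) { os << ", "; }
    os << encodings[i];
  }
  return os.str();
}

// Throw with the enum values included, so no sample file is needed to diagnose.
[[noreturn]] void throw_unsupported_encoding(const std::vector<int32_t>& encodings) {
  throw std::runtime_error(unsupported_encoding_message(encodings));
}
```

With a message like this, a report that contains only the stack trace is enough to tell which encoding the file used.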