ARROW-16154: [R] Errors which pass through handle_csv_read_error() and handle_parquet_io_error() need better error tracing #12839
Awesome, thanks for doing this!
@@ -200,8 +200,8 @@ read_delim_arrow <- function(file,
 tryCatch(
 tab <- reader$Read(),
-error = function(e) {
-handle_csv_read_error(e, schema)
+error = function(e, call = caller_env(n = 4)) {
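To see where n = 4 comes from, here is a minimal base-R sketch (not the PR's code). In current versions of base R, tryCatch() invokes an exiting handler from its internal helpers, so counting calling frames up from the handler gives tryCatchOne (1), tryCatchList (2), tryCatch (3), and then the function that called tryCatch() (4). The sketch assumes, as we understand rlang's implementation, that caller_env(n = 4) resolves to the same frame as parent.frame(4) evaluated in the handler; outer_reader() is a hypothetical stand-in for read_delim_arrow().

```r
# Hypothetical stand-in for read_delim_arrow(): returns TRUE if the handler's
# default `call` argument resolves to this function's own execution environment.
outer_reader <- function() {
  here <- environment()
  captured <- tryCatch(
    stop("boom"),
    # The default is evaluated lazily, i.e. only when an error is actually caught.
    # parent.frame(4) walks tryCatchOne -> tryCatchList -> tryCatch -> outer_reader;
    # this is assumed to match rlang::caller_env(n = 4) as used in the diff.
    error = function(e, call = parent.frame(4)) call
  )
  identical(captured, here)
}

outer_reader()
```

Because the default argument is a promise, the frame lookup only happens on the error path, which is the laziness discussed in this thread.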
Is it always n = 4? Is there a more certain way to capture this? (Like, if you define call_env outside of tryCatch, is it just this env?)
> Is it always n = 4?
It's always n = 4 here, though I deliberately chose to pass the call parameter into handle_csv_read_error() so that the function could be used elsewhere in the code where we may want to pass in a different environment.
> Is there a more certain way to capture this? (Like, if you define call_env outside of tryCatch, is it just this env?)
I could call rlang::current_env() above the tryCatch block; I went for calling caller_env() here as it felt "cleaner" to keep that code within this block.
I suppose that if the tryCatch block were changed to have more functions wrapped round it, the number would be wrong; however, if we call current_env() outside of the block, we're unnecessarily calling it every time we call the function, even if there's no error.
Not sure which is better. What do you think?
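The two options weighed in this reply can be compared directly. The sketch below uses base R and hypothetical names: it wraps tryCatch() in one extra helper, with_handler(), to mimic "more functions wrapped round it". The lazy lookup with a fixed frame count then lands on the wrapper's frame instead of the reader's, while capturing the environment eagerly with environment() (the base-R counterpart of rlang::current_env()) keeps working.

```r
# Hypothetical extra layer around tryCatch(), mimicking a refactor that adds a wrapper.
with_handler <- function(expr, handler) tryCatch(expr, error = handler)

# Lazy: count frames up from the handler, mirroring caller_env(n = 4) in the diff.
reader_lazy <- function() {
  here <- environment()
  got <- with_handler(stop("boom"), function(e, call = parent.frame(4)) call)
  # Four frames up is now with_handler(), not reader_lazy().
  identical(got, here)
}

# Eager: capture this function's environment up front (cf. rlang::current_env()).
reader_eager <- function() {
  env <- environment()
  # The handler is a closure over `env`, so no frame counting is involved.
  got <- with_handler(stop("boom"), function(e, call = env) call)
  identical(got, env)
}

reader_lazy()   # FALSE: the fixed count no longer points at the reader
reader_eager()  # TRUE
```

The eager version pays for the environment capture on every call, error or not; the lazy version only does work on the error path but depends on the exact call stack.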
Feels brittle, but it's probably fine. I'd just leave in some comments explaining why n = 4, and that you could have used caller_env() but this way is lazy, i.e. it only does the lookup if there's an error (aside: it's just calling parent.frame(), which on my machine takes in the hundreds of nanoseconds to run, so the cost of calling it every time is not something I'm concerned about).
We can revisit later if/when we want to chain together multiple error handlers. Also, it looks like rlang is growing some experimental tooling around here (https://rlang.r-lib.org/reference/try_fetch.html), so maybe that will mature and be ready whenever we revisit this.
In sum, it seems like you've thought this through, so just leave a note explaining why this non-obvious thing is there and 👍!
One request: add a version of what you responded with as a code comment. Otherwise LGTM, nice work!
Benchmark runs are scheduled for baseline = 681ede6 and contender = 5d5cceb. 5d5cceb is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
As discussed on #12826
Not sure how (if) to write tests, but I tried running it locally, using the CSV directory set up in test-dataset-csv.R, with and without this change. Without it, we get, e.g.:
and then with it:
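On the testing question, one possible approach is to assert on the call attached to the re-raised error rather than on printed output. The sketch below is base R with hypothetical names: rethrow_with_call() stands in for handle_csv_read_error(e, schema, call), whose internals are not shown here, so its shape is only an assumption. It finds the active frame matching the environment it is given and re-raises the original message with that frame's call.

```r
# Hypothetical stand-in for handle_csv_read_error(e, schema, call):
# re-raise `e` with the call of the active frame whose environment is `call_env`.
rethrow_with_call <- function(e, call_env) {
  frames <- as.list(sys.frames())
  calls <- as.list(sys.calls())
  # Find the active frame that matches the supplied environment.
  idx <- which(vapply(frames, identical, logical(1), call_env))
  user_call <- if (length(idx) > 0) calls[[idx[1]]] else NULL
  stop(simpleError(conditionMessage(e), call = user_call))
}

# Hypothetical user-facing reader, analogous to read_delim_arrow().
read_something <- function(path) {
  tryCatch(
    stop("could not parse file"),
    error = function(e, call = parent.frame(4)) rethrow_with_call(e, call)
  )
}

# Test: the error should point at the user-facing call, not at tryCatch internals.
# Components are compared individually because sys.calls() may attach srcrefs.
err <- tryCatch(read_something("data.csv"), error = function(e) e)
stopifnot(
  identical(err$call[[1]], as.name("read_something")),
  identical(err$call[[2]], "data.csv"),
  identical(conditionMessage(err), "could not parse file")
)
```

In the arrow package itself, a test would more likely use testthat against the real reader; this only illustrates what such an assertion could check.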