-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARROW-4036: [C++] Pluggable Status message, by exposing an abstract delegate class. #4484
Conversation
This also assumes subsystem codes don't map to top level codes (I think I would prefer they do, but I could see them going both ways). |
cpp/src/arrow/status.h
Outdated
@@ -92,6 +92,7 @@ enum class StatusCode : char { | |||
SerializationError = 11, | |||
PythonError = 12, | |||
RError = 13, | |||
SubsystemError = 14, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that's the right approach. If I have a Python ValueError
, I want it both to map to StatusCode::Invalid
and to retain the original Python exception.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, I tend to agree. So two issues to improve on:
- top-level codes should be orthogonal from subsytem codes
- Potentially need the ability to directly link to underlying exception (e.g. PyObject for a python error)
Any other high level concepts/functionallty missing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. I also think subsystem codes can be optional (for example, Python would just use the underlying exception instead of defining integer codes).
So instead of having a subsystem code, we could have an opaque optional detail instance, e.g.:
class StatusDetail {
public:
virtual ~StatusDetail();
virtual std::string ToString() const = 0;
};
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I came to the same conclusion after stepping away from the computer. Will try to update tonight.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've updated to just have a status detail, that wraps has a message accessor and unify the current implementation with a private implementation of the interface. I can see arguments for keeping a separate message member as well, so I'm happy to add it back.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would be in favour of having a separate message member, so that the detail can remain null if there's no error-specific detail to carry.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added msg field. back.
cpp/src/arrow/status.h
Outdated
|
||
}; | ||
|
||
enum class Subsystem : char { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does Flight also warrant a subsystem? It would be nice to express errors like unauthorized, service unavailable/overloaded, etc. (tagging my work account @lihalite)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Going to remove this abd go with the approach suggested by Antoine
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lihalite when this lands, I think some codes should probably be top level (unavailable seems generic across multiple service type things) but some might be flight specific.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah ok, great. I started sketching out a set of codes for Flight/Java, I can pick up the C++/Python side too once this lands.
Codecov Report
@@ Coverage Diff @@
## master #4484 +/- ##
===========================================
+ Coverage 67.96% 89.15% +21.19%
===========================================
Files 676 588 -88
Lines 74307 70058 -4249
Branches 1253 0 -1253
===========================================
+ Hits 50501 62460 +11959
+ Misses 23559 7598 -15961
+ Partials 247 0 -247
Continue to review full report at Codecov.
|
If you want me to try to cleanup one of the subsystem (python?) in this PR let me know and I can give it a shot. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this looks nice on the principle. It would be useful to validate this by migrating the Gandiva, Plasma... status codes.
Will give this a try I'm not sure about downstream dependencies though. |
e20eb9b
to
c78f74d
Compare
ff5fa6e
to
814d55f
Compare
814d55f
to
93737ac
Compare
putting back in WIP status until I can get CI green (python tests passed for me locally) |
e2f9a14
to
12c5c61
Compare
12c5c61
to
8433843
Compare
@pitrou after a lot of debugging I figured out why the builds were failing only to discover you had fixed with with your PR to provide better errors (there was a copy of Status in a macro that I did not find initally). Thank you! I think this is ready to review. I'll file a follow-up JIRA once this is merged to migrade Gandiva away from this as well. |
Will review, thank you :-) |
I'm looking into the lint failure, but I'm guessing i wll need to make some changes anyways. |
(and restore it in check_status())
68c9d62
to
ea56d1e
Compare
I've done my thing on Python. Let's wait for CI a bit. |
@ursabot build test |
|
@ursabot build |
1 similar comment
@ursabot build |
The failed ursabot builds are unrelated, @kszucs is busy doing things on BuildBot. Travis-CI build: https://travis-ci.org/pitrou/arrow/builds/553376058 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple more things to change. I can tackle those tomorrow if you like.
return std::string("Python exception: ") + ty->tp_name; | ||
} | ||
|
||
void RestorePyError() const { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for doing this, I started down this path and decided to keep it simpler because there were semantics of the GIL I was hazy on.
I'm going to merge once CI is green. |
…elegate class. This provides less "pluggability" but I think still offers a clean model for extension (subsystems can wrap the constructor for there purposes, and provide external static methods to check for particular types of errors). Author: Micah Kornfield <[email protected]> Author: Antoine Pitrou <[email protected]> Closes #4484 from emkornfield/status_code_proposal and squashes the following commits: 4d1ab8d <Micah Kornfield> don't import plasma errors directly into top level pyarrow module a66f999 <Micah Kornfield> make format 040216d <Micah Kornfield> fixes for comments outside python 729bba1 <Antoine Pitrou> Fix Py2 issues (hopefully) ea56d1e <Antoine Pitrou> Fix PythonErrorDetail to store Python error state (and restore it in check_status()) 21e1b95 <Micah Kornfield> fix compilation 9c905b0 <Micah Kornfield> fix lint 74d563c <Micah Kornfield> fixes 85786ef <Micah Kornfield> change messages 3626a90 <Micah Kornfield> try removing message a4e6a1f <Micah Kornfield> add logging for debug 4586fd1 <Micah Kornfield> fix typo 8f011b3 <Micah Kornfield> fix status propagation 317ea9c <Micah Kornfield> fix complie 9f59160 <Micah Kornfield> don't make_shared inline 484b3a2 <Micah Kornfield> style fix 14e3467 <Micah Kornfield> dont rely on rtti cd22df6 <Micah Kornfield> format dec4585 <Micah Kornfield> not-quite pluggable error codes
…elegate class. This provides less "pluggability" but I think still offers a clean model for extension (subsystems can wrap the constructor for there purposes, and provide external static methods to check for particular types of errors). Author: Micah Kornfield <[email protected]> Author: Antoine Pitrou <[email protected]> Closes #4484 from emkornfield/status_code_proposal and squashes the following commits: 4d1ab8d <Micah Kornfield> don't import plasma errors directly into top level pyarrow module a66f999 <Micah Kornfield> make format 040216d <Micah Kornfield> fixes for comments outside python 729bba1 <Antoine Pitrou> Fix Py2 issues (hopefully) ea56d1e <Antoine Pitrou> Fix PythonErrorDetail to store Python error state (and restore it in check_status()) 21e1b95 <Micah Kornfield> fix compilation 9c905b0 <Micah Kornfield> fix lint 74d563c <Micah Kornfield> fixes 85786ef <Micah Kornfield> change messages 3626a90 <Micah Kornfield> try removing message a4e6a1f <Micah Kornfield> add logging for debug 4586fd1 <Micah Kornfield> fix typo 8f011b3 <Micah Kornfield> fix status propagation 317ea9c <Micah Kornfield> fix complie 9f59160 <Micah Kornfield> don't make_shared inline 484b3a2 <Micah Kornfield> style fix 14e3467 <Micah Kornfield> dont rely on rtti cd22df6 <Micah Kornfield> format dec4585 <Micah Kornfield> not-quite pluggable error codes
revert some changes replace event loop with asio update plasma store protocol fix qualifiers update plasma store client protocol Remove all native socket operations. Implement general io support Fix bugs fix all compiling bugs fix bug Fix all tests. Add license header. try to fix cmake try to make asio standalone simplify code add license update url lint lint & fix fix restore entrypoint remove unused unix headers fix Update LICENSE fix rename handle signal move the function to its original place fix doc hide classes stop installing asio headers fix doc reverse changes minor fix tiny fix fix comments minor fix resolve conflicts fix optimize cmake fix update formatter fix according to github comments lint prevent the store from dying fix ARROW_CHECK Fix ARROW-4036: [C++] Pluggable Status message, by exposing an abstract delegate class. This provides less "pluggability" but I think still offers a clean model for extension (subsystems can wrap the constructor for there purposes, and provide external static methods to check for particular types of errors). Author: Micah Kornfield <[email protected]> Author: Antoine Pitrou <[email protected]> Closes apache#4484 from emkornfield/status_code_proposal and squashes the following commits: 4d1ab8d <Micah Kornfield> don't import plasma errors directly into top level pyarrow module a66f999 <Micah Kornfield> make format 040216d <Micah Kornfield> fixes for comments outside python 729bba1 <Antoine Pitrou> Fix Py2 issues (hopefully) ea56d1e <Antoine Pitrou> Fix PythonErrorDetail to store Python error state (and restore it in check_status()) 21e1b95 <Micah Kornfield> fix compilation 9c905b0 <Micah Kornfield> fix lint 74d563c <Micah Kornfield> fixes 85786ef <Micah Kornfield> change messages 3626a90 <Micah Kornfield> try removing message a4e6a1f <Micah Kornfield> add logging for debug 4586fd1 <Micah Kornfield> fix typo 8f011b3 <Micah Kornfield> fix status propagation 317ea9c <Micah Kornfield> fix complie 9f59160 <Micah Kornfield> don't make_shared inline 484b3a2 <Micah Kornfield> style fix 14e3467 <Micah Kornfield> dont rely on rtti cd22df6 <Micah Kornfield> format dec4585 <Micah Kornfield> not-quite pluggable error codes fix merge fix update update update update fix update fix update update revert some unknown comments rebase CMakeLists rebase eviction_policy.h rebase CMakeLists rebase
revert some changes replace event loop with asio update plasma store protocol fix qualifiers update plasma store client protocol Remove all native socket operations. Implement general io support Fix bugs fix all compiling bugs fix bug Fix all tests. Add license header. try to fix cmake try to make asio standalone simplify code add license update url lint lint & fix fix restore entrypoint remove unused unix headers fix Update LICENSE fix rename handle signal move the function to its original place fix doc hide classes stop installing asio headers fix doc reverse changes minor fix tiny fix fix comments minor fix resolve conflicts fix optimize cmake fix update formatter fix according to github comments lint prevent the store from dying fix ARROW_CHECK Fix ARROW-4036: [C++] Pluggable Status message, by exposing an abstract delegate class. This provides less "pluggability" but I think still offers a clean model for extension (subsystems can wrap the constructor for there purposes, and provide external static methods to check for particular types of errors). Author: Micah Kornfield <[email protected]> Author: Antoine Pitrou <[email protected]> Closes apache#4484 from emkornfield/status_code_proposal and squashes the following commits: 4d1ab8d <Micah Kornfield> don't import plasma errors directly into top level pyarrow module a66f999 <Micah Kornfield> make format 040216d <Micah Kornfield> fixes for comments outside python 729bba1 <Antoine Pitrou> Fix Py2 issues (hopefully) ea56d1e <Antoine Pitrou> Fix PythonErrorDetail to store Python error state (and restore it in check_status()) 21e1b95 <Micah Kornfield> fix compilation 9c905b0 <Micah Kornfield> fix lint 74d563c <Micah Kornfield> fixes 85786ef <Micah Kornfield> change messages 3626a90 <Micah Kornfield> try removing message a4e6a1f <Micah Kornfield> add logging for debug 4586fd1 <Micah Kornfield> fix typo 8f011b3 <Micah Kornfield> fix status propagation 317ea9c <Micah Kornfield> fix complie 9f59160 <Micah Kornfield> don't make_shared inline 484b3a2 <Micah Kornfield> style fix 14e3467 <Micah Kornfield> dont rely on rtti cd22df6 <Micah Kornfield> format dec4585 <Micah Kornfield> not-quite pluggable error codes fix merge fix update update update update fix update fix update update revert some unknown comments rebase CMakeLists rebase eviction_policy.h rebase CMakeLists rebase
revert some changes replace event loop with asio update plasma store protocol fix qualifiers update plasma store client protocol Remove all native socket operations. Implement general io support Fix bugs fix all compiling bugs fix bug Fix all tests. Add license header. try to fix cmake try to make asio standalone simplify code add license update url lint lint & fix fix restore entrypoint remove unused unix headers fix Update LICENSE fix rename handle signal move the function to its original place fix doc hide classes stop installing asio headers fix doc reverse changes minor fix tiny fix fix comments minor fix resolve conflicts fix optimize cmake fix update formatter fix according to github comments lint prevent the store from dying fix ARROW_CHECK Fix ARROW-4036: [C++] Pluggable Status message, by exposing an abstract delegate class. This provides less "pluggability" but I think still offers a clean model for extension (subsystems can wrap the constructor for there purposes, and provide external static methods to check for particular types of errors). Author: Micah Kornfield <[email protected]> Author: Antoine Pitrou <[email protected]> Closes apache#4484 from emkornfield/status_code_proposal and squashes the following commits: 4d1ab8d <Micah Kornfield> don't import plasma errors directly into top level pyarrow module a66f999 <Micah Kornfield> make format 040216d <Micah Kornfield> fixes for comments outside python 729bba1 <Antoine Pitrou> Fix Py2 issues (hopefully) ea56d1e <Antoine Pitrou> Fix PythonErrorDetail to store Python error state (and restore it in check_status()) 21e1b95 <Micah Kornfield> fix compilation 9c905b0 <Micah Kornfield> fix lint 74d563c <Micah Kornfield> fixes 85786ef <Micah Kornfield> change messages 3626a90 <Micah Kornfield> try removing message a4e6a1f <Micah Kornfield> add logging for debug 4586fd1 <Micah Kornfield> fix typo 8f011b3 <Micah Kornfield> fix status propagation 317ea9c <Micah Kornfield> fix complie 9f59160 <Micah Kornfield> don't make_shared inline 484b3a2 <Micah Kornfield> style fix 14e3467 <Micah Kornfield> dont rely on rtti cd22df6 <Micah Kornfield> format dec4585 <Micah Kornfield> not-quite pluggable error codes fix merge fix update update update update fix update fix update update revert some unknown comments rebase CMakeLists rebase eviction_policy.h rebase CMakeLists rebase
…pe (int/string) Pandas dataframe to pyarrow Table This PR homogenizes error messages for mixed-type `Pandas` inputs to `pa.Table`. The message for `Pandas` column with `int` followed by `string` is now ``` In [2]: table = pa.Table.from_pandas(pd.DataFrame({'a': [ 19, 'a']})) (... traceback...) ArrowInvalid: ('Could not convert a with type str: tried to convert to int', 'Conversion failed for column a with type object') ``` the same as for `double` followed by `string`: ``` In [3]: table = pa.Table.from_pandas(pd.DataFrame({'a': [ 19.0, 'a']})) (... traceback...) ArrowInvalid: ('Could not convert a with type str: tried to convert to double', 'Conversion failed for column a with type object') ``` As a side effect, this snippet [xref #5866, ARROW-7168] now throws an `ArrowInvalid` (has been `FutureWarning` since 0.16): ``` In [8]: cat = pd.Categorical.from_codes(np.array([0, 1], dtype='int8'), np.array(['a', 'b'], dtype=object)) ...: typ = pa.dictionary(index_type=pa.int8(), value_type=pa.int64()) ...: result = pa.array(cat, type=typ) (... traceback...) ArrowInvalid: Could not convert a with type str: tried to convert to int ``` Finally, this *does* break a test [xref #4484, ARROW-4036] - see code comment Closes #8044 from arw2019/ARROW-7663 Authored-by: arw2019 <[email protected]> Signed-off-by: Joris Van den Bossche <[email protected]>
…pe (int/string) Pandas dataframe to pyarrow Table This PR homogenizes error messages for mixed-type `Pandas` inputs to `pa.Table`. The message for `Pandas` column with `int` followed by `string` is now ``` In [2]: table = pa.Table.from_pandas(pd.DataFrame({'a': [ 19, 'a']})) (... traceback...) ArrowInvalid: ('Could not convert a with type str: tried to convert to int', 'Conversion failed for column a with type object') ``` the same as for `double` followed by `string`: ``` In [3]: table = pa.Table.from_pandas(pd.DataFrame({'a': [ 19.0, 'a']})) (... traceback...) ArrowInvalid: ('Could not convert a with type str: tried to convert to double', 'Conversion failed for column a with type object') ``` As a side effect, this snippet [xref apache#5866, ARROW-7168] now throws an `ArrowInvalid` (has been `FutureWarning` since 0.16): ``` In [8]: cat = pd.Categorical.from_codes(np.array([0, 1], dtype='int8'), np.array(['a', 'b'], dtype=object)) ...: typ = pa.dictionary(index_type=pa.int8(), value_type=pa.int64()) ...: result = pa.array(cat, type=typ) (... traceback...) ArrowInvalid: Could not convert a with type str: tried to convert to int ``` Finally, this *does* break a test [xref apache#4484, ARROW-4036] - see code comment Closes apache#8044 from arw2019/ARROW-7663 Authored-by: arw2019 <[email protected]> Signed-off-by: Joris Van den Bossche <[email protected]>
…pe (int/string) Pandas dataframe to pyarrow Table This PR homogenizes error messages for mixed-type `Pandas` inputs to `pa.Table`. The message for `Pandas` column with `int` followed by `string` is now ``` In [2]: table = pa.Table.from_pandas(pd.DataFrame({'a': [ 19, 'a']})) (... traceback...) ArrowInvalid: ('Could not convert a with type str: tried to convert to int', 'Conversion failed for column a with type object') ``` the same as for `double` followed by `string`: ``` In [3]: table = pa.Table.from_pandas(pd.DataFrame({'a': [ 19.0, 'a']})) (... traceback...) ArrowInvalid: ('Could not convert a with type str: tried to convert to double', 'Conversion failed for column a with type object') ``` As a side effect, this snippet [xref apache#5866, ARROW-7168] now throws an `ArrowInvalid` (has been `FutureWarning` since 0.16): ``` In [8]: cat = pd.Categorical.from_codes(np.array([0, 1], dtype='int8'), np.array(['a', 'b'], dtype=object)) ...: typ = pa.dictionary(index_type=pa.int8(), value_type=pa.int64()) ...: result = pa.array(cat, type=typ) (... traceback...) ArrowInvalid: Could not convert a with type str: tried to convert to int ``` Finally, this *does* break a test [xref apache#4484, ARROW-4036] - see code comment Closes apache#8044 from arw2019/ARROW-7663 Authored-by: arw2019 <[email protected]> Signed-off-by: Joris Van den Bossche <[email protected]>
This provides less "pluggability" but I think still offers a clean model for extension (subsystems can wrap the constructor for there purposes, and provide external static methods to check for particular types of errors).