Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[EXP] ARROW-4036: [C++] Pluggable status codes #3199

Closed

Conversation

pitrou
Copy link
Member

@pitrou pitrou commented Dec 17, 2018

This approach uses static registration. I recommend comparing with other approaches.

This approach uses static registration.
@@ -77,4 +80,39 @@ TEST(StatusTest, AndStatus) {
ASSERT_TRUE(res.IsInvalid());
}

static const char* test_messages1[] = {"First error", "Second error"};
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See example use here.

std::string msg;
};
// OK status has a `NULL` state_. Otherwise, `state_` points to
// a `State` structure containing the error code and message(s)
State* state_;

static std::unordered_map<uint64_t, const StatusRange*> ranges_;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like we're needing more and more a global arrow::init().

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you afraid of the static initialization here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, seems likely that we'll need more and more controlled static initialization:

  • Utf8 utils make use of call_once
  • This patch (which other static initializers might depend in proper order, but controllable since we're referencing with int32/int64)
  • Gandiva make use of call_once, I think this one can be tricky due to possible failures and the time it takes to init, e.g. it's friendlier to the user if he has visibility (time take, failure, ...)

@emkornfield
Copy link
Contributor

@pitrou @fsaintjacques what is the status of this?

@pitrou
Copy link
Member Author

pitrou commented May 29, 2019

@emkornfield It's completely stale, and noone is actively working on it. I still think it's a good idea, though, because we have more and more subsystems (Python, R, Flight, etc.).

One useful extension of this would be to have optional arbitrary data embedded in the error state, for example if an error comes from a Python exception, it would be useful to record the original Python exception - especially if the Status is converted back into a Python error, so that the traceback doesn't get lost.

@wesm
Copy link
Member

wesm commented May 29, 2019

@pitrou @emkornfield I agree that it's a good idea, as long as it doesn't cause any performance issues. I see this introduces unordered_map into the includes which may want to be avoided based on recent patches =)

@emkornfield
Copy link
Contributor

@wesm @pitrou I agree it could be useful I guess I was asking what the blockers are? Should we close this PR and formulate a new one? @pitrou you mentioned prioritization on an ML thread, if you would like me to take this on, I can put it on my backlog (no performance regression might be challenging). Is there a clear list of requirements.

@pitrou
Copy link
Member Author

pitrou commented May 30, 2019

@emkornfield There are no blockers per se. There are several constraints:

  • make expansion easy (C++ doesn't help us here)
  • make inspection of a Status easy (for example test easily whether it's a Python-originated Status)
  • make instantiation of a specialized Status easy
  • not break performance in the nominal case (i.e. when the Status is ok): this is primarily a speed of construction and destruction issue

So this is a study in C++ API design.

@emkornfield
Copy link
Contributor

Is full pluggabiity necessary? I think it would be good to keep the types of errors limited. Would a status that has the following traits suffice?

  1. Origin language
  2. Arbitrary bytes to store details of origin language exception
  3. Possibly a few more type codes, for things not covered?

@pitrou
Copy link
Member Author

pitrou commented Jun 4, 2019

  1. It's not only "language", but any optional subsystem such as Gandiva, Plasma (or Flight?). Currently there's a bunch of Gandiva-specific-and Plasma-specific codes in status.h. They shouldn't be there.
  2. Hopefully we can do something more structured (even if it's just an opaque base class for subsystem-specific "details") to avoid serialization by hand
  3. Yes, perhaps we can find a comprehensive enough set of general type codes, and then have subsystem-specific subcodes or details where desired. For example, in Python you would have both the generic Arrow code (e.g. a Python ValueError gets turned into an Arrow Invalid), but the Status object would also keep a strong reference to the original Python exception.

If you have something precise in mind, perhaps best is to try and sketch an implementation, and post another PR?

@emkornfield
Copy link
Contributor

@pitrou started a sketch at: #4484 I don't think it handle #3 (strong reference to the exception, but I was imagining it could be serialized instead).

@emkornfield
Copy link
Contributor

@pitrou do you want to keep this open? I hope to have #4484 working some time soon?

@pitrou
Copy link
Member Author

pitrou commented Jun 27, 2019

No, this can be closed.

@pitrou pitrou closed this Jun 27, 2019
@pitrou pitrou deleted the ARROW-4036-status-codes-pluggable branch June 27, 2019 08:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants