-
Notifications
You must be signed in to change notification settings - Fork 519
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[logging] simplify and optimize CHECK_** macro #619
Conversation
include/dmlc/logging.h
Outdated
if (x op y) return LogCheckError(); \ | ||
return LogCheckError("Error."); \ | ||
inline bool LogCheck##name(const X& x, const Y& y) { \ | ||
return (x op y); \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shall we always inline this check? given that this is now small
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. Is it correct that dmlc-core needs to support pre cpp11? In cpp11 it should be easy to put the condition into the macro itself, and guard just that condition with ignored "-Wsign-compare".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We now requires cpp11. However, we do need to make sure that the code is compiler agnostic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
putting it directly to macro appears trickier than i thought, because
- macro doesn't expand to complete statement, so that one can write CHECK_GE(a, b) << "bad thing";
- we probably want to keep
GCC diagnostic ignored "-Wsign-compare"
around condition itself only.
One option could be to create a temporary variable, so the code in CHECK_BINARY_OP changes to something like this:
-if (!(dmlc::LogCheck##name(x, y)))
+_Pragma("GCC diagnostic push");
+_Pragma("GCC diagnostic ignored "-Wsign-compare"");
+ auto some_unique_name##__LINE__ = !((x) op (y));
+_Pragma("GCC diagnostic pop");
+if (some_unique_name##__LINE__)
This doesn't look nice, + name collisions are still possible; __COUNTER__
is not part of the standard.
Another option is to define always_inline attribute, similar to how it's done for noinline here:
Lines 239 to 243 in dc0156a
#if defined(_MSC_VER) | |
#define DMLC_NO_INLINE __declspec(noinline) | |
#else | |
#define DMLC_NO_INLINE __attribute__((noinline)) | |
#endif |
and mark the function with that attribute.
Which would you prefer?
Any other options I'm missing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let us go with always inline then
1) avoid expensive function call, but keep LogCheck_ call itself. This one will be inlined by sensible compiler, and it's easier to wrap the comparison in -Wsign-compare pragmas in case of a separately defined function. 2) move full message construction to the macro definition 3) get rid of LogCheckError class entirely 4) remove the preprocessor branch for _LIBCPP_SGX_NO_IOSTREAMS, as nothing was going to be printed anywhere anyway in that case, it would all go to DummyOStream. 5) performance win due to no call. (25% of data loading time for xgboost).
Thanks @okuvshynov ! |
This is an interesting finding, thank you @okuvshynov! So the root of the problem is that |
https://github.com/dmlc/xgboost/blob/91c646392db01714a4f69c33170fb0857b95664d/src/data/data.cc#L653-L658
This CHECK_GE is expensive and makes xgboost data loading ~15% slower. Call to LogCheck_GE was not inlined by GCC, but forcing inlining it would be also problematic, as the function was big. This change makes is more efficient and simplifies the code at the same time. Please let me know if I'm missing some cases affected by this simplification (if any).
Brief summary:
nothing was going to be printed anywhere anyway in that case, it would all go to DummyOStream;
Data loading time, higgs dataset, 10 attempts. Both were based on most recent dmlc-core.
Also see dmlc/xgboost#5722