Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[logging] simplify and optimize CHECK_** macro #619

Merged
merged 1 commit into from
May 29, 2020
Merged

[logging] simplify and optimize CHECK_** macro #619

merged 1 commit into from
May 29, 2020

Conversation

okuvshynov
Copy link
Contributor

https://github.com/dmlc/xgboost/blob/91c646392db01714a4f69c33170fb0857b95664d/src/data/data.cc#L653-L658

This CHECK_GE is expensive and makes xgboost data loading ~15% slower. Call to LogCheck_GE was not inlined by GCC, but forcing inlining it would be also problematic, as the function was big. This change makes is more efficient and simplifies the code at the same time. Please let me know if I'm missing some cases affected by this simplification (if any).

Brief summary:

  1. avoid expensive function call. Keep LogCheck_ function itself, but limit its scope to comparison only, this one will be inlined by sensible compiler. It's still easier to keep the function rather than just do comparison in the macro itself, because of -Wsign-compare pragmas;
  2. move full message construction to the macro definition;
  3. get rid of LogCheckError class entirely;
  4. remove the preprocessor branch for _LIBCPP_SGX_NO_IOSTREAMS, as
    nothing was going to be printed anywhere anyway in that case, it would all go to DummyOStream;
  5. add a test case for signed/unsigned comparison.

Data loading time, higgs dataset, 10 attempts. Both were based on most recent dmlc-core.

base new
9.55223 7.65502
9.34264 7.8101
9.15207 7.7555
8.99823 8.23572
9.20671 7.85532
9.16312 8.95983
9.27071 8.81042
9.27828 8.26593
9.39034 7.90145
9.61453 8.06095

Also see dmlc/xgboost#5722

@tqchen
Copy link
Member

tqchen commented May 29, 2020

cc @junrushao1994

if (x op y) return LogCheckError(); \
return LogCheckError("Error."); \
inline bool LogCheck##name(const X& x, const Y& y) { \
return (x op y); \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall we always inline this check? given that this is now small

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. Is it correct that dmlc-core needs to support pre cpp11? In cpp11 it should be easy to put the condition into the macro itself, and guard just that condition with ignored "-Wsign-compare".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We now requires cpp11. However, we do need to make sure that the code is compiler agnostic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

putting it directly to macro appears trickier than i thought, because

  1. macro doesn't expand to complete statement, so that one can write CHECK_GE(a, b) << "bad thing";
  2. we probably want to keep GCC diagnostic ignored "-Wsign-compare" around condition itself only.

One option could be to create a temporary variable, so the code in CHECK_BINARY_OP changes to something like this:

-if (!(dmlc::LogCheck##name(x, y))) 
+_Pragma("GCC diagnostic push");
+_Pragma("GCC diagnostic ignored "-Wsign-compare"");
+ auto some_unique_name##__LINE__ = !((x) op (y));
+_Pragma("GCC diagnostic pop");
+if (some_unique_name##__LINE__) 

This doesn't look nice, + name collisions are still possible; __COUNTER__ is not part of the standard.

Another option is to define always_inline attribute, similar to how it's done for noinline here:

#if defined(_MSC_VER)
#define DMLC_NO_INLINE __declspec(noinline)
#else
#define DMLC_NO_INLINE __attribute__((noinline))
#endif

and mark the function with that attribute.

Which would you prefer?
Any other options I'm missing?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let us go with always inline then

1) avoid expensive function call, but keep LogCheck_ call itself. This one will be inlined by sensible compiler, and it's easier to wrap the comparison in -Wsign-compare pragmas in case of a separately defined function.
2) move full message construction to the macro definition
3) get rid of LogCheckError class entirely
4) remove the preprocessor branch for _LIBCPP_SGX_NO_IOSTREAMS, as
nothing was going to be printed anywhere anyway in that case, it would all go to DummyOStream.
5) performance win due to no call. (25% of data loading time for xgboost).
@tqchen tqchen merged commit ae98c55 into dmlc:master May 29, 2020
@tqchen
Copy link
Member

tqchen commented May 29, 2020

Thanks @okuvshynov !

@junrushao
Copy link
Member

This is an interesting finding, thank you @okuvshynov!

So the root of the problem is that LogCheck_GE didn't get inlined, probably because they are large, and the solution is to change LogCheckXX from class to a simplest function and construct the error messages manually. It makes sense to me, but this is also a bit interesting that GCC doesn't choose to inline those classes...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants