Variadic Preproc

ZengJingtao edited this page Jan 30, 2023 · 2 revisions

1. Introduction

  1. The C language has variadic functions, such as printf
  2. C++11 introduced variadic templates

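Correspondingly, the C preprocessor has long supported variadic macros (the ##__VA_ARGS__ form used on this page is a GNU extension, supported by GCC and Clang, that removes the preceding comma when no variadic arguments are given). For example: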

#define MY_LOG(level, fmt, ...) \
  if (level > g_level) printf(fmt, ##__VA_ARGS__)
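
One practical caveat with MY_LOG as written is that it expands to a bare if statement, so an else that follows it in the caller binds to the macro's own if. A minimal sketch of the usual do { ... } while (0) wrapper follows. The names MY_LOG_STMT, g_emitted and log_demo are hypothetical, and the counter exists only so the effect can be observed:

```cpp
#include <cassert>
#include <cstdio>

static int g_level = 1;    // hypothetical global threshold, as in MY_LOG
static int g_emitted = 0;  // hypothetical counter, for demonstration only

// do { ... } while (0) turns the expansion into a single statement, so the
// macro is safe inside an unbraced if/else in the caller.
#define MY_LOG_STMT(level, fmt, ...) \
  do { \
    if ((level) > g_level) { \
      ++g_emitted; \
      std::printf(fmt, ##__VA_ARGS__); \
    } \
  } while (0)

// Returns how many messages were actually emitted.
inline int log_demo(bool verbose) {
  g_emitted = 0;
  if (verbose)
    MY_LOG_STMT(2, "value = %d\n", 42);
  else
    MY_LOG_STMT(2, "quiet\n");      // no variadic arguments: ## drops the comma
  MY_LOG_STMT(0, "suppressed\n");   // 0 > g_level is false, nothing is printed
  return g_emitted;
}
```

With the original single-line form, the else branch above would attach to the if hidden inside the macro rather than to if (verbose).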

2. Problems

2.1. Parameter transformation

The MY_LOG macro above simply forwards its arguments to printf. Can we transform the arguments before forwarding them? For example, for a std::string we would forward its .c_str().

#define SmartPrintf(fmt,...) some impl ...

We expect:

std::string dbname = ...;
Status status = DB::Open(dbname, ...);
if (!status.ok())
  SmartPrintf("DB::Open(%s) fail with status code = %d, msg = %s\n",
               dbname, status.code(), status.ToString());

and we want this SmartPrintf call to be equivalent to:

printf("DB::Open(%s) fail with status code = %d, msg = %s\n",
       dbname.c_str(), status.code(), status.ToString().c_str());

2.2. Default parameters

For example, the system call open is documented with two forms:

int open(const char *pathname, int flags);
int open(const char *pathname, int flags, mode_t mode);

In fact, its function prototype is:

int open(const char *pathname, int flags, ...);

The mode argument is only used when a file is created. A common mistake is to forget the third argument when creating a file, which gives the file an arbitrary mode (UB: undefined behavior).

So, can we define a macro SafeOpen that wraps open, using only what the C language offers, so that passing just two arguments is not UB?

2.3. Superpowers

For example, Enum Reflection uses the same macro arguments several times, expanding them to a different form each time.

3. Solutions

The first step is to find out where the limits of the preprocessor's capabilities lie; everything must work within those limits.

3.1. The ability to use variadic macros

#define PP_ARG_X(_0,_1,_2,_3,_4,_5,_6,_7,_8,_9, \
           a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z, \
           A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P,Q,R,S,T,U,V,W,X,Y,Z,XX,...) XX
#define PP_ARG_N(...) \
        PP_ARG_X("ignored", ##__VA_ARGS__, \
            Z,Y,X,W,V,U,T,S,R,Q,P,O,N,M,L,K,J,I,H,G,F,E,D,C,B,A, \
            z,y,x,w,v,u,t,s,r,q,p,o,n,m,l,k,j,i,h,g,f,e,d,c,b,a, \
                                            9,8,7,6,5,4,3,2,1,0)

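PP_ARG_N(...) expands to the number of arguments in the macro call, with the help of PP_ARG_X. PP_ARG_X has 63 fixed parameters (M+2, where M = 61 is the largest count supported) followed by a variable parameter list, and it expands to its last fixed parameter, XX. PP_ARG_N passes "ignored", then its own __VA_ARGS__, then a descending list of 62 count tokens. Each argument in __VA_ARGS__ shifts that list one position to the right, so when __VA_ARGS__ has length N, the token that lands on XX is the one that stands for N. Counts 0 to 9 appear as digits, counts 10 to 35 as the letters a to z, and counts 36 to 61 as A to Z, so the result is always a single token that can be pasted onto a name.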
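
The counting behavior can be checked at compile time. The sketch below repeats the two macros from above and adds a hypothetical two-level stringification helper (PP_STR, not part of the original page) to make the letter-encoded counts visible:

```cpp
#include <cassert>
#include <cstring>

#define PP_ARG_X(_0,_1,_2,_3,_4,_5,_6,_7,_8,_9, \
           a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z, \
           A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P,Q,R,S,T,U,V,W,X,Y,Z,XX,...) XX
#define PP_ARG_N(...) \
        PP_ARG_X("ignored", ##__VA_ARGS__, \
            Z,Y,X,W,V,U,T,S,R,Q,P,O,N,M,L,K,J,I,H,G,F,E,D,C,B,A, \
            z,y,x,w,v,u,t,s,r,q,p,o,n,m,l,k,j,i,h,g,f,e,d,c,b,a, \
                                            9,8,7,6,5,4,3,2,1,0)

// Hypothetical helper: expand the argument first, then stringify it.
#define PP_STR(x)   PP_STR_1(x)
#define PP_STR_1(x) #x

// Counts up to 9 expand to ordinary integer literals.
static_assert(PP_ARG_N(a) == 1, "one argument");
static_assert(PP_ARG_N(a, b, c) == 3, "three arguments");

// Ten arguments expand to the single token a, not to the number 10.
inline const char* count_token_for_ten() {
  return PP_STR(PP_ARG_N(1, 2, 3, 4, 5, 6, 7, 8, 9, 10));
}
```

The zero-argument case is left out of the checks on purpose: it relies on the GNU comma-dropping behavior of ##__VA_ARGS__, which GCC disables for a macro whose only parameter is variadic when a strict -std= mode is requested.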

Now, let's define another utility macro PP_VA_NAME:

#define PP_VA_NAME(prefix,...) \
        PP_CAT2(prefix,PP_ARG_N(__VA_ARGS__))
#define PP_CAT2(a,b)      PP_CAT2_1(a,b)
#define PP_CAT2_1(a,b)    a##b

This macro is used for dispatch. Suppose we define a series of macros or functions:

void func_0();
void func_1(int);
void func_2(int,int);
void func_3(int,int,int);
// more func_N ...
#define func(...) PP_VA_NAME(func_,__VA_ARGS__)(__VA_ARGS__)

Then:

macro call                    macro expansion
PP_VA_NAME(func_)             func_0
PP_VA_NAME(func_, a)          func_1
PP_VA_NAME(func_, a, b)       func_2
PP_VA_NAME(func_, a, b, c)    func_3

Continuing:

macro call       macro expansion (intermediate form)    macro expansion (final)
func()           PP_VA_NAME(func_)()                    func_0()
func(a)          PP_VA_NAME(func_, a)(a)                func_1(a)
func(a, b)       PP_VA_NAME(func_, a, b)(a, b)          func_2(a, b)
func(a, b, c)    PP_VA_NAME(func_, a, b, c)(a, b, c)    func_3(a, b, c)
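
The dispatch can be exercised with a small sketch. The macros are the ones from this section; the bodies of func_1, func_2 and func_3 are hypothetical (the page only declares them) and simply add their arguments so the selected overload can be told apart:

```cpp
#include <cassert>

#define PP_ARG_X(_0,_1,_2,_3,_4,_5,_6,_7,_8,_9, \
           a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z, \
           A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P,Q,R,S,T,U,V,W,X,Y,Z,XX,...) XX
#define PP_ARG_N(...) \
        PP_ARG_X("ignored", ##__VA_ARGS__, \
            Z,Y,X,W,V,U,T,S,R,Q,P,O,N,M,L,K,J,I,H,G,F,E,D,C,B,A, \
            z,y,x,w,v,u,t,s,r,q,p,o,n,m,l,k,j,i,h,g,f,e,d,c,b,a, \
                                            9,8,7,6,5,4,3,2,1,0)
#define PP_VA_NAME(prefix,...) PP_CAT2(prefix,PP_ARG_N(__VA_ARGS__))
#define PP_CAT2(a,b)      PP_CAT2_1(a,b)   // extra level so b is expanded before pasting
#define PP_CAT2_1(a,b)    a##b

// Hypothetical bodies, chosen only to make the dispatch observable.
inline int func_1(int a)               { return a; }
inline int func_2(int a, int b)        { return a + b; }
inline int func_3(int a, int b, int c) { return a + b + c; }

// One name, resolved by argument count at preprocessing time.
#define func(...) PP_VA_NAME(func_,__VA_ARGS__)(__VA_ARGS__)

inline int dispatch_demo() {
  return func(1) + func(10, 20) + func(100, 200, 300);  // func_1 + func_2 + func_3
}
```

Because the selection happens in the preprocessor, the same pattern works in plain C, where function overloading is not available.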

In this way, using only the C preprocessor, we get something like C++ overloading on the number of arguments. This directly solves the problem of passing only two arguments to open when creating a file:

// default mode = 0600
#define SafeOpen_2(pathname, flags) open(pathname, flags, 0600)
#define SafeOpen_3(pathname, flags, mode) open(pathname, flags, mode)
#define SafeOpen(...) PP_VA_NAME(SafeOpen_, __VA_ARGS__)(__VA_ARGS__)
// This is equivalent to the following C++ (shown as a comment, because the
// SafeOpen macro would otherwise expand inside the function definition):
//   inline int SafeOpen(const char* pathname, int flags, int mode = 0600) {
//     return open(pathname, flags, mode);
//   }
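
As a rough check of SafeOpen, the sketch below creates a scratch file with the two-argument form and reads back its permission bits. It assumes a POSIX system; the helper created_mode and the scratch path passed to it are hypothetical and only serve the demonstration:

```cpp
#include <cassert>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define PP_ARG_X(_0,_1,_2,_3,_4,_5,_6,_7,_8,_9, \
           a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z, \
           A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P,Q,R,S,T,U,V,W,X,Y,Z,XX,...) XX
#define PP_ARG_N(...) \
        PP_ARG_X("ignored", ##__VA_ARGS__, \
            Z,Y,X,W,V,U,T,S,R,Q,P,O,N,M,L,K,J,I,H,G,F,E,D,C,B,A, \
            z,y,x,w,v,u,t,s,r,q,p,o,n,m,l,k,j,i,h,g,f,e,d,c,b,a, \
                                            9,8,7,6,5,4,3,2,1,0)
#define PP_VA_NAME(prefix,...) PP_CAT2(prefix,PP_ARG_N(__VA_ARGS__))
#define PP_CAT2(a,b)      PP_CAT2_1(a,b)
#define PP_CAT2_1(a,b)    a##b

#define SafeOpen_2(pathname, flags)       open(pathname, flags, 0600)
#define SafeOpen_3(pathname, flags, mode) open(pathname, flags, mode)
#define SafeOpen(...) PP_VA_NAME(SafeOpen_, __VA_ARGS__)(__VA_ARGS__)

// Creates `path` with the two-argument form and returns its permission bits,
// or -1 if the file could not be created or inspected.
inline int created_mode(const char* path) {
  unlink(path);                    // ignore failure: the file may not exist yet
  mode_t old_mask = umask(0);      // make the result independent of the caller's umask
  int fd = SafeOpen(path, O_CREAT | O_EXCL | O_WRONLY);  // expands to open(..., 0600)
  umask(old_mask);
  if (fd < 0) return -1;
  struct stat st;
  int rc = fstat(fd, &st);
  close(fd);
  unlink(path);                    // clean up the scratch file
  return rc == 0 ? static_cast<int>(st.st_mode & 0777) : -1;
}
```

Calling created_mode on a writable path should report 0600, the default supplied by SafeOpen_2.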

3.2. Parameter transformation

Now, let's implement a printf that is more convenient to use in C++. First, we implement a parameter transformation macro:

#define PP_MAP_0(m,c)
#define PP_MAP_1(m,c,x)     m(c,x)
#define PP_MAP_2(m,c,x,y)   m(c,x),m(c,y)
#define PP_MAP_3(m,c,x,y,z) m(c,x),m(c,y),m(c,z)
#define PP_MAP_4(m,c,x,...) m(c,x),PP_MAP_3(m,c,__VA_ARGS__)
#define PP_MAP_5(m,c,x,...) m(c,x),PP_MAP_4(m,c,__VA_ARGS__)
// more PP_MAP_...

#define PP_MAP(map,ctx,...) \
        PP_VA_NAME(PP_MAP_,__VA_ARGS__)(map,ctx,##__VA_ARGS__)

PP_MAP transforms each macro argument x into m(c,x). For example, given a function:

int map(void* context,int);

the expansions are:

macro call                   macro expansion
PP_MAP(map, ctx, a)          map(ctx, a)
PP_MAP(map, ctx, a, b)       map(ctx, a), map(ctx, b)
PP_MAP(map, ctx, a, b, c)    map(ctx, a), map(ctx, b), map(ctx, c)
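
A small sketch of PP_MAP in use, building an array initializer. The macros are the ones from this page; the function scale and the helper scaled_sum are hypothetical and exist only for this illustration:

```cpp
#include <cassert>

#define PP_ARG_X(_0,_1,_2,_3,_4,_5,_6,_7,_8,_9, \
           a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z, \
           A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P,Q,R,S,T,U,V,W,X,Y,Z,XX,...) XX
#define PP_ARG_N(...) \
        PP_ARG_X("ignored", ##__VA_ARGS__, \
            Z,Y,X,W,V,U,T,S,R,Q,P,O,N,M,L,K,J,I,H,G,F,E,D,C,B,A, \
            z,y,x,w,v,u,t,s,r,q,p,o,n,m,l,k,j,i,h,g,f,e,d,c,b,a, \
                                            9,8,7,6,5,4,3,2,1,0)
#define PP_VA_NAME(prefix,...) PP_CAT2(prefix,PP_ARG_N(__VA_ARGS__))
#define PP_CAT2(a,b)      PP_CAT2_1(a,b)
#define PP_CAT2_1(a,b)    a##b

#define PP_MAP_1(m,c,x)     m(c,x)
#define PP_MAP_2(m,c,x,y)   m(c,x),m(c,y)
#define PP_MAP_3(m,c,x,y,z) m(c,x),m(c,y),m(c,z)
#define PP_MAP_4(m,c,x,...) m(c,x),PP_MAP_3(m,c,__VA_ARGS__)
#define PP_MAP_5(m,c,x,...) m(c,x),PP_MAP_4(m,c,__VA_ARGS__)
#define PP_MAP(map,ctx,...) \
        PP_VA_NAME(PP_MAP_,__VA_ARGS__)(map,ctx,##__VA_ARGS__)

// Hypothetical mapping function: the context is a scale factor.
constexpr int scale(int factor, int x) { return factor * x; }

inline int scaled_sum() {
  // Expands to { scale(2,1), scale(2,2), scale(2,3), scale(2,4), scale(2,5) }
  const int v[] = { PP_MAP(scale, 2, 1, 2, 3, 4, 5) };
  int sum = 0;
  for (int x : v) sum += x;
  return sum;
}
```

Here m can be a function, as in this sketch, or another function-like macro, which is what the SmartPrintf section relies on.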

Now we can implement SmartPrintf:

3.3. SmartPrintf (must use C++)

template<class T>
inline typename std::enable_if<std::is_fundamental<T>::value, T>::type
SmartData(T x) { return x; }

template<class Seq>
inline auto
SmartData(const Seq& s) -> decltype(s.data()) { return s.data(); }

template<class StdException>
inline auto
SmartData(const StdException& e) -> decltype(e.what()) { return e.what(); }

template<class T>
inline const T* SmartData(const T* x) { return x; }

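// PP_APPLY is not defined on this page; it is assumed here to apply func to x
#define PP_APPLY(func, x) func(x)

#define PP_SmartList(...) \
    PP_MAP(PP_APPLY, SmartData, __VA_ARGS__)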

#define SmartPrintf(fmt, ...) printf(fmt, PP_SmartList(__VA_ARGS__))

macro call                                   macro expansion
SmartPrintf("str=%s\n", str)                 printf("str=%s\n", SmartData(str))
SmartPrintf("str=%s, num=%d\n", str, num)    printf("str=%s, num=%d\n", SmartData(str), SmartData(num))
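
To see the pieces working together, here is a self-contained sketch that formats into a buffer instead of writing to stdout, so the result can be inspected. The SmartData overloads and PP_* macros are taken from this page; PP_APPLY is assumed to simply apply its first argument to its second, and SmartSnprintf and format_demo are hypothetical names introduced for this illustration:

```cpp
#include <cassert>
#include <cstdio>
#include <stdexcept>
#include <string>
#include <type_traits>

#define PP_ARG_X(_0,_1,_2,_3,_4,_5,_6,_7,_8,_9, \
           a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z, \
           A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P,Q,R,S,T,U,V,W,X,Y,Z,XX,...) XX
#define PP_ARG_N(...) \
        PP_ARG_X("ignored", ##__VA_ARGS__, \
            Z,Y,X,W,V,U,T,S,R,Q,P,O,N,M,L,K,J,I,H,G,F,E,D,C,B,A, \
            z,y,x,w,v,u,t,s,r,q,p,o,n,m,l,k,j,i,h,g,f,e,d,c,b,a, \
                                            9,8,7,6,5,4,3,2,1,0)
#define PP_VA_NAME(prefix,...) PP_CAT2(prefix,PP_ARG_N(__VA_ARGS__))
#define PP_CAT2(a,b)      PP_CAT2_1(a,b)
#define PP_CAT2_1(a,b)    a##b
#define PP_MAP_1(m,c,x)     m(c,x)
#define PP_MAP_2(m,c,x,y)   m(c,x),m(c,y)
#define PP_MAP_3(m,c,x,y,z) m(c,x),m(c,y),m(c,z)
#define PP_MAP(map,ctx,...) \
        PP_VA_NAME(PP_MAP_,__VA_ARGS__)(map,ctx,##__VA_ARGS__)
#define PP_APPLY(func, x) func(x)  // assumption: not defined on the original page

// Fundamental types (int, double, ...) pass through unchanged.
template<class T>
inline typename std::enable_if<std::is_fundamental<T>::value, T>::type
SmartData(T x) { return x; }

// Anything with .data(), such as std::string, yields its data pointer.
template<class Seq>
inline auto SmartData(const Seq& s) -> decltype(s.data()) { return s.data(); }

// Anything with .what(), such as std::exception, yields its message.
template<class StdException>
inline auto SmartData(const StdException& e) -> decltype(e.what()) { return e.what(); }

// Raw pointers, including C strings, pass through unchanged.
template<class T>
inline const T* SmartData(const T* x) { return x; }

#define PP_SmartList(...) PP_MAP(PP_APPLY, SmartData, __VA_ARGS__)
// Hypothetical buffer-writing sibling of SmartPrintf.
#define SmartSnprintf(buf, size, fmt, ...) \
        std::snprintf(buf, size, fmt, PP_SmartList(__VA_ARGS__))

inline std::string format_demo() {
  std::string name = "db1";
  int code = 7;
  std::runtime_error err("disk full");
  char buf[64];
  SmartSnprintf(buf, sizeof buf, "%s:%d:%s", name, code, err);
  return buf;
}
```

Note that both this sketch and SmartPrintf as defined above need at least one argument after the format string; with none, the expansion leaves a trailing comma in the call.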

4. Use C++ to play tricks

The implementation of ToplingDB Enum Reflection uses this series of techniques.
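
To give a flavor of the idea, here is a much-reduced sketch of enum reflection built on PP_MAP. It is not ToplingDB's actual implementation; DEFINE_ENUM, PP_ENUM_STR and the generated Color_name function are hypothetical names, and the sketch assumes the enumerators keep their default consecutive values starting at 0:

```cpp
#include <cassert>
#include <cstring>

#define PP_ARG_X(_0,_1,_2,_3,_4,_5,_6,_7,_8,_9, \
           a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z, \
           A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P,Q,R,S,T,U,V,W,X,Y,Z,XX,...) XX
#define PP_ARG_N(...) \
        PP_ARG_X("ignored", ##__VA_ARGS__, \
            Z,Y,X,W,V,U,T,S,R,Q,P,O,N,M,L,K,J,I,H,G,F,E,D,C,B,A, \
            z,y,x,w,v,u,t,s,r,q,p,o,n,m,l,k,j,i,h,g,f,e,d,c,b,a, \
                                            9,8,7,6,5,4,3,2,1,0)
#define PP_VA_NAME(prefix,...) PP_CAT2(prefix,PP_ARG_N(__VA_ARGS__))
#define PP_CAT2(a,b)      PP_CAT2_1(a,b)
#define PP_CAT2_1(a,b)    a##b
#define PP_MAP_1(m,c,x)     m(c,x)
#define PP_MAP_2(m,c,x,y)   m(c,x),m(c,y)
#define PP_MAP_3(m,c,x,y,z) m(c,x),m(c,y),m(c,z)
#define PP_MAP_4(m,c,x,...) m(c,x),PP_MAP_3(m,c,__VA_ARGS__)
#define PP_MAP(map,ctx,...) \
        PP_VA_NAME(PP_MAP_,__VA_ARGS__)(map,ctx,##__VA_ARGS__)

// Hypothetical mapper: stringify one enumerator; the context is unused.
#define PP_ENUM_STR(ctx, x) #x

// The enumerator list is used twice: once as enumerators, once as strings.
#define DEFINE_ENUM(Name, ...) \
  enum Name { __VA_ARGS__ }; \
  inline const char* Name##_name(Name v) { \
    static const char* const names[] = { PP_MAP(PP_ENUM_STR, 0, __VA_ARGS__) }; \
    return names[static_cast<int>(v)]; \
  }

DEFINE_ENUM(Color, Red, Green, Blue)
```

After this definition, Color_name(Green) returns the string "Green" without the names having been written out a second time by hand.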

Topling-zip also makes good use of [these tricks](https://github.com/topling/topling-zip/blob/7c574f1bb8a7d9f62283034e4697ef419a0d0e82/src/terark/util/function.hpp#L578). For example, we can use it like this:

struct MyData {
  string str;
  int num;
  double score;
  // more ...
};
vector<MyData> vec;
// read data to vec
auto beg = vec.begin(), end = vec.end();
sort(beg, end, TERARK_CMP(str, <, num, >, score, >));

This code is very intuitive. It sorts vec by the following rules:

  1. First by str, in ascending lexicographical order
  2. If the str fields are equal, by num in descending order
  3. If both str and num are equal, by score in descending order
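
For comparison, the same ordering can be written by hand. The sketch below is not the expansion of TERARK_CMP, just an equivalent comparator under the rules listed above; my_data_less and sort_my_data are hypothetical names:

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <tuple>
#include <vector>

struct MyData {
  std::string str;
  int num;
  double score;
};

// str ascending, then num descending, then score descending.
// Swapping a and b for num and score turns "<" into a descending order
// for those two fields.
inline bool my_data_less(const MyData& a, const MyData& b) {
  return std::tie(a.str, b.num, b.score) < std::tie(b.str, a.num, a.score);
}

inline void sort_my_data(std::vector<MyData>& vec) {
  std::sort(vec.begin(), vec.end(), my_data_less);
}
```

The macro form states each field and its direction once, whereas the hand-written version needs the reader to notice which fields have a and b swapped.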