
Understanding UDF Holder Scalar Replacement

Paul Rogers edited this page Jan 14, 2018 · 7 revisions

Why Scalar Replacement?

Drill uses holder objects to pass values around the generated code and to pass data into and out of functions (both Drill built-in functions and UDFs). Holders require Java to create a new instance of each holder for each data row. While Java is very efficient at this operation, the original Drill designers thought they could do even better by rewriting byte codes to replace the holders with simple Java variables (so-called "scalar replacement").

Since Java 8, the Java JIT compiler performs scalar replacement quite effectively. However, Drill was designed for (and still requires) Java 7, and so Drill implements its own, ad-hoc version of scalar replacement. The mechanism takes the compiled generated code as input, then rewrites the Java byte codes to replace the holder objects with simple Java primitives. Here is a highly simplified example.

Example - Before Scalar Replacement

Suppose we start with the example we used earlier. There, we relied on a useful fiction: that the generated code gets and sets holders. In reality, although such methods exist in Drill, the generated code does not use them. Also, Drill nullable vectors default to NULL, so we need to set a value only if it is not NULL. Let's rewrite the example to be closer to the real generated code:

public void eval(int rowIndex) {

  NullableBigIntHolder bHolder = new NullableBigIntHolder();
  NullableBigIntHolder cHolder = new NullableBigIntHolder();
  NullableBigIntHolder dHolder = new NullableBigIntHolder();

  bHolder.isSet = bVector.getAccessor().isSet(rowIndex);
  cHolder.isSet = cVector.getAccessor().isSet(rowIndex);
  if (bHolder.isSet == 0 || cHolder.isSet == 0) {
    dHolder.isSet = 0;
  } else {
    bHolder.value = bVector.getAccessor().get(rowIndex);
    cHolder.value = cVector.getAccessor().get(rowIndex);
    NullableBigIntHolder in1 = bHolder;
    NullableBigIntHolder in2 = cHolder;
    NullableBigIntHolder out = dHolder;
    { // Inlined BigIntBigIntAdd
      out.value = (long) (in1.value + in2.value);
    }
    dHolder.isSet = 1;
    dHolder.value = out.value;
    dVector.getMutator().set(rowIndex, dHolder.value);
  }
}

Example - After Scalar Replacement

Drill will rewrite the byte codes to be similar to the following:

public void eval(int rowIndex) {

  long bHolder_value;
  int bHolder_isSet;
  long cHolder_value;
  int cHolder_isSet;
  long dHolder_value;
  int dHolder_isSet;

  bHolder_isSet = bVector.getAccessor().isSet(rowIndex);
  cHolder_isSet = cVector.getAccessor().isSet(rowIndex);
  if (bHolder_isSet == 0 || cHolder_isSet == 0) {
    dHolder_isSet = 0;
  } else {
    bHolder_value = bVector.getAccessor().get(rowIndex);
    cHolder_value = cVector.getAccessor().get(rowIndex);
    long in1_value = bHolder_value;
    long in2_value = cHolder_value;
    long out_value;
    { // Inlined BigIntBigIntAdd
      out_value = (long) (in1_value + in2_value);
    }
    dHolder_value = out_value;
    dHolder_isSet = 1;
    dVector.getMutator().set(rowIndex, dHolder_value);
  }
}

The code shown above has quite a bit of redundancy: values are copied from one set of variables to another. The Java JIT optimizer will detect and remove this redundancy.
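As a rough illustration of what is left once the redundant copies are gone, the method might reduce to something like the sketch below. This is hand-written, not Drill or JIT output. The class name ReducedEvalSketch and its arrays are hypothetical stand-ins for the b, c and d vectors and their accessor and mutator calls.

```java
// Hand-written sketch; not produced by Drill. Plain arrays stand in for the
// b, c and d vectors, and the *IsSet arrays stand in for their null flags.
final class ReducedEvalSketch {
  private final long[] bValues;
  private final int[] bIsSet;
  private final long[] cValues;
  private final int[] cIsSet;
  final long[] dValues;
  final int[] dIsSet;

  ReducedEvalSketch(long[] bValues, int[] bIsSet, long[] cValues, int[] cIsSet) {
    this.bValues = bValues;
    this.bIsSet = bIsSet;
    this.cValues = cValues;
    this.cIsSet = cIsSet;
    this.dValues = new long[bValues.length];
    // New int arrays are zero-filled, so every output row starts out as NULL.
    this.dIsSet = new int[bValues.length];
  }

  void eval(int rowIndex) {
    // The output already defaults to NULL, so a NULL input needs no work.
    if (bIsSet[rowIndex] == 0 || cIsSet[rowIndex] == 0) {
      return;
    }
    // No holders and no intermediate copies: read, add, write.
    dValues[rowIndex] = bValues[rowIndex] + cValues[rowIndex];
    dIsSet[rowIndex] = 1;
  }

  public static void main(String[] args) {
    ReducedEvalSketch sketch = new ReducedEvalSketch(
        new long[] {1, 2}, new int[] {1, 0},
        new long[] {10, 20}, new int[] {1, 1});
    sketch.eval(0); // both inputs set
    sketch.eval(1); // b is NULL, so d stays NULL
    System.out.println(sketch.dIsSet[0] + " " + sketch.dValues[0]);
    System.out.println(sketch.dIsSet[1] + " " + sketch.dValues[1]);
  }
}
```

The point is only that the per-row work comes down to two null checks, one addition and one write, with no per-row object allocation.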

Lessons for UDF Writers

Note that you can never see the above code: no source code exists, because the work is done at the byte-code level. This also partly explains why you can't step through your UDF code in the debugger: the code that Drill executes is two steps removed from the code that you wrote. (Step 1: your code is copied into Drill's generated code. Step 2: the compiled code is rewritten to do scalar replacement.)

The above example shows why we have the two rules for holders (don't call methods on them, don't pass them to other functions). Breaking either rule prevents Drill from performing scalar replacement. (The mechanism detects such references and skips the replacement when it finds them.)
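The two rules can be pictured with a pair of simplified functions. This is only a sketch of the coding patterns: BigIntHolderSketch is a hypothetical stand-in for a Drill holder, the classes carry none of Drill's UDF annotations, and nothing here invokes Drill's byte-code rewriter.

```java
// Hypothetical stand-in for a Drill holder: nothing but a public field.
final class BigIntHolderSketch {
  public long value;
}

// Follows the rules: holders are touched only through direct field access,
// so each holder field could in principle be replaced by a local variable.
final class AddFollowsRules {
  final BigIntHolderSketch in1 = new BigIntHolderSketch();
  final BigIntHolderSketch in2 = new BigIntHolderSketch();
  final BigIntHolderSketch out = new BigIntHolderSketch();

  void eval() {
    // Passing the primitive field values to a helper is fine; the holders stay put.
    out.value = sum(in1.value, in2.value);
  }

  static long sum(long a, long b) {
    return a + b;
  }
}

// Breaks the rules: the holder objects themselves are handed to another
// method, so they can no longer be treated as a bundle of local variables.
final class AddBreaksRules {
  final BigIntHolderSketch in1 = new BigIntHolderSketch();
  final BigIntHolderSketch in2 = new BigIntHolderSketch();
  final BigIntHolderSketch out = new BigIntHolderSketch();

  void eval() {
    out.value = sum(in1, in2); // holders escape into sum()
  }

  static long sum(BigIntHolderSketch a, BigIntHolderSketch b) {
    return a.value + b.value;
  }
}
```

Both versions compute the same result. The difference is that the second one needs the holders to exist as real objects, which is the kind of reference that causes the replacement to be skipped.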
