LLVM JIT interface #2277

Merged
merged 122 commits into from
May 10, 2018

Contributor

@pyos pyos commented Apr 24, 2018

I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

This pull request adds the capability to compile built-in functions with numeric (possibly nullable) arguments and return type to native code through llvm::IRBuilder. Compilable functions are automatically inlined into each other for a performance boost. The function can then also decide where to evaluate each of its inlined arguments, allowing for some laziness (non-compilable subexpressions still have to be evaluated eagerly).

This PR also includes implementations of the interface for most arithmetic and logic functions. Don't have any performance comparisons yet, though.

Known problems:

  • this code will break with LLVM 7 when it's released due to API incompatibilities;
  • LLVM has a ton of unused parameters in its header files, so I had to add -Wno-unused-parameter to clickhouse_functions;
  • always inlining compilable subexpressions completely undoes all of common subexpression eliminator's work — using a heuristic, e.g. thresholding based on the size of the subexpression's graph multiplied by the number of times it is used, might be better overall;
  • the `and` function has somewhat weird semantics in that `and(false, null)` is null. This means `and` cannot be lazy in its second argument if that argument is nullable. (`and(x, non-nullable)` and `and(null, x)` work fine, though.)
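
To make the last point concrete, here is a minimal sketch in plain C++ (no LLVM) of when the second argument can be skipped. It assumes the result is NULL whenever either argument is NULL, which is consistent with `and(false, null)` being null but is otherwise a guess. `NullableBool`, `LazyArgument` and `lazyAnd` are hypothetical names, not part of this PR.

```cpp
#include <functional>
#include <optional>

using NullableBool = std::optional<bool>;  // std::nullopt stands for NULL

// Hypothetical description of an argument that has not been evaluated yet.
struct LazyArgument
{
    bool is_nullable;                        // known from the type, before evaluation
    std::function<NullableBool()> evaluate;  // computes the value on demand
};

// Assumed semantics: the result is NULL if either argument is NULL.
NullableBool lazyAnd(NullableBool first, const LazyArgument & second)
{
    if (!first)
        return std::nullopt;  // and(null, x): NULL whatever x is, so x is skipped
    if (!*first && !second.is_nullable)
        return false;         // and(false, non-nullable): false, x is skipped
    // Either first is true, or second could be NULL and turn the result into NULL.
    NullableBool value = second.evaluate();
    if (!value)
        return std::nullopt;  // covers and(false, null) == null
    return *first && *value;
}
```

With a nullable second argument, `lazyAnd(false, ...)` still has to call `evaluate`, which is exactly the case where laziness is lost.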

pyos added 15 commits April 25, 2018 13:37
Not actually implemented, though. It does print out some jit-compiled stuff,
but that's about it. For example, this query:

    select number from system.numbers where something(cast(number as Float64)) == 4

results in this on server's stderr:

    define double @"something(CAST(number, 'Float64'))"(void**, i8*, void*) {
    "something(CAST(number, 'Float64'))":
      ret double 1.234500e+04
    }

(and an exception, because that's what the non-jitted method does.)

As one may notice, this function neither reads the input (first argument;
tuple of arrays) nor writes the output (third argument; array), instead
returning some general nonsense.

In addition, `#if USE_EMBEDDED_COMPILER` doesn't work for some reason;
including LLVM headers requires -Wno-unused-parameter; this probably only
works on LLVM 5.0 due to rampant API instability; and I'm definitely
no expert on CMake. In short, there's still a long way to go.

The example from the previous commit doesn't need a cast to Float64 anymore.
It actually seems to work, as long as you only have one row, that is. E.g.

    > select something(cast(number + 6 as Float64), cast(number + 2 as Float64)) from system.numbers limit 1;
    8

with this IR:

    define void @"something(CAST(plus(number, 6), 'Float64'), CAST(plus(number, 2), 'Float64'))"(void**, i8*, double*) {
    entry:
      %3 = load void*, void** %0
      %4 = bitcast void* %3 to double*
      %5 = load double, double* %4
      %6 = getelementptr void*, void** %0, i32 1
      %7 = load void*, void** %6
      %8 = bitcast void* %7 to double*
      %9 = load double, double* %8
      %10 = fadd double %5, %9
      store double %10, double* %2
      ret void
    }
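
For readers less used to LLVM IR, the function above corresponds roughly to the following C++. This is a hand-written reading of the IR, not code from the PR: `somethingFirstRow` mirrors the IR instruction by instruction, while `somethingAllRows` is a hypothetical extension (the `rows` parameter is an assumption) showing what handling more than one row would require.

```cpp
#include <cstddef>

// Same parameter shape as the IR: (void**, i8*, double*).
// The second parameter is not used by the IR above, so it is left unnamed.
void somethingFirstRow(void ** columns, char *, double * result)
{
    const double * lhs = static_cast<const double *>(columns[0]);  // %3, %4
    const double * rhs = static_cast<const double *>(columns[1]);  // %6, %7, %8
    *result = *lhs + *rhs;  // load %5 and %9, fadd, store into %2
}

// Hypothetical multi-row variant: the same addition applied to every row.
void somethingAllRows(void ** columns, std::size_t rows, double * result)
{
    const double * lhs = static_cast<const double *>(columns[0]);
    const double * rhs = static_cast<const double *>(columns[1]);
    for (std::size_t i = 0; i < rows; ++i)
        result[i] = lhs[i] + rhs[i];
}
```

Only the first element of each input is read and only one result is written, which is why the query above works for a single row.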

I honestly can't tell if they work. LLVM has surprisingly bad API documentation.

Given that the list of supported types is hardcoded in
LLVMContext::Data::toNativeType, this method is redundant because
LLVMPreparedFunction can create a ColumnVector itself.

@pyos pyos force-pushed the llvm-jit branch 2 times, most recently from 8b900cc to 888d97c Compare April 25, 2018 11:23
std::vector<bool> redundant(actions.size());
// an empty optional is a poisoned value prohibiting the column's producer from being removed
// (which it could be, if it was inlined into every dependent function).
std::unordered_map<std::string, std::unordered_set<std::optional<size_t>>> current_dependents;
Member

It's better to use `struct { bool is_used; std::unordered_set<size_t> dependents; }` instead of `std::unordered_set<std::optional<size_t>>`.
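
A minimal sketch of that suggestion, assuming `is_used` takes over the role of the "poisoned" empty optional from the reviewed code. The names `ColumnDependents`, `DependentsMap`, `addDependent`, `markUsedDirectly` and `producerMayBeRemoved` are hypothetical, and the removal rule is simplified.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <unordered_set>

struct ColumnDependents
{
    bool is_used = false;                        // true: the producer must be kept
    std::unordered_set<std::size_t> dependents;  // indices of actions reading the column
};

using DependentsMap = std::unordered_map<std::string, ColumnDependents>;

// Records that action number `action` reads `column`.
void addDependent(DependentsMap & map, const std::string & column, std::size_t action)
{
    map[column].dependents.insert(action);
}

// Records a use that cannot be replaced by inlining the producer.
void markUsedDirectly(DependentsMap & map, const std::string & column)
{
    map[column].is_used = true;
}

// Simplified rule, assuming every recorded dependent inlined the producer.
bool producerMayBeRemoved(const ColumnDependents & column)
{
    return !column.is_used;
}
```

The explicit flag keeps the set element type a plain index, so iterating over dependents needs no check for the special empty value.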


if (MAKE_STATIC_LIBRARIES)
# fix strange static error: undefined reference to 'std::error_category::~error_category()'
target_link_libraries(clickhouse-compiler-lib PUBLIC stdc++)
Contributor

@proller proller Apr 25, 2018

This fix is needed only for old distributions (Ubuntu trusty, xenial); on Ubuntu artful everything is OK.
Also, this fix adds a dependency on a shared system library, which is unacceptable for our fully static package.

# TODO: global-disable no-unused-parameter
set_source_files_properties(src/Interpreters/ExpressionJIT.cpp PROPERTIES COMPILE_FLAGS "-Wno-unused-parameter -Wno-non-virtual-dtor")
else ()
list (REMOVE_ITEM dbms_sources src/Interpreters/ExpressionJIT.cpp)
Contributor

Use this in those files instead:

#include <Common/config.h>
#if USE_EMBEDDED_COMPILER
...
#endif

template <typename T>
static bool typeIsA(const DataTypePtr & type)
{
if (auto * nullable = typeid_cast<const DataTypeNullable *>(type.get()))
Member

You can use removeNullable(type)


static MutableColumnPtr createNonNullableColumn(const DataTypePtr & type)
{
if (auto * nullable = typeid_cast<const DataTypeNullable *>(type.get()))
Member

removeNullable(type)->createColumn()


void LLVMPreparedFunction::executeImpl(Block & block, const ColumnNumbers & arguments, size_t result)
{
size_t block_size = 0;
Member

block.rows()?

/// assume the column is a `ColumnVector<T>`. there's probably no good way to actually
/// check that at runtime, so let's just hope it's always true for columns containing types
/// for which `LLVMContext::Data::toNativeType` returns non-null.
columns[i] = column->getDataAt(0).data;
Member

It's very dangerous. What you can do is:

  1. Add IColumn::isColumnVector() and StringRef IColumn::getData() implemented for columns which store data in a single contiguous memory segment.
    or
  2. Use TypeListNumbers::forEach and check typeid for each ColumnVector
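
The first option could look roughly like the sketch below: the column reports whether its data is one contiguous buffer, and the caller checks that before handing a raw pointer to compiled code. These are hypothetical stand-in classes written for illustration, not ClickHouse's `IColumn` hierarchy, and all names in the block are assumptions.

```cpp
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a column interface.
struct ColumnSketch
{
    virtual ~ColumnSketch() = default;
    virtual bool isContiguous() const { return false; }
    virtual const void * rawData() const { throw std::logic_error("column is not contiguous"); }
};

// Numeric column: values live in one contiguous buffer.
template <typename T>
struct VectorColumnSketch : ColumnSketch
{
    std::vector<T> values;
    explicit VectorColumnSketch(std::vector<T> v) : values(std::move(v)) {}
    bool isContiguous() const override { return true; }
    const void * rawData() const override { return values.data(); }
};

// String column: elements have varying sizes, so no raw access is offered.
struct StringColumnSketch : ColumnSketch
{
    std::vector<std::string> values;
};

// Checked replacement for "just hope it is a vector column".
const void * dataForCompiledCode(const ColumnSketch & column)
{
    if (!column.isContiguous())
        throw std::invalid_argument("cannot pass a non-contiguous column to compiled code");
    return column.rawData();
}
```

A wrong column type then fails with an exception at the call site instead of being read as raw memory.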

Contributor Author

@pyos pyos Apr 25, 2018

IColumn::isFixedAndContiguous seems to be what I wanted here. The compiled loop could be extended to arbitrary columns for which this method returns true by passing a tuple of strides (e.g. string length for ColumnFixedString) instead of a tuple of "is constant" flags, though I'm not sure how this could affect loop auto-vectorization.
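
A minimal sketch of the stride idea, written as ordinary C++ rather than generated IR. It assumes that a constant column can be represented by a stride of 0 and a full column by its element size; that mapping, and the names `loadAt` and `plusStrided`, are illustrative assumptions rather than anything taken from the PR.

```cpp
#include <cstddef>
#include <cstring>

// Reads the double stored `row * stride` bytes after `base`.
double loadAt(const char * base, std::size_t stride, std::size_t row)
{
    double value;
    std::memcpy(&value, base + row * stride, sizeof(value));
    return value;
}

// Adds two columns row by row; each column is a (pointer, stride) pair.
void plusStrided(const char * const * columns, const std::size_t * strides,
                 std::size_t rows, double * result)
{
    for (std::size_t row = 0; row < rows; ++row)
        result[row] = loadAt(columns[0], strides[0], row)
                    + loadAt(columns[1], strides[1], row);
}
```

With a stride of 0 every row reads the same element, so one loop covers both constant and full columns. Whether a compiler still vectorizes such a loop is the open question raised in the comment.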

alexey-milovidov added a commit that referenced this pull request May 9, 2018
alexey-milovidov added a commit that referenced this pull request May 10, 2018
@alexey-milovidov alexey-milovidov merged commit e5ebc24 into ClickHouse:master May 10, 2018

5 participants