Support Numba Runtime in RBC (#531)
---------

Co-authored-by: Pearu Peterson <[email protected]>
guilhermeleobas and pearu authored Mar 23, 2023
1 parent db80b6c commit f816e23
Showing 28 changed files with 4,127 additions and 159 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/rbc_test.yml
@@ -43,7 +43,7 @@ jobs:
python-version: '3.10'
numba-version: '0.55'

needs: [lint]
needs: [lint, heavydb]

steps:
- name: Checkout code
8 changes: 4 additions & 4 deletions doc/conf.py
@@ -87,10 +87,10 @@
html_theme_options = {
"github_url": "https://github.com/xnd-project/rbc",
"use_edit_page_button": True,
"logo": {
"image_light": html_logo,
"image_dark": html_logo,
},
"logo": {
"image_light": html_logo,
"image_dark": html_logo,
},
# https://github.com/pydata/pydata-sphinx-theme/issues/1220
"icon_links": [],
}
5 changes: 5 additions & 0 deletions doc/envvars.rst
@@ -9,3 +9,8 @@ Debugging

If set to non-zero, prints out all possible debug information
during function compilation and remote execution.

.. envvar:: RBC_DEBUG_NRT

If set to non-zero, inserts debug statements into our implementation
of the Numba Runtime (NRT).
1 change: 1 addition & 0 deletions doc/index.rst
@@ -22,4 +22,5 @@ and llvmlite tools.
api
releases
developer
nrt
envvars
169 changes: 169 additions & 0 deletions doc/nrt.rst
@@ -0,0 +1,169 @@

Numba Runtime
=============

RBC includes a simplified implementation of the Numba Runtime (NRT) written in
pure LLVM IR, unlike Numba's original implementation, which is written in C.
This approach lets RBC support a broader range of Python datatypes, including
lists, sets, and strings, which was not possible with the previous
implementation. Python ``dict`` is not supported because Numba implements it in
C.


List of NRT functions implemented
---------------------------------

The RBC project implements the following NRT functions, adapted as necessary to
fit our specific use case. Relevant notes are given next to each function.

* ✔️ ``NRT_MemInfo_init``
* ✔️ ``NRT_MemInfo_alloc``
* ✔️ ``NRT_MemInfo_alloc_safe``
* ✔️ ``NRT_MemInfo_alloc_dtor``
* ✔️ ``NRT_MemInfo_alloc_dtor_safe``
* ✔️ ``NRT_Allocate``
* ✔️ ``NRT_Allocate_External``
* ✔️ ``NRT_MemInfo_call_dtor``
* ✔️ ``NRT_incref`` - Memory is managed by the HeavyDB server, so this function has an empty body
* ✔️ ``NRT_decref`` - Similarly, this function also has an empty body
* ✔️ ``NRT_MemInfo_data_fast``
* ✔️ ``NRT_MemInfo_new``
* ✔️ ``NRT_MemInfo_new_varsize_dtor``
* ✔️ ``NRT_MemInfo_varsize_alloc``
* ✔️ ``NRT_MemInfo_varsize_realloc``
* ✔️ ``NRT_MemInfo_varsize_free``
* ✔️ ``NRT_MemInfo_new_varsize``
* ✔️ ``NRT_MemInfo_alloc_safe_aligned`` - Calls the unaligned version for now
* ✔️ ``NRT_Reallocate`` - Implemented with ``allocate_varlen_buffer`` instead of ``realloc``, because ``realloc`` may free the previously allocated memory, which would result in a double free once the UD[T]F finishes execution
* ✔️ ``NRT_dealloc``
* ✔️ ``NRT_Free`` - Memory is freed when the UD[T]F returns, so this function does not free any memory
* ✔️ ``NRT_MemInfo_destroy``
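
The memory policy in the list above can be modelled in plain Python:
``NRT_incref``, ``NRT_decref`` and ``NRT_Free`` do nothing, the host releases
every buffer when the UD[T]F returns, and ``NRT_Reallocate`` allocates a fresh
buffer instead of calling ``realloc``. The sketch below is a simplified
illustration, not RBC code; the ``Arena`` and ``DoubleFreeError`` names are
hypothetical, and buffers are tracked by integer ids rather than pointers. It
shows why a ``realloc`` that frees the old block would lead to a double free.

.. code-block:: python

    class DoubleFreeError(RuntimeError):
        pass


    class Arena:
        """Hypothetical model of per-call memory: the host releases every
        buffer it handed out exactly once, when the UD[T]F returns."""

        def __init__(self):
            self.live = {}        # buffer id -> bytearray
            self.freed = set()    # ids that were already released
            self._next_id = 0

        def allocate(self, nbytes):
            buf_id = self._next_id
            self._next_id += 1
            self.live[buf_id] = bytearray(nbytes)
            return buf_id

        def release(self, buf_id):
            if buf_id in self.freed:
                raise DoubleFreeError(f"buffer {buf_id} freed twice")
            self.freed.add(buf_id)
            del self.live[buf_id]

        # NRT_incref, NRT_decref and NRT_Free: empty bodies, the host owns memory.
        def incref(self, buf_id):
            pass

        def decref(self, buf_id):
            pass

        def free(self, buf_id):
            pass

        def reallocate(self, buf_id, nbytes):
            # NRT_Reallocate policy: allocate a fresh buffer and copy; the old
            # buffer is left alone and released on return with the others.
            new_id = self.allocate(nbytes)
            n = min(len(self.live[buf_id]), nbytes)
            self.live[new_id][:n] = self.live[buf_id][:n]
            return new_id

        def naive_reallocate(self, buf_id, nbytes):
            # What a C-style realloc does: release the old block immediately.
            new_id = self.reallocate(buf_id, nbytes)
            self.release(buf_id)
            return new_id

        def on_return(self):
            # The host frees everything that was allocated during the call.
            for buf_id in range(self._next_id):
                self.release(buf_id)


    arena = Arena()
    a = arena.allocate(4)
    arena.live[a][:3] = b"abc"
    b = arena.reallocate(a, 16)
    arena.on_return()  # each buffer is released exactly once

    broken = Arena()
    broken.naive_reallocate(broken.allocate(4), 16)
    try:
        broken.on_return()
    except DoubleFreeError as exc:
        print(exc)  # buffer 0 freed twice

The design trades memory for safety: superseded buffers stay alive until the
call returns, but nothing is ever released twice.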


How to debug NRT methods
------------------------

Setting the environment variable ``RBC_DEBUG_NRT`` to a non-zero value makes RBC
insert debug statements into every function call made to NRT.
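
A minimal sketch of enabling this from Python follows. It assumes the variable
is read no later than when the UD[T]F is compiled, so it is set before ``rbc``
is imported. The ``nrt_debug_enabled`` helper is hypothetical (not part of RBC)
and only restates the documented rule that any non-zero value enables the debug
statements. From a shell, ``RBC_DEBUG_NRT=1 python script.py`` has the same
effect, with ``script.py`` standing in for your program.

.. code-block:: python

    import os

    # Set before importing rbc and compiling any UD[T]F (assumption: the value
    # is read no later than compile time).
    os.environ["RBC_DEBUG_NRT"] = "1"


    def nrt_debug_enabled(env=None):
        """Hypothetical helper, not part of RBC: applies the documented rule
        that any non-zero value enables NRT debug statements."""
        env = os.environ if env is None else env
        try:
            return int(env.get("RBC_DEBUG_NRT", "0")) != 0
        except ValueError:
            return False  # assumption: non-numeric values are treated as unset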


How to generate ``unicodetype_db.ll``
-------------------------------------

In addition to NRT, Numba's unicode string support relies on functions
implemented in C. In RBC, those functions were copied to ``unicodetype_db.h``.
To generate the textual LLVM IR file ``unicodetype_db.ll``, run the clang
command below:

.. code-block:: bash

    $ clang -S -emit-llvm -O2 -Ipath/to/numba unicodetype_db.h -o unicodetype_db.ll

Supported Python Containers
===========================

List
----

* ✔️ ``list.append``
* ✔️ ``list.extend``
* ✔️ ``list.insert``
* ✔️ ``list.remove``
* ✔️ ``list.pop``
* ✔️ ``list.clear``
* ✔️ ``list.index``
* ✔️ ``list.count``
* ✔️ ``list.sort``
* ✔️ ``list.reverse``
* ✔️ ``list.copy``

Additionally, an array can be converted to a list with the ``Array.to_list`` method.
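
The ``to_list`` overload and the ``Array(lst)`` constructor added in
``rbc/heavydb/array.py`` are both element-by-element copy loops, so the
resulting list is independent of the array. The plain-Python sketch below
mirrors those loops so that it runs without HeavyDB; ``ArrayStandIn`` and
``array_from_list`` are hypothetical stand-ins, and ``dtype`` is a Python
callable here rather than the Numba type RBC uses.

.. code-block:: python

    class ArrayStandIn:
        """Hypothetical stand-in for ``Array``: fixed size and indexable.
        ``dtype`` is a Python callable here, not a Numba type."""

        def __init__(self, size, dtype):
            self.dtype = dtype
            self._data = [dtype(0)] * size

        def __len__(self):
            return len(self._data)

        def __getitem__(self, i):
            return self._data[i]

        def __setitem__(self, i, value):
            self._data[i] = self.dtype(value)

        def to_list(self):
            # Same loop as the ``to_list`` overload: append every element.
            lst = list()
            for i in range(len(self)):
                lst.append(self[i])
            return lst


    def array_from_list(lst, dtype):
        # Same loop as the ``Array(lst)`` lowering: allocate, then copy by index.
        arr = ArrayStandIn(len(lst), dtype)
        for i in range(len(lst)):
            arr[i] = lst[i]
        return arr


    arr = array_from_list([3, 1, 2], int)
    lst = arr.to_list()
    lst.sort()      # the list is a copy, so the array keeps [3, 1, 2]
    lst.append(4)   # lst is now [1, 2, 3, 4]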


Set
---

* ✔️ ``set.add``
* ✔️ ``set.clear``
* ✔️ ``set.copy``
* ✔️ ``set.difference``
* ✔️ ``set.difference_update``
* ✔️ ``set.discard``
* ✔️ ``set.intersection``
* ✔️ ``set.intersection_update``
* ✔️ ``set.isdisjoint``
* ✔️ ``set.issubset``
* ✔️ ``set.issuperset``
* ✔️ ``set.pop``
* ✔️ ``set.remove``
* ✔️ ``set.symmetric_difference``
* ✔️ ``set.symmetric_difference_update``
* ❌ ``set.union``
* ✔️ ``set.update``
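
``set.union`` is the only set method marked ❌ above. A sketch of a workaround
built from two supported methods, ``copy`` and ``update``, follows. It is plain
Python, the ``union`` helper name is hypothetical, and whether RBC compiles it
unchanged is an assumption worth checking against ``test_nrt_set.py``.

.. code-block:: python

    def union(a, b):
        # set.union is marked unsupported; copy() and update() are supported
        # and give the same result without mutating ``a``.
        result = a.copy()
        result.update(b)
        return result


    a = {1, 2}
    b = {2, 3}
    c = union(a, b)  # {1, 2, 3}; a is still {1, 2}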


String
------

* ✔️ ``string.capitalize``
* ✔️ ``string.casefold``
* ✔️ ``string.center``
* ✔️ ``string.count``
* ❌ ``string.encode``
* ✔️ ``string.endswith``
* ✔️ ``string.expandtabs``
* ✔️ ``string.find``
* ❌ ``string.format``
* ❌ ``string.format_map``
* ✔️ ``string.index``
* ✔️ ``string.isalnum``
* ✔️ ``string.isalpha``
* ✔️ ``string.isascii``
* ✔️ ``string.isdecimal``
* ✔️ ``string.isdigit``
* ✔️ ``string.isidentifier``
* ✔️ ``string.islower``
* ✔️ ``string.isnumeric``
* ✔️ ``string.isprintable``
* ✔️ ``string.isspace``
* ✔️ ``string.istitle``
* ✔️ ``string.isupper``
* ✔️ ``string.join``
* ✔️ ``string.ljust``
* ✔️ ``string.lower``
* ✔️ ``string.lstrip``
* ❌ ``string.maketrans``
* ❌ ``string.partition``
* ✔️ ``string.removeprefix``
* ✔️ ``string.removesuffix``
* ✔️ ``string.replace``
* ✔️ ``string.rfind``
* ✔️ ``string.rindex``
* ✔️ ``string.rjust``
* ❌ ``string.rpartition``
* ✔️ ``string.rsplit``
* ✔️ ``string.rstrip``
* ✔️ ``string.split``
* ✔️ ``string.splitlines``
* ✔️ ``string.startswith``
* ✔️ ``string.strip``
* ✔️ ``string.swapcase``
* ✔️ ``string.title``
* ❌ ``string.translate``
* ✔️ ``string.upper``
* ✔️ ``string.zfill``

Additionally, a ``TextEncodingNone`` value can be converted to a Python string
with the ``TextEncodingNone.to_string`` method.
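
Where a ❌ string method can be expressed with supported ones, a small helper
can fill the gap. The sketch below rebuilds ``partition`` and ``rpartition``
from ``find``, ``rfind`` and slicing, mirroring CPython's results, including the
``ValueError`` for an empty separator. The helper names are hypothetical, and
whether RBC compiles them unchanged (in particular the tuple return and the
``raise``) is an assumption to check against ``test_nrt_string.py``.

.. code-block:: python

    def partition(s, sep):
        # str.partition is marked unsupported; find() and slicing are supported.
        if len(sep) == 0:
            raise ValueError("empty separator")
        idx = s.find(sep)
        if idx == -1:
            return s, "", ""
        return s[:idx], sep, s[idx + len(sep):]


    def rpartition(s, sep):
        # Same idea for str.rpartition, using rfind() from the supported list.
        if len(sep) == 0:
            raise ValueError("empty separator")
        idx = s.rfind(sep)
        if idx == -1:
            return "", "", s
        return s[:idx], sep, s[idx + len(sep):]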


Examples
--------

The tests are a good reference for how to use the methods listed above:

* `List <https://github.com/xnd-project/rbc/blob/main/rbc/tests/heavydb/test_nrt_list.py>`_
* `Set <https://github.com/xnd-project/rbc/blob/main/rbc/tests/heavydb/test_nrt_set.py>`_
* `String <https://github.com/xnd-project/rbc/blob/main/rbc/tests/heavydb/test_nrt_string.py>`_
57 changes: 27 additions & 30 deletions notebooks/rbc-heavydb-black-scholes.ipynb

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions notebooks/rbc-simple.ipynb
@@ -331,6 +331,7 @@
}
],
"source": [
"# NBVAL_IGNORE_OUTPUT\n",
"rjit.stop_server()"
]
}
1 change: 1 addition & 0 deletions rbc/heavydb/__init__.py
@@ -1,4 +1,5 @@
from .array import * # noqa: F401, F403
from .allocator import * # noqa: F401, F403
from .column import * # noqa: F401, F403
from .column_array import * # noqa: F401, F403
from .buffer import * # noqa: F401, F403
18 changes: 18 additions & 0 deletions rbc/heavydb/allocator.py
@@ -0,0 +1,18 @@
__all__ = ['allocate_varlen_buffer']

from numba.core import cgutils
from llvmlite import ir


def allocate_varlen_buffer(builder, element_count, element_size):
    """
    Allocates ``(element_count + 1) * element_size`` bytes
    """
    i8p = ir.IntType(8).as_pointer()
    i64 = ir.IntType(64)

    module = builder.module
    name = 'allocate_varlen_buffer'
    fnty = ir.FunctionType(i8p, [i64, i64])
    fn = cgutils.get_or_insert_function(module, fnty, name)
    return builder.call(fn, [element_count, element_size])
62 changes: 56 additions & 6 deletions rbc/heavydb/array.py
@@ -9,7 +9,7 @@
heavydb_buffer_constructor)
from numba.core import extending, cgutils
from numba import types as nb_types
from typing import Union
from typing import Union, Optional
from llvmlite import ir


@@ -58,9 +58,9 @@ def deepcopy(self, context, builder, val, retptr):
with otherwise:
# we can't just copy the pointer here because return buffers need
# to have their own memory, as input buffers are freed upon returning
ptr = memalloc(context, builder, ptr_type, element_count, element_size)
cgutils.raw_memcpy(builder, ptr, src, element_count, element_size)
builder.store(ptr, builder.gep(retptr, [zero, zero]))
dst = memalloc(context, builder, ptr_type, element_count, element_size)
cgutils.raw_memcpy(builder, dst, src, element_count, element_size)
builder.store(dst, builder.gep(retptr, [zero, zero]))
builder.store(element_count, builder.gep(retptr, [zero, one]))
builder.store(is_null, builder.gep(retptr, [zero, two]))

@@ -101,7 +101,24 @@ def my_arange(size):
def __init__(self, size: int, dtype: Union[str, nb_types.Type]) -> None:
pass

def is_null(self) -> bool:
def is_null(self, index: Optional[int]) -> bool:
"""
Check if array is null. If index is provided, check if the array at
position given by index is null.
"""
pass

def set_null(self, index: Optional[int]) -> None:
"""
Set the array to null. If index is provided, set the array at the
given index to null.
"""
pass

def to_list(self) -> list:
"""
Returns a Python list with elements from the array
"""
pass

@property
@@ -165,7 +182,20 @@ def T(self):
@extending.lower_builtin(Array, nb_types.Integer, nb_types.StringLiteral)
@extending.lower_builtin(Array, nb_types.Integer, nb_types.NumberClass)
def heavydb_array_constructor(context, builder, sig, args):
return heavydb_buffer_constructor(context, builder, sig, args)._getpointer()
return heavydb_buffer_constructor(context, builder, sig, args)


@extending.lower_builtin(Array, nb_types.List)
def heavydb_array_ctor_list(context, builder, sig, args):
    dtype = sig.args[0].dtype

    def ctor(lst):
        sz = len(lst)
        arr = Array(sz, dtype)
        for i in range(sz):
            arr[i] = lst[i]
        return arr
    return context.compile_internal(builder, ctor, sig, args)


@extending.type_callable(Array)
@@ -181,6 +211,15 @@ def typer(size, dtype):
    return typer


@extending.type_callable(Array)
def type_heavydb_array_lst(context):
    def typer(lst):
        if isinstance(lst, nb_types.List):
            dtype = lst.dtype
            return HeavyDBArrayType((dtype,)).tonumba()
    return typer


@extending.overload_attribute(ArrayPointer, 'ndim')
def get_ndim(arr):
    def impl(arr):
@@ -193,3 +232,14 @@ def get_size(arr):
    def impl(arr):
        return len(arr)
    return impl


@extending.overload_method(ArrayPointer, 'to_list')
def ol_to_list(arr):
    def impl(arr):
        lst = list()
        sz = len(arr)
        for i in range(sz):
            lst.append(arr[i])
        return lst
    return impl
21 changes: 13 additions & 8 deletions rbc/heavydb/buffer.py
@@ -206,18 +206,23 @@ def heavydb_buffer_constructor(context, builder, sig, args):

ptr = memalloc(context, builder, ptr_type, element_count, element_size)

fa = cgutils.create_struct_proxy(sig.return_type.dtype)(context, builder)
fa.ptr = ptr # T*
fa.sz = element_count # size_t
llty = context.get_value_type(sig.return_type.dtype)
st_ptr = builder.alloca(llty)

zero, one, two = int32_t(0), int32_t(1), int32_t(2)
builder.store(ptr, builder.gep(st_ptr, [zero, zero]))
builder.store(element_count, builder.gep(st_ptr, [zero, one]))

if null_type is not None:
is_zero = builder.icmp_signed('==', element_count, int64_t(0))
with builder.if_else(is_zero) as (then, orelse):
with then:
is_null = context.get_value_type(null_type)(1)
builder.store(is_null, builder.gep(st_ptr, [zero, two]))
with orelse:
is_null = context.get_value_type(null_type)(0)
fa.is_null = is_null # int8_t
return fa
builder.store(is_null, builder.gep(st_ptr, [zero, two]))
return st_ptr


@extending.intrinsic
@@ -422,9 +427,9 @@ def heavydb_buffer_set_null_(typingctx, data):
sig = types.none(data)

def codegen(context, builder, sig, args):
rawptr = cgutils.alloca_once_value(builder, value=args[0])
ptr = builder.load(rawptr)
builder.store(int8_t(1), builder.gep(ptr, [int32_t(0), int32_t(2)]))
[ptr] = args
zero, two = int32_t(0), int32_t(2)
builder.store(int8_t(1), builder.gep(ptr, [zero, two]))

return sig, codegen

1 change: 1 addition & 0 deletions rbc/heavydb/remoteheavydb.py
@@ -1101,6 +1101,7 @@ def retrieve_targets(self):
target_info.add_library('stdio')
target_info.add_library('stdlib')
target_info.add_library('heavydb')
target_info.add_library('NRT')
elif target_info.is_gpu:
if self.version < (6, 2):
# BC note: older heavydb versions do not define