Support Numba Runtime in RBC #531

Merged
merged 68 commits into from
Mar 23, 2023
Changes from 53 commits
0cdb108
wip
guilhermeleobas Feb 6, 2023
e8d6c08
wip
guilhermeleobas Feb 9, 2023
cd39414
wip
guilhermeleobas Feb 13, 2023
5b79545
list working
guilhermeleobas Feb 14, 2023
dadf9b1
wip
guilhermeleobas Feb 15, 2023
0ad5271
upd
guilhermeleobas Feb 21, 2023
e1e42f9
remove a.ll
guilhermeleobas Feb 21, 2023
d7d75de
Add list and set tests
guilhermeleobas Feb 21, 2023
85e4e5a
fix flake8 issues
guilhermeleobas Feb 21, 2023
a0d80f2
Fix for Numba 0.55
guilhermeleobas Feb 22, 2023
a7f7cdd
Add test case for string methods
guilhermeleobas Feb 22, 2023
af36e29
document RBC_DEBUG_NRT environment variable
guilhermeleobas Feb 22, 2023
f7826d9
add to_string docstring
guilhermeleobas Feb 22, 2023
127277f
remove printf stmt
guilhermeleobas Feb 22, 2023
bd31916
remove test
guilhermeleobas Feb 22, 2023
6752341
update TextEncodingNone::deepcopy
guilhermeleobas Feb 22, 2023
2fe73ae
Use target context and target info to correctly define MemInfo struct
guilhermeleobas Feb 24, 2023
4a20d6b
Fix test failure
guilhermeleobas Feb 24, 2023
9e5cd73
add noinline only if it is debugging NRT
guilhermeleobas Feb 25, 2023
ff97202
ignore failing test
guilhermeleobas Feb 27, 2023
ffa40ac
string.capitalize crashes HeavyDB server
guilhermeleobas Feb 27, 2023
6a8fd24
Implement NRT_MemInfo_alloc_safe_aligned
guilhermeleobas Feb 27, 2023
2ebf919
compile code with opt_level=1
guilhermeleobas Feb 27, 2023
0e0d708
change pytest command
guilhermeleobas Feb 27, 2023
02c44ce
undo changes to rbc_test.yml
guilhermeleobas Feb 27, 2023
d4e8d9a
add xfail(strict=False) to remotejit test
guilhermeleobas Feb 27, 2023
2bb9850
debug test_split
guilhermeleobas Feb 27, 2023
b3fc8a2
undo changes to rbc_test.yml
guilhermeleobas Feb 28, 2023
9662804
mark test as xfail
guilhermeleobas Feb 28, 2023
7081936
mark string tests as possible failures
guilhermeleobas Feb 28, 2023
8781c43
upd
guilhermeleobas Feb 28, 2023
7fb8259
upd
guilhermeleobas Feb 28, 2023
98ebfa3
document how unicodetype_db is generated
guilhermeleobas Feb 28, 2023
040598b
add test for to_string
guilhermeleobas Feb 28, 2023
0d0ec25
postpone remotejit tests
guilhermeleobas Feb 28, 2023
123dbdf
Attempt to fix odd bug where LLVM optimize some instructions due to SROA
guilhermeleobas Feb 28, 2023
77d60b4
add to_list method
guilhermeleobas Feb 28, 2023
b0140da
add test to ensure unicodedbtype.ll contains the required variables
guilhermeleobas Feb 28, 2023
2e6895c
flake8
guilhermeleobas Feb 28, 2023
8440065
add unicodetype_db.ll to package_data
guilhermeleobas Feb 28, 2023
8497ab5
Fix read_unicodetype_db function
guilhermeleobas Feb 28, 2023
6246cc7
Fix readthedocs build
guilhermeleobas Feb 28, 2023
6d197ed
add notes on Numba Runtime + supported/tested types
guilhermeleobas Feb 28, 2023
4b33edd
upd
guilhermeleobas Feb 28, 2023
47dec6b
include page about NRT to the index
guilhermeleobas Feb 28, 2023
3aac947
Update nrt.rst
guilhermeleobas Feb 28, 2023
3854f7d
upd
guilhermeleobas Feb 28, 2023
78bb61d
Merge branch 'main' into guilhermeleobas/nrt
guilhermeleobas Mar 1, 2023
1305c4a
Use allocate_varlen_buffer instead of malloc
guilhermeleobas Mar 3, 2023
e25f651
add docs for NRT to include which functions were implemented
guilhermeleobas Mar 8, 2023
2d6dcbf
flake8
guilhermeleobas Mar 8, 2023
8652c06
Merge branch 'main' into guilhermeleobas/nrt
guilhermeleobas Mar 8, 2023
a3d1287
Update docs
guilhermeleobas Mar 9, 2023
9b7807c
Add more tests for array.to_list() and fix a bug where LLVM SROA dele…
guilhermeleobas Mar 10, 2023
f77609d
Merge branch 'guilhermeleobas/nrt' of github.com:guilhermeleobas/rbc …
guilhermeleobas Mar 10, 2023
6447081
uncomment code
guilhermeleobas Mar 10, 2023
00cb623
Add function name
guilhermeleobas Mar 10, 2023
f29db2c
add missing NRT function to libfuncs
guilhermeleobas Mar 10, 2023
86ada05
try to recognize function name in a call to a bitcast instruction
guilhermeleobas Mar 10, 2023
698fe8f
remove printf call
guilhermeleobas Mar 10, 2023
cbc3b19
remove unused function
guilhermeleobas Mar 10, 2023
9bf81aa
ignore output for nbval
guilhermeleobas Mar 10, 2023
8fb6af4
flake8
guilhermeleobas Mar 10, 2023
b99d4ee
partially address reviewer comments
guilhermeleobas Mar 22, 2023
6bed8d1
address remaining comments
guilhermeleobas Mar 22, 2023
228f657
only emit warning if debug=True
guilhermeleobas Mar 22, 2023
9495a35
Use unique server ports in remotejit tests.
pearu Mar 23, 2023
c7a50f5
Fix nbval
pearu Mar 23, 2023
2 changes: 1 addition & 1 deletion .github/workflows/rbc_test.yml
@@ -43,7 +43,7 @@ jobs:
 python-version: '3.10'
 numba-version: '0.55'

-needs: [lint]
+needs: [lint, heavydb]

 steps:
 - name: Checkout code
8 changes: 4 additions & 4 deletions doc/conf.py
@@ -87,10 +87,10 @@
 html_theme_options = {
     "github_url": "https://github.com/xnd-project/rbc",
     "use_edit_page_button": True,
-    "logo": {
-    "image_light": html_logo,
-    "image_dark": html_logo,
-    },
+    "logo": {
+    "image_light": html_logo,
+    "image_dark": html_logo,
+    },
     # https://github.com/pydata/pydata-sphinx-theme/issues/1220
     "icon_links": [],
 }
5 changes: 5 additions & 0 deletions doc/envvars.rst
@@ -9,3 +9,8 @@ Debugging

    If set to non-zero, prints out all possible debug information
    during function compilation and remote execution.
+
+.. envvar:: RBC_DEBUG_NRT
+
+   If set to non-zero, inserts debug statements into RBC's implementation
+   of the Numba Runtime (NRT).
1 change: 1 addition & 0 deletions doc/index.rst
@@ -22,4 +22,5 @@ and llvmlite tools.
    api
    releases
    developer
+   nrt
    envvars
170 changes: 170 additions & 0 deletions doc/nrt.rst
@@ -0,0 +1,170 @@

Numba Runtime
=============

RBC includes a simplified implementation of the Numba Runtime (NRT) written in
pure LLVM IR, unlike the original implementation, which is written in C. This
approach lets RBC support a broader range of Python datatypes, including lists,
sets, and strings, which was not possible with the previous implementation.
Python ``dict`` is not supported because its implementation is written in C.


List of NRT functions implemented
---------------------------------

The RBC project includes some functions from the NRT module, with changes made
as necessary to fit our specific use case. Here is a list of the functions we
have implemented, along with any relevant comments.

* ✔️ ``NRT_MemInfo_init``
* ✔️ ``NRT_MemInfo_alloc``
* ✔️ ``NRT_MemInfo_alloc_safe``
* ✔️ ``NRT_MemInfo_alloc_dtor``
* ✔️ ``NRT_MemInfo_alloc_dtor_safe``
* ✔️ ``NRT_Allocate``
* ✔️ ``NRT_Allocate_External``
* ✔️ ``NRT_MemInfo_call_dtor``
* ✔️ ``NRT_incref`` - Memory is managed by the HeavyDB server, so this function has an empty body
* ✔️ ``NRT_decref`` - For the same reason, this function also has an empty body
* ✔️ ``NRT_MemInfo_data_fast``
* ✔️ ``NRT_MemInfo_new``
* ✔️ ``NRT_MemInfo_new_varsize_dtor``
* ✔️ ``NRT_MemInfo_varsize_alloc``
* ✔️ ``NRT_MemInfo_varsize_realloc``
* ✔️ ``NRT_MemInfo_varsize_free``
* ✔️ ``NRT_MemInfo_new_varsize``
* ✔️ ``NRT_MemInfo_alloc_safe_aligned`` - Calls the unaligned version for now
* ✔️ ``NRT_Reallocate`` - Implemented with ``allocate_varlen_buffer`` rather than ``realloc``, because ``realloc`` may free the previously allocated memory, which would result in a double free once the UD[T]F finishes execution
* ✔️ ``NRT_dealloc``
* ✔️ ``NRT_Free`` - Memory is freed upon UD[T]F return. Thus, this function does not free any memory
* ✔️ ``NRT_MemInfo_destroy``
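
The memory notes in the list above (empty ``NRT_incref``/``NRT_decref``, a
non-freeing ``NRT_Free``, and a reallocation that never frees the old buffer)
can be pictured with a small pure-Python toy model. This is only a sketch of
the described behaviour, not the LLVM implementation from this PR; the class
``UdfArena`` and all of its method names are hypothetical.

```python
class UdfArena:
    """Toy stand-in for server-managed memory during one UD[T]F call."""

    def __init__(self):
        self.buffers = []  # every buffer handed out during the call

    def allocate(self, size):
        buf = bytearray(size)
        self.buffers.append(buf)
        return buf

    def reallocate(self, old, new_size):
        # Allocate a fresh buffer and copy the data over. The old buffer is
        # NOT freed here, so releasing everything at the end of the call
        # cannot free it a second time.
        new = self.allocate(new_size)
        n = min(len(old), new_size)
        new[:n] = old[:n]
        return new

    def incref(self, buf):
        pass  # empty body: reference counts are not tracked

    def decref(self, buf):
        pass  # empty body: nothing is released when a count would hit zero

    def free(self, buf):
        pass  # deferred: memory is only released when the call returns

    def release_all(self):
        """Model of 'memory is freed upon UD[T]F return'."""
        count = len(self.buffers)
        self.buffers.clear()
        return count


arena = UdfArena()
a = arena.allocate(4)
a[:] = b"abcd"
b = arena.reallocate(a, 8)   # grows into a new buffer, keeps the old one alive
arena.free(a)                # no effect in this model
live_before_return = len(arena.buffers)
released = arena.release_all()
```

In this model each buffer is released exactly once, at the end of the call,
which is the property the ``NRT_Reallocate`` note is concerned with.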


How to debug NRT methods
------------------------

Setting the environment variable ``RBC_DEBUG_NRT`` to a non-zero value makes
RBC insert debug statements into each function call made to NRT.
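
A minimal sketch of enabling the variable from Python. It assumes the variable
has to be set before RBC compiles any function, which the text here does not
state explicitly; the helper ``nrt_debug_enabled`` is hypothetical and only
illustrates the "non-zero" convention used in ``doc/envvars.rst``.

```python
import os

# Assumption: set the variable before importing or using rbc so that it is
# visible when functions are compiled.
os.environ["RBC_DEBUG_NRT"] = "1"


def nrt_debug_enabled(environ=os.environ):
    """Hypothetical helper: treat any non-zero integer value as enabled."""
    try:
        return int(environ.get("RBC_DEBUG_NRT", "0")) != 0
    except ValueError:
        return False
```

The same effect can be had from a shell by exporting the variable before
starting the Python process.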


How to generate ``unicodetype_db.ll``
-------------------------------------

In addition to NRT, Unicode strings require functions that Numba implements
in C. In RBC, those functions were copied to ``unicodetype_db.h``. To generate
the ``.ll`` file, update the ``#include`` statement on line 2 and run clang to
emit the LLVM IR file:

.. code-block:: bash

    $ clang -S -emit-llvm -O2 unicodetype_db.h -o unicodetype_db.ll
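
One of the commits in this PR adds a test that the generated file contains the
required variables. A rough sketch of such a check on textual LLVM IR is shown
below. The helper names and the global names (``table_a``, ``table_b``,
``table_c``) are made up for illustration; the real test and the real symbol
names are not shown on this page.

```python
import re


def defined_globals(ll_text):
    """Return the names of globals defined as `@name = ...` in textual LLVM IR."""
    pattern = re.compile(r'^@"?([\w.$-]+)"?\s*=', re.MULTILINE)
    return set(pattern.findall(ll_text))


def missing_globals(ll_text, required):
    """Return the required names that have no definition in the IR text."""
    return sorted(set(required) - defined_globals(ll_text))


# Tiny hand-written IR sample with hypothetical names.
sample_ll = '''
@table_a = dso_local constant [2 x i8] c"\\01\\02", align 1
@table_b = internal constant [1 x i32] [i32 7], align 4
define i32 @lookup(i32 %0) {
  ret i32 %0
}
'''

missing = missing_globals(sample_ll, ["table_a", "table_b", "table_c"])
```

Only lines that start with ``@name =`` count as definitions here, so function
definitions such as ``@lookup`` are ignored.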


Supported Python Containers
===========================

List
----

* ✔️ ``list.append``
* ✔️ ``list.extend``
* ✔️ ``list.insert``
* ✔️ ``list.remove``
* ✔️ ``list.pop``
* ✔️ ``list.clear``
* ✔️ ``list.index``
* ✔️ ``list.count``
* ✔️ ``list.sort``
* ✔️ ``list.reverse``
* ✔️ ``list.copy``

Additionally, one can convert an array to a list with the ``Array.to_list`` method.


Set
---

* ✔️ ``set.add``
* ✔️ ``set.clear``
* ✔️ ``set.copy``
* ✔️ ``set.difference``
* ✔️ ``set.difference_update``
* ✔️ ``set.discard``
* ✔️ ``set.intersection``
* ✔️ ``set.intersection_update``
* ✔️ ``set.isdisjoint``
* ✔️ ``set.issubset``
* ✔️ ``set.issuperset``
* ✔️ ``set.pop``
* ✔️ ``set.remove``
* ✔️ ``set.symmetric_difference``
* ✔️ ``set.symmetric_difference_update``
* ❌ ``set.union``
* ✔️ ``set.update``


String
------

* ✔️ ``string.capitalize``
* ✔️ ``string.casefold``
* ✔️ ``string.center``
* ✔️ ``string.count``
* ❌ ``string.encode``
* ✔️ ``string.endswith``
* ✔️ ``string.expandtabs``
* ✔️ ``string.find``
* ❌ ``string.format``
* ❌ ``string.format_map``
* ✔️ ``string.index``
* ✔️ ``string.isalnum``
* ✔️ ``string.isalpha``
* ✔️ ``string.isascii``
* ✔️ ``string.isdecimal``
* ✔️ ``string.isdigit``
* ✔️ ``string.isidentifier``
* ✔️ ``string.islower``
* ✔️ ``string.isnumeric``
* ✔️ ``string.isprintable``
* ✔️ ``string.isspace``
* ✔️ ``string.istitle``
* ✔️ ``string.isupper``
* ✔️ ``string.join``
* ✔️ ``string.ljust``
* ✔️ ``string.lower``
* ✔️ ``string.lstrip``
* ❌ ``string.maketrans``
* ❌ ``string.partition``
* ✔️ ``string.removeprefix``
* ✔️ ``string.removesuffix``
* ✔️ ``string.replace``
* ✔️ ``string.rfind``
* ✔️ ``string.rindex``
* ✔️ ``string.rjust``
* ❌ ``string.rpartition``
* ✔️ ``string.rsplit``
* ✔️ ``string.rstrip``
* ✔️ ``string.split``
* ✔️ ``string.splitlines``
* ✔️ ``string.startswith``
* ✔️ ``string.strip``
* ✔️ ``string.swapcase``
* ✔️ ``string.title``
* ❌ ``string.translate``
* ✔️ ``string.upper``
* ✔️ ``string.zfill``

Additionally, one can convert a ``TextEncodingNone`` value to a Python string
using the ``TextEncodingNone.to_string`` method.


Examples
--------

Tests are a good reference for using the methods defined above:

* `List <https://github.com/xnd-project/rbc/blob/main/rbc/tests/heavydb/test_nrt_list.py>`_
* `Set <https://github.com/xnd-project/rbc/blob/main/rbc/tests/heavydb/test_nrt_set.py>`_
* `String <https://github.com/xnd-project/rbc/blob/main/rbc/tests/heavydb/test_nrt_string.py>`_
1 change: 1 addition & 0 deletions rbc/heavydb/__init__.py
@@ -1,4 +1,5 @@
 from .array import *  # noqa: F401, F403
+from .allocator import *  # noqa: F401, F403
 from .column import *  # noqa: F401, F403
 from .column_array import *  # noqa: F401, F403
 from .buffer import *  # noqa: F401, F403
15 changes: 15 additions & 0 deletions rbc/heavydb/allocator.py
@@ -0,0 +1,15 @@
__all__ = ['allocate_varlen_buffer']

from numba.core import cgutils
from llvmlite import ir


def allocate_varlen_buffer(builder, element_count, element_size):
    i8p = ir.IntType(8).as_pointer()
    i64 = ir.IntType(64)

    module = builder.module
    name = 'allocate_varlen_buffer'
    fnty = ir.FunctionType(i8p, [i64, i64])
    fn = cgutils.get_or_insert_function(module, fnty, name)
    return builder.call(fn, [element_count, element_size])
45 changes: 42 additions & 3 deletions rbc/heavydb/array.py
@@ -58,9 +58,9 @@ def deepcopy(self, context, builder, val, retptr):
     with otherwise:
         # we can't just copy the pointer here because return buffers need
         # to have their own memory, as input buffers are freed upon returning
-        ptr = memalloc(context, builder, ptr_type, element_count, element_size)
-        cgutils.raw_memcpy(builder, ptr, src, element_count, element_size)
-        builder.store(ptr, builder.gep(retptr, [zero, zero]))
+        dst = memalloc(context, builder, ptr_type, element_count, element_size)
+        cgutils.raw_memcpy(builder, dst, src, element_count, element_size)
+        builder.store(dst, builder.gep(retptr, [zero, zero]))
 builder.store(element_count, builder.gep(retptr, [zero, one]))
 builder.store(is_null, builder.gep(retptr, [zero, two]))

@@ -104,6 +104,12 @@ def __init__(self, size: int, dtype: Union[str, nb_types.Type]) -> None:
     def is_null(self) -> bool:
         pass

+    def to_list(self) -> list:
+        """
+        Returns a Python list with elements from the array
+        """
+        pass
+
     @property
     def dtype(self):
         """
@@ -168,6 +174,19 @@ def heavydb_array_constructor(context, builder, sig, args):
     return heavydb_buffer_constructor(context, builder, sig, args)._getpointer()


+@extending.lower_builtin(Array, nb_types.List)
+def heavydb_array_ctor_list(context, builder, sig, args):
+    dtype = sig.args[0].dtype
+
+    def ctor(lst):
+        sz = len(lst)
+        arr = Array(sz, dtype)
+        for i in range(sz):
+            arr[i] = lst[i]
+        return arr

Comment on lines +192 to +197

Contributor Author: Probably this constructor could be replaced by memcpying the data, but I'll leave any optimization to the future.

Contributor: Depending on how complicated the IR becomes, there's a chance that LLVM will optimize it on its own with -memcpyopt: https://llvm.org/docs/Passes.html#memcpyopt-memcpy-optimization

+    return context.compile_internal(builder, ctor, sig, args)
+
+
 @extending.type_callable(Array)
 def type_heavydb_array(context):
     def typer(size, dtype):
@@ -181,6 +200,15 @@ def typer(size, dtype):
     return typer


+@extending.type_callable(Array)
+def type_heavydb_array_lst(context):
+    def typer(lst):
+        if isinstance(lst, nb_types.List):
+            dtype = lst.dtype
+            return HeavyDBArrayType((dtype,)).tonumba()
+    return typer
+
+
 @extending.overload_attribute(ArrayPointer, 'ndim')
 def get_ndim(arr):
     def impl(arr):
@@ -193,3 +221,14 @@ def get_size(arr):
     def impl(arr):
         return len(arr)
     return impl
+
+
+@extending.overload_method(ArrayPointer, 'to_list')
+def ol_to_list(arr):
+    def impl(arr):
+        lst = list()
+        sz = len(arr)
+        for i in range(sz):
+            lst.append(arr[i])
+        return lst
+    return impl
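
Both conversions added in this file (list to ``Array`` in
``heavydb_array_ctor_list`` and ``Array`` to list in ``ol_to_list``) are plain
element-by-element loops. The sketch below mirrors them in ordinary Python so
they can be run without a HeavyDB server. It uses the standard library
``array.array`` as a stand-in for the HeavyDB ``Array`` type, and the function
names ``list_to_array`` and ``array_to_list`` are hypothetical.

```python
from array import array


def list_to_array(lst, typecode="q"):
    """Mirror of the list constructor: pre-size the array, then copy items."""
    sz = len(lst)
    arr = array(typecode, [0] * sz)  # stand-in for Array(sz, dtype)
    for i in range(sz):
        arr[i] = lst[i]
    return arr


def array_to_list(arr):
    """Mirror of to_list: append each element to a fresh list."""
    lst = list()
    sz = len(arr)
    for i in range(sz):
        lst.append(arr[i])
    return lst


values = [3, 1, 4, 1, 5]
round_trip = array_to_list(list_to_array(values))
```

As the review comments above note, the copy loop could in principle become a
single memory copy; the explicit loop is kept here because it matches the code
in the diff.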
1 change: 1 addition & 0 deletions rbc/heavydb/remoteheavydb.py
@@ -1101,6 +1101,7 @@ def retrieve_targets(self):
     target_info.add_library('stdio')
     target_info.add_library('stdlib')
     target_info.add_library('heavydb')
+    target_info.add_library('NRT')
 elif target_info.is_gpu:
     if self.version < (6, 2):
         # BC note: older heavydb versions do not define