Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support Numba Runtime in RBC #531

Merged
merged 68 commits into from
Mar 23, 2023
Merged

Conversation

guilhermeleobas
Copy link
Contributor

This PR rewrites the Numba Runtime using LLVM.

@guilhermeleobas guilhermeleobas self-assigned this Feb 17, 2023
@guilhermeleobas guilhermeleobas added the enhancement New feature or request label Feb 17, 2023
Copy link
Contributor Author

@guilhermeleobas guilhermeleobas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Notes to myself

rbc/nrt.py Outdated Show resolved Hide resolved
rbc/nrt.py Outdated
# MemInfo struct
MemInfo_t = ir.global_context.get_identified_type('Struct.MemInfo')
# First argument ought to be "atomic_i64_t" instead of i64
MemInfo_t.set_body(i64, i8p, i8p, i8p, i64, i8p)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replace the first i64 by atomic_i64_t

rbc/nrt.py Outdated Show resolved Hide resolved
rbc/nrt.py Outdated Show resolved Hide resolved
rbc/nrt.py Outdated Show resolved Hide resolved
Comment on lines +325 to +329
with otherwise:
# there is a specific case where ptr is NULL and doesn't
# crash Numba
bb_else = builder.basic_block
data_null = NULL
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Document this odd corner case

rbc/tests/heavydb/test_nrt_list.py Show resolved Hide resolved
rbc/tests/heavydb/test_nrt_string.py Show resolved Hide resolved
rbc/unicodetype_db.ll Show resolved Hide resolved
rbc/nrt.py Outdated Show resolved Hide resolved
Copy link
Contributor

@brenocfg brenocfg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good, this is awesome stuff @guilhermeleobas!

I have a few questions/comments, mostly related to the API of the NRT functions. Will wait for your feedback on those before giving explicit approval.

Comment on lines +181 to +186
def ctor(lst):
sz = len(lst)
arr = Array(sz, dtype)
for i in range(sz):
arr[i] = lst[i]
return arr
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Depending on how complicated the IR becomes, there's a chance that LLVM will optimize it on its own with -memcpyopt: https://llvm.org/docs/Passes.html#memcpyopt-memcpy-optimization

rbc/heavydb/text_encoding_none.py Outdated Show resolved Hide resolved
rbc/heavydb/text_encoding_none.py Show resolved Hide resolved
rbc/heavydb/text_encoding_none.py Show resolved Hide resolved
rbc/irtools.py Outdated Show resolved Hide resolved
rbc/nrt.py Show resolved Hide resolved
rbc/nrt.py Outdated
Comment on lines 404 to 414
def define_NRT_Reallocate(self, builder, args):
with self.nrt_debug_ctx():
# Because the previously reallocated memory might be freed by
# realloc, RBC cannot call it in this function. Therefore, instead
# of reallocation, RBC allocates a new block of memory and returns
# it to the caller. The caller is responsible for copying memory
# from the old buffer into the most recent allocated one."
[ptr, size] = args
new_ptr = builder.call(self.NRT_Allocate_External, [size, NULL])
self.NRT_Debug("ptr=%p size=%zu -> new_ptr=%p", ptr, size, new_ptr)
builder.ret(new_ptr)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any reason why realloc can't just be a wrapper around malloc + memcpy then? That would remove the weirdness about the caller being responsible for copying, which isn't the case for realloc. I think it's preferable to keep to the original function's API as much as possible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For that to work, RBC would need to know somehow the size of the previous allocated buffer, and this info is not available in NRT_Reallocate.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh that makes sense.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use malloc_usable_size for determining the size of the malloc-allocated buffer. It would be Linux specific function though, see https://stackoverflow.com/questions/1281686/determine-size-of-dynamically-allocated-memory-in-c for Windows and Mac equivalents. Recall, heavydb can be built on Linux as well as on Windows.

It would be preferable to keep the original realloc API. Could you create an issue about this and emit an (only-once) warning in define_NRT_Reallocate call about the current behavior with a reference to the issue?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See also NRT_Reallocate_NoCopy suggestion below. This would allow postponing resolving the realloc API issue.

@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@guilhermeleobas
Copy link
Contributor Author

Current CI failures are not related to this PR.

@guilhermeleobas
Copy link
Contributor Author

Pinging @mflaxman10 for visibility

Copy link
Contributor

@brenocfg brenocfg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Copy link
Contributor

@pearu pearu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have a number of nits and questions.

Otherwise, it is an amazing PR, thanks, @guilhermeleobas!

rbc/heavydb/allocator.py Show resolved Hide resolved
rbc/heavydb/text_encoding_none.py Show resolved Hide resolved
rbc/heavydb/text_encoding_none.py Show resolved Hide resolved
rbc/irtools.py Outdated Show resolved Hide resolved
rbc/irtools.py Outdated Show resolved Hide resolved
rbc/nrt.py Outdated
Comment on lines 404 to 414
def define_NRT_Reallocate(self, builder, args):
with self.nrt_debug_ctx():
# Because the previously reallocated memory might be freed by
# realloc, RBC cannot call it in this function. Therefore, instead
# of reallocation, RBC allocates a new block of memory and returns
# it to the caller. The caller is responsible for copying memory
# from the old buffer into the most recent allocated one."
[ptr, size] = args
new_ptr = builder.call(self.NRT_Allocate_External, [size, NULL])
self.NRT_Debug("ptr=%p size=%zu -> new_ptr=%p", ptr, size, new_ptr)
builder.ret(new_ptr)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See also NRT_Reallocate_NoCopy suggestion below. This would allow postponing resolving the realloc API issue.

rbc/nrt.py Show resolved Hide resolved
rbc/tests/heavydb/test_nrt_list.py Outdated Show resolved Hide resolved
# [X] set.remove
# [X] set.symmetric_difference
# [X] set.symmetric_difference_update
# [ ] set.union
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What blocks supporting set.union?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't seem to work in Numba:

self = <numba.core.typeinfer.TypeInferer object at 0x7ff2e5965f30>, raise_errors = True

    def propagate(self, raise_errors=True):
        newtoken = self.get_state_token()
        oldtoken = None
        # Since the number of types are finite, the typesets will eventually
        # stop growing.

        while newtoken != oldtoken:
            self.debug.propagate_started()
            oldtoken = newtoken
            # Errors can appear when the type set is incomplete; only
            # raise them when there is no progress anymore.
            errors = self.constraints.propagate(self)
            newtoken = self.get_state_token()
            self.debug.propagate_finished()
        if errors:
            if raise_errors:
                force_lit_args = [e for e in errors
                                  if isinstance(e, ForceLiteralArg)]
                if not force_lit_args:
>                   raise errors[0]
E                   numba.core.errors.TypingError: Failed in nopython mode pipeline (step: nopython frontend)
E                   Invalid use of BoundFunction(set.union for set(int64)) with parameters (set(Literal[int](0)))
E
E                   During: resolving callee type: BoundFunction(set.union for set(int64))
E                   During: typing of call at /home/guilhermeleobas/git/rbc/rbc/tests/heavydb/test_nrt_set.py (84)
E
E
E                   File "rbc/tests/heavydb/test_nrt_set.py", line 84:
E                       def test_set(t, method):
E                           <source elided>
E                           elif method_str == 'union':
E                               return len(s.union({0})) == 6
E                               ^

../../miniconda3/envs/rbc/lib/python3.10/site-packages/numba/core/typeinfer.py:1086: TypingError

rbc/unicodetype_db.h Outdated Show resolved Hide resolved
Copy link
Contributor

@pearu pearu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks, @guilhermeleobas!

@pearu pearu merged commit f816e23 into heavyai:main Mar 23, 2023
@guilhermeleobas guilhermeleobas deleted the guilhermeleobas/nrt branch March 23, 2023 13:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants