Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make sure that unoptimised functions calls ONLY unoptimised functions. #216

Merged

Conversation

Pavel-Durov
Copy link

@Pavel-Durov Pavel-Durov commented Nov 24, 2024

Ensure that calls in original functions are referencing only original functions.

I encountered this issue when I was building yklua with latest yk YKB_TRACER=swt.
The main issue here is with serialization in YkIRWriter, where we don't serialize cloned functions but we still might end up with original functions calling cloned functions.

Lua build error:

Function not found in function index map: __yk_clone_luaO_tostring
LLVM ERROR: Function not found in function index map

source

* - Function `f` calls function `g`.
* - Function `g` is cloned as `__yk_clone_g`.
* - Function `f` is not cloned because its address is taken.
* - As a result, function `f` calls `__yk_clone_g` instead of `g`.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC the idea here is "we need to be able to trace f, but its address has taken, so we will always call f with tracing calls inserted"?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More like:
We need to be able to trace f, but its address has taken.
We need to make sure that any functions called from f are also non-cloned functions (functions that we can trace).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated tests 👉 0f3e89c

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a result, function f calls __yk_clone_g instead of g.

Is that what's already happening or what this function does? I feel like there's another bullet point missing.

Also, please remind me which functions have the yk_record_block calls, the original or the cloned ones?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that what's already happening or what this function does? I feel like there's another bullet point missing.

The comment example scenario is about what's currently happening (before this function in this PR)
I can update it to make it clearer though.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a result, function f calls __yk_clone_g instead of g.

If f isn't cloned, why is it calling __yk_clone_g. We must have altered f somewhere. So it appears, we first alter f to call _yk_clone_g instead of g and then in this function we revert this change. Is that correct? If so, why don't we only update function calls to their cloned version if the caller function is also cloned?

Copy link
Author

@Pavel-Durov Pavel-Durov Nov 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think what's happening here, is that we rename all functions in the cloned module before linking.
But then some of them are not renamed cause their address is taken, while sill referencing functions that were renamed.

let's say we have:

func a { }

// function which address is taken
func b { 
  a();
}

After calling renameFunctions on a cloned module we have:

func __yk_clone_a { }

// let b a function which address is taken
func b { 
  __yk_clone_a();
}

function a was renamed but function b wasn't (because address is taken).

And then when we link both modules,b from cloned module overrides the original b definition.

Example of actual module IR generated (before this PR):

; Function Attrs: noinline nounwind optnone uwtable
define internal i32 @luaB_assert(ptr noundef %0) #0 {
  %2 = getelementptr inbounds %struct.lua_State, ptr %0, i64 0, i32 8
  %3 = load ptr, ptr %2, align 8, !tbaa !27
  %4 = load ptr, ptr %3, align 8, !tbaa !22
  %5 = getelementptr inbounds %union.StackValue, ptr %4, i64 1
  %6 = getelementptr inbounds %struct.lua_State, ptr %0, i64 0, i32 6
  %7 = load ptr, ptr %6, align 8, !tbaa !22
  %8 = icmp ult ptr %5, %7
  br i1 %8, label %13, label %9

9:                                                ; preds = %1
  %10 = getelementptr inbounds %struct.lua_State, ptr %0, i64 0, i32 7
  %11 = load ptr, ptr %10, align 8, !tbaa !10
  %12 = getelementptr inbounds %struct.global_State, ptr %11, i64 0, i32 8
  br label %13

13:                                               ; preds = %9, %1
  %14 = phi ptr [ %12, %9 ], [ %5, %1 ]
  %15 = getelementptr inbounds %struct.TValue, ptr %14, i64 0, i32 1
  %16 = load i8, ptr %15, align 8, !tbaa !23
  %17 = icmp eq i8 %16, 1
  %18 = and i8 %16, 15
  %19 = icmp eq i8 %18, 0
  %20 = or i1 %17, %19
  br i1 %20, label %21, label %24, !prof !5

21:                                               ; preds = %13
  tail call void @__yk_clone_luaL_checkany(ptr noundef %0, i32 noundef 1) #77
  tail call void @__yk_clone_lua_rotate(ptr noundef %0, i32 noundef 1, i32 noundef -1) #77
  tail call void @__yk_clone_lua_settop(ptr noundef %0, i32 noundef -2) #77
  %22 = tail call ptr @__yk_clone_lua_pushstring(ptr noundef %0, ptr noundef nonnull @.str.26.676) #77
  tail call void @__yk_clone_lua_settop(ptr noundef %0, i32 noundef 1) #77
  %23 = tail call i32 @luaB_error(ptr noundef %0)
  unreachable

Copy link
Author

@Pavel-Durov Pavel-Durov Nov 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or maybe the question is how these call sites in the function that is not cloned got renamed in the first place?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like F.setName is actually updating the callsites as well as the function name.
I can see:
Before F.setName is called for luaL_checkany function:

21:                                               ; preds = %13
  tail call void @luaL_checkany(ptr noundef %0, i32 noundef 1) #77
  ...
  unreachable

And just after F.setName is called for luaL_checkany function:

21:                                               ; preds = %13
  tail call void @__yk_clone_luaL_checkany(ptr noundef %0, i32 noundef 1) #77
  ....
  unreachable

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that what's already happening or what this function does? I feel like there's another bullet point missing.

Updated comment for clarity 👉 3462947

@Pavel-Durov
Copy link
Author

Added test in 👉 ykjit/yk#1477

@Pavel-Durov Pavel-Durov reopened this Nov 25, 2024
* - As a result, function `f` calls `__yk_clone_g` instead of `g`.
*
* **Reasoning:**
* In `YkIRWriter` we only serialise non-cloned functions. We want
Copy link

@ptersilie ptersilie Nov 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAICS this isn't really related to the serialiser. The real reason we want unoptimised functions to only call other unoptimised functions is because otherwise block recording won't work. In other words, if f calls g_opt, then f will record the blocks, but g_opt won't, we get incorrect traces.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

youre right.

@ptersilie
Copy link

ptersilie commented Nov 26, 2024

Okay, I had to remind myself how this all works, but I think I get it now. Please let me know if I said anything wrong. The way we make the optimised module is:

  1. We use LLVM's CloneModule API to create a copy of the current module.
  2. We then iterate over all functions in the cloned module and rename them, unless they have their address taken.
  3. Annoyingly, this also updates the callees of unoptimised functions, which means we end up with unoptimised functions calling optimised functions. Which is bad, because it messes with the block recording.
  4. So we need to go back and reverse the callees inside unoptimised functions to only call unoptimised functions.
  5. Finally, we link the two modules together to get one module.

Question:
Was there a reason why we use LLVM's CloneModule? From what I can see, we are only really interested in cloning some functions to make optimised versions of them. Wouldn't it be easier to do the following:

  • Iterate over the original module
  • Clone and rename (in one step) each function that doesn't have it's address taken, and add it to the same module.
  • Built up a VMap for the clones.
  • Once all functions have been cloned, iterate over all cloned functions and update the callees using the vmap.

This seems a bit more straight forward then what we are doing now, where things get updated that we don't want to and then we have to get back and fix it, due to the way CloneModule works. But there might be a valid reason we are using CloneModule that I have forgotten.

Separate suggestion (for a separate PR): I keep forgetting what the cloned and uncloned version of a function is. We should use unoptimised and optimised instead. I.e. the unoptimised function is the original (containing the yk_record_block calls). After cloning we get an optimised version without those calls. We should also rename the prefix to reflect that, maybe something like __yk_opt_foo.

@Pavel-Durov
Copy link
Author

Okay, I had to remind myself how this all works, but I think I get it now. Please let me know if I said anything wrong. The way we make the optimised module is:

1. We use LLVM's `CloneModule` API to create a copy of the current module.

2. We then iterate over all functions in the cloned module and rename them, unless they have their address taken.

3. Annoyingly, this also updates the callees of unoptimised functions, which means we end up with unoptimised functions calling optimised functions. Which is bad, because it messes with the block recording.

4. So we need to go back and reverse the callees inside unoptimised functions to only call unoptimised functions.

5. Finally, we link the two modules together to get one module.

Question: Was there a reason why we use LLVM's CloneModule? From what I can see, we are only really interested in cloning some functions to make optimised versions of them. Wouldn't it be easier to do the following:

* Iterate over the original module

* Clone and rename (in one step) each function that doesn't have it's address taken, and add it to the same module.

* Built up a VMap for the clones.

* Once all functions have been cloned, iterate over all cloned functions and update the callees using the vmap.

This seems a bit more straight forward then what we are doing now, where things get updated that we don't want to and then we have to get back and fix it, due to the way CloneModule works. But there might be a valid reason we are using CloneModule that I have forgotten.

Separate suggestion (for a separate PR): I keep forgetting what the cloned and uncloned version of a function is. We should use unoptimised and optimised instead. I.e. the unoptimised function is the original (containing the yk_record_block calls). After cloning we get an optimised version without those calls. We should also rename the prefix to reflect that, maybe something like __yk_opt_foo.

Addressed here 👉 12fc9d7

for (const Argument &OrigArg : F.args()) {
DestArgIt->setName(OrigArg.getName());
VMap[&OrigArg] = &*DestArgIt++;
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this needed? From what I can see llvm::CloneFunction already copies the arguments. And the VMap isn't actually read anywhere here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok will remove.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed 👉 4563f62f6b46eca4d91cd71192c4a32356690560

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see the commit.

@ptersilie
Copy link

Aha, this looks good. I actually wasn't sure if there was a reason we did things the other way (and was half expecting you to tell me why my approach won't work). But I'm glad this works as it's a bit easier to think about and we don't need to go back and fix things up as with the earlier approach.

Do we need to update the PR title now since our approach has changed, e.g. something like "Avoid updating unoptimised functions with optimised callees by processing optimised functions in isolation."?

@Pavel-Durov
Copy link
Author

Aha, this looks good. I actually wasn't sure if there was a reason we did things the other way (and was half expecting you to tell me why my approach won't work). But I'm glad this works as it's a bit easier to think about and we don't need to go back and fix things up as with the earlier approach.

Do we need to update the PR title now since our approach has changed, e.g. something like "Avoid updating unoptimised functions with optimised callees by processing optimised functions in isolation."?

I think we went with cloning approach just because that's what we initially drawn on the whiteboard :)

@Pavel-Durov Pavel-Durov changed the title Update function call references in YkModuleClone. Make sure that unoptiised functions calls only unoptiised functions. Dec 2, 2024
@Pavel-Durov Pavel-Durov changed the title Make sure that unoptiised functions calls only unoptiised functions. Make sure that unoptimised functions calls ONLY unoptimised functions. Dec 2, 2024
@Pavel-Durov
Copy link
Author

ready for review :)
@ptersilie

@ptersilie
Copy link

Please squash.

SWT.

Switch from module linking to function cloning for simplicity and
changed the function cloning prefix from `__yk_clone_` to
`__yk_opt_` to better reflect the optimization intent in ModuleClone
pass.
@Pavel-Durov Pavel-Durov force-pushed the fix-function-reference-yk-module-clone-pass branch from 94a1f50 to 453f961 Compare December 6, 2024 17:59
@Pavel-Durov
Copy link
Author

Squashed 👉 453f961

@ltratt ltratt added this pull request to the merge queue Dec 7, 2024
Merged via the queue into ykjit:main with commit b5ce1fa Dec 7, 2024
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants