jl_create_native generates bad code for *(Float16,Float16) #34993
Comments
This is generally what it means to be non-recursive in codegen. Now:
OK, I thought:
I've used #33955 for that kind of thing in experiments, which I suspect may be the correct solution eventually, but obviously it would be nice to have a workaround for now.
It's one thing to emit placeholder functions that will later be replaced with (a call to) the final function; it's another to have these placeholders allocate GC frames and perform non-specsig calls. There also seems to be some caching happening:
Why is there an additional pointer argument here? Anyway, after calling this function on the CPU,
Calling the MWE without CUDAnative:

```julia
using Core.Compiler: MethodInstance
using Base: _methods_by_ftype

function codegen(f, tt)
    # get the method instance
    world = typemax(UInt)
    sig = Base.signature_type(f, tt)
    mthds = _methods_by_ftype(sig, -1, world)
    Base.isdispatchtuple(tt) || return :(error("$tt is not a dispatch tuple"))
    length(mthds) == 1 || return :(throw(MethodError(f, tt)))
    mtypes, msp, m = mthds[1]
    method_instance = ccall(:jl_specializations_get_linfo, Ref{MethodInstance},
                            (Any, Any, Any), m, mtypes, msp)

    # generate IR
    params = Base.CodegenParams()
    native_code = ccall(:jl_create_native, Ptr{Cvoid},
                        (Vector{Core.MethodInstance}, Base.CodegenParams),
                        [method_instance], params)
    @assert native_code != C_NULL
    llvm_mod = ccall(:jl_get_llvm_module, Ptr{Cvoid},
                     (Ptr{Cvoid},), native_code)
    @assert llvm_mod != C_NULL

    # get the top-level code
    code = Core.Compiler.inf_for_methodinstance(method_instance, world, world)

    # get the top-level function index
    llvm_func_idx = Ref{Int32}(-1)
    llvm_specfunc_idx = Ref{Int32}(-1)
    ccall(:jl_breakpoint, Nothing, ())
    ccall(:jl_get_function_id, Nothing,
          (Ptr{Cvoid}, Any, Ptr{Int32}, Ptr{Int32}),
          native_code, code, llvm_func_idx, llvm_specfunc_idx)
    @assert llvm_func_idx[] != -1
    @assert llvm_specfunc_idx[] != -1

    # get the top-level function
    llvm_func = ccall(:jl_get_llvm_function, Ptr{Cvoid},
                      (Ptr{Cvoid}, UInt32), native_code, llvm_func_idx[] - 1)
    llvm_specfunc = ccall(:jl_get_llvm_function, Ptr{Cvoid},
                          (Ptr{Cvoid}, UInt32), native_code, llvm_specfunc_idx[] - 1)
    @assert llvm_specfunc != C_NULL

    # dump IR
    ccall(:jl_dump_function_ir, Ref{String},
          (Ptr{Cvoid}, Bool, Bool, Ptr{UInt8}),
          llvm_specfunc, true, true, :none)
end
```
The codegen restructuring has regressed some GPU code: where we used to get static code, we now get invokes and calls to jfptr functions:
I'm using CUDAnative.code_llvm here, which calls jl_create_native, because InteractiveUtils.code_llvm only shows the IR of the outer function even when dumping the entire module. That may be a red herring, though, since generating code for the constructor itself yields the expected IR (see below).
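For comparison, the stock reflection call looks like the sketch below. It uses the documented InteractiveUtils API; `dump_module=true` prints the entire LLVM module rather than just the function body, but the IR still comes from the standard codegen path rather than jl_create_native, which is why it can hide the problem:

```julia
using InteractiveUtils

# Print the LLVM IR for Float16 multiplication. With dump_module=true the
# whole module is emitted, not just the outer function's body.
code_llvm(stdout, *, Tuple{Float16, Float16}; dump_module=true)
```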
Before the refactor:
Generating code for the constructor directly:
Ref #25984, JuliaGPU/CUDAnative.jl#162 (comment)