From 9614d8629bb1355fc3aaffdcd1e55362e06f93e4 Mon Sep 17 00:00:00 2001 From: Nathan Daly Date: Sun, 30 Jul 2023 22:27:11 -0400 Subject: [PATCH] Allocation Profiler: Types for all allocations (#50337) Pass the types to the allocator functions. ------- Before this PR, we were missing the types for allocations in two cases: 1. allocations from codegen 2. allocations in `gc_managed_realloc_` The second one is easy: those are always used for buffers, right? For the first one: we extend the allocation functions called from codegen, to take the type as a parameter, and set the tag there. I kept the old interfaces around, since I think that they cannot be removed due to supporting legacy code? ------ An example of the generated code: ```julia %ptls_field6 = getelementptr inbounds {}**, {}*** %4, i64 2 %13 = bitcast {}*** %ptls_field6 to i8** %ptls_load78 = load i8*, i8** %13, align 8 %box = call noalias nonnull dereferenceable(32) {}* @ijl_gc_pool_alloc_typed(i8* %ptls_load78, i32 1184, i32 32, i64 4366152144) #7 ``` Fixes #43688. Fixes #45268. Co-authored-by: Valentin Churavy --- doc/src/manual/profile.md | 90 +++++++++++++++++++-- src/gc-alloc-profiler.h | 1 + src/gc.c | 19 ++++- src/jl_exported_funcs.inc | 2 + src/llvm-final-gc-lowering.cpp | 9 ++- src/llvm-late-gc-lowering.cpp | 46 +++++++---- src/llvm-pass-helpers.cpp | 14 ++-- stdlib/Profile/test/allocs.jl | 31 +++++++ test/llvmpasses/alloc-opt-gcframe.ll | 16 ++-- test/llvmpasses/final-lower-gc.ll | 14 ++-- test/llvmpasses/late-lower-gc-addrspaces.ll | 8 +- test/llvmpasses/late-lower-gc.ll | 8 +- 12 files changed, 199 insertions(+), 59 deletions(-) diff --git a/doc/src/manual/profile.md b/doc/src/manual/profile.md index e5f1d6c417fa60..6e9f98d3f17806 100644 --- a/doc/src/manual/profile.md +++ b/doc/src/manual/profile.md @@ -338,15 +338,91 @@ argument can be passed to speed it up by making it skip some allocations. Passing `sample_rate=1.0` will make it record everything (which is slow); `sample_rate=0.1` will record only 10% of the allocations (faster), etc. -!!! note +!!! compat "Julia 1.11" + + Older versions of Julia could not capture types in all cases. In older versions of + Julia, if you see an allocation of type `Profile.Allocs.UnknownType`, it means that + the profiler doesn't know what type of object was allocated. This mainly happened when + the allocation was coming from generated code produced by the compiler. See + [issue #43688](https://github.com/JuliaLang/julia/issues/43688) for more info. + + Since Julia 1.11, all allocations should have a type reported. + +For more details on how to use this tool, please see the following talk from JuliaCon 2022: +https://www.youtube.com/watch?v=BFvpwC8hEWQ + +##### Allocation Profiler Example - The current implementation of the Allocations Profiler _does not - capture types for all allocations._ Allocations for which the profiler - could not capture the type are represented as having type - `Profile.Allocs.UnknownType`. +In this simple example, we use PProf to visualize the alloc profile. You could use another +visualization tool instead. We collect the profile (specifying a sample rate), then we visualize it. +```julia +using Profile, PProf +Profile.Allocs.clear() +Profile.Allocs.@profile sample_rate=0.0001 my_function() +PProf.Allocs.pprof() +``` + +Here is a more in-depth example, showing how we can tune the sample rate. A +good number of samples to aim for is around 1 - 10 thousand. Too many, and the +profile visualizer can get overwhelmed, and profiling will be slow. Too few, +and you don't have a representative sample. - You can read more about the missing types and the plan to improve this, here: - [issue #43688](https://github.com/JuliaLang/julia/issues/43688). + +```julia-repl +julia> import Profile + +julia> @time my_function() # Estimate allocations from a (second-run) of the function + 0.110018 seconds (1.50 M allocations: 58.725 MiB, 17.17% gc time) +500000 + +julia> Profile.Allocs.clear() + +julia> Profile.Allocs.@profile sample_rate=0.001 begin # 1.5 M * 0.001 = ~1.5K allocs. + my_function() + end +500000 + +julia> prof = Profile.Allocs.fetch(); # If you want, you can also manually inspect the results. + +julia> length(prof.allocs) # Confirm we have expected number of allocations. +1515 + +julia> using PProf # Now, visualize with an external tool, like PProf or ProfileCanvas. + +julia> PProf.Allocs.pprof(prof; from_c=false) # You can optionally pass in a previously fetched profile result. +Analyzing 1515 allocation samples... 100%|████████████████████████████████| Time: 0:00:00 +Main binary filename not available. +Serving web UI on http://localhost:62261 +"alloc-profile.pb.gz" +``` +Then you can view the profile by navigating to http://localhost:62261, and the profile is saved to disk. +See PProf package for more options. + +##### Allocation Profiling Tips + +As stated above, aim for around 1-10 thousand samples in your profile. + +Note that we are uniformly sampling in the space of _all allocations_, and are not weighting +our samples by the size of the allocation. So a given allocation profile may not give a +representative profile of where most bytes are allocated in your program, unless you had set +`sample_rate=1`. + +Allocations can come from users directly constructing objects, but can also come from inside +the runtime or be inserted into compiled code to handle type instability. Looking at the +"source code" view can be helpful to isolate them, and then other external tools such as +[`Cthulhu.jl`](https://github.com/JuliaDebug/Cthulhu.jl) can be useful for identifying the +cause of the allocation. + +##### Allocation Profile Visualization Tools + +There are several profiling visualization tools now that can all display Allocation +Profiles. Here is a small list of some of the main ones we know about: +- [PProf.jl](https://github.com/JuliaPerf/PProf.jl) +- [ProfileCanvas.jl](https://github.com/pfitzseb/ProfileCanvas.jl) +- VSCode's built-in profile visualizer (`@profview_allocs`) [docs needed] +- Viewing the results directly in the REPL + - You can inspect the results in the REPL via [`Profile.Allocs.fetch()`](@ref), to view + the stacktrace and type of each allocation. #### Line-by-Line Allocation Tracking diff --git a/src/gc-alloc-profiler.h b/src/gc-alloc-profiler.h index 3fd8bf4388a0a8..fcd8e45caa2d8f 100644 --- a/src/gc-alloc-profiler.h +++ b/src/gc-alloc-profiler.h @@ -35,6 +35,7 @@ void _maybe_record_alloc_to_profile(jl_value_t *val, size_t size, jl_datatype_t extern int g_alloc_profile_enabled; +// This should only be used from _deprecated_ code paths. We shouldn't see UNKNOWN anymore. #define jl_gc_unknown_type_tag ((jl_datatype_t*)0xdeadaa03) static inline void maybe_record_alloc_to_profile(jl_value_t *val, size_t size, jl_datatype_t *typ) JL_NOTSAFEPOINT { diff --git a/src/gc.c b/src/gc.c index 58b59bb1d1d33a..a5b4998a370a4c 100644 --- a/src/gc.c +++ b/src/gc.c @@ -1030,13 +1030,20 @@ STATIC_INLINE jl_value_t *jl_gc_big_alloc_inner(jl_ptls_t ptls, size_t sz) return jl_valueof(&v->header); } -// Instrumented version of jl_gc_big_alloc_inner, called into by LLVM-generated code. +// Deprecated version, supported for legacy code. JL_DLLEXPORT jl_value_t *jl_gc_big_alloc(jl_ptls_t ptls, size_t sz) { jl_value_t *val = jl_gc_big_alloc_inner(ptls, sz); maybe_record_alloc_to_profile(val, sz, jl_gc_unknown_type_tag); return val; } +// Instrumented version of jl_gc_big_alloc_inner, called into by LLVM-generated code. +JL_DLLEXPORT jl_value_t *jl_gc_big_alloc_instrumented(jl_ptls_t ptls, size_t sz, jl_value_t *type) +{ + jl_value_t *val = jl_gc_big_alloc_inner(ptls, sz); + maybe_record_alloc_to_profile(val, sz, (jl_datatype_t*)type); + return val; +} // This wrapper exists only to prevent `jl_gc_big_alloc_inner` from being inlined into // its callers. We provide an external-facing interface for callers, and inline `jl_gc_big_alloc_inner` @@ -1334,7 +1341,7 @@ STATIC_INLINE jl_value_t *jl_gc_pool_alloc_inner(jl_ptls_t ptls, int pool_offset return jl_valueof(v); } -// Instrumented version of jl_gc_pool_alloc_inner, called into by LLVM-generated code. +// Deprecated version, supported for legacy code. JL_DLLEXPORT jl_value_t *jl_gc_pool_alloc(jl_ptls_t ptls, int pool_offset, int osize) { @@ -1342,6 +1349,14 @@ JL_DLLEXPORT jl_value_t *jl_gc_pool_alloc(jl_ptls_t ptls, int pool_offset, maybe_record_alloc_to_profile(val, osize, jl_gc_unknown_type_tag); return val; } +// Instrumented version of jl_gc_pool_alloc_inner, called into by LLVM-generated code. +JL_DLLEXPORT jl_value_t *jl_gc_pool_alloc_instrumented(jl_ptls_t ptls, int pool_offset, + int osize, jl_value_t* type) +{ + jl_value_t *val = jl_gc_pool_alloc_inner(ptls, pool_offset, osize); + maybe_record_alloc_to_profile(val, osize, (jl_datatype_t*)type); + return val; +} // This wrapper exists only to prevent `jl_gc_pool_alloc_inner` from being inlined into // its callers. We provide an external-facing interface for callers, and inline `jl_gc_pool_alloc_inner` diff --git a/src/jl_exported_funcs.inc b/src/jl_exported_funcs.inc index ee255a0e1a8761..40ad295b30e587 100644 --- a/src/jl_exported_funcs.inc +++ b/src/jl_exported_funcs.inc @@ -158,6 +158,7 @@ XX(jl_gc_alloc_3w) \ XX(jl_gc_alloc_typed) \ XX(jl_gc_big_alloc) \ + XX(jl_gc_big_alloc_instrumented) \ XX(jl_gc_collect) \ XX(jl_gc_conservative_gc_support_enabled) \ XX(jl_gc_counted_calloc) \ @@ -185,6 +186,7 @@ XX(jl_gc_new_weakref_th) \ XX(jl_gc_num) \ XX(jl_gc_pool_alloc) \ + XX(jl_gc_pool_alloc_instrumented) \ XX(jl_gc_queue_multiroot) \ XX(jl_gc_queue_root) \ XX(jl_gc_safepoint) \ diff --git a/src/llvm-final-gc-lowering.cpp b/src/llvm-final-gc-lowering.cpp index e31bcb21199f5b..5defee9bd80bed 100644 --- a/src/llvm-final-gc-lowering.cpp +++ b/src/llvm-final-gc-lowering.cpp @@ -187,12 +187,13 @@ Value *FinalLowerGC::lowerSafepoint(CallInst *target, Function &F) Value *FinalLowerGC::lowerGCAllocBytes(CallInst *target, Function &F) { ++GCAllocBytesCount; - assert(target->arg_size() == 2); + assert(target->arg_size() == 3); CallInst *newI; IRBuilder<> builder(target); builder.SetCurrentDebugLocation(target->getDebugLoc()); auto ptls = target->getArgOperand(0); + auto type = target->getArgOperand(2); Attribute derefAttr; if (auto CI = dyn_cast(target->getArgOperand(1))) { @@ -203,19 +204,19 @@ Value *FinalLowerGC::lowerGCAllocBytes(CallInst *target, Function &F) if (offset < 0) { newI = builder.CreateCall( bigAllocFunc, - { ptls, ConstantInt::get(T_size, sz + sizeof(void*)) }); + { ptls, ConstantInt::get(T_size, sz + sizeof(void*)), type }); derefAttr = Attribute::getWithDereferenceableBytes(F.getContext(), sz + sizeof(void*)); } else { auto pool_offs = ConstantInt::get(Type::getInt32Ty(F.getContext()), offset); auto pool_osize = ConstantInt::get(Type::getInt32Ty(F.getContext()), osize); - newI = builder.CreateCall(poolAllocFunc, { ptls, pool_offs, pool_osize }); + newI = builder.CreateCall(poolAllocFunc, { ptls, pool_offs, pool_osize, type }); derefAttr = Attribute::getWithDereferenceableBytes(F.getContext(), osize); } } else { auto size = builder.CreateZExtOrTrunc(target->getArgOperand(1), T_size); size = builder.CreateAdd(size, ConstantInt::get(T_size, sizeof(void*))); - newI = builder.CreateCall(allocTypedFunc, { ptls, size, ConstantPointerNull::get(Type::getInt8PtrTy(F.getContext())) }); + newI = builder.CreateCall(allocTypedFunc, { ptls, size, type }); derefAttr = Attribute::getWithDereferenceableBytes(F.getContext(), sizeof(void*)); } newI->setAttributes(newI->getCalledFunction()->getAttributes()); diff --git a/src/llvm-late-gc-lowering.cpp b/src/llvm-late-gc-lowering.cpp index 63a71a9e180ac7..19dfb7c5a5be1a 100644 --- a/src/llvm-late-gc-lowering.cpp +++ b/src/llvm-late-gc-lowering.cpp @@ -2348,22 +2348,6 @@ bool LateLowerGCFrame::CleanupIR(Function &F, State *S, bool *CFGModified) { IRBuilder<> builder(CI); builder.SetCurrentDebugLocation(CI->getDebugLoc()); - // Create a call to the `julia.gc_alloc_bytes` intrinsic, which is like - // `julia.gc_alloc_obj` except it doesn't set the tag. - auto allocBytesIntrinsic = getOrDeclare(jl_intrinsics::GCAllocBytes); - auto ptlsLoad = get_current_ptls_from_task(builder, T_size, CI->getArgOperand(0), tbaa_gcframe); - auto ptls = builder.CreateBitCast(ptlsLoad, Type::getInt8PtrTy(builder.getContext())); - auto newI = builder.CreateCall( - allocBytesIntrinsic, - { - ptls, - builder.CreateIntCast( - CI->getArgOperand(1), - allocBytesIntrinsic->getFunctionType()->getParamType(1), - false) - }); - newI->takeName(CI); - // LLVM alignment/bit check is not happy about addrspacecast and refuse // to remove write barrier because of it. // We pretty much only load using `T_size` so try our best to strip @@ -2401,7 +2385,35 @@ bool LateLowerGCFrame::CleanupIR(Function &F, State *S, bool *CFGModified) { builder.CreateAlignmentAssumption(DL, tag, 16); } } - // Set the tag. + + // Create a call to the `julia.gc_alloc_bytes` intrinsic, which is like + // `julia.gc_alloc_obj` except it specializes the call based on the constant + // size of the object to allocate, to save one indirection, and doesn't set + // the type tag. (Note that if the size is not a constant, it will call + // gc_alloc_obj, and will redundantly set the tag.) + auto allocBytesIntrinsic = getOrDeclare(jl_intrinsics::GCAllocBytes); + auto ptlsLoad = get_current_ptls_from_task(builder, T_size, CI->getArgOperand(0), tbaa_gcframe); + auto ptls = builder.CreateBitCast(ptlsLoad, Type::getInt8PtrTy(builder.getContext())); + auto newI = builder.CreateCall( + allocBytesIntrinsic, + { + ptls, + builder.CreateIntCast( + CI->getArgOperand(1), + allocBytesIntrinsic->getFunctionType()->getParamType(1), + false), + builder.CreatePtrToInt(tag, T_size), + }); + newI->takeName(CI); + + // Now, finally, set the tag. We do this in IR instead of in the C alloc + // function, to provide possible optimization opportunities. (I think? TBH + // the most recent editor of this code is not entirely clear on why we + // prefer to set the tag in the generated code. Providing optimziation + // opportunities is the most likely reason; the tradeoff is slightly + // larger code size and increased compilation time, compiling this + // instruction at every allocation site, rather than once in the C alloc + // function.) auto &M = *builder.GetInsertBlock()->getModule(); StoreInst *store = builder.CreateAlignedStore( tag, EmitTagPtr(builder, tag_type, T_size, newI), M.getDataLayout().getPointerABIAlignment(0)); diff --git a/src/llvm-pass-helpers.cpp b/src/llvm-pass-helpers.cpp index b006f191937f54..39d37cee3928bd 100644 --- a/src/llvm-pass-helpers.cpp +++ b/src/llvm-pass-helpers.cpp @@ -151,7 +151,9 @@ namespace jl_intrinsics { auto intrinsic = Function::Create( FunctionType::get( T_prjlvalue, - { Type::getInt8PtrTy(ctx), T_size }, + { Type::getInt8PtrTy(ctx), + T_size, + T_size }, // type false), Function::ExternalLinkage, GC_ALLOC_BYTES_NAME); @@ -236,8 +238,8 @@ namespace jl_intrinsics { } namespace jl_well_known { - static const char *GC_BIG_ALLOC_NAME = XSTR(jl_gc_big_alloc); - static const char *GC_POOL_ALLOC_NAME = XSTR(jl_gc_pool_alloc); + static const char *GC_BIG_ALLOC_NAME = XSTR(jl_gc_big_alloc_instrumented); + static const char *GC_POOL_ALLOC_NAME = XSTR(jl_gc_pool_alloc_instrumented); static const char *GC_QUEUE_ROOT_NAME = XSTR(jl_gc_queue_root); static const char *GC_ALLOC_TYPED_NAME = XSTR(jl_gc_alloc_typed); @@ -251,7 +253,7 @@ namespace jl_well_known { auto bigAllocFunc = Function::Create( FunctionType::get( T_prjlvalue, - { Type::getInt8PtrTy(ctx), T_size }, + { Type::getInt8PtrTy(ctx), T_size , T_size}, false), Function::ExternalLinkage, GC_BIG_ALLOC_NAME); @@ -267,7 +269,7 @@ namespace jl_well_known { auto poolAllocFunc = Function::Create( FunctionType::get( T_prjlvalue, - { Type::getInt8PtrTy(ctx), Type::getInt32Ty(ctx), Type::getInt32Ty(ctx) }, + { Type::getInt8PtrTy(ctx), Type::getInt32Ty(ctx), Type::getInt32Ty(ctx), T_size }, false), Function::ExternalLinkage, GC_POOL_ALLOC_NAME); @@ -301,7 +303,7 @@ namespace jl_well_known { T_prjlvalue, { Type::getInt8PtrTy(ctx), T_size, - Type::getInt8PtrTy(ctx) }, + T_size }, // type false), Function::ExternalLinkage, GC_ALLOC_TYPED_NAME); diff --git a/stdlib/Profile/test/allocs.jl b/stdlib/Profile/test/allocs.jl index c2ec7d2f6cb542..ae0cbab945f01b 100644 --- a/stdlib/Profile/test/allocs.jl +++ b/stdlib/Profile/test/allocs.jl @@ -121,3 +121,34 @@ end @test length(prof.allocs) >= 1 @test length([a for a in prof.allocs if a.type == String]) >= 1 end + +@testset "alloc profiler catches allocs from codegen" begin + @eval begin + struct MyType x::Int; y::Int end + Base.:(+)(n::Number, x::MyType) = n + x.x + x.y + foo(a, x) = a[1] + x + wrapper(a) = foo(a, MyType(0,1)) + end + a = Any[1,2,3] + # warmup + wrapper(a) + + @eval Allocs.@profile sample_rate=1 wrapper($a) + + prof = Allocs.fetch() + Allocs.clear() + + @test length(prof.allocs) >= 1 + @test length([a for a in prof.allocs if a.type == MyType]) >= 1 +end + +@testset "alloc profiler catches allocs from buffer resize" begin + a = Int[] + Allocs.@profile sample_rate=1 for _ in 1:100; push!(a, 1); end + + prof = Allocs.fetch() + Allocs.clear() + + @test length(prof.allocs) >= 1 + @test length([a for a in prof.allocs if a.type == Profile.Allocs.BufferType]) >= 1 +end diff --git a/test/llvmpasses/alloc-opt-gcframe.ll b/test/llvmpasses/alloc-opt-gcframe.ll index a04d6566cec0a2..f600399ac2a7a4 100644 --- a/test/llvmpasses/alloc-opt-gcframe.ll +++ b/test/llvmpasses/alloc-opt-gcframe.ll @@ -14,17 +14,17 @@ target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128" ; CHECK-NOT: @julia.gc_alloc_obj ; TYPED: %current_task = getelementptr inbounds {}*, {}** %gcstack, i64 -12 -; TYPED-NEXT: [[ptls_field:%.*]] = getelementptr inbounds {}*, {}** %current_task, i64 16 +; TYPED: [[ptls_field:%.*]] = getelementptr inbounds {}*, {}** %current_task, i64 16 ; TYPED-NEXT: [[ptls_load:%.*]] = load {}*, {}** [[ptls_field]], align 8, !tbaa !0 ; TYPED-NEXT: [[ppjl_ptls:%.*]] = bitcast {}* [[ptls_load]] to {}** ; TYPED-NEXT: [[ptls_i8:%.*]] = bitcast {}** [[ppjl_ptls]] to i8* -; TYPED-NEXT: %v = call noalias nonnull dereferenceable({{[0-9]+}}) {} addrspace(10)* @ijl_gc_pool_alloc(i8* [[ptls_i8]], i32 [[SIZE_T:[0-9]+]], i32 16) +; TYPED-NEXT: %v = call noalias nonnull dereferenceable({{[0-9]+}}) {} addrspace(10)* @ijl_gc_pool_alloc_instrumented(i8* [[ptls_i8]], i32 [[SIZE_T:[0-9]+]], i32 16, i64 {{.*}} @tag {{.*}}) ; TYPED: store atomic {} addrspace(10)* @tag, {} addrspace(10)* addrspace(10)* {{.*}} unordered, align 8, !tbaa !4 ; OPAQUE: %current_task = getelementptr inbounds ptr, ptr %gcstack, i64 -12 -; OPAQUE-NEXT: [[ptls_field:%.*]] = getelementptr inbounds ptr, ptr %current_task, i64 16 +; OPAQUE: [[ptls_field:%.*]] = getelementptr inbounds ptr, ptr %current_task, i64 16 ; OPAQUE-NEXT: [[ptls_load:%.*]] = load ptr, ptr [[ptls_field]], align 8, !tbaa !0 -; OPAQUE-NEXT: %v = call noalias nonnull dereferenceable({{[0-9]+}}) ptr addrspace(10) @ijl_gc_pool_alloc(ptr [[ptls_load]], i32 [[SIZE_T:[0-9]+]], i32 16) +; OPAQUE-NEXT: %v = call noalias nonnull dereferenceable({{[0-9]+}}) ptr addrspace(10) @ijl_gc_pool_alloc_instrumented(ptr [[ptls_load]], i32 [[SIZE_T:[0-9]+]], i32 16, i64 {{.*}} @tag {{.*}}) ; OPAQUE: store atomic ptr addrspace(10) @tag, ptr addrspace(10) {{.*}} unordered, align 8, !tbaa !4 define {} addrspace(10)* @return_obj() { @@ -270,11 +270,11 @@ L3: } ; CHECK-LABEL: }{{$}} -; TYPED: declare noalias nonnull {} addrspace(10)* @ijl_gc_pool_alloc(i8*, -; TYPED: declare noalias nonnull {} addrspace(10)* @ijl_gc_big_alloc(i8*, +; TYPED: declare noalias nonnull {} addrspace(10)* @ijl_gc_pool_alloc_instrumented(i8*, +; TYPED: declare noalias nonnull {} addrspace(10)* @ijl_gc_big_alloc_instrumented(i8*, -; OPAQUE: declare noalias nonnull ptr addrspace(10) @ijl_gc_pool_alloc(ptr, -; OPAQUE: declare noalias nonnull ptr addrspace(10) @ijl_gc_big_alloc(ptr, +; OPAQUE: declare noalias nonnull ptr addrspace(10) @ijl_gc_pool_alloc_instrumented(ptr, +; OPAQUE: declare noalias nonnull ptr addrspace(10) @ijl_gc_big_alloc_instrumented(ptr, declare void @external_function() declare {}*** @julia.get_pgcstack() declare noalias nonnull {} addrspace(10)* @julia.gc_alloc_obj({}**, i64, {} addrspace(10)*) diff --git a/test/llvmpasses/final-lower-gc.ll b/test/llvmpasses/final-lower-gc.ll index 5bbaa2f4d81ea9..64d2c06e534b22 100644 --- a/test/llvmpasses/final-lower-gc.ll +++ b/test/llvmpasses/final-lower-gc.ll @@ -18,7 +18,7 @@ declare noalias nonnull {} addrspace(10)** @julia.new_gc_frame(i32) declare void @julia.push_gc_frame({} addrspace(10)**, i32) declare {} addrspace(10)** @julia.get_gc_frame_slot({} addrspace(10)**, i32) declare void @julia.pop_gc_frame({} addrspace(10)**) -declare noalias nonnull {} addrspace(10)* @julia.gc_alloc_bytes(i8*, i64) #0 +declare noalias nonnull {} addrspace(10)* @julia.gc_alloc_bytes(i8*, i64, i64) #0 attributes #0 = { allocsize(1) } @@ -80,9 +80,9 @@ top: %pgcstack = call {}*** @julia.get_pgcstack() %ptls = call {}*** @julia.ptls_states() %ptls_i8 = bitcast {}*** %ptls to i8* -; TYPED: %v = call noalias nonnull dereferenceable({{[0-9]+}}) {} addrspace(10)* @ijl_gc_pool_alloc -; OPAQUE: %v = call noalias nonnull dereferenceable({{[0-9]+}}) ptr addrspace(10) @ijl_gc_pool_alloc - %v = call {} addrspace(10)* @julia.gc_alloc_bytes(i8* %ptls_i8, i64 8) +; TYPED: %v = call noalias nonnull dereferenceable({{[0-9]+}}) {} addrspace(10)* @ijl_gc_pool_alloc_instrumented +; OPAQUE: %v = call noalias nonnull dereferenceable({{[0-9]+}}) ptr addrspace(10) @ijl_gc_pool_alloc_instrumented + %v = call {} addrspace(10)* @julia.gc_alloc_bytes(i8* %ptls_i8, i64 8, i64 12341234) %0 = bitcast {} addrspace(10)* %v to {} addrspace(10)* addrspace(10)* %1 = getelementptr {} addrspace(10)*, {} addrspace(10)* addrspace(10)* %0, i64 -1 store {} addrspace(10)* @tag, {} addrspace(10)* addrspace(10)* %1, align 8, !tbaa !0 @@ -96,9 +96,9 @@ top: %ptls = call {}*** @julia.ptls_states() %ptls_i8 = bitcast {}*** %ptls to i8* ; CHECK: %0 = add i64 %size, 8 -; TYPED: %v = call noalias nonnull dereferenceable(8) {} addrspace(10)* @ijl_gc_alloc_typed(i8* %ptls_i8, i64 %0, i8* null) -; OPAQUE: %v = call noalias nonnull dereferenceable(8) ptr addrspace(10) @ijl_gc_alloc_typed(ptr %ptls_i8, i64 %0, ptr null) - %v = call {} addrspace(10)* @julia.gc_alloc_bytes(i8* %ptls_i8, i64 %size) +; TYPED: %v = call noalias nonnull dereferenceable(8) {} addrspace(10)* @ijl_gc_alloc_typed(i8* %ptls_i8, i64 %0, i64 12341234) +; OPAQUE: %v = call noalias nonnull dereferenceable(8) ptr addrspace(10) @ijl_gc_alloc_typed(ptr %ptls_i8, i64 %0, i64 12341234) + %v = call {} addrspace(10)* @julia.gc_alloc_bytes(i8* %ptls_i8, i64 %size, i64 12341234) %0 = bitcast {} addrspace(10)* %v to {} addrspace(10)* addrspace(10)* %1 = getelementptr {} addrspace(10)*, {} addrspace(10)* addrspace(10)* %0, i64 -1 store {} addrspace(10)* @tag, {} addrspace(10)* addrspace(10)* %1, align 8, !tbaa !0 diff --git a/test/llvmpasses/late-lower-gc-addrspaces.ll b/test/llvmpasses/late-lower-gc-addrspaces.ll index 9849f432fb9a71..8021aab6c99a3f 100644 --- a/test/llvmpasses/late-lower-gc-addrspaces.ll +++ b/test/llvmpasses/late-lower-gc-addrspaces.ll @@ -69,7 +69,7 @@ top: ; TYPED-NEXT: [[ptls_load:%.*]] = load {}*, {}** [[ptls_field]], align 8, !tbaa !0 ; TYPED-NEXT: [[ppjl_ptls:%.*]] = bitcast {}* [[ptls_load]] to {}** ; TYPED-NEXT: [[ptls_i8:%.*]] = bitcast {}** [[ppjl_ptls]] to i8* -; TYPED-NEXT: %v = call {} addrspace(10)* @julia.gc_alloc_bytes(i8* [[ptls_i8]], [[SIZE_T:i.[0-9]+]] 8) +; TYPED-NEXT: %v = call {} addrspace(10)* @julia.gc_alloc_bytes(i8* [[ptls_i8]], [[SIZE_T:i.[0-9]+]] 8, i64 {{.*}} @tag {{.*}}) ; TYPED-NEXT: [[V2:%.*]] = bitcast {} addrspace(10)* %v to {} addrspace(10)* addrspace(10)* ; TYPED-NEXT: [[V_HEADROOM:%.*]] = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(10)* [[V2]], i64 -1 ; TYPED-NEXT: store atomic {} addrspace(10)* @tag, {} addrspace(10)* addrspace(10)* [[V_HEADROOM]] unordered, align 8, !tbaa !4 @@ -77,7 +77,7 @@ top: ; OPAQUE: %current_task = getelementptr inbounds ptr, ptr %0, i64 -12 ; OPAQUE-NEXT: [[ptls_field:%.*]] = getelementptr inbounds ptr, ptr %current_task, i64 16 ; OPAQUE-NEXT: [[ptls_load:%.*]] = load ptr, ptr [[ptls_field]], align 8, !tbaa !0 -; OPAQUE-NEXT: %v = call ptr addrspace(10) @julia.gc_alloc_bytes(ptr [[ptls_load]], [[SIZE_T:i.[0-9]+]] 8) +; OPAQUE-NEXT: %v = call ptr addrspace(10) @julia.gc_alloc_bytes(ptr [[ptls_load]], [[SIZE_T:i.[0-9]+]] 8, i64 {{.*}} @tag {{.*}}) ; OPAQUE-NEXT: [[V_HEADROOM:%.*]] = getelementptr inbounds ptr addrspace(10), ptr addrspace(10) %v, i64 -1 ; OPAQUE-NEXT: store atomic ptr addrspace(10) @tag, ptr addrspace(10) [[V_HEADROOM]] unordered, align 8, !tbaa !4 %v = call noalias {} addrspace(10)* @julia.gc_alloc_obj({}** %current_task, i64 8, {} addrspace(10)* @tag) @@ -102,7 +102,7 @@ top: ; TYPED-NEXT: [[ptls_load:%.*]] = load {}*, {}** [[ptls_field]], align 8, !tbaa !0 ; TYPED-NEXT: [[ppjl_ptls:%.*]] = bitcast {}* [[ptls_load]] to {}** ; TYPED-NEXT: [[ptls_i8:%.*]] = bitcast {}** [[ppjl_ptls]] to i8* -; TYPED-NEXT: %v = call {} addrspace(10)* @julia.gc_alloc_bytes(i8* [[ptls_i8]], [[SIZE_T:i.[0-9]+]] 8) +; TYPED-NEXT: %v = call {} addrspace(10)* @julia.gc_alloc_bytes(i8* [[ptls_i8]], [[SIZE_T:i.[0-9]+]] 8, i64 {{.*}} @tag {{.*}}) ; TYPED-NEXT: [[V2:%.*]] = bitcast {} addrspace(10)* %v to {} addrspace(10)* addrspace(10)* ; TYPED-NEXT: [[V_HEADROOM:%.*]] = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(10)* [[V2]], i64 -1 ; TYPED-NEXT: store atomic {} addrspace(10)* @tag, {} addrspace(10)* addrspace(10)* [[V_HEADROOM]] unordered, align 8, !tbaa !4 @@ -110,7 +110,7 @@ top: ; OPAQUE: %current_task = getelementptr inbounds ptr, ptr %0, i64 -12 ; OPAQUE-NEXT: [[ptls_field:%.*]] = getelementptr inbounds ptr, ptr %current_task, i64 16 ; OPAQUE-NEXT: [[ptls_load:%.*]] = load ptr, ptr [[ptls_field]], align 8, !tbaa !0 -; OPAQUE-NEXT: %v = call ptr addrspace(10) @julia.gc_alloc_bytes(ptr [[ptls_load]], [[SIZE_T:i.[0-9]+]] 8) +; OPAQUE-NEXT: %v = call ptr addrspace(10) @julia.gc_alloc_bytes(ptr [[ptls_load]], [[SIZE_T:i.[0-9]+]] 8, i64 {{.*}} @tag {{.*}}) ; OPAQUE-NEXT: [[V_HEADROOM:%.*]] = getelementptr inbounds ptr addrspace(10), ptr addrspace(10) %v, i64 -1 ; OPAQUE-NEXT: store atomic ptr addrspace(10) @tag, ptr addrspace(10) [[V_HEADROOM]] unordered, align 8, !tbaa !4 %v = call noalias {} addrspace(10)* @julia.gc_alloc_obj({}** %current_task, i64 8, {} addrspace(10)* @tag) diff --git a/test/llvmpasses/late-lower-gc.ll b/test/llvmpasses/late-lower-gc.ll index 36e581993c1760..26eec245c2de9b 100644 --- a/test/llvmpasses/late-lower-gc.ll +++ b/test/llvmpasses/late-lower-gc.ll @@ -66,7 +66,7 @@ top: ; TYPED-NEXT: [[ptls_load:%.*]] = load {}*, {}** [[ptls_field]], align 8, !tbaa !0 ; TYPED-NEXT: [[ppjl_ptls:%.*]] = bitcast {}* [[ptls_load]] to {}** ; TYPED-NEXT: [[ptls_i8:%.*]] = bitcast {}** [[ppjl_ptls]] to i8* -; TYPED-NEXT: %v = call {} addrspace(10)* @julia.gc_alloc_bytes(i8* [[ptls_i8]], [[SIZE_T:i.[0-9]+]] 8) +; TYPED-NEXT: %v = call {} addrspace(10)* @julia.gc_alloc_bytes(i8* [[ptls_i8]], [[SIZE_T:i.[0-9]+]] 8, i64 {{.*}} @tag {{.*}}) ; TYPED-NEXT: [[V2:%.*]] = bitcast {} addrspace(10)* %v to {} addrspace(10)* addrspace(10)* ; TYPED-NEXT: [[V_HEADROOM:%.*]] = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(10)* [[V2]], i64 -1 ; TYPED-NEXT: store atomic {} addrspace(10)* @tag, {} addrspace(10)* addrspace(10)* [[V_HEADROOM]] unordered, align 8, !tbaa !4 @@ -74,7 +74,7 @@ top: ; OPAQUE: %current_task = getelementptr inbounds ptr, ptr %0, i64 -12 ; OPAQUE-NEXT: [[ptls_field:%.*]] = getelementptr inbounds ptr, ptr %current_task, i64 16 ; OPAQUE-NEXT: [[ptls_load:%.*]] = load ptr, ptr [[ptls_field]], align 8, !tbaa !0 -; OPAQUE-NEXT: %v = call ptr addrspace(10) @julia.gc_alloc_bytes(ptr [[ptls_load]], [[SIZE_T:i.[0-9]+]] 8) +; OPAQUE-NEXT: %v = call ptr addrspace(10) @julia.gc_alloc_bytes(ptr [[ptls_load]], [[SIZE_T:i.[0-9]+]] 8, i64 {{.*}} @tag {{.*}}) ; OPAQUE-NEXT: [[V_HEADROOM:%.*]] = getelementptr inbounds ptr addrspace(10), ptr addrspace(10) %v, i64 -1 ; OPAQUE-NEXT: store atomic ptr addrspace(10) @tag, ptr addrspace(10) [[V_HEADROOM]] unordered, align 8, !tbaa !4 %v = call noalias {} addrspace(10)* @julia.gc_alloc_obj({}** %current_task, i64 8, {} addrspace(10)* @tag) @@ -99,7 +99,7 @@ top: ; TYPED-NEXT: [[ptls_load:%.*]] = load {}*, {}** [[ptls_field]], align 8, !tbaa !0 ; TYPED-NEXT: [[ppjl_ptls:%.*]] = bitcast {}* [[ptls_load]] to {}** ; TYPED-NEXT: [[ptls_i8:%.*]] = bitcast {}** [[ppjl_ptls]] to i8* -; TYPED-NEXT: %v = call {} addrspace(10)* @julia.gc_alloc_bytes(i8* [[ptls_i8]], [[SIZE_T:i.[0-9]+]] 8) +; TYPED-NEXT: %v = call {} addrspace(10)* @julia.gc_alloc_bytes(i8* [[ptls_i8]], [[SIZE_T:i.[0-9]+]] 8, i64 {{.*}} @tag {{.*}}) ; TYPED-NEXT: [[V2:%.*]] = bitcast {} addrspace(10)* %v to {} addrspace(10)* addrspace(10)* ; TYPED-NEXT: [[V_HEADROOM:%.*]] = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(10)* [[V2]], i64 -1 ; TYPED-NEXT: store atomic {} addrspace(10)* @tag, {} addrspace(10)* addrspace(10)* [[V_HEADROOM]] unordered, align 8, !tbaa !4 @@ -107,7 +107,7 @@ top: ; OPAQUE: %current_task = getelementptr inbounds ptr, ptr %0, i64 -12 ; OPAQUE-NEXT: [[ptls_field:%.*]] = getelementptr inbounds ptr, ptr %current_task, i64 16 ; OPAQUE-NEXT: [[ptls_load:%.*]] = load ptr, ptr [[ptls_field]], align 8, !tbaa !0 -; OPAQUE-NEXT: %v = call ptr addrspace(10) @julia.gc_alloc_bytes(ptr [[ptls_load]], [[SIZE_T:i.[0-9]+]] 8) +; OPAQUE-NEXT: %v = call ptr addrspace(10) @julia.gc_alloc_bytes(ptr [[ptls_load]], [[SIZE_T:i.[0-9]+]] 8, i64 {{.*}} @tag {{.*}}) ; OPAQUE-NEXT: [[V_HEADROOM:%.*]] = getelementptr inbounds ptr addrspace(10), ptr addrspace(10) %v, i64 -1 ; OPAQUE-NEXT: store atomic ptr addrspace(10) @tag, ptr addrspace(10) [[V_HEADROOM]] unordered, align 8, !tbaa !4 %v = call noalias {} addrspace(10)* @julia.gc_alloc_obj({}** %current_task, i64 8, {} addrspace(10)* @tag)