optixTrace #2

Closed
diwot opened this issue Dec 8, 2021 · 4 comments · Fixed by #8

diwot commented Dec 8, 2021

Hi,
this project looks extremely interesting. I was wondering whether everything required to ray trace a triangle mesh is available. I searched for optixTrace, which seems to be used for ray generation in the corresponding shader, and did not find any matches. I'm not an OptiX expert, so maybe there are other ray generation techniques. I was also wondering whether this project was just a one-time experiment or whether further development might happen. Last but not least, would it be possible to fully benefit from the hardware ray tracing acceleration that the latest GPUs provide?

@NullandKale
Contributor

As far as I know, optixTrace is not implemented yet.

Once completed, ILGPU.optix should run at the same performance level as the C++ OptiX library.

Owner

MoFtZ commented Dec 9, 2021

Hi @diwot, thanks for your interest in this project. I initially started this project to help the community, because the bindings for OptiX are much more complex than those for the other CUDA libraries. This in turn has led to the implementation of new features in the core ILGPU library in order to make OptiX support easier.

I am not an OptiX expert either, and have not actually written an OptiX program beyond the sample projects. Unfortunately, the OptiX API is particularly large, so if you find missing APIs (e.g. optixTrace) that are important, we can look at adding support.

Owner

MoFtZ commented Dec 9, 2021

OK, looks like we may need to make a change to the core ILGPU library to fully support this.

The optixTrace device function has several overloads. The largest overload expands to an ASM call with 44 arguments, 32 of which are passed by reference.
https://raytracing-docs.nvidia.com/optix7/api/group__optix__device__api.html#ga98210ff2d6a9287a7371facd99802500

ILGPU currently only supports one output argument, and a total of 10 arguments.

Author

diwot commented Jan 12, 2022

Thanks for all this information. I tried to get optixTrace running myself, but it's more complicated than I thought. To get started I implemented optixGetPayloadX, but the generated PTX puts constant method arguments directly into the call, whereas PTX seems to expect them in registers. A PTX file generated by a C++ CUDA project emitted

mov.u32 	%r2, 0;
// inline asm
call (%r1), _optix_get_payload, (%r2);

while my code led to

call ( %r4), _optix_get_payload, (0);

I found a workaround that compiles but generates some unnecessary instructions:

public static uint OptixGetPayload(uint payloadIndex)
{
    // The produced PTX is not optimal because of unnecessary st.local.b32 and
    // ld.local.b32 instructions, but at least it works.
    // The mov copies the index into a register; without it there is a compiler error.
    uint result;
    uint tmp;
    CudaAsm.Emit("mov.u32 %0, %1;", out tmp, payloadIndex);
    CudaAsm.Emit("call (%0), _optix_get_payload, (%1);", out result, tmp);
    return result;
}

With optixTrace I ran into the register problem too, but with so many more arguments my workaround is no longer practical. Also, CudaAsm.Emit seems to take only out or in arguments, whereas the p0, p1, p2... arguments of optixTrace seem to be passed by reference (I used the file optix_7_device_impl.h from the OptiX SDK as a reference for composing the PTX instruction). Maybe ref arguments would help here:

public static void OptixTrace(IntPtr handle,
    float3 rayOrigin,
    float3 rayDirection,
    float tmin,
    float tmax,
    float rayTime,
    uint visibilityMask,
    uint rayFlags,
    uint SBToffset,
    uint SBTstride,
    uint missSBTIndex)
{
    float ox = rayOrigin.X, oy = rayOrigin.Y, oz = rayOrigin.Z;
    float dx = rayDirection.X, dy = rayDirection.Y, dz = rayDirection.Z;
    uint p0 = 0, p1 = 0, p2 = 0, p3 = 0, p4 = 0, p5 = 0, p6 = 0, p7 = 0, p8 = 0, p9 = 0, p10 = 0, p11 = 0, p12 = 0, p13 = 0, p14 = 0, p15 = 0, p16 = 0, p17 = 0, p18 = 0, p19 = 0, p20 = 0, p21 = 0,
        p22 = 0, p23 = 0, p24 = 0, p25 = 0, p26 = 0, p27 = 0, p28 = 0, p29 = 0, p30 = 0, p31 = 0;

    CudaAsm.Emit("call" +
        "(%0,%1,%2,%3,%4,%5,%6,%7,%8,%9,%10,%11,%12,%13,%14,%15,%16,%17,%18,%19,%20,%21,%22,%23,%24,%25,%26,%27,%28,%" +
        "29,%30,%31)," +
        "_optix_trace_typed_32," +
        "(%32,%33,%34,%35,%36,%37,%38,%39,%40,%41,%42,%43,%44,%45,%46,%47,%48,%49,%50,%51,%52,%53,%54,%55,%56,%57,%58,%" +
        "59,%60,%61,%62,%63,%64,%65,%66,%67,%68,%69,%70,%71,%72,%73,%74,%75,%76,%77,%78,%79,%80);",
        out p0, out p1, out p2, out p3, out p4, out p5, out p6, out p7, out p8, out p9, out p10, out p11, out p12, out p13, out p14, out p15, out p16, out p17, out p18, out p19, out p20, out p21,
        out p22, out p23, out p24, out p25, out p26, out p27, out p28, out p29, out p30, out p31,
        0, handle, ox, oy, oz, dx, dy, dz, tmin, tmax, rayTime, visibilityMask, rayFlags, SBToffset, SBTstride,
        missSBTIndex, 0, p0, p1, p2, p3, p4, p5, p6, p7, p8, p9, p10, p11, p12, p13, p14, p15, p16, p17, p18, p19, p20, p21, p22, p23, p24, p25, p26, p27, p28, p29, p30, p31);
}
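
The hand-written placeholder list above is easy to get wrong: 32 outputs (%0 to %31) followed by 49 inputs (%32 to %80). As a rough sketch only, the string could be generated instead of typed out. PtxCallBuilder and BuildCall are hypothetical names invented for this illustration and are not part of ILGPU; the sketch only builds the text and does nothing about the argument limits of CudaAsm.Emit mentioned earlier in this thread.

```csharp
using System;
using System.Linq;

// Hypothetical helper (not part of ILGPU): builds the text of a PTX call
// instruction whose operands are numbered placeholders, outputs first.
public static class PtxCallBuilder
{
    public static string BuildCall(string target, int outputCount, int inputCount)
    {
        if (outputCount < 0) throw new ArgumentOutOfRangeException(nameof(outputCount));
        if (inputCount < 0) throw new ArgumentOutOfRangeException(nameof(inputCount));

        // Outputs take placeholders %0 .. %(outputCount - 1).
        string outputs = string.Join(",",
            Enumerable.Range(0, outputCount).Select(i => "%" + i));

        // Inputs continue the numbering directly after the outputs.
        string inputs = string.Join(",",
            Enumerable.Range(outputCount, inputCount).Select(i => "%" + i));

        return "call (" + outputs + "), " + target + ", (" + inputs + ");";
    }
}
```

Calling PtxCallBuilder.BuildCall("_optix_trace_typed_32", 32, 49) should yield the same instruction as the concatenated literal above, up to whitespace, and BuildCall("_optix_get_payload", 1, 1) reproduces the string used in the OptixGetPayload workaround.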

Anyway, it's not a big deal that this is not working yet. I just wanted to see whether I could get it to run and thought I would share my findings.
