Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[sokol_gfx d3d11] update_image is wrong for 3d textures #1063

Closed
jakubtomsu opened this issue Jun 14, 2024 · 10 comments
Closed

[sokol_gfx d3d11] update_image is wrong for 3d textures #1063

jakubtomsu opened this issue Jun 14, 2024 · 10 comments

Comments

@jakubtomsu
Copy link
Contributor

I ran into a bug where the uploaded 3d texture was wrong in a standalone application, but it was fine in renderdoc. I believe it's related to how the data is copied into the mapped subresource in update_image. There is also this comment:

sokol/sokol_gfx.h

Lines 11461 to 11465 in c54523c

_sg_stats_add(d3d11.num_map, 1);
if (SUCCEEDED(hr)) {
// FIXME: need to handle difference in depth-pitch for 3D textures as well!
if (src_pitch == (int)d3d11_msr.RowPitch) {
memcpy(d3d11_msr.pData, slice_ptr, slice_size);

I think I'm gonna work around this issue in my renderer code with sg.d3d11_query_image_info and calling D3D11 mysel, as a quick fix for now. But I could look into fixing this issue later.

@floooh
Copy link
Owner

floooh commented Jun 14, 2024

Deleted my previous comment because I wrote rubbish :D But maybe the tex3d sample (https://github.com/floooh/sokol-samples/blob/master/sapp/tex3d-sapp.c) can still be helpful even if it doesn't dynamically upload data each frame, but maybe it can still be a good playground to investigate the issue.

One thing that's different from other 3D backends is that there's separate code to populate a new texture, or update an existing texture. For new textures this is the important code:

sokol/sokol_gfx.h

Lines 10408 to 10434 in c54523c

_SOKOL_PRIVATE void _sg_d3d11_fill_subres_data(const _sg_image_t* img, const sg_image_data* data) {
const int num_faces = (img->cmn.type == SG_IMAGETYPE_CUBE) ? 6:1;
const int num_slices = (img->cmn.type == SG_IMAGETYPE_ARRAY) ? img->cmn.num_slices:1;
int subres_index = 0;
for (int face_index = 0; face_index < num_faces; face_index++) {
for (int slice_index = 0; slice_index < num_slices; slice_index++) {
for (int mip_index = 0; mip_index < img->cmn.num_mipmaps; mip_index++, subres_index++) {
SOKOL_ASSERT(subres_index < (SG_MAX_MIPMAPS * SG_MAX_TEXTUREARRAY_LAYERS));
D3D11_SUBRESOURCE_DATA* subres_data = &_sg.d3d11.subres_data[subres_index];
const int mip_width = _sg_miplevel_dim(img->cmn.width, mip_index);
const int mip_height = _sg_miplevel_dim(img->cmn.height, mip_index);
const sg_range* subimg_data = &(data->subimage[face_index][mip_index]);
const size_t slice_size = subimg_data->size / (size_t)num_slices;
const size_t slice_offset = slice_size * (size_t)slice_index;
const uint8_t* ptr = (const uint8_t*) subimg_data->ptr;
subres_data->pSysMem = ptr + slice_offset;
subres_data->SysMemPitch = (UINT)_sg_row_pitch(img->cmn.pixel_format, mip_width, 1);
if (img->cmn.type == SG_IMAGETYPE_3D) {
// FIXME? const int mip_depth = _sg_miplevel_dim(img->depth, mip_index);
subres_data->SysMemSlicePitch = (UINT)_sg_surface_pitch(img->cmn.pixel_format, mip_width, mip_height, 1);
} else {
subres_data->SysMemSlicePitch = 0;
}
}
}
}
}

...while for updating an existing texture this code is used:

sokol/sokol_gfx.h

Lines 11448 to 11483 in c54523c

D3D11_MAPPED_SUBRESOURCE d3d11_msr;
for (int face_index = 0; face_index < num_faces; face_index++) {
for (int slice_index = 0; slice_index < num_slices; slice_index++) {
for (int mip_index = 0; mip_index < img->cmn.num_mipmaps; mip_index++, subres_index++) {
SOKOL_ASSERT(subres_index < (SG_MAX_MIPMAPS * SG_MAX_TEXTUREARRAY_LAYERS));
const int mip_width = _sg_miplevel_dim(img->cmn.width, mip_index);
const int mip_height = _sg_miplevel_dim(img->cmn.height, mip_index);
const int src_pitch = _sg_row_pitch(img->cmn.pixel_format, mip_width, 1);
const sg_range* subimg_data = &(data->subimage[face_index][mip_index]);
const size_t slice_size = subimg_data->size / (size_t)num_slices;
const size_t slice_offset = slice_size * (size_t)slice_index;
const uint8_t* slice_ptr = ((const uint8_t*)subimg_data->ptr) + slice_offset;
hr = _sg_d3d11_Map(_sg.d3d11.ctx, img->d3d11.res, subres_index, D3D11_MAP_WRITE_DISCARD, 0, &d3d11_msr);
_sg_stats_add(d3d11.num_map, 1);
if (SUCCEEDED(hr)) {
// FIXME: need to handle difference in depth-pitch for 3D textures as well!
if (src_pitch == (int)d3d11_msr.RowPitch) {
memcpy(d3d11_msr.pData, slice_ptr, slice_size);
} else {
SOKOL_ASSERT(src_pitch < (int)d3d11_msr.RowPitch);
const uint8_t* src_ptr = slice_ptr;
uint8_t* dst_ptr = (uint8_t*) d3d11_msr.pData;
for (int row_index = 0; row_index < mip_height; row_index++) {
memcpy(dst_ptr, src_ptr, (size_t)src_pitch);
src_ptr += src_pitch;
dst_ptr += d3d11_msr.RowPitch;
}
}
_sg_d3d11_Unmap(_sg.d3d11.ctx, img->d3d11.res, subres_index);
_sg_stats_add(d3d11.num_unmap, 1);
} else {
_SG_ERROR(D3D11_MAP_FOR_UPDATE_IMAGE_FAILED);
}
}
}
}

...both loop over face_index, slice_index and mip_index.

It could be that the bug is in both places, or only in the update-image code, maybe it makes sense to share a bit of code between those two places (not sure yet if that makes sense though).

In the Metal and WebGPU backends, creating and updating a texture uses a common helper function:

  • Metal:

    sokol/sokol_gfx.h

    Lines 12288 to 12334 in c54523c

    _SOKOL_PRIVATE void _sg_mtl_copy_image_data(const _sg_image_t* img, __unsafe_unretained id<MTLTexture> mtl_tex, const sg_image_data* data) {
    const int num_faces = (img->cmn.type == SG_IMAGETYPE_CUBE) ? 6:1;
    const int num_slices = (img->cmn.type == SG_IMAGETYPE_ARRAY) ? img->cmn.num_slices : 1;
    for (int face_index = 0; face_index < num_faces; face_index++) {
    for (int mip_index = 0; mip_index < img->cmn.num_mipmaps; mip_index++) {
    SOKOL_ASSERT(data->subimage[face_index][mip_index].ptr);
    SOKOL_ASSERT(data->subimage[face_index][mip_index].size > 0);
    const uint8_t* data_ptr = (const uint8_t*)data->subimage[face_index][mip_index].ptr;
    const int mip_width = _sg_miplevel_dim(img->cmn.width, mip_index);
    const int mip_height = _sg_miplevel_dim(img->cmn.height, mip_index);
    // special case PVRTC formats: bytePerRow and bytesPerImage must be 0
    int bytes_per_row = 0;
    int bytes_per_slice = 0;
    if (!_sg_mtl_is_pvrtc(img->cmn.pixel_format)) {
    bytes_per_row = _sg_row_pitch(img->cmn.pixel_format, mip_width, 1);
    bytes_per_slice = _sg_surface_pitch(img->cmn.pixel_format, mip_width, mip_height, 1);
    }
    /* bytesPerImage special case: https://developer.apple.com/documentation/metal/mtltexture/1515679-replaceregion
    "Supply a nonzero value only when you copy data to a MTLTextureType3D type texture"
    */
    MTLRegion region;
    int bytes_per_image;
    if (img->cmn.type == SG_IMAGETYPE_3D) {
    const int mip_depth = _sg_miplevel_dim(img->cmn.num_slices, mip_index);
    region = MTLRegionMake3D(0, 0, 0, (NSUInteger)mip_width, (NSUInteger)mip_height, (NSUInteger)mip_depth);
    bytes_per_image = bytes_per_slice;
    // FIXME: apparently the minimal bytes_per_image size for 3D texture is 4 KByte... somehow need to handle this
    } else {
    region = MTLRegionMake2D(0, 0, (NSUInteger)mip_width, (NSUInteger)mip_height);
    bytes_per_image = 0;
    }
    for (int slice_index = 0; slice_index < num_slices; slice_index++) {
    const int mtl_slice_index = (img->cmn.type == SG_IMAGETYPE_CUBE) ? face_index : slice_index;
    const int slice_offset = slice_index * bytes_per_slice;
    SOKOL_ASSERT((slice_offset + bytes_per_slice) <= (int)data->subimage[face_index][mip_index].size);
    [mtl_tex replaceRegion:region
    mipmapLevel:(NSUInteger)mip_index
    slice:(NSUInteger)mtl_slice_index
    withBytes:data_ptr + slice_offset
    bytesPerRow:(NSUInteger)bytes_per_row
    bytesPerImage:(NSUInteger)bytes_per_image];
    }
    }
    }
    }
  • WebGPU:

    sokol/sokol_gfx.h

    Lines 14484 to 14528 in c54523c

    _SOKOL_PRIVATE void _sg_wgpu_copy_image_data(const _sg_image_t* img, WGPUTexture wgpu_tex, const sg_image_data* data) {
    WGPUTextureDataLayout wgpu_layout;
    _sg_clear(&wgpu_layout, sizeof(wgpu_layout));
    WGPUImageCopyTexture wgpu_copy_tex;
    _sg_clear(&wgpu_copy_tex, sizeof(wgpu_copy_tex));
    wgpu_copy_tex.texture = wgpu_tex;
    wgpu_copy_tex.aspect = WGPUTextureAspect_All;
    WGPUExtent3D wgpu_extent;
    _sg_clear(&wgpu_extent, sizeof(wgpu_extent));
    const int num_faces = (img->cmn.type == SG_IMAGETYPE_CUBE) ? 6 : 1;
    for (int face_index = 0; face_index < num_faces; face_index++) {
    for (int mip_index = 0; mip_index < img->cmn.num_mipmaps; mip_index++) {
    wgpu_copy_tex.mipLevel = (uint32_t)mip_index;
    wgpu_copy_tex.origin.z = (uint32_t)face_index;
    int mip_width = _sg_miplevel_dim(img->cmn.width, mip_index);
    int mip_height = _sg_miplevel_dim(img->cmn.height, mip_index);
    int mip_slices;
    switch (img->cmn.type) {
    case SG_IMAGETYPE_CUBE:
    mip_slices = 1;
    break;
    case SG_IMAGETYPE_3D:
    mip_slices = _sg_miplevel_dim(img->cmn.num_slices, mip_index);
    break;
    default:
    mip_slices = img->cmn.num_slices;
    break;
    }
    const int row_pitch = _sg_row_pitch(img->cmn.pixel_format, mip_width, 1);
    const int num_rows = _sg_num_rows(img->cmn.pixel_format, mip_height);
    if (_sg_is_compressed_pixel_format(img->cmn.pixel_format)) {
    mip_width = _sg_roundup(mip_width, 4);
    mip_height = _sg_roundup(mip_height, 4);
    }
    wgpu_layout.offset = 0;
    wgpu_layout.bytesPerRow = (uint32_t)row_pitch;
    wgpu_layout.rowsPerImage = (uint32_t)num_rows;
    wgpu_extent.width = (uint32_t)mip_width;
    wgpu_extent.height = (uint32_t)mip_height;
    wgpu_extent.depthOrArrayLayers = (uint32_t)mip_slices;
    const sg_range* mip_data = &data->subimage[face_index][mip_index];
    wgpuQueueWriteTexture(_sg.wgpu.queue, &wgpu_copy_tex, mip_data->ptr, mip_data->size, &wgpu_layout, &wgpu_extent);
    }
    }
    }

...which is a lot cleaner (but Metal and WebGPU are different in that the texture object is created without data, and the data is copied into the object after creation), but as I said above, not sure how much the two code paths can be unified in the D3D11 backend.

@floooh floooh added the d3d11 label Jun 14, 2024
@jakubtomsu
Copy link
Contributor Author

Thanks for the example. In my case it's about uploading dynamic texture every few frames but it's still handy.

I just implemented the code for uploading the texture directly with d3d11, and it works correctly now. But I found out update_image works fine with size 128^3 and up, but didn't work for size 32^3.

This is the uploading code I ended up with, note the difference between dst_offset and src_offset. I believe update_image should have something like src_depth_pitch.

        // FIXME:
        // sg.update_image(
        //     ren.blood_volume_img,
        //     sg.Image_Data{subimage = {0 = {0 = {ptr = &game.blood, size = size_of(game.blood)}}}},
        // )

        img := sg.d3d11_query_image_info(ren.blood_volume_img)
        res := transmute(^d3d11.IResource)img.res

        msr: d3d11.MAPPED_SUBRESOURCE
        hr := d3d11_device_context->Map(res, 0, .WRITE_DISCARD, {}, &msr)
        fmt.println(hr)
        fmt.println(msr)
        fmt.println(img)
        if hr == 0 {
            defer d3d11_device_context->Unmap(res, 0)

            // NOTE: this code assumes a pixel is one byte
            for z in 0 ..< BLOOD_BOUNDS_Z {
                for y in 0 ..< BLOOD_BOUNDS_Y {
                    dst_offset := uintptr(z * int(msr.DepthPitch) + y * int(msr.RowPitch))
                    src_offset := uintptr(z * BLOOD_BOUNDS_X * BLOOD_BOUNDS_Y + y * BLOOD_BOUNDS_X)
                    runtime.mem_copy(
                        rawptr(uintptr(msr.pData) + dst_offset),
                        rawptr(uintptr(&game.blood) + src_offset),
                        BLOOD_BOUNDS_X * size_of(u8),
                    )
                }
            }
        }

@floooh
Copy link
Owner

floooh commented Jun 14, 2024

Okidoki, thanks a lot for the investigation and sample code! I'll try to set aside some time next week for a new sample/regression-test and fixing the issue.

In general, the idea for uploading texture data is that the incoming data should be tightly packed (e.g. no pitch-related gaps between rows and slices - because different backends have different requirements for those anyway), and sokol_gfx.h will take care of 'unwrapping' the data, even if that means doing small per-row memcpy's.

At least that's the idea for now.

At some point in the future I want to make the resource update functions more orthogonal, first by providing more orthogonal functions, at least:

  • copy-buffer-to-buffer
  • write-mem-to-buffer
  • copy-image-to-image
  • write-mem-to-image
  • ...?

...and ideally also allowing to update only parts of the resource, and with custom row/slice pitch - and implementing "fast paths" if those things match the backend API expectations (e.g. doing a single big copy instead of many small copies).

...it'll be a while until I get to that though, probably after the computer-shader stuff...

@floooh floooh self-assigned this Jun 14, 2024
@jakubtomsu
Copy link
Contributor Author

Sounds good, thank you! Fixing this would be great. And more modern uploading and writing functions would be nice to have as well, but I agree the compute stuff is higher priority.

@floooh
Copy link
Owner

floooh commented Jun 17, 2024

...starting to look into this now. Might take a couple of days because I'll build a new sample around it.

@floooh
Copy link
Owner

floooh commented Jun 17, 2024

Ok, I can reproduce the issue, and also confirm that it only seems affects the D3D11 backend (not 100% sure though because the texture size where the problem manifests seems to differ between hardware configs, see below).

Interestingly, on my Intel GPU laptop, the cutoff size is 32x32 (with RGBA8 pixels). A RGBA8 3D texture of size 32x32x3 still works (e.g. one depth slice is 32 * 32 * sizeof(uint32_t) = 4096 bytes. The required depth-pitch 4096 matches the slice size.

However going down to a 16x16x3 textures breaks. D3D11 reports a required depth-pitch of 2048, while one slice is 1024 bytes.

I'll work on a fix over the next few evenings (had to squeeze an unexpected PR and fix for sokol-shdc earlier today).

@floooh
Copy link
Owner

floooh commented Jun 18, 2024

Ok, fixed in #1063, but before merging I want to make the new sample in sokol-samples a bit more flexible, and also do a bit more testing (ETA tomorrow or the day after).

@floooh
Copy link
Owner

floooh commented Jun 19, 2024

Ok, I extended the sample to test all sorts of texture sizes and also dynamic vs immutable 3D textures (this is recorded on Mac with the Metal backend, but I also tested on my Windows laptop with Intel GPU, and gaming PC with nvidia rtx2070. Merge and website update incoming (I also stumbled over an unrelated problem in the WebGPU backend, but that's not important enough to delay the merge: #1066)

Screen.Recording.2024-06-19.at.19.24.02.mov

@floooh floooh closed this as completed in 0f582fe Jun 19, 2024
@floooh
Copy link
Owner

floooh commented Jun 19, 2024

Ok, merged. Next up is merging the new sample and updating the sample web pages.

Thanks for the bug report!

@jakubtomsu
Copy link
Contributor Author

Fantastic! Thank you so much for your work:)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants