
Performance regression during rclone copy with preallocation enabled #318

Closed

EchterAgo opened this issue Nov 6, 2023 · 51 comments
@EchterAgo

EchterAgo commented Nov 6, 2023

This seems to be a regression that started with 1ecc0ca. See #283 for some more context.

I just logged zpool iostat 5 during an 8-threaded rclone copy to a newly created pool, and it looks like there is some write amplification going on when blksz is not set to 4096.

iostat_512.txt
iostat_4096.txt

Note that in the 512-byte case iostat reports ~80-100MB/s of IO with a high write operation count, but rclone itself only reports 5-10MB/s (roughly an order of magnitude of write amplification). I also noticed that it takes a long time to even get the rclone process to abort the copy.

In the 4096 case iostat reports ~160-240MB/s and a much lower write operation count, which is about what I would expect and have seen before 1ecc0ca. rclone itself also reports a speed range very similar to iostat during the copy. I can also abort rclone within seconds instead of minutes.

Specifying --local-no-preallocate on the rclone command line also makes it go fast again.

@EchterAgo

What is weird is that so far this only seems to happen with rclone; nothing else seems to slow down. I tried a fio sequential write benchmark with 8 jobs and a 32KB block size (same as rclone) and got the expected ~200MB/s.

@lundman

lundman commented Nov 6, 2023

Most likely it uses sync writes to make sure the data has landed on disk. That would be equivalent to setting sync=always on the dataset for the other tools.
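
For context, a minimal user-space sketch of what a "sync write" typically looks like on Windows: opening with FILE_FLAG_WRITE_THROUGH and/or calling FlushFileBuffers() after the write, which ZFS then services through the ZIL, much like sync=always would for ordinary writes. This is an assumption about what rclone might be doing, not taken from its source, and the helper name sync_write is hypothetical.

#include <windows.h>

/* Write a buffer and force it to stable storage (illustrative sketch). */
static BOOL
sync_write(const wchar_t *path, const void *buf, DWORD len)
{
    HANDLE h = CreateFileW(path, GENERIC_WRITE, 0, NULL, CREATE_ALWAYS,
        FILE_ATTRIBUTE_NORMAL | FILE_FLAG_WRITE_THROUGH, NULL);
    DWORD written = 0;
    BOOL ok;

    if (h == INVALID_HANDLE_VALUE)
        return (FALSE);
    ok = WriteFile(h, buf, len, &written, NULL);
    if (ok)
        ok = FlushFileBuffers(h);   /* explicit flush, belt and braces */
    CloseHandle(h);
    return (ok && written == len);
}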

@EchterAgo

EchterAgo commented Nov 6, 2023

Wow, when I try rclone with 512 in blksz and preallocation disabled, it instantly hangs the virtual machine, and surprisingly hard: I can't even reset the machine in VMware and the kernel debugger will not break. After a few minutes it recovers. I'm not seeing any IO errors on the hardware though, and I also tried restarting the machine :\

I'm not sure if this also happens with 4096 in blksz, but this used to work.

@EchterAgo

So the system recovers after a while and the debugger breaks, but it did not reveal anything interesting. However, when disabling preallocation in rclone, the copy seems to proceed at a normal speed.

@EchterAgo

Hmmm, this only seems to happen when using VMware; I don't see the same with libvirt/QEMU. Actually, it works as expected when I don't use VMware Shared Folders as the copy source. It does still crash like in #319 when aborting the copy, though.

Can this performance regression be related to preallocation/sparse files? On 1ecc0ca that is the only difference I see between a test run that is fast and one that is slow. In the slow case it starts out fast but then drops over the course of a minute to < 10MB/s, and aborting takes minutes. In the fast case it starts fast and stays fast, and aborting takes seconds.

When I set blksz to 4096 it does not matter if the test scripts use preallocation/sparse files or not.

I guess I should find out whether specifying only --local-no-preallocate or --local-no-sparse on the rclone command line makes a difference.

@EchterAgo

It is definitely the --local-no-preallocate flag that makes the difference, not --local-no-sparse.

@EchterAgo

I also see the same behavior on my second host that uses QEMU instead of VMWare.

@EchterAgo changed the title from "Performance regression during rclone copy" to "Performance regression during rclone copy with preallocation enabled" on Nov 6, 2023
@lundman

lundman commented Nov 6, 2023

OK so, you feel confident that setting z_blksz to ashift gives you a considerable performance boost. This is when we only set z_blksz to 4096 if it is 0, does it help if we remove the if part, and always set it? Any corruption issues? Otherwise, let me add code to set it nicely.

@lundman

lundman commented Nov 7, 2023

Hmm, I wonder if it is just zfs_log_write - if we change blocksize there to ashift, with prealloc, will it go faster? That would be a much smaller change to upstream.

@EchterAgo

OK so, you feel confident that setting z_blksz to ashift gives you a considerable performance boost.

That definitely improves things. I'm unsure if it is the right thing to do; I just noticed performance drop sharply after 1ecc0ca. What is still weird is that I have been unable to measure this performance drop with anything other than rclone. I tried to recreate the rclone IO pattern using fio, but I'm not sure it actually supports preallocation on Windows. All my tests with fio so far showed the expected performance levels. I'll have to check whether it actually does preallocation.

This is when we only set z_blksz to 4096 if it is 0, does it help if we remove the if part, and always set it? Any corruption issues? Otherwise, let me add code to set it nicely.

I'll try that.

@lundman

lundman commented Nov 7, 2023

Or try the blocksize in zfs_log_write() - then we don't change much upstream.

@EchterAgo

Setting blocksize only in zfs_log_write does make the copy fast. I'll need to let it run for a few hours to confirm the memory issue is still fixed, but I expect it will be.

diff --git a/module/os/windows/zfs/zfs_znode.c b/module/os/windows/zfs/zfs_znode.c
index f501b3ac1..eff94188f 100644
--- a/module/os/windows/zfs/zfs_znode.c
+++ b/module/os/windows/zfs/zfs_znode.c
@@ -624,9 +624,6 @@ zfs_znode_alloc(zfsvfs_t *zfsvfs, dmu_buf_t *db, int blksz,
        zp->z_zfsvfs = zfsvfs;
        mutex_exit(&zfsvfs->z_znodes_lock);

-       if (zp->z_blksz == 0)
-               zp->z_blksz = zfs_blksz(zp);
-
        return (zp);
 }

diff --git a/module/zfs/zfs_log.c b/module/zfs/zfs_log.c
index 613fcdb23..ad0dfec0d 100644
--- a/module/zfs/zfs_log.c
+++ b/module/zfs/zfs_log.c
@@ -634,6 +634,9 @@ zfs_log_write(zilog_t *zilog, dmu_tx_t *tx, int txtype,
        uint64_t gen = 0;
        ssize_t size = resid;

+       if (blocksize == 0)
+               blocksize = 4096;
+
        if (zil_replaying(zilog, tx) || zp->z_unlinked ||
            zfs_xattr_owner_unlinked(zp)) {
                if (callback != NULL)

@lundman

lundman commented Nov 7, 2023

excellent, I'll do a commit to fish out the value properly when I wake

@lundman

lundman commented Nov 8, 2023

I moved it back into zfs_znode.c - there's too much change to zfs_log.c otherwise.

f0e86b4

@EchterAgo

I think this is fixed.

@lundman

lundman commented Nov 10, 2023

I had to change the way we do this: running the same commit on macOS caused a panic in the removal tests, which is likely a bug in that code, see:
openzfs#12078

But I don't want to wait for that to be fixed, so let's handle the performance change differently, in just zfs_log_write(). This passes zfs-tests as before.

@EchterAgo

I also just did some more tests; I doubt the performance issue actually comes from zfs_log_write. It is a separate issue from setting z_blksz to 512 in zfs_znode_alloc. When I only set the blocksize to 512 in zfs_log_write and remove setting zp->z_blksz in zfs_znode_alloc, I also get the expected performance.

Good (as described above), going @ ~220MB/s:

PS C:\Windows\system32> C:\tools\zfs\zpool.exe iostat -r 5

Data          sync_read    sync_write    async_read    async_write      scrub         trim         rebuild
req_size      ind    agg    ind    agg    ind    agg    ind    agg    ind    agg    ind    agg    ind    agg
----------  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----
512             0      0      0      0      0      0      0      0      0      0      0      0      0      0
1K              0      0      0      0      0      0      0      0      0      0      0      0      0      0
2K              0      0      0      0      0      0      0      0      0      0      0      0      0      0
4K              0      0    128      0      0      0    749      0      0      0      0      0      0      0
8K             12      0     28      0      0      0      0    189      0      0      0      0      0      0
16K             0      0      0      0      0      0      0     56      0      0      0      0      0      0
32K             0      0      0      0      0      0      0      9      0      0      0      0      0      0
64K            16      0     32      0      0      0      3      0      0      0      0      0      0      0
128K            0      0     16      0      0      0      8      4      0      0      0      0      0      0
256K            0      0      0      0      0      0     12     15      0      0      0      0      0      0
512K            0      0      0      0      0      0  1.10K     46      0      0      0      0      0      0
1M              0      0      0      0      0      0      0      0      0      0      0      0      0      0
2M              0      0      0      0      0      0      0      0      0      0      0      0      0      0
4M              0      0      0      0      0      0      0      0      0      0      0      0      0      0
8M              0      0      0      0      0      0      0      0      0      0      0      0      0      0
16M             0      0      0      0      0      0      0      0      0      0      0      0      0      0
------------------------------------------------------------------------------------------------------------

Data          sync_read    sync_write    async_read    async_write      scrub         trim         rebuild
req_size      ind    agg    ind    agg    ind    agg    ind    agg    ind    agg    ind    agg    ind    agg
----------  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----
512             0      0      0      0      0      0      0      0      0      0      0      0      0      0
1K              0      0      0      0      0      0      0      0      0      0      0      0      0      0
2K              0      0      0      0      0      0      0      0      0      0      0      0      0      0
4K              0      0     17      0      0      0    105      0      0      0      0      0      0      0
8K              0      0      0      0      0      0      0     20      0      0      0      0      0      0
16K             0      0      0      0      0      0      0     10      0      0      0      0      0      0
32K             0      0      0      0      0      0      0      0      0      0      0      0      0      0
64K             0      0      0      0      0      0      1      0      0      0      0      0      0      0
128K            0      0      0      0      0      0      3      0      0      0      0      0      0      0
256K            0      0      0      0      0      0     28      2      0      0      0      0      0      0
512K            0      0      0      0      0      0    312      9      0      0      0      0      0      0
1M              0      0      0      0      0      0      0      0      0      0      0      0      0      0
2M              0      0      0      0      0      0      0      0      0      0      0      0      0      0
4M              0      0      0      0      0      0      0      0      0      0      0      0      0      0
8M              0      0      0      0      0      0      0      0      0      0      0      0      0      0
16M             0      0      0      0      0      0      0      0      0      0      0      0      0      0
------------------------------------------------------------------------------------------------------------

Data          sync_read    sync_write    async_read    async_write      scrub         trim         rebuild
req_size      ind    agg    ind    agg    ind    agg    ind    agg    ind    agg    ind    agg    ind    agg
----------  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----
512             0      0      0      0      0      0      0      0      0      0      0      0      0      0
1K              0      0      0      0      0      0      0      0      0      0      0      0      0      0
2K              0      0      0      0      0      0      0      0      0      0      0      0      0      0
4K              0      0     22      0      0      0    160      0      0      0      0      0      0      0
8K              0      0      0      0      0      0      0     34      0      0      0      0      0      0
16K             0      0      0      0      0      0      1     11      0      0      0      0      0      0
32K             0      0      0      0      0      0      7      0      0      0      0      0      0      0
64K             0      0      0      0      0      0      8      0      0      0      0      0      0      0
128K            0      0      0      0      0      0      8      3      0      0      0      0      0      0
256K            0      0      0      0      0      0     57      2      0      0      0      0      0      0
512K            0      0      0      0      0      0    365     25      0      0      0      0      0      0
1M              0      0      0      0      0      0      0      0      0      0      0      0      0      0
2M              0      0      0      0      0      0      0      0      0      0      0      0      0      0
4M              0      0      0      0      0      0      0      0      0      0      0      0      0      0
8M              0      0      0      0      0      0      0      0      0      0      0      0      0      0
16M             0      0      0      0      0      0      0      0      0      0      0      0      0      0
------------------------------------------------------------------------------------------------------------

Data          sync_read    sync_write    async_read    async_write      scrub         trim         rebuild
req_size      ind    agg    ind    agg    ind    agg    ind    agg    ind    agg    ind    agg    ind    agg
----------  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----
512             0      0      0      0      0      0      0      0      0      0      0      0      0      0
1K              0      0      0      0      0      0      0      0      0      0      0      0      0      0
2K              0      0      0      0      0      0      0      0      0      0      0      0      0      0
4K              0      0     17      0      0      0    126      0      0      0      0      0      0      0
8K              0      0      0      0      0      0      0     27      0      0      0      0      0      0
16K             0      0      0      0      0      0      0     10      0      0      0      0      0      0
32K             0      0      0      0      0      0      2      2      0      0      0      0      0      0
64K             0      0      0      0      0      0      3      0      0      0      0      0      0      0
128K            0      0      0      0      0      0      2      0      0      0      0      0      0      0
256K            0      0      0      0      0      0      7      1      0      0      0      0      0      0
512K            0      0      0      0      0      0    417      2      0      0      0      0      0      0
1M              0      0      0      0      0      0      0      0      0      0      0      0      0      0
2M              0      0      0      0      0      0      0      0      0      0      0      0      0      0
4M              0      0      0      0      0      0      0      0      0      0      0      0      0      0
8M              0      0      0      0      0      0      0      0      0      0      0      0      0      0
16M             0      0      0      0      0      0      0      0      0      0      0      0      0      0
------------------------------------------------------------------------------------------------------------

Data          sync_read    sync_write    async_read    async_write      scrub         trim         rebuild
req_size      ind    agg    ind    agg    ind    agg    ind    agg    ind    agg    ind    agg    ind    agg
----------  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----
512             0      0      0      0      0      0      0      0      0      0      0      0      0      0
1K              0      0      0      0      0      0      0      0      0      0      0      0      0      0
2K              0      0      0      0      0      0      0      0      0      0      0      0      0      0
4K              0      0     20      0      0      0    144      0      0      0      0      0      0      0
8K              0      0      0      0      0      0      3     26      0      0      0      0      0      0
16K             0      0      0      0      0      0      0      7      0      0      0      0      0      0
32K             0      0      0      0      0      0      1      2      0      0      0      0      0      0
64K             0      0      0      0      0      0      1      0      0      0      0      0      0      0
128K            0      0      0      0      0      0      3      0      0      0      0      0      0      0
256K            0      0      0      0      0      0     55      3      0      0      0      0      0      0
512K            0      0      0      0      0      0    329     37      0      0      0      0      0      0
1M              0      0      0      0      0      0      0      0      0      0      0      0      0      0
2M              0      0      0      0      0      0      0      0      0      0      0      0      0      0
4M              0      0      0      0      0      0      0      0      0      0      0      0      0      0
8M              0      0      0      0      0      0      0      0      0      0      0      0      0      0
16M             0      0      0      0      0      0      0      0      0      0      0      0      0      0
------------------------------------------------------------------------------------------------------------

Now with only

if (zp->z_blksz == 0)
    zp->z_blksz = zfs_blksz(zp);

in zfs_znode_alloc, and with the blocksize setting removed from zfs_log_write.

Bad (starts at > 200MB/s, slows down over multiple seconds and ends up at a few hundred kilobytes per second):

PS C:\Windows\system32> C:\tools\zfs\zpool.exe iostat -r 5

Data          sync_read    sync_write    async_read    async_write      scrub         trim         rebuild
req_size      ind    agg    ind    agg    ind    agg    ind    agg    ind    agg    ind    agg    ind    agg
----------  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----
512             0      0      0      0      0      0      0      0      0      0      0      0      0      0
1K              0      0      0      0      0      0      0      0      0      0      0      0      0      0
2K              0      0      0      0      0      0      0      0      0      0      0      0      0      0
4K              0      0    128      0      0      0  35.1K      0      0      0      0      0      0      0
8K             12      0     28      0      0      0  1.11K  16.8K      0      0      0      0      0      0
16K             0      0      0      0      0      0  1.07K  9.45K      0      0      0      0      0      0
32K             0      0      0      0      0      0      0  6.04K      0      0      0      0      0      0
64K            16      0     32      0      0      0      0  9.71K      0      0      0      0      0      0
128K            0      0     16      0      0      0      0  7.86K      0      0      0      0      0      0
256K            0      0      0      0      0      0      0    259      0      0      0      0      0      0
512K            0      0      0      0      0      0      0      4      0      0      0      0      0      0
1M              0      0      0      0      0      0      0      0      0      0      0      0      0      0
2M              0      0      0      0      0      0      0      0      0      0      0      0      0      0
4M              0      0      0      0      0      0      0      0      0      0      0      0      0      0
8M              0      0      0      0      0      0      0      0      0      0      0      0      0      0
16M             0      0      0      0      0      0      0      0      0      0      0      0      0      0
------------------------------------------------------------------------------------------------------------

Data          sync_read    sync_write    async_read    async_write      scrub         trim         rebuild
req_size      ind    agg    ind    agg    ind    agg    ind    agg    ind    agg    ind    agg    ind    agg
----------  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----
512             0      0      0      0      0      0      0      0      0      0      0      0      0      0
1K              0      0      0      0      0      0      0      0      0      0      0      0      0      0
2K              0      0      0      0      0      0      0      0      0      0      0      0      0      0
4K              0      0      2      0      0      0  8.03K      0      0      0      0      0      0      0
8K              0      0      0      0      0      0     29  1.95K      0      0      0      0      0      0
16K             0      0      0      0      0      0     29    324      0      0      0      0      0      0
32K             0      0      0      0      0      0      0     21      0      0      0      0      0      0
64K             0      0      0      0      0      0      0     14      0      0      0      0      0      0
128K            0      0      0      0      0      0      0      8      0      0      0      0      0      0
256K            0      0      0      0      0      0      0      1      0      0      0      0      0      0
512K            0      0      0      0      0      0      0      0      0      0      0      0      0      0
1M              0      0      0      0      0      0      0      0      0      0      0      0      0      0
2M              0      0      0      0      0      0      0      0      0      0      0      0      0      0
4M              0      0      0      0      0      0      0      0      0      0      0      0      0      0
8M              0      0      0      0      0      0      0      0      0      0      0      0      0      0
16M             0      0      0      0      0      0      0      0      0      0      0      0      0      0
------------------------------------------------------------------------------------------------------------

Data          sync_read    sync_write    async_read    async_write      scrub         trim         rebuild
req_size      ind    agg    ind    agg    ind    agg    ind    agg    ind    agg    ind    agg    ind    agg
----------  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----
512             0      0      0      0      0      0      0      0      0      0      0      0      0      0
1K              0      0      0      0      0      0      0      0      0      0      0      0      0      0
2K              0      0      0      0      0      0      0      0      0      0      0      0      0      0
4K              0      0      0      0      0      0    314      0      0      0      0      0      0      0
8K              0      0      0      0      0      0     44    138      0      0      0      0      0      0
16K             0      0      0      0      0      0     34    128      0      0      0      0      0      0
32K             0      0      0      0      0      0      0    144      0      0      0      0      0      0
64K             0      0      0      0      0      0      0    289      0      0      0      0      0      0
128K            0      0      0      0      0      0      0    281      0      0      0      0      0      0
256K            0      0      0      0      0      0      0     13      0      0      0      0      0      0
512K            0      0      0      0      0      0      0      0      0      0      0      0      0      0
1M              0      0      0      0      0      0      0      0      0      0      0      0      0      0
2M              0      0      0      0      0      0      0      0      0      0      0      0      0      0
4M              0      0      0      0      0      0      0      0      0      0      0      0      0      0
8M              0      0      0      0      0      0      0      0      0      0      0      0      0      0
16M             0      0      0      0      0      0      0      0      0      0      0      0      0      0
------------------------------------------------------------------------------------------------------------

Data          sync_read    sync_write    async_read    async_write      scrub         trim         rebuild
req_size      ind    agg    ind    agg    ind    agg    ind    agg    ind    agg    ind    agg    ind    agg
----------  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----
512             0      0      0      0      0      0      0      0      0      0      0      0      0      0
1K              0      0      0      0      0      0      0      0      0      0      0      0      0      0
2K              0      0      0      0      0      0      0      0      0      0      0      0      0      0
4K              0      0      5      0      0      0    191      0      0      0      0      0      0      0
8K              0      0      0      0      0      0     22    109      0      0      0      0      0      0
16K             0      0      0      0      0      0     19     79      0      0      0      0      0      0
32K             0      0      0      0      0      0      0     62      0      0      0      0      0      0
64K             0      0      0      0      0      0      0    137      0      0      0      0      0      0
128K            0      0      0      0      0      0      0    444      0      0      0      0      0      0
256K            0      0      0      0      0      0      0      7      0      0      0      0      0      0
512K            0      0      0      0      0      0      0      0      0      0      0      0      0      0
1M              0      0      0      0      0      0      0      0      0      0      0      0      0      0
2M              0      0      0      0      0      0      0      0      0      0      0      0      0      0
4M              0      0      0      0      0      0      0      0      0      0      0      0      0      0
8M              0      0      0      0      0      0      0      0      0      0      0      0      0      0
16M             0      0      0      0      0      0      0      0      0      0      0      0      0      0
------------------------------------------------------------------------------------------------------------

Data          sync_read    sync_write    async_read    async_write      scrub         trim         rebuild
req_size      ind    agg    ind    agg    ind    agg    ind    agg    ind    agg    ind    agg    ind    agg
----------  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----
512             0      0      0      0      0      0      0      0      0      0      0      0      0      0
1K              0      0      0      0      0      0      0      0      0      0      0      0      0      0
2K              0      0      0      0      0      0      0      0      0      0      0      0      0      0
4K              0      0      2      0      0      0    203      0      0      0      0      0      0      0
8K              0      0      0      0      0      0     33    123      0      0      0      0      0      0
16K             0      0      0      0      0      0     28     96      0      0      0      0      0      0
32K             0      0      0      0      0      0      0    114      0      0      0      0      0      0
64K             0      0      0      0      0      0      0    236      0      0      0      0      0      0
128K            0      0      0      0      0      0      0    340      0      0      0      0      0      0
256K            0      0      0      0      0      0      0     11      0      0      0      0      0      0
512K            0      0      0      0      0      0      0      0      0      0      0      0      0      0
1M              0      0      0      0      0      0      0      0      0      0      0      0      0      0
2M              0      0      0      0      0      0      0      0      0      0      0      0      0      0
4M              0      0      0      0      0      0      0      0      0      0      0      0      0      0
8M              0      0      0      0      0      0      0      0      0      0      0      0      0      0
16M             0      0      0      0      0      0      0      0      0      0      0      0      0      0
------------------------------------------------------------------------------------------------------------

Data          sync_read    sync_write    async_read    async_write      scrub         trim         rebuild
req_size      ind    agg    ind    agg    ind    agg    ind    agg    ind    agg    ind    agg    ind    agg
----------  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----
512             0      0      0      0      0      0      0      0      0      0      0      0      0      0
1K              0      0      0      0      0      0      0      0      0      0      0      0      0      0
2K              0      0      0      0      0      0      0      0      0      0      0      0      0      0
4K              0      0      2      0      0      0    216      0      0      0      0      0      0      0
8K              0      0      0      0      0      0     30    115      0      0      0      0      0      0
16K             0      0      0      0      0      0     15     83      0      0      0      0      0      0
32K             0      0      0      0      0      0      0     90      0      0      0      0      0      0
64K             0      0      0      0      0      0      0    157      0      0      0      0      0      0
128K            0      0      0      0      0      0      0    461      0      0      0      0      0      0
256K            0      0      0      0      0      0      0      5      0      0      0      0      0      0
512K            0      0      0      0      0      0      0      0      0      0      0      0      0      0
1M              0      0      0      0      0      0      0      0      0      0      0      0      0      0
2M              0      0      0      0      0      0      0      0      0      0      0      0      0      0
4M              0      0      0      0      0      0      0      0      0      0      0      0      0      0
8M              0      0      0      0      0      0      0      0      0      0      0      0      0      0
16M             0      0      0      0      0      0      0      0      0      0      0      0      0      0
------------------------------------------------------------------------------------------------------------
PS C:\Windows\system32>

@EchterAgo

To make it clearer what I mean by good/bad, here are the code diffs against the latest zfs-Windows-2.2.0-release branch (eb7f88a):

Good:

diff --git a/module/os/windows/zfs/zfs_znode.c b/module/os/windows/zfs/zfs_znode.c
index f501b3ac1..eff94188f 100644
--- a/module/os/windows/zfs/zfs_znode.c
+++ b/module/os/windows/zfs/zfs_znode.c
@@ -624,9 +624,6 @@ zfs_znode_alloc(zfsvfs_t *zfsvfs, dmu_buf_t *db, int blksz,
        zp->z_zfsvfs = zfsvfs;
        mutex_exit(&zfsvfs->z_znodes_lock);

-       if (zp->z_blksz == 0)
-               zp->z_blksz = zfs_blksz(zp);
-
        return (zp);
 }

diff --git a/module/zfs/zfs_log.c b/module/zfs/zfs_log.c
index f3013ba2d..8210e3a94 100644
--- a/module/zfs/zfs_log.c
+++ b/module/zfs/zfs_log.c
@@ -646,8 +646,8 @@ zfs_log_write(zilog_t *zilog, dmu_tx_t *tx, int txtype,
        }

 #ifdef _WIN32
-       if (zp->z_zfsvfs->z_os->os_spa->spa_min_alloc > blocksize)
-               blocksize = zp->z_zfsvfs->z_os->os_spa->spa_min_alloc;
+       if (blocksize == 0)
+               blocksize = 512;
 #endif

        if (zilog->zl_logbias == ZFS_LOGBIAS_THROUGHPUT)

Bad:

diff --git a/module/zfs/zfs_log.c b/module/zfs/zfs_log.c
index f3013ba2d..70b16e373 100644
--- a/module/zfs/zfs_log.c
+++ b/module/zfs/zfs_log.c
@@ -645,11 +645,6 @@ zfs_log_write(zilog_t *zilog, dmu_tx_t *tx, int txtype,
                return;
        }

-#ifdef _WIN32
-       if (zp->z_zfsvfs->z_os->os_spa->spa_min_alloc > blocksize)
-               blocksize = zp->z_zfsvfs->z_os->os_spa->spa_min_alloc;
-#endif
-
        if (zilog->zl_logbias == ZFS_LOGBIAS_THROUGHPUT)
                write_state = WR_INDIRECT;
        else if (!spa_has_slogs(zilog->zl_spa) &&

I'm pretty sure this is not about zfs_log_write.

@EchterAgo

Also, the zfs_log_write infinite loop only happened after 2.2TB had already been copied, whereas this performance issue happens pretty much right after starting the copy, and only when preallocation is enabled.

@lundman

lundman commented Nov 11, 2023

OK OK, sorry, so you are saying that performance-wise you want the change in zfs_znode.c? I thought zfs_log_write was the slow issue, and it shouldn't be 0 if the new code is there, since it should set it.

@EchterAgo

Yes, it seems that performance-wise it only makes a difference if set in zfs_znode_alloc.

I also just noticed something else interesting regarding this, give me a few minutes to investigate.

@EchterAgo

EchterAgo commented Nov 11, 2023

Ok, never mind, I thought performance recovers, but after 30 minutes it is still slow. It starts out fast, then perf slowly falls:

  • @ 0m:30s 50MB/s
  • @ 1m:00s 25MB/s
  • @ 1m:30s 12.5MB/s
  • @ 2m:00s 6MB/s (individual copy threads @ < 100KB/s, 8 threads)
  • @ 3m:00s stabilizes @5-6MB/s, continues like this until 30m

Outdated, see #318 (comment)

@lundman

lundman commented Nov 11, 2023

ok, so? go back to znode_alloc code, and just make sure blocksize is not 0?

@EchterAgo

EchterAgo commented Nov 11, 2023

So, setting z_blksz in zfs_znode_alloc to 512, or to zfs_blksz when it is zero, causes the performance issue. 4096 works fast, and leaving it at 0 also works fast. This is independent of the infinite loop in zfs_log_write.

This also means that the latest zfs-Windows-2.2.0-release branch has performance issues again.

@EchterAgo

ok, so? go back to znode_alloc code, and just make sure blocksize is not 0?

I think it should be enough to set blocksize to a reasonable value in zfs_log_write if it is 0. No need to set z_blksz in zfs_znode_alloc.

@lundman

lundman commented Nov 11, 2023

so you are saying its enough to not-change-anything in znode? and just change zfs_log ?

@EchterAgo

so you are saying its enough to not-change-anything in znode? and just change zfs_log ?

Yes

@EchterAgo

diff --git a/module/os/windows/zfs/zfs_znode.c b/module/os/windows/zfs/zfs_znode.c
index f501b3ac1..eff94188f 100644
--- a/module/os/windows/zfs/zfs_znode.c
+++ b/module/os/windows/zfs/zfs_znode.c
@@ -624,9 +624,6 @@ zfs_znode_alloc(zfsvfs_t *zfsvfs, dmu_buf_t *db, int blksz,
        zp->z_zfsvfs = zfsvfs;
        mutex_exit(&zfsvfs->z_znodes_lock);

-       if (zp->z_blksz == 0)
-               zp->z_blksz = zfs_blksz(zp);
-
        return (zp);
 }

diff --git a/module/zfs/zfs_log.c b/module/zfs/zfs_log.c
index f3013ba2d..8210e3a94 100644
--- a/module/zfs/zfs_log.c
+++ b/module/zfs/zfs_log.c
@@ -646,8 +646,8 @@ zfs_log_write(zilog_t *zilog, dmu_tx_t *tx, int txtype,
        }

 #ifdef _WIN32
-       if (zp->z_zfsvfs->z_os->os_spa->spa_min_alloc > blocksize)
-               blocksize = zp->z_zfsvfs->z_os->os_spa->spa_min_alloc;
+       if (blocksize == 0)
+               blocksize = 512;
 #endif

        if (zilog->zl_logbias == ZFS_LOGBIAS_THROUGHPUT)

basically this

@EchterAgo reopened this Nov 11, 2023
@lundman

lundman commented Nov 11, 2023

But, this

-       if (zp->z_zfsvfs->z_os->os_spa->spa_min_alloc > blocksize)
-               blocksize = zp->z_zfsvfs->z_os->os_spa->spa_min_alloc;

should work: if blocksize is 0, it should set it to something greater than that?

@EchterAgo

That works too, but by setting a fixed value you don't need spa_min_alloc, if that is a concern.

@lundman

lundman commented Nov 11, 2023

Yeah, but I thought we wanted to set it to ashift, so that it is faster.

@EchterAgo

My tests showed that it does not matter what we set in zfs_log_write; 512 works just as fast.

The performance difference comes only from setting z_blksz in zfs_znode_alloc, unless (in my case) it is set to 4096.

@lundman

lundman commented Nov 11, 2023

OK, so then: check for blocksize == 0, and set it to, say, SPA_MINBLOCKSIZE or whatever it's called.

Then in the znode, do we want to set it to ashift?
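
For reference, a minimal standalone sketch of why the blocksize == 0 guard matters: the ZIL write path splits a write into blocksize-sized log records, so a zero blocksize can never make progress, which matches the infinite loop seen earlier. This is a conceptual model, not the upstream zfs_log_write(); only SPA_MINBLOCKSIZE (the 512-byte floor) is a real ZFS symbol, and zil_write_chunks is a hypothetical helper.

#include <stddef.h>

#define SPA_MINBLOCKSIZE 512

/* Count how many log records a write of 'resid' bytes would be split into. */
static size_t
zil_write_chunks(size_t resid, size_t blocksize)
{
    size_t chunks = 0;

    if (blocksize == 0)                 /* the guard under discussion */
        blocksize = SPA_MINBLOCKSIZE;

    while (resid > 0) {                 /* without the guard, len stays 0 and this never ends */
        size_t len = (resid < blocksize) ? resid : blocksize;
        resid -= len;
        chunks++;
    }
    return (chunks);
}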

@EchterAgo

EchterAgo commented Nov 11, 2023

  • Setting z_blksz in zfs_znode_alloc to 512 or to zfs_blksz(zp) (which produces 512 here) makes the performance drop. Setting it to 4096 or leaving it at 0 works fast. This is independent of what zfs_log_write does.

  • Leaving z_blksz potentially zero makes zfs_log_write go into an infinite loop after copying a lot of data. Guarding against this by setting it to 512, 4096 or spa_min_alloc does not decrease performance. I still don't understand why this only happens after a lot of data has already been copied.

OK, so then: check for blocksize == 0, and set it to, say, SPA_MINBLOCKSIZE or whatever it's called.

Then in the znode, do we want to set it to ashift?

If we have the blocksize==0 check in zfs_log_write we don't need it in zfs_znode_alloc, so I'd just remove it there. Or we need to investigate why the performance drops when setting it wrong in zfs_znode_alloc.

@EchterAgo

It also seems clear that zfs_log_write was not written with z_blksz == 0 in mind, so maybe we should investigate why it goes to 0 in the first place. Maybe there are more assumptions like this in the code.

@lundman

lundman commented Nov 11, 2023

OK, if it's left as 0, then zfs_write() tunes it up, usually by powers of 2. So we just need a 0 test in zfs_log. Understood.

What is unusual is the path it takes in zfs_log_write(). Linux etc. does not generally enter WR_INDIRECT before zfs_write(), but Windows does if it preallocates. So they most likely never noticed.
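
A rough sketch of the "tunes it up, usually by powers of 2" behaviour mentioned above: on early writes the file's block size grows to the next power of two covering the end of the write, capped at the dataset recordsize. This is conceptual only, not the actual zfs_write() code; grow_blksz is a hypothetical helper and the 512-byte starting point and power-of-two recordsize are assumptions.

#include <stdint.h>

/* Grow a file's block size toward the write end, capped at recordsize. */
static uint64_t
grow_blksz(uint64_t cur_blksz, uint64_t write_end, uint64_t recordsize)
{
    uint64_t want = (cur_blksz != 0) ? cur_blksz : 512;

    while (want < write_end && want < recordsize)
        want <<= 1;                 /* powers of two, as described above */
    return (want);
}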

@EchterAgo

EchterAgo commented Nov 11, 2023

Oh, that makes sense. I don't really have a good grasp of the code yet.

Did you check the iostat -r output I showed earlier? In the good case it gets many more ind writes, while in the bad case it gets a lot of agg writes. Both cases run rclone with preallocation though.

With preallocation disabled I never saw any performance difference.

Edit: swapped good/bad

@EchterAgo

EchterAgo commented Nov 11, 2023

I managed to add preallocation support to the Windows fio port and it exhibits the same issues as rclone. The patch is not in shape to upstream yet, but build artifacts should be available here shortly: https://github.com/EchterAgo/fio/actions/runs/6835259116

I used the MSYS2 64-bit version.

You'll need a configuration like this:

[global]
ioengine=windowsaio
direct=1
iodepth=8
time_based
runtime=600
directory=H\:\testdir
filename_format=testfile_$jobname_$jobnum_$filenum.bin
bs=32k
size=16G
fallocate=native

[sequential_write]
rw=write
numjobs=8

Change the directory parameter (note the escaped :) and then run fio name_of_config_file.fio.

If you change it to fallocate=none it runs fast.
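
For anyone reproducing this, a hedged sketch of what "preallocation" means here on Windows: reserving the file's allocation size up front, e.g. via SetFileInformationByHandle() with FILE_ALLOCATION_INFO. This is an assumption about what rclone's preallocate and the fio fallocate=native patch end up doing; the exact calls may differ, and the helper name preallocate is hypothetical.

#include <windows.h>

/* Reserve 'bytes' of allocation for an already-open file handle (illustrative). */
static BOOL
preallocate(HANDLE h, LONGLONG bytes)
{
    FILE_ALLOCATION_INFO ai;

    ai.AllocationSize.QuadPart = bytes;
    return (SetFileInformationByHandle(h, FileAllocationInfo,
        &ai, sizeof (ai)));
}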

@EchterAgo

I noticed that the write speed drops instantly; the gradual decline of the displayed speed in rclone seems to just be its averaging.

@EchterAgo

Bad results:

Jobs: 8 (f=8): [W(8)][100.0%][w=5433KiB/s][w=169 IOPS][eta 00m:00s]
sequential_write: (groupid=0, jobs=1): err= 0: pid=5588: Sat Nov 11 23:31:24 2023
  write: IOPS=22, BW=709KiB/s (726kB/s)(416MiB/600291msec); 0 zone resets
    slat (usec): min=53, max=4437, avg=871.41, stdev=570.06
    clat (msec): min=59, max=31377, avg=359.96, stdev=1063.66
     lat (msec): min=59, max=31379, avg=360.84, stdev=1063.68
    clat percentiles (msec):
     |  1.00th=[   80],  5.00th=[  108], 10.00th=[  127], 20.00th=[  161],
     | 30.00th=[  264], 40.00th=[  300], 50.00th=[  351], 60.00th=[  368],
     | 70.00th=[  380], 80.00th=[  401], 90.00th=[  422], 95.00th=[  435],
     | 99.00th=[  489], 99.50th=[ 3339], 99.90th=[17113], 99.95th=[17113],
     | 99.99th=[17113]
   bw (  KiB/s): min=   62, max= 2629, per=14.62%, avg=830.76, stdev=383.90, samples=1010
   iops        : min=    1, max=   82, avg=25.68, stdev=11.97, samples=1010
  lat (msec)   : 100=3.62%, 250=24.24%, 500=71.39%, 750=0.19%, >=2000=0.56%
  cpu          : usr=0.00%, sys=0.33%, ctx=0, majf=0, minf=0
  IO depths    : 1=0.1%, 2=0.1%, 4=0.3%, 8=99.7%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.1%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,13307,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=8
sequential_write: (groupid=0, jobs=1): err= 0: pid=8028: Sat Nov 11 23:31:24 2023
  write: IOPS=22, BW=710KiB/s (727kB/s)(416MiB/600294msec); 0 zone resets
    slat (usec): min=40, max=5751, avg=874.60, stdev=580.65
    clat (msec): min=61, max=31441, avg=359.67, stdev=1073.03
     lat (msec): min=61, max=31442, avg=360.54, stdev=1073.05
    clat percentiles (msec):
     |  1.00th=[   80],  5.00th=[  108], 10.00th=[  126], 20.00th=[  161],
     | 30.00th=[  264], 40.00th=[  300], 50.00th=[  351], 60.00th=[  368],
     | 70.00th=[  380], 80.00th=[  401], 90.00th=[  422], 95.00th=[  439],
     | 99.00th=[  493], 99.50th=[ 2802], 99.90th=[17113], 99.95th=[17113],
     | 99.99th=[17113]
   bw (  KiB/s): min=   62, max= 2757, per=14.63%, avg=831.46, stdev=385.48, samples=1010
   iops        : min=    1, max=   86, avg=25.70, stdev=12.03, samples=1010
  lat (msec)   : 100=3.67%, 250=24.22%, 500=71.21%, 750=0.34%, >=2000=0.56%
  cpu          : usr=0.00%, sys=0.33%, ctx=0, majf=0, minf=0
  IO depths    : 1=0.1%, 2=0.1%, 4=0.2%, 8=99.8%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.1%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,13318,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=8
sequential_write: (groupid=0, jobs=1): err= 0: pid=6584: Sat Nov 11 23:31:24 2023
  write: IOPS=22, BW=710KiB/s (727kB/s)(416MiB/600300msec); 0 zone resets
    slat (usec): min=56, max=4394, avg=873.37, stdev=586.18
    clat (msec): min=54, max=31414, avg=359.89, stdev=1064.09
     lat (msec): min=56, max=31415, avg=360.76, stdev=1064.11
    clat percentiles (msec):
     |  1.00th=[   80],  5.00th=[  108], 10.00th=[  126], 20.00th=[  161],
     | 30.00th=[  262], 40.00th=[  300], 50.00th=[  351], 60.00th=[  368],
     | 70.00th=[  384], 80.00th=[  401], 90.00th=[  422], 95.00th=[  435],
     | 99.00th=[  493], 99.50th=[ 3339], 99.90th=[17113], 99.95th=[17113],
     | 99.99th=[17113]
   bw (  KiB/s): min=   62, max= 2565, per=14.62%, avg=830.89, stdev=385.37, samples=1010
   iops        : min=    1, max=   80, avg=25.68, stdev=12.02, samples=1010
  lat (msec)   : 100=3.70%, 250=24.22%, 500=71.22%, 750=0.29%, >=2000=0.56%
  cpu          : usr=0.00%, sys=0.33%, ctx=0, majf=0, minf=0
  IO depths    : 1=0.1%, 2=0.1%, 4=0.3%, 8=99.7%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.1%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,13310,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=8
sequential_write: (groupid=0, jobs=1): err= 0: pid=6324: Sat Nov 11 23:31:24 2023
  write: IOPS=22, BW=710KiB/s (727kB/s)(416MiB/600297msec); 0 zone resets
    slat (usec): min=54, max=4188, avg=873.87, stdev=583.90
    clat (msec): min=61, max=31394, avg=359.67, stdev=1065.81
     lat (msec): min=61, max=31395, avg=360.54, stdev=1065.84
    clat percentiles (msec):
     |  1.00th=[   80],  5.00th=[  107], 10.00th=[  126], 20.00th=[  161],
     | 30.00th=[  264], 40.00th=[  300], 50.00th=[  351], 60.00th=[  368],
     | 70.00th=[  384], 80.00th=[  401], 90.00th=[  422], 95.00th=[  435],
     | 99.00th=[  485], 99.50th=[ 3339], 99.90th=[17113], 99.95th=[17113],
     | 99.99th=[17113]
   bw (  KiB/s): min=   62, max= 2693, per=14.63%, avg=831.46, stdev=385.79, samples=1010
   iops        : min=    1, max=   84, avg=25.70, stdev=12.03, samples=1010
  lat (msec)   : 100=3.66%, 250=24.22%, 500=71.29%, 750=0.26%, >=2000=0.56%
  cpu          : usr=0.00%, sys=0.33%, ctx=0, majf=0, minf=0
  IO depths    : 1=0.1%, 2=0.1%, 4=0.2%, 8=99.7%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.1%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,13318,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=8
sequential_write: (groupid=0, jobs=1): err= 0: pid=5712: Sat Nov 11 23:31:24 2023
  write: IOPS=22, BW=711KiB/s (728kB/s)(417MiB/600298msec); 0 zone resets
    slat (usec): min=48, max=4943, avg=875.41, stdev=575.74
    clat (msec): min=61, max=31374, avg=359.18, stdev=1062.56
     lat (msec): min=62, max=31376, avg=360.06, stdev=1062.59
    clat percentiles (msec):
     |  1.00th=[   80],  5.00th=[  108], 10.00th=[  127], 20.00th=[  161],
     | 30.00th=[  262], 40.00th=[  300], 50.00th=[  351], 60.00th=[  368],
     | 70.00th=[  380], 80.00th=[  397], 90.00th=[  422], 95.00th=[  435],
     | 99.00th=[  489], 99.50th=[ 3339], 99.90th=[17113], 99.95th=[17113],
     | 99.99th=[17113]
   bw (  KiB/s): min=   62, max= 2629, per=14.65%, avg=832.57, stdev=385.06, samples=1010
   iops        : min=    1, max=   82, avg=25.73, stdev=12.01, samples=1010
  lat (msec)   : 100=3.72%, 250=24.24%, 500=71.22%, 750=0.25%, >=2000=0.56%
  cpu          : usr=0.00%, sys=0.33%, ctx=0, majf=0, minf=0
  IO depths    : 1=0.1%, 2=0.1%, 4=0.3%, 8=99.7%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.1%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,13336,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=8
sequential_write: (groupid=0, jobs=1): err= 0: pid=8520: Sat Nov 11 23:31:24 2023
  write: IOPS=22, BW=709KiB/s (726kB/s)(416MiB/600284msec); 0 zone resets
    slat (usec): min=46, max=5246, avg=875.78, stdev=590.37
    clat (msec): min=53, max=31429, avg=360.06, stdev=1065.23
     lat (msec): min=53, max=31431, avg=360.94, stdev=1065.25
    clat percentiles (msec):
     |  1.00th=[   80],  5.00th=[  107], 10.00th=[  127], 20.00th=[  161],
     | 30.00th=[  262], 40.00th=[  300], 50.00th=[  351], 60.00th=[  368],
     | 70.00th=[  380], 80.00th=[  401], 90.00th=[  422], 95.00th=[  435],
     | 99.00th=[  485], 99.50th=[ 3339], 99.90th=[17113], 99.95th=[17113],
     | 99.99th=[17113]
   bw (  KiB/s): min=   62, max= 2693, per=14.62%, avg=830.52, stdev=384.44, samples=1010
   iops        : min=    1, max=   84, avg=25.67, stdev=11.99, samples=1010
  lat (msec)   : 100=3.72%, 250=24.18%, 500=71.35%, 750=0.18%, >=2000=0.56%
  cpu          : usr=0.00%, sys=0.33%, ctx=0, majf=0, minf=0
  IO depths    : 1=0.1%, 2=0.1%, 4=0.3%, 8=99.7%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.1%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,13303,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=8
sequential_write: (groupid=0, jobs=1): err= 0: pid=6192: Sat Nov 11 23:31:24 2023
  write: IOPS=22, BW=709KiB/s (726kB/s)(416MiB/600290msec); 0 zone resets
    slat (usec): min=42, max=5372, avg=883.05, stdev=589.35
    clat (msec): min=57, max=31449, avg=359.95, stdev=1073.91
     lat (msec): min=58, max=31449, avg=360.84, stdev=1073.94
    clat percentiles (msec):
     |  1.00th=[   80],  5.00th=[  108], 10.00th=[  126], 20.00th=[  159],
     | 30.00th=[  264], 40.00th=[  300], 50.00th=[  351], 60.00th=[  368],
     | 70.00th=[  384], 80.00th=[  397], 90.00th=[  422], 95.00th=[  439],
     | 99.00th=[  493], 99.50th=[ 2836], 99.90th=[17113], 99.95th=[17113],
     | 99.99th=[17113]
   bw (  KiB/s): min=   62, max= 2629, per=14.62%, avg=830.76, stdev=386.67, samples=1010
   iops        : min=    1, max=   82, avg=25.68, stdev=12.06, samples=1010
  lat (msec)   : 100=3.76%, 250=24.03%, 500=71.40%, 750=0.25%, >=2000=0.56%
  cpu          : usr=0.00%, sys=0.33%, ctx=0, majf=0, minf=0
  IO depths    : 1=0.1%, 2=0.1%, 4=0.3%, 8=99.7%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.1%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,13307,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=8
sequential_write: (groupid=0, jobs=1): err= 0: pid=7048: Sat Nov 11 23:31:24 2023
  write: IOPS=22, BW=710KiB/s (727kB/s)(416MiB/600299msec); 0 zone resets
    slat (usec): min=51, max=4253, avg=875.25, stdev=582.45
    clat (msec): min=53, max=31397, avg=359.51, stdev=1071.81
     lat (msec): min=53, max=31397, avg=360.38, stdev=1071.83
    clat percentiles (msec):
     |  1.00th=[   80],  5.00th=[  108], 10.00th=[  126], 20.00th=[  161],
     | 30.00th=[  264], 40.00th=[  300], 50.00th=[  351], 60.00th=[  368],
     | 70.00th=[  380], 80.00th=[  397], 90.00th=[  418], 95.00th=[  435],
     | 99.00th=[  489], 99.50th=[ 2836], 99.90th=[17113], 99.95th=[17113],
     | 99.99th=[17113]
   bw (  KiB/s): min=  123, max= 2629, per=14.65%, avg=832.68, stdev=385.08, samples=1009
   iops        : min=    3, max=   82, avg=25.73, stdev=12.01, samples=1009
  lat (msec)   : 100=3.70%, 250=24.11%, 500=71.34%, 750=0.23%, 1000=0.06%
  lat (msec)   : >=2000=0.56%
  cpu          : usr=0.00%, sys=0.33%, ctx=0, majf=0, minf=0
  IO depths    : 1=0.1%, 2=0.1%, 4=0.3%, 8=99.7%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.1%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,13324,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=8

Run status group 0 (all jobs):
  WRITE: bw=5678KiB/s (5815kB/s), 709KiB/s-711KiB/s (726kB/s-728kB/s), io=3329MiB (3491MB), run=600284-600300msec

@EchterAgo

EchterAgo commented Nov 11, 2023

Hmmm, very weird: fio is also slow in the case where rclone is fast?! Disabling preallocation makes fio fast again.

Now I'm wondering if rclone actually changes its IO pattern based on some kind of block size detection?

@EchterAgo

I'm reopening this because I think we still need to investigate this further.

@EchterAgo reopened this Nov 11, 2023
@EchterAgo

EchterAgo commented Nov 11, 2023

rclone's write pattern is the same whether I set z_blksz or not: always sequential 32KiB writes from multiple threads (8 in this case), with a preallocation size that is rounded up to the next block size. The reported file block size (IRP_MJ_QUERY_VOLUME_INFORMATION on the file) always seems to be 512, which is something we should probably fix too:

(screenshot)

@EchterAgo

Yeah, so the IRP_MJ_WRITEs just take a lot longer: in the slow case a bit over 4ms for 32KiB, in the fast case 0.12ms for 32KiB.

@EchterAgo

EchterAgo commented Nov 11, 2023

Ok, so the difference between fio and rclone is that my current fio configuration uses direct IO (non-cached). Disabling direct IO makes fio behave the same as rclone, including the crash/hang when aborting.

This is the config:

[global]
ioengine=windowsaio
direct=0
iodepth=8
time_based
runtime=600
directory=D\:\testdir
filename_format=testfile_$jobname_$jobnum_$filenum.bin
bs=32k
size=16G
fallocate=native

[sequential_write]
rw=write
numjobs=8

I wonder why direct IO is always so much slower.
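
For context, a hedged sketch of what direct=1 typically means on Windows: the file is opened without the cache manager via FILE_FLAG_NO_BUFFERING, which also requires sector-aligned buffers, offsets and sizes. I believe fio's windowsaio engine does roughly this for direct=1, but treat the exact flags as an assumption; open_direct is a hypothetical helper.

#include <windows.h>

/* Open a file for non-cached (direct), overlapped writes (illustrative). */
static HANDLE
open_direct(const wchar_t *path)
{
    return (CreateFileW(path, GENERIC_WRITE, 0, NULL, OPEN_ALWAYS,
        FILE_FLAG_NO_BUFFERING | FILE_FLAG_OVERLAPPED, NULL));
}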

@EchterAgo

EchterAgo commented Nov 11, 2023

I also noticed that rclone seems to be using a slightly different write path when slow?

Fast:

(screenshot)

Slow:

(screenshot)

Notice the difference in path. I'm really not sure why it does this. Call stacks of both look the same:

(screenshot)

@EchterAgo

EchterAgo commented Nov 11, 2023

So, to summarize what I found out:

  • Direct IO (non-cached) is always slow.
  • With normal IO rclone and fio behave the same.
  • There is a curious difference in the path names Process Monitor shows between the slow and fast case.
  • It is actually the IRP_MJ_WRITE calls that take much longer: for 32KiB, 0.12ms in the fast case vs. >4ms in the slow case.
  • IRP_MJ_QUERY_VOLUME_INFORMATION always reports 512 bytes; this is something for a separate issue and is unrelated.

@EchterAgo

fio results:

Bad zfs-Windows-2.2.0-release (eb7f88a):

sequential_write: (groupid=0, jobs=1): err= 0: pid=10028: Sun Nov 12 04:39:08 2023
  write: IOPS=25, BW=826KiB/s (846kB/s)(485MiB/600328msec); 0 zone resets
    slat (usec): min=35, max=7279, avg=929.57, stdev=649.23
    clat (nsec): min=1300, max=11682M, avg=307850031.60, stdev=678554956.15
     lat (usec): min=136, max=11683k, avg=308779.60, stdev=678557.72
    clat percentiles (usec):
     |  1.00th=[     553],  5.00th=[    1352], 10.00th=[    2008],
     | 20.00th=[    3032], 30.00th=[    4080], 40.00th=[    5407],
     | 50.00th=[    8356], 60.00th=[   80217], 70.00th=[  202376],
     | 80.00th=[  425722], 90.00th=[ 1035994], 95.00th=[ 1568670],
     | 99.00th=[ 3372221], 99.50th=[ 3942646], 99.90th=[ 5737808],
     | 99.95th=[11609834], 99.99th=[11744052]
   bw (  KiB/s): min=   62, max=28352, per=18.30%, avg=1588.57, stdev=1685.94, samples=617
   iops        : min=    1, max=  886, avg=49.31, stdev=52.68, samples=617
  lat (usec)   : 2=0.01%, 4=0.01%, 100=0.01%, 250=0.21%, 500=0.64%
  lat (usec)   : 750=0.74%, 1000=1.28%
  lat (msec)   : 2=7.08%, 4=19.13%, 10=22.84%, 20=1.66%, 50=2.70%
  lat (msec)   : 100=5.59%, 250=10.60%, 500=9.90%, 750=4.49%, 1000=2.13%
  lat (msec)   : 2000=8.46%, >=2000=2.55%
  cpu          : usr=0.00%, sys=0.33%, ctx=0, majf=0, minf=0
  IO depths    : 1=0.1%, 2=2.8%, 4=46.4%, 8=50.8%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=97.4%, 8=2.6%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,15505,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=8
sequential_write: (groupid=0, jobs=1): err= 0: pid=10012: Sun Nov 12 04:39:08 2023
  write: IOPS=37, BW=1197KiB/s (1225kB/s)(702MiB/600546msec); 0 zone resets
    slat (usec): min=34, max=5554, avg=781.06, stdev=564.89
    clat (nsec): min=1100, max=11798M, avg=211887205.32, stdev=610879410.10
     lat (usec): min=97, max=11799k, avg=212668.26, stdev=610935.71
    clat percentiles (usec):
     |  1.00th=[     233],  5.00th=[     586], 10.00th=[     955],
     | 20.00th=[    1631], 30.00th=[    2245], 40.00th=[    2900],
     | 50.00th=[    3589], 60.00th=[    4686], 70.00th=[    8225],
     | 80.00th=[  175113], 90.00th=[  708838], 95.00th=[ 1216349],
     | 99.00th=[ 3003122], 99.50th=[ 3942646], 99.90th=[ 5804917],
     | 99.95th=[ 6006244], 99.99th=[11744052]
   bw (  KiB/s): min=   62, max=74161, per=28.14%, avg=2441.82, stdev=7804.85, samples=581
   iops        : min=    1, max= 2317, avg=75.96, stdev=243.90, samples=581
  lat (usec)   : 2=0.15%, 4=0.10%, 10=0.03%, 20=0.01%, 100=0.01%
  lat (usec)   : 250=0.80%, 500=2.76%, 750=3.39%, 1000=3.46%
  lat (msec)   : 2=15.07%, 4=28.73%, 10=16.86%, 20=0.82%, 50=1.53%
  lat (msec)   : 100=2.97%, 250=5.86%, 500=4.74%, 750=2.88%, 1000=1.55%
  lat (msec)   : 2000=6.52%, >=2000=1.74%
  cpu          : usr=0.00%, sys=0.67%, ctx=0, majf=0, minf=0
  IO depths    : 1=0.3%, 2=13.4%, 4=50.8%, 8=35.5%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=93.9%, 8=6.1%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,22455,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=8
sequential_write: (groupid=0, jobs=1): err= 0: pid=10004: Sun Nov 12 04:39:08 2023
  write: IOPS=39, BW=1255KiB/s (1285kB/s)(736MiB/600158msec); 0 zone resets
    slat (usec): min=38, max=6544, avg=770.36, stdev=551.01
    clat (nsec): min=1400, max=12411M, avg=201795987.20, stdev=591894692.07
     lat (usec): min=130, max=12411k, avg=202566.35, stdev=591953.49
    clat percentiles (usec):
     |  1.00th=[     235],  5.00th=[     594], 10.00th=[     955],
     | 20.00th=[    1614], 30.00th=[    2245], 40.00th=[    2835],
     | 50.00th=[    3523], 60.00th=[    4424], 70.00th=[    6915],
     | 80.00th=[  139461], 90.00th=[  692061], 95.00th=[ 1249903],
     | 99.00th=[ 2768241], 99.50th=[ 3774874], 99.90th=[ 5402264],
     | 99.95th=[ 5737808], 99.99th=[12415140]
   bw (  KiB/s): min=   62, max=71360, per=30.88%, avg=2679.66, stdev=8360.34, samples=555
   iops        : min=    1, max= 2230, avg=83.38, stdev=261.26, samples=555
  lat (usec)   : 2=0.21%, 4=0.11%, 10=0.01%, 20=0.01%, 50=0.01%
  lat (usec)   : 250=0.75%, 500=2.74%, 750=3.32%, 1000=3.49%
  lat (msec)   : 2=15.48%, 4=30.02%, 10=17.04%, 20=0.97%, 50=1.47%
  lat (msec)   : 100=2.71%, 250=5.42%, 500=4.36%, 750=2.37%, 1000=1.10%
  lat (msec)   : 2000=6.88%, >=2000=1.54%
  cpu          : usr=0.00%, sys=0.67%, ctx=0, majf=0, minf=0
  IO depths    : 1=0.4%, 2=14.7%, 4=51.6%, 8=33.3%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=93.3%, 8=6.7%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,23544,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=8
sequential_write: (groupid=0, jobs=1): err= 0: pid=9948: Sun Nov 12 04:39:08 2023
  write: IOPS=39, BW=1269KiB/s (1300kB/s)(744MiB/600257msec); 0 zone resets
    slat (usec): min=37, max=5571, avg=747.25, stdev=537.40
    clat (nsec): min=1500, max=9393.5M, avg=199599392.34, stdev=566584013.84
     lat (usec): min=121, max=9393.9k, avg=200346.64, stdev=566640.96
    clat percentiles (usec):
     |  1.00th=[    235],  5.00th=[    545], 10.00th=[    881],
     | 20.00th=[   1516], 30.00th=[   2114], 40.00th=[   2704],
     | 50.00th=[   3359], 60.00th=[   4228], 70.00th=[   6587],
     | 80.00th=[ 133694], 90.00th=[ 717226], 95.00th=[1266680],
     | 99.00th=[2701132], 99.50th=[3472884], 99.90th=[5469373],
     | 99.95th=[6408897], 99.99th=[9328133]
   bw (  KiB/s): min=   62, max=72640, per=31.42%, avg=2726.08, stdev=8767.12, samples=552
   iops        : min=    1, max= 2270, avg=84.86, stdev=273.99, samples=552
  lat (usec)   : 2=0.13%, 4=0.11%, 10=0.02%, 100=0.01%, 250=0.82%
  lat (usec)   : 500=3.22%, 750=3.60%, 1000=3.93%
  lat (msec)   : 2=16.09%, 4=30.01%, 10=15.76%, 20=0.71%, 50=1.45%
  lat (msec)   : 100=2.53%, 250=5.85%, 500=3.80%, 750=2.07%, 1000=1.48%
  lat (msec)   : 2000=6.81%, >=2000=1.60%
  cpu          : usr=0.00%, sys=0.67%, ctx=0, majf=0, minf=0
  IO depths    : 1=0.3%, 2=15.1%, 4=51.4%, 8=33.3%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=93.3%, 8=6.7%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,23813,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=8
sequential_write: (groupid=0, jobs=1): err= 0: pid=9976: Sun Nov 12 04:39:08 2023
  write: IOPS=40, BW=1290KiB/s (1321kB/s)(757MiB/600813msec); 0 zone resets
    slat (usec): min=41, max=5376, avg=765.91, stdev=551.19
    clat (nsec): min=1200, max=13163M, avg=196334710.01, stdev=593207893.35
     lat (usec): min=116, max=13166k, avg=197100.62, stdev=593258.20
    clat percentiles (usec):
     |  1.00th=[     235],  5.00th=[     570], 10.00th=[     898],
     | 20.00th=[    1516], 30.00th=[    2114], 40.00th=[    2737],
     | 50.00th=[    3392], 60.00th=[    4293], 70.00th=[    6521],
     | 80.00th=[  132645], 90.00th=[  616563], 95.00th=[ 1249903],
     | 99.00th=[ 2231370], 99.50th=[ 3607102], 99.90th=[ 6006244],
     | 99.95th=[ 6274679], 99.99th=[13220447]
   bw (  KiB/s): min=   62, max=71510, per=30.78%, avg=2670.03, stdev=8457.94, samples=573
   iops        : min=    1, max= 2234, avg=83.08, stdev=264.32, samples=573
  lat (usec)   : 2=0.15%, 4=0.08%, 10=0.01%, 50=0.01%, 100=0.01%
  lat (usec)   : 250=0.85%, 500=2.91%, 750=3.81%, 1000=3.77%
  lat (msec)   : 2=16.33%, 4=29.23%, 10=16.96%, 20=0.95%, 50=1.13%
  lat (msec)   : 100=2.69%, 250=5.14%, 500=4.48%, 750=2.51%, 1000=1.05%
  lat (msec)   : 2000=6.30%, >=2000=1.66%
  cpu          : usr=0.00%, sys=0.67%, ctx=0, majf=0, minf=0
  IO depths    : 1=0.2%, 2=14.6%, 4=50.8%, 8=34.4%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=93.6%, 8=6.4%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,24224,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=8
sequential_write: (groupid=0, jobs=1): err= 0: pid=9936: Sun Nov 12 04:39:08 2023
  write: IOPS=30, BW=971KiB/s (995kB/s)(569MiB/600258msec); 0 zone resets
    slat (usec): min=43, max=5535, avg=831.52, stdev=603.84
    clat (nsec): min=1300, max=10806M, avg=261601082.24, stdev=649005894.65
     lat (usec): min=157, max=10806k, avg=262432.61, stdev=649042.97
    clat percentiles (usec):
     |  1.00th=[     281],  5.00th=[     725], 10.00th=[    1188],
     | 20.00th=[    1991], 30.00th=[    2704], 40.00th=[    3523],
     | 50.00th=[    4686], 60.00th=[    7177], 70.00th=[  100140],
     | 80.00th=[  316670], 90.00th=[ 1035994], 95.00th=[ 1451230],
     | 99.00th=[ 2902459], 99.50th=[ 4009755], 99.90th=[ 6610224],
     | 99.95th=[ 6811550], 99.99th=[10804528]
   bw (  KiB/s): min=   62, max=70912, per=22.59%, avg=1960.99, stdev=5916.94, samples=588
   iops        : min=    1, max= 2216, avg=60.94, stdev=184.91, samples=588
  lat (usec)   : 2=0.10%, 4=0.04%, 20=0.01%, 50=0.01%, 250=0.67%
  lat (usec)   : 500=2.10%, 750=2.33%, 1000=2.71%
  lat (msec)   : 2=12.07%, 4=24.67%, 10=18.86%, 20=1.10%, 50=1.72%
  lat (msec)   : 100=3.62%, 250=7.78%, 500=6.88%, 750=2.85%, 1000=1.65%
  lat (msec)   : 2000=9.12%, >=2000=1.71%
  cpu          : usr=0.00%, sys=0.33%, ctx=0, majf=0, minf=0
  IO depths    : 1=0.2%, 2=9.4%, 4=49.2%, 8=41.2%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=95.3%, 8=4.7%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,18219,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=8
sequential_write: (groupid=0, jobs=1): err= 0: pid=10008: Sun Nov 12 04:39:08 2023
  write: IOPS=30, BW=960KiB/s (983kB/s)(563MiB/600257msec); 0 zone resets
    slat (usec): min=42, max=6136, avg=846.13, stdev=607.05
    clat (nsec): min=1100, max=10650M, avg=264571256.47, stdev=650836973.18
     lat (usec): min=107, max=10650k, avg=265417.39, stdev=650862.27
    clat percentiles (usec):
     |  1.00th=[     293],  5.00th=[     717], 10.00th=[    1188],
     | 20.00th=[    1975], 30.00th=[    2737], 40.00th=[    3523],
     | 50.00th=[    4686], 60.00th=[    6980], 70.00th=[   99091],
     | 80.00th=[  341836], 90.00th=[ 1019216], 95.00th=[ 1417675],
     | 99.00th=[ 3003122], 99.50th=[ 4278191], 99.90th=[ 5804917],
     | 99.95th=[ 6073353], 99.99th=[10670310]
   bw (  KiB/s): min=   62, max=72583, per=22.22%, avg=1928.89, stdev=5898.11, samples=591
   iops        : min=    1, max= 2268, avg=59.93, stdev=184.33, samples=591
  lat (usec)   : 2=0.10%, 4=0.03%, 250=0.64%, 500=2.05%, 750=2.50%
  lat (usec)   : 1000=2.70%
  lat (msec)   : 2=12.25%, 4=24.59%, 10=19.23%, 20=0.99%, 50=1.34%
  lat (msec)   : 100=3.64%, 250=6.90%, 500=7.01%, 750=3.79%, 1000=1.85%
  lat (msec)   : 2000=8.42%, >=2000=1.98%
  cpu          : usr=0.00%, sys=0.50%, ctx=0, majf=0, minf=0
  IO depths    : 1=0.1%, 2=9.7%, 4=49.3%, 8=40.8%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=95.2%, 8=4.8%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,18013,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=8
sequential_write: (groupid=0, jobs=1): err= 0: pid=10016: Sun Nov 12 04:39:08 2023
  write: IOPS=28, BW=912KiB/s (934kB/s)(535MiB/600259msec); 0 zone resets
    slat (usec): min=40, max=5853, avg=874.99, stdev=627.20
    clat (nsec): min=1100, max=12888M, avg=278624238.24, stdev=672448636.08
     lat (usec): min=245, max=12889k, avg=279499.23, stdev=672471.05
    clat percentiles (usec):
     |  1.00th=[     334],  5.00th=[     873], 10.00th=[    1434],
     | 20.00th=[    2311], 30.00th=[    3130], 40.00th=[    4146],
     | 50.00th=[    5800], 60.00th=[   14615], 70.00th=[  154141],
     | 80.00th=[  375391], 90.00th=[  985662], 95.00th=[ 1551893],
     | 99.00th=[ 3238003], 99.50th=[ 3741320], 99.90th=[ 5872026],
     | 99.95th=[ 6006244], 99.99th=[12952011]
   bw (  KiB/s): min=   62, max=71936, per=20.61%, avg=1788.46, stdev=4683.82, samples=606
   iops        : min=    1, max= 2248, avg=55.53, stdev=146.39, samples=606
  lat (usec)   : 2=0.09%, 4=0.02%, 10=0.01%, 20=0.01%, 250=0.41%
  lat (usec)   : 500=1.62%, 750=1.85%, 1000=2.00%
  lat (msec)   : 2=10.31%, 4=22.62%, 10=19.90%, 20=1.41%, 50=1.88%
  lat (msec)   : 100=4.10%, 250=9.11%, 500=8.76%, 750=4.31%, 1000=1.65%
  lat (msec)   : 2000=7.38%, >=2000=2.56%
  cpu          : usr=0.00%, sys=0.33%, ctx=0, majf=0, minf=0
  IO depths    : 1=0.1%, 2=6.8%, 4=47.2%, 8=45.9%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=96.2%, 8=3.8%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,17116,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=8

Run status group 0 (all jobs):
  WRITE: bw=8676KiB/s (8884kB/s), 826KiB/s-1290KiB/s (846kB/s-1321kB/s), io=5090MiB (5338MB), run=600158-600813msec

Good (with the change below applied):

diff --git a/module/os/windows/zfs/zfs_znode.c b/module/os/windows/zfs/zfs_znode.c
index f501b3ac1..eff94188f 100644
--- a/module/os/windows/zfs/zfs_znode.c
+++ b/module/os/windows/zfs/zfs_znode.c
@@ -624,9 +624,6 @@ zfs_znode_alloc(zfsvfs_t *zfsvfs, dmu_buf_t *db, int blksz,
        zp->z_zfsvfs = zfsvfs;
        mutex_exit(&zfsvfs->z_znodes_lock);

-       if (zp->z_blksz == 0)
-               zp->z_blksz = zfs_blksz(zp);
-
        return (zp);
 }
sequential_write: (groupid=0, jobs=1): err= 0: pid=6380: Sun Nov 12 04:53:38 2023
  write: IOPS=1139, BW=35.6MiB/s (37.3MB/s)(20.9GiB/600015msec); 0 zone resets
    slat (usec): min=27, max=1618.5k, avg=456.55, stdev=4595.92
    clat (nsec): min=700, max=1625.9M, avg=5272008.86, stdev=24928883.42
     lat (usec): min=59, max=1626.1k, avg=5728.56, stdev=25345.08
    clat percentiles (usec):
     |  1.00th=[    99],  5.00th=[   249], 10.00th=[   388], 20.00th=[   635],
     | 30.00th=[   906], 40.00th=[  1221], 50.00th=[  1582], 60.00th=[  1991],
     | 70.00th=[  2507], 80.00th=[  3195], 90.00th=[  4555], 95.00th=[  9110],
     | 99.00th=[128451], 99.50th=[175113], 99.90th=[316670], 99.95th=[358613],
     | 99.99th=[526386]
   bw (  KiB/s): min=  128, max=114020, per=12.56%, avg=36829.87, stdev=27553.48, samples=1172
   iops        : min=    4, max= 3563, avg=1150.53, stdev=861.00, samples=1172
  lat (nsec)   : 750=0.01%, 1000=0.01%
  lat (usec)   : 2=0.29%, 4=0.13%, 10=0.01%, 20=0.01%, 50=0.01%
  lat (usec)   : 100=0.57%, 250=4.05%, 500=9.41%, 750=9.88%, 1000=8.74%
  lat (msec)   : 2=27.18%, 4=26.94%, 10=8.06%, 20=1.65%, 50=1.23%
  lat (msec)   : 100=0.32%, 250=1.31%, 500=0.19%, 750=0.01%, 1000=0.01%
  lat (msec)   : 2000=0.01%
  cpu          : usr=0.50%, sys=18.50%, ctx=0, majf=0, minf=0
  IO depths    : 1=0.5%, 2=23.5%, 4=56.8%, 8=19.3%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=89.2%, 8=10.8%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,683421,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=8
sequential_write: (groupid=0, jobs=1): err= 0: pid=8360: Sun Nov 12 04:53:38 2023
  write: IOPS=1148, BW=35.9MiB/s (37.6MB/s)(21.0GiB/600033msec); 0 zone resets
    slat (usec): min=29, max=1618.6k, avg=454.85, stdev=4799.65
    clat (nsec): min=600, max=1620.8M, avg=5212781.94, stdev=24071738.20
     lat (usec): min=53, max=1621.2k, avg=5667.64, stdev=24542.01
    clat percentiles (usec):
     |  1.00th=[    98],  5.00th=[   241], 10.00th=[   379], 20.00th=[   619],
     | 30.00th=[   881], 40.00th=[  1205], 50.00th=[  1565], 60.00th=[  1975],
     | 70.00th=[  2474], 80.00th=[  3195], 90.00th=[  4555], 95.00th=[  8979],
     | 99.00th=[128451], 99.50th=[173016], 99.90th=[308282], 99.95th=[333448],
     | 99.99th=[633340]
   bw (  KiB/s): min=  128, max=127313, per=12.66%, avg=37122.23, stdev=27642.35, samples=1173
   iops        : min=    4, max= 3978, avg=1159.65, stdev=863.77, samples=1173
  lat (nsec)   : 750=0.01%, 1000=0.01%
  lat (usec)   : 2=0.28%, 4=0.12%, 10=0.01%, 20=0.01%, 50=0.01%
  lat (usec)   : 100=0.62%, 250=4.26%, 500=9.66%, 750=10.17%, 1000=8.78%
  lat (msec)   : 2=26.76%, 4=26.64%, 10=7.99%, 20=1.65%, 50=1.20%
  lat (msec)   : 100=0.34%, 250=1.33%, 500=0.17%, 750=0.01%, 1000=0.01%
  lat (msec)   : 2000=0.01%
  cpu          : usr=0.50%, sys=18.83%, ctx=0, majf=0, minf=0
  IO depths    : 1=0.5%, 2=23.7%, 4=57.0%, 8=18.9%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=89.1%, 8=10.9%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,689243,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=8
sequential_write: (groupid=0, jobs=1): err= 0: pid=8356: Sun Nov 12 04:53:38 2023
  write: IOPS=1135, BW=35.5MiB/s (37.2MB/s)(20.8GiB/600004msec); 0 zone resets
    slat (usec): min=28, max=1618.6k, avg=471.20, stdev=5349.31
    clat (nsec): min=900, max=1620.6M, avg=5242952.86, stdev=24132216.73
     lat (usec): min=66, max=1620.9k, avg=5714.15, stdev=24712.24
    clat percentiles (usec):
     |  1.00th=[   102],  5.00th=[   255], 10.00th=[   400], 20.00th=[   660],
     | 30.00th=[   938], 40.00th=[  1254], 50.00th=[  1614], 60.00th=[  2024],
     | 70.00th=[  2540], 80.00th=[  3261], 90.00th=[  4686], 95.00th=[  9372],
     | 99.00th=[125305], 99.50th=[170918], 99.90th=[308282], 99.95th=[333448],
     | 99.99th=[557843]
   bw (  KiB/s): min=  500, max=135759, per=12.51%, avg=36703.27, stdev=26848.21, samples=1172
   iops        : min=   15, max= 4242, avg=1146.55, stdev=838.97, samples=1172
  lat (nsec)   : 1000=0.01%
  lat (usec)   : 2=0.29%, 4=0.16%, 10=0.01%, 20=0.01%, 50=0.01%
  lat (usec)   : 100=0.48%, 250=3.87%, 500=8.93%, 750=9.66%, 1000=8.77%
  lat (msec)   : 2=27.43%, 4=27.16%, 10=8.42%, 20=1.67%, 50=1.33%
  lat (msec)   : 100=0.36%, 250=1.27%, 500=0.17%, 750=0.01%, 1000=0.01%
  lat (msec)   : 2000=0.01%
  cpu          : usr=0.50%, sys=19.00%, ctx=0, majf=0, minf=0
  IO depths    : 1=0.5%, 2=23.2%, 4=56.6%, 8=19.6%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=89.4%, 8=10.6%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,681247,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=8
sequential_write: (groupid=0, jobs=1): err= 0: pid=6252: Sun Nov 12 04:53:38 2023
  write: IOPS=1163, BW=36.4MiB/s (38.1MB/s)(21.3GiB/600005msec); 0 zone resets
    slat (usec): min=28, max=1619.7k, avg=448.13, stdev=4857.41
    clat (nsec): min=900, max=1621.9M, avg=5150571.87, stdev=23767175.05
     lat (usec): min=52, max=1622.1k, avg=5598.70, stdev=24254.03
    clat percentiles (usec):
     |  1.00th=[    93],  5.00th=[   233], 10.00th=[   367], 20.00th=[   611],
     | 30.00th=[   873], 40.00th=[  1188], 50.00th=[  1549], 60.00th=[  1958],
     | 70.00th=[  2474], 80.00th=[  3163], 90.00th=[  4555], 95.00th=[  9372],
     | 99.00th=[125305], 99.50th=[168821], 99.90th=[299893], 99.95th=[333448],
     | 99.99th=[633340]
   bw (  KiB/s): min=  192, max=238492, per=12.83%, avg=37615.90, stdev=29260.82, samples=1173
   iops        : min=    6, max= 7452, avg=1175.09, stdev=914.35, samples=1173
  lat (nsec)   : 1000=0.01%
  lat (usec)   : 2=0.25%, 4=0.22%, 10=0.01%, 20=0.01%, 50=0.01%
  lat (usec)   : 100=0.69%, 250=4.40%, 500=9.82%, 750=10.14%, 1000=8.75%
  lat (msec)   : 2=26.79%, 4=26.33%, 10=7.79%, 20=1.72%, 50=1.30%
  lat (msec)   : 100=0.31%, 250=1.29%, 500=0.17%, 750=0.01%, 1000=0.01%
  lat (msec)   : 2000=0.01%
  cpu          : usr=0.50%, sys=18.67%, ctx=0, majf=0, minf=0
  IO depths    : 1=0.5%, 2=23.5%, 4=56.8%, 8=19.2%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=89.2%, 8=10.8%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,698194,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=8
sequential_write: (groupid=0, jobs=1): err= 0: pid=8388: Sun Nov 12 04:53:38 2023
  write: IOPS=1153, BW=36.1MiB/s (37.8MB/s)(21.1GiB/600003msec); 0 zone resets
    slat (usec): min=28, max=1618.3k, avg=452.19, stdev=4883.17
    clat (nsec): min=600, max=1623.7M, avg=5221157.53, stdev=24483382.89
     lat (usec): min=66, max=1624.1k, avg=5673.35, stdev=24962.35
    clat percentiles (usec):
     |  1.00th=[    96],  5.00th=[   239], 10.00th=[   375], 20.00th=[   619],
     | 30.00th=[   881], 40.00th=[  1188], 50.00th=[  1549], 60.00th=[  1958],
     | 70.00th=[  2474], 80.00th=[  3163], 90.00th=[  4490], 95.00th=[  8979],
     | 99.00th=[128451], 99.50th=[173016], 99.90th=[308282], 99.95th=[333448],
     | 99.99th=[641729]
   bw (  KiB/s): min=  498, max=247953, per=12.71%, avg=37264.06, stdev=28461.24, samples=1173
   iops        : min=   15, max= 7748, avg=1164.10, stdev=889.38, samples=1173
  lat (nsec)   : 750=0.01%, 1000=0.01%
  lat (usec)   : 2=0.26%, 4=0.15%, 10=0.01%, 20=0.01%, 50=0.01%
  lat (usec)   : 100=0.66%, 250=4.30%, 500=9.74%, 750=10.10%, 1000=8.72%
  lat (msec)   : 2=26.95%, 4=26.59%, 10=7.83%, 20=1.59%, 50=1.24%
  lat (msec)   : 100=0.33%, 250=1.31%, 500=0.18%, 750=0.01%, 1000=0.01%
  lat (msec)   : 2000=0.01%
  cpu          : usr=0.50%, sys=19.17%, ctx=0, majf=0, minf=0
  IO depths    : 1=0.5%, 2=23.7%, 4=56.9%, 8=19.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=89.2%, 8=10.8%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,692245,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=8
sequential_write: (groupid=0, jobs=1): err= 0: pid=8320: Sun Nov 12 04:53:38 2023
  write: IOPS=1150, BW=35.9MiB/s (37.7MB/s)(21.1GiB/600003msec); 0 zone resets
    slat (usec): min=28, max=1618.6k, avg=445.47, stdev=4232.55
    clat (nsec): min=800, max=1630.6M, avg=5279443.64, stdev=24975772.09
     lat (usec): min=70, max=1630.7k, avg=5724.92, stdev=25328.10
    clat percentiles (usec):
     |  1.00th=[    99],  5.00th=[   241], 10.00th=[   379], 20.00th=[   627],
     | 30.00th=[   889], 40.00th=[  1205], 50.00th=[  1549], 60.00th=[  1958],
     | 70.00th=[  2474], 80.00th=[  3163], 90.00th=[  4555], 95.00th=[  9241],
     | 99.00th=[126354], 99.50th=[168821], 99.90th=[316670], 99.95th=[350225],
     | 99.99th=[666895]
   bw (  KiB/s): min=  499, max=162171, per=12.65%, avg=37107.40, stdev=28312.63, samples=1174
   iops        : min=   15, max= 5067, avg=1159.20, stdev=884.72, samples=1174
  lat (nsec)   : 1000=0.01%
  lat (usec)   : 2=0.28%, 4=0.11%, 10=0.01%, 20=0.01%, 50=0.01%
  lat (usec)   : 100=0.60%, 250=4.28%, 500=9.63%, 750=10.15%, 1000=8.76%
  lat (msec)   : 2=27.04%, 4=26.52%, 10=7.81%, 20=1.59%, 50=1.35%
  lat (msec)   : 100=0.35%, 250=1.28%, 500=0.19%, 750=0.01%, 1000=0.01%
  lat (msec)   : 2000=0.01%
  cpu          : usr=0.67%, sys=19.00%, ctx=0, majf=0, minf=0
  IO depths    : 1=0.5%, 2=23.6%, 4=56.9%, 8=19.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=89.2%, 8=10.8%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,690008,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=8
sequential_write: (groupid=0, jobs=1): err= 0: pid=9700: Sun Nov 12 04:53:38 2023
  write: IOPS=1136, BW=35.5MiB/s (37.2MB/s)(20.8GiB/600011msec); 0 zone resets
    slat (usec): min=28, max=1619.6k, avg=465.10, stdev=5020.55
    clat (nsec): min=600, max=1620.7M, avg=5283678.99, stdev=24997035.41
     lat (usec): min=57, max=1623.4k, avg=5748.77, stdev=25489.80
    clat percentiles (usec):
     |  1.00th=[   103],  5.00th=[   255], 10.00th=[   400], 20.00th=[   660],
     | 30.00th=[   930], 40.00th=[  1254], 50.00th=[  1598], 60.00th=[  2008],
     | 70.00th=[  2540], 80.00th=[  3228], 90.00th=[  4621], 95.00th=[  9503],
     | 99.00th=[127402], 99.50th=[173016], 99.90th=[308282], 99.95th=[337642],
     | 99.99th=[641729]
   bw (  KiB/s): min=  256, max=104773, per=12.54%, avg=36765.79, stdev=27041.13, samples=1172
   iops        : min=    8, max= 3274, avg=1148.52, stdev=844.99, samples=1172
  lat (nsec)   : 750=0.01%, 1000=0.01%
  lat (usec)   : 2=0.29%, 4=0.13%, 10=0.01%, 20=0.01%, 50=0.01%
  lat (usec)   : 100=0.49%, 250=3.90%, 500=8.97%, 750=9.78%, 1000=8.74%
  lat (msec)   : 2=27.33%, 4=27.17%, 10=8.33%, 20=1.74%, 50=1.28%
  lat (msec)   : 100=0.37%, 250=1.23%, 500=0.19%, 750=0.01%, 1000=0.01%
  lat (msec)   : 2000=0.01%
  cpu          : usr=0.50%, sys=19.00%, ctx=0, majf=0, minf=0
  IO depths    : 1=0.5%, 2=23.4%, 4=56.7%, 8=19.5%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=89.3%, 8=10.7%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,682043,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=8
sequential_write: (groupid=0, jobs=1): err= 0: pid=10012: Sun Nov 12 04:53:38 2023
  write: IOPS=1139, BW=35.6MiB/s (37.3MB/s)(20.9GiB/600136msec); 0 zone resets
    slat (usec): min=29, max=1618.5k, avg=459.64, stdev=4991.50
    clat (nsec): min=500, max=1620.3M, avg=5263147.47, stdev=24610201.30
     lat (usec): min=60, max=1620.5k, avg=5722.78, stdev=25105.81
    clat percentiles (usec):
     |  1.00th=[   102],  5.00th=[   255], 10.00th=[   396], 20.00th=[   652],
     | 30.00th=[   922], 40.00th=[  1237], 50.00th=[  1598], 60.00th=[  2008],
     | 70.00th=[  2507], 80.00th=[  3195], 90.00th=[  4555], 95.00th=[  9110],
     | 99.00th=[128451], 99.50th=[175113], 99.90th=[312476], 99.95th=[341836],
     | 99.99th=[534774]
   bw (  KiB/s): min=  192, max=119346, per=12.57%, avg=36866.39, stdev=27250.18, samples=1172
   iops        : min=    6, max= 3729, avg=1151.67, stdev=851.52, samples=1172
  lat (nsec)   : 750=0.01%, 1000=0.01%
  lat (usec)   : 2=0.29%, 4=0.14%, 10=0.01%, 20=0.01%, 50=0.01%
  lat (usec)   : 100=0.51%, 250=3.91%, 500=9.09%, 750=9.80%, 1000=8.76%
  lat (msec)   : 2=27.36%, 4=27.49%, 10=7.88%, 20=1.67%, 50=1.23%
  lat (msec)   : 100=0.34%, 250=1.31%, 500=0.17%, 750=0.01%, 1000=0.01%
  lat (msec)   : 2000=0.01%
  cpu          : usr=0.67%, sys=19.00%, ctx=0, majf=0, minf=0
  IO depths    : 1=0.5%, 2=23.3%, 4=56.6%, 8=19.6%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=89.3%, 8=10.7%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,684054,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=8

Run status group 0 (all jobs):
  WRITE: bw=286MiB/s (300MB/s), 35.5MiB/s-36.4MiB/s (37.2MB/s-38.1MB/s), io=168GiB (180GB), run=600003-600136msec
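
For context on why removing those two lines from zfs_znode_alloc() helps, this is only my reading and could be wrong: the generic write path decides whether to grow a file's block size toward recordsize based on the cached zp->z_blksz value, roughly like the upstream OpenZFS logic sketched below (paraphrased from zfs_write() in zfs_vnops.c, so not the exact Windows port code). If z_blksz gets seeded at znode allocation with a value that does not reflect what is actually on disk, that decision can go wrong and leave new files writing far smaller blocks than recordsize, which would match the write amplification in the iostat logs.

/*
 * Paraphrased from upstream OpenZFS zfs_write() (module/zfs/zfs_vnops.c),
 * not the Windows port verbatim: when the range lock covers the whole
 * file, the block size is grown toward recordsize based on zp->z_blksz.
 */
if (lr->lr_length == UINT64_MAX) {
	uint64_t new_blksz;

	if (zp->z_blksz > max_blksz) {
		/* Already larger than recordsize: only round up to a power of 2. */
		new_blksz = MIN(end_size, 1ULL << highbit64(zp->z_blksz));
	} else {
		/* Usual case: grow toward recordsize as the file grows. */
		new_blksz = MIN(end_size, max_blksz);
	}
	zfs_grow_blocksize(zp, new_blksz, tx);
	zfs_rangelock_reduce(lr, woff, n);
}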

@lundman

lundman commented Nov 16, 2023

OK, if I have understood things, I cleaned up the code in both places. If we are happy with it, we can close this.

Sorry for the delay, I wanted to push out the macOS version.

@EchterAgo

The code you pushed is pretty much the same as what I have been testing for the last few days, so I'm going to close this.
