Performance regression during rclone copy with preallocation enabled #318

EchterAgo · 2023-11-06T07:13:07Z

This seems to be a regression that started with 1ecc0ca. See #283 for some more context.

I just logged zpool iostat 5 for an 8 threaded rclone copy to a newly created pool and it seems like there is some write amplification going on when blksz is not set to 4096.

iostat_512.txt
iostat_4096.txt

Note that in the 512 byte case iostat reports ~80-100MB/s IO with a high write operation count, but rclone itself only reports 5-10MB/s. What I also noticed is that it takes a long time to even get the rclone process to abort the copy.

In the 4096 case iostat reports ~160-240MB/s and a much lower write operation count, which is about what I would expect and have seen before 1ecc0ca. rclone itself also reports a very similar speed range as iostat during the copy. I can also abort rclone within seconds instead of minutes.

Specifying --local-no-preallocate on the rclone command line also makes it go fast again.

The text was updated successfully, but these errors were encountered:

EchterAgo · 2023-11-06T07:15:07Z

What is weird is that this seems to only happen with rclone so far, everything else does not seem to slow down. I tried a fio sequential write benchmark with 8 jobs and 32KB block size (same as rclone) and got the expected ~200MB/s speed.

lundman · 2023-11-06T07:20:53Z

Most likely it uses sync writes, to make sure it has landed on disk. Would be equivalent of sync=always on the dataset for the other tools.

EchterAgo · 2023-11-06T08:28:50Z

Wow, when I try rclone with 512 in blksz and preallocation disabled, it instantly hangs the virtual machine, and for some reason really hard, I can't even reset the machine in VMware and the kernel debugger will not break. After a few minutes it will recover. I'm not seeing any IO errors on the hardware though and also tried restarting the machine :\

I'm not sure if this also happens with 4096 in blksz, but this used to work.

EchterAgo · 2023-11-06T08:38:14Z

So the system recovers after a while and the debugger breaks, but it did not reveal anything interesting. However, when disabling preallocation in rclone, the copy seems to proceed at a normal speed.

EchterAgo · 2023-11-06T09:01:37Z

Hmmm, this only seems to happen when using VMWare, I don't see the same when using libvirt/QEMU. Actually it works as expected when I don't use VMWare Shared Folders as the copy source. It does still crash like in #319 when aborting the copy though.

Can this performance regression be related to preallocation/sparse files? On 1ecc0ca that is the only difference I see between a test run that is fast and one that is slow. When slow, it starts out fast but then falls over the course of a minute to < 10MB/s, aborting takes minutes. When fast, it starts fast and stays fast, aborting takes seconds.

When I set blksz to 4096 it does not matter if the test scripts use preallocation/sparse files or not.

I guess I should find out whether specifying only --local-no-preallocate or --local-no-sparse on the rclone command line makes a difference.

EchterAgo · 2023-11-06T09:03:44Z

It is definitely the --local-no-preallocate flag that makes the difference, not --local-no-sparse.

EchterAgo · 2023-11-06T09:06:58Z

I also see the same behavior on my second host that uses QEMU instead of VMWare.

lundman · 2023-11-06T23:17:09Z

OK so, you feel confident that setting z_blksz to ashift gives you a considerable performance boost. This is when we only set z_blksz to 4096 if it is 0, does it help if we remove the if part, and always set it? Any corruption issues? Otherwise, let me add code to set it nicely.

lundman · 2023-11-07T01:09:25Z

Hmm I wonder if it is just zfs_log_write - if we change blocksize there to ashift, and prealloc, will it go faster. That would be a much smaller change to upstream

EchterAgo · 2023-11-07T10:02:57Z

OK so, you feel confident that setting z_blksz to ashift gives you a considerable performance boost.

That does definitely improve things. I'm unsure if it is the right thing to do, I just noticed performance drop sharply after 1ecc0ca. What is still weird is that I have been unable to measure this performance drop with anything other than rclone. I tried to recreate the rclone IO pattern using fio, but I'm not sure it actually supports preallocation on Windows. All my tests with fio so far showed expected performance levels. I'll have to check if it actually does preallocation.

This is when we only set z_blksz to 4096 if it is 0, does it help if we remove the if part, and always set it? Any corruption issues? Otherwise, let me add code to set it nicely.

I'll try that.

lundman · 2023-11-07T10:20:22Z

Or try the blocksize in zfs_log_write() - then we dont change much on upstream

EchterAgo · 2023-11-07T11:02:10Z

Setting blocksize only in zfs_log_write does make the copy work fast. I'll need to let it run for a few hours to confirm the memory issue is still good, but I expect it will be.

diff --git a/module/os/windows/zfs/zfs_znode.c b/module/os/windows/zfs/zfs_znode.c
index f501b3ac1..eff94188f 100644
--- a/module/os/windows/zfs/zfs_znode.c
+++ b/module/os/windows/zfs/zfs_znode.c
@@ -624,9 +624,6 @@ zfs_znode_alloc(zfsvfs_t *zfsvfs, dmu_buf_t *db, int blksz,
        zp->z_zfsvfs = zfsvfs;
        mutex_exit(&zfsvfs->z_znodes_lock);

-       if (zp->z_blksz == 0)
-               zp->z_blksz = zfs_blksz(zp);
-
        return (zp);
 }

diff --git a/module/zfs/zfs_log.c b/module/zfs/zfs_log.c
index 613fcdb23..ad0dfec0d 100644
--- a/module/zfs/zfs_log.c
+++ b/module/zfs/zfs_log.c
@@ -634,6 +634,9 @@ zfs_log_write(zilog_t *zilog, dmu_tx_t *tx, int txtype,
        uint64_t gen = 0;
        ssize_t size = resid;

+       if (blocksize == 0)
+               blocksize = 4096;
+
        if (zil_replaying(zilog, tx) || zp->z_unlinked ||
            zfs_xattr_owner_unlinked(zp)) {
                if (callback != NULL)

lundman · 2023-11-07T11:11:54Z

excellent, I'll do a commit to fish out the value properly when I wake

lundman · 2023-11-08T01:33:38Z

I moved it back into zfs_znode.c - there's too much change to zfs_log.c otherwise.

f0e86b4

EchterAgo · 2023-11-10T05:43:28Z

I think this is fixed.

lundman · 2023-11-10T22:39:20Z

I had to change the way we do this, running the same commit on macOS caused panic in the removal tests, it is likely to be a bug in that code, see:
openzfs#12078

but don't want to wait for that to be fixed, so let's handle the performance change differently, in just zfs_log_write(). This passes zfs-tests as before.

EchterAgo · 2023-11-11T08:33:41Z

I also just did some more tests, I doubt the performance issue actually comes from zfs_log_write. It is a separate issue when setting the z_blksz to 512 in zfs_znode_alloc. When I just set the blocksize to 512 in zfs_log_write, and remove setting the zp->z_blkszI in zfs_znode_alloc then I also get the expected performance.

Good (as described above), going @ ~220MB/s:

PS C:\Windows\system32> C:\tools\zfs\zpool.exe iostat -r 5

Data          sync_read    sync_write    async_read    async_write      scrub         trim         rebuild
req_size      ind    agg    ind    agg    ind    agg    ind    agg    ind    agg    ind    agg    ind    agg
----------  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----
512             0      0      0      0      0      0      0      0      0      0      0      0      0      0
1K              0      0      0      0      0      0      0      0      0      0      0      0      0      0
2K              0      0      0      0      0      0      0      0      0      0      0      0      0      0
4K              0      0    128      0      0      0    749      0      0      0      0      0      0      0
8K             12      0     28      0      0      0      0    189      0      0      0      0      0      0
16K             0      0      0      0      0      0      0     56      0      0      0      0      0      0
32K             0      0      0      0      0      0      0      9      0      0      0      0      0      0
64K            16      0     32      0      0      0      3      0      0      0      0      0      0      0
128K            0      0     16      0      0      0      8      4      0      0      0      0      0      0
256K            0      0      0      0      0      0     12     15      0      0      0      0      0      0
512K            0      0      0      0      0      0  1.10K     46      0      0      0      0      0      0
1M              0      0      0      0      0      0      0      0      0      0      0      0      0      0
2M              0      0      0      0      0      0      0      0      0      0      0      0      0      0
4M              0      0      0      0      0      0      0      0      0      0      0      0      0      0
8M              0      0      0      0      0      0      0      0      0      0      0      0      0      0
16M             0      0      0      0      0      0      0      0      0      0      0      0      0      0
------------------------------------------------------------------------------------------------------------

Data          sync_read    sync_write    async_read    async_write      scrub         trim         rebuild
req_size      ind    agg    ind    agg    ind    agg    ind    agg    ind    agg    ind    agg    ind    agg
----------  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----
512             0      0      0      0      0      0      0      0      0      0      0      0      0      0
1K              0      0      0      0      0      0      0      0      0      0      0      0      0      0
2K              0      0      0      0      0      0      0      0      0      0      0      0      0      0
4K              0      0     17      0      0      0    105      0      0      0      0      0      0      0
8K              0      0      0      0      0      0      0     20      0      0      0      0      0      0
16K             0      0      0      0      0      0      0     10      0      0      0      0      0      0
32K             0      0      0      0      0      0      0      0      0      0      0      0      0      0
64K             0      0      0      0      0      0      1      0      0      0      0      0      0      0
128K            0      0      0      0      0      0      3      0      0      0      0      0      0      0
256K            0      0      0      0      0      0     28      2      0      0      0      0      0      0
512K            0      0      0      0      0      0    312      9      0      0      0      0      0      0
1M              0      0      0      0      0      0      0      0      0      0      0      0      0      0
2M              0      0      0      0      0      0      0      0      0      0      0      0      0      0
4M              0      0      0      0      0      0      0      0      0      0      0      0      0      0
8M              0      0      0      0      0      0      0      0      0      0      0      0      0      0
16M             0      0      0      0      0      0      0      0      0      0      0      0      0      0
------------------------------------------------------------------------------------------------------------

Data          sync_read    sync_write    async_read    async_write      scrub         trim         rebuild
req_size      ind    agg    ind    agg    ind    agg    ind    agg    ind    agg    ind    agg    ind    agg
----------  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----
512             0      0      0      0      0      0      0      0      0      0      0      0      0      0
1K              0      0      0      0      0      0      0      0      0      0      0      0      0      0
2K              0      0      0      0      0      0      0      0      0      0      0      0      0      0
4K              0      0     22      0      0      0    160      0      0      0      0      0      0      0
8K              0      0      0      0      0      0      0     34      0      0      0      0      0      0
16K             0      0      0      0      0      0      1     11      0      0      0      0      0      0
32K             0      0      0      0      0      0      7      0      0      0      0      0      0      0
64K             0      0      0      0      0      0      8      0      0      0      0      0      0      0
128K            0      0      0      0      0      0      8      3      0      0      0      0      0      0
256K            0      0      0      0      0      0     57      2      0      0      0      0      0      0
512K            0      0      0      0      0      0    365     25      0      0      0      0      0      0
1M              0      0      0      0      0      0      0      0      0      0      0      0      0      0
2M              0      0      0      0      0      0      0      0      0      0      0      0      0      0
4M              0      0      0      0      0      0      0      0      0      0      0      0      0      0
8M              0      0      0      0      0      0      0      0      0      0      0      0      0      0
16M             0      0      0      0      0      0      0      0      0      0      0      0      0      0
------------------------------------------------------------------------------------------------------------

Data          sync_read    sync_write    async_read    async_write      scrub         trim         rebuild
req_size      ind    agg    ind    agg    ind    agg    ind    agg    ind    agg    ind    agg    ind    agg
----------  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----
512             0      0      0      0      0      0      0      0      0      0      0      0      0      0
1K              0      0      0      0      0      0      0      0      0      0      0      0      0      0
2K              0      0      0      0      0      0      0      0      0      0      0      0      0      0
4K              0      0     17      0      0      0    126      0      0      0      0      0      0      0
8K              0      0      0      0      0      0      0     27      0      0      0      0      0      0
16K             0      0      0      0      0      0      0     10      0      0      0      0      0      0
32K             0      0      0      0      0      0      2      2      0      0      0      0      0      0
64K             0      0      0      0      0      0      3      0      0      0      0      0      0      0
128K            0      0      0      0      0      0      2      0      0      0      0      0      0      0
256K            0      0      0      0      0      0      7      1      0      0      0      0      0      0
512K            0      0      0      0      0      0    417      2      0      0      0      0      0      0
1M              0      0      0      0      0      0      0      0      0      0      0      0      0      0
2M              0      0      0      0      0      0      0      0      0      0      0      0      0      0
4M              0      0      0      0      0      0      0      0      0      0      0      0      0      0
8M              0      0      0      0      0      0      0      0      0      0      0      0      0      0
16M             0      0      0      0      0      0      0      0      0      0      0      0      0      0
------------------------------------------------------------------------------------------------------------

Data          sync_read    sync_write    async_read    async_write      scrub         trim         rebuild
req_size      ind    agg    ind    agg    ind    agg    ind    agg    ind    agg    ind    agg    ind    agg
----------  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----
512             0      0      0      0      0      0      0      0      0      0      0      0      0      0
1K              0      0      0      0      0      0      0      0      0      0      0      0      0      0
2K              0      0      0      0      0      0      0      0      0      0      0      0      0      0
4K              0      0     20      0      0      0    144      0      0      0      0      0      0      0
8K              0      0      0      0      0      0      3     26      0      0      0      0      0      0
16K             0      0      0      0      0      0      0      7      0      0      0      0      0      0
32K             0      0      0      0      0      0      1      2      0      0      0      0      0      0
64K             0      0      0      0      0      0      1      0      0      0      0      0      0      0
128K            0      0      0      0      0      0      3      0      0      0      0      0      0      0
256K            0      0      0      0      0      0     55      3      0      0      0      0      0      0
512K            0      0      0      0      0      0    329     37      0      0      0      0      0      0
1M              0      0      0      0      0      0      0      0      0      0      0      0      0      0
2M              0      0      0      0      0      0      0      0      0      0      0      0      0      0
4M              0      0      0      0      0      0      0      0      0      0      0      0      0      0
8M              0      0      0      0      0      0      0      0      0      0      0      0      0      0
16M             0      0      0      0      0      0      0      0      0      0      0      0      0      0
------------------------------------------------------------------------------------------------------------

Now with only

if (zp->z_blksz == 0)
    zp->z_blksz = zfs_blksz(zp);

in zfs_znode_alloc, with setting blocksize removed in zfs_log_write.

Bad (starts at > 200MB/s, slows down over multiple seconds and ends up at a few hundred kilobytes per second):

PS C:\Windows\system32> C:\tools\zfs\zpool.exe iostat -r 5

Data          sync_read    sync_write    async_read    async_write      scrub         trim         rebuild
req_size      ind    agg    ind    agg    ind    agg    ind    agg    ind    agg    ind    agg    ind    agg
----------  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----
512             0      0      0      0      0      0      0      0      0      0      0      0      0      0
1K              0      0      0      0      0      0      0      0      0      0      0      0      0      0
2K              0      0      0      0      0      0      0      0      0      0      0      0      0      0
4K              0      0    128      0      0      0  35.1K      0      0      0      0      0      0      0
8K             12      0     28      0      0      0  1.11K  16.8K      0      0      0      0      0      0
16K             0      0      0      0      0      0  1.07K  9.45K      0      0      0      0      0      0
32K             0      0      0      0      0      0      0  6.04K      0      0      0      0      0      0
64K            16      0     32      0      0      0      0  9.71K      0      0      0      0      0      0
128K            0      0     16      0      0      0      0  7.86K      0      0      0      0      0      0
256K            0      0      0      0      0      0      0    259      0      0      0      0      0      0
512K            0      0      0      0      0      0      0      4      0      0      0      0      0      0
1M              0      0      0      0      0      0      0      0      0      0      0      0      0      0
2M              0      0      0      0      0      0      0      0      0      0      0      0      0      0
4M              0      0      0      0      0      0      0      0      0      0      0      0      0      0
8M              0      0      0      0      0      0      0      0      0      0      0      0      0      0
16M             0      0      0      0      0      0      0      0      0      0      0      0      0      0
------------------------------------------------------------------------------------------------------------

Data          sync_read    sync_write    async_read    async_write      scrub         trim         rebuild
req_size      ind    agg    ind    agg    ind    agg    ind    agg    ind    agg    ind    agg    ind    agg
----------  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----
512             0      0      0      0      0      0      0      0      0      0      0      0      0      0
1K              0      0      0      0      0      0      0      0      0      0      0      0      0      0
2K              0      0      0      0      0      0      0      0      0      0      0      0      0      0
4K              0      0      2      0      0      0  8.03K      0      0      0      0      0      0      0
8K              0      0      0      0      0      0     29  1.95K      0      0      0      0      0      0
16K             0      0      0      0      0      0     29    324      0      0      0      0      0      0
32K             0      0      0      0      0      0      0     21      0      0      0      0      0      0
64K             0      0      0      0      0      0      0     14      0      0      0      0      0      0
128K            0      0      0      0      0      0      0      8      0      0      0      0      0      0
256K            0      0      0      0      0      0      0      1      0      0      0      0      0      0
512K            0      0      0      0      0      0      0      0      0      0      0      0      0      0
1M              0      0      0      0      0      0      0      0      0      0      0      0      0      0
2M              0      0      0      0      0      0      0      0      0      0      0      0      0      0
4M              0      0      0      0      0      0      0      0      0      0      0      0      0      0
8M              0      0      0      0      0      0      0      0      0      0      0      0      0      0
16M             0      0      0      0      0      0      0      0      0      0      0      0      0      0
------------------------------------------------------------------------------------------------------------

Data          sync_read    sync_write    async_read    async_write      scrub         trim         rebuild
req_size      ind    agg    ind    agg    ind    agg    ind    agg    ind    agg    ind    agg    ind    agg
----------  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----
512             0      0      0      0      0      0      0      0      0      0      0      0      0      0
1K              0      0      0      0      0      0      0      0      0      0      0      0      0      0
2K              0      0      0      0      0      0      0      0      0      0      0      0      0      0
4K              0      0      0      0      0      0    314      0      0      0      0      0      0      0
8K              0      0      0      0      0      0     44    138      0      0      0      0      0      0
16K             0      0      0      0      0      0     34    128      0      0      0      0      0      0
32K             0      0      0      0      0      0      0    144      0      0      0      0      0      0
64K             0      0      0      0      0      0      0    289      0      0      0      0      0      0
128K            0      0      0      0      0      0      0    281      0      0      0      0      0      0
256K            0      0      0      0      0      0      0     13      0      0      0      0      0      0
512K            0      0      0      0      0      0      0      0      0      0      0      0      0      0
1M              0      0      0      0      0      0      0      0      0      0      0      0      0      0
2M              0      0      0      0      0      0      0      0      0      0      0      0      0      0
4M              0      0      0      0      0      0      0      0      0      0      0      0      0      0
8M              0      0      0      0      0      0      0      0      0      0      0      0      0      0
16M             0      0      0      0      0      0      0      0      0      0      0      0      0      0
------------------------------------------------------------------------------------------------------------

Data          sync_read    sync_write    async_read    async_write      scrub         trim         rebuild
req_size      ind    agg    ind    agg    ind    agg    ind    agg    ind    agg    ind    agg    ind    agg
----------  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----
512             0      0      0      0      0      0      0      0      0      0      0      0      0      0
1K              0      0      0      0      0      0      0      0      0      0      0      0      0      0
2K              0      0      0      0      0      0      0      0      0      0      0      0      0      0
4K              0      0      5      0      0      0    191      0      0      0      0      0      0      0
8K              0      0      0      0      0      0     22    109      0      0      0      0      0      0
16K             0      0      0      0      0      0     19     79      0      0      0      0      0      0
32K             0      0      0      0      0      0      0     62      0      0      0      0      0      0
64K             0      0      0      0      0      0      0    137      0      0      0      0      0      0
128K            0      0      0      0      0      0      0    444      0      0      0      0      0      0
256K            0      0      0      0      0      0      0      7      0      0      0      0      0      0
512K            0      0      0      0      0      0      0      0      0      0      0      0      0      0
1M              0      0      0      0      0      0      0      0      0      0      0      0      0      0
2M              0      0      0      0      0      0      0      0      0      0      0      0      0      0
4M              0      0      0      0      0      0      0      0      0      0      0      0      0      0
8M              0      0      0      0      0      0      0      0      0      0      0      0      0      0
16M             0      0      0      0      0      0      0      0      0      0      0      0      0      0
------------------------------------------------------------------------------------------------------------

Data          sync_read    sync_write    async_read    async_write      scrub         trim         rebuild
req_size      ind    agg    ind    agg    ind    agg    ind    agg    ind    agg    ind    agg    ind    agg
----------  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----
512             0      0      0      0      0      0      0      0      0      0      0      0      0      0
1K              0      0      0      0      0      0      0      0      0      0      0      0      0      0
2K              0      0      0      0      0      0      0      0      0      0      0      0      0      0
4K              0      0      2      0      0      0    203      0      0      0      0      0      0      0
8K              0      0      0      0      0      0     33    123      0      0      0      0      0      0
16K             0      0      0      0      0      0     28     96      0      0      0      0      0      0
32K             0      0      0      0      0      0      0    114      0      0      0      0      0      0
64K             0      0      0      0      0      0      0    236      0      0      0      0      0      0
128K            0      0      0      0      0      0      0    340      0      0      0      0      0      0
256K            0      0      0      0      0      0      0     11      0      0      0      0      0      0
512K            0      0      0      0      0      0      0      0      0      0      0      0      0      0
1M              0      0      0      0      0      0      0      0      0      0      0      0      0      0
2M              0      0      0      0      0      0      0      0      0      0      0      0      0      0
4M              0      0      0      0      0      0      0      0      0      0      0      0      0      0
8M              0      0      0      0      0      0      0      0      0      0      0      0      0      0
16M             0      0      0      0      0      0      0      0      0      0      0      0      0      0
------------------------------------------------------------------------------------------------------------

Data          sync_read    sync_write    async_read    async_write      scrub         trim         rebuild
req_size      ind    agg    ind    agg    ind    agg    ind    agg    ind    agg    ind    agg    ind    agg
----------  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----
512             0      0      0      0      0      0      0      0      0      0      0      0      0      0
1K              0      0      0      0      0      0      0      0      0      0      0      0      0      0
2K              0      0      0      0      0      0      0      0      0      0      0      0      0      0
4K              0      0      2      0      0      0    216      0      0      0      0      0      0      0
8K              0      0      0      0      0      0     30    115      0      0      0      0      0      0
16K             0      0      0      0      0      0     15     83      0      0      0      0      0      0
32K             0      0      0      0      0      0      0     90      0      0      0      0      0      0
64K             0      0      0      0      0      0      0    157      0      0      0      0      0      0
128K            0      0      0      0      0      0      0    461      0      0      0      0      0      0
256K            0      0      0      0      0      0      0      5      0      0      0      0      0      0
512K            0      0      0      0      0      0      0      0      0      0      0      0      0      0
1M              0      0      0      0      0      0      0      0      0      0      0      0      0      0
2M              0      0      0      0      0      0      0      0      0      0      0      0      0      0
4M              0      0      0      0      0      0      0      0      0      0      0      0      0      0
8M              0      0      0      0      0      0      0      0      0      0      0      0      0      0
16M             0      0      0      0      0      0      0      0      0      0      0      0      0      0
------------------------------------------------------------------------------------------------------------
PS C:\Windows\system32>

EchterAgo · 2023-11-11T08:36:52Z

To make it more clear what I mean with good / bad, here are the code diffs to latest zfs-Windows-2.2.0-release branch (eb7f88a):

Good:

diff --git a/module/os/windows/zfs/zfs_znode.c b/module/os/windows/zfs/zfs_znode.c
index f501b3ac1..eff94188f 100644
--- a/module/os/windows/zfs/zfs_znode.c
+++ b/module/os/windows/zfs/zfs_znode.c
@@ -624,9 +624,6 @@ zfs_znode_alloc(zfsvfs_t *zfsvfs, dmu_buf_t *db, int blksz,
        zp->z_zfsvfs = zfsvfs;
        mutex_exit(&zfsvfs->z_znodes_lock);

-       if (zp->z_blksz == 0)
-               zp->z_blksz = zfs_blksz(zp);
-
        return (zp);
 }

diff --git a/module/zfs/zfs_log.c b/module/zfs/zfs_log.c
index f3013ba2d..8210e3a94 100644
--- a/module/zfs/zfs_log.c
+++ b/module/zfs/zfs_log.c
@@ -646,8 +646,8 @@ zfs_log_write(zilog_t *zilog, dmu_tx_t *tx, int txtype,
        }

 #ifdef _WIN32
-       if (zp->z_zfsvfs->z_os->os_spa->spa_min_alloc > blocksize)
-               blocksize = zp->z_zfsvfs->z_os->os_spa->spa_min_alloc;
+       if (blocksize == 0)
+               blocksize = 512;
 #endif

        if (zilog->zl_logbias == ZFS_LOGBIAS_THROUGHPUT)

Bad:

diff --git a/module/zfs/zfs_log.c b/module/zfs/zfs_log.c
index f3013ba2d..70b16e373 100644
--- a/module/zfs/zfs_log.c
+++ b/module/zfs/zfs_log.c
@@ -645,11 +645,6 @@ zfs_log_write(zilog_t *zilog, dmu_tx_t *tx, int txtype,
                return;
        }

-#ifdef _WIN32
-       if (zp->z_zfsvfs->z_os->os_spa->spa_min_alloc > blocksize)
-               blocksize = zp->z_zfsvfs->z_os->os_spa->spa_min_alloc;
-#endif
-
        if (zilog->zl_logbias == ZFS_LOGBIAS_THROUGHPUT)
                write_state = WR_INDIRECT;
        else if (!spa_has_slogs(zilog->zl_spa) &&

I'm pretty sure this is not about zfs_log_write.

EchterAgo · 2023-11-11T08:38:09Z

Also, the zfs_log_write infinite loop only happened after having already copied 2.2TB, this performance issue happens pretty much directly after starting the copy, and only when preallocation is enabled.

lundman · 2023-11-11T08:40:58Z

OKOK sorry, so you are saying performance wise, you want the change in zfs_znode.c? The zfs_log_write I thought was the slow issue, and shouldnt be 0 if the new code is there, since it should set it

EchterAgo · 2023-11-11T08:43:05Z

Yes, it seems performance wise it only makes a difference if set in zfs_znode_alloc.

I also just noticed something else interesting regarding this, give me a few minutes to investigate.

EchterAgo · 2023-11-11T09:13:53Z

Ok, nevermind, I though performance recovers, but now after 30 minutes it is still slow. ~~Starts out fast, then perf slowly falls:~~

~~@ 0m:30s 50MB/s~~
~~@ 1m:00s 25MB/s~~
~~@ 1m:30s 12.5MB/s~~
~~@ 2m:00s 6MB/s (individual copy threads @ < 100KB/s, 8 threads)~~
~~@ 3m:00s stabilizes @5-6MB/s, continues like this until 30m~~

Outdated, see #318 (comment)

lundman · 2023-11-11T09:19:32Z

ok, so? go back to znode_alloc code, and just make sure blocksize is not 0?

EchterAgo · 2023-11-11T09:19:38Z

So, setting the z_blksz in zfs_znode_alloc to 512 or zfs_blksz when it is zero causes performance issue. 4096 works fast, leaving it at 0 also works fast. This is independent of the infinite loop in zfs_log_write.

This also means that the latest zfs-Windows-2.2.0-release branch has performance issues again.

EchterAgo · 2023-11-11T09:21:28Z

ok, so? go back to znode_alloc code, and just make sure blocksize is not 0?

I think it should be enough to set blocksize to a reasonable value in zfs_log_write if it is 0. No need to set z_blksz in zfs_znode_alloc.

lundman · 2023-11-11T09:22:11Z

so you are saying its enough to not-change-anything in znode? and just change zfs_log ?

EchterAgo · 2023-11-11T09:24:39Z

so you are saying its enough to not-change-anything in znode? and just change zfs_log ?

Yes

EchterAgo · 2023-11-11T09:24:58Z

diff --git a/module/os/windows/zfs/zfs_znode.c b/module/os/windows/zfs/zfs_znode.c
index f501b3ac1..eff94188f 100644
--- a/module/os/windows/zfs/zfs_znode.c
+++ b/module/os/windows/zfs/zfs_znode.c
@@ -624,9 +624,6 @@ zfs_znode_alloc(zfsvfs_t *zfsvfs, dmu_buf_t *db, int blksz,
        zp->z_zfsvfs = zfsvfs;
        mutex_exit(&zfsvfs->z_znodes_lock);

-       if (zp->z_blksz == 0)
-               zp->z_blksz = zfs_blksz(zp);
-
        return (zp);
 }

diff --git a/module/zfs/zfs_log.c b/module/zfs/zfs_log.c
index f3013ba2d..8210e3a94 100644
--- a/module/zfs/zfs_log.c
+++ b/module/zfs/zfs_log.c
@@ -646,8 +646,8 @@ zfs_log_write(zilog_t *zilog, dmu_tx_t *tx, int txtype,
        }

 #ifdef _WIN32
-       if (zp->z_zfsvfs->z_os->os_spa->spa_min_alloc > blocksize)
-               blocksize = zp->z_zfsvfs->z_os->os_spa->spa_min_alloc;
+       if (blocksize == 0)
+               blocksize = 512;
 #endif

        if (zilog->zl_logbias == ZFS_LOGBIAS_THROUGHPUT)

basically this

lundman · 2023-11-11T09:26:23Z

But, this

-       if (zp->z_zfsvfs->z_os->os_spa->spa_min_alloc > blocksize)
-               blocksize = zp->z_zfsvfs->z_os->os_spa->spa_min_alloc;

should work, if blocksize is 0, it should set it to something greater than that?

EchterAgo · 2023-11-11T09:27:03Z

That works too, but by setting it fixed you don't need spa_min_alloc, if that is a concern.

lundman · 2023-11-11T09:27:45Z

yeah, but i thought we wanted to set it to ashift, so that it is faster.

EchterAgo · 2023-11-11T09:28:46Z

My tests showed that it does not matter what we set in zfs_log_write, 512 works just as fast.

The performance difference comes only from setting z_blksz in zfs_znode_alloc, unless in my case it is set to 4096.

lundman · 2023-11-11T09:35:19Z

ok so then, check for blocksize==0, and set to say, SPA_MINBLOCKSIZE or whatever its called.

then in znode, we want to set it to ashift?

EchterAgo · 2023-11-11T09:42:24Z

Setting z_blksz in zfs_znode_alloc to 512 or zfs_blksz(zp); (which produces 512 here) makes the performance drop. Setting it to 4096 or leaving it at 0 works fast. This is independent of what zfs_log_write
Leaving z_blksz potentially zero makes zfs_log_write do an infinite loop after copying a lot of data. Guarding against this by setting to either 512, 4096 or spa_min_alloc does not decrease performance. I still don't understand why this only happens after having copied a lot of data already.

ok so then, check for blocksize==0, and set to say, SPA_MINBLOCKSIZE or whatever its called.

then in znode, we want to set it to ashift?

If we have the blocksize==0 check in zfs_log_write we don't need it in zfs_znode_alloc, so I'd just remove it there. Or we need to investigate why the performance drops when setting it wrong in zfs_znode_alloc.

EchterAgo · 2023-11-11T09:45:17Z

It also seems clear that zfs_log_write was not written with zfs_blksz==0 in mind, so maybe we should investigate why it goes to 0 in the first place. Maybe there are more assumptions like this in the code.

lundman · 2023-11-11T09:47:07Z

ok if its left as 0, then the zfs_write() tunes it up, usually by powers of 2. So we just need a 0 test in zfs_log. Understood.

What is unusual is the path it takes in zfs_log_write(). Linux etc, does not generally enter WR_INDIRECT before zfs_write(), but windows does if it preallocates. So they most likely never noticed.

EchterAgo · 2023-11-11T09:50:43Z

Oh, that makes sense. I don't really have a good grasp of the code yet.

Did you check the iostat -r I showed earlier? In the good case it gets much more ind writes, while in the bad case it gets a lot of agg writes. Both cases run rclone with preallocation though.

With preallocation disabled I never saw any performance difference.

Edit: swapped good/bad

EchterAgo · 2023-11-11T16:12:15Z

I managed to add preallocation support to the Windows fio port and it does exhibit the same issues as rclone. The patch is not in a shape to upstream yet, but build artifacts should be available here shortly: https://github.com/EchterAgo/fio/actions/runs/6835259116

I used the MSYS2 64-bit version.

You'll need a configuration like this:

[global]
ioengine=windowsaio
direct=1
iodepth=8
time_based
runtime=600
directory=H\:\testdir
filename_format=testfile_$jobname_$jobnum_$filenum.bin
bs=32k
size=16G
fallocate=native

[sequential_write]
rw=write
numjobs=8

change the directory parameter (note the escaped :) and then run fio name_of_config_file.fio.

If you change it to fallocate=none it runs fast.

EchterAgo · 2023-11-11T16:22:16Z

https://github.com/EchterAgo/fio/suites/18107906055/artifacts/1043791681

EchterAgo · 2023-11-11T16:32:02Z

I noticed that the write speed instantly drops, the gradual dropping of displayed speed in rclone seems to just be its averaging.

EchterAgo · 2023-11-11T16:34:10Z

Bad results:

Jobs: 8 (f=8): [W(8)][100.0%][w=5433KiB/s][w=169 IOPS][eta 00m:00s]
sequential_write: (groupid=0, jobs=1): err= 0: pid=5588: Sat Nov 11 23:31:24 2023
  write: IOPS=22, BW=709KiB/s (726kB/s)(416MiB/600291msec); 0 zone resets
    slat (usec): min=53, max=4437, avg=871.41, stdev=570.06
    clat (msec): min=59, max=31377, avg=359.96, stdev=1063.66
     lat (msec): min=59, max=31379, avg=360.84, stdev=1063.68
    clat percentiles (msec):
     |  1.00th=[   80],  5.00th=[  108], 10.00th=[  127], 20.00th=[  161],
     | 30.00th=[  264], 40.00th=[  300], 50.00th=[  351], 60.00th=[  368],
     | 70.00th=[  380], 80.00th=[  401], 90.00th=[  422], 95.00th=[  435],
     | 99.00th=[  489], 99.50th=[ 3339], 99.90th=[17113], 99.95th=[17113],
     | 99.99th=[17113]
   bw (  KiB/s): min=   62, max= 2629, per=14.62%, avg=830.76, stdev=383.90, samples=1010
   iops        : min=    1, max=   82, avg=25.68, stdev=11.97, samples=1010
  lat (msec)   : 100=3.62%, 250=24.24%, 500=71.39%, 750=0.19%, >=2000=0.56%
  cpu          : usr=0.00%, sys=0.33%, ctx=0, majf=0, minf=0
  IO depths    : 1=0.1%, 2=0.1%, 4=0.3%, 8=99.7%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.1%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,13307,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=8
sequential_write: (groupid=0, jobs=1): err= 0: pid=8028: Sat Nov 11 23:31:24 2023
  write: IOPS=22, BW=710KiB/s (727kB/s)(416MiB/600294msec); 0 zone resets
    slat (usec): min=40, max=5751, avg=874.60, stdev=580.65
    clat (msec): min=61, max=31441, avg=359.67, stdev=1073.03
     lat (msec): min=61, max=31442, avg=360.54, stdev=1073.05
    clat percentiles (msec):
     |  1.00th=[   80],  5.00th=[  108], 10.00th=[  126], 20.00th=[  161],
     | 30.00th=[  264], 40.00th=[  300], 50.00th=[  351], 60.00th=[  368],
     | 70.00th=[  380], 80.00th=[  401], 90.00th=[  422], 95.00th=[  439],
     | 99.00th=[  493], 99.50th=[ 2802], 99.90th=[17113], 99.95th=[17113],
     | 99.99th=[17113]
   bw (  KiB/s): min=   62, max= 2757, per=14.63%, avg=831.46, stdev=385.48, samples=1010
   iops        : min=    1, max=   86, avg=25.70, stdev=12.03, samples=1010
  lat (msec)   : 100=3.67%, 250=24.22%, 500=71.21%, 750=0.34%, >=2000=0.56%
  cpu          : usr=0.00%, sys=0.33%, ctx=0, majf=0, minf=0
  IO depths    : 1=0.1%, 2=0.1%, 4=0.2%, 8=99.8%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.1%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,13318,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=8
sequential_write: (groupid=0, jobs=1): err= 0: pid=6584: Sat Nov 11 23:31:24 2023
  write: IOPS=22, BW=710KiB/s (727kB/s)(416MiB/600300msec); 0 zone resets
    slat (usec): min=56, max=4394, avg=873.37, stdev=586.18
    clat (msec): min=54, max=31414, avg=359.89, stdev=1064.09
     lat (msec): min=56, max=31415, avg=360.76, stdev=1064.11
    clat percentiles (msec):
     |  1.00th=[   80],  5.00th=[  108], 10.00th=[  126], 20.00th=[  161],
     | 30.00th=[  262], 40.00th=[  300], 50.00th=[  351], 60.00th=[  368],
     | 70.00th=[  384], 80.00th=[  401], 90.00th=[  422], 95.00th=[  435],
     | 99.00th=[  493], 99.50th=[ 3339], 99.90th=[17113], 99.95th=[17113],
     | 99.99th=[17113]
   bw (  KiB/s): min=   62, max= 2565, per=14.62%, avg=830.89, stdev=385.37, samples=1010
   iops        : min=    1, max=   80, avg=25.68, stdev=12.02, samples=1010
  lat (msec)   : 100=3.70%, 250=24.22%, 500=71.22%, 750=0.29%, >=2000=0.56%
  cpu          : usr=0.00%, sys=0.33%, ctx=0, majf=0, minf=0
  IO depths    : 1=0.1%, 2=0.1%, 4=0.3%, 8=99.7%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.1%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,13310,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=8
sequential_write: (groupid=0, jobs=1): err= 0: pid=6324: Sat Nov 11 23:31:24 2023
  write: IOPS=22, BW=710KiB/s (727kB/s)(416MiB/600297msec); 0 zone resets
    slat (usec): min=54, max=4188, avg=873.87, stdev=583.90
    clat (msec): min=61, max=31394, avg=359.67, stdev=1065.81
     lat (msec): min=61, max=31395, avg=360.54, stdev=1065.84
    clat percentiles (msec):
     |  1.00th=[   80],  5.00th=[  107], 10.00th=[  126], 20.00th=[  161],
     | 30.00th=[  264], 40.00th=[  300], 50.00th=[  351], 60.00th=[  368],
     | 70.00th=[  384], 80.00th=[  401], 90.00th=[  422], 95.00th=[  435],
     | 99.00th=[  485], 99.50th=[ 3339], 99.90th=[17113], 99.95th=[17113],
     | 99.99th=[17113]
   bw (  KiB/s): min=   62, max= 2693, per=14.63%, avg=831.46, stdev=385.79, samples=1010
   iops        : min=    1, max=   84, avg=25.70, stdev=12.03, samples=1010
  lat (msec)   : 100=3.66%, 250=24.22%, 500=71.29%, 750=0.26%, >=2000=0.56%
  cpu          : usr=0.00%, sys=0.33%, ctx=0, majf=0, minf=0
  IO depths    : 1=0.1%, 2=0.1%, 4=0.2%, 8=99.7%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.1%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,13318,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=8
sequential_write: (groupid=0, jobs=1): err= 0: pid=5712: Sat Nov 11 23:31:24 2023
  write: IOPS=22, BW=711KiB/s (728kB/s)(417MiB/600298msec); 0 zone resets
    slat (usec): min=48, max=4943, avg=875.41, stdev=575.74
    clat (msec): min=61, max=31374, avg=359.18, stdev=1062.56
     lat (msec): min=62, max=31376, avg=360.06, stdev=1062.59
    clat percentiles (msec):
     |  1.00th=[   80],  5.00th=[  108], 10.00th=[  127], 20.00th=[  161],
     | 30.00th=[  262], 40.00th=[  300], 50.00th=[  351], 60.00th=[  368],
     | 70.00th=[  380], 80.00th=[  397], 90.00th=[  422], 95.00th=[  435],
     | 99.00th=[  489], 99.50th=[ 3339], 99.90th=[17113], 99.95th=[17113],
     | 99.99th=[17113]
   bw (  KiB/s): min=   62, max= 2629, per=14.65%, avg=832.57, stdev=385.06, samples=1010
   iops        : min=    1, max=   82, avg=25.73, stdev=12.01, samples=1010
  lat (msec)   : 100=3.72%, 250=24.24%, 500=71.22%, 750=0.25%, >=2000=0.56%
  cpu          : usr=0.00%, sys=0.33%, ctx=0, majf=0, minf=0
  IO depths    : 1=0.1%, 2=0.1%, 4=0.3%, 8=99.7%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.1%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,13336,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=8
sequential_write: (groupid=0, jobs=1): err= 0: pid=8520: Sat Nov 11 23:31:24 2023
  write: IOPS=22, BW=709KiB/s (726kB/s)(416MiB/600284msec); 0 zone resets
    slat (usec): min=46, max=5246, avg=875.78, stdev=590.37
    clat (msec): min=53, max=31429, avg=360.06, stdev=1065.23
     lat (msec): min=53, max=31431, avg=360.94, stdev=1065.25
    clat percentiles (msec):
     |  1.00th=[   80],  5.00th=[  107], 10.00th=[  127], 20.00th=[  161],
     | 30.00th=[  262], 40.00th=[  300], 50.00th=[  351], 60.00th=[  368],
     | 70.00th=[  380], 80.00th=[  401], 90.00th=[  422], 95.00th=[  435],
     | 99.00th=[  485], 99.50th=[ 3339], 99.90th=[17113], 99.95th=[17113],
     | 99.99th=[17113]
   bw (  KiB/s): min=   62, max= 2693, per=14.62%, avg=830.52, stdev=384.44, samples=1010
   iops        : min=    1, max=   84, avg=25.67, stdev=11.99, samples=1010
  lat (msec)   : 100=3.72%, 250=24.18%, 500=71.35%, 750=0.18%, >=2000=0.56%
  cpu          : usr=0.00%, sys=0.33%, ctx=0, majf=0, minf=0
  IO depths    : 1=0.1%, 2=0.1%, 4=0.3%, 8=99.7%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.1%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,13303,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=8
sequential_write: (groupid=0, jobs=1): err= 0: pid=6192: Sat Nov 11 23:31:24 2023
  write: IOPS=22, BW=709KiB/s (726kB/s)(416MiB/600290msec); 0 zone resets
    slat (usec): min=42, max=5372, avg=883.05, stdev=589.35
    clat (msec): min=57, max=31449, avg=359.95, stdev=1073.91
     lat (msec): min=58, max=31449, avg=360.84, stdev=1073.94
    clat percentiles (msec):
     |  1.00th=[   80],  5.00th=[  108], 10.00th=[  126], 20.00th=[  159],
     | 30.00th=[  264], 40.00th=[  300], 50.00th=[  351], 60.00th=[  368],
     | 70.00th=[  384], 80.00th=[  397], 90.00th=[  422], 95.00th=[  439],
     | 99.00th=[  493], 99.50th=[ 2836], 99.90th=[17113], 99.95th=[17113],
     | 99.99th=[17113]
   bw (  KiB/s): min=   62, max= 2629, per=14.62%, avg=830.76, stdev=386.67, samples=1010
   iops        : min=    1, max=   82, avg=25.68, stdev=12.06, samples=1010
  lat (msec)   : 100=3.76%, 250=24.03%, 500=71.40%, 750=0.25%, >=2000=0.56%
  cpu          : usr=0.00%, sys=0.33%, ctx=0, majf=0, minf=0
  IO depths    : 1=0.1%, 2=0.1%, 4=0.3%, 8=99.7%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.1%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,13307,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=8
sequential_write: (groupid=0, jobs=1): err= 0: pid=7048: Sat Nov 11 23:31:24 2023
  write: IOPS=22, BW=710KiB/s (727kB/s)(416MiB/600299msec); 0 zone resets
    slat (usec): min=51, max=4253, avg=875.25, stdev=582.45
    clat (msec): min=53, max=31397, avg=359.51, stdev=1071.81
     lat (msec): min=53, max=31397, avg=360.38, stdev=1071.83
    clat percentiles (msec):
     |  1.00th=[   80],  5.00th=[  108], 10.00th=[  126], 20.00th=[  161],
     | 30.00th=[  264], 40.00th=[  300], 50.00th=[  351], 60.00th=[  368],
     | 70.00th=[  380], 80.00th=[  397], 90.00th=[  418], 95.00th=[  435],
     | 99.00th=[  489], 99.50th=[ 2836], 99.90th=[17113], 99.95th=[17113],
     | 99.99th=[17113]
   bw (  KiB/s): min=  123, max= 2629, per=14.65%, avg=832.68, stdev=385.08, samples=1009
   iops        : min=    3, max=   82, avg=25.73, stdev=12.01, samples=1009
  lat (msec)   : 100=3.70%, 250=24.11%, 500=71.34%, 750=0.23%, 1000=0.06%
  lat (msec)   : >=2000=0.56%
  cpu          : usr=0.00%, sys=0.33%, ctx=0, majf=0, minf=0
  IO depths    : 1=0.1%, 2=0.1%, 4=0.3%, 8=99.7%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.1%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,13324,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=8

Run status group 0 (all jobs):
  WRITE: bw=5678KiB/s (5815kB/s), 709KiB/s-711KiB/s (726kB/s-728kB/s), io=3329MiB (3491MB), run=600284-600300msec

EchterAgo · 2023-11-11T16:47:18Z

Hmmm, very weird, fio is also slow in the case where rclone is fast???? Disabling preallocation makes fio fast again.

Now I'm wondering if rclone actually changes its IO pattern based on some kind of block size detection?

EchterAgo · 2023-11-11T16:51:39Z

I'm reopening this because I think we still need to figure this out further.

EchterAgo · 2023-11-11T17:52:21Z

rclones write pattern is the same whether I set z_blksize or not, always sequential 32KiB writes from multiple threads (8 in this case) with a preallocation size that is rounded up to the next block size. The reported file block size (IRP_MJ_QUERY_VOLUME_INFORMATION on the file) always seems to be 512, which is something we should probably fix too:

EchterAgo · 2023-11-11T18:06:19Z

Yea, so the IRP_MJ_WRITEs just take a lot longer, in the slow case a bit over 4ms for 32KiB, in the fast case 0.12ms for 32KiB

EchterAgo · 2023-11-11T18:18:11Z

Ok, so the difference with fio compared to rclone is that my current fio configuration uses direct IO (non-cached). Disabling direct IO causes fio to behave the same as rclone, including crashing / hanging when aborting.

This is the config:

[global]
ioengine=windowsaio
direct=0
iodepth=8
time_based
runtime=600
directory=D\:\testdir
filename_format=testfile_$jobname_$jobnum_$filenum.bin
bs=32k
size=16G
fallocate=native

[sequential_write]
rw=write
numjobs=8

I wonder why direct IO is always so much slower.

EchterAgo · 2023-11-11T18:20:41Z

I also noticed that rclone seems to be using slightly different write path when slow?

Fast:

Slow:

Notice the difference in path. I'm really not sure why it does this. Call stacks of both look the same:

EchterAgo · 2023-11-11T18:25:55Z

So, to summarize what I found out:

Direct IO (non-cached) is always slow.
With normal IO rclone and fio behave the same.
There is a curious difference in the path names Process Monitor shows between the slow and fast case.
It is actually the IRP_MJ_WRITE calls that take much longer, for 32KiB = 0.12ms in the fast case, >4ms in the slow case.
IRP_MJ_QUERY_VOLUME_INFORMATION always reports 512 bytes, this is something for a separate issue and unrelated.

EchterAgo · 2023-11-11T21:56:18Z

fio results:

Bad zfs-Windows-2.2.0-release (eb7f88a):

sequential_write: (groupid=0, jobs=1): err= 0: pid=10028: Sun Nov 12 04:39:08 2023
  write: IOPS=25, BW=826KiB/s (846kB/s)(485MiB/600328msec); 0 zone resets
    slat (usec): min=35, max=7279, avg=929.57, stdev=649.23
    clat (nsec): min=1300, max=11682M, avg=307850031.60, stdev=678554956.15
     lat (usec): min=136, max=11683k, avg=308779.60, stdev=678557.72
    clat percentiles (usec):
     |  1.00th=[     553],  5.00th=[    1352], 10.00th=[    2008],
     | 20.00th=[    3032], 30.00th=[    4080], 40.00th=[    5407],
     | 50.00th=[    8356], 60.00th=[   80217], 70.00th=[  202376],
     | 80.00th=[  425722], 90.00th=[ 1035994], 95.00th=[ 1568670],
     | 99.00th=[ 3372221], 99.50th=[ 3942646], 99.90th=[ 5737808],
     | 99.95th=[11609834], 99.99th=[11744052]
   bw (  KiB/s): min=   62, max=28352, per=18.30%, avg=1588.57, stdev=1685.94, samples=617
   iops        : min=    1, max=  886, avg=49.31, stdev=52.68, samples=617
  lat (usec)   : 2=0.01%, 4=0.01%, 100=0.01%, 250=0.21%, 500=0.64%
  lat (usec)   : 750=0.74%, 1000=1.28%
  lat (msec)   : 2=7.08%, 4=19.13%, 10=22.84%, 20=1.66%, 50=2.70%
  lat (msec)   : 100=5.59%, 250=10.60%, 500=9.90%, 750=4.49%, 1000=2.13%
  lat (msec)   : 2000=8.46%, >=2000=2.55%
  cpu          : usr=0.00%, sys=0.33%, ctx=0, majf=0, minf=0
  IO depths    : 1=0.1%, 2=2.8%, 4=46.4%, 8=50.8%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=97.4%, 8=2.6%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,15505,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=8
sequential_write: (groupid=0, jobs=1): err= 0: pid=10012: Sun Nov 12 04:39:08 2023
  write: IOPS=37, BW=1197KiB/s (1225kB/s)(702MiB/600546msec); 0 zone resets
    slat (usec): min=34, max=5554, avg=781.06, stdev=564.89
    clat (nsec): min=1100, max=11798M, avg=211887205.32, stdev=610879410.10
     lat (usec): min=97, max=11799k, avg=212668.26, stdev=610935.71
    clat percentiles (usec):
     |  1.00th=[     233],  5.00th=[     586], 10.00th=[     955],
     | 20.00th=[    1631], 30.00th=[    2245], 40.00th=[    2900],
     | 50.00th=[    3589], 60.00th=[    4686], 70.00th=[    8225],
     | 80.00th=[  175113], 90.00th=[  708838], 95.00th=[ 1216349],
     | 99.00th=[ 3003122], 99.50th=[ 3942646], 99.90th=[ 5804917],
     | 99.95th=[ 6006244], 99.99th=[11744052]
   bw (  KiB/s): min=   62, max=74161, per=28.14%, avg=2441.82, stdev=7804.85, samples=581
   iops        : min=    1, max= 2317, avg=75.96, stdev=243.90, samples=581
  lat (usec)   : 2=0.15%, 4=0.10%, 10=0.03%, 20=0.01%, 100=0.01%
  lat (usec)   : 250=0.80%, 500=2.76%, 750=3.39%, 1000=3.46%
  lat (msec)   : 2=15.07%, 4=28.73%, 10=16.86%, 20=0.82%, 50=1.53%
  lat (msec)   : 100=2.97%, 250=5.86%, 500=4.74%, 750=2.88%, 1000=1.55%
  lat (msec)   : 2000=6.52%, >=2000=1.74%
  cpu          : usr=0.00%, sys=0.67%, ctx=0, majf=0, minf=0
  IO depths    : 1=0.3%, 2=13.4%, 4=50.8%, 8=35.5%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=93.9%, 8=6.1%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,22455,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=8
sequential_write: (groupid=0, jobs=1): err= 0: pid=10004: Sun Nov 12 04:39:08 2023
  write: IOPS=39, BW=1255KiB/s (1285kB/s)(736MiB/600158msec); 0 zone resets
    slat (usec): min=38, max=6544, avg=770.36, stdev=551.01
    clat (nsec): min=1400, max=12411M, avg=201795987.20, stdev=591894692.07
     lat (usec): min=130, max=12411k, avg=202566.35, stdev=591953.49
    clat percentiles (usec):
     |  1.00th=[     235],  5.00th=[     594], 10.00th=[     955],
     | 20.00th=[    1614], 30.00th=[    2245], 40.00th=[    2835],
     | 50.00th=[    3523], 60.00th=[    4424], 70.00th=[    6915],
     | 80.00th=[  139461], 90.00th=[  692061], 95.00th=[ 1249903],
     | 99.00th=[ 2768241], 99.50th=[ 3774874], 99.90th=[ 5402264],
     | 99.95th=[ 5737808], 99.99th=[12415140]
   bw (  KiB/s): min=   62, max=71360, per=30.88%, avg=2679.66, stdev=8360.34, samples=555
   iops        : min=    1, max= 2230, avg=83.38, stdev=261.26, samples=555
  lat (usec)   : 2=0.21%, 4=0.11%, 10=0.01%, 20=0.01%, 50=0.01%
  lat (usec)   : 250=0.75%, 500=2.74%, 750=3.32%, 1000=3.49%
  lat (msec)   : 2=15.48%, 4=30.02%, 10=17.04%, 20=0.97%, 50=1.47%
  lat (msec)   : 100=2.71%, 250=5.42%, 500=4.36%, 750=2.37%, 1000=1.10%
  lat (msec)   : 2000=6.88%, >=2000=1.54%
  cpu          : usr=0.00%, sys=0.67%, ctx=0, majf=0, minf=0
  IO depths    : 1=0.4%, 2=14.7%, 4=51.6%, 8=33.3%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=93.3%, 8=6.7%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,23544,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=8
sequential_write: (groupid=0, jobs=1): err= 0: pid=9948: Sun Nov 12 04:39:08 2023
  write: IOPS=39, BW=1269KiB/s (1300kB/s)(744MiB/600257msec); 0 zone resets
    slat (usec): min=37, max=5571, avg=747.25, stdev=537.40
    clat (nsec): min=1500, max=9393.5M, avg=199599392.34, stdev=566584013.84
     lat (usec): min=121, max=9393.9k, avg=200346.64, stdev=566640.96
    clat percentiles (usec):
     |  1.00th=[    235],  5.00th=[    545], 10.00th=[    881],
     | 20.00th=[   1516], 30.00th=[   2114], 40.00th=[   2704],
     | 50.00th=[   3359], 60.00th=[   4228], 70.00th=[   6587],
     | 80.00th=[ 133694], 90.00th=[ 717226], 95.00th=[1266680],
     | 99.00th=[2701132], 99.50th=[3472884], 99.90th=[5469373],
     | 99.95th=[6408897], 99.99th=[9328133]
   bw (  KiB/s): min=   62, max=72640, per=31.42%, avg=2726.08, stdev=8767.12, samples=552
   iops        : min=    1, max= 2270, avg=84.86, stdev=273.99, samples=552
  lat (usec)   : 2=0.13%, 4=0.11%, 10=0.02%, 100=0.01%, 250=0.82%
  lat (usec)   : 500=3.22%, 750=3.60%, 1000=3.93%
  lat (msec)   : 2=16.09%, 4=30.01%, 10=15.76%, 20=0.71%, 50=1.45%
  lat (msec)   : 100=2.53%, 250=5.85%, 500=3.80%, 750=2.07%, 1000=1.48%
  lat (msec)   : 2000=6.81%, >=2000=1.60%
  cpu          : usr=0.00%, sys=0.67%, ctx=0, majf=0, minf=0
  IO depths    : 1=0.3%, 2=15.1%, 4=51.4%, 8=33.3%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=93.3%, 8=6.7%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,23813,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=8
sequential_write: (groupid=0, jobs=1): err= 0: pid=9976: Sun Nov 12 04:39:08 2023
  write: IOPS=40, BW=1290KiB/s (1321kB/s)(757MiB/600813msec); 0 zone resets
    slat (usec): min=41, max=5376, avg=765.91, stdev=551.19
    clat (nsec): min=1200, max=13163M, avg=196334710.01, stdev=593207893.35
     lat (usec): min=116, max=13166k, avg=197100.62, stdev=593258.20
    clat percentiles (usec):
     |  1.00th=[     235],  5.00th=[     570], 10.00th=[     898],
     | 20.00th=[    1516], 30.00th=[    2114], 40.00th=[    2737],
     | 50.00th=[    3392], 60.00th=[    4293], 70.00th=[    6521],
     | 80.00th=[  132645], 90.00th=[  616563], 95.00th=[ 1249903],
     | 99.00th=[ 2231370], 99.50th=[ 3607102], 99.90th=[ 6006244],
     | 99.95th=[ 6274679], 99.99th=[13220447]
   bw (  KiB/s): min=   62, max=71510, per=30.78%, avg=2670.03, stdev=8457.94, samples=573
   iops        : min=    1, max= 2234, avg=83.08, stdev=264.32, samples=573
  lat (usec)   : 2=0.15%, 4=0.08%, 10=0.01%, 50=0.01%, 100=0.01%
  lat (usec)   : 250=0.85%, 500=2.91%, 750=3.81%, 1000=3.77%
  lat (msec)   : 2=16.33%, 4=29.23%, 10=16.96%, 20=0.95%, 50=1.13%
  lat (msec)   : 100=2.69%, 250=5.14%, 500=4.48%, 750=2.51%, 1000=1.05%
  lat (msec)   : 2000=6.30%, >=2000=1.66%
  cpu          : usr=0.00%, sys=0.67%, ctx=0, majf=0, minf=0
  IO depths    : 1=0.2%, 2=14.6%, 4=50.8%, 8=34.4%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=93.6%, 8=6.4%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,24224,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=8
sequential_write: (groupid=0, jobs=1): err= 0: pid=9936: Sun Nov 12 04:39:08 2023
  write: IOPS=30, BW=971KiB/s (995kB/s)(569MiB/600258msec); 0 zone resets
    slat (usec): min=43, max=5535, avg=831.52, stdev=603.84
    clat (nsec): min=1300, max=10806M, avg=261601082.24, stdev=649005894.65
     lat (usec): min=157, max=10806k, avg=262432.61, stdev=649042.97
    clat percentiles (usec):
     |  1.00th=[     281],  5.00th=[     725], 10.00th=[    1188],
     | 20.00th=[    1991], 30.00th=[    2704], 40.00th=[    3523],
     | 50.00th=[    4686], 60.00th=[    7177], 70.00th=[  100140],
     | 80.00th=[  316670], 90.00th=[ 1035994], 95.00th=[ 1451230],
     | 99.00th=[ 2902459], 99.50th=[ 4009755], 99.90th=[ 6610224],
     | 99.95th=[ 6811550], 99.99th=[10804528]
   bw (  KiB/s): min=   62, max=70912, per=22.59%, avg=1960.99, stdev=5916.94, samples=588
   iops        : min=    1, max= 2216, avg=60.94, stdev=184.91, samples=588
  lat (usec)   : 2=0.10%, 4=0.04%, 20=0.01%, 50=0.01%, 250=0.67%
  lat (usec)   : 500=2.10%, 750=2.33%, 1000=2.71%
  lat (msec)   : 2=12.07%, 4=24.67%, 10=18.86%, 20=1.10%, 50=1.72%
  lat (msec)   : 100=3.62%, 250=7.78%, 500=6.88%, 750=2.85%, 1000=1.65%
  lat (msec)   : 2000=9.12%, >=2000=1.71%
  cpu          : usr=0.00%, sys=0.33%, ctx=0, majf=0, minf=0
  IO depths    : 1=0.2%, 2=9.4%, 4=49.2%, 8=41.2%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=95.3%, 8=4.7%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,18219,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=8
sequential_write: (groupid=0, jobs=1): err= 0: pid=10008: Sun Nov 12 04:39:08 2023
  write: IOPS=30, BW=960KiB/s (983kB/s)(563MiB/600257msec); 0 zone resets
    slat (usec): min=42, max=6136, avg=846.13, stdev=607.05
    clat (nsec): min=1100, max=10650M, avg=264571256.47, stdev=650836973.18
     lat (usec): min=107, max=10650k, avg=265417.39, stdev=650862.27
    clat percentiles (usec):
     |  1.00th=[     293],  5.00th=[     717], 10.00th=[    1188],
     | 20.00th=[    1975], 30.00th=[    2737], 40.00th=[    3523],
     | 50.00th=[    4686], 60.00th=[    6980], 70.00th=[   99091],
     | 80.00th=[  341836], 90.00th=[ 1019216], 95.00th=[ 1417675],
     | 99.00th=[ 3003122], 99.50th=[ 4278191], 99.90th=[ 5804917],
     | 99.95th=[ 6073353], 99.99th=[10670310]
   bw (  KiB/s): min=   62, max=72583, per=22.22%, avg=1928.89, stdev=5898.11, samples=591
   iops        : min=    1, max= 2268, avg=59.93, stdev=184.33, samples=591
  lat (usec)   : 2=0.10%, 4=0.03%, 250=0.64%, 500=2.05%, 750=2.50%
  lat (usec)   : 1000=2.70%
  lat (msec)   : 2=12.25%, 4=24.59%, 10=19.23%, 20=0.99%, 50=1.34%
  lat (msec)   : 100=3.64%, 250=6.90%, 500=7.01%, 750=3.79%, 1000=1.85%
  lat (msec)   : 2000=8.42%, >=2000=1.98%
  cpu          : usr=0.00%, sys=0.50%, ctx=0, majf=0, minf=0
  IO depths    : 1=0.1%, 2=9.7%, 4=49.3%, 8=40.8%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=95.2%, 8=4.8%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,18013,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=8
sequential_write: (groupid=0, jobs=1): err= 0: pid=10016: Sun Nov 12 04:39:08 2023
  write: IOPS=28, BW=912KiB/s (934kB/s)(535MiB/600259msec); 0 zone resets
    slat (usec): min=40, max=5853, avg=874.99, stdev=627.20
    clat (nsec): min=1100, max=12888M, avg=278624238.24, stdev=672448636.08
     lat (usec): min=245, max=12889k, avg=279499.23, stdev=672471.05
    clat percentiles (usec):
     |  1.00th=[     334],  5.00th=[     873], 10.00th=[    1434],
     | 20.00th=[    2311], 30.00th=[    3130], 40.00th=[    4146],
     | 50.00th=[    5800], 60.00th=[   14615], 70.00th=[  154141],
     | 80.00th=[  375391], 90.00th=[  985662], 95.00th=[ 1551893],
     | 99.00th=[ 3238003], 99.50th=[ 3741320], 99.90th=[ 5872026],
     | 99.95th=[ 6006244], 99.99th=[12952011]
   bw (  KiB/s): min=   62, max=71936, per=20.61%, avg=1788.46, stdev=4683.82, samples=606
   iops        : min=    1, max= 2248, avg=55.53, stdev=146.39, samples=606
  lat (usec)   : 2=0.09%, 4=0.02%, 10=0.01%, 20=0.01%, 250=0.41%
  lat (usec)   : 500=1.62%, 750=1.85%, 1000=2.00%
  lat (msec)   : 2=10.31%, 4=22.62%, 10=19.90%, 20=1.41%, 50=1.88%
  lat (msec)   : 100=4.10%, 250=9.11%, 500=8.76%, 750=4.31%, 1000=1.65%
  lat (msec)   : 2000=7.38%, >=2000=2.56%
  cpu          : usr=0.00%, sys=0.33%, ctx=0, majf=0, minf=0
  IO depths    : 1=0.1%, 2=6.8%, 4=47.2%, 8=45.9%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=96.2%, 8=3.8%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,17116,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=8

Run status group 0 (all jobs):
  WRITE: bw=8676KiB/s (8884kB/s), 826KiB/s-1290KiB/s (846kB/s-1321kB/s), io=5090MiB (5338MB), run=600158-600813msec

Good:

diff --git a/module/os/windows/zfs/zfs_znode.c b/module/os/windows/zfs/zfs_znode.c
index f501b3ac1..eff94188f 100644
--- a/module/os/windows/zfs/zfs_znode.c
+++ b/module/os/windows/zfs/zfs_znode.c
@@ -624,9 +624,6 @@ zfs_znode_alloc(zfsvfs_t *zfsvfs, dmu_buf_t *db, int blksz,
        zp->z_zfsvfs = zfsvfs;
        mutex_exit(&zfsvfs->z_znodes_lock);

-       if (zp->z_blksz == 0)
-               zp->z_blksz = zfs_blksz(zp);
-
        return (zp);
 }

sequential_write: (groupid=0, jobs=1): err= 0: pid=6380: Sun Nov 12 04:53:38 2023
  write: IOPS=1139, BW=35.6MiB/s (37.3MB/s)(20.9GiB/600015msec); 0 zone resets
    slat (usec): min=27, max=1618.5k, avg=456.55, stdev=4595.92
    clat (nsec): min=700, max=1625.9M, avg=5272008.86, stdev=24928883.42
     lat (usec): min=59, max=1626.1k, avg=5728.56, stdev=25345.08
    clat percentiles (usec):
     |  1.00th=[    99],  5.00th=[   249], 10.00th=[   388], 20.00th=[   635],
     | 30.00th=[   906], 40.00th=[  1221], 50.00th=[  1582], 60.00th=[  1991],
     | 70.00th=[  2507], 80.00th=[  3195], 90.00th=[  4555], 95.00th=[  9110],
     | 99.00th=[128451], 99.50th=[175113], 99.90th=[316670], 99.95th=[358613],
     | 99.99th=[526386]
   bw (  KiB/s): min=  128, max=114020, per=12.56%, avg=36829.87, stdev=27553.48, samples=1172
   iops        : min=    4, max= 3563, avg=1150.53, stdev=861.00, samples=1172
  lat (nsec)   : 750=0.01%, 1000=0.01%
  lat (usec)   : 2=0.29%, 4=0.13%, 10=0.01%, 20=0.01%, 50=0.01%
  lat (usec)   : 100=0.57%, 250=4.05%, 500=9.41%, 750=9.88%, 1000=8.74%
  lat (msec)   : 2=27.18%, 4=26.94%, 10=8.06%, 20=1.65%, 50=1.23%
  lat (msec)   : 100=0.32%, 250=1.31%, 500=0.19%, 750=0.01%, 1000=0.01%
  lat (msec)   : 2000=0.01%
  cpu          : usr=0.50%, sys=18.50%, ctx=0, majf=0, minf=0
  IO depths    : 1=0.5%, 2=23.5%, 4=56.8%, 8=19.3%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=89.2%, 8=10.8%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,683421,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=8
sequential_write: (groupid=0, jobs=1): err= 0: pid=8360: Sun Nov 12 04:53:38 2023
  write: IOPS=1148, BW=35.9MiB/s (37.6MB/s)(21.0GiB/600033msec); 0 zone resets
    slat (usec): min=29, max=1618.6k, avg=454.85, stdev=4799.65
    clat (nsec): min=600, max=1620.8M, avg=5212781.94, stdev=24071738.20
     lat (usec): min=53, max=1621.2k, avg=5667.64, stdev=24542.01
    clat percentiles (usec):
     |  1.00th=[    98],  5.00th=[   241], 10.00th=[   379], 20.00th=[   619],
     | 30.00th=[   881], 40.00th=[  1205], 50.00th=[  1565], 60.00th=[  1975],
     | 70.00th=[  2474], 80.00th=[  3195], 90.00th=[  4555], 95.00th=[  8979],
     | 99.00th=[128451], 99.50th=[173016], 99.90th=[308282], 99.95th=[333448],
     | 99.99th=[633340]
   bw (  KiB/s): min=  128, max=127313, per=12.66%, avg=37122.23, stdev=27642.35, samples=1173
   iops        : min=    4, max= 3978, avg=1159.65, stdev=863.77, samples=1173
  lat (nsec)   : 750=0.01%, 1000=0.01%
  lat (usec)   : 2=0.28%, 4=0.12%, 10=0.01%, 20=0.01%, 50=0.01%
  lat (usec)   : 100=0.62%, 250=4.26%, 500=9.66%, 750=10.17%, 1000=8.78%
  lat (msec)   : 2=26.76%, 4=26.64%, 10=7.99%, 20=1.65%, 50=1.20%
  lat (msec)   : 100=0.34%, 250=1.33%, 500=0.17%, 750=0.01%, 1000=0.01%
  lat (msec)   : 2000=0.01%
  cpu          : usr=0.50%, sys=18.83%, ctx=0, majf=0, minf=0
  IO depths    : 1=0.5%, 2=23.7%, 4=57.0%, 8=18.9%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=89.1%, 8=10.9%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,689243,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=8
sequential_write: (groupid=0, jobs=1): err= 0: pid=8356: Sun Nov 12 04:53:38 2023
  write: IOPS=1135, BW=35.5MiB/s (37.2MB/s)(20.8GiB/600004msec); 0 zone resets
    slat (usec): min=28, max=1618.6k, avg=471.20, stdev=5349.31
    clat (nsec): min=900, max=1620.6M, avg=5242952.86, stdev=24132216.73
     lat (usec): min=66, max=1620.9k, avg=5714.15, stdev=24712.24
    clat percentiles (usec):
     |  1.00th=[   102],  5.00th=[   255], 10.00th=[   400], 20.00th=[   660],
     | 30.00th=[   938], 40.00th=[  1254], 50.00th=[  1614], 60.00th=[  2024],
     | 70.00th=[  2540], 80.00th=[  3261], 90.00th=[  4686], 95.00th=[  9372],
     | 99.00th=[125305], 99.50th=[170918], 99.90th=[308282], 99.95th=[333448],
     | 99.99th=[557843]
   bw (  KiB/s): min=  500, max=135759, per=12.51%, avg=36703.27, stdev=26848.21, samples=1172
   iops        : min=   15, max= 4242, avg=1146.55, stdev=838.97, samples=1172
  lat (nsec)   : 1000=0.01%
  lat (usec)   : 2=0.29%, 4=0.16%, 10=0.01%, 20=0.01%, 50=0.01%
  lat (usec)   : 100=0.48%, 250=3.87%, 500=8.93%, 750=9.66%, 1000=8.77%
  lat (msec)   : 2=27.43%, 4=27.16%, 10=8.42%, 20=1.67%, 50=1.33%
  lat (msec)   : 100=0.36%, 250=1.27%, 500=0.17%, 750=0.01%, 1000=0.01%
  lat (msec)   : 2000=0.01%
  cpu          : usr=0.50%, sys=19.00%, ctx=0, majf=0, minf=0
  IO depths    : 1=0.5%, 2=23.2%, 4=56.6%, 8=19.6%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=89.4%, 8=10.6%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,681247,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=8
sequential_write: (groupid=0, jobs=1): err= 0: pid=6252: Sun Nov 12 04:53:38 2023
  write: IOPS=1163, BW=36.4MiB/s (38.1MB/s)(21.3GiB/600005msec); 0 zone resets
    slat (usec): min=28, max=1619.7k, avg=448.13, stdev=4857.41
    clat (nsec): min=900, max=1621.9M, avg=5150571.87, stdev=23767175.05
     lat (usec): min=52, max=1622.1k, avg=5598.70, stdev=24254.03
    clat percentiles (usec):
     |  1.00th=[    93],  5.00th=[   233], 10.00th=[   367], 20.00th=[   611],
     | 30.00th=[   873], 40.00th=[  1188], 50.00th=[  1549], 60.00th=[  1958],
     | 70.00th=[  2474], 80.00th=[  3163], 90.00th=[  4555], 95.00th=[  9372],
     | 99.00th=[125305], 99.50th=[168821], 99.90th=[299893], 99.95th=[333448],
     | 99.99th=[633340]
   bw (  KiB/s): min=  192, max=238492, per=12.83%, avg=37615.90, stdev=29260.82, samples=1173
   iops        : min=    6, max= 7452, avg=1175.09, stdev=914.35, samples=1173
  lat (nsec)   : 1000=0.01%
  lat (usec)   : 2=0.25%, 4=0.22%, 10=0.01%, 20=0.01%, 50=0.01%
  lat (usec)   : 100=0.69%, 250=4.40%, 500=9.82%, 750=10.14%, 1000=8.75%
  lat (msec)   : 2=26.79%, 4=26.33%, 10=7.79%, 20=1.72%, 50=1.30%
  lat (msec)   : 100=0.31%, 250=1.29%, 500=0.17%, 750=0.01%, 1000=0.01%
  lat (msec)   : 2000=0.01%
  cpu          : usr=0.50%, sys=18.67%, ctx=0, majf=0, minf=0
  IO depths    : 1=0.5%, 2=23.5%, 4=56.8%, 8=19.2%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=89.2%, 8=10.8%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,698194,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=8
sequential_write: (groupid=0, jobs=1): err= 0: pid=8388: Sun Nov 12 04:53:38 2023
  write: IOPS=1153, BW=36.1MiB/s (37.8MB/s)(21.1GiB/600003msec); 0 zone resets
    slat (usec): min=28, max=1618.3k, avg=452.19, stdev=4883.17
    clat (nsec): min=600, max=1623.7M, avg=5221157.53, stdev=24483382.89
     lat (usec): min=66, max=1624.1k, avg=5673.35, stdev=24962.35
    clat percentiles (usec):
     |  1.00th=[    96],  5.00th=[   239], 10.00th=[   375], 20.00th=[   619],
     | 30.00th=[   881], 40.00th=[  1188], 50.00th=[  1549], 60.00th=[  1958],
     | 70.00th=[  2474], 80.00th=[  3163], 90.00th=[  4490], 95.00th=[  8979],
     | 99.00th=[128451], 99.50th=[173016], 99.90th=[308282], 99.95th=[333448],
     | 99.99th=[641729]
   bw (  KiB/s): min=  498, max=247953, per=12.71%, avg=37264.06, stdev=28461.24, samples=1173
   iops        : min=   15, max= 7748, avg=1164.10, stdev=889.38, samples=1173
  lat (nsec)   : 750=0.01%, 1000=0.01%
  lat (usec)   : 2=0.26%, 4=0.15%, 10=0.01%, 20=0.01%, 50=0.01%
  lat (usec)   : 100=0.66%, 250=4.30%, 500=9.74%, 750=10.10%, 1000=8.72%
  lat (msec)   : 2=26.95%, 4=26.59%, 10=7.83%, 20=1.59%, 50=1.24%
  lat (msec)   : 100=0.33%, 250=1.31%, 500=0.18%, 750=0.01%, 1000=0.01%
  lat (msec)   : 2000=0.01%
  cpu          : usr=0.50%, sys=19.17%, ctx=0, majf=0, minf=0
  IO depths    : 1=0.5%, 2=23.7%, 4=56.9%, 8=19.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=89.2%, 8=10.8%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,692245,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=8
sequential_write: (groupid=0, jobs=1): err= 0: pid=8320: Sun Nov 12 04:53:38 2023
  write: IOPS=1150, BW=35.9MiB/s (37.7MB/s)(21.1GiB/600003msec); 0 zone resets
    slat (usec): min=28, max=1618.6k, avg=445.47, stdev=4232.55
    clat (nsec): min=800, max=1630.6M, avg=5279443.64, stdev=24975772.09
     lat (usec): min=70, max=1630.7k, avg=5724.92, stdev=25328.10
    clat percentiles (usec):
     |  1.00th=[    99],  5.00th=[   241], 10.00th=[   379], 20.00th=[   627],
     | 30.00th=[   889], 40.00th=[  1205], 50.00th=[  1549], 60.00th=[  1958],
     | 70.00th=[  2474], 80.00th=[  3163], 90.00th=[  4555], 95.00th=[  9241],
     | 99.00th=[126354], 99.50th=[168821], 99.90th=[316670], 99.95th=[350225],
     | 99.99th=[666895]
   bw (  KiB/s): min=  499, max=162171, per=12.65%, avg=37107.40, stdev=28312.63, samples=1174
   iops        : min=   15, max= 5067, avg=1159.20, stdev=884.72, samples=1174
  lat (nsec)   : 1000=0.01%
  lat (usec)   : 2=0.28%, 4=0.11%, 10=0.01%, 20=0.01%, 50=0.01%
  lat (usec)   : 100=0.60%, 250=4.28%, 500=9.63%, 750=10.15%, 1000=8.76%
  lat (msec)   : 2=27.04%, 4=26.52%, 10=7.81%, 20=1.59%, 50=1.35%
  lat (msec)   : 100=0.35%, 250=1.28%, 500=0.19%, 750=0.01%, 1000=0.01%
  lat (msec)   : 2000=0.01%
  cpu          : usr=0.67%, sys=19.00%, ctx=0, majf=0, minf=0
  IO depths    : 1=0.5%, 2=23.6%, 4=56.9%, 8=19.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=89.2%, 8=10.8%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,690008,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=8
sequential_write: (groupid=0, jobs=1): err= 0: pid=9700: Sun Nov 12 04:53:38 2023
  write: IOPS=1136, BW=35.5MiB/s (37.2MB/s)(20.8GiB/600011msec); 0 zone resets
    slat (usec): min=28, max=1619.6k, avg=465.10, stdev=5020.55
    clat (nsec): min=600, max=1620.7M, avg=5283678.99, stdev=24997035.41
     lat (usec): min=57, max=1623.4k, avg=5748.77, stdev=25489.80
    clat percentiles (usec):
     |  1.00th=[   103],  5.00th=[   255], 10.00th=[   400], 20.00th=[   660],
     | 30.00th=[   930], 40.00th=[  1254], 50.00th=[  1598], 60.00th=[  2008],
     | 70.00th=[  2540], 80.00th=[  3228], 90.00th=[  4621], 95.00th=[  9503],
     | 99.00th=[127402], 99.50th=[173016], 99.90th=[308282], 99.95th=[337642],
     | 99.99th=[641729]
   bw (  KiB/s): min=  256, max=104773, per=12.54%, avg=36765.79, stdev=27041.13, samples=1172
   iops        : min=    8, max= 3274, avg=1148.52, stdev=844.99, samples=1172
  lat (nsec)   : 750=0.01%, 1000=0.01%
  lat (usec)   : 2=0.29%, 4=0.13%, 10=0.01%, 20=0.01%, 50=0.01%
  lat (usec)   : 100=0.49%, 250=3.90%, 500=8.97%, 750=9.78%, 1000=8.74%
  lat (msec)   : 2=27.33%, 4=27.17%, 10=8.33%, 20=1.74%, 50=1.28%
  lat (msec)   : 100=0.37%, 250=1.23%, 500=0.19%, 750=0.01%, 1000=0.01%
  lat (msec)   : 2000=0.01%
  cpu          : usr=0.50%, sys=19.00%, ctx=0, majf=0, minf=0
  IO depths    : 1=0.5%, 2=23.4%, 4=56.7%, 8=19.5%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=89.3%, 8=10.7%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,682043,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=8
sequential_write: (groupid=0, jobs=1): err= 0: pid=10012: Sun Nov 12 04:53:38 2023
  write: IOPS=1139, BW=35.6MiB/s (37.3MB/s)(20.9GiB/600136msec); 0 zone resets
    slat (usec): min=29, max=1618.5k, avg=459.64, stdev=4991.50
    clat (nsec): min=500, max=1620.3M, avg=5263147.47, stdev=24610201.30
     lat (usec): min=60, max=1620.5k, avg=5722.78, stdev=25105.81
    clat percentiles (usec):
     |  1.00th=[   102],  5.00th=[   255], 10.00th=[   396], 20.00th=[   652],
     | 30.00th=[   922], 40.00th=[  1237], 50.00th=[  1598], 60.00th=[  2008],
     | 70.00th=[  2507], 80.00th=[  3195], 90.00th=[  4555], 95.00th=[  9110],
     | 99.00th=[128451], 99.50th=[175113], 99.90th=[312476], 99.95th=[341836],
     | 99.99th=[534774]
   bw (  KiB/s): min=  192, max=119346, per=12.57%, avg=36866.39, stdev=27250.18, samples=1172
   iops        : min=    6, max= 3729, avg=1151.67, stdev=851.52, samples=1172
  lat (nsec)   : 750=0.01%, 1000=0.01%
  lat (usec)   : 2=0.29%, 4=0.14%, 10=0.01%, 20=0.01%, 50=0.01%
  lat (usec)   : 100=0.51%, 250=3.91%, 500=9.09%, 750=9.80%, 1000=8.76%
  lat (msec)   : 2=27.36%, 4=27.49%, 10=7.88%, 20=1.67%, 50=1.23%
  lat (msec)   : 100=0.34%, 250=1.31%, 500=0.17%, 750=0.01%, 1000=0.01%
  lat (msec)   : 2000=0.01%
  cpu          : usr=0.67%, sys=19.00%, ctx=0, majf=0, minf=0
  IO depths    : 1=0.5%, 2=23.3%, 4=56.6%, 8=19.6%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=89.3%, 8=10.7%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,684054,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=8

Run status group 0 (all jobs):
  WRITE: bw=286MiB/s (300MB/s), 35.5MiB/s-36.4MiB/s (37.2MB/s-38.1MB/s), io=168GiB (180GB), run=600003-600136msec

lundman · 2023-11-16T05:51:07Z

OK, if I have understood things, I cleaned up the code in both places. If we are happy with it, we can close this.

Sorry for the delay, wanted to push out the macOS version

EchterAgo · 2023-11-16T06:44:07Z

The code you pushed is pretty much the same as I've been testing the last few days so I'm going to close.

EchterAgo mentioned this issue Nov 6, 2023

Possible memory leak "ExAllocatePoolWithTag failed" in spl-seg_kmem.c, line 134 #283

Closed

EchterAgo changed the title ~~Performance regression during rclone copy~~ Performance regression during rclone copy with preallocation enabled Nov 6, 2023

EchterAgo closed this as completed Nov 10, 2023

EchterAgo reopened this Nov 11, 2023

EchterAgo closed this as completed Nov 11, 2023

EchterAgo reopened this Nov 11, 2023

EchterAgo closed this as completed Nov 16, 2023

lundman mentioned this issue Jan 24, 2024

Copying files to a dataset and then verifying shows mismatches #227

Open

Performance regression during rclone copy with preallocation enabled #318

Performance regression during rclone copy with preallocation enabled #318

Comments

EchterAgo commented Nov 6, 2023 • edited Loading

EchterAgo commented Nov 6, 2023

lundman commented Nov 6, 2023

EchterAgo commented Nov 6, 2023 • edited Loading

EchterAgo commented Nov 6, 2023

EchterAgo commented Nov 6, 2023

EchterAgo commented Nov 6, 2023

EchterAgo commented Nov 6, 2023

lundman commented Nov 6, 2023

lundman commented Nov 7, 2023

EchterAgo commented Nov 7, 2023

lundman commented Nov 7, 2023

EchterAgo commented Nov 7, 2023

lundman commented Nov 7, 2023

lundman commented Nov 8, 2023

EchterAgo commented Nov 10, 2023

lundman commented Nov 10, 2023

EchterAgo commented Nov 11, 2023

EchterAgo commented Nov 11, 2023

EchterAgo commented Nov 11, 2023

lundman commented Nov 11, 2023

EchterAgo commented Nov 11, 2023

EchterAgo commented Nov 11, 2023 • edited Loading

lundman commented Nov 11, 2023

EchterAgo commented Nov 11, 2023 • edited Loading

EchterAgo commented Nov 11, 2023

lundman commented Nov 11, 2023

EchterAgo commented Nov 11, 2023

EchterAgo commented Nov 11, 2023

lundman commented Nov 11, 2023

EchterAgo commented Nov 11, 2023

lundman commented Nov 11, 2023

EchterAgo commented Nov 11, 2023

lundman commented Nov 11, 2023

EchterAgo commented Nov 11, 2023 • edited Loading

EchterAgo commented Nov 11, 2023

lundman commented Nov 11, 2023

EchterAgo commented Nov 11, 2023 • edited Loading

EchterAgo commented Nov 11, 2023 • edited Loading

EchterAgo commented Nov 11, 2023

EchterAgo commented Nov 11, 2023

EchterAgo commented Nov 11, 2023

EchterAgo commented Nov 11, 2023 • edited Loading

EchterAgo commented Nov 11, 2023

EchterAgo commented Nov 11, 2023 • edited Loading

EchterAgo commented Nov 11, 2023

EchterAgo commented Nov 11, 2023 • edited Loading

EchterAgo commented Nov 11, 2023 • edited Loading

EchterAgo commented Nov 11, 2023 • edited Loading

EchterAgo commented Nov 11, 2023

lundman commented Nov 16, 2023

EchterAgo commented Nov 16, 2023

EchterAgo commented Nov 6, 2023 •

edited

Loading

EchterAgo commented Nov 6, 2023 •

edited

Loading

EchterAgo commented Nov 11, 2023 •

edited

Loading

EchterAgo commented Nov 11, 2023 •

edited

Loading

EchterAgo commented Nov 11, 2023 •

edited

Loading

EchterAgo commented Nov 11, 2023 •

edited

Loading

EchterAgo commented Nov 11, 2023 •

edited

Loading

EchterAgo commented Nov 11, 2023 •

edited

Loading

EchterAgo commented Nov 11, 2023 •

edited

Loading

EchterAgo commented Nov 11, 2023 •

edited

Loading

EchterAgo commented Nov 11, 2023 •

edited

Loading

EchterAgo commented Nov 11, 2023 •

edited

Loading