
Copying files to a dataset and then verifying shows mismatches #227

Open
EchterAgo opened this issue Jun 9, 2023 · 95 comments

@EchterAgo

I'm using rclone sync to copy ~5TB of data to a ZFS dataset created with these options:

zpool create -o ashift=12 -O atime=off -O compression=lz4 -O recordsize=1MB -O casesensitivity=insensitive -O normalization=formD Data raidz1 PHYSICALDRIVE0 PHYSICALDRIVE1 PHYSICALDRIVE4 PHYSICALDRIVE5 cache PHYSICALDRIVE2 PHYSICALDRIVE3

After the copy is done, I use rclone sync -c to fully verify the contents of the copied files, and it always comes up with differences. So far I've only seen it with larger files, mostly with a 150GB disk image file, but also with some ~30GB image files.

I used HxD to compare the files and it turns out there are sections filled with zeroes instead of the original data.

I'll try to diagnose this further.

@EchterAgo
Author

EchterAgo commented Jun 9, 2023

The hole in the file seems to start and end at a 4 KiB aligned offset and is 32 KiB in size. The data after the hole is identical again. I'm letting it continue scanning to see if there are further differences.
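
A minimal sketch of how such zeroed regions could be located programmatically, instead of comparing files by hand in HxD. The path is hypothetical, and a reported run only means the bytes are zero on disk, so it still needs to be compared against the source file:

CHUNK = 4096  # matches the 4 KiB alignment observed above

def find_zero_runs(path):
    """Return (offset, length) pairs for runs of all-zero 4 KiB chunks."""
    runs, run_start = [], None
    offset = 0
    with open(path, 'rb') as f:
        while True:
            chunk = f.read(CHUNK)
            if not chunk:
                break
            if chunk.count(0) == len(chunk):   # chunk is entirely zero bytes
                if run_start is None:
                    run_start = offset
            elif run_start is not None:
                runs.append((run_start, offset - run_start))
                run_start = None
            offset += len(chunk)
    if run_start is not None:
        runs.append((run_start, offset - run_start))
    return runs

for start, length in find_zero_runs('D:\\testfile.img'):  # hypothetical path
    print(f'zero run at offset {start:#x}, length {length}')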

@EchterAgo
Author

The hard drives have

LogicalSectorSize  : 512
PhysicalSectorSize : 4096

and the cache SSDs have

LogicalSectorSize  : 512
PhysicalSectorSize : 512

@EchterAgo
Author

The corruption also remains when I reread the data.

@EchterAgo
Author

I checked another file and the corrupted region's start and end are again 4 KiB aligned, but this time it is larger: 49152 bytes (48 KiB).

@lundman

lundman commented Jun 10, 2023

Thanks for looking into this, I'll raise its priority since we definitely don't want any corruption. If it's sections of nulls, I wonder if it is a seek/hole issue. I will dig deeper. Is there a relatively easy way to make it happen, something close to a one-liner?

@EchterAgo
Author

I haven't found an easy way to reproduce it yet, but it does keep happening when I transfer a large amount of data. I'm currently trying to see if any rclone options help, like --local-no-preallocate and --local-no-sparse. I'll also check different dataset options, maybe disable compression. It takes a while to test due to the large data size and the small amount of data actually zeroed; in ~5TB of data only 80 KiB were actually bad. I will report back here when I have more information.

I also checked the logs for any errors but could not find anything, and the SMART values look good. Both systems I copy to/from have ECC memory, and the source system is a Linux system running ZFS. Previous attempts to copy the same data to either NTFS or ReFS worked without issue. I'll also try to get a test running locally that writes random data from a fixed seed and compares the result.

@EchterAgo
Author

I also tried copying data with Windows Explorer and haven't had any corruption yet. Windows Explorer copies files sequentially in a single thread, while rclone copies multiple files simultaneously and can even use multiple streams for a single file.

@sskras

sskras commented Jun 10, 2023

@EchterAgo commented 2 hours ago:

Both systems I copy to/from have ECC memory, the source system is a Linux system running ZFS. Previous attempts to copy the same data to either NTFS / ReFS worked without issue.

It would be nice to check whether it can be reproduced by the Linux or FreeBSD implementations of OpenZFS. At least the Linux one contains some hard-to-reproduce bugs too, e.g. openzfs#12014.

@EchterAgo
Author

I've been using the Linux ZFS port for a long time and I've never encountered corruption like this. The pool options are almost the same, except for case sensitivity, RAIDZ2 instead of RAIDZ1, and only one cache device. I also use rclone a lot to copy to the Linux pool and I do sanity-check the data; no problems so far.

Yesterday I used Windows Explorer to copy the full 5TB to the Windows pool and it seems everything copied correctly. I also tried rclone with preallocation and sparse files disabled and the corruption remained. I'm not sure I specified the options correctly for the remote though, so I'll retry this now.

@lundman

lundman commented Jun 14, 2023

I've given up on #223 for now, after 5 weeks, I need a break from it.

Sounds like I should try out rclone on Windows and see if I can figure out what goes wrong. Maybe it is something in seek/write/holes that goes wrong.

@sskras

sskras commented Jun 14, 2023

Thanks @lundman. Wow, you're a cool digger! Take a rest and take care. :)

And thanks for referencing the FileSpy tool, which I hadn't heard of before. It's a bit sad that it isn't open source, but it's impressive anyway! Cool. :)

@EchterAgo
Author

I've given up on #223 for now, after 5 weeks, I need a break from it.

Sounds like I should try out rclone on Windows and see if I can figure out what goes wrong. Maybe it is something in seek/write/holes that goes wrong.

I have also done more testing: none of --multi-thread-streams 0, --local-no-sparse, --local-no-preallocate and --transfers 1 makes rclone copy correctly, and I'm sure the options are applied correctly. Interestingly, I've done a few more copies with Windows Explorer and so far all of those worked. I've looked at the WriteFile calls: rclone seems to use 32KB writes, while Windows Explorer starts with 256KB, then increases to 768KB, then 1MB.

I plan to try some different filesystem options next. I'll also look into adding code to log zero data in some places.

@lundman

lundman commented Jun 14, 2023

When rclone runs and you watch it with FileSpy, are all the writes in sequence?

Like 0..32KB, 32KB..64KB, 64KB..96KB, and so on?

Or does it seek around a bit? I wonder if the "offset counter update" is wrong, or if rclone sends the offset each time.

@EchterAgo
Author

With the options I used it should just write sequentially with a single thread, but I'll verify this using FileSpy.

@EchterAgo
Author

FileSpy_20230614_rclone_write.zip

This is a capture of rclone writing the biggest file I have in the dataset. It does look sequential in 32KB chunks. There seems to be a FASTIO_WRITE call that fails in there. I'll also get a capture of Windows Explorer writing the same file.

I also wonder if it is feasible to just capture the whole copy; then I could write a small tool to locate the problematic offsets in the capture.

@lundman

lundman commented Jun 14, 2023

Ignore the FASTIO errors; Windows tries FASTIO first to see if we support it, and if not, it does a regular IRP call.
We have the FASTIO patch in a branch, but it complicates things, so I was going for stable first.

2	17:10:10.318		rclone.exe	IRP_MJ_WRITE	00060A00	FFFFBB06C3C12088	D:\data2.crypt	STATUS_SUCCESS	Offset: 00000003-7B3C7000 ToWrite: 8000 Written: 8000 
4	17:10:10.318		rclone.exe	IRP_MJ_WRITE	00060A00	FFFFBB06C3C12088	D:\data2.crypt	STATUS_SUCCESS	Offset: 00000003-7B3CF000 ToWrite: 8000 Written: 8000 
6	17:10:10.318		rclone.exe	IRP_MJ_WRITE	00060A00	FFFFBB06C3C12088	D:\data2.crypt	STATUS_SUCCESS	Offset: 00000003-7B3D7000 ToWrite: 8000 Written: 8000 
8	17:10:10.318		rclone.exe	IRP_MJ_WRITE	00060A00	FFFFBB06C3C12088	D:\data2.crypt	STATUS_SUCCESS	Offset: 00000003-7B3DF000 ToWrite: 8000 Written: 8000 

Hmm, what does IrpSp->Parameters.Write.ByteOffset.HighPart being 3 mean? Another special hardcoded value?

I am aware of these two:

		/* Sentinel LowPart value: write at the file object's current position. */
		if (IrpSp->Parameters.Write.ByteOffset.LowPart ==
		    FILE_USE_FILE_POINTER_POSITION) {
			byteOffset = fileObject->CurrentByteOffset;
		/* Sentinel LowPart value: append to the end of the file. */
		} else if (IrpSp->Parameters.Write.ByteOffset.LowPart ==
		    FILE_WRITE_TO_END_OF_FILE) { // APPEND

#define FILE_WRITE_TO_END_OF_FILE 0xffffffff
#define FILE_USE_FILE_POINTER_POSITION 0xfffffffe

@lundman

lundman commented Jun 14, 2023

Ah, OK, so this is just a snippet of a very large file? Then the offset is probably up to 0x000000037B3C7000. As long as the HighPart started at 0, then went 1, 2 and is now 3, that is fine. All write offsets should then end in 0x7000 and 0xf000 as long as the writes are 0x8000 bytes, and they certainly do in the snippet I have.
You can tell FileSpy to monitor the ZFS drive letter, then at the end do File->Save.

@EchterAgo
Author

Yes, the file is 150GB, so that is a normal file offset. I checked with smaller files and the high part of the offset is zero.

At the rate FileSpy's memory usage grows while capturing, I doubt I have enough memory to capture a whole rclone transfer. Is there any way to get FileSpy to log directly to disk?

@EchterAgo
Author

Initially I believed this only affected larger files, but I've since seen this corruption in 1GB files as well.

@EchterAgo
Author

I can just copy the biggest file over and over again and capture its events.
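
A rough sketch of the small offset-checking tool mentioned earlier. It assumes the FileSpy capture is saved as a text export with IRP_MJ_WRITE lines shaped like the snippet quoted above, filtered to a single file; the log file name and exact line layout are assumptions. It flags any write whose offset is not contiguous with the previous one, and any short write:

import re

# Matches the "Offset: HIGH-LOW ToWrite: N Written: N" fields (all hex) seen in
# the FileSpy lines quoted above; the export format is an assumption.
WRITE_RE = re.compile(
    r'IRP_MJ_WRITE.*Offset: ([0-9A-Fa-f]+)-([0-9A-Fa-f]+) '
    r'ToWrite: ([0-9A-Fa-f]+) Written: ([0-9A-Fa-f]+)')

expected = None
with open('filespy_capture.log', 'r', errors='replace') as log:  # hypothetical file name
    for line in log:
        m = WRITE_RE.search(line)
        if m is None:
            continue
        high, low, to_write, written = (int(g, 16) for g in m.groups())
        offset = (high << 32) | low
        if expected is not None and offset != expected:
            print(f'non-contiguous write: expected {expected:#x}, got {offset:#x}')
        if written != to_write:
            print(f'short write at {offset:#x}: {written:#x} of {to_write:#x} bytes')
        expected = offset + written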

@lundman

lundman commented Jun 21, 2023

So the I/O tests in MS test.exe highlight a gap we have:

Write to file at 0, PASS
Check position hasn't moved, PASS
Set position, FAIL (STATUS_NOT_IMPLEMENTED)
Check position has now moved, FAIL (CurrentByteOffset was 0, expected 6)
Read from file specifying offset, PASS
Check position, FAIL (CurrentByteOffset was 0, expected 6)
Read from file specifying offset as FILE_USE_FILE_POINTER_POSITION, PASS
Write to file specifying offset as FILE_USE_FILE_POINTER_POSITION, PASS
Set position to 0, FAIL (STATUS_NOT_IMPLEMENTED)
Write to file specifying offset as FILE_WRITE_TO_END_OF_FILE, PASS
Check position, PASS

So let me fix those before we test again.

@lundman

lundman commented Jun 21, 2023

56eca2a

Unlikely to affect things; you would have seen not-implemented errors in FileSpy.

@EchterAgo
Author

I tested with the latest changes and still see corruption. I also tested different sources for the copy, including local NTFS drives, with the same result.

So far, any time I copy enough data using rclone sync <src> <dst> and then verify it using rclone check --differ <differ.log> <src> <dst>, I get differences in differ.log, with different files affected each time even though I create a fresh zpool each time. When I examine the files there is a zeroed section whose start and end offsets are 4k aligned. So far I haven't found a file that has multiple such sections, but I haven't examined all of them.

I haven't found any rclone / zfs / zpool options that make a difference yet and there are no reported checksum or R/W errors.

@EchterAgo
Author

EchterAgo commented Jun 21, 2023

OK, I have a simple tool to reproduce the issue:

import uuid
import random
import math
import os
from pathlib import Path

ROOTPATH = Path('D:\\testdir')
DATA_BLOCK_SIZE = 32 * 1024  # 32 KiB writes, the same size rclone was seen using
DATA_NUM_BLOCKS = math.floor((1 * 1024 * 1024 * 1024) / DATA_BLOCK_SIZE)  # ~1 GiB per file
#DATA_NUM_BLOCKS = 3

def gen_file():
    filename = ROOTPATH / str(uuid.uuid4())
    # Remember the RNG state so the identical byte stream can be regenerated
    # for verification. random.randbytes requires Python 3.9+.
    start_state = random.getstate()

    with open(filename, 'wb') as f:
        for bi in range(0, DATA_NUM_BLOCKS):
            data_block = random.randbytes(DATA_BLOCK_SIZE)
            f.write(data_block)

    random.setstate(start_state)

    # Read the file back and compare each block against the regenerated data.
    with open(filename, 'rb') as f:
        for bi in range(0, DATA_NUM_BLOCKS):
            disk_data_block = f.read(DATA_BLOCK_SIZE)
            expected_data_block = random.randbytes(DATA_BLOCK_SIZE)

            if disk_data_block != expected_data_block:
                print(f'found mismatch in {filename} @ {bi * DATA_BLOCK_SIZE}')


def main():
    random.seed()

    os.makedirs(ROOTPATH, exist_ok=True)

    # Keep generating and verifying files until interrupted.
    while True:
        gen_file()

if __name__ == "__main__":
    main()

It still takes a while to hit the error: on my first try it wrote 888GB before hitting it. Same issue as with rclone, but now isolated to just writing to ZFS.

@EchterAgo
Author

You can set a breakpoint on the mismatch print to stop the program when something goes wrong. I'll try to get a trace at that point tomorrow.
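
As an alternative to a debugger breakpoint, the mismatch branch in the script above could dump both blocks and abort immediately, so the state can be captured right away. A rough sketch, with illustrative dump file names:

            if disk_data_block != expected_data_block:
                print(f'found mismatch in {filename} @ {bi * DATA_BLOCK_SIZE}')
                # Dump what was read back and what was expected for later comparison.
                with open(f'{filename}.block{bi}.disk', 'wb') as dump:
                    dump.write(disk_data_block)
                with open(f'{filename}.block{bi}.expected', 'wb') as dump:
                    dump.write(expected_data_block)
                os.abort()  # stop immediately so driver logs / cbuf can be collected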

@EchterAgo
Author

I also tested with the ZFS default options: same behavior. I'll check without cache devices, with different redundancy options, and with just a single device tomorrow.

@EchterAgo
Author

EchterAgo commented Jun 21, 2023

I also analyzed the file; it is the same zero-filled, 4k-aligned corruption. I'll try this on a clean Windows install in a VM next.

@EchterAgo
Author

On another try I hit corruption after 200GB.

@lundman

lundman commented Aug 24, 2023

Good news: it will compress really well!

Oh, and there is no zfs_disable_wincache=1 registry setting for this version (we know that works anyway); we'll bring it back if it eventually works, just letting you know it's not plumbed into the new code.

@EchterAgo
Author

I don't think there ever was a zfs_disable_wincache registry entry; I used the debugger to change it. But I would like to have that feature, and I can try implementing it myself and submitting a PR if that is something you want added.

@EchterAgo
Author

I still wonder why I didn't see the issue when trying without the L2ARC cache devices. Maybe it was just luck? I'll try this again.

@lundman

lundman commented Aug 24, 2023

Ah yep, I never did add the MODULE_PARAM lines around it.

@lundman

lundman commented Aug 25, 2023

Hah, yeah, literally the first if was wrong, so it skipped all of the new code.

@lundman

lundman commented Aug 25, 2023

0c3c274

@EchterAgo
Author

The first copy passed; I am going to keep testing a few more times.

@EchterAgo
Author

By now I am convinced that this is fixed, so I am going to close this. If I hit it again I will reopen.

@lundman

lundman commented Aug 26, 2023

That is exciting. I will massage it to fit the ZFS style a bit more, then move it into the main branches.

@lundman

lundman commented Aug 26, 2023

The read function is just as complicated, with special cases; even though we don't know of any issues there, it may be worth giving it the same treatment as well.

@EchterAgo
Author

It seems I set up remote access correctly, so I will be able to run some tests while away from home.

What I noticed yesterday is that my system mostly hangs (but does not crash) after running the copy, deleting all the files and restarting the copy. Immediately before this, almost all of the machine's memory is in use. After rebooting, importing the pool crashes in the unlinked drain code. I think I can trigger this quite reliably.

EchterAgo reopened this Aug 27, 2023
@lundman

lundman commented Aug 27, 2023

Let's have a separate issue for the unlinked drain bug; it should be easy (or at least easier) to fix, it just needs a stack trace.

Someone else mentioned we aren't freeing space on delete again; unsure how that happened, since we've not touched that area.

@EchterAgo
Author

This occurred again today. It does seem to occur less frequently. This is on a completely different computer.

EchterAgo reopened this Nov 16, 2023
@lundman

lundman commented Nov 16, 2023

What happened? "Copying files to a dataset and then verifying shows mismatches"?

@EchterAgo
Author

I copied with rclone sync, then rereading with rclone check showed a mismatch. Checking the differences with HxD showed a zeroed hole.

@lundman

lundman commented Nov 16, 2023

Boo, I don't like this news :)

I assume we aren't using the new clone-file feature; upstream is disabling it at the moment for making zero holes in files.

@EchterAgo
Author

I think this is still the same issue as before; it just occurs less frequently. I have now set up a fio job that keeps writing and verifying with the rclone I/O pattern:

[global]
ioengine=windowsaio
direct=0
iodepth=8
time_based
runtime=0
; the colon in the drive letter has to be escaped in fio's directory option
directory=D\:\testdir
filename_format=testfile_$jobname_$jobnum_$filenum.bin
; 32k sequential writes, matching the size rclone was observed using
bs=32k
size=16G
fallocate=native

[sequential_write]
rw=write
numjobs=8
; verify written data with xxhash, stop on the first mismatch and dump the bad blocks
verify=xxhash
verify_fatal=1
verify_dump=1
I'll run that for a bit and see if that produces anything.

@EchterAgo
Author

openzfs#15526 sounds very similar to this.

@lundman

lundman commented Nov 23, 2023

They are certainly panicking upstream at the moment. I don't think we do any clonefile? Although it does sound like it happens outside of clonefile as well, so it is possible.

Got anything for me to work on? I am adding the new CaseSensitive flag, but I'm almost done.

@EchterAgo
Author

They are certainly panicking upstream at the moment. I don't think we do any clonefile? Although it does sound like it happens outside of clonefile as well, so it is possible.

Got anything for me to work on? I am adding the new CaseSensitive flag, but I'm almost done.

Yeah, it does seem to happen even without clonefile. I don't have any issues right now other than the data corruption one. I'm still trying to get an API trace capture and a cbuf from right after it happens, but it is difficult.

@EchterAgo
Author

It seems that after your newwrite branch the frequency of the corruption went way down. So far I can't reproduce it using my test script anymore, nor with fio, but rclone still triggers it somehow, just much less frequently.

@EchterAgo
Author

The upstream issue also seems very timing sensitive. Maybe I should also try some different pool options.

@EchterAgo
Author

I'll revert to before the newwrite branch and try to reproduce with openzfs#15571.

@EchterAgo
Author

So, testing with c3c9364 (zfswin-2.2.0rc3), it is still easy for me to reproduce the problem, which rules out any change in my setup. I have now cherry-picked 76df45c on top of that and am rerunning my tests.

@EchterAgo
Author

Sadly, applying 76df45c did not change anything, so this is a different issue.

@lundman

lundman commented Jan 24, 2024

@EchterAgo You and I worked hard on these corruption issues, and I believe we fixed them all. #318

Can you let me know if this isn't the case?
