bad performance on NVME SSD #10856
Comments
Due to the CoW nature of ZFS, it just eats your NVMe's cache faster than XFS, as shown in the review of this disk https://www.tomshardware.com/reviews/kingston-a2000-m2-nvme-ssd/2 :
Please try to test with a fio job SIZE of more than 200 GB on XFS. ZFS doesn't do anything new with trim; it's the same as other filesystems, but due to its CoW nature it may exhaust whatever cache the disk has (depending on the disk's cache implementation) faster. Not a bug for me.
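A minimal sketch of such a test, assuming a hypothetical mountpoint /mnt/test and a sequential-write job sized above the drive's SLC cache (illustrative fio flags only, not the reporter's actual command):

# hypothetical large sequential write; --direct=1 bypasses the page cache on XFS
fio --name=seqwrite --filename=/mnt/test/fio.tmp --rw=write --bs=1M --size=300G --ioengine=libaio --iodepth=16 --numjobs=1 --direct=1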
IIRC the IO codepath before and after trim in ZFS is the same, so this issue is about the behavior of this exact NVMe model. @mabod, if there is a degradation on XFS after 300 GB, I think it will degrade further on tests up to the full NVMe size; could you test that too? Ah, your tests didn't produce 300 GB+ of writes, for example:
From my experience, you can't properly test any SSD/NVMe in just 60 seconds; some models only show a huge degradation after hours of load.
I deleted my previous post because I realized that I had runtime=60 sec set for fio, which did not allow the full test to run. Please delete your answer as well. You can reply when my new entry is available. Sorry for the inconvenience.
You are right when it comes to write performance. Without any runtime limit, the fio write performance with 300 GB is pretty similar for xfs and zfs, with btrfs lagging behind. But the read performance differences are still remarkable. Why is zfs so bad at reading?
@mabod Please show all variables from the tests, especially BS.
If you test with a BS smaller than the recordsize, it may give you huge read amplification. A 1M recordsize is not the original 128k.
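For reference, a quick way to check this on the dataset used for the test (dataset name taken from this report; 128k is the ZFS default, and a small read against a 1M recordsize forces ZFS to read and checksum the whole 1M record):

zfs get recordsize PCIE3/BENCHMARK    # 1M here according to the report; the default is 128k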
All parameters are listed in my first post. BS is 1M and so is the recordsize.
Sidenote: This is basically the same reason why the Phoronix tests comparing ZFS with XFS and other filesystems were a bust; a single NVMe is not the best test case for ZFS.
Now the zfs test with 300 GB filesize and no runtime limit has also finished. It does not look good for zfs:
The performance is really bad.
@Ornias1993
@mabod As I said: it wasn't an answer, it was a sidenote. ZFS has been notoriously bad with single NVMe drives for quite some time, so it isn't the best test case for general testing of ZFS vs. other filesystems. That's all I'm saying. I'm not saying anything about this specific issue, just that this is known to be a somewhat problematic scenario with ZFS. That doesn't mean it should be problematic, I agree.
As it appears that this is indeed related to this NVMe's cache, you may want to adjust the fio job accordingly. About read performance on NVMe: this looks like a duplicate of #8381.
autotrim is disabled. What are the results with autotrim on?
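For anyone following along, autotrim is a pool property; a sketch of how to check and enable it (pool name from this report):

zpool get autotrim PCIE3    # defaults to off
zpool set autotrim=on PCIE3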
autotrim makes the performance even worse. Not the read speed, which basically stays the same, but the write speed for a 64 GB filesize goes down to 483 MB/s.
That is not much slower, but still slower than the ca. 600 MB/s I get without autotrim.
I also noticed that you have xattr=on and casesensitivity=sensitive. I would disable both or set xattr=sa. I'm not sure what kind of operations fio does, so this might be irrelevant. And change recordsize back to 128k, at least to make random reads faster; see the sketch below.
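A sketch of the suggested property changes (recordsize only affects data written after the change, and casesensitivity is fixed at dataset creation time, so the last line shows a hypothetical new dataset):

zfs set xattr=sa PCIE3/BENCHMARK
zfs set recordsize=128k PCIE3/BENCHMARK
zfs create -o casesensitivity=insensitive PCIE3/NEWSET    # NEWSET is hypothetical; casesensitivity cannot be changed later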
I typically have xattr=sa. Don't know why this time it was xattr=on. Anyway, it has no impact on the fio performance figures; I tested it.
Does it behave similarly (poorly) on a SATA SSD?
I do not have a SATA SSD.
I'd like to suggest the
I am closing this now. I am not seeing any performance issues with my NVMe drives and zfs 2.0.4.
How much better is it supposed to be for mirrored NVMes? That should be considered a primary use case of ZFS, shouldn't it? If this is an "NVMe cache size" issue, then perhaps there should be recommended NVMe products for those planning to buy NVMes and use ZFS on them?
I closed this issue in 2021. It was for zfs 0.8.4.
Right, thanks. Did you end up running the numbers again on your same hardware to compare modern ZFS to XFS, @mabod? Just curious what it ended up being.
What it ended up being is you bumping this closed(!) 2021(!) issue into the GitHub notifications of 8 people.
System information
Describe the problem you're observing
zfs performance tested with fio is poor on an NVMe SSD compared to xfs. I expected that. But what I find is that with the second or third fio run the write performance drops significantly, to 30 %. And it stays there until I do a trim. Then it starts off at full speed again, only to drop again after the second fio run.
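For reference, the manual trim mentioned here and a way to watch its progress (the -t flag of zpool status shows per-vdev trim state):

zpool trim PCIE3
zpool status -t PCIE3    # shows trim progress and completion state per vdev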
XFS performance is the benchmark:
zfs performance with a fresh new pool PCIE3:
But with the 2nd and all subsequent fio runs the zfs performance drops to this:
And it stays there until I do a zpool trim PCIE3. Then it starts with higher performance, just to drop again. The xfs performance does not depend on trim; XFS performance is constantly good.
Additional Info
Hardware
The NVMe is a KINGSTON A2000 1TB.
The PC is an AMD Ryzen 7 3700X with 64 GB RAM.
zpool get all PCIE3:
zfs get all PCIE3/BENCHMARK:
fio command:
fio configs:
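Purely as an illustration (not the reporter's actual config), a fio job file consistent with the parameters discussed in the thread (bs=1M, 300 GB file size, sequential write and read, no runtime limit) could look like this; the directory is a hypothetical mountpoint:

[global]
directory=/PCIE3/BENCHMARK
bs=1M
size=300G
ioengine=libaio
iodepth=16
direct=1

[seq-write]
rw=write

[seq-read]
rw=read
stonewall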