Enable libsleefdft and libsleefquad on riscv64 #503

sh1boot · 2024-01-17T04:45:34Z

Continuation of #496

Mostly just the in-fill of src/arch/helperrvv.h.

src/arch/helperrvv.h

sh1boot · 2024-01-20T01:33:48Z

Enabling inline headers will come next: rivosinc#4

It refactors this code a little bit, but I assume it's better to draw a line under this before moving on.

src/arch/helperrvv.h

luhenry · 2024-01-22T17:08:13Z

@blapie could we please get a review on that, IMO it should be ready for RISC-V. Also it's passing on CI: https://github.com/rivosinc/sleef/actions/runs/7596179815

blapie · 2024-01-22T17:11:03Z

Hi! Will have a look this week! Thanks for the contribution already!

blapie

Looks sensible to me, but might need to have a second look because atm I'm mostly just trusting you on the technicalities.

src/dft/vectortype.h

.github/workflows/build_and_test.yml

sh1boot · 2024-01-30T21:09:40Z

Soliciting review from @ericlove and @GlassOfWhiskey. I understand you were contributors at the beginning of this PR.

src/dft/vectortype.h

sh1boot · 2024-02-03T16:52:51Z

I'm not sure about some of the cast/truncate implementations, here. vint is a different size from vmask (not sure why, is that ok?), and when casting between the two there are a mix of different policies for which bits to throw away.

Unfortunately if I change the implementation the tests still pass.

What's the expectation for functions that go between vint and vmask?

This has come up where I've extended the logic for quad support, where I did indeed get test failures, but only in RVVM2 tests, where the truncation frees a whole register which the compiler can reuse. In RVVM1 bugs can be hidden because vint uses half a register and if you truncate and un-truncate the data can survive.

blapie · 2024-02-06T17:16:35Z

I'm not sure about some of the cast/truncate implementations, here. vint is a different size from vmask (not sure why, is that ok?), and when casting between the two there are a mix of different policies for which bits to throw away.

What do you mean by "there are a mix of different policies"? Do you mean cross vector extensions in SLEEF?
Not sure, if vint and vmask are meant to be of the same size or not, each vector extension could handle them differently. SVE for instance uses the same datatype for both, but it drops the most significant bits I believe, maybe you could use the same approach for truncating/casting?

Unfortunately if I change the implementation the tests still pass.

What's the expectation for functions that go between vint and vmask?

I'm not aware of any particular expectation, as long as the choice is consistent, but my understanding of the dft part of the library is limited.

ericlove · 2024-02-06T18:08:08Z

I'm not sure about some of the cast/truncate implementations, here. vint is a different size from vmask (not sure why, is that ok?), and when casting between the two there are a mix of different policies for which bits to throw away.

Sorry for taking so long to join this conversation, but just thought it might be useful to chime in with my memory of why vint and vmask are the way they are in the RVV port (warning: this memory is reconstructed because it was long paged out...take it with a grain of salt).

The vmask type is constrained by the definition of vcast_vm_i_i to require 64-bit elements, even in single precision code in sleefsimdsp.c. Unfortunately, this interacts badly with one of the design principles behind the RVV port, which is that all types should have the same VLMAX (VECTLENSP here), meaning that, for SP (assuming RVVM1 for example), vmask needs to have LMUL=2, while for DP, where the base type vdouble (vfloat64m1_t) already has 64-bit elements, it needs LMUL=1.

Meanwhile, vint follows a different logic. It is used only in the DP codes, but for some reason is constrained to have 32-bit elements. I no longer know why this is--either I ran into bugs in the original port when trying to widen it to 64 bits, or you have just found an opportunity to fix a major incoherence in the RVV design; you'll have to try it out to know which (and either way, we can certainly add the glaringly absent comment that should accompany it). But given this is the case now, it means that to keep VLMAX constant at VECTLENDP, vint needs to have half the LMUL of other 64-bit types, including vmask.

Hopefully one of these constraints I described is unnecessary, and can be relaxed to simplify both the original RVV design, and your work on the DFT extensions.

sh1boot · 2024-02-09T06:55:24Z

To clarify what I mean about 'different policies', here's the list of conversions to or from the vmask type that I found (supposing a configuration where m1 represents the minimum 128 bits):

vreinterpret_vf_vm: vncvt_x  256 -> 128
vreinterpret_vm_vf: vwcvtu_x  128 -> 256
vreinterpret_vd_vm: bitcast  128 -> 128
vreinterpret_vm_vd: bitcast  128 -> 128
vreinterpret_vm_vi64: bitcast  128 -> 128
vreinterpret_vi64_vm: bitcast  128 -> 128
vreinterpret_vm_vu64: bitcast  128 -> 128
vreinterpret_vu64_vm: bitcast  128 -> 128

vcast_vm_i64: splat  64 -> 128
vcast_vm_u64: splat  64 -> 128
vcast_vm_i_i: splat  (32+32) -> 128  or (32+32) -> 256 for single-precision
vcast_vm_vi: vwcvt_x  64 -> 128
vcastu_vm_vi: vwsll_x  64 -> 128
vcast_vi_vm: vncvt_x  128 ->64
vcastu_vi_vm: vnsrl 32  128 -> 64

Here I think vncvt_x will truncate on a per-lane basis (64-bit down to 32-bit for both single and double precision configurations), vwcvt_x will sign extend (and vwcvtu_x will zero-extend) on a per-lane basis (32-bit up to 64-bit), and vnsrl 32 will truncate on a per-lane basis like vncvt but selecting the top 32 bits instead of the bottom 32 bits. splat just repeats a 64-bit pattern until the destination is full.

Though that's in my local work tree after I've refactored a bit. Comparing to the version in the current pull request, I see two implementations for vcast_vi_vm(), and the single-precision one is a truncate operation which is definitely wrong. However, I can't think where that came from (evidently I added it but I don't see why I would because I don't think it's used?).

~~vcast_vm_i_i gives me pause, because it takes a pair of 64-bit inputs but still only creates a 64-bit pattern out of those before splatting.~~ [edit] looking at the usage, it's clearly expected that the arguments are truncated to 32-bit, even though the interface accepts 64-bit.

~~It looks like vcastu_vi_vm(vcastu_vm_vi(x)) would always return zero, which seems surprising as well.~~ [edit] This is fine, I overlooked the vsll outside of the vwcvt_x.

But all the tests pass.

I'll try rationalising the size of vmask and vint for single-precision and see which options work. If there's a more conventional arrangement then I guess I'll switch to that instead.

sh1boot · 2024-02-10T00:35:40Z

The vmask type is constrained by the definition of vcast_vm_i_i to require 64-bit elements, even in single precision code in sleefsimdsp.c.

The way it's used in sleefsimdsp.c it looks like it's just creating pairs of complementary floats, so it's not unreasonable to cast it back to a 32-bit type having the same VECTLENSP as float32 operations -- then it works out to be the same bit width for DP and SP cases. It just needs a temporary 64-bit splat while the pair of floats are unrolled.

Working that change through piece by piece it looks like all (or just most?) of the compatibility workarounds between SP and DP disappear. So I guess I'll do that.

sh1boot · 2024-02-10T02:30:57Z

Looks like the first step in rebasing automatically closed this PR? Anyway, I've rebased it, now, and fixed or understood everything that I was worried about.

.github/workflows/build_and_test.yml

sh1boot · 2024-02-16T03:49:57Z

I've carved out some of this patch into #520 and #521. I'm not sure how github works, here, but I assume that this patch now depends on those PRs going in first, and when they do this diff will shrink?

#522 to follow.

Diff from #521: rivosinc/sleef@dev/shosie/fix-rvv-warnings...dev/shosie/fix-rvv-dft

blapie · 2024-03-01T15:58:55Z

Ok, all 4 PRs make sense for the most part. Really appreciate you linking to the diffs, although my eyes started to burn a bit after 2 PRs...
Almost ready to merge #520, #521, #503 and #522 (in that order).
Just requesting more comments here #521 (comment)
If you feel like the code could use a bit more commenting please go ahead if that helps future maintenance.

blapie · 2024-03-07T08:18:26Z

@sh1boot This one need a rebase now.

Co-Authored-By: Simon Hosie <[email protected]>

luhenry · 2024-03-07T10:40:32Z

@blapie I've rebased it and updated this PR. The diff between before and after the rebase is the same: https://github.com/shibatch/sleef/compare/82411c1f1883a528471db11c7439b56857038de8..b1c49df4ebe7163f6ab6c553586d0840a46f627e.

luhenry suggested changes Jan 18, 2024

View reviewed changes

src/arch/helperrvv.h Outdated Show resolved Hide resolved

luhenry suggested changes Jan 18, 2024

View reviewed changes

src/arch/helperrvv.h Outdated Show resolved Hide resolved

luhenry suggested changes Jan 18, 2024

View reviewed changes

src/arch/helperrvv.h Outdated Show resolved Hide resolved

src/arch/helperrvv.h Outdated Show resolved Hide resolved

sh1boot commented Jan 19, 2024

View reviewed changes

src/arch/helperrvv.h Outdated Show resolved Hide resolved

luhenry suggested changes Jan 20, 2024

View reviewed changes

src/arch/helperrvv.h Outdated Show resolved Hide resolved

luhenry approved these changes Jan 21, 2024

View reviewed changes

blapie reviewed Jan 25, 2024

View reviewed changes

src/dft/vectortype.h Outdated Show resolved Hide resolved

.github/workflows/build_and_test.yml Show resolved Hide resolved

blapie mentioned this pull request Jan 25, 2024

Restart/Fix CI tests #481

Open

blapie reviewed Jan 31, 2024

View reviewed changes

src/dft/vectortype.h Outdated Show resolved Hide resolved

sh1boot closed this Feb 10, 2024

sh1boot force-pushed the dev/shosie/fix-rvv-dft branch from 605cb26 to be6106e Compare February 10, 2024 02:17

sh1boot reopened this Feb 10, 2024

luhenry reviewed Feb 11, 2024

View reviewed changes

.github/workflows/build_and_test.yml Outdated Show resolved Hide resolved

sh1boot changed the title ~~Enable libsleefdft on riscv64~~ Enable libsleefdft and libsleefquad on riscv64 Feb 11, 2024

sh1boot force-pushed the dev/shosie/fix-rvv-dft branch from 760f34a to 75c1178 Compare February 16, 2024 03:40

This was referenced Feb 16, 2024

Reconcile RVV mask types. #520

Merged

Clean up RVV register composition. #521

Merged

sh1boot mentioned this pull request Feb 16, 2024

Add RVV inline header support. #522

Merged

sh1boot force-pushed the dev/shosie/fix-rvv-dft branch from 75c1178 to 5bd14f3 Compare February 16, 2024 05:03

sh1boot force-pushed the dev/shosie/fix-rvv-dft branch from 5bd14f3 to a5ab642 Compare February 16, 2024 19:15

sh1boot mentioned this pull request Mar 4, 2024

add comments to helperrvv.h #527

Merged

sh1boot force-pushed the dev/shosie/fix-rvv-dft branch from a5ab642 to 82411c1 Compare March 6, 2024 17:54

Enable libsleefdft and libsleefquad on riscv64.

b1c49df

Co-Authored-By: Simon Hosie <[email protected]>

luhenry force-pushed the dev/shosie/fix-rvv-dft branch from 82411c1 to b1c49df Compare March 7, 2024 10:37

blapie merged commit e0fbd34 into shibatch:master Mar 7, 2024
31 checks passed

luhenry deleted the dev/shosie/fix-rvv-dft branch March 25, 2024 07:59

bedroge mentioned this pull request Jun 11, 2024

Notes on kickstarting the RISC-V software layer EESSI/software-layer#552

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Enable libsleefdft and libsleefquad on riscv64 #503

Enable libsleefdft and libsleefquad on riscv64 #503

sh1boot commented Jan 17, 2024

sh1boot commented Jan 20, 2024

luhenry commented Jan 22, 2024

blapie commented Jan 22, 2024

blapie left a comment

sh1boot commented Jan 30, 2024

sh1boot commented Feb 3, 2024 •

edited

Loading

blapie commented Feb 6, 2024

ericlove commented Feb 6, 2024 •

edited

Loading

sh1boot commented Feb 9, 2024 •

edited

Loading

sh1boot commented Feb 10, 2024 •

edited

Loading

sh1boot commented Feb 10, 2024

sh1boot commented Feb 16, 2024 •

edited

Loading

blapie commented Mar 1, 2024

blapie commented Mar 7, 2024

luhenry commented Mar 7, 2024

Enable libsleefdft and libsleefquad on riscv64 #503

Enable libsleefdft and libsleefquad on riscv64 #503

Conversation

sh1boot commented Jan 17, 2024

sh1boot commented Jan 20, 2024

luhenry commented Jan 22, 2024

blapie commented Jan 22, 2024

blapie left a comment

Choose a reason for hiding this comment

sh1boot commented Jan 30, 2024

sh1boot commented Feb 3, 2024 • edited Loading

blapie commented Feb 6, 2024

ericlove commented Feb 6, 2024 • edited Loading

sh1boot commented Feb 9, 2024 • edited Loading

sh1boot commented Feb 10, 2024 • edited Loading

sh1boot commented Feb 10, 2024

sh1boot commented Feb 16, 2024 • edited Loading

blapie commented Mar 1, 2024

blapie commented Mar 7, 2024

luhenry commented Mar 7, 2024

sh1boot commented Feb 3, 2024 •

edited

Loading

ericlove commented Feb 6, 2024 •

edited

Loading

sh1boot commented Feb 9, 2024 •

edited

Loading

sh1boot commented Feb 10, 2024 •

edited

Loading

sh1boot commented Feb 16, 2024 •

edited

Loading