GCC optimization flag incompatibility #22

mratsim · 2020-08-12T13:42:05Z

FYI, BLST is incompatible with the following GCC flag: -ftree-loop-vectorize

This causes the scalar multiplication to be miscompiled, which is easily noticeable with the blst_sk_to_pk_in_g1 function.

Unfortunately it is automatically activated with -O3. You might want to have a note about not compiling with -O3 or using -fno-tree-loop-vectorize to deactivate the offending flag.

Tested with GCC v10.1.0

Clang works fine (v10.0.1)


mratsim · 2020-08-13T09:27:03Z

One solution at your level would be something like

#define POINT_MULT_SCALAR_W5_IMPL(ptype) \
#if (__GNUC__ == 4 && __GNUC_MINOR__ == 8 && __GNUC_PATCHLEVEL__ == 5) \
__attribute__((optimize("no-tree-vectorize"))) \
#endif \
static void ptype##_gather_booth_w5(ptype *restrict p, const ptype table[16], \
                                    limb_t booth_idx) \
{ \
    size_t i; \
    limb_t booth_sign = (booth_idx >> 5) & 1; \
\
    booth_idx &= 0x1f; \
    vec_zero(p, sizeof(ptype)); /* implicit infinity at table[-1] */\
    /* ~6% with -Os, ~2% with -O3 ... */\
    for (i = 1; i <= 16; i++) \
        ptype##_ccopy(p, table + i - 1, i == booth_idx); \
\
    ptype##_cneg(p, booth_sign); \
} \

(assuming this is the problematic function and problematic compiler version)
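For readers unfamiliar with the function above, here is a minimal sketch of the same scan-and-select pattern on plain 64-bit integers instead of curve points. The names toy_ccopy, toy_cneg and toy_gather_booth_w5 are hypothetical stand-ins, not blst's code, and zero plays the role of the implicit point at infinity.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t;

/* Copy *src into *dst when flag is 1, leave *dst alone when flag is 0,
 * without branching on flag. Hypothetical stand-in for ptype##_ccopy. */
static void toy_ccopy(uint64_t *dst, const uint64_t *src, limb_t flag)
{
    uint64_t mask = (uint64_t)0 - flag;   /* 0 or all ones */

    *dst ^= (*dst ^ *src) & mask;
}

/* Negate *p (modulo 2^64) when flag is 1, again without branching.
 * Hypothetical stand-in for ptype##_cneg. */
static void toy_cneg(uint64_t *p, limb_t flag)
{
    uint64_t mask = (uint64_t)0 - flag;

    *p = (*p ^ mask) + flag;              /* ~x + 1 == -x when flag is 1 */
}

/* Touch every table entry and keep only the one selected by the low five
 * bits of booth_idx; bit 5 carries the Booth sign. Index 0 selects zero. */
static uint64_t toy_gather_booth_w5(const uint64_t table[16], limb_t booth_idx)
{
    uint64_t p = 0;                       /* stands in for "infinity" */
    limb_t booth_sign = (booth_idx >> 5) & 1;
    size_t i;

    booth_idx &= 0x1f;
    for (i = 1; i <= 16; i++)
        toy_ccopy(&p, &table[i - 1], (limb_t)(i == booth_idx));

    toy_cneg(&p, booth_sign);
    return p;
}
```

The loop over all 16 entries is the kind of loop that -ftree-loop-vectorize may try to transform, which is why this function is a natural suspect.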

dot-asm · 2020-08-13T13:50:38Z

> BLST is incompatible with the following GCC flag: -ftree-loop-vectorize

Why not other way around? :-):-):-) But on serious note, if specific compiler version fails to compile a piece of code, while others can, it speaks rather in favour of compiler bug. This is not to say that it necessarily means compiler bug, but it's first assumption to make.

> One solution at your level would be something like

I for one am not a big fan of compiler-specific workarounds, and in any case the suggestion as written is not the way to go, because pre-processor directives can't be nested inside a macro definition. But a function can have a separate declaration with the designated attributes... Another way to solve it would be ... more assembly, so that the compiler won't be in a position to make the self-defeating assumptions...
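One possible reading of the "separate declaration" idea, as a rough sketch rather than anything blst actually does: keep the #if outside the function-generating macro, and let the macro only reference a helper macro. TOY_NO_TREE_VECTORIZE and TOY_SUM_IMPL are hypothetical names invented for this illustration. Note that GCC's documentation describes the optimize attribute as meant for debugging rather than production code, so whether it reliably changes code generation on a given GCC version would need to be verified.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper macro (not part of blst). The conditional is resolved
 * here, once, outside of any macro body. Clang defines __GNUC__ as well but
 * does not implement optimize(), hence the extra check. */
#if defined(__GNUC__) && !defined(__clang__)
# define TOY_NO_TREE_VECTORIZE __attribute__((optimize("no-tree-vectorize")))
#else
# define TOY_NO_TREE_VECTORIZE
#endif

/* The generated code first declares the function with the attribute,
 * then defines it. No pre-processor directive appears inside the macro. */
#define TOY_SUM_IMPL(type) \
static type type##_sum(const type *v, size_t n) TOY_NO_TREE_VECTORIZE; \
static type type##_sum(const type *v, size_t n) \
{ \
    type acc = 0; \
    size_t i; \
    for (i = 0; i < n; i++) \
        acc += v[i]; \
    return acc; \
}

TOY_SUM_IMPL(uint32_t)   /* defines uint32_t_sum */
TOY_SUM_IMPL(uint64_t)   /* defines uint64_t_sum */
```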

On side note. Keep in mind that blst is not that dependent on optimization level, because most of the "magic" happens in assembly. In other words difference between -O2 and -O3 is effectively negligible, so you don't actually have to compile blst with -O3. Unless of course if your C code is sensitive to optimization level, and you want to compile everything in the same go...

dot-asm · 2020-08-13T14:08:06Z

Can you confirm that compiling with -Drestrict= flag helps?

[Just in case for reference, this is not a suggested solution, just an attempt to pinpoint the problem.]
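For context on what the flag does: -Drestrict= defines restrict as an empty macro, so every restrict qualifier in the compiled sources disappears and the compiler can no longer assume the pointers don't overlap. A small sketch of the effect, using a hypothetical function that is not taken from blst:

```c
#include <stddef.h>
#include <stdint.h>

/* Uncommenting the next line has the same effect on this file as passing
 * -Drestrict= on the command line: each "restrict" expands to nothing. */
/* #define restrict */

/* Hypothetical function, for illustration only. With the qualifiers in
 * place the compiler may assume dst and src never overlap; with them
 * removed it has to allow for overlap. */
static void toy_add_into(uint64_t *restrict dst, const uint64_t *restrict src,
                         size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        dst[i] += src[i];
}
```

For non-overlapping arguments the result is the same either way; only the assumptions available to the optimizer change, which is what makes the flag useful for narrowing down a suspected miscompilation.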

mratsim · 2020-08-13T14:48:29Z

> Why not other way around? :-):-):-) But on serious note, if specific compiler version fails to compile a piece of code, while others can, it speaks rather in favour of compiler bug. This is not to say that it necessarily means compiler bug, but it's first assumption to make.

There is a related GCC bug that has been lurking for at least 10 years, for example in x264: https://mailman.videolan.org/pipermail/x264-devel/2010-June/007462.html. Clang doesn't exhibit this, which also points to a GCC bug.

Yes, -Drestrict= makes blst_sk_to_pk_in_g1 behave.

> Unless of course if your C code is sensitive to optimization level, and you want to compile everything in the same go...

Yes that's the case, I don't compile BLST as a separate DLL but compile it at the same time as the rest of the Nim/C code.

dot-asm · 2020-08-13T20:16:41Z

> Why not other way around? :-):-):-)

> There is a related GCC bug that has been lurking for 10 years at least,

In other words bug is so old that it's considered a feature:-) This is exactly why I'm not fond of compiler-specific workarounds, they effectively let compiler off the hook...

Either way, could you double-check vec_select_n? ~~It's only x86_64 for the moment...~~

dot-asm · 2020-08-18T09:54:04Z

vec_select_n is merged. Closing...

mratsim · 2020-09-18T06:18:28Z

Sorry for the late reply, I didn't have time to upgrade earlier.

Unfortunately I still seem to get wrong results with GCC -O3 on yesterday's master (a8398ed) unless I pass -fno-tree-vectorize, despite that branch being merged and despite f8a77bd.

For now I'll keep using -fno-tree-vectorize with that compiler.

* Bump BLST
* Test for supranational/blst#22 regression
* Use SHA256 from BLST + bump nim-blscurve to reenable fno-tree-vectorize
* SHA256 on non-blst platforms import fixes
* import fixes again
* can't prefix with nimcrypto
* address review comment [skip ci]
* {.noInit.} on the digests

dot-asm · 2020-09-25T19:26:12Z

Still? Hmm... Since -Drestrict= helped, and I assume it still does, can you test one thing? Drop the restrict qualifiers from ptype##_ccopy in src/point.h.

dot-asm · 2020-09-25T19:39:13Z

Just in case, for reference: restrict qualifiers were added in order to eliminate arguably unjustified branches depending on the outcome of pointer comparisons. It's not really a constant-time thing, but rather a this-ought-to-complicate-binary-code-validation thing. But one should expect slightly better performance as well...
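To illustrate the point about restrict, here is a minimal sketch of a masked conditional copy. toy_vec_ccopy is a hypothetical stand-in, not blst's implementation. The qualifiers promise that the two buffers never overlap, so the compiler has no reason to compare the pointers at run time before transforming the loop. The promise binds the caller too: passing overlapping buffers is undefined behaviour.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t;

/* Overwrite ret[0..n-1] with a[0..n-1] when sel is 1 and leave ret unchanged
 * when sel is 0, without branching on sel. "restrict" tells the compiler
 * that ret and a do not alias. Hypothetical sketch for illustration. */
static void toy_vec_ccopy(limb_t *restrict ret, const limb_t *restrict a,
                          size_t n, limb_t sel)
{
    limb_t mask = (limb_t)0 - sel;   /* 0 or all ones */
    size_t i;

    for (i = 0; i < n; i++)
        ret[i] ^= (ret[i] ^ a[i]) & mask;
}
```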

mratsim mentioned this issue Aug 12, 2020

Priv to pub blst status-im/nim-blscurve#69

Merged

mratsim changed the title ~~GCC/Clang optimization flags incompatibility~~ GCC optimization flag incompatibility Aug 12, 2020

mratsim mentioned this issue Aug 13, 2020

Bump nim-blscurve status-im/nimbus-eth2#1491

Merged

dot-asm closed this as completed Aug 18, 2020

This was referenced Sep 15, 2020

Upgrade BLST and remove 2 now unnecessary workarounds status-im/nim-blscurve#84

Closed

Update BLST status-im/nim-blscurve#86

Merged

mratsim added a commit to status-im/nimbus-eth2 that referenced this issue Sep 17, 2020

Test for supranational/blst#22 regression

a6bd3d3

emturner mentioned this issue Jul 1, 2024

Downgrade blst back to 0.3.10 trilitech/tezedge#79

Merged
