Segmentation fault in some machines and not in others using OpenBSD adJ74 #206

Closed
vtamara opened this issue Mar 8, 2024 · 15 comments

@vtamara

vtamara commented Mar 8, 2024

In the context of ava-labs/avalanchego#2782 and porting Avalanche tools to OpenBSD/adJ 7.4 we experienced segmentation faults and traced them to blst.

We compiled and tested blst on different CPUs, all of them running OpenBSD/adJ 7.4 with llvm 13.0.0 and go 1.22.0. We tested non-portable and portable builds as follows:

dmesg | grep cpu0
git clone git@github.com:supranational/blst
cd blst/bindings/go
go build
go test -v
go clean
go clean -cache
CGO_CFLAGS="-O2 -D__BLST_PORTABLE__" go build 
CGO_CFLAGS="-O2 -D__BLST_PORTABLE__" go test -v
cd ../../..
git clone -bv1.7.6adJ74 git@github.com:vtamara/avalanche-network-runner
cd avalanche-network-runner
./scripts/build.sh
./bin/avalanche-network-runner

The summary of results is:

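| # | Processor | Non-portable running | Non-portable passing | Portable running | Portable passing | ANR runs |
|---|-----------|----------------------|----------------------|------------------|------------------|----------|
| 1 | Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz | 0 | 0 | 28 | 0 | No |
| 2 | Intel(R) Core(TM) i5-8265U CPU @ 1.60GHz | 28 | 28 | 28 | 0 | No |
| 3 | AMD Ryzen 5 4600G 3700.01 MHz | 28 | 28 | 28 | 28 | Yes |
| 4 | AMD FX(tm)-8350 4000.42 MHz | 0 | 0 | 28 | 0 | No |
| 5 | AMD Ryzen 5 2600 3400.01 MHz | 28 | 28 | 28 | 28 | Yes |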

The details of the tests that didn't pass were too long for this issue, so they are available at: https://github.com/vtamara/blst/wiki/Segmentation-fault-in-some-machines-and-not-in-others-using-OpenBSD-adJ74

@dot-asm
Collaborator

dot-asm commented Mar 9, 2024

I don't really know what "adj74" means, but I've managed to reproduce a crash on vanilla OpenBSD 7.4 and it appears to be wild. Can you test the following on a problematic system? In the blst/bindings/go directory:

  1. execute env CGO_CFLAGS="-O2 -D__BLST_PORTABLE__" go test -c;
  2. execute ./go.test -test.v to confirm that it crashes;
  3. execute gdb go.test and issue run -test.v at (gdb) prompt, it will crash;
  4. issue print/x *(int*)$rbp at (gdb) prompt and note that it prints 0x428a2f98;
  5. quit gdb
  6. now ./go.test -test.v doesn't crash on me, what about you?

This is not a suggested solution, just a test to confirm that we're looking at the same problem.

@vtamara
Author

vtamara commented Mar 9, 2024

Thanks for answering.

adJ is a distribution of OpenBSD, see https://en.wikipedia.org/wiki/AdJ

Note that I'm using go 1.22 while OpenBSD 7.4 has a package of go 1.21. You could compile the port of go 1.22 available for OpenBSD-current, see https://cvsweb.openbsd.org/ports/lang/go/ or maybe the unsigned package for adJ 7.4 could work on OpenBSD. You could try with: doas pkg_add -D unsigned http://adJ.pasosdeJesus.org/pub/AprendiendoDeJesus/7.4-extra/go-1.22.0p0.tgz

1. Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz:

  • Confirmed that steps 2 and 3 crash
  • Confirmed that step 4 issues 0x428a2f98
  • But in step 6 ./go.test -test.v crashes again, some lines are:
=== PAUSE TestMultiScalarP2                                                                                                                                     
=== CONT  TestG1HashToCurve                                                                                                                                     
=== CONT  TestEmptySignatureMinSig                                                                                                                              
=== CONT  TestEmptyMessageMinPk                                                                                                                                 
=== CONT  TestSignVerifyAugMinPk                                                                                                                                
SIGSEGV: segmentation violation                                                                                                                                 
PC=0x4e19e9 m=0 sigcode=2 addr=0x4e2a80                                                                                                                         
signal arrived during cgo execution                                                                                                                             
                                                                                                                                                                
goroutine 6 gp=0xc000007dc0 m=0 mp=0x5113c0 [syscall]:                                                                                                          
runtime.cgocall(0x4bce90, 0xc00005ade0)                                                                                                                         
        /usr/local/go/src/runtime/cgocall.go:157 +0x4b fp=0xc00005adb8 sp=0xc00005ad80 pc=0x36a9ab                                                              
github.com/supranational/blst/bindings/go._Cfunc_blst_hash_to_g1(0xc000280090, 0x0, 0x0, 0xc000024180, 0x32, 0x0, 0x0)                                          
        _cgo_gotypes.go:429 +0x45 fp=0xc00005ade0 sp=0xc00005adb8 pc=0x4ac1a5                                                                                   
github.com/supranational/blst/bindings/go.HashToG1({0x5702a0, 0x0, 0x2410be?}, {0xc000024180, 0x32, 0xc00005aeb8?}, {0x0, 0x0, 0xc00005ae90?})                  
        /home/vtamara/comp/go/blst/bindings/go/blst.go:1934 +0xc5 fp=0xc00005ae30 sp=0xc00005ade0 pc=0x4b6ae5                                                   
github.com/supranational/blst/bindings/go.jsonG1HashToCurve(0xc00010e680, {0x24d7d2, 0x3d})                                                                     
        /home/vtamara/comp/go/blst/bindings/go/blst_htoc_test.go:74 +0x3fb fp=0xc00005af48 sp=0xc00005ae30 pc=0x49581b                                          
github.com/supranational/blst/bindings/go.TestG1HashToCurve(0xc00010e680)                                                                                       
        /home/vtamara/comp/go/blst/bindings/go/blst_htoc_test.go:87 +0x2e fp=0xc00005af70 sp=0xc00005af48 pc=0x495a4e                                           
testing.tRunner(0xc00010e680, 0x24eff0)                                                                                                                         
        /usr/local/go/src/testing/testing.go:1689 +0xfb fp=0xc00005afc0 sp=0xc00005af70 pc=0x43f77b                                                             
testing.(*T).Run.gowrap1()                                                                                                                                      
        /usr/local/go/src/testing/testing.go:1742 +0x25 fp=0xc00005afe0 sp=0xc00005afc0 pc=0x4407a5                                                             
runtime.goexit({})                                                                                                                                              
        /usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc00005afe8 sp=0xc00005afe0 pc=0x3d4ba1                                                             
created by testing.(*T).Run in goroutine 1                                                                                                                      
        /usr/local/go/src/testing/testing.go:1742 +0x390                                                                                                        

2. Intel(R) Core(TM) i5-8265U CPU @ 1.60GHz

  • Confirmed that steps 2 and 3 crash
  • Confirmed that step 4 issues 0x428a2f98
  • Confirmed that after the previous procedure step 6 passes

3. AMD FX(tm)-8350 4000.42 MHz

  • Confirmed that steps 2 and 3 crash
  • Confirmed that step 4 issues 0x428a2f98
  • Confirmed that after the previous procedure step 6 passes

@dot-asm
Collaborator

dot-asm commented Mar 9, 2024

Just in case you wonder, 0x428a2f98 is the first constant in K256 table. Even though one system kept crashing, the fact that gdb prints the expected value indicates that it ought to be the same problem I've experienced. On it. Thanks for the confirmation.
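
For context, K256 is the table of SHA-256 round constants, which the SHA-256 specification defines as the first 32 bits of the fractional parts of the cube roots of the first 64 primes. The following is only a minimal Go sketch of that definition: the helper name k256Const is made up for illustration, and it assumes float64 precision is enough for the entries it prints. It can be used to cross-check the first words shown by gdb.

```go
package main

import (
	"fmt"
	"math"
)

// k256Const is a hypothetical helper (not part of blst): it returns the
// first 32 bits of the fractional part of the cube root of p, which is
// how the SHA-256 round constants are defined for a prime p.
func k256Const(p float64) uint32 {
	root := math.Cbrt(p)
	frac := root - math.Floor(root)
	// Scale the fraction by 2^32 and truncate to keep its top 32 bits.
	return uint32(frac * (1 << 32))
}

func main() {
	// The first two primes give the first two entries of the table.
	for _, p := range []float64{2, 3} {
		fmt.Printf("prime %v -> 0x%08x\n", p, k256Const(p))
	}
}
```

For the prime 2 this yields 0x428a2f98, the value that step 4 of the procedure above is expected to print.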

@dot-asm
Collaborator

dot-asm commented Mar 9, 2024

Could you test sha256-rodata branch from https://github.com/dot-asm/blst of mine? Just clone it and test the Go bindings. [For reference, the problem is not specific to Go, Rust was crashing too...]

@vtamara
Author

vtamara commented Mar 10, 2024

> Could you test sha256-rodata branch from https://github.com/dot-asm/blst of mine? Just clone it and test the Go bindings. [For reference, the problem is not specific to Go, Rust was crashing too...]

Yes. I notice that the situation improved on the first CPU, so the idea of adding read-only sections is good.

1. Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz

    env CGO_CFLAGS="-O2 -D__BLST_PORTABLE__" go test -v

passes all 28 tests without issue. Although:

% go test -v 
Caught SIGILL in blst_cgo_init, consult <blst>/bindings/go/README.md.
exit status 132
FAIL    github.com/supranational/blst/bindings/go       0.004s

2. Intel(R) Core(TM) i5-8265U CPU @ 1.60GHz

    env CGO_CFLAGS="-O2 -D__BLST_PORTABLE__" go test -v

fails; some lines that seem relevant to me:

=== PAUSE TestMultiScalarP2                                                                                                                                     
=== CONT  TestG1HashToCurve                                                                                                                                     
=== CONT  TestEmptySignatureMinPk                                                                                                                               
=== CONT  TestSignVerifyAggregateMinSig                                                                                                                         
=== CONT  TestSignVerifyAugMinPk                                                                                                                                
SIGSEGV: segmentation violation                                                                                                                                 
PC=0x4e19e9 m=3 sigcode=2 addr=0x4e2a80                                                                                                                         
signal arrived during cgo execution                                                                                                                             
                                                                                                                                                                
goroutine 28 gp=0xc00014c380 m=3 mp=0xc00004b008 [syscall]:                                                                                                     
runtime.cgocall(0x4bcf50, 0xc000143c60)                                                                                                                         
        /usr/local/go/src/runtime/cgocall.go:157 +0x4b fp=0xc000143c38 sp=0xc000143c00 pc=0x36a9ab                                                              
github.com/supranational/blst/bindings/go._Cfunc_blst_keygen(0xc00019c020, 0xc00019c000, 0x20, 0x0, 0x0)
        _cgo_gotypes.go:467 +0x45 fp=0xc000143c60 sp=0xc000143c38 pc=0x4ac4a5
github.com/supranational/blst/bindings/go.KeyGen({0xc00019c000, 0x20, 0x0?}, {0x0, 0x0, 0x18?})
        /home/vtamara/comp/go/blst/bindings/go/blst.go:233 +0x88 fp=0xc000143ca0 sp=0xc000143c60 pc=0x4af8c8
github.com/supranational/blst/bindings/go.genRandomKeyMinSig()                                                                                                  
        /home/vtamara/comp/go/blst/bindings/go/blst_minsig_test.go:481 +0x52 fp=0xc000143ce8 sp=0xc000143ca0 pc=0x4a7cd2                                        
github.com/supranational/blst/bindings/go.generateBatchTestDataUncompressedMinSig(0x1)                                                                          
        /home/vtamara/comp/go/blst/bindings/go/blst_minsig_test.go:518 +0x22a fp=0xc000143e20 sp=0xc000143ce8 pc=0x4a862a                                       
github.com/supranational/blst/bindings/go.TestSignVerifyAggregateMinSig(0xc0001489c0)                                                                           
        /home/vtamara/comp/go/blst/bindings/go/blst_minsig_test.go:204 +0x54 fp=0xc000143f70 sp=0xc000143e20 pc=0x4a3034                                        
testing.tRunner(0xc0001489c0, 0x24f050)                                                                                                                         
        /usr/local/go/src/testing/testing.go:1689 +0xfb fp=0xc000143fc0 sp=0xc000143f70 pc=0x43f77b                                                             
testing.(*T).Run.gowrap1()
        /usr/local/go/src/testing/testing.go:1742 +0x25 fp=0xc000143fe0 sp=0xc000143fc0 pc=0x4407a5
runtime.goexit({})

while

    go test -v

keeps passing all the 28 tests.

3. AMD FX(tm)-8350 4000.42 MHz

    env CGO_CFLAGS="-O2 -D__BLST_PORTABLE__" go test -v

fails; some lines that seem relevant to me:

=== CONT  TestG1HashToCurve
=== CONT  TestEmptySignatureMinPk
=== CONT  TestSignVerifyAggregateMinSig
=== CONT  TestEmptyMessageMinSig
SIGSEGV: segmentation violation
PC=0x4e19e9 m=3 sigcode=2 addr=0x4e2a80
signal arrived during cgo execution

goroutine 28 gp=0xc00014a380 m=3 mp=0xc00004d008 [syscall]:
runtime.cgocall(0x4bcf50, 0xc000141c60)
        /usr/local/go/src/runtime/cgocall.go:157 +0x4b fp=0xc000141c38 sp=0xc000141c00 pc=0x36a9ab
github.com/supranational/blst/bindings/go._Cfunc_blst_keygen(0xc00019c020, 0xc00019c000, 0x20, 0x0, 0x0)
        _cgo_gotypes.go:467 +0x45 fp=0xc000141c60 sp=0xc000141c38 pc=0x4ac4a5                                                                            
github.com/supranational/blst/bindings/go.KeyGen({0xc00019c000, 0x20, 0x0?}, {0x0, 0x0, 0x18?})
        /home/vtamara/comp/go/blst/bindings/go/blst.go:233 +0x88 fp=0xc000141ca0 sp=0xc000141c60 pc=0x4af8c8
github.com/supranational/blst/bindings/go.genRandomKeyMinSig()
        /home/vtamara/comp/go/blst/bindings/go/blst_minsig_test.go:481 +0x52 fp=0xc000141ce8 sp=0xc000141ca0 pc=0x4a7cd2
github.com/supranational/blst/bindings/go.generateBatchTestDataUncompressedMinSig(0x1)
        /home/vtamara/comp/go/blst/bindings/go/blst_minsig_test.go:518 +0x22a fp=0xc000141e20 sp=0xc000141ce8 pc=0x4a862a
github.com/supranational/blst/bindings/go.TestSignVerifyAggregateMinSig(0xc0001469c0)
        /home/vtamara/comp/go/blst/bindings/go/blst_minsig_test.go:204 +0x54 fp=0xc000141f70 sp=0xc000141e20 pc=0x4a3034

And

% go test -v 
Caught SIGILL in blst_cgo_init, consult <blst>/bindings/go/README.md.
exit status 132
FAIL    github.com/supranational/blst/bindings/go       0.003s

@dot-asm
Collaborator

dot-asm commented Mar 10, 2024

I'm not convinced that you managed to exercise the updated code from the sha256-rodata branch on the systems that fail. And the reason for why I think that is because the difference between PC and addr in the SIGSEGV message has not changed in comparison to the original report. Compile the test binary with go test -c and execute nm go.test | grep K256. There will be two [matches] (one from Go), but examine the middle column. If you spot 't' in one of the lines, then it's not the sha256-rodata branch. Well, it might also happen that Go cache is in the way, and you need to execute go clean -cache to actually compile sha256-rodata code.
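
The two checks described above (the type letter in the middle column of the nm output, and the distance between PC and addr in the SIGSEGV line) can be sketched in Go as follows. This is only an illustration: symbolTypes and faultOffset are hypothetical helper names, and the sample inputs are the nm and SIGSEGV lines posted in this thread.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// symbolTypes returns the nm type letters (middle column) of every line
// in nmOutput whose symbol name equals name.
func symbolTypes(nmOutput, name string) []string {
	var types []string
	for _, line := range strings.Split(nmOutput, "\n") {
		fields := strings.Fields(line)
		if len(fields) == 3 && fields[2] == name {
			types = append(types, fields[1])
		}
	}
	return types
}

// faultOffset parses a Go runtime line such as
// "PC=0x4e19e9 m=3 sigcode=2 addr=0x4e2a80" and returns addr - PC.
func faultOffset(line string) (int64, error) {
	vals := map[string]int64{}
	for _, field := range strings.Fields(line) {
		key, val, ok := strings.Cut(field, "=")
		if !ok || (key != "PC" && key != "addr") {
			continue
		}
		// Base 0 lets ParseInt accept the 0x prefix.
		n, err := strconv.ParseInt(val, 0, 64)
		if err != nil {
			return 0, err
		}
		vals[key] = n
	}
	if len(vals) != 2 {
		return 0, fmt.Errorf("PC or addr missing in %q", line)
	}
	return vals["addr"] - vals["PC"], nil
}

func main() {
	// A 't' here means K256 still lives in the text segment.
	nm := "004e2a80 t K256\n0027f540 r K256\n"
	fmt.Println("K256 symbol types:", symbolTypes(nm, "K256"))

	off, err := faultOffset("PC=0x4e19e9 m=3 sigcode=2 addr=0x4e2a80")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("addr - PC = 0x%x\n", off)
}
```

An unchanged offset between two runs suggests that the same binary layout, and therefore the same code, was exercised.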

@vtamara
Author

vtamara commented Mar 11, 2024

You're right, I'm sorry. Cleaning the cache is necessary. The situation improved on all computers: on all of them, all the tests pass with CGO_CFLAGS="-O2 -D__BLST_PORTABLE__" go test -v

The non-portable tests kept failing on two of them, but I noticed that this is because those processors lack at least one of the ADX (https://en.wikipedia.org/wiki/Intel_ADX) and BMI2 (https://en.wikipedia.org/wiki/X86_Bit_manipulation_instruction_set#BMI2) instruction sets (in particular, they don't have the ADOX or MULX instructions).

On both, dmesg | grep cpu0 | grep ADX | grep BMI2 produces empty output, while on the CPUs where the non-portable tests pass it produces non-empty output.
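
The same check can be sketched in Go by parsing a cpu0 feature line instead of chaining grep. This is only an illustration under stated assumptions: hasFeatures is a hypothetical helper, the first sample line is an abbreviated excerpt of a real dmesg line, and the second is invented to show the negative case.

```go
package main

import (
	"fmt"
	"strings"
)

// hasFeatures reports whether every wanted feature appears as a whole
// comma-separated token in an OpenBSD dmesg "cpu0:" feature line.
func hasFeatures(dmesgLine string, wanted ...string) bool {
	_, list, ok := strings.Cut(dmesgLine, ": ")
	if !ok {
		return false
	}
	have := map[string]bool{}
	for _, feature := range strings.Split(strings.TrimSpace(list), ",") {
		have[feature] = true
	}
	for _, w := range wanted {
		if !have[w] {
			return false
		}
	}
	return true
}

func main() {
	// Abbreviated feature line (assumption: the real line is much longer).
	withADX := "cpu0: SSE4.2,AVX2,SMEP,BMI2,ERMS,RDSEED,ADX,SMAP"
	// Invented line for a CPU that lacks both extensions.
	withoutADX := "cpu0: SSE4.2,AVX,BMI1"

	fmt.Println(hasFeatures(withADX, "ADX", "BMI2"))
	fmt.Println(hasFeatures(withoutADX, "ADX", "BMI2"))
}
```

Matching whole tokens avoids the substring matches that a plain grep allows, for example BMI matching BMI1.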

Since I guess that you don't plan to add more native (non-portable) support for CPUs without ADX and BMI2, I think your patch works on OpenBSD/adJ.

What follows is the result of some experiments. I did them by installing gdb from packages (i.e. doas pkg_add gdb) and running it as egdb (the gdb included in the base system does not work with recent executables and produces an error like Dwarf Error: wrong version in compilation unit header (is 4, should be 2) [in module /usr/libexec/ld.so]).

1. Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz

% go clean; go clean -cache; go test -v             
Caught SIGILL in blst_cgo_init, consult <blst>/bindings/go/README.md.
exit status 132
FAIL    github.com/supranational/blst/bindings/go       0.004s
% go test -c
% nm go.test | grep K256
00283cc0 r K256
0027f540 r K256
% egdb go.test
...
(gdb) run -test.v
Starting program: /home/vtamara/comp/go/blst-dot-asm/bindings/go/go.test -test.v                                                                                
                                                                                                                                                                
Program received signal SIGILL, Illegal instruction.                            
__mulx_mont_384 () at /home/vtamara/comp/go/blst-dot-asm/bindings/go/../../build/elf/mulx_mont_384-x86_64.s:2009                                                
2009            adoxq   %rdi,%r9                                                
(gdb) bt                                                                                                                                                        
#0  __mulx_mont_384 () at /home/vtamara/comp/go/blst-dot-asm/bindings/go/../../build/elf/mulx_mont_384-x86_64.s:2009                                            
#1  0x00000000004e6bab in sqrx_mont_384 () at /home/vtamara/comp/go/blst-dot-asm/bindings/go/../../build/elf/mulx_mont_384-x86_64.s:2442                        
#2  0x00000000004bc2e6 in blst_cgo_init () at /home/vtamara/comp/go/blst-dot-asm/bindings/go/blst.go:31                                                         
#3  0x0000000243f062c9 in _dl_call_init_recurse (object=0x29ad6a000, initfirst=0) at /usr/src/libexec/ld.so/loader.c:882                                        
#4  0x0000000243f037b5 in _dl_call_init (object=0x29ad6a000) at /usr/src/libexec/ld.so/loader.c:821                                                             
#5  _dl_boot (argv=<optimized out>, envp=<optimized out>, dyn_loff=<optimized out>, dl_data=0x708766f7e0b0) at /usr/src/libexec/ld.so/loader.c:730              
#6  0x0000000243f02366 in _dl_start () at /usr/src/libexec/ld.so/amd64/ldasm.S:61                                                                               
#7  0x0000000000000000 in ?? ()

...

% dmesg | grep cpu0 | grep ADX                       
%

2. Intel(R) Core(TM) i5-8265U CPU @ 1.60GHz

Since the sha256-rodata branch of your repository works, I used the master branch in the following experiment to make it fail and understand the problem:

% dmesg | grep cpu0 | grep ADX | grep BMI2
cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,TSC_ADJUST,SGX,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,MPX,RDSEED,ADX,SMAP,CLFLUSHOPT,PT,SRBDS_CTRL,MD_CLEAR,TSXFA,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,SKIP_L1DFL,MISC_PKG_CT,ENERGY_FILT,GDS_CTRL,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
% go clean; go clean -cache; CGO_CFLAGS="-O2 -D__BLST_PORTABLE__"  go test -v
...
=== CONT  TestEmptySignatureMinPk                                              
=== CONT  TestSignVerifyAggregateMinSig                                        
SIGSEGV: segmentation violation                                                
PC=0x4e19e9 m=3 sigcode=2 addr=0x4e2a80                                        
signal arrived during cgo execution                                            
                                                                               
goroutine 28 gp=0xc00014c380 m=3 mp=0xc00004b008 [syscall]:                    
runtime.cgocall(0x4bcf50, 0xc000143c60)                                        
        /usr/local/go/src/runtime/cgocall.go:157 +0x4b fp=0xc000143c38 sp=0xc000143c00 pc=0x36a9ab
github.com/supranational/blst/bindings/go._Cfunc_blst_keygen(0xc00019c020, 0xc00019c000, 0x20, 0x0, 0x0)
        _cgo_gotypes.go:467 +0x45 fp=0xc000143c60 sp=0xc000143c38 pc=0x4ac4a5   
github.com/supranational/blst/bindings/go.KeyGen({0xc00019c000, 0x20, 0x0?}, {0x0, 0x0, 0x18?})
        /home/vtamara/comp/go/blst/bindings/go/blst.go:233 +0x88 fp=0xc000143ca0 sp=0xc000143c60 pc=0x4af8c8
github.com/supranational/blst/bindings/go.genRandomKeyMinSig() 
...
% CGO_CFLAGS="-O2 -D__BLST_PORTABLE__"  go test -c
%  nm go.test | grep K256                        
004e2a80 t K256
0027f540 r K256
%  egdb go.test
...
(gdb) run -test.v
...
=== PAUSE TestMultiScalarP2
=== CONT  TestG1HashToCurve
=== CONT  TestBatchUncompressMinSig
=== CONT  TestEmptySignatureMinSig
[New thread 407121]
[New thread 617046]
[New thread 217894]
[New thread 143896]
[New thread 407266]

Thread 2 received signal SIGSEGV, Segmentation fault.
[Switching to thread 407121]
0x00000000004e19e9 in blst_sha256_block_data_order ()

(gdb) bt
#0  0x00000000004e19e9 in blst_sha256_block_data_order ()
#1  0x00000000004da121 in sha256_update ()
#2  0x00000000004bfe21 in expand_message_xmd ()
#3  0x00000000004dcc92 in hash_to_field ()
#4  0x00000000004c495b in blst_hash_to_g1 ()
#5  0x00000000004bcecd in _cgo_1ac5d8788b9b_Cfunc_blst_hash_to_g1 ()
#6  0x00000000003d4844 in runtime.asmcgocall () at /usr/local/go/src/runtime/asm_amd64.s:918
#7  0x000000c0000f5180 in ?? ()
#8  0x00000000003d2bca in runtime.systemstack () at /usr/local/go/src/runtime/asm_amd64.s:509
#9  0x0000000000080000 in ?? ()
#10 0x000000c000006fc0 in ?? ()
#11 0x000000033a54fa30 in ?? ()
#12 0x00000000003d2ac5 in runtime.mstart () at /usr/local/go/src/runtime/asm_amd64.s:394
#13 0x00000000004ef438 in crosscall1 ()
#14 0x00000000003d2ac0 in ?? ()
#15 0x000000c000006fc0 in ?? ()
#16 0x000077e8a60c7f60 in ?? ()
#17 0x00000000004ef2e0 in ?? ()
#18 0x000000033a54fa30 in ?? ()
#19 0x000000033a4ce4a8 in ?? ()
#20 0x00000000004ef30a in threadentry ()
#21 0x000000000036a9d5 in runtime.cgocall (fn=0x4bce90 <_cgo_1ac5d8788b9b_Cfunc_blst_hash_to_g1>, arg=0xc000100de8, ~r0=<optimized out>)
    at /usr/local/go/src/runtime/cgocall.go:175
#22 0x00000000004ac1a5 in github.com/supranational/blst/bindings/go._Cfunc_blst_hash_to_g1 (p0=<optimized out>, p1=0xc00001c0d8 "blst is a blast!! 0", p2=19, 
    p3=0x4f47c0 <github.com/supranational/blst/bindings/go..gobytes> "BLS_SIG_BLS12381G1_XMD:SHA-256_SSWU_RO_NUL_", p4=43, p5=0x0, p6=0, r1=...)
    at _cgo_gotypes.go:429
#23 0x00000000004b6ae5 in github.com/supranational/blst/bindings/go.HashToG1 (msg=..., dst=..., optional=..., ~r0=<optimized out>)
    at /home/vtamara/comp/go/blst/bindings/go/blst.go:1934
#24 0x00000000004a496d in github.com/supranational/blst/bindings/go.TestBatchUncompressMinSig (t=0xc0000f6d00)
    at /home/vtamara/comp/go/blst/bindings/go/blst_minsig_test.go:342
#25 0x000000000043f77b in testing.tRunner (t=0xc0000f6d00, fn={void (testing.T *)} 0xc000100fc8) at /usr/local/go/src/testing/testing.go:1689
#26 0x00000000004407a5 in testing.(*T).Run.gowrap1 () at /usr/local/go/src/testing/testing.go:1742
#27 0x00000000003d4ba1 in runtime.goexit () at /usr/local/go/src/runtime/asm_amd64.s:1695
#28 0x0000000000000000 in ?? ()

(gdb) x/i 0x00000000004e19e9
=> 0x4e19e9 <blst_sha256_block_data_order+169>:     add    0x0(%rbp),%r12d
(gdb) x/32w $rbp
0x4e2a80 <K256>:    0x428a2f98      0x71374491      0xb5c0fbcf      0xe9b5dba5
0x4e2a90 <K256+16>: 0x3956c25b      0x59f111f1      0x923f82a4      0xab1c5ed5
0x4e2aa0 <K256+32>: 0xd807aa98      0x12835b01      0x243185be      0x550c7dc3
0x4e2ab0 <K256+48>: 0x72be5d74      0x80deb1fe      0x9bdc06a7      0xc19bf174
0x4e2ac0 <K256+64>: 0xe49b69c1      0xefbe4786      0x0fc19dc6      0x240ca1cc
0x4e2ad0 <K256+80>: 0x2de92c6f      0x4a7484aa      0x5cb0a9dc      0x76f988da
0x4e2ae0 <K256+96>: 0x983e5152      0xa831c66d      0xb00327c8      0xbf597fc7
0x4e2af0 <K256+112>:        0xc6e00bf3      0xd5a79147      0x06ca6351      0x14292967
(gdb) print/x $r12d
$2 = 0x2d7e86b7
(gdb)  x $r12d
0x2d7e86b7:     Cannot access memory at address 0x2d7e86b7
(gdb) q
A debugging session is active.                                                 
                                       
        Inferior 1 [process 25564] will be killed.                             
                                                                               
Quit anyway? (y or n) y                                                        

But it is crazy that after this ./go.test will run without issue, even in other terminals, and even after restarting the X-Window session. It will fail again after rebooting or after compiling the tests again.

Repeating the procedure, we found that what makes the problem "disappear" temporarily is reading from K256 (i.e. x/32w $rbp, or print/x *(int*)$rbp with the older gdb).

This could be related to the W^X security policy of OpenBSD http://www.openbsd.org/papers/ven05-deraadt/mgp00029.html. But it is weird that:

  1. The test program is not writing to K256: the faulting instruction, add 0x0(%rbp),%r12d, only reads K256 through $rbp and adds the value to the register $r12d. That register holds a value, not an address, which is why egdb fails when trying to read memory at it.
  2. The patch in the sha256-rodata branch that moves K256 to a .rodata/.rdata section seems to fix the problem, but I don't understand why, since I don't see the original code breaking the W^X policy.

Could it be that the problem is not with the policy W^X but with the requirement of recent OpenBSD of position independent code?

3. AMD FX(tm)-8350 4000.42 MHz

% go clean; go clean -cache; go test -v             
Caught SIGILL in blst_cgo_init, consult <blst>/bindings/go/README.md.
exit status 132
FAIL    github.com/supranational/blst/bindings/go       0.004s
% go test -c
% nm go.test | grep K256
00283cc0 r K256
0027f540 r K256
% egdb go.test
...
(gdb) run -test.v
Starting program: /home/vtamara/comp/go/blst/bindings/go/go.test -test.v

Program received signal SIGILL, Illegal instruction.
sqrx_mont_384 () at /home/vtamara/comp/go/blst/bindings/go/../../build/elf/mulx_mont_384-x86_64.s:2441
2441            mulxq   %rdx,%r8,%r9
(gdb) bt
#0  sqrx_mont_384 () at /home/vtamara/comp/go/blst/bindings/go/../../build/elf/mulx_mont_384-x86_64.s:2441
#1  0x00000000004bc2a6 in blst_cgo_init () at /home/vtamara/comp/go/blst/bindings/go/blst.go:31
#2  0x00000002e1c344e9 in _dl_call_init_recurse (object=0x2d4be0000, initfirst=0) at /usr/src/libexec/ld.so/loader.c:882
#3  0x00000002e1c2d5d5 in _dl_call_init (object=0x2d4be0000) at /usr/src/libexec/ld.so/loader.c:821
#4  _dl_boot (argv=<optimized out>, envp=<optimized out>, dyn_loff=<optimized out>, dl_data=0x730354d391f0) at /usr/src/libexec/ld.so/loader.c:730
#5  0x00000002e1c2c186 in _dl_start () at /usr/src/libexec/ld.so/amd64/ldasm.S:61
#6  0x0000000000000000 in ?? ()
...
% dmesg | grep cpu0 | grep BMI2
%

@dot-asm
Collaborator

dot-asm commented Mar 11, 2024

> I guess that you don't plan to add native support for CPUs without ADX

??? Portable builds are designed to run on CPUs without ADX, so what do you mean?

> the initial blst had conflict with the security policy W^X of OpenBSD

W^X effectively translates to "if writable, then non-executable" and it's the only thing that x86 hardware can actually enforce, in the sense that an executable page can be non-writable, but it can't be non-readable. OpenBSD appears to refuse to map .text pages upon data references, but if a .text page is mapped through execution, it's not a problem to read it. So it's not really a policy, but rather just an artificial hurdle. One can even argue it's counterproductive :-)

@vtamara
Author

vtamara commented Mar 11, 2024

> ??? Portable builds are designed to run on CPUs without ADX, so what do you mean?

Sorry I meant non-portable builds for CPUs without ADX. But forget the comment please.

Focusing on OpenBSD/adJ on the amd64 platform, what I need to remember is that the portable blst is generic for any x86-64 processor and the non-portable one is optimized for processors with ADX and BMI2. However, the definition of portable and non-portable in https://github.com/supranational/blst/blob/master/README.md is quite different from that. IMHO the problem with the ADOX and MULX instructions on some (possibly old) processors will also happen on Windows and Linux.

@vtamara
Author

vtamara commented Mar 12, 2024

> the initial blst had conflict with the security policy W^X of OpenBSD
>
> W^X effectively translates to "if writable, then non-executable" and it's the only thing that x86 hardware can actually enforce, in the sense that an executable page can be non-writable, but it can't be non-readable. OpenBSD appears to refuse to map .text pages upon data references, but if a .text page is mapped through execution, it's not a problem to read it. So it's not really a policy, but rather just an artificial hurdle. One can even argue it's counterproductive :-)

After experimenting more, I'm not so sure that the crash was due to a conflict with W^X. Could it be related to the requirement of position-independent executables in recent OpenBSD/adJ? (see PIE in https://www.openbsd.org/53.html and https://www.openbsd.org/papers/asiabsdcon2015-pie-slides.pdf possibly slide 22)

I reviewed http://www.openbsd.org/papers/ven05-deraadt/mgp00016.html and some experiments I had done before https://dhobsd.pasosdejesus.org/wxorx.html (sorry, it is in Spanish; I had to update the examples to be position independent. The first example is a program that tries to modify its own code, the second one is a program that writes code in its data and tries to execute it; both produce SIGSEGV due to the W^X policy).

My understanding of W^X applied to ELF binaries, which is our case, is that no section can have both the WRITE flag and the EXEC flag. IMHO it is a secure practice (that has been adopted by several operating systems).

@dot-asm
Collaborator

dot-asm commented Mar 16, 2024

> My understanding of W^X applied to ELF binaries, which is our case, is that no section can have both the WRITE flag and the EXEC flag. IMHO it is a secure practice (that has been adopted by several operating systems).

Right, and sha256-x86_64 was not in violation of the policy in question. It had read-only data in the executable segment. And per x86 architecture specification no kernel can prevent it from working if the translation from logical to physical address for the relevant page is already established. And no OS had problems so far, and this is since the implementation inception in 2005. Yes, it means that if the page with K256 was mapped for whatever reason, it would have worked even here. As was demonstrated by "manually" mapping it in the debugger.

@dot-asm
Collaborator

dot-asm commented Mar 16, 2024

The fix is committed. Thanks for the report!

@dot-asm dot-asm closed this as completed Mar 16, 2024
@vtamara
Author

vtamara commented Mar 18, 2024

I have tested your fix a lot and it works. I agree that before the fix your code was not in violation of the W^X policy. I guess the change was necessary in OpenBSD/adJ for the reason given in Rob's answer at https://unix.stackexchange.com/questions/422985/openbsd-memory-protection-mechanisms-that-are-not-enabled-by-default: "W^X policy: A page may be both writable or executable, but not both. Hence W^X. The idea is to create a .rodata segment with the PROT_READ attribute only thus it loses the PROT_EXEC attribute."

In any case I think that your changes dae1f94 and 6cca12a improve the security of blst because it is more secure to have a read-only table (K256) in a read-only segment. Thanks a lot!

@vtamara
Author

vtamara commented Mar 29, 2024

@dot-asm may I know when you plan to publish a new release?

@dot-asm
Collaborator

dot-asm commented May 30, 2024

The new release is out.
