Upgrade front-end & libs to v2.087.1+ #3093

Merged
merged 26 commits into from
Aug 6, 2019

kinke
Member

@kinke kinke commented Jun 21, 2019

No description provided.

@kinke
Member Author

kinke commented Jun 28, 2019

TODO:

@kinke kinke marked this pull request as ready for review June 28, 2019 17:01
@kinke
Member Author

kinke commented Jun 28, 2019

Ah, Iain seems to be keen on reworking the vthis2 thing (apparently after the gdc 9.1 release), so we might just wait and leave it unimplemented for now (but at least convert the ICEs to user errors).

@kinke kinke force-pushed the merge-2.087 branch 2 times, most recently from 6aa9150 to 9996d1b Compare June 29, 2019 17:11
@kinke kinke changed the title Upgrade front-end & libs to v2.087.0-beta.1 Upgrade front-end & libs to v2.087.0+ Jul 25, 2019
@jacob-carlborg
Contributor

How do I run the tests for the shared libraries? Are those in druntime or does LDC have its own tests?

@kinke
Member Author

kinke commented Jul 31, 2019

Looking at the CircleCI macOS-x64-sharedLibsOnly log, trying to run (almost?) any executable linked against the shared debug libs (-link-defaultlib-shared -link-defaultlib-debug) should fail; there's no need to build and run the unittests.

@jacob-carlborg
Contributor

I've been able to reproduce it now, thanks. I'll see what I can do.

@kinke
Member Author

kinke commented Jul 31, 2019

Thanks. Are you on 10.15 already? With an earlier version, one could re-add rt/osx_tls.c and additionally output that TLS range for cross-checking with the one resulting now.

@jacob-carlborg
Contributor

> Are you on 10.15 already?

Yes, on one of my computers.

Looks like this assert is the issue:

https://github.com/dlang/druntime/blob/bb0bce70f58d26030dc2d169810e3f84a98d1f9c/src/rt/sections_osx_x86_64.d#L269

What's strange is that there is no indication that the assert is triggered. But the execution doesn't continue after the assert either, at least not in the same function.

Anyway, removing that assert seems to fix the problem, which makes sense. I think removing the assert and including some of your changes can be upstreamed. I'll work on those changes when I have some more time.

BTW, is it ok to return null from tlsRange?

https://github.com/dlang/druntime/blob/54aef837a7ca48ba30af16eb97b69f86136a5d66/src/rt/sections_elf_shared.d#L140-L143

I'm not sure if it's possible it won't find a TLS symbol that is passed to getTLSRange.

@kinke
Member Author

kinke commented Jul 31, 2019

Ah, very helpful, thx.

> What's strange is that there is no indication that the assert is triggered. But the execution doesn't continue after the assert either, at least not in the same function.

I guess that's because the assert causes an AssertError to be GC-allocated before any GC is registered, which causes a fatal exit after printing 'no GC registered'. See sections_elf_shared's safeAssert().
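The failure mode described here, a failing assert that needs the GC before one is registered, is commonly avoided with an assert that never allocates. A minimal C sketch of that pattern follows, purely as an illustration: `format_fatal` and `safe_assert` are hypothetical names, and this is not druntime's actual safeAssert() implementation.

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: formats the failure message into a caller-provided
 * buffer, so the caller never has to allocate on the heap (or via a GC).
 * Returns the number of bytes stored, excluding the terminating NUL. */
static int format_fatal(char *buf, size_t size, const char *file,
                        unsigned line, const char *msg)
{
    if (size == 0)
        return 0;
    int n = snprintf(buf, size, "Fatal error in %s:%u: %s\n", file, line, msg);
    if (n < 0)
        return 0;
    /* snprintf reports the untruncated length; clamp to what was stored. */
    return (size_t)n < size ? n : (int)size - 1;
}

/* Hypothetical allocation-free assert: report on stderr, then abort. */
static void safe_assert(int condition, const char *file, unsigned line,
                        const char *msg)
{
    if (condition)
        return;
    char buf[256]; /* stack buffer instead of a heap-allocated error object */
    int n = format_fatal(buf, sizeof buf, file, line, msg);
    /* write(2) goes straight to the descriptor, bypassing stdio buffers. */
    if (write(STDERR_FILENO, buf, (size_t)n) < 0) {
        /* Nothing sensible left to do; fall through to abort(). */
    }
    abort();
}
```

A call such as `safe_assert(ptr != NULL, __FILE__, __LINE__, "missing TLS range")` would then print a message and terminate visibly, instead of failing silently while trying to allocate an error object.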

> including some of your changes can be upstreamed. I'll work on those changes when I have some more time.

Having getTLSRange() take a void* TLS address param would fully suffice IMO (with the upstream implementation embedding the dummy TLS symbol somewhere else, e.g., in the caller), so that I don't need to duplicate that function with the risk of it going out of sync over time.

> I'm not sure if it's possible it won't find a TLS symbol that is passed to getTLSRange.

It should find it; the TLS anchor is/should be there in each image built with LDC. But I guess the code chokes on some 3rd-party .dylib without a TLS range.

Fixes runnable/test17559.d with -O on Win32 (needs debug druntime for
proper stack traces).
@kinke
Member Author

kinke commented Aug 2, 2019

I cannot reproduce the remaining CircleCI crashes during the 64-bit shared Phobos unittests in my Ubuntu 18.04 VM either.

This is the backtrace of the segfaulting thread (1st GC worker thread) from a gdb session of phobos2-test-runner-debug-shared std.ascii via SSH:

(gdb) info threads
  Id   Target Id         Frame 
  1    Thread 0x7f1500665880 (LWP 26934) "phobos2-test-ru" clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:78
* 2    Thread 0x7f1500671700 (LWP 30130) "phobos2-test-ru" do_lookup_x (undef_name=undef_name@entry=0x7f14f9e551e1 "__vdso_gettimeofday", 
    new_hash=new_hash@entry=2954611200, old_hash=old_hash@entry=0x7f150066e110, ref=0x7f150066e1c0, result=result@entry=0x7f150066e120, 
    scope=0x7f15007f69c8, i=0, version=0x7f150066e1f0, flags=0, skip=0x0, type_class=0, undef_map=0x7f15007f6710) at dl-lookup.c:338
  3    Thread 0x7f1500652700 (LWP 30131) "phobos2-test-ru" __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
  4    Thread 0x7f150064d700 (LWP 30132) "phobos2-test-ru" clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:78

(gdb) bt
#0  do_lookup_x (undef_name=undef_name@entry=0x7f14f9e551e1 "__vdso_gettimeofday", new_hash=new_hash@entry=2954611200, 
    old_hash=old_hash@entry=0x7f150066e110, ref=0x7f150066e1c0, result=result@entry=0x7f150066e120, scope=0x7f15007f69c8, i=0, 
    version=0x7f150066e1f0, flags=0, skip=0x0, type_class=0, undef_map=0x7f15007f6710) at dl-lookup.c:338
#1  0x00007f15005d81ef in _dl_lookup_symbol_x (undef_name=0x7f14f9e551e1 "__vdso_gettimeofday", undef_map=0x7f15007f6710, 
    ref=0x7f150066e1b8, symbol_scope=0x7f15007f6a98, version=0x7f150066e1f0, type_class=0, flags=0, skip_map=<optimized out>)
    at dl-lookup.c:813
#2  0x00007f14f9e07414 in _dl_vdso_vsym (name=name@entry=0x7f14f9e551e1 "__vdso_gettimeofday", vers=vers@entry=0x7f150066e1f0)
    at ../sysdeps/unix/sysv/linux/dl-vdso.c:39
#3  0x00007f14f9d72ad6 in __gettimeofday_ifunc () at ../sysdeps/unix/sysv/linux/x86/gettimeofday.c:42
#4  0x00007f15005dcf62 in elf_ifunc_invoke (addr=<optimized out>) at ../sysdeps/x86_64/dl-irel.h:32
#5  _dl_fixup (l=<optimized out>, reloc_arg=<optimized out>) at ../elf/dl-runtime.c:143
#6  0x00007f15005e47ca in _dl_runtime_resolve_xsavec () at ../sysdeps/x86_64/dl-trampoline.h:125
#7  0x00007f150073b61d in _D4core4sync6config7mktspecFNbNiKSQBg3sys5posix6signal8timespecZv (t=...) at config.d:36
#8  0x00007f150073b668 in _D4core4sync6config7mktspecFNbNiKSQBg3sys5posix6signal8timespecSQCk4time8DurationZv (t=..., delta=...)
    at config.d:46
#9  0x00007f150073baf9 in _D4core4sync5event5Event4waitMFNbNiSQBi4time8DurationZb (this=0x2, tmout=...) at event.d:254
#10 0x00007f15007549f8 in _D2gc4impl12conservativeQw3Gcx14scanBackgroundMFNbZv (this=0x1) at gc.d:2860
#11 0x00007f1500743182 in _D4core6thread20createLowLevelThreadFNbNiDFNbZvkDFNbZvZ20thread_lowlevelEntryUNbPvZQd (ctx=0x55e87356a0b0 "")
    at thread.d:6679
#12 0x00007f14f96ea6db in start_thread (arg=0x7f1500671700) at pthread_create.c:463
#13 0x00007f14f9dc188f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Partial backtrace of main thread:

#0  clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:78
#1  0x00007f14f96eaec5 in create_thread (thread_ran=<synthetic pointer>, stackaddr=0x7f150064ae80, stopped_start=<synthetic pointer>, 
    attr=0x7fff43aac770, pd=0x7f150064d700) at ../sysdeps/unix/sysv/linux/createthread.c:100
#2  __pthread_create_2_1 (newthread=<optimized out>, attr=<optimized out>, start_routine=<optimized out>, arg=0x55e873568d40)
    at pthread_create.c:797
#3  0x00007f15007430d3 in core.thread.createLowLevelThread(void() nothrow delegate, uint, void() nothrow delegate) (dg=..., 
    stacksize=16384, cbDllUnload=...) at thread.d:6691
#4  0x00007f15007530e9 in _D2gc4impl12conservativeQw3Gcx16startScanThreadsMFNbZv (this=0x1) at gc.d:2825
#5  0x00007f1500752b2b in _D2gc4impl12conservativeQw3Gcx12markParallelMFNbbZv (this=0x1, nostack=false) at gc.d:2732
#6  0x00007f150074c13e in _D2gc4impl12conservativeQw3Gcx11fullcollectMFNbbZm (this=0x1, nostack=false) at gc.d:2611
#7  0x00007f150074e5fc in _D2gc4impl12conservativeQw3Gcx8bigAllocMFNbmKmkxC8TypeInfoZPv (this=0x1, size=4965, 
    alloc_size=@0x7fff43aacce8: 0, bits=10, ti=0x7f15007d59f8 <initializer for TypeInfo_k>) at gc.d:1742
#8  0x00007f15007492a3 in _D2gc4impl12conservativeQw3Gcx5allocMFNbmKmkxC8TypeInfoZPv (this=0x1, size=4965, alloc_size=@0x7fff43aacce8: 0, 
    bits=10, ti=0x7f15007d59f8 <initializer for TypeInfo_k>) at gc.d:1613
#9  0x00007f15007491ad in _D2gc4impl12conservativeQw14ConservativeGC12mallocNoSyncMFNbmkKmxC8TypeInfoZPv (this=0x55e8735698b0, size=4965, 
    bits=10, alloc_size=@0x7fff43aacce8: 0, ti=0x7f15007d59f8 <initializer for TypeInfo_k>) at gc.d:389
#10 0x00007f15007490f2 in _D2gc4impl12conservativeQw14ConservativeGC__T9runLockedS_DQCeQCeQCcQCnQBs12mallocNoSyncMFNbmkKmxC8TypeInfoZPvS_DQEgQEgQEeQEp10mallocTimelS_DQFiQFiQFgQFr10numMallocslTmTkTmTxQCzZQFcMFNbKmKkKmKxQDsZQDl (this=0x55e8735698b0, 
    _param_0=@0x7fff43aacd10: 4965, _param_1=@0x7fff43aacd0c: 10, _param_2=@0x7fff43aacce8: 0, 
    _param_3=@0x7fff43aacd00: 0x7f15007d59f8 <initializer for TypeInfo_k>) at gc.d:254
#11 0x00007f1500749366 in _D2gc4impl12conservativeQw14ConservativeGC6qallocMFNbmkxC8TypeInfoZS4core6memory8BlkInfo_ (this=0x55e8735698b0, 
    size=4965, bits=10, ti=0x7f15007d59f8 <initializer for TypeInfo_k>) at gc.d:417
#12 0x00007f150074cd98 in _DThn16_2gc4impl12conservativeQw14ConservativeGC6qallocMFNbmkxC8TypeInfoZS4core6memory8BlkInfo_ () at gc.d:407
#13 0x00007f1500759241 in gc_qalloc (sz=4965, ba=10, ti=0x7f15007d59f8 <initializer for TypeInfo_k>) at proxy.d:173
#14 0x00007f15007382ca in _D4core6memory2GC6qallocFNaNbmkxC8TypeInfoZSQBqQBo8BlkInfo_ (sz=4965, ba=10, 
    ti=0x7f15007d59f8 <initializer for TypeInfo_k>) at memory.d:427
[...]
#28 0x00007f14fdc0226e in std.ascii.__unittest () at /root/project/runtime/phobos/std/digest/hmac.d:279
[...]

@rainers: can you perhaps make some sense out of this?

@rainers
Contributor

rainers commented Aug 3, 2019

> @rainers: can you perhaps make some sense out of this?

Not really. From the info it looks like the GC is in the process of creating its helper threads (thread 2 running, another just being created). The crashing thread does a lazy symbol lookup on gettimeofday inside a shared library. Maybe an explicit call to this function before the thread creation can work around the problem?

At the same time there seems to be another thread creation in thread 4 (maybe the stack of the other threads can shed some light on what's going on).
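The workaround suggested above, forcing the lazy symbol resolution of gettimeofday on the main thread before any worker is started, could look roughly like this in C. This is only a sketch of the idea, assuming POSIX threads; `worker` and `start_worker_after_warmup` are hypothetical names, and the real change would live in druntime's D code.

```c
#include <pthread.h>
#include <stddef.h>
#include <sys/time.h>

/* Hypothetical worker: the first thing it does is read the wall clock,
 * like the GC scan thread in the backtrace (via Event.wait -> mktspec). */
static void *worker(void *arg)
{
    struct timeval *out = arg;
    gettimeofday(out, NULL);
    return NULL;
}

/* Calls gettimeofday() once on the calling thread, so that the dynamic
 * linker binds the symbol before the worker exists, then starts and
 * joins the worker. Returns 0 on success, or the pthread error code. */
static int start_worker_after_warmup(struct timeval *worker_time)
{
    struct timeval warmup;
    gettimeofday(&warmup, NULL); /* explicit early call: resolves the symbol */

    pthread_t tid;
    int rc = pthread_create(&tid, NULL, worker, worker_time);
    if (rc != 0)
        return rc;
    return pthread_join(tid, NULL);
}
```

Running the affected binary with glibc's `LD_BIND_NOW=1` environment variable set would be another way to take lazy binding out of the picture when testing this theory.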

@kinke
Member Author

kinke commented Aug 3, 2019

With the experimental switch from gettimeofday() to clock_gettime(), the .so issue just appears minimally postponed:

(gdb) info threads
  Id   Target Id                                           Frame 
  1    Thread 0x7f994a3f5880 (LWP 12988) "phobos2-test-ru" clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:78
* 2    Thread 0x7f994a788700 (LWP 12992) "phobos2-test-ru" 0x00007f9950b59571 in ?? () from /lib64/ld-linux-x86-64.so.2

(gdb) bt
#0  0x00007f9950b59571 in ?? () from /lib64/ld-linux-x86-64.so.2
#1  0x00007f9950b5a48f in ?? () from /lib64/ld-linux-x86-64.so.2
#2  0x00007f9950b5f0c0 in ?? () from /lib64/ld-linux-x86-64.so.2
#3  0x00007f9950b6627a in ?? () from /lib64/ld-linux-x86-64.so.2
#4  0x00007f994e6e232d in _D4core4time__T3durVAyaa7_7365636f6e6473ZQBaFNaNbNiNflZSQCcQCa8Duration (length=1564843814)
    at /root/project/runtime/druntime/src/core/time.d:1916
#5  0x00007f994a851340 in _D4core4sync6config7mvtspecFNbNiKSQBg3sys5posix6signal8timespecSQCk4time8DurationZv (t=..., delta=...) at config.d:54
#6  0x00007f994a851315 in _D4core4sync6config7mktspecFNbNiKSQBg3sys5posix6signal8timespecSQCk4time8DurationZv (t=..., delta=...) at config.d:47
#7  0x00007f994a851729 in _D4core4sync5event5Event4waitMFNbNiSQBi4time8DurationZb (this=0x1, tmout=...) at event.d:254
#8  0x00007f994a869c06 in _D2gc4impl12conservativeQw3Gcx14scanBackgroundMFNbZv (this=0x1) at gc.d:2860
#9  0x00007f994a8589f2 in _D4core6thread20createLowLevelThreadFNbNiDFNbZvkDFNbZvZ20thread_lowlevelEntryUNbPvZQd (ctx=0x561ab981d190 "") at thread.d:6679
#10 0x00007f994a416182 in start_thread (arg=<optimized out>) at pthread_create.c:486
#11 0x00007f994a699b1f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

(gdb) thread 1
(gdb) bt
#0  clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:78
#1  0x00007f994a414ddf in create_thread (pd=pd@entry=0x7f994a788700, attr=attr@entry=0x7ffdf544ebf0, stopped_start=stopped_start@entry=0x7ffdf544eb3e, 
    stackaddr=stackaddr@entry=0x7f994a785e80, thread_ran=thread_ran@entry=0x7ffdf544eb3f) at ../sysdeps/unix/sysv/linux/createthread.c:101
#2  0x00007f994a416a0a in __pthread_create_2_1 (newthread=<optimized out>, attr=0x7ffdf544ebf0, start_routine=<optimized out>, arg=<optimized out>)
    at pthread_create.c:826
#3  0x00007f994a858941 in core.thread.createLowLevelThread(void() nothrow delegate, uint, void() nothrow delegate) (dg=..., stacksize=16384, cbDllUnload=...)
    at thread.d:6691
#4  0x00007f994a8683b7 in _D2gc4impl12conservativeQw3Gcx16startScanThreadsMFNbZv (this=0x1) at gc.d:2825
#5  0x00007f994a867e1b in _D2gc4impl12conservativeQw3Gcx12markParallelMFNbbZv (this=0x1, nostack=false) at gc.d:2732
#6  0x00007f994a8615ce in _D2gc4impl12conservativeQw3Gcx11fullcollectMFNbbZm (this=0x1, nostack=false) at gc.d:2611
[...]
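For reference, the helper touched by that experiment computes an absolute deadline for a timed wait. A minimal C sketch using clock_gettime() instead of gettimeofday() follows; `make_deadline` is a hypothetical name, and this is not the druntime implementation of mktspec/mvtspec.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Hypothetical helper: fills *ts with "now + delta_ns" on CLOCK_REALTIME,
 * the clock that pthread_cond_timedwait() uses by default.
 * Returns 0 on success, -1 if delta_ns is negative or the clock
 * could not be read. */
static int make_deadline(struct timespec *ts, int64_t delta_ns)
{
    if (delta_ns < 0 || clock_gettime(CLOCK_REALTIME, ts) != 0)
        return -1;

    ts->tv_sec += (time_t)(delta_ns / NSEC_PER_SEC);
    ts->tv_nsec += (long)(delta_ns % NSEC_PER_SEC);
    if (ts->tv_nsec >= NSEC_PER_SEC) { /* carry into the seconds field */
        ts->tv_sec += 1;
        ts->tv_nsec -= NSEC_PER_SEC;
    }
    return 0;
}
```

As the second backtrace shows, swapping the clock call only moved the crash to the next lazily bound symbol, which suggests that symbol resolution in the worker thread is the problem, not the clock function itself.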

@rainers
Contributor

rainers commented Aug 3, 2019

Maybe the collection is triggered during program startup from a shared static ctor and core.time isn't initialized yet (so div by ticksPerSec produces a division by zero)? There is already a weird workaround for core.cpuid in the GC.
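The hypothesis here is an integer division by a ticks-per-second value that is still zero. As an illustration only, here is a C sketch of such a conversion with an explicit guard; `ticks_per_sec`, `init_ticks_per_sec` and `ticks_to_nsecs` are hypothetical names, not core.time's API.

```c
#include <stdint.h>

/* Hypothetical clock frequency; stays zero until initialization has run. */
static int64_t ticks_per_sec = 0;

static void init_ticks_per_sec(int64_t freq) { ticks_per_sec = freq; }

/* Converts ticks to nanoseconds. Returns 0 on success and -1 if the
 * frequency has not been initialized yet. Without the guard, the
 * division below would be undefined behaviour (typically a SIGFPE on
 * x86-64). Assumes frequencies small enough that rest * 1e9 fits in
 * 64 bits. */
static int ticks_to_nsecs(int64_t ticks, int64_t *nsecs)
{
    if (ticks_per_sec <= 0)
        return -1;
    int64_t secs = ticks / ticks_per_sec;
    int64_t rest = ticks % ticks_per_sec;
    *nsecs = secs * 1000000000LL + rest * 1000000000LL / ticks_per_sec;
    return 0;
}
```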

@kinke
Member Author

kinke commented Aug 3, 2019

Nope, the collection was triggered during the std.ascii unittest, as before; druntime should be fully initialized already, incl. a _d_initMonoTime() call before any module ctors.

…Linux

To hack around the unclear, apparently Circle-specific segfaults of the
64-bit shared Phobos testrunners.
@kinke
Member Author

kinke commented Aug 5, 2019

I'm leaning towards releasing a first beta this weekend and seeing whether the Linux issue with shared libs can be observed in the wild too. Disabling the new parallel marking feature should be a viable workaround, just in case.

@rainers
Contributor

rainers commented Aug 5, 2019

I tried to reproduce the failure this morning (Ubuntu 19.04 in VM), but what I got is a fork error:

***** FAIL release64 std.stdio
core.exception.AssertError@std/stdio.d(1544): Fork crashed
----------------
??:? _d_assert_msg [0x7f2b3e8fa0c9]
??:? void std.stdio.File.__unittest_L1522_C13() [0x7f2b42d5e839]
??:? [0x7f2b42da603b]
??:? void test_runner.doTest(object.ModuleInfo*, ref core.runtime.UnitTestResult) [0x562df8a03cb4]
??:? int test_runner.testAll().__foreachbody1(object.ModuleInfo*) [0x562df8a03dc5]
??:? int rt.minfo.moduleinfos_apply(scope int delegate(immutable(object.ModuleInfo*))).__foreachbody2(ref rt.sections_elf_shared.DSO) [0x7f2b3e92ff01]
??:? int rt.sections_elf_shared.DSO.opApply(scope int delegate(ref rt.sections_elf_shared.DSO)) [0x7f2b3e931259]
??:? int rt.minfo.moduleinfos_apply(scope int delegate(immutable(object.ModuleInfo*))) [0x7f2b3e92feab]
??:? int object.ModuleInfo.opApply(scope int delegate(object.ModuleInfo*)) [0x7f2b3e91a12e]
??:? core.runtime.UnitTestResult test_runner.tester() [0x562df8a03ae2]
??:? runModuleUnitTests [0x7f2b3e900b90]
??:? void rt.dmain2._d_run_main(int, char**, extern (C) int function(char[][])*).runAll() [0x7f2b3e926a2a]
??:? _d_run_main [0x7f2b3e926877]
??:? __libc_start_main [0x7f2b3e667b6a]
??:? _start [0x562df8a038b9]

Maybe this is related to dlang/phobos#7111

@kinke
Member Author

kinke commented Aug 5, 2019

That error seems familiar, and I think it's unrelated and has been there for a while. It only happens when running all tests via phobos2-test-runner; the CMake tests invoke it per module, and phobos2-test-runner std.stdio always passes. And I bet phobos2-test-runner-debug-shared std.ascii works just fine in your VM...

Thx for the link, I'll check if that fixes this long-standing inconvenience. Edit: It doesn't.

@rainers
Contributor

rainers commented Aug 5, 2019

> And I bet phobos2-test-runner-debug-shared std.ascii works just fine in your VM...

You are right, it passes.

@kinke kinke changed the title Upgrade front-end & libs to v2.087.0+ Upgrade front-end & libs to v2.087.1+ Aug 6, 2019
@kinke kinke merged commit 8da4e6f into ldc-developers:master Aug 6, 2019
@kinke kinke deleted the merge-2.087 branch August 6, 2019 20:21
@rainers
Contributor

rainers commented Oct 6, 2019

> This is the backtrace of the segfaulting thread (1st GC worker thread) from a gdb session of phobos2-test-runner-debug-shared std.ascii via SSH:

Maybe this is related: dlang/druntime#2816
