Linker error leads to missing symbols on builds with lots of object files #13193

Open
bararchy opened this issue Mar 16, 2023 · 20 comments
Labels
kind:bug A bug in the code. Does not apply to documentation, specs, etc. topic:compiler:codegen

Comments

@bararchy
Contributor

We stumbled upon a strange behavior in our project.
When we build the binary without --release and then try to run it, we get this error:

/bin/whatever: symbol lookup error: ./bin/whatever: undefined symbol: environ, version Hash(K, V)#update_linear_scan<String, Devtools::Protocol::Media::PlayersCreated.class, UInt32>:(Hash::Entry(String, Devtools::Protocol::ProtocolEvent+.class) | Nil)

It gets even stranger when looking at it with ldd -d

shards build whatever -p --release --debug --error-trace:

ldd -d bin/whatever
	linux-vdso.so.1 (0x00007ffc22d45000)
	libxml2.so.2 => /usr/lib/libxml2.so.2 (0x00007fa31cc98000)
	libgmp.so.10 => /usr/lib/libgmp.so.10 (0x00007fa31fae5000)
	libz.so.1 => /usr/lib/libz.so.1 (0x00007fa31facb000)
	libssl.so.3 => /usr/lib/libssl.so.3 (0x00007fa31cbf8000)
	libcrypto.so.3 => /usr/lib/libcrypto.so.3 (0x00007fa31c600000)
	libpcre.so.1 => /usr/lib/libpcre.so.1 (0x00007fa31cb7e000)
	libm.so.6 => /usr/lib/libm.so.6 (0x00007fa31ca96000)
	libgc.so.1 => /usr/lib/libgc.so.1 (0x00007fa31c594000)
	libevent-2.1.so.7 => /usr/lib/libevent-2.1.so.7 (0x00007fa31c542000)
	libgcc_s.so.1 => /usr/lib/libgcc_s.so.1 (0x00007fa31ca76000)
	libc.so.6 => /usr/lib/libc.so.6 (0x00007fa31c35b000)
	/lib64/ld-linux-x86-64.so.2 => /usr/lib64/ld-linux-x86-64.so.2 (0x00007fa31fbd8000)
	liblzma.so.5 => /usr/lib/liblzma.so.5 (0x00007fa31c328000)
	libicuuc.so.72 => /usr/lib/libicuuc.so.72 (0x00007fa31c000000)
	libicudata.so.72 => /usr/lib/libicudata.so.72 (0x00007fa31a200000)
	libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x00007fa319e00000)

shards build whatever -p --debug --error-trace:

ldd -d bin/whatever
	linux-vdso.so.1 (0x00007fff3ad53000)
	libxml2.so.2 => /usr/lib/libxml2.so.2 (0x00007f9459698000)
	libgmp.so.10 => /usr/lib/libgmp.so.10 (0x00007f94595f5000)
	libz.so.1 => /usr/lib/libz.so.1 (0x00007f945dc58000)
	libssl.so.3 => /usr/lib/libssl.so.3 (0x00007f9459555000)
	libcrypto.so.3 => /usr/lib/libcrypto.so.3 (0x00007f9459000000)
	libpcre.so.1 => /usr/lib/libpcre.so.1 (0x00007f94594db000)
	libm.so.6 => /usr/lib/libm.so.6 (0x00007f9458f18000)
	libgc.so.1 => /usr/lib/libgc.so.1 (0x00007f945946f000)
	libevent-2.1.so.7 => /usr/lib/libevent-2.1.so.7 (0x00007f945dc04000)
	libgcc_s.so.1 => /usr/lib/libgcc_s.so.1 (0x00007f945944f000)
	libc.so.6 => /usr/lib/libc.so.6 (0x00007f9458d31000)
	/lib64/ld-linux-x86-64.so.2 => /usr/lib64/ld-linux-x86-64.so.2 (0x00007f945dcc2000)
	liblzma.so.5 => /usr/lib/liblzma.so.5 (0x00007f9458cfe000)
	libicuuc.so.72 => /usr/lib/libicuuc.so.72 (0x00007f9458a00000)
	libicudata.so.72 => /usr/lib/libicudata.so.72 (0x00007f9456c00000)
	libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x00007f9456800000)
undefined symbol: environ, version Channel::SelectContext(S)#try_trigger:Bool	(bin/whatever)
undefined symbol: __libc_start_main, version Slice(T)#swap<Int32, Int32>:Slice(Tuple(Awscr::Signer::Header, String))	(bin/whatever)

Even more strangely, this can also be avoided by adding --single-module to the build command, shards build whatever -p --debug --error-trace --single-module:

 ldd -d bin/whatever
	linux-vdso.so.1 (0x00007ffee6bed000)
	libxml2.so.2 => /usr/lib/libxml2.so.2 (0x00007f21aa498000)
	libgmp.so.10 => /usr/lib/libgmp.so.10 (0x00007f21ade0a000)
	libz.so.1 => /usr/lib/libz.so.1 (0x00007f21addf0000)
	libssl.so.3 => /usr/lib/libssl.so.3 (0x00007f21aa3f8000)
	libcrypto.so.3 => /usr/lib/libcrypto.so.3 (0x00007f21a9e00000)
	libpcre.so.1 => /usr/lib/libpcre.so.1 (0x00007f21aa37e000)
	libm.so.6 => /usr/lib/libm.so.6 (0x00007f21aa294000)
	libgc.so.1 => /usr/lib/libgc.so.1 (0x00007f21a9d94000)
	libevent-2.1.so.7 => /usr/lib/libevent-2.1.so.7 (0x00007f21a9d42000)
	libgcc_s.so.1 => /usr/lib/libgcc_s.so.1 (0x00007f21aa274000)
	libc.so.6 => /usr/lib/libc.so.6 (0x00007f21a9b5b000)
	/lib64/ld-linux-x86-64.so.2 => /usr/lib64/ld-linux-x86-64.so.2 (0x00007f21adefd000)
	liblzma.so.5 => /usr/lib/liblzma.so.5 (0x00007f21a9b28000)
	libicuuc.so.72 => /usr/lib/libicuuc.so.72 (0x00007f21a9800000)
	libicudata.so.72 => /usr/lib/libicudata.so.72 (0x00007f21a7a00000)
	libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x00007f21a7600000)

Sadly, I do not have a minimal code example that I can say causes this; it only happens in our full project.
One thing I can add is that we have a BIG project. I think at this point it is the largest Crystal codebase out there, maybe second only to the compiler.

--------------------------------------------------------------------------------
Language                      files          blank        comment           code
--------------------------------------------------------------------------------
Crystal                        1659          24326          18108         166374

When compiling we see 6102 objects in codegen stage (relevant? 🤷 )

If I can offer more information or data let me know 🙏

@bararchy bararchy added the kind:bug A bug in the code. Does not apply to documentation, specs, etc. label Mar 16, 2023
@HertzDevil
Contributor

What is the output of crystal --version?

@bararchy
Contributor Author

@HertzDevil

Crystal 1.7.3 (2023-03-07)

LLVM: 14.0.6
Default target: x86_64-pc-linux-gnu

But we tried with 1.6 as well, and...LLVM 12 if I remember correctly, on multiple different systems (Mint, Arch)
it seems to be the same behavior across.

@jwoertink
Contributor

jwoertink commented Nov 20, 2023

I'm getting this too when running crystal spec.

❯ crystal spec 
/home/jeremy/.cache/crystal/crystal-run-spec.tmp: symbol lookup error: /home/jeremy/.cache/crystal/crystal-run-spec.tmp: undefined symbol: environ, version Float::Printer::print<Float32, String::Builder>:Nil

The error changes depending on which files I run specs against, and which classes I have commented out, but it's always a similar undefined symbol. I've tried Crystal 1.7, 1.8, 1.9, and 1.10, with LLVM 13, 14, and 15.

This seems to happen when I have too many classes required. If I comment out a chunk of my files, the specs pass. Also strange that I can actually boot my application just fine. I just can't run specs at all.

---------------------------------------------------------------------------------------
Language                             files          blank        comment           code
---------------------------------------------------------------------------------------
Crystal                               3754          32180          26479         171981

edit: interesting finding here... turns out if I run spec in release mode, it works fine.... It takes nearly 8 minutes to run, but it does run.

❯ crystal spec --release spec/operations/
............F...............................................................................................*...........................................................................................
❯ crystal spec spec/operations/
/home/jeremy/.cache/crystal/crystal-run-spec.tmp: symbol lookup error: /home/jeremy/.cache/crystal/crystal-run-spec.tmp: undefined symbol: environ, version JSON::ParseException#initialize<String, Int32, Int32, JSON::ParseException+>:JSON::ParseException+

Also, here's a link to the Discord discussion where I was trying different options

@jwoertink
Contributor

@bararchy It looks like setting the environment variable CC="cc -fuse-ld=lld" might work around the issue. I tried using the --single-module flag, but it seems like that flag doesn't work on specs?

CC="cc -fuse-ld=lld" crystal spec

That allows us to run our specs now. @HertzDevil if there's any info I can provide, or anything you'd like me to try out, let me know. I haven't been able to reduce this to small code yet, but it's very reproducible for me.

@jwoertink
Contributor

Also wanted to link to a previous issue where this was happening #7177 (comment) @bcardiff had this solution.

@Blacksmoke16
Member

Blacksmoke16 commented Dec 6, 2023

Not sure if it's the same issue, but was able to reproduce something very similar:

PII = 3.14

CONFIG = {float: 0.0}

{%
  CONFIG[:float] = PII
%}

CONFIG[:float] == PII

Results in:

$ crystal run test.cr
/usr/bin/ld: _main.o: in function `__crystal_main':
/home/george/test.cr:9:(.text+0x413): undefined reference to `PII'
clang-16: error: linker command failed with exit code 1 (use -v to see invocation)
Error: execution of command failed with exit status 1: clang "${@}" -o /home/george/.cache/crystal/crystal-run-test.tmp  -rdynamic -L/usr/bin/../lib/crystal -lpcre2-8 -lgc -lpthread -levent -lrt -lpthread -ldl
Linker Output ```sh clang _main.o S-lice40U-I-nt841.o P-ointer40U-I-nt841.o A-rgumentE-rror.o E-xception5858C-allS-tack.o A-rray40P-ointer40V-oid4141.o S-tring.o S-tring5858B-uilder.o G-C-.o U-I-nt64.o S-lice40T-41.o M-ath.o I-nt32.o O-verflowE-rror.o P-ointer40L-ibU-nwind5858E-xception41.o I-ndexE-rror.o C-har.o S-taticA-rray40U-I-nt84432441.o S-taticA-rray40U-I-nt8443212941.o I-nt64.o D-ivisionB-yZ-eroE-rror.o E-xception.o P-ointer40P-ointer40V-oid4141.o F-loat64.o U-I-nt8.o A-rray40S-tring41.o P-ointer40S-tring41.o D-ir.o C-rystal5858S-ystem5858D-ir.o E-N-V-.o C-rystal5858S-ystem5858E-nv.o T-uple40S-tring4432S-tring4432S-tring41.o E-numerable5858R-eflect40I-nt3241.o C-rystal5858S-ystem5858F-ile.o F-ile5858I-nfo.o E-rrno.o T-uple40E-rrno4432E-rrno41.o F-ile5858E-rror.o C-har5858R-eader.o U-I-nt32.o U-nicode.o A-rray40T-uple40I-nt324432I-nt324432I-nt324141.o P-ointer40T-uple40I-nt324432I-nt324432I-nt324141.o R-ange40B-4432E-41.o R-ange40I-nt324432I-nt3241.o T-uple40S-tring4432S-tring4432S-tring4432S-tring41.o W-inE-rror.o F-ile5858N-otF-oundE-rror.o F-ile5858A-lreadyE-xistsE-rror.o F-ile5858A-ccessD-eniedE-rror.o F-ile5858B-adE-xecutableE-rror.o F-ile5858E-rror43.o P-ath.o P-ath5858K-ind.o I-terator5858S-top.o P-ointer40V-oid41.o P-roc40I-nt324432N-il41.o S-taticA-rray40I-nt8443225641.o P-ointer40I-nt841.o F-iber5858S-tackP-ool.o D-eque40P-ointer40V-oid4141.o T-hread5858M-utex.o R-untimeE-rror.o F-ile5858P-ermissions.o F-iber.o T-hread5858L-inkedL-ist40F-iber41.o S-taticA-rray40U-I-nt644432241.o S-lice40U-I-nt6441.o P-ointer40U-I-nt6441.o C-rystal5858S-ystem5858R-andom.o C-rystal5858S-ystem5858S-yscall.o C-rystal5858S-cheduler.o T-hread.o A-tomic40U-I-nt841.o A-tomic40T-hread3212432N-il41.o P-roc40N-il41.o F-iber5858C-ontext.o T-hread5858L-inkedL-ist40T-hread41.o N-il.o N-ilA-ssertionE-rror.o C-rystal5858E-ventL-oop.o C-rystal5858L-ibE-vent5858E-ventL-oop.o C-rystal5858S-pinL-ock.o D-eque40F-iber41.o P-ointer40F-iber41.o T-ime5858S-pan.o 
C-rystal5858L-ibE-vent5858E-vent5858B-ase.o C-rystal5858L-ibE-vent5858E-vent.o L-ibC-5858T-imeval.o C-rystal5858S-ystem.o P-rocess.o F-ile.o E-xception5858C-allS-tack5858D-lP-hdrD-ata.o P-roc40P-ointer40-9fe377da67d81153f657f4c9b5577cda.o C-rystal5858T-hre-1a3c0c3e3b0f8107f598df47513fbad7.o H-ash40T-hread4432D-eque40F-iber4141.o P-ointer40H-ash58-ba622b1c6ea9db0b943e2b37ba0c5af2.o C-rystal5858T-hre-88d9776fa5d8db97bf74146336548ef8.o H-ash40T-hread443-3b6ede0851cfdf164aad881d7ad88d96.o P-ointer40H-ash58-38a019d6f87bdc3c07da3b863595c827.o A-tomic40T-41.o A-tomic40I-nt3241.o C-rystal5858S-ystem5858F-ileD-escriptor.o I-O-5858E-rror.o H-ash5858E-ntry40T-hread4432D-eque40F-iber4141.o C-rystal5858H-asher.o I-ntrinsics.o P-ointer40U-I-nt1641.o P-ointer40U-I-nt3241.o H-ash5858E-ntry40-92e06b23fc03f57dd1c2a9c041c20f04.o L-ibE-vent25858E-ventF-lags.o I-nt16.o I-O-5858T-imeoutE-rror.o S-lice40F-iber41.o C-rystal5858E-L-F-.o I-O-5858F-ileD-escriptor43.o C-rystal5858E-L-F-5858E-rror.o C-rystal5858E-L-F-5858E-ndianness.o C-rystal5858E-L-F-5858O-S-A-B-I-.o S-taticA-rray40U-I-nt844323276841.o I-O-5858E-O-F-E-rror.o C-rystal5858E-L-F-5858I-dent.o U-I-nt16.o I-O-5858B-yteF-ormat5858B-igE-ndian.o S-taticA-rray40U-I-nt84432241.o I-O-5858B-yteF-ormat5858L-ittleE-ndian.o C-rystal5858E-L-F-5858K-lass.o S-taticA-rray40U-I-nt84432841.o A-rray40C-rystal5858E-L-F-5858S-ectionH-eader41.o P-ointer40C-rystal5858E-L-F-5858S-ectionH-eader41.o I-O-5858S-eek.o C-rystal5858E-L-F-5858S-ectionH-eader.o I-O-5858D-ecoder.o C-rystal5858I-conv.o T-uple40S-tring4432S-tring41.o I-nvalidB-yteS-equenceE-rror.o C-rystal5858D-W-A-R-F-5858S-trings.o C-rystal5858D-W-A-R-F-5858L-ineN-umbers.o A-rray40A-rray40C-636f430173768b1ef8a4f1bb1b26b618.o P-ointer40A-rray4-591a888aad3429ab878e7889f98d6cb6.o C-rystal5858D-W-A-876b67c856caed70209876736196f578.o A-rray40U-I-nt841.o S-taticA-rray40U-I-nt84432141.o I-nt8.o C-rystal5858D-W-A-1be94dc7496191cebca63387242f7ebd.o A-rray40C-rystal5-6cb63606ee68c313b77145561af0d53f.o 
P-ointer40C-rysta-61663f71f8885996a1db49b7f505cfed.o C-rystal5858D-W-A-R-F-.o B-ool.o A-rray40C-rystal5-799a983d6bccb82d81b6344bc68004db.o P-ointer40C-rysta-d1877dbab0e5695ee010d30612429b43.o C-rystal5858D-W-A-9588971e31ade86daa0caec5c2a7a45f.o C-rystal5858D-W-A-R-F-5858F-O-R-M-.o C-rystal5858D-W-A-121f60c3f64440e2c280f2084766e41c.o C-rystal5858D-W-A-937402c393dd6d3042bf0fc736d67b34.o C-rystal5858D-W-A-R-F-5858L-ineN-umbers5858R-ow.o A-rray40C-rystal5-a16bd51dfcaca30f4af2725dbdd7bd16.o P-ointer40C-rysta-75a3b5062ef2a02254d15f4644e701c0.o C-rystal5858D-W-A-R-F-5858L-N-E-.o C-rystal5858D-W-A-R-F-5858L-N-S-.o A-rray40T-uple40U-997f9e533f193da5e605a2014e7947d9.o P-ointer40T-uple4-aa2454898accd295538d997a0365d41d.o C-rystal5858D-W-A-R-F-5858I-nfo.o C-rystal5858D-W-A-R-F-5858A-bbrev.o A-rray40C-rystal5858D-W-A-R-F-5858A-bbrev41.o P-ointer40C-rystal5858D-W-A-R-F-5858A-bbrev41.o A-rray40C-rystal5-85d16306ed99e9d40f38f1fabbf60a71.o P-ointer40C-rysta-412258c0a74d56232e18d2b8d2d1d319.o C-rystal5858D-W-A-R-F-5858A-bbrev5858A-ttribute.o A-rray40T-uple40C-e6d71a18f860dd68ac8c00d9270cb553.o P-ointer40T-uple4-496d35dba500b76cdb8737ca44397ad8.o U-I-nt128.o S-taticA-rray40U-I-nt844321641.o U-nicode5858C-aseO-ptions.o H-ash40I-nt324432-75cf0889665a4e7008b9d865daf34063.o P-ointer40H-ash58-82aeac833e95b8d7e4bebe696c9eebf5.o H-ash5858E-ntry40-2cbbc8b2d8a06e2b1695995f1b4f7cb2.o A-rray40T-uple40I-nt324432I-nt324141.o P-ointer40T-uple40I-nt324432I-nt324141.o C-rystal5858D-W-A-R-F-5858T-A-G-.o C-rystal5858D-W-A-R-F-5858A-T-.o T-ypeC-astE-rror.o S-tring5858T-oU-nsignedI-nfo40T-41.o S-tring5858T-oU-nsignedI-nfo40U-I-nt3241.o E-xception43.o T-uple40I-nt324432I-nt3241.o E-numerable5858E-mptyE-rror.o T-uple40C-har41.o T-uple40C-har4432C-har41.o T-uple40C-har4432-809e6e92b20f8837d04aeb284952aba9.o R-ange40C-har4432C-har41.o T-uple40C-har4432-da0bae55c641cf3666f281b8fea01988.o C-rystal5858S-mallD-eque40C-har4432241.o S-taticA-rray40C-har4432241.o P-ointer40C-har41.o P-ath5858P-artI-terator.o 
I-terator40T-41.o C-rystal5858A-tE-xitH-andlers.o A-rray40P-roc40I--412183da1a81233e2c20c488cb2e890d.o P-ointer40P-roc40-6252237c86bac28658266374d639ebc8.o I-O-5858F-ileD-escriptor.o S-taticA-rray40U-I-nt8443225641.o F-ile5858T-ype.o C-rystal5858S-ystem5858P-rocess.o I-nt128.o C-rystal5858S-ystem5858F-iber.o I-O-5858E-ncoder.o S-taticA-rray40U-I-nt84432102441.o T-ime5858L-ocatio-c92d7ae948ed8f4d38520c6d460d1469.o T-ime5858L-ocation5858I-nvalidL-ocationN-ameE-rror.o T-ime5858L-ocation5858I-nvalidT-Z-D-ataE-rror.o T-ime5858F-ormat5858E-rror.o T-ime5858F-loatingT-imeC-onversionE-rror.o R-egex5858E-rror.o P-ath5858E-rror.o F-ile5858B-adP-atternE-rror.o C-hannel5858C-losedE-rror.o B-ase645858E-rror.o E-numerable5858N-otF-oundE-rror.o N-otI-mplementedE-rror.o K-eyE-rror.o P-roc40F-iber4432-6eb246a0a45118d3c5507cc830b14a70.o C-rystal5858S-ystem5858S-ignal.o A-tomic5858F-lag.o C-rystal5858S-ystem5858S-igset.o L-ibC-5858S-igsetT-.o I-O-.o S-taticA-rray40I-nt324432241.o P-ointer40I-nt3241.o H-ash40S-ignal4432P-roc40S-ignal4432N-il4141.o P-ointer40H-ash58-0716a66562827a635360751c7a0bfb0b.o H-ash5858E-ntry40-37a9b13947932e73bc9389d755897071.o S-ignal.o C-rystal5858S-ystem5858S-ignalC-hildH-andler.o M-utex.o M-utex5858P-rotection.o H-ash40I-nt324432C-hannel40I-nt324141.o P-ointer40H-ash58-438550d1ae4863022fe79cbb17acca82.o H-ash5858E-ntry40I-nt324432C-hannel40I-nt324141.o C-hannel40I-nt3241.o C-hannel5858S-ender40I-nt3241.o P-ointer40C-hannel5858S-ender40I-nt324141.o C-rystal5858P-oin-c4f99d9688c0365d7ded28861381dd6c.o P-ointer40C-hannel5858R-eceiver40I-nt324141.o C-hannel5858R-eceiver40I-nt3241.o C-hannel5858S-electC-ontext40I-nt3241.o C-hannel5858S-electC-ontextS-haredS-tate.o A-tomic40C-hannel5858S-electS-tate41.o D-eque40I-nt3241.o C-hannel5858D-eliveryS-tate.o C-rystal5858P-oin-50c6dff1ac01adb5932ab2c073b74d07.o C-hannel5858S-electC-ontext40N-il41.o H-ash40I-nt324432I-nt3241.o P-ointer40H-ash5858E-ntry40I-nt324432I-nt324141.o H-ash5858E-ntry40I-nt324432I-nt3241.o 
L-ibC-5858S-tackT-.o L-ibC-5858S-igaction.o E-xception5858C-allS-tack5858R-epeatedF-rame.o P-roc40I-nt324432-2d8bddf8cd1ef75cabf4552d85b24bd1.o H-ash40T-uple40U-I-nt644432S-ymbol414432N-il41.o P-ointer40H-ash58-a9629771b522352703d78982a5dbc6bf.o H-ash40U-I-nt644432U-I-nt6441.o P-ointer40H-ash58-6ca1ab2004fcb02b740a9bf3b468352a.o R-egex5858P-C-R-E-2.o C-rystal5858T-hre-d462b6791eae976eb891ba449f0b759c.o H-ash40T-hread443-9de4f59fdab9a63273127e0d9365d20b.o P-ointer40H-ash58-435d0efaa51926606aace03060793bb4.o H-ash40S-tring443-d0ef6a484ca24b5e9e1cd150630d96a3.o P-ointer40H-ash58-78e716075e5ec360a5549c235b8e9c11.o C-rystal5858O-nceS-tate.o A-rray40P-ointer40B-ool4141.o P-ointer40P-ointer40B-ool4141.o P-ointer40B-ool41.o L-E-B-R-eader.o L-ibU-nwind5858A-ction.o L-ibU-nwind5858R-easonC-ode.o F-loat32.o C-rystal.o -o /home/george/.cache/crystal/crystal-run-test.tmp -rdynamic -L/usr/bin/../lib/crystal -lpcre2-8 -lgc -lpthread -levent -lrt -lpthread -ldl /usr/bin/ld: _main.o: in function `__crystal_main': ```

@ysbaddaden
Contributor

I just reproduced the exact same linker error but with another symbol while trying to run the std spec. It always fails with:

std_spec: symbol lookup error: std_spec: undefined symbol: __libc_start_main, version Pointer(T)#+<Int32>:Pointer(Slice(Int32))

I can build and execute any other Crystal program successfully (including the compiler). I can also run the Crystal std specs from a docker image (crystallang/crystal:1.10-alpine), so there is something wrong happening on my host.

My host: Ubuntu 20.04 / GCC 9.4 / LLVM 16. I also tried Clang 16 and of course clearing the crystal cache. I didn't try lld.

@straight-shoota
Member

I suspect this has something to do with the size of the binary, or the number of symbols. Something like that. std_spec is fairly huge because it contains the entirety of the standard library plus the tests. It has much more code than most other Crystal programs.

@straight-shoota
Member

straight-shoota commented May 9, 2024

I think this is exactly the same problem as #7177: The number of object files that we can pass as arguments to the linker on Linux is limited (apparently to 4096 in some configurations).

This only happens for very large code bases. Even the compiler seems to be fine. But the stdlib specs seem to exceed it, as do other large programs.
For the Crystal repo we added a workaround in the Makefile to use lld (if available) instead of ld (#8641). Apparently lld doesn't have this restriction (or it has a higher limit that hasn't been exceeded yet).
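
To compare a build against the object counts mentioned in this thread, something like the following sketch could be used. It only counts `.o` files under a directory; the helper names, the example cache path, and the default threshold of 4096 are assumptions for illustration, not anything the compiler provides.

```shell
#!/bin/sh
# Count the object files below a directory (e.g. a Crystal cache directory).
count_objects() {
  # $1: directory to scan
  find "$1" -type f -name '*.o' | wc -l | tr -d ' '
}

# Print the count together with a comparison against a threshold.
# The default of 4096 is only the figure suspected in this thread.
report_objects() {
  # $1: directory, $2: threshold (optional)
  dir=$1
  limit=${2:-4096}
  n=$(count_objects "$dir")
  if [ "$n" -gt "$limit" ]; then
    echo "$n object files (above $limit)"
  else
    echo "$n object files (at or below $limit)"
  fi
}

# Example (hypothetical cache path):
# report_objects "$HOME/.cache/crystal/home-me-project-src"
```

A count above the threshold would not prove anything by itself, but it helps to tell whether a project is in the same size range as the ones reported here.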

Configuring to use lld prevents the problem for a specific build (or configuration). But we need a generic solution to make large code bases work with the standard linker on linux.

There is actually a quite similar problem on Windows, except that the limit is on the entire string length of the process arguments, not the number of individual arguments. And this limit is far easier to reach even with smaller code bases.

We've had a proper fix on Windows for a long time (#9062): If the arguments are too long, write them to a file and tell the linker to read from there. The syntax for this is @file. It is supported on Windows (cl.exe) as well as on Linux (GNU ld, lld, gold, ...). So I presume we should be able to use the same mechanism on Linux.
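
As a rough illustration of the @file mechanism on Linux, the sketch below writes object file names into an arguments file that can then be passed to the compiler driver. The function name `write_args_file`, the file name `objects.txt`, and the quoting scheme (one double-quoted name per line, with backslashes and double quotes escaped) are assumptions made for this example; it is not the code used by the compiler.

```shell
#!/bin/sh
# Write object file names to an arguments file, one quoted name per line.
write_args_file() {
  # $1: output file; remaining arguments: object file names
  out=$1
  shift
  : > "$out"
  for obj in "$@"; do
    # Escape backslashes and double quotes so names survive the quoting.
    escaped=$(printf '%s' "$obj" | sed 's/[\\"]/\\&/g')
    printf '"%s"\n' "$escaped" >> "$out"
  done
}

# Example (hypothetical file names and libraries):
# write_args_file objects.txt *.o
# cc @objects.txt -o app -rdynamic -lgc -lpthread
```

This keeps the command line short no matter how many object files there are, since the driver expands `@objects.txt` itself.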

This issue hasn't been reported on BSD-based systems (including macOS) yet, so I don't think it's an issue there.

@straight-shoota
Member

Writing a patch for this is relatively straightforward. But validating if it fixes the issue is harder than I thought.
Does anyone have a simple reproduction with a (relatively) low number of object files?

I've managed to link up to about 31k object files using the code from #7177 (comment). But I can reach the same number without the patch as well (tested in docker images crystallang/crystal:1.7.3-build and crystallang/crystal:1.10.1-build).
More than 31k object files fails regardless of whether the arguments are written to a file or not.

The failing threshold is significantly higher than some numbers reported in #7177. So we might be scratching a completely different limit and the original issue was resolved at some point or it doesn't reproduce in the cases that I tested.

Maybe someone can try the patched compiler on their code (assuming the failing condition still reproduces with a stock compiler).

This is my branch: https://github.com/straight-shoota/crystal/tree/feat/link-command-args-file

Note: Regardless of whether this fixes a linking problem, I already find it quite pleasant to reduce the noise with --verbose. If you want to check which linker command is used, you don't need to see all the object file names. Putting them in a file makes a lot of sense (you can still go look there if you're curious).

@straight-shoota straight-shoota self-assigned this May 9, 2024
@jwoertink
Contributor

I'm not sure how to reduce the code for easier testing, but I can say that the branch gives me different results...

❯ crystal spec 
/home/jeremy/.cache/crystal/crystal-run-spec.tmp: symbol lookup error: /home/jeremy/.cache/crystal/crystal-run-spec.tmp: undefined symbol: environ, version Avram::Validations#validate_required<Avram::Attribute(String)+>:Bool

❯ /home/jeremy/Development/crystal/shoota-crystal/bin/crystal spec 
Using compiled compiler at /home/jeremy/Development/crystal/shoota-crystal/.build/crystal
Showing last frame. Use --error-trace for full trace.

In spec/actions/api/oauth/token_spec.cr:16:32

 16 | action = Api::Oauth::Token.with(
                                 ^---
Error: missing arguments: client_id, refresh_token

❯ CC="cc -fuse-ld=lld" GC_DONT_GC=1 crystal spec
.........................................................................................
Finished in 17.68 seconds
547 examples, 0 failures, 0 errors, 1 pending

The 2 arguments it says I'm missing have a default value of nil, if that makes any difference.

@straight-shoota
Member

Thanks for testing!
My patch is just a single commit that only affects the linker command.
Maybe some other change on master is interfering here. Or anything else? It's pretty much impossible that my change has any effect on the semantics of a program.

Could you cherry-pick the last commit from my branch on top of the commit of the working compiler version and try again?

@jwoertink
Contributor

Alright, I took the 1.12.1 branch, and applied that single commit and that gives me this

❯ /home/jeremy/Development/crystal/lang/bin/crystal spec 
Using compiled compiler at /home/jeremy/Development/crystal/lang/.build/crystal
/usr/bin/ld: /usr/lib/gcc/x86_64-linux-gnu/11/../../../x86_64-linux-gnu/Scrt1.o: in function `_start':
(.text+0x1b): undefined reference to `main'
collect2: error: ld returned 1 exit status
Error: execution of command failed with exit status 1: cc @/home/jeremy/.cache/crystal/home-jeremy-Sites-joysticktv-spec/object_names.txt -o /home/jeremy/.cache/crystal/crystal-run-spec.tmp  -rdynamic -L/usr/bin/../lib/crystal -lxml2 -lgmp -lyaml -lz `command -v pkg-config > /dev/null && pkg-config --libs --silence-errors libssl || printf %s '-lssl -lcrypto'` `command -v pkg-config > /dev/null && pkg-config --libs --silence-errors libcrypto || printf %s '-lcrypto'` -lpcre2-8 -lm -lgc -lpthread -levent -lrt -lpthread -ldl

@straight-shoota
Member

Hm, that's not what I was expecting 🙈 It's quite interesting because it's now an error in the linker, not the loader (build error vs. run error).

And I can reproduce exactly the same behaviour with make std_spec EXPORT_CC=.

@straight-shoota
Member

straight-shoota commented May 10, 2024

Oh no. I think I messed up the generation of the arguments file. Was working on this quite late yesterday... 🤦
I don't know why I didn't notice that. My isolated test case certainly worked, though 😕

Anyway, I pushed a fixup commit. But now we're back to the loader error 😢
So it seems to not matter where ld reads the object files from, it just can't handle too many of them. (although the error behaviour is still different, so maybe there's a chance to get this working?)

@straight-shoota straight-shoota changed the title missing symbols when using non-release build Linker error leads to missing symbols on builds with lots of object files May 10, 2024
@ysbaddaden
Contributor

ysbaddaden commented May 13, 2024

@straight-shoota I'm trying your patch. I can't compile/run the whole std specs on my Ubuntu 20.04, I must always go through an Alpine docker image.

Nope: there are only 3844 objects. Let's tweak the number.

@ysbaddaden
Contributor

Still no luck, even after fixing the object_names.txt file generation. I get:

$ .build/std_spec 
.build/std_spec: symbol lookup error: .build/std_spec: undefined symbol: __libc_start_main, version Array(T)#dup:Array(SemanticVersion)

The symbol exists but it has a GLIBC version, and trying to run the executable is failing to find the symbol and reports an unrelated symbol as the version 🤨

~/work/crystal-lang/crystal $ objdump -t .build/std_spec  | grep libc
0000000000000000       F *UND*  0000000000000000              gnu_get_libc_version@@GLIBC_2.2.5
0000000000000000       F *UND*  0000000000000000              __libc_start_main@@GLIBC_2.2.5
0000000000000000  w    O *UND*  0000000000000000              __libc_stack_end@@GLIBC_2.2.5
00000000022bd4b0 g     F .text  0000000000000065              __libc_csu_init
00000000022bd520 g     F .text  0000000000000005              __libc_csu_fini

More dabbling with objdump seems to confirm it:

$ objdump -x .build/std_spec > /dev/null
objdump: .build/std_spec: .gnu.version_d invalid entry
objdump: warning: private headers incomplete: bad value

Fails on Ubuntu 20.04 / GCC 9.4.0 / LD (binutils) 2.34
Succeeds on Alpine Linux (edge) / GCC 13.2.1 / LD (binutils) 2.42
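
The objdump check above could be wrapped in a small helper to test other binaries for the same symptom. This is only a sketch: the function name and the `OBJDUMP` override variable are made up for this example, and the pattern simply matches the message text shown above, which may differ between binutils versions.

```shell
#!/bin/sh
# Succeeds (exit status 0) when objdump complains about the
# version-definition section of the given executable.
has_bad_version_section() {
  # $1: path to the executable; set OBJDUMP to use another objdump binary
  "${OBJDUMP:-objdump}" -x "$1" 2>&1 >/dev/null |
    grep -q 'gnu\.version_d invalid entry'
}

# Example (hypothetical usage):
# if has_bad_version_section .build/std_spec; then
#   echo "binary has a broken .gnu.version_d section"
# fi
```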

@ysbaddaden
Contributor

Maybe related: https://sourceware.org/bugzilla/show_bug.cgi?id=29566 (I'm not sure)

@straight-shoota
Member

I updated my branch to always write object file names to a file and pass that file to the linker (unless there's just a single object file, e.g. with --single-module).
This is probably what we want in the end anyways because it makes --verbose output readable instead of dumping pages of mangled object names which usually don't tell much. If you still want to read them, you can look in the file.

However, as mentioned previously in #13193 (comment), the error is still happening. Putting object names in a file doesn't seem to fix anything (it's probably still a good thing, though).

@straight-shoota
Member

I did a bit of comparison between different linkers. lld and mold work. ld (GNU linker) and gold experience the error we're discussing here.
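
For anyone who wants to repeat this kind of comparison on their own project, a loop along these lines might do. It reuses the `CC="cc -fuse-ld=..."` workaround from earlier in the thread; the function name `try_linkers` is invented for this sketch, and it assumes the local `cc` accepts `-fuse-ld=` for each linker name and that each linker is installed.

```shell
#!/bin/sh
# Run the given command once per linker and report whether it succeeded.
# The command should both build and run the program (for example
# `crystal spec`), because the failure discussed here only shows up
# when the binary is loaded, not when it is linked.
try_linkers() {
  # $@: command to run, e.g. crystal spec
  for linker in bfd gold lld mold; do
    if CC="cc -fuse-ld=$linker" "$@" >/dev/null 2>&1; then
      echo "$linker: ok"
    else
      echo "$linker: failed"
    fi
  done
}

# Example (hypothetical usage):
# try_linkers crystal spec
```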

@straight-shoota straight-shoota removed their assignment Sep 6, 2024