intel builds fail with internal compiler errors #1302

Closed
cz4rs opened this issue Mar 5, 2021 · 19 comments

@cz4rs
Contributor

cz4rs commented Mar 5, 2021

Describe the bug
Intel builds have begun to fail with internal errors recently.

icc-18

FAILED: tests/CMakeFiles/runtime_basic.dir/Unity/unity_0_cxx.cxx.o 
/usr/bin/ccache /opt/intel/install/bin/icpc -DFMT_HEADER_ONLY=1 -DFMT_USE_USER_DEFINED_LITERALS=0 -DHAS_DETECTION_COMPONENT=1 -I/vt/tests/unit -I/vt/lib/fmt -I/vt/lib/CLI -I/vt/lib/libfort/lib -Irelease -I/vt/src -isystem /vt/tests/extern/googletest/googletest/include -isystem /vt/tests/extern/googletest/googletest -isystem /build/checkpoint/install/include -isystem /build/detector/install/include -Wall -pedantic -Wshadow -Wno-unknown-pragmas -Wsign-compare -Werror -O3 -DNDEBUG -fPIC -std=c++14 -MD -MT tests/CMakeFiles/runtime_basic.dir/Unity/unity_0_cxx.cxx.o -MF tests/CMakeFiles/runtime_basic.dir/Unity/unity_0_cxx.cxx.o.d -o tests/CMakeFiles/runtime_basic.dir/Unity/unity_0_cxx.cxx.o -c tests/CMakeFiles/runtime_basic.dir/Unity/unity_0_cxx.cxx
": internal error: ** The compiler has encountered an unexpected problem.
** Segmentation violation signal raised. **
Access violation or stack overflow. Please contact Intel Support for assistance.

icpc: error #10105: /opt/intel/system_studio_2018/compilers_and_libraries_2018.4.253/linux/bin/intel64/mcpcom: core dumped
icpc: warning #10102: unknown signal(-176826192)
icpc: error #10106: Fatal error in /opt/intel/system_studio_2018/compilers_and_libraries_2018.4.253/linux/bin/intel64/mcpcom, terminated by unknown
compilation aborted for tests/CMakeFiles/runtime_basic.dir/Unity/unity_0_cxx.cxx (code 1)

icc-19

FAILED: src/CMakeFiles/vt.dir/Unity/unity_7_cxx.cxx.o 
/usr/bin/ccache /opt/intel/install/bin/icpc -DFMT_HEADER_ONLY=1 -DFMT_USE_USER_DEFINED_LITERALS=0 -DHAS_DETECTION_COMPONENT=1 -I/vt/lib/fmt -I/vt/lib/CLI -Irelease -I/vt/src -I/vt/lib/libfort/lib -isystem /build/checkpoint/install/include -isystem /build/detector/install/include -Wall -pedantic -Wshadow -Wno-unknown-pragmas -Wsign-compare -Werror -O3 -DNDEBUG -fPIC -std=c++14 -MD -MT src/CMakeFiles/vt.dir/Unity/unity_7_cxx.cxx.o -MF src/CMakeFiles/vt.dir/Unity/unity_7_cxx.cxx.o.d -o src/CMakeFiles/vt.dir/Unity/unity_7_cxx.cxx.o -c src/CMakeFiles/vt.dir/Unity/unity_7_cxx.cxx
": internal error: ** The compiler has encountered an unexpected problem.
** Segmentation violation signal raised. **
Access violation or stack overflow. Please contact Intel Support for assistance.

icpc: error #10105: /opt/intel/sw_dev_tools/compilers_and_libraries_2020.1.219/linux/bin/intel64/mcpcom: core dumped
icpc: warning #10102: unknown signal(-536442704)
icpc: error #10106: Fatal error in /opt/intel/sw_dev_tools/compilers_and_libraries_2020.1.219/linux/bin/intel64/mcpcom, terminated by unknown
compilation aborted for src/CMakeFiles/vt.dir/Unity/unity_7_cxx.cxx (code 1)

To Reproduce
Currently all up-to-date PRs are affected.
Running the build in a local environment (without unity builds) produces the same error at a different stage of the build (while compiling runtime.cc).

Builds are passing again after merging DARMA-tasking/magistrate#183; reverting that PR reproduces the crashes.
For reduced examples see #1302 (comment) and #1302 (comment).

Additional context
It seems that recent changes in checkpoint have triggered this behavior (DARMA-tasking/magistrate#181).

With checkpoint at DARMA-tasking/magistrate@60d6542, builds run fine in a local environment (PR created to verify this in CI).

@cz4rs cz4rs added the type: bug label Mar 5, 2021
@cz4rs cz4rs changed the title from "intel builds failing with internal compiler errors" to "intel builds fail with internal compiler errors" Mar 5, 2021
@PhilMiller
Member

I'll try to reproduce and reduce. I'll also check Intel 21, which we're having a hard time setting up for CI.

@cz4rs
Contributor Author

cz4rs commented Mar 5, 2021

Confirmed that the builds work fine without the last PR in checkpoint - all checks passed at #1303 🤔
DARMA-tasking/magistrate#181 boils down to a simple change:

--- a/src/checkpoint/container/map_serialize.h
+++ b/src/checkpoint/container/map_serialize.h
@@ -96,7 +96,9 @@ inline void serializeUnorderedAssociativeContainer(
 
     auto bucket_count = cont.bucket_count();
     s | bucket_count;
-    cont.rehash(bucket_count);
+    if (s.isUnpacking() and bucket_count > cont.bucket_count()) {
+      cont.rehash(bucket_count);
+    }
   }
 
   serializeMapLikeContainer(s, cont);

It doesn't seem like we are hitting any resource limit here, but the change itself also looks pretty innocent.
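
For reference, here is a minimal standalone sketch of what the guarded rehash does, using a plain std::unordered_map rather than the checkpoint serializer. The names `restoreBuckets`, `guardedRehashDemo`, and the `is_unpacking` flag are hypothetical stand-ins for the serializer code and `s.isUnpacking()`; they are not part of checkpoint.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>

// Hypothetical stand-in for the guarded rehash in map_serialize.h:
// grow the bucket array only when unpacking and only when the
// serialized bucket count exceeds what the container already has.
template <typename MapT>
void restoreBuckets(MapT& cont, std::size_t serialized_buckets, bool is_unpacking) {
  if (is_unpacking && serialized_buckets > cont.bucket_count()) {
    cont.rehash(serialized_buckets);
  }
}

// Exercises both paths and reports whether the guard behaved as described.
bool guardedRehashDemo() {
  std::unordered_map<int, int> m;
  const std::size_t before = m.bucket_count();
  const std::size_t wanted = before + 100;

  // Packing or sizing pass: the container must be left untouched.
  restoreBuckets(m, wanted, false);
  const bool untouched = (m.bucket_count() == before);

  // Unpacking pass: rehash(n) guarantees at least n buckets afterwards.
  restoreBuckets(m, wanted, true);
  const bool grown = (m.bucket_count() >= wanted);

  return untouched && grown;
}
```

The old code called `rehash` unconditionally, so it also ran during packing and footprinting passes; the guarded version only touches the container while unpacking.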

@cz4rs
Contributor Author

cz4rs commented Mar 5, 2021

A somewhat reduced example triggering the bug:

#include <checkpoint/checkpoint.h>

struct PhaseManager {
  std::unordered_map<int, std::map<int, int>> map_;

  template <typename SerializerT>
  void serialize(SerializerT& s) {
    s | map_;
  }
};


void printMemoryFootprint() {
  PhaseManager* comp;
  checkpoint::getMemoryFootprint(*comp, 0);
}

Compilation line:

/opt/intel/install/bin/icpc -DFMT_HEADER_ONLY=1 -DFMT_USE_USER_DEFINED_LITERALS=0 -DHAS_DETECTION_COMPONENT=1 -I/vt/lib/fmt -I/vt/lib/CLI -Irelease -I/vt/src -I/vt/lib/libfort/lib -Ilib/checkpoint/src -I/vt/lib/checkpoint/src -I/vt/lib/detector/src -Wall -pedantic -Wshadow -Wno-unknown-pragmas -Wsign-compare -O3 -DNDEBUG -fPIC -std=c++14 -MD -MT src/CMakeFiles/vt.dir/vt/runtime/runtime.cc.o -MF src/CMakeFiles/vt.dir/vt/runtime/runtime.cc.o.d -o src/CMakeFiles/vt.dir/vt/runtime/runtime.cc.o -c /vt/src/vt/runtime/runtime.cc

@PhilMiller
Member

Oh, wow, well done. I'll toss that at creduce and see what comes out

@cz4rs
Contributor Author

cz4rs commented Mar 5, 2021

Also: removing -O3 makes it work.

@PhilMiller
Member

Compilation succeeds even with -O2, only failing at -O3
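
Since only -O3 fails, one possible (untested against this bug) mitigation would be to cap the optimization level for just the affected function instead of the whole build. The sketch below assumes the classic Intel compiler's `#pragma intel optimization_level` applies to the next function definition; the function name `footprintAtO2` is hypothetical, and nothing in this thread confirms that this avoids the crash.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Assumption: icc/icpc (classic) honors this pragma for the next function
// definition only. The guard keeps other compilers from ever seeing it.
#if defined(__INTEL_COMPILER)
#pragma intel optimization_level 2
#endif
std::size_t footprintAtO2(std::size_t serialized, std::size_t in_memory) {
  // Same shape as the footprinting code: take the larger of the two sizes.
  return std::max(serialized, in_memory);
}
```

With any other compiler the pragma is skipped by the preprocessor, so the function is compiled at whatever level the build requests.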

@PhilMiller
Member

OK, I'm all set up for a creduce run. You can put manual efforts on hold.

@lifflander
Collaborator

This is a surprising bug.

@PhilMiller
Member

So, I don't know if this is the crash, but ...

namespace std {
  int& max(int, int);
}
void foo() {
 int a = std::max(0, 1);
}

The namespace std is apparently special: renaming it leads to successful compilation. The compiler must be special-casing it during optimization.
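
For comparison, a sketch of the renamed variant that is reported to compile. The namespace name `not_std` is hypothetical, and a definition of `max` is added here only so the example links and runs; the reduced case above merely declares it. Note also that adding declarations to namespace std is undefined behavior per the C++ standard, although that does not excuse a compiler crash.

```cpp
#include <cassert>

// Hypothetical namespace name; any name other than `std` fits the description.
namespace not_std {
  // Same signature as the reduced case: returns int& and takes ints by value.
  // The body exists only so this sketch links.
  inline int& max(int a, int b) {
    static int storage = 0;
    storage = (a < b) ? b : a;
    return storage;
  }
}

int foo() {
  // Copies the referenced int into a local, as in the reduced case.
  int a = not_std::max(0, 1);
  return a;
}
```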

@PhilMiller
Member

I'll pass the report along

@PhilMiller
Member

It also crashes for 21.1.9

@PhilMiller
Member

Per request, I tested Intel's next-generation C++ compiler icpx (I think Clang-based) version 21, and that does compile successfully. It's just not readily available anywhere, and has the same CI issue as the rest of the 21.x oneAPI suite.

@cz4rs
Contributor Author

cz4rs commented Mar 8, 2021

I will merge DARMA-tasking/magistrate#183 once the CI passes. (done)

It might be worth noting that there is a std::max call at the very beginning of the footprinting code:

template <typename T>
std::size_t getMemoryFootprint(T& target, std::size_t size_offset) {
  return size_offset + std::max(
    dispatch::Standard::footprint<T, Footprinter>(target),
    sizeof(target)
  );
}

so it's even more surprising that the additional if has caused this crash to appear.
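
To make the role of that std::max call concrete, here is a self-contained sketch of the same pattern. `FakeFootprinter`, `memoryFootprintSketch`, and `demoFootprint` are hypothetical names standing in for `dispatch::Standard::footprint<T, Footprinter>` and the real entry point; they only mimic its shape.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the real footprinter: it simply reports a
// caller-supplied serialized size instead of traversing the object.
struct FakeFootprinter {
  std::size_t serialized_size;

  template <typename T>
  std::size_t footprint(T const&) const { return serialized_size; }
};

template <typename T>
std::size_t memoryFootprintSketch(
  T& target, std::size_t size_offset, FakeFootprinter const& fp
) {
  // std::max deduces one type for both arguments, so both sides must be
  // std::size_t. It returns a const reference that is consumed within this
  // full expression, before the temporaries are destroyed.
  return size_offset + std::max(fp.footprint(target), sizeof(target));
}

// Small driver: footprint of a double with a pretend serialized size.
std::size_t demoFootprint(std::size_t serialized, std::size_t offset) {
  double x = 0.0;
  return memoryFootprintSketch(x, offset, FakeFootprinter{serialized});
}
```

The result is never smaller than `sizeof(target)` plus the offset, which is why the call sits at the top of every footprinting path.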

I assume that we should keep this issue open; I will update the top description once vt builds are confirmed to be passing again.

@PhilMiller
Member

I think the compiler crash has something to do with a flaw in how it's implementing return value optimization (RVO), but I can't pin down what distinguishes a case that crashes from one that doesn't.

@lifflander
Collaborator

@PhilMiller Do you have any update on this issue from the Intel compiler devs?

@PhilMiller
Member

I just checked the repro in #1302 (comment) and found it still crashes with 2021.5.0.

@PhilMiller
Member

Also, that's the compiler version incorporated in Intel oneAPI 2022:

[pbmille@ascic170 ~]$ which icc
/projects/empire/tools/x86_64/oneapi/oneapi-2022.1.2-117/compiler/latest/linux/bin/intel64/icc

[pbmille@ascic170 ~]$ icc --version
icc (ICC) 2021.5.0 20211109

@nlslatt
Collaborator

nlslatt commented Jun 13, 2022

@cz4rs @PhilMiller Is this still a concern? We're no longer seeing the internal compiler error since I lowered the build parallelism, but I'm not sure if that's just masking the problem.

@cz4rs
Contributor Author

cz4rs commented Jun 14, 2022

> @cz4rs @PhilMiller Is this still a concern? We're no longer seeing the internal compiler error since I lowered the build parallelism, but I'm not sure if that's just masking the problem.

I think we can close / archive this issue, as there's nothing we can do about it until Intel fixes this.
The workaround that we use (DARMA-tasking/magistrate#183) is probably good enough: it reduces correctness for unordered containers, but no one has complained so far.

@nlslatt nlslatt closed this as completed Jun 15, 2022