Segfault with C++ SDK #999

Closed
Omegastick opened this issue Aug 15, 2019 · 9 comments
Labels
area/user-experience Pertaining to developers trying to use Agones, e.g. SDK, installation, etc kind/bug These are bugs.

@Omegastick

Omegastick commented Aug 15, 2019

What happened:
Using the C++ SDK, my application segfaults when trying to communicate with the sidecar.

What you expected to happen:
The application should return a grpc::Status object representing whether or not the call was successful.

How to reproduce it (as minimally and precisely as possible):
Relevant code included below:

bool use_agones = args[{"--agones"}];

std::thread health_thread;
if (use_agones)
{
    agones_sdk = std::make_shared<agones::SDK>();
    spdlog::info("Connecting to agones");
    if (!agones_sdk->Connect())
    {
        throw std::runtime_error("Could not connect to agones");
    }
    spdlog::info("Connected to agones");
    health_thread = std::thread(health_check, agones_sdk);
}

int port;
args({"-p", "--port"}, 7654) >> port;
spdlog::info("Serving on port: {}", port);

auto socket = std::make_unique<zmq::socket_t>(zmq_context, zmq::socket_type::router);
socket->bind("tcp://*:" + std::to_string(port));
server_communicator = std::make_unique<ServerCommunicator>(std::move(socket));

if (use_agones)
{
    spdlog::info("Marking server as ready");
    grpc::Status ready_call_status = agones_sdk->Ready();
    if (!ready_call_status.ok())
    {
        std::string error_message = fmt::format("Could not mark server as ready: {}",
                                                ready_call_status.error_message());
        throw std::runtime_error(error_message);
    }
}

My health check function:

static void health_check(std::shared_ptr<agones::SDK> agones_sdk)
{
    while (!stop)
    {
        bool ok = agones_sdk->Health();
        spdlog::info("Health ping {}", ok ? "sent" : "failed");
        std::this_thread::sleep_for(std::chrono::seconds(2));
    }
}
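
A side note on the health thread, separate from the segfault itself: if an exception propagates while health_thread is still joinable, the std::thread destructor calls std::terminate. One way to avoid that is to let a small wrapper own the stop flag and join on destruction. The sketch below is only an illustration; HealthPinger and pings_sent are hypothetical names, not part of the Agones SDK, and the callback stands in for agones_sdk->Health().

```cpp
#include <atomic>
#include <chrono>
#include <functional>
#include <thread>
#include <utility>

// Hypothetical helper (not part of the Agones SDK): calls `ping` every
// `interval` on a background thread and joins that thread on destruction.
class HealthPinger
{
  public:
    HealthPinger(std::function<bool()> ping, std::chrono::milliseconds interval)
        : ping_(std::move(ping)), interval_(interval), thread_([this] { run(); })
    {
    }

    ~HealthPinger()
    {
        stop_ = true; // ask the loop to exit
        if (thread_.joinable())
        {
            thread_.join(); // waits for at most one more sleep interval
        }
    }

    HealthPinger(const HealthPinger &) = delete;
    HealthPinger &operator=(const HealthPinger &) = delete;

    // Number of pings for which the callback returned true.
    int pings_sent() const { return sent_.load(); }

  private:
    void run()
    {
        while (!stop_)
        {
            if (ping_())
            {
                ++sent_;
            }
            std::this_thread::sleep_for(interval_);
        }
    }

    std::function<bool()> ping_;
    std::chrono::milliseconds interval_;
    std::atomic<bool> stop_{false};
    std::atomic<int> sent_{0};
    std::thread thread_; // declared last so it starts after the other members exist
};
```

With the code above it could be created as HealthPinger pinger([agones_sdk] { return agones_sdk->Health(); }, std::chrono::seconds(2)); right after Connect() succeeds. The destructor then stops and joins the thread even when a later step throws.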

Anything else we need to know?:
The segmentation fault happens on either bool ok = agones_sdk->Health(); or grpc::Status ready_call_status = agones_sdk->Ready();, depending on which comes first, I think. Here's the output of the code above:

[08:41:21    info] Connecting to agones
[08:41:21    info] Connected to agones
[08:41:21    info] Serving on port: 7654
[08:41:21    info] Marking server as ready
[08:41:21    info] Health ping sent

As we can see, the first health ping is sent okay, but it segfaults afterwards (segfaults aren't shown in kubectl logs, but can be seen in kubectl describe).

cpp-simple works, so I don't think it's an issue with the Kubernetes setup.

Environment:

@markmandel
Member

@devjgm are you able to have a look at this?

@devjgm
Contributor

devjgm commented Aug 15, 2019

I'm not sure if I have time to dig into this, but here are a few questions that'll help anyone who might look into this:

  1. Do you have a stack trace from the segv? If it dumped core, you should be able to get a stack trace from that. Otherwise, can you run your program in a debugger? The segv should then trap in the debugger, and you'll be able to get a stack trace from there. This would be hugely helpful.

  2. Which agones::SDK function are you calling when it segvs?

  3. Are you able to create a minimal repro case in a single file that demonstrates this problem? That would help someone else to be able to reproduce the problem locally.
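
For question 1, when attaching a debugger inside a container is awkward, one option is to have the process print its own stack on SIGSEGV. This is a minimal sketch, assuming a glibc-based image (backtrace and backtrace_symbols_fd come from glibc's <execinfo.h>). The names dump_backtrace, segv_handler and install_segv_backtrace_handler are hypothetical and are not part of Agones or gRPC.

```cpp
#include <execinfo.h> // glibc-specific: backtrace, backtrace_symbols_fd
#include <signal.h>   // sigaction, signal, raise
#include <unistd.h>   // STDERR_FILENO

// Writes the current call stack to file descriptor `fd` and returns the
// number of frames captured.
inline int dump_backtrace(int fd)
{
    void *frames[64];
    const int count = backtrace(frames, 64);
    backtrace_symbols_fd(frames, count, fd); // writes directly to fd, no malloc
    return count;
}

// Signal handler: print the stack, then restore the default action and
// re-raise so the process still dies with the original signal.
extern "C" void segv_handler(int sig)
{
    dump_backtrace(STDERR_FILENO);
    signal(sig, SIG_DFL);
    raise(sig);
}

// Installs the handler for SIGSEGV. Returns true on success.
inline bool install_segv_backtrace_handler()
{
    // Warm-up call: the first use of backtrace may load libgcc dynamically,
    // which can allocate memory and is best done outside a signal handler.
    void *warmup[1];
    backtrace(warmup, 1);

    struct sigaction action = {};
    action.sa_handler = segv_handler;
    sigemptyset(&action.sa_mask);
    return sigaction(SIGSEGV, &action, nullptr) == 0;
}
```

Calling install_segv_backtrace_handler() at the top of main() should be enough. Linking with -rdynamic lets the output show function names rather than only raw addresses. A core dump or a debugger session will still give a more detailed trace when either is available.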

@Omegastick
Author

Omegastick commented Aug 16, 2019

Trying to debug it with the local SDK server, I've found that the problem only happens when running in a Docker container.

Here's a single-file repro:

#include <memory>
#include <iostream>

#include <agones/sdk.h>

int main(int /*argc*/, char * /*argv*/ [])
{
    std::cout << "Connecting to agones\n";
    auto agones_sdk = std::make_shared<agones::SDK>();
    if (!agones_sdk->Connect())
    {
        throw std::runtime_error("Could not connect to agones");
    }
    std::cout << "Connected to agones\n";

    std::cout << "Marking server as ready\n";
    grpc::Status ready_call_status = agones_sdk->Ready();
    if (!ready_call_status.ok())
    {
        std::string error_message = "Could not mark server as ready: " + ready_call_status.error_message();
        throw std::runtime_error(error_message);
    }

    grpc::Status shutdown_call_status = agones_sdk->Shutdown();
    if (!shutdown_call_status.ok())
    {
        std::string error_message = "Could not mark server as shutdown: " + shutdown_call_status.error_message();
        throw std::runtime_error(error_message);
    }

    return 0;
}

And here's the stack trace when it segfaults.

#0  0x00007ffff7e64def in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::~basic_string() () from /lib/x86_64-linux-gnu/libstdc++.so.6
#1  0x00000000004120b2 in grpc::Status::~Status (this=0x7fffffffe3c0, 
    __in_chrg=<optimized out>)
    at /opt/agones/sdks/cpp/build/install/gRPC/include/grpcpp/impl/codegen/status.h:31
#2  0x00000000004257bb in grpc::Status grpc::GenericDeserialize<grpc::ProtoBufferReader, agones::dev::sdk::Empty>(grpc::ByteBuffer*, google::protobuf::Message*) ()
#3  0x0000000000425d5c in grpc::internal::CallOpSet<grpc::internal::CallOpSendInitialMetadata, grpc::internal::CallOpSendMessage, grpc::internal::CallOpRecvInitialMetadata, grpc::internal::CallOpRecvMessage<agones::dev::sdk::Empty>, grpc::internal::CallOpClientSendClose, grpc::internal::CallOpClientRecvStatus>::FinalizeResult(void**, bool*) ()
#4  0x0000000000422d97 in grpc::CompletionQueue::Pluck(grpc::internal::CompletionQueueTag*) ()
#5  0x000000000041ab13 in agones::dev::sdk::SDK::Stub::Ready(grpc::ClientContext*, agones::dev::sdk::Empty const&, agones::dev::sdk::Empty*) ()
#6  0x000000000041336c in agones::SDK::Ready() ()
#7  0x0000000000411cd8 in main () at /app/src/test.cpp:18

It's segfaulting on Ready() in the code above, but a call to Health() will also segfault.

As I said in #1000, I'm going to try to make the REST API work (one less library to link against), so I won't be able to spend much time debugging this myself. But I have the commit with this problem in my history, so if you need any more info about this, let me know.

@markmandel
Member

This may show my limited C++ knowledge, but what OS/arch are you compiling on, and what is your Docker base OS?

@Omegastick
Author

It's okay, it's probably some silly mistake I'm making. I'm building in Docker on the gcc:9 image and running on the ubuntu:19.04 image. The host OS is 64-bit Ubuntu 18.04, if that matters.

@roberthbailey
Member

Your single file repro looks quite similar to the cpp-simple example. The example builds with gcc:8 and runs on debian:stretch, so maybe there is some small difference there that is causing the issue.
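
To compare the build and runtime images, it may help to print what the binary was compiled against and what it finds at run time. The sketch below is a diagnostic aid only, not a confirmed cause of this issue. toolchain_report is a hypothetical helper, and every macro it reads is guarded because availability depends on the compiler and C library.

```cpp
#include <cstdlib> // pulls in the C library feature macros such as __GLIBC__
#include <sstream>
#include <string>

#ifdef __GLIBC__
#include <gnu/libc-version.h> // gnu_get_libc_version()
#endif

// Hypothetical helper: summarizes compile-time toolchain macros plus the
// glibc version found at run time (when built against glibc).
inline std::string toolchain_report()
{
    std::ostringstream out;
#ifdef __VERSION__
    out << "compiler: " << __VERSION__ << '\n';
#endif
#ifdef __GLIBCXX__
    out << "libstdc++ headers (date stamp): " << __GLIBCXX__ << '\n';
#endif
#ifdef _GLIBCXX_USE_CXX11_ABI
    out << "_GLIBCXX_USE_CXX11_ABI: " << _GLIBCXX_USE_CXX11_ABI << '\n';
#endif
#ifdef __GLIBC__
    out << "glibc at build time: " << __GLIBC__ << '.' << __GLIBC_MINOR__ << '\n';
    out << "glibc at run time: " << gnu_get_libc_version() << '\n';
#endif
    out << "C++ standard: " << __cplusplus << '\n';
    return out.str();
}
```

Printing toolchain_report() at startup, both from a binary built the way the SDK was built and from the application itself, makes mismatches such as a different _GLIBCXX_USE_CXX11_ABI value easier to spot.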

@markmandel
Member

Doing some issue cleanup. Unless there are any objections, I'll close this issue in 3 days.

@Omegastick
Author

Just tried the Dockerfile I was using before again, and can confirm the problem is gone in v1.1.0.

@markmandel markmandel added this to the 1.2.0 milestone Nov 5, 2019
@markmandel markmandel added the area/user-experience Pertaining to developers trying to use Agones, e.g. SDK, installation, etc label Nov 5, 2019
@roberthbailey
Member

I'm glad it's been fixed. :)
