Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Breakpad for crash dump generation #56014

Open
wants to merge 9 commits into
base: master
Choose a base branch
from

Conversation

hhyyrylainen
Copy link

@hhyyrylainen hhyyrylainen commented Dec 17, 2021

I started working on a simple Breakpad crash reporting integration to Godot related to this proposal: godotengine/godot-proposals#1896

Some current caveats:

  • I have not tested this with Godot 4, I originally made this against the 3.4 stable label and cherry picked to this branch (that other branch is here: https://github.com/Revolutionary-Games/godot/tree/crash_dumper_3.x). I only did a very quick compile fix and checked that Godot starts on this branch. There didn't seem to be a working sample project available in the asset library(?) so I couldn't test with a project.
  • There's some cleanup left to do regarding which files to include and which need to be compiled
  • My local install of clang-format doesn't work with the options in this repo so I couldn't format the files with it (clang-format failed with code 1: /home/hhyyrylainen/Projects/godot/.clang-format:151:1: error: unknown key 'SpacesInLineCommentPrefix') update to Fedora 35 fixed this by having a newer clang version
  • I noticed that Windows crash reporter on destruction doesn't disable itself. seems like nothing else either disables it on shutdown. Is that intended? It's not consistent with Linux. For my own testing I added the line to disable it in the destructor, but I didn't commit that here I added this code
  • I haven't yet tested whether the created crashdumps can be decoded with stackwalk (I'm going to work on my build scripts next to make this work, I'm not confident enough with the Godot build process yet to make a PR to the build scripts repo). Also Windows is less tested than Linux Windows crash dumps only work with the special mingw supporting stackwalk, but no special version is required in the Godot repo when building Godot, so this is just a minor inconvenience.
  • I don't have a mac to develop on, so support for that needs to be done later

@bruvzg

This comment has been minimized.

@hhyyrylainen hhyyrylainen marked this pull request as draft December 17, 2021 10:16
@hhyyrylainen
Copy link
Author

Oh yeah, thanks. I misremembered where that button was supposed to be, and couldn't see it there...

modules/breakpad/SCsub Outdated Show resolved Hide resolved
@fire
Copy link
Member

fire commented Dec 17, 2021

I have approved cicd.

modules/breakpad/SCsub Outdated Show resolved Hide resolved
@Calinou Calinou added this to the 4.0 milestone Dec 17, 2021
@fire
Copy link
Member

fire commented Dec 19, 2021

Some more static check failures and try to collapse the commits by squashing. We prefer one unless you have a logical reason (we allow to have a few commits).

@hhyyrylainen
Copy link
Author

I have ran clang-format and renamed that one place where a file was incorrectly referred to in a file copyright header. I haven't pushed those changes yet as I wanted to check the Windows compile and determine if some of the common files are even needed for any platform. I'll do a manual squash as the last thing once I don't need to tweak the code anymore, as that'll get rid of some intermediate files that would otherwise be in git history.

@hhyyrylainen
Copy link
Author

All issues detected by CI should now hopefully be fixed. I also removed the commented out parts in the build config file and did some last tweaks. Also made sure it compiles on Windows, I'm not super confident about how the string conversion should be used in 4.0, in 3.4 I think I was able to figure out it pretty well.

I think this now has all the changes on the engine side that are needed for crash dump generation. Though, I've not yet gotten to test the entire flow of creating a crash dump and being able to decode it. As an additional difficulty Windows MinGW builds seem a bit problematic regarding extracting symbols but this breakpad fork: https://github.com/DaemonEngine/breakpad seems to have a working tool for that.

@hhyyrylainen hhyyrylainen marked this pull request as ready for review December 19, 2021 15:26
@hhyyrylainen
Copy link
Author

hhyyrylainen commented Dec 19, 2021

I've just ran into a pretty big issue, it seems even though I found a tool that can dump the symbols of a mingw created exe on Linux, it doesn't seem to contain any symbols besides the standard library. It doesn't even detect that any of the Godot source files are included.
I tried to use a modified build script with LINKFLAGS=-Wl,--build-id=md5 CCFLAGS=-g CFLAGS=-g CXXFLAGS=-g however that didn't either result in any more found symbols. Is there some option I'm missing to get Windows debug symbols for Godot?
I don't want to fallback to compiling the Windows version on Windows as that basically requires Windows 10 to run correctly (though I suppose officially Windows 8 is only getting one more year of updates from Microsoft), and I have not found a way to make it work on earlier versions. According to all Microsoft documentation the universal C runtime that visual studio 2019 uses, should be installable on older Windows versions, but I haven't seen anyone able to get that working...

@fire
Copy link
Member

fire commented Dec 19, 2021

Let me get back to you but I think I posted a llvm-mingw workflow.

@fire
Copy link
Member

fire commented Dec 19, 2021

Here's the exact command I use.

PATH=/opt/llvm-mingw/bin:$PATH scons werror=no platform=windows target=release_debug -j`nproc` use_lto=no deprecated=no use_mingw=yes use_llvm=yes use_thinlto=yes LINKFLAGS=-Wl,-pdb= CCFLAGS='-g -gcodeview' debug_symbols=no"

The important part is llvm-mingw and LINKFLAGS=-Wl,-pdb= CCFLAGS='-g -gcodeview'

@hhyyrylainen
Copy link
Author

hhyyrylainen commented Dec 19, 2021

Kind of expected, but the Godot podman build image scripts don't install llvm, so I get this error with that:

sh: line 1: x86_64-w64-mingw32-clang++: command not found

it shouldn't be too difficult to modify my local copy to also install llvm in the build image to test out that way of making the build. First I'll try that thinlto one with the mingw gcc to see if that has any effect.

Edit: turning off lto and turning on the thin lto, seems to have increased the symbols file size to 100 MB, which now seems to have quite many Godot symbols in it.

SConstruct Outdated
@@ -135,6 +135,7 @@ opts.Add(BoolVariable("opengl3", "Enable the OpenGL/GLES3 video driver", True))
opts.Add("custom_modules", "A list of comma-separated directory paths containing custom modules to build.", "")
opts.Add(BoolVariable("custom_modules_recursive", "Detect custom modules recursively for each specified path.", True))
opts.Add(BoolVariable("use_volk", "Use the volk library to load the Vulkan loader dynamically", True))
opts.Add(BoolVariable("breakpad_enabled", "Enable Breakpad crash dump creation.", False))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The common pattern is "use_breakpad". Thoughts?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can change it easily enough.

Copy link
Member

@fire fire Dec 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you change this to "use_breakpad" unless you have reasoning against?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I can. I got stuck yesterday trying to fix that problem I commented about (#56014 (comment)), and I didn't want to switch over to changing this as I have a bunch of experimental local changes.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I renamed the scons option and the defines used in C++ as volk seemed to also use that format of preprocessor defines.

@hhyyrylainen
Copy link
Author

I ran into another issue: in release mode on Linux when Godot doesn't register the signal handlers, but instead I leave that to Breakpad, it doesn't work for some reason. I might need to do a workaround where the Godot handlers are always installed, or perhaps the breakpad version I got has a bug, but that would be a pretty serious slip by Google...

@fire
Copy link
Member

fire commented Dec 21, 2021

How do I recreate your bug case? Maybe posting some info here can reveal the answer as I'm not in this area.

@hhyyrylainen
Copy link
Author

After debugging with gdb I think I figured it out. The mono runtime actually messes with the signals. I noticed when putting breakpoints on signal and sigaction that the mono initialization actually overrides the handlers that breakpad installs, however for some reason those signal handlers from mono don't mess with signal handlers installed using the signal function so only breakpad is impacted as it uses the more extensive sigaction function.

Here's where mono messes with the signal handlers:

(gdb) bt
#0  __GI___sigaction (sig=35, act=0x0, oact=0x7fffffffc2d0) at sigaction.c:26
#1  0x0000000000cdcd6d in mono_threads_suspend_search_alternative_signal ()
#2  0x0000000000cdcf65 in mono_threads_suspend_init_signals ()
#3  0x0000000000add5ba in mini_init ()
#4  0x0000000000a353eb in mono_jit_init_version ()
#5  0x0000000000d80bb6 in (anonymous namespace)::gd_initialize_mono_runtime ()
    at modules/mono/mono_gd/gd_mono.cpp:199
#6  GDMono::initialize (this=0x4ce3bf0) at modules/mono/mono_gd/gd_mono.cpp:378
#7  0x0000000001088d1a in CSharpLanguage::init (this=0x4257ec0)
    at modules/mono/csharp_script.cpp:113
#8  0x000000000250b7df in ScriptServer::init_languages ()
    at core/script_language.cpp:170
#9  0x0000000000d4f4bb in Main::setup2 (p_main_tid_override=<optimized out>)
    at main/main.cpp:1489
#10 0x0000000000d547a3 in Main::setup (execpath=<optimized out>, 
    argc=<optimized out>, argv=<optimized out>, p_second_phase=<optimized out>)
    at main/main.cpp:1232
#11 0x00000000009f8469 in main (argc=1, argv=0x7fffffffd9f8)
    at platform/x11/godot_x11.cpp:48

This means that using a workaround of always registering the Godot crash handler signal handlers and passing that onto breakpad, should work. I don't have enough time to test that today, though.

@hhyyrylainen
Copy link
Author

Seems like not even adding code like:

#elif defined(BREAKPAD_ENABLED)

	signal(SIGSEGV, handle_crash);
	signal(SIGFPE, handle_crash);
	signal(SIGILL, handle_crash);

	initialize_breakpad(false);
#endif

works, even though that's the same as in debug mode. Note that the editor properly runs the crash handlers.

So there's definitely something in the Linux Mono runtime when running in release mode exported game that makes it override Godot's and also Breakpad's signal handlers. I doubt that even Godot's own crash handler was able to work in this mode, but then again it's always disabled in the release mode, so I guess no one has encountered this before.

It doesn't seem there's an easily tweakable place in gd_mono.cpp where this happens. So I might just have to put in a workaround in gd_mono.cpp (or some place higher up in the callstack), where I re-initialize Breakpad (on Linux) just after Mono has been loaded to fix the signals it messed up.

@hhyyrylainen
Copy link
Author

Thanks to @Chaosed0 this PR has been updated to be rebased on master and should be working in Godot 4. I haven't personally done a full end to end test as my game currently always crashes on shutdown (due to: #89941) so this wouldn't be helpful to me currently in any case.

Also I didn't yet squash the commits into one as this probably has some tweaks left to do and I was unsure how I should squash things in the case of having multiple authors on a PR.

void report_user_data_dir_usable();

// Due to Mono runtime initialization in release mode overriding signal handlers, Breakpad needs re-initialization after loading it
void report_mono_loaded_to_breakpad();
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like there are some leftover mono bits of the code. I can remove these bits when someone would be able to do a PR review / there's a chance that this PR could be betting any closer to being merged.

thirdparty/README.md Outdated Show resolved Hide resolved
@huwpascoe
Copy link
Contributor

Please could you squash those commits?

@hhyyrylainen
Copy link
Author

What's the policy on squashing commits from multiple authors? Adding co-authored by lines to the squashed commit is required to not lose attribution for the multiple people's work.

Also I didn't squash this yet as it seems unlikely that this would be any more likely to be merged sooner even if I did that. And I kind of suspect that this is going to break at least once more before this can be merged...

I can squash this up if it makes it any more likely that this PR would move forward (and I'd need to know how to handle squashing commits from multiple authors). When is the next possible merge window opening up?

@huwpascoe
Copy link
Contributor

huwpascoe commented Jul 11, 2024

Not sure, but 4.3 is nearly out so this is probably the best time to get it ready to guarantee in 4.4.

Considering how many issues there are with completely worthless crash messages, this PR should be considered top priority, at least I think so. I've been told by @akien-mga that testing PRs are more likely to get them merged. So this can be the pilot PR to prove it.

Please include this when reporting the bug to the project developer.
[1] error(-1): no debug info in PE/COFF executable

programmer humor

Checkout and Build

  • rebase
  • build (but with warnings)

Rebased onto master (383a6e4) without conflicts.
Build Windows 10, x64, MSVC use_breakpad = "yes"

warning STL4043: stdext::checked_array_iterator, stdext::unchecked_array_iterator, and related factory functions are non-Standard extensions and will be removed in the future. std::span (since C++20) and gsl::span can be used instead.
You can define _SILENCE_STDEXT_ARR_ITERS_DEPRECATION_WARNING or _SILENCE_ALL_MS_EXT_DEPRECATION_WARNINGS to suppress this warning.

Test

I placed a bug into a known script-callable function.

#pragma warning(push, 0)
	int *bug = nullptr;
	*bug = 123;
#pragma warning(pop, 0)

After building I removed the pdb to ensure a worthless log.
Tested with both the editor and the running app.

Result

  • pass

In addition to the usual log, a .dmp file was created in "%appdata%\Godot\app_userdata\testproj\crashes"
Opening the .dmp with visual studio pointed to source file and the bug.

@Chaosed0
Copy link
Contributor

Appreciate your time testing the PR. In case it helps speed things along, I really don't care about credit for this - hhyyrylainen did most of the work anyway, and really I just want crash dumps :) so squash my commits as necessary.

@huwpascoe
Copy link
Contributor

@hhyyrylainen
4.3 is out, here's your chance.

@hhyyrylainen
Copy link
Author

Well as far as I know this is still good to go after being fixed in June. So if anyone with merge permissions would like to review this, that'd be pretty great. Or can someone tell me where I'd need to go to ask someone to review this?

This isn't yet super time critical for me as for my game it seems only Godot 4.4 will stop it from crashing 100% of the time, so currently for me these changes are not useful yet.

@hhyyrylainen hhyyrylainen requested a review from fire August 16, 2024 05:52
@huwpascoe
Copy link
Contributor

First it needs to be squashed, then sometime next week I'll start campaigning for this to be merged.

@akien-mga
Copy link
Member

It's not a requirement to squash before it has been reviewed. Once approved, it can be squashed, but there will likely be more work needed once someone does a deep dive to test and review.

Copy link
Member

@Calinou Calinou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested locally, it works as expected.

Example crash dumps: crashes.zip

Example output (I added CRASH_NOW_MSG("Crash message example.") to project_manager.cpp for testing):

ERROR: Crash message example.
   at: _notification (./editor/project_manager.cpp:103)
Crash dump created at: /home/hugo/.local/share/godot/app_userdata/[unnamed project]/crashes/bff06af4-d567-4e06-600635bf-35b8f3ae.dmp

================================================================
handle_crash: Program crashed with signal 4
Engine version: Godot Engine v4.4.dev.custom_build (c1f12813bf9d07cff283fae793aea385272640f9)
Dumping the backtrace. Please include this when reporting the bug to the project developer.
[1] /lib64/libc.so.6(+0x40d00) [0x7ff362c4fd00] (??:0)
[2] bin/godot.linuxbsd.editor.x86_64() [0x4cc0130] (/home/hugo/Documents/Git/godotengine/godot/./editor/project_manager.cpp:103 (discriminator 2))
[3] bin/godot.linuxbsd.editor.x86_64() [0x4cefd9f] (/home/hugo/Documents/Git/godotengine/godot/./editor/project_manager.h:56 (discriminator 14))
[4] bin/godot.linuxbsd.editor.x86_64() [0x9b46534] (/home/hugo/Documents/Git/godotengine/godot/./core/object/object.cpp:873)
[5] bin/godot.linuxbsd.editor.x86_64() [0x5c52cd5] (/home/hugo/Documents/Git/godotengine/godot/./scene/main/node.cpp:279)
[6] bin/godot.linuxbsd.editor.x86_64() [0x5c52c45] (/home/hugo/Documents/Git/godotengine/godot/./core/templates/hash_map.h:455)
[7] bin/godot.linuxbsd.editor.x86_64() [0x5c6b713] (/home/hugo/Documents/Git/godotengine/godot/./scene/main/node.cpp:3204)
[8] bin/godot.linuxbsd.editor.x86_64() [0x5d24c41] (/home/hugo/Documents/Git/godotengine/godot/./scene/main/scene_tree.cpp:473)
[9] bin/godot.linuxbsd.editor.x86_64() [0x2b1c28b] (/home/hugo/Documents/Git/godotengine/godot/./servers/display_server.h:60)
[10] bin/godot.linuxbsd.editor.x86_64() [0x2b11771] (/home/hugo/Documents/Git/godotengine/godot/platform/linuxbsd/godot_linuxbsd.cpp:85)
[11] /lib64/libc.so.6(+0x2a088) [0x7ff362c39088] (??:0)
[12] /lib64/libc.so.6(__libc_start_main+0x8b) [0x7ff362c3914b] (??:0)
[13] bin/godot.linuxbsd.editor.x86_64() [0x2b11565] (??:?)
-- END OF BACKTRACE --
================================================================
[1]    101724 IOT instruction (core dumped)  bin/godot.linuxbsd.editor.x86_64

Some comments:

  • When the crash happens within the editor or project manager, the dump is saved in a project-specific folder. It should go to the editor data folder instead (EditorPaths::get_data_dir()).

  • Could the dump filename be prefixed with an ISO 8601 timestamp? This would allow it to be sorted by date even if filesystem timestamps get overwritten for some reason (or when sharing the file to others).

    • For reference, we use YYYY-MM-DDTHHMMSS for editor screenshots (where T is a literal "T"). I suggest using the same format for consistency.
  • I suggest rewording the message to make it easier to understand for people not familiar with Breakpad:

Breakpad crash dump created at: /path/to/crash.dmp
Please attach this file when reporting issues to the project developer.
  • Interestingly, GitHub doesn't allow uploading .dmp files in issue comments, even though the error message says it's one of the allowed extensions:

We don’t support that file type.
Try again with GIF, JPEG, JPG, MOV, MP4, PNG, SVG, WEBM, CPUPROFILE, CSV, DMP, DOCX, FODG, FODP, FODS, FODT, GZ, JSON, JSONC, LOG, MD, ODF, ODG, ODP, ODS, ODT, PATCH, PDF, PPTX, TGZ, TXT, XLS, XLSX or ZIP.

  • use_breakpad is False by default, which means this feature is disabled by default. This might surprise people compiling from source, so maybe it should be enabled by default? I'm not sure, since official builds won't be able to benefit from it until debugging symbols are distributed or at least preserved somewhere.

  • Is the use_breakpad SCons option needed when there's module_breakpad_enabled, which could default to False? We do something similar for C#, where module_mono_enabled is False by default.

Comment on lines +14 to +16
# TODO: find out when these are needed (if at all)
dwarf_module = False
stabs_module = False
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did anyone find that these were needed in the end? If not, I'd remove the relevant code (and perhaps keep a second branch on your fork with this code present).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I remember right, no one has tried this out on mac yet. So there is a chance that these would be required on mac.

If it is determined necessary to get this PR merged, I now have a mac I can use for development so I can check things there to make sure the mac version compiles and can create crash dumps.

@hhyyrylainen
Copy link
Author

@Calinou thanks for the review. Most of that sounds doable.

When the crash happens within the editor or project manager, the dump is saved in a project-specific folder. It should go to the editor data folder instead (EditorPaths::get_data_dir()).

Sounds easily doable. I'll make changes soon (I'm preparing a new release of my game this week which I has to get done this week so that takes priority).

Could the dump filename be prefixed with an ISO 8601 timestamp? This would allow it to be sorted by date even if filesystem timestamps get overwritten for some reason (or when sharing the file to others).

I think this is going to be pretty difficult as I just tried looking and it doesn't seem like breakpad allows overwriting the file name. If I found the right place in the Breakpad code then a crash dump is always generated with a random GUID as the name. So it might be that the only way to update that part is to patch the library itself.

If I remember right, minidumps should have the crash time in the headers, so the actual crash time can be verified by reading the minidump file headers. And I did take a quick peek at the Breakpad code and it looks like it has some header update logic in the minidump writer class that updates the time: header.get()->time_date_stamp = time(NULL);

I suggest rewording the message to make it easier to understand for people not familiar with Breakpad:

I'll do this. Also if it isn't a bad idea, I'll add a tiny bit of delay like half a second (and trying to flush output) before quitting after printing that message. I'm hoping it helps with my usecase where a parent process reading the output for some reason never sees the crash dump info, though it does always appear in terminal. Also could slightly help with command prompt usage if it closes with a slight delay there's a chance the user kind of can see that something was printed.

use_breakpad is False by default, which means this feature is disabled by default. This might surprise people compiling from source, so maybe it should be enabled by default? I'm not sure, since official builds won't be able to benefit from it until godotengine/godot-proposals#1342 or at least preserved somewhere.

I guess that would be fine, but the official builds should not be compiled with Breakpad enabled until symbols can be distributed, because otherwise exported games will create crash dumps that no one will be able to decode. So I think those would cause more confusion than help with anything. If there's some way to default to on only when compiling outside the official release builds, I could do that.

Is the use_breakpad SCons option needed when there's module_breakpad_enabled, which could default to False? We do something similar for C#, where module_mono_enabled is False by default.

I didn't actually know that there was a separate mechanism already to enable / disable parts of the engine. I think I saw some other feature also having a similar SCons option so I followed that, but I can switch to just having a module_breakpad_enabled as the toggle for this feature.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants