Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: use WER for GOTRACEBACK=wer on Windows #57441

Closed
qmuntal opened this issue Dec 22, 2022 · 47 comments
Closed

runtime: use WER for GOTRACEBACK=wer on Windows #57441

qmuntal opened this issue Dec 22, 2022 · 47 comments

Comments

@qmuntal
Copy link
Member

qmuntal commented Dec 22, 2022

Edit: proposal shifted to enable WER using GOTRACEBACK=wer. See #57441 (comment) and #57441 (comment) for more context.

WER is a Windows built-in infrastructure provided by Microsoft that helps troubleshooting hardware and software problems. Many systems rely on the core dumps generated by WER for post-mortem analyses. These dumps are collected by a dedicated process running on the host and uploaded to the WER server and\or stored locally (see Collecting User-Mode Dumps).

#20498 added support for WER when setting GOTRACEBACK=crash, but was reverted and replaced by #49471. The reason to revert it was because WER uploads dumps to Microsoft by default, and this bit was not discussed in the original issue. See this comment #49471 (comment) to understand why I'm counterproposing #49471 after implementing it.

Here I'm proposing to revert the revert, enabling WER when GOTRACEBACK=crash is set. The arguments come next, but first the definition of the crash mode:

GOTRACEBACK=crash is like “system” but crashes in an operating system-specific manner instead of exiting. For example, on Unix systems, the crash raises SIGABRT to trigger a core dump.

I have two simple arguments in favor of WER:

  • WER is the Windows specific manner to crash applications. It is the default experience and many production systems rely on WER to do post-mortem analysis, not only from within Microsoft.

  • WER can be used for collecting user-mode dumps without adding any complexity to the Go runtime.

It's true that some users might not want their diagnosis data to be uploaded to Microsoft servers, but my point is that they already have enough knobs to turn it off or limit what's uploaded:

  • GOTRACEBACK=crash is not the default runtime behavior, one have to manually set via environment variable or programmatically via debug.SetTraceback. We can inform about the WER gotchas in the release notes and in the GOTRACEBACK docs.
  • On Windows, GOTRACEBACK=crash does nothing over GOTRACEBACK=system. If this proposal is approved, the only reason to set GOTRACEBACK=crash would be to opt into WER.
  • Privacy-concerned users will most likely opt-out to uploading diagnosis data when setting up windows for the first time, which disables WER.
  • WER can easily be disabled by running Disable-WindowsErrorReporting on PowerShell. One can also disable just some applications using the system registry.

Side note: #20498 is a bug report, not a proposal, so I think it makes sense to create this as a formal proposal.

@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Dec 22, 2022
@ianlancetaylor ianlancetaylor changed the title runtime: use WER for GOTRACEBACK=crash on Windows proposal: runtime: use WER for GOTRACEBACK=crash on Windows Dec 22, 2022
@gopherbot gopherbot added this to the Proposal milestone Dec 22, 2022
@ianlancetaylor ianlancetaylor moved this to Incoming in Proposals Dec 22, 2022
@ianlancetaylor
Copy link
Member

CC @zx2c4 @zhizheng052 @bhiggins @alexbrainman @aarzilli @aclements @golang/windows

@zx2c4
Copy link
Contributor

zx2c4 commented Dec 22, 2022

Wasn't this already discussed and rejected in an earlier proposal? The privacy concerns seem real, especially given how hard opting out has become on Windows, and how little trust exists on the user side these days...

@ianlancetaylor
Copy link
Member

This is my personal summary: The earlier attempt was removed due to privacy concerns (which you raised on https://go.dev/cl/307372 -- thanks). Then we decided to adopt minidump in #49471. However, the actual implementation of minidump (https://go.dev/cl/458955) turns out to have some problems (see #49471 (comment)). So this proposal suggests that we reconsider WER despite the privacy issues.

That is, I don't think we made a clear decision that we could not live with the privacy issues, because it seemed that minidump was a viable alternative. Now that we have learned more, it's possible that minidump is not a viable alternative. So let's consider whether we can live with the privacy issues.

I don't have an opinion myself.

@bhiggins
Copy link

IMO, it would be nice to have (and simple to implement) an opt-in only means for Go programs on Windows to work with WER. In addition to being opt-in only, there could be a warning in documentation that states that it's possible for WER to be configured to send crash dumps to Microsoft.

@qmuntal
Copy link
Member Author

qmuntal commented Dec 23, 2022

The privacy concerns seem real, especially given how hard opting out has become on Windows, and how little trust exists on the user side these days.

If this proposal is approved, WER would still be opt-in, so only users that need it would have it.

Opting out is just one PS command, Disable-WindowsErrorReporting, so I wouldn't say its hard. Additionally, the previous command only disables uploading dumps to the server, one can still use WER to generate local dumps.

@alexbrainman
Copy link
Member

@qmuntal Thank you for creating this proposal. I agree with your sentiment.

I always supported this functionality #49471 (comment) .

And it looks silly that Delve supports Go Windows minidumps, but Go tools do not provide tools to generate these dumps (see #49471 (comment) ).

Alex

@zx2c4
Copy link
Contributor

zx2c4 commented Dec 25, 2022

My process address space can contain private keys. I can't put those in special memory regions because Go's GC. Ordinary users use my software who can't be fiddling around with powershell. Sometimes my software crashes. When this happens, uploading private keys to some error reporting server is a nonstarter. It must not happen. I can't imagine I'm the only one writing software with this situation, and I wouldn't want to choose the dangerous thing for other developers who might not even be aware that their private keys are sensitive - like casual users of TLS/HTTPS.

@aarzilli
Copy link
Contributor

aarzilli commented Dec 25, 2022

My opinion remains as in #49471 (comment) (i.e. in favor of this).

@bhiggins
Copy link

bhiggins commented Dec 25, 2022 via email

@qmuntal
Copy link
Member Author

qmuntal commented Dec 25, 2022

Can we go with an explicit opt-in method to crash in a WER-compatible way
(which really just means not swallowing the error)? This could involve a
new GOTRACEBACK=crash-wer option and/or a runtime function.

We could, but I don't see how crash-wer would bring any benefit over crash, considering that the later is not currently implemented on Windows. It would add confusion to an already overloaded knob.

@zx2c4 ordinary users will not disable WER using a PS command, but neitheir will set GOTRACEBACK=crash unless someone guides them. I completly understand your privacy concerns, but I don't get why WER being opt-in is not enough of a mitigation.

Additionally, by default WER does not upload the full virtual memory, just the data segment, which can contain global variables, but not the heap nor the stack.

@iglendd
Copy link

iglendd commented Dec 27, 2022

Thank you @qmuntal again for bring it forward again. I would actually argue that the proposal title is a bit misnomer and least as far as intent of the proposal is concerned. The proposal effectively intends to implement Go documented behavior “crashes in an operating system-specific manner”. No more nor less (never mind that this is on par with other platforms). On Windows it means either WER, custom crash handling, debugger activation, documented and used report entry in Windows Event log and usage of built-in and 3rd party tools. Another word, it is expected behavior for bazillion of applications running on Windows for many decades. Besides, it is a Go-runtime debugging and troubleshooting feature. If one is concerned with some side-effects for whatever reasons, do not activate it.

@danicat
Copy link
Contributor

danicat commented Jan 3, 2023

I got to this topic from #20498 and I've spend the last hour or so trying to catch up with all the back and forths, so I'm sorry if I'm missing the point, but I just wanted to give my 2 cents.

As @qmuntal expressed in their opening statement, the documentation on GOTRACEBACK=crash is very opinionated:

GOTRACEBACK=crash is like “system” but crashes in an operating system-specific manner instead of exiting. For example, on Unix systems, the crash raises SIGABRT to trigger a core dump.

Based on that, IMHO, whether the SO sends or sends not the crash information to Microsoft is out of context from a Go program perspective. Microsoft says in their documentation:

Beginning with Windows Vista, Windows provides crash, non-response, and kernel fault error reporting by default without requiring changes to your application.

Then they further elaborate:

Because Windows automatically reports unhandled exceptions, application should not handle fatal exceptions.

First impression is that the decision to send or not send WER data to MS is in the scope of the system admin, not the developers. It is a bit old, but they document the controls of this feature in this article here.

Also, it is not clear to me if the way runtime is currently treating error is effectively "handling fatal exceptions" from a Windows perspective and that's why we are losing this functionality. If that's the case, GOTRACEBACK=crash should only disable any special treatment we do and let the application... er... crash. :) [By that I mean I wouldn't try to add any side effects like generating different types of dumps other than the default]

Another concern is that we are not generating Event Log ID 1000, as previously mentioned by @iglendd in #49471 (comment). This was the main reason why I got into this rabbit hole.

Finally, I'm more than happy to help with any coding if needed. I'm way too invested in this now :)

@kevpar
Copy link
Contributor

kevpar commented Feb 17, 2023

Wanted to chime in with some additional support for this proposal. I think it is worth considering the value of Windows local crash dump handling with WER, which is somewhat separate from uploading of reports to Microsoft (and the latter can be disabled, as @qmuntal points out).

  • WER allows local crash dumps to be collected and stored in a directory. This is documented at Collecting User-Mode Dumps.
  • WER allows the user to register their own application that is invoked when another application crashes. For instance, you could write a program that collects your own dumps, or hooks up a debugger to the crashing process. This is documented at Enabling Postmortem Debugging.

As mentioned above, the current approach of calling MiniDumpWriteDump from the crashing application can deadlock the process. It can also result in corrupted dumps due to the state of the process changing while the dump is being collected. So I think it is worth supporting a mechanism that uses the system facility for handling crashes.

@rsc rsc moved this from Incoming to Active in Proposals Mar 1, 2023
@rsc
Copy link
Contributor

rsc commented Mar 1, 2023

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@rsc
Copy link
Contributor

rsc commented Mar 8, 2023

GOTRACEBACK=crash does not upload things on any system today. It sounds like we should preserve that property, even if it's not the default privacy profile on Microsoft systems. It's also the case that code might contain debug.SetTraceback("crash") intending core dumps on Unix and then accidentally get reports to Microsoft on Windows.

Is there some way to force the WER code to write the local crash but not upload?

If not, then adding GOTRACEBACK=wer seems OK and very clear. Worth noting that debug.SetTraceback("wer") in a library would be interesting and maybe unexpected, but libraries are unlikely to do that.

@qmuntal
Copy link
Member Author

qmuntal commented Mar 8, 2023

Is there some way to force the WER code to write the local crash but not upload?

It is possible with admin permissions either running a PS script or modifying the registry, but AFAIK not possible within a process.

If not, then adding GOTRACEBACK=wer seems OK and very clear. Worth noting that debug.SetTraceback("wer") in a library would be interesting and maybe unexpected, but libraries are unlikely to do that.

I'm not particularly attached to crash, I just wanted to avoid having to set two knobs. GOTRACEBACK=wer looks clean and the intent is 100% clear, so I'm OK with this solution, which ideally should act like crash but enabling WER. Edit: I still think crash is a better fit, but I understand that it can't be used in this context if the not-upload-things-by-default property should be preserved.

@iglendd
Copy link

iglendd commented Mar 8, 2023

Regarding GOTRACEBACK=crash vs GOTRACEBACK=wer I guess whatever works is fine but to me crash is better then wer to be honest. I guess I am with qmuntal on that point.

Regarding privacy and upload concerns I would argue that this is a completely orthogonal issue. Semantically and conceptually we are following exactly the same logic and behavior as on Unix - allowing system to decide what to do. And we would follow Go documentation literally (https://pkg.go.dev/runtime)

GOTRACEBACK=crash is like “system” but crashes in an operating system-specific manner instead of exiting. For example, on Unix systems, the crash raises SIGABRT to trigger a core dump.

In addition, it is not something a Go process would normally do. A user (or rather a developer) would set it up only for troubleshooting just to get their hands on a crash dump to find out a root cause. There are, by the way, more than a few ways to capture crash dump on Windows. In addition to WER one can also use external application like procdump or Dr.Watson, one can attach register post-mortem or debugger (and all of them are based on the existence of unhandled exception). And so, IMHO, the term wer is a bit ambiguous and misleading in this context. I think the term crash is more aptly describes intent,. behavior and semantics for both Go and system side.🙏

@rsc
Copy link
Contributor

rsc commented Mar 15, 2023

We can always take additional steps later, but for now it seems like the simplest, most minimal step is to make GOTRACEBACK=wer use WER and GOTRACEBACK=crash keep doing what it does, which is to be the same as GOTRACEBACK=system. We can see how well that works, it will provide WER for people who need WER, and we can fine-tune later if more evidence accumulates.

On non-Windows systems, GOTRACEBACK=wer would be interpreted the same as GOTRACEBACK=somethingunexpected, which is to say it would be ignored.

Do I have that right?

@aclements
Copy link
Member

On non-Windows systems, GOTRACEBACK=wer would be interpreted the same as GOTRACEBACK=somethingunexpected, which is to say it would be ignored.

It turns out GOTRACEBACK=somethingunexpected has the effect of suppressing all tracebacks, but the code that decides that is so labyrinthine that I don't think that behavior is intentional. It certainly doesn't seem like the right behavior to me.

@qmuntal
Copy link
Member Author

qmuntal commented Mar 15, 2023

Do I have that right?

Yep.

It turns out GOTRACEBACK=somethingunexpected has the effect of suppressing all tracebacks, but the code that decides that is so labyrinthine that I don't think that behavior is intentional. It certainly doesn't seem like the right behavior to me.

I tried to understand how the effective traceback level is selected, and I was also stunned with its complexity.

@rsc
Copy link
Contributor

rsc commented Mar 29, 2023

Based on the discussion above, this proposal seems like a likely accept.
— rsc for the proposal review group

@rsc rsc moved this from Active to Likely Accept in Proposals Mar 29, 2023
@rsc
Copy link
Contributor

rsc commented Apr 6, 2023

No change in consensus, so accepted. 🎉
This issue now tracks the work of implementing the proposal.
— rsc for the proposal review group

@rsc rsc moved this from Likely Accept to Accepted in Proposals Apr 6, 2023
@rsc rsc changed the title proposal: runtime: use WER for GOTRACEBACK=wer on Windows runtime: use WER for GOTRACEBACK=wer on Windows Apr 6, 2023
@rsc rsc modified the milestones: Proposal, Backlog Apr 6, 2023
@qmuntal
Copy link
Member Author

qmuntal commented Apr 6, 2023

I've updated https://go.dev/cl/474915 so it implements final version of this proposal.

@ChristoWolf
Copy link

To clarify, will this be part of Go 1.21?
And I guess that wer can currently (i.e. Go 1.20) also not be used with debug.SetTraceback, right?

@qmuntal qmuntal modified the milestones: Backlog, Go1.21 Apr 14, 2023
@qmuntal
Copy link
Member Author

qmuntal commented Apr 14, 2023

To clarify, will this be part of Go 1.21? And I guess that wer can currently (i.e. Go 1.20) also not be used with debug.SetTraceback, right?

Yep, support for wer in GOTRACEBACK and debug.SetTraceback will be available in the next release, go1.21.

@ChristoWolf
Copy link

ChristoWolf commented Apr 14, 2023

@qmuntal: Wow that was a quick response, thanks!
Sounds great, I am excited :)

Sorry for the probably stupid question, I couldn't fully grasp all the information I found regarding this, but is there currently a way to automatically create mini dumps of crashing Go applications on Windows?

@aarzilli
Copy link
Contributor

You can build go 1.21 from source and use it.

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/499515 mentions this issue: doc: document WER and SEH changes in Go 1.21

gopherbot pushed a commit that referenced this issue May 31, 2023
While here, I've removed the CL 472195 TODO, which I marked as
RELNOTE=yes by mistake.

For #57441
For #57302

Change-Id: I7563140bf307f8d732f0154d02b8fa0735527323
Reviewed-on: https://go-review.googlesource.com/c/go/+/499515
Reviewed-by: Ian Lance Taylor <[email protected]>
TryBot-Result: Gopher Robot <[email protected]>
Reviewed-by: Michael Knyszek <[email protected]>
Run-TryBot: Quim Muntal <[email protected]>
JonasProgrammer pushed a commit to nhochdrei/go that referenced this issue Feb 20, 2024
GOTRACEBACK=wer is a new traceback level that acts as "crash" and
also enables WER. The same effect can be achieved using
debug.SetTraceback("wer").

The Go runtime currently crashes using exit(2), which bypasses WER
even if it is enabled. To best way to trigger WER is calling
RaiseFailFastException [1] instead, which internally launches the
WER machinery.

This CL also changes how GOTRACEBACK=crash crashes, so both "wer" and
"crash" crash using RaiseFailFastException, which simplifies the
implementation and resolves a longstanding TODO.

Fixes golang#57441
Fixes golang#20498

[1] https://learn.microsoft.com/en-us/windows/win32/api/errhandlingapi/nf-errhandlingapi-raisefailfastexception

Change-Id: I45669d619fbbd2f6413ce5e5f08425ed1d9aeb64
Reviewed-on: https://go-review.googlesource.com/c/go/+/474915
Reviewed-by: Davis Goodin <[email protected]>
TryBot-Result: Gopher Robot <[email protected]>
Reviewed-by: David Chase <[email protected]>
Reviewed-by: Than McIntosh <[email protected]>
Reviewed-by: Alex Brainman <[email protected]>
Run-TryBot: Quim Muntal <[email protected]>
@rsc rsc removed this from Proposals Apr 10, 2024
@golang golang locked and limited conversation to collaborators May 30, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests