Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement tooling support to connect using custom IPC channels. #2374

Merged

Conversation

lateralusX
Copy link
Member

@lateralusX lateralusX commented Jun 17, 2021

dsrouter presents same default IPC channel to tools as a regular coreclr process so there is a risk of collision with its own coreclr runtime default IPC channel (if used).

dsrouter supports custom named IPC channels using its -ipcs argument, but current tooling can't connect to custom named IPC channels, only supporting the default IPC channel through use of --process-id or --name arguments.

This PR extend tooling support in dotnet-counters and dotnet-trace extending the --diagnostic-port command accepting custom IPC channels to connect to, instead of default channels based on process id.

@lateralusX lateralusX requested a review from a team as a code owner June 17, 2021 14:37
@lateralusX
Copy link
Member Author

//CC @josalem

@lateralusX
Copy link
Member Author

lateralusX commented Jun 18, 2021

Thanks @jander-msft, @noahfalk for all feedback so far!

Feels like I need to rebase thinking a little. Motivation of "replacing" the default IPC channel in dsrouter was to make it transparent to users running diagnostic tooling with --process-id or --name, and tooling would take care of getting all traffic over to mobile device (or some other container or remote running app).

dsrouter can run using a custom IPC channel using its -ipcs argument, but currently diagnostic tooling like dotnet-trace can't connect to anything expect the default IPC channel.

We could always rely on -ipcs and add option in dsrouter to connect towards that channel, but that "complicates" the default scenario for dsrouter since it adds additional config both when launching dsrouter as well as running diagnostic tooling.

Since it looks like we can't do any decent heuristics in diagnostic tooling to connect to a different named IPC channel if it exists (due to potential error cases), maybe we should add a simple way to inform diagnostic tooling that user intends to connect to dsrouter's default IPC channel instead of runtimes default IPC channel in targeted process. If we make that argument "light" it won't be to much for the user to type, possible drawback/error case, user misses to include argument and start to trace dsrouter instead of remote target and won't see any difference, so that not optimal either.

So that brings us back to a situation where the tooling needs to know that it's going to connect to dsrouter and pick the right default IPC channel to do the job, and doing that automatically would have simplified things and reduced errors.

So based on all above, this is where I'm currently at.

Open up so diagnostic tooling can connect "custom" named IPC channels, not just the default. We already have --diagnostic-port for the reverse connect scenario, so we will need something additional telling tooling to connect using the --diagnostic-port and not setup a reverse diagnostic server in the tool, to keep arguments short, maybe we should just have a --connect option to connect instead of listen. This should be straight forward and a generic extension to all tooling.

Teach tooling about dsrouter and that the default IPC channel is named differently, since dsrouter is an important component in the diagnostics toolset, I believe it is OK to include some specific knowledge about it into the tools. Maybe a more "safe" approach could include the following heuristics in TryGetDefaultAddress:

  • Check name of process and if its dotnet-dsrouter* check for dsrouter IPC channel.
  • If above checks fail, fallback to default IPC channel.

This will only get us into looking for a different named IPC channel if the process is named dotnet-dsrouter*. This won't work if dsrouter was executed using dotnet run, but that is not how the tooling gets installed by end users and running tooling using dotnet run already breaks other process discovery scenarios, like using --name, so think it could be a decent approach.

If heuristics fails there is always the option to start dsrouter using -ipcs and then pass --diagnostic-port and --connect arguments to tooling, but that should only be needed in cases where dsrouter is not run using dotnet-dsrouter command.

Possible drawback with this is that the user won't know to what IPC channel the tooling is connecting, and that is the whole point, but if we would need to run diagnostics on dsrouter itself, then we need to bypass the heuristics, so maybe that again speaks for a argument to explicit tell to connect against dsrouters IPC channel, like expanding proposed --connect to take arguments, like default|dsrouter.

So if user want to connect to dsrouter's default IPC channel, they just pass --connect dsrouter, but still use --process-id or --name as before.

If you user want to connect towards a custom IPC channel, then use --diagnostic-port mycustomchannel --connect.

I draft that up in this PR and we can take discussion from there.

@noahfalk
Copy link
Member

Teach tooling about dsrouter and that the default IPC channel is named differently

Is the goal behind this to avoid users unintentionally tracing the proxy app when they meant to trace the app being proxied? What if we left the default IPC channel alone and made the tools give a warning like this when they recognized the process name:

> dotnet-trace --processId 1234
ProcessId 1234 is a known proxy 'dotnet-dsrouter'.
If you intended to connect to the app being proxied use the --connect argument instead with the
proxy's advertised address.
If you intended to diagnose the proxy itself, re-run this command with --force to ignore this warning.

If you user want to connect towards a custom IPC channel, then use --diagnostic-port mycustomchannel --connect.

What if we defined that --connect mycustomchannel is the more compact way of expressing the same thing? Do folks think users would find this more confusing?

possible drawback/error case, user misses to include argument and start to trace dsrouter instead of remote target and won't see any difference, so that not optimal either

Hopefully we can avoid having one command be a subset of the other. For example no forgotten argument or typo would convert dotnet-trace --connect mycustomchannel into dotnet-trace --processId 1234. Certainly confusing the two scenarios remains a possibility but I think we've got options to help people avoid that mistake if it proved to be a problem.

Open up so diagnostic tooling can connect "custom" named IPC channels

This sounds good to me. Even without dsrouter I expect it was only a matter of time. For example it would be useful to tunnel across container boundaries.

@lateralusX
Copy link
Member Author

Is the goal behind this to avoid users unintentionally tracing the proxy app when they meant to trace the app being proxied? What if we left the default IPC channel alone and made the tools give a warning like this when they recognized the process name:

> dotnet-trace --processId 1234
ProcessId 1234 is a known proxy 'dotnet-dsrouter'.
If you intended to connect to the app being proxied use the --connect argument instead with the
proxy's advertised address.
If you intended to diagnose the proxy itself, re-run this command with --force to ignore this warning.

Yeah, sounds reasonable, was hoping we could have this scenario a little more transparent to the end users, but that in turn opens up a range of other problems, so a warning above would at least make sure users are doing the right thing and since dsrouter already have a default or custom IPC channel printed on startup, that could be used with --connect argument.

@josalem
Copy link
Contributor

josalem commented Jun 18, 2021

I think Noah's points are all valid. I also think having too many ways of specifying a connection point might end up getting confusing, though. I would hope we can keep the tools from having 1:1 connection specifying flags to runtime connection options. Perhaps we should look at unifying things under a specific flag with metadata on it?

I'm imagining something like:

# default patterns
dotnet trace collect -p <pid>
dotnet trace collect -n <app name>

# more complex patterns
dotnet trace collect --address tcp://localhost:1234,connect
dotnet trace collect --address tcp://localhost:1234,listen
dotnet trace collect --address uds:///Users/joshmo/myunixdomain.sock,connect
dotnet trace collect --address uds:///Users/joshmo/myunixdomain.sock,listen
dotnet trace collect --address namedpipe://./pipes/mynamedpipe,connect
dotnet trace collect --address namedpipe://./pipes/mynamedpipe,listen

This gives us a framework for adding more complex connection scenarios without necessarily adding new flags or more complex user invocations at the cost of a slightly more complex config parsing phase on our part. We could add as many tags to the end of the address as we want without necessarily increasing the cognitive load of specifying a simple scenario as long as our defaults are reasonable. We get the added benefit of the address syntax matching the syntax for the DOTNET_DiagnosticPorts as well. We could also reuse the --diagnostic-port flag and just update the way it is used. I understand the case for not making a breaking change to usage patterns, but we could accept both forms for a version or two and print a warning about the behavior changing.

CC @shirhatti in case you have thoughts from a user E2E perspective.

@lateralusX
Copy link
Member Author

I had similar thoughts combining this into the same argument, like how our DOTNET_DiagnosticPorts work, but was a little afraid that the user experience would suffer making it to complex, if we are going to combine it maybe we should just use the one we already have diagnostic-port and just add connect argument to it, listen will be default to keep current default behaviour.

@noahfalk
Copy link
Member

I'm imagining something like:

Were you intentionally changing --diagnostic-port to --address? It felt like we could do the same thing using the argument we already have and there would be no need for anyone to learn a new one?

# default patterns
dotnet trace collect -p <pid>
dotnet trace collect -n <app name>

# more complex patterns
dotnet trace collect --diagnostic-port tcp://localhost:1234,connect
dotnet trace collect --diagnostic-port tcp://localhost:1234,listen
dotnet trace collect --diagnostic-port uds:///Users/joshmo/myunixdomain.sock,connect
dotnet trace collect --diagnostic-port uds:///Users/joshmo/myunixdomain.sock,listen
dotnet trace collect --diagnostic-port namedpipe://./pipes/mynamedpipe,connect
dotnet trace collect --diagnostic-port namedpipe://./pipes/mynamedpipe,listen

# I assume if we used --diagnostic-port we would still support the shortened address forms accepted elsewhere?
dotnet trace collect --diagnostic-port localhost:1234
dotnet trace collect --diagnostic-port /Users/joshmo/myunixdomain.sock
dotnet trace collect --diagnostic-port //./pipes/mynamedpipe

@lateralusX
Copy link
Member Author

lateralusX commented Jun 18, 2021

I also believe it needs to have good defaults, so users don't need complex address formats unless really needed.

@josalem
Copy link
Contributor

josalem commented Jun 21, 2021

Were you intentionally changing --diagnostic-port to --address?

When I wrote that I was trying to avoid a breaking change to how --diagnostic-port is used, but I'm also okay reusing that flag as well. It seems perfectly reasonable to me.

I assume if we used --diagnostic-port we would still support the shortened address forms accepted elsewhere?

That's how I imagine it as well.

We would make assumptions based on platform and have reasonable default values. e.g., on Windows we would assume a schema of namedpipe:// and on non-Windows we would assume a schema of uds://, etc. Same goes for tags, e.g., we would assume connectlisten if unspecified to preserve existing behavior of the tools.

@jander-msft
Copy link
Member

Same goes for tags, e.g., we would assume connect if unspecified to preserve existing behavior of the tools.

You mean listen (establishes its own server to listen for runtimes to connect to it) is the default, correct? I know that dotnet-monitor implicitly uses listen when using --diagnostic-port, and I believe dotnet-counters/trace do the same.

@josalem
Copy link
Contributor

josalem commented Jun 21, 2021

ah, yes, you're right. The default for the --diagnostic-port flag would be the listen tag since using that flag would make the tool start a Diagnostics IPC server.

@lateralusX
Copy link
Member Author

Sound like we are inline with changes needed. I will adjust this PR accordingly.

@lateralusX lateralusX changed the title Check for dsrouter IPC channel collision and fallback to different default. Implement tooling support to connect using custom IPC channels. Jun 22, 2021
@lateralusX
Copy link
Member Author

First set of changes done, I adjusted the tools that already supported --diagnostic-port argument, dotnet-trace and dotnet-counters to handle extended format, I also added dotnet-stack into the mix, other tools (like dotnet-gcdump, dotnet-dump) are still not touched since they didn't support --diagnostic-port in first place (and we don't currently support them on MonoVM).

I didn't add any protocol parsing logic to reduce the initial changes, so right now no protocol prefix is handled and you only use the default protocol for the platform (namedpipe or uds).

The default for --diagnostic-port, if nothing else is specified is listen mode, to be backward compatible, and I added support for connect mode. Parsing logic is inline with parsing of DOTNET_DiagnosticPorts in runtime, so:

--diagnostic-port mycustomchannel,connect

will tell the tool to connect to a IPC channel called mycustomchannel.

and if you just run like before:

--diagnostic-port mycustomchannel

means tool will setup a reverse IPC server and listen for incoming connections, indentical to how --diagnostic-port works today and is equivalent to:

--diagnostic-port mycustomchannel,listen

@lateralusX lateralusX force-pushed the lateralusX/check-for-dsrouter-ipc-channel branch from caacef9 to 355edfb Compare June 22, 2021 16:42
@lateralusX lateralusX force-pushed the lateralusX/check-for-dsrouter-ipc-channel branch 2 times, most recently from 75f0125 to d28a6dc Compare June 23, 2021 13:00
@lateralusX
Copy link
Member Author

lateralusX commented Jun 23, 2021

Added schema parsing rules to --diagnostic-port:

Default will be platform specific IPC tech, like namedpipe or uds, so not needed to add schema prefix to path, but if done, it needs to follow regular URI syntax as defined by https://datatracker.ietf.org/doc/html/rfc3986 and implemented by System.Uri.

The following is accepted:

UDS

myport => myport
uds:myport => myport
/User/mrx/myport.sock => /User/mrx/myport.sock
uds:/User/mrx/myport.sock => /User/mrx/myport.sock
uds://authority/User/mrx/myport.sock => /User/mrx/myport.sock
uds:///User/mrx/myport.sock => /User/mrx/myport.sock

NamedPipe

myport => myport
namedpipe:myport => myport
\\.\pipe\myport => myport (dropping \\.\pipe\ is inline with implemented namedpipe client/server)
namedpipe://./pipe/myport => myport (dropping authority and /pipe/ is inline with implemented namedpipe client/server)
namedpipe:/pipe/myport  => myport (dropping /pipe/ is inline with implemented namedpipe client/server)
namedpipe://authority/myport => myport
namedpipe:///myport => myport

Unsupported schema will fail parsing.

NOTE, using schema:// is normally used when having an authority as part of the URI, for namedpipe . is considered localhost, but nothing similar for uds. Since the authority will be dropped for namedpipe and uds, they could in practice be anything and we don't enforce any rules on it.

@lateralusX lateralusX force-pushed the lateralusX/check-for-dsrouter-ipc-channel branch 3 times, most recently from e615e92 to 318f390 Compare June 24, 2021 13:46
Copy link
Contributor

@josalem josalem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple comments

@lateralusX
Copy link
Member Author

@noahfalk @jander-msft @josalem Thank you so much for all great feedback! I believe I have worked through most of them and implemented changes accordingly.

@noahfalk
Copy link
Member

Thank you so much for all great feedback!

Np! Thanks for implementing it all : D

Copy link
Contributor

@josalem josalem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Took a look through your changes. LGTM :) thanks!

@lateralusX lateralusX force-pushed the lateralusX/check-for-dsrouter-ipc-channel branch from b7d4255 to 1a68f7c Compare July 7, 2021 18:25
@lateralusX lateralusX merged commit 2a4c122 into dotnet:main Jul 8, 2021
@github-actions github-actions bot locked and limited conversation to collaborators Jan 15, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants