Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add runtime resume option to dotnet-trace and dotnet-counters. #2343

Merged
merged 3 commits into from
Jun 18, 2021

Conversation

lateralusX
Copy link
Member

@lateralusX lateralusX commented Jun 6, 2021

Mono has option to push diagnostic server TCP/IP listener to device loopback interface. This opens up the ability to run dotnet-dsrouter in server-client mode, where diagnostic tooling connect using regular IPC server (using --process-id) and dsrouter will connect runtime as a TCP/IP client. Users/tooling use adb (on Android) or mlaunch (for all iOS devices), to do port forwarding/tcp tunnelling working in the same way on simulator/emulator as well as physical device connected over usb.

Another advantage of this configuration is security, since it opens up the ability and simplify working with loopback interface all the way over to device, meaning that dsrouter only acts as IPC server over namedpipes/unix domain sockets and as a TCP/IP client connecting 127.0.0.1:x (using port forwarding) and device just listens on local loopback interface 127.0.0.1:x reduce any needs to bind any interface that can be access from outside local machine.

We current have limitation using this schema for suspended ports since no tooling will send a resume command to runtime if not using --diagnostic-port command (or run child process) using the reverse connect scenario. This PR adds a new option to dotnet-counters and dotnet-trace, --resume-runtime, that can be used to mitigate this limitation, open up ability to do start-up tracing on suspended runtime using dsrouter, without need to use reverse connect scenario, complicating configuration when targeting mobile devices. Since diagnostics server can always be resumed (even if not in suspend state), the default behaviour is changed to always send a resume command after setting up session, if user is interested to start multiple tools and keep runtime suspended, dotnet-counters and dotnet-trace accepts --resume-runtime:false that will turn of sending resume command to diagnostic server.

@lateralusX lateralusX requested a review from a team as a code owner June 6, 2021 14:47
@lateralusX
Copy link
Member Author

//CC @josalem

@noahfalk
Copy link
Member

noahfalk commented Jun 9, 2021

I've been loosely aware of the dsrouter work but admitedly haven't tracked it closely. Is there any good writeup of what kind of user experience we are trying to get end-to-end? Needing to add this extra resume option to every tool feels like our proxy abstraction is leaky and the onus is falling on our users to take extra steps to fix it up. If its the best option we've got I can still get behind it, but with the context I've got so far it felt disjoint which causes me pause and makes me want to understand better.

@josalem
Copy link
Contributor

josalem commented Jun 9, 2021

@noahfalk, Johan has documented some of the end-to-end designs in this PR: dotnet/runtime#53302. I'm not sure if everything you're looking for is in there, though.

@lateralusX - am I understanding this correctly? The intent here is to enable a TCP based listen port (e.g., DOTNET_DiagnosticPorts=tcp://127.0.0.1,listen,suspend;)with the suspend flag and for the individual tools to know how to send the resume command? I agree with Noah that this starts to feel leaky. I would prefer if users were unaware of the suspend/resume interaction when using dotnet-trace or the other tools. Is there any way to identify when we are in this configuration and automatically send the resume command?

@lateralusX
Copy link
Member Author

lateralusX commented Jun 10, 2021

Use case is to open up for diagnostic server tcp/ip listener on device using loopback in combination with adb port forwarding on Android or usbmux on iOS talking to device over usb for both suspend and nosuspend through dsrouter, giving us one config that could work for Android/iOS simulators and devices connected over usb, all using loopback tcp/ip interface. If we need to fallback to reverse connect that will complicate iOS/Android device config, probably push that config over to wifi, exposing an unwanted attack vector when really not needed (at least iOS).

One problem I identified is that the suspend feature pushes this into the hands of users, since you need to know when you would like/need to resume a port and that depends on how the runtime has been configured using nosuspend/suspend option. In the end, only the end user will now how many sessions to establish before resume runtime (if the runtime was suspended to begin with).

I looked at doing this in dsrouter but that increase complexity since dsrouter will need to start analyze traffic, detecting when we have started a session on on connection, so it can connect and resume runtime on a second connection, in that case we also need to decide when we should “auto” resume, always after first created session to a newly connected runtime instance or some other scenario, in theory a user might be interested in starting three different sessions before resuming runtime, or two or just one.

A third option considered was implement this in runtime, having an auto resume config option so runtime would resume right after we created first session on a suspended port. This solution suffers from the same issue as above that resuming after creating first session might not be what end user really would like to do, or maybe it is.

Based on above reasoning the suggestion in this PR was the simplest implementation still leaving full control to the end user, since I saw it as something the end user might be interested in controlling. If thats not the case, then we could have an alternative implementation, always do a resume after creating a session (like in reverse connect) directly in the tooling (regardless if port is a suspended port or not). I rather not try to implement this in dsrouter since that will be to complex, but maybe pushing it all the way down to runtime could make sense, then it will go with ports suspend config, something like “auto resume” config keyword on port, but that will take away control from end users interested in starting multiple sessions before resuming port.

Thoughts/reflection/ideas?

@noahfalk
Copy link
Member

Thank you both, there was a lot of good info there! This is where I am at in my thinking so far:

  1. The dsrouter case appears very similar to a local case where we launched a process with DOTNET_DiagnosticPorts=~/myport.sock,listen,suspend. So arguably this isn't a new concern nor any leaky abstraction specific to dsrouter, its just a 2nd scenario that is hitting a case we hadn't dealt with up until now.

  2. I would hypothesize (please say if you disagree) that when a process is launched suspended, the very common case will be that the user connects a single tool and the rare case will be connecting two or more tools. That suggests we should bias our configuration so that the one tool case is as easy as possible even if it makes the multi-tool case more complicated. Some possible design options that could play into that:
    a. The tools always invoke resume by default unless the user explicitly configures them not to. If the runtime was already resumed it should be harmless to do it again. The command-line option used in the rare multi-tool case might be --runtime-resume: false
    b. When users want to attach multiple tools to a suspended runtime they could create multiple ports and give one port to each tool, for example DOTNET_DiagnoticPorts=~/countersTool.sock,listen,suspend;~/traceTool.sock,listen,suspend. The runtime only resumes once all ports have received the resume command so each tool can automatically send a resume command and once both ports have received one execution will continue. In the case of dsrouter maybe this would involve proxying a few ports rather than just one.

  3. Has there been much investigation about potential deeper integration between the tools and the dsrouter proxy? Just spitballing, what if we could launch and trace an app on the remote device using the local dev box command-line only?

dotnet-trace --machine MyAndroidDevice -- \bin\myapp.exe app_arg1

I don't want to change the scope of the current work, this is just speculating about possibilities for down the road and how our story could grow.
4. Another one in the long term bucket: dotnet-monitor and dsrouter seem similar at the 10,000 ft view. They are both attempting to take machine local IPC communication and proxy it over a broader network. Certainly there are differences as well, but at first glance it feels like these tools might be mergable. Does that feel reasonable, crazy, something else? : )

If the long term questions (3 and 4) are distracting I'm glad to fork them into their own separate conversation.

@lateralusX
Copy link
Member Author

lateralusX commented Jun 10, 2021

@noahfalk, thanks for all great info/thoughts/reflection, much appreciated and it sounds like we share a lot of thoughts and ideas regarding this. I try to address each of your items from above and then round up with what I think we should do with this PR.

  1. Yes, exactly the point I was trying to make above (happy I was able to).
  2. I think that would be the most reasonable, making "auto resume" an opt out feature in tooling since most people won't care about it, and if you do you are running advanced scenarios and not to troubled by and additional command line argument to do the work you intended. Since I only enabled the "listen" feature in diagnostic server when you build it to support TCP/IP instead of local IPC, it is currently only available for MonoVM running on mobile platforms at the moment, so one alternative could be to make the "auto resume" default in tooling, wait with the new command line option to disable behaviour until we really see a need for it.
  3. There was a lot of thinking (at least on my end) around scenarios/integration etc. One important topic that drove some decisions is the fact that dsrouter adds TCP/IP into the tooling (but I'm trying real hard to solve as much as possible using loopback only), so it will need its security compliance review and before that is done, we didn't want to bring it into all tooling, since in theory that would need compliance review of all tooling integrating it. Internally I made several options, first I added TCP/IP support into reverse diagnostic server, so it's prepared to be used with TCP/IP, but only for clients creating it with a specific constructor argument, so tooling currently don't do that, only dsrouter, but in theory we could light up TCP/IP stack directly in tooling, IF we would like that. It could however complicate the tools and that's where my second option comes into play, dotnet-dsrouter is actually just a command line wrapper around an internal component, Microsoft.Diagnostics.NETCore.Client.DiagnosticsServerRouterRunner, so idea is that the complete router could be embedded into any tool as is, but again, didn't want to do that until we have the security compliance review of the dsrouter in place. That means we could merge dsrouter with any tooling we would like in the future. Having dsrouer as it's own tool has also been great since it is isolated from rest of diagnostics making it easier to develop and no need to do anything with other diagnostics tools (you can actually run pretty old dotnet-trace against it), mainly focusing on the mobile/remote MonoVM targets that it primarily needs to support. Developing it as a standalone/isolated component/tool was also done to handle a third option, give integrating SDK/IDE's ability to use dsrouter as a copy/past codebase getting their own copy of it directly into their own libraries and tweak it to their needs. We could open up the internal component, but there was concerns putting it out as a public API from diagnostics libraries, but could be done as well to simplify that process even more.
  4. There are definitely similarities, but I also believe they have different duties and use cases. From my understanding monitor is a HTTP server giving access to IPC over REST, while dsrouter is a lower level, component that do IPC->TCP/IP as well as a local representation of the running process on remote target, so you could in theory have many running dsrouter instances at the same time and they are very cheap (especially when doing the IPC server -> TCP client configuration). Based on details from 3 above, merging them would be simple, but not just sure how much value it will add in the end, complicating both tools in the process.

Phew!

So for this PR, sounds like letting the tools always pass a resume command after creating a session is simplest, least complex alternative. I also believe we could wait with the --runtime-resume: false option until we know it is really needed (today, the suspend scenario uses reverse connect that already have this behaviour). If that sounds reasonable, I will adjust this PR accordingly.

@noahfalk
Copy link
Member

So for this PR, sounds like letting the tools always pass a resume command after creating a session is simplest, least complex alternative. I also believe we could wait with the --runtime-resume: false option until we know it is really needed (today, the suspend scenario uses reverse connect that already have this behaviour). If that sounds reasonable, I will adjust this PR accordingly.

Sounds good to me, @josalem I'd also like to get your thoughts.

One important topic that drove some decisions is the fact that dsrouter adds TCP/IP into the tooling...

If the tools can operate primarily as TCP clients I'd guess the security review would be much easier than if they are operating as servers, but either way fully agree that security would have to be checked off the list as we moved into that space. Part of the integration I was thinking about was letting more of the SDK tools' current commands (like launch, enumerate processes, attach to process by name, etc) work. Maybe that winds up not being something dsrouter can do because it isn't running any code on the target device?

Thanks for the great feedback @lateralusX : )

@lateralusX
Copy link
Member Author

lateralusX commented Jun 10, 2021

If the tools can operate primarily as TCP clients I'd guess the security review would be much easier than if they are operating as servers,

@noahfalk jupp and that why it try really hard to open up all scenarios running using device TCP/IP listener, binding loopback and using adb port forwarding over usb for Android and usbmux for iOS and leaving dsrouter using IPC server - TCP client. dsrouter still needs support for IPC server - TCP server and IPC client-TCP server(diagnostic tooling reverse connect) since MonoVM might end up in scenarios where we can't do local binds on device, so you will need reverse connect from runtime back to dsrouter. All modes is currently implemented and works in both dsrouter as well as MonoVM.

@lateralusX
Copy link
Member Author

lateralusX commented Jun 10, 2021

Part of the integration I was thinking about was letting more of the SDK tools' current commands (like launch, enumerate processes, attach to process by name, etc) work. Maybe that winds up not being something dsrouter can do because it isn't running any code on the target device?

@noahfalk complete runtime is running on remote sandboxed environment (either in simulator or on physical device), so dsrouter acts more like the instance that other tools will view as the runtime to connect/attach to, so from that perspective, enum/attach commands doesn't make any sense, since its the local representation/proxy of the remote runtime. What I however have implemented is the launch capabilities in dsrouter, primarly to simplify the whole process running dsrouter with a running instance on device. Using the launch, you can actually run a script or other commands that will do full launch of app on device and it is possible for dsrouter to pass along the TCP/IP connectivity info over to launched process and running both as a unit that will both terminate when one of them does.

Example:
dontet-dsrouter server-client -tcpc 127.0.0.1:9000 -- mlaunch [app-id] [device-id] --set-env=DOTNET_DiagnosticPorts=${DOTNET_DiagnosticPorts}

That will launch dsrouter, setup local IPC server (for diagnostic tooling to use), run mlaunch with all arguments but replace ${DOTNET_DiagnosticPorts} with 127.0.0.1:9000,nosuspend,listen (depending on scenario). All is then ready for regular diagnostic tooling to connect the dsrouter's IPC server and do what ever needed, running parallel sessions against MonoVM that can run in simulator/device on Android or iOS (or potential other remote sandboxed environments going forward).

And if you would like to disconnect the lifetime of the processes, just run dsrouter:

dontet-dsrouter server-client -tcpc 127.0.0.1:9000 &

and have it running in the background, you can then deploy or launch the app using tools, directly from XCode over and over again, as long as it binds and use the same port as dsrouter is setup to route.

@noahfalk
Copy link
Member

What I however have implemented is the launch capabilities in dsrouter

awesome, I missed that before. I should spend more time with it and check out what you've got : )

@lateralusX
Copy link
Member Author

lateralusX commented Jun 10, 2021

@noahfalk maybe #2344 makes more sense after sharing above information?

Keeping option to turn of resume runtime using --resume-runtime:false.
@lateralusX
Copy link
Member Author

lateralusX commented Jun 15, 2021

@noahfalk, @josalem Changed default of --resume-runtime argument to true, meaning that dotnet-couters and dotnet-trace will always send a resume command after session has been created, so all will be hidden to end users, but open up use case to run diagnostic server in suspend,listen mode and first tool (dotnet-counters or dotnet-trace) that connects will setup session and resume runtime (same behaviour as reverse connect). There is still option to prevent tools from doing a resume, by passing --resume-runtime:false.

Copy link
Contributor

@josalem josalem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@josalem josalem merged commit 7b253a5 into dotnet:main Jun 18, 2021
@github-actions github-actions bot locked and limited conversation to collaborators Jan 15, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants