End-to-end distributed tracing support #939

Closed
cgillum opened this issue Sep 18, 2019 · 38 comments

@cgillum
Member

cgillum commented Sep 18, 2019

Problem

Azure Functions has integration with Application Insights and supports end-to-end tracing for various binding types. However, there is no support for end-to-end tracing in Durable Functions.

Proposed Solution

Durable Functions should support end-to-end tracing and associated visualization in Application Insights.

  • A user should be able to see the order of activity function calls from an orchestration, including any sub-orchestrations
  • User logs should be able to correlate with these traces
  • Ideally this works with external events as well
  • Durable HTTP APIs should correctly flow end-to-end trace IDs
  • Calls to external services supported by App Insights should flow the correct correlation info
  • Should be compatible with "App Map" visualizations

The first step to supporting this is to flow correlation information through the DTFx libraries. Once that is done, it should be surfaced to the Azure Functions / Durable Functions layers. The DTFx work is currently being tracked here: Azure/durabletask#261

Workarounds

We do have tracing to Application Insights which includes the orchestration instance ID across orchestrator and activity function traces. However, it is not correlated with user logs. It's also not possible to directly see the call chain between orchestrations and specific activity function invocations. It also doesn't work with App Insights visualization features.

@idg10

idg10 commented Oct 17, 2019

One aspect of this that you've not mentioned in the bullet list in your proposed solutions is how Eternal Orchestrations https://docs.microsoft.com/en-us/azure/azure-functions/durable/durable-functions-eternal-orchestrations should fit into this.

My worry would be that once Durable Functions integrate with Application Insights in the same way that other functions triggers do today, all work started from an Eternal Orchestration would appear to be part of the same massive operation.

If possible, what I'd like to see is each iteration of an Eternal Orchestration effectively appearing as a distinct Operation in Application Insights. So each time you ContinueAsNew you start a new Operation for monitoring purposes.

@TsuyoshiUshio
Collaborator

Hi @idg10 ,
Thank you for the comment.
Distributed tracing support in Durable Functions won't work the same way as in other functions. It will be a separate tracing mechanism. Some people need correlation across an Eternal Orchestration, but I now understand that others want a new trace to start on each ContinueAsNew. Which of these options would fit your use case?

  • A configuration setting that makes every ContinueAsNew start a new trace.
  • A configuration setting that makes ContinueAsNew start a new trace only for specific orchestrators, leaving the others unchanged.

Or would you prefer another solution?
@cgillum If you have any comments on that spec, please leave them here.
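
For readers comparing the two options, here is a minimal Python sketch of the decision they imply. Everything in it is hypothetical: `TracingConfig`, `new_trace_on_continue_as_new`, `per_orchestrator_overrides`, and `next_operation_id` are illustrative names, not settings or APIs in Durable Functions.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class TracingConfig:
    # Option 1: one global switch for every orchestrator (hypothetical setting).
    new_trace_on_continue_as_new: bool = False
    # Option 2: per-orchestrator overrides keyed by orchestrator name (hypothetical).
    per_orchestrator_overrides: dict = field(default_factory=dict)

    def starts_new_trace(self, orchestrator_name: str) -> bool:
        # A per-orchestrator entry wins; otherwise fall back to the global switch.
        return self.per_orchestrator_overrides.get(
            orchestrator_name, self.new_trace_on_continue_as_new
        )


def next_operation_id(config: TracingConfig, orchestrator_name: str,
                      current_operation_id: str) -> str:
    """Return the operation ID to use for the next ContinueAsNew iteration."""
    if config.starts_new_trace(orchestrator_name):
        # Fresh 32-hex-character ID, so the iteration shows up as its own operation.
        return secrets.token_hex(16)
    # Keep the existing ID, so all iterations stay correlated.
    return current_operation_id
```

With `per_orchestrator_overrides={"Monitor": True}`, only the orchestrator named `Monitor` would get a new operation per iteration, which corresponds to the second option.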

@GavinOsborn

@cgillum thank you for this, your proposed solution sounds exactly like what I am hoping for.
If there is a build of any of this that I could play with I would be happy to help provide feedback.

@GavinOsborn

Is this to be a v3 feature or will this be back-ported into a v2 release also?

@cgillum
Member Author

cgillum commented Feb 26, 2020

@GavinOsborn sorry for the late response. Most likely we will target Functions 3.0 for this because .NET Core 3 contains a lot of improvements in this area that we intend to take advantage of.

@MedAnd

MedAnd commented Apr 2, 2020

@cgillum - any update on this feature? Would be great to have an end to end sample & documentation for how to correlate across HttpTrigger, Durable Functions + Activities, Service Bus messages etc?

Also if using .Net Core 3+, would be good to clarify if one should use Application Insights libraries directly or the new OpenTelemetry support?

@cgillum
Member Author

cgillum commented Apr 3, 2020

@MedAnd yes, we're making progress on this work, I believe this PR contains the latest. @TsuyoshiUshio should be able to answer your questions about Application Insights vs. OpenTelemetry.

@TsuyoshiUshio
Collaborator

TsuyoshiUshio commented Apr 8, 2020

Hi @MedAnd @cgillum

We currently support Application Insights, with both the W3C Trace Context and Http Correlation protocols.
https://devblogs.microsoft.com/aspnet/observability-asp-net-core-apps/ We support .NET Core 3+.

As for OpenTelemetry, our telemetry model follows the open-closed principle, so it is easy to add a new protocol. If people want that tracing, I'd be happy to create an issue for it.

If you want to know more about the distributed tracing feature, you can refer to this PR, which contains the documentation, video, and samples. It is under review, so the contents could change, but it is a good place to see what the feature looks like. It also explains how Durable Functions distributed tracing works with services outside the durable world. #1298

@MedAnd

MedAnd commented Apr 8, 2020

Hi @TsuyoshiUshio, thank you & will take a look at the PR documentation, video, and samples...

I do note that the following scenarios are not supported, and I'm wondering when they will be:

Durable Entity Orchestration (not supported)
HTTP endpoints (not supported)
Human Interaction (not supported)

Above is directly related to some feedback I shared here: #1305

Update:

  1. Have looked at the PR and videos and this looks exactly like what we need!
    When will this feature ship 🙂 ?

  2. From the PR, cannot see if correlation will also work with / across the RaiseEventAsync API? If it does work with the RaiseEventAsync API, can this approach be used to track / correlate a RaiseEventAsync with the progress of state within an orchestration? See correlation state problem description [B] in this issue: Azure Functions Durable Orchestration enhancements #1305

@TsuyoshiUshio
Collaborator

Hi @MedAnd
Thank you for the questions and comment! The unsupported scenarios are simply not implemented yet. :) There is no technical blocker. However, I'd like to implement them one by one. These stories require additional changes in other parts of the system beyond the core implementation.

Raise event with Distributed Tracing

One exception is Human Interaction, and also RaiseEvent. This one requires discussion. Maybe no one has traced this before (or I just don't know of it). This scenario requires multiple parents. The correlation system (Activity) assumes one parent. However, RaiseEvent (or Human Interaction) could wait for several events to happen, which means it could have multiple parents. We need to discuss how to handle this case. If you have any ideas, they are very welcome!

Currently I have an idea for this scenario.

Idea of Raise event correlation

We have

  1. An orchestration that waits for an event to arrive. It has two parents: the start of the orchestration and the raised event, with OperationId 1 and 2 respectively.
  2. Assume the parent is the orchestration. When the orchestrator receives the event, it creates a custom telemetry item as a child of the orchestration (1), with a property pointing to the ID from the raise-event source. At the same time, it creates another custom telemetry item as a child of the raise event (2), with a property pointing to the ID from the orchestration.
  3. Discuss with the Azure Portal team to enable the link.

However, I haven't discussed this with anyone yet. It's just an idea, so I'm going to discuss how to achieve it. What do you think, @cgillum @anthonychu ?
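
The cross-linking idea in step 2 can be illustrated with a small Python sketch. This is only a model of the proposal, not the extension's implementation: the `Telemetry` class, the record names, and the `linkedOperationId` property are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Telemetry:
    name: str
    operation_id: str  # the trace this record belongs to
    properties: dict = field(default_factory=dict)


def link_raised_event(orchestration_op_id: str, event_op_id: str) -> list:
    """Create the two cross-referencing records described in step 2."""
    # Child of the orchestration trace, pointing at the raise-event trace.
    in_orchestration = Telemetry(
        name="EventReceived",
        operation_id=orchestration_op_id,
        properties={"linkedOperationId": event_op_id},
    )
    # Child of the raise-event trace, pointing back at the orchestration trace.
    in_event_source = Telemetry(
        name="EventDelivered",
        operation_id=event_op_id,
        properties={"linkedOperationId": orchestration_op_id},
    )
    return [in_orchestration, in_event_source]
```

Each trace keeps a single parent, and the property gives a viewer enough information to jump from one trace to the other, which is what step 3 would need from the portal.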

@TsuyoshiUshio
Collaborator

Sorry for the late response. I have released a pre-release version. For more details, see the announcement, which includes a video, documentation, samples, and a getting-started guide:
https://medium.com/@tsuyoshiushio/durable-functions-distributed-tracing-71426fe2246f

@cgillum cgillum modified the milestones: Backlog, v2.3.0 Jun 16, 2020
@cgillum
Member Author

cgillum commented Jun 16, 2020

Now that we have the alpha release done, let's use this issue to track a public preview release that is enabled by default.

@olitomlinson
Contributor

@TsuyoshiUshio Have there been any developments on handling the raised event scenario? Many thanks!

@davidmrdavid
Contributor

davidmrdavid commented Aug 21, 2020

Hi @olitomlinson! I'll be taking over the Distributed Tracing feature. I don't think there has been much progress on that scenario, but I should be picking this project up in the coming month or so, so I expect work on this to ramp up shortly after!

@cgillum cgillum self-assigned this Aug 27, 2020
@TsuyoshiUshio
Collaborator

TsuyoshiUshio commented Sep 16, 2020

Hi @olitomlinson
Sorry for the late reply. I noticed that tracing is broken when an event is raised, and I wrote a fix for it:
Azure/durabletask#430
We also need to discuss how to correlate the raise-event cases, since they have two parents. My current idea is that the primary parent (the orchestration client call) will be the parent, and the raise-event traceparent (or activity ID) will be stored as a property.

@santo2

santo2 commented Oct 26, 2020

is this expected to be worked on soon? We were waiting for it but we see that it hopped to the 2.4.0 version :-)

@davidmrdavid
Contributor

@santo2 , I'm actively working on it now! I started working on it just yesterday. We're first looking to merge the correlation feature branch into master in DurableTask, which you can track here: Azure/durabletask#422

After that, we'll begin doing some validation of correctness and prioritizing the missing bits. So, in short, it's happening now!

⚡ ⚡

@davidmrdavid
Contributor

Azure/durabletask#422 has been merged into DTFx! That doesn't mean we're done, but it's an important step!

@davidmrdavid
Contributor

davidmrdavid commented Nov 25, 2020

As a follow-up, we just merged the distributed tracing bits for durable-extension into the dev branch. As seen here: #1571

We'll provide more details about this in the upcoming release notes :) . Do note that this is still a work-in-progress feature.

@santo2

santo2 commented Nov 26, 2020

Hi @davidmrdavid ,
Thanks for the update! What is the easiest way to also get notified on the release of that merge?

@davidmrdavid
Contributor

Hi @santo2

The easiest would be click the "Watch" button in this repo, and selecting a "custom" notification scheme where you make sure you select to get notifications for releases. That should give you an update every time we update this page: https://github.com/Azure/azure-functions-durable-extension/releases

@NickSevens

NickSevens commented Dec 1, 2020

It's working very well with the 2.4.0 package! Thanks @davidmrdavid!

I do have a question: I'm performing some calls to external APIs in my functions, but they do not show up as a dependency in the "orchestration structure". Is there a way to also include calls (from HttpClient for example) in the entire dependency tree? Would be nice to see the complete picture in 1 go.

@ConnorMcMahon
Contributor

I don't believe we currently support that @NickSevens. I think that we may already support that in our IDurableOrchestrationContext.CallHttpAsync() API, but adding some easier way to support that in activity functions may be a good idea.

I believe the best thing to do for now is to wrap those external HTTP calls in a custom service using dependency injection. You can then have this custom service construct a telemetry client and manually trace the HTTP request yourself.
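
The shape of such a wrapper can be sketched in Python as follows. It is language-neutral in spirit and rests on assumptions: `TracedHttpService`, the injected `send` and `record` callables, and the record fields are hypothetical stand-ins for an HTTP client and a telemetry client, not part of any Application Insights or Durable Functions API.

```python
import time
from typing import Callable, Optional


class TracedHttpService:
    """Wraps outbound calls and reports each one as a dependency record."""

    def __init__(self, send: Callable[[str], int], record: Callable[[dict], None]):
        self._send = send      # performs the request and returns a status code
        self._record = record  # hypothetical sink, e.g. a telemetry client wrapper

    def get(self, url: str, operation_id: str) -> int:
        start = time.monotonic()
        status: Optional[int] = None
        try:
            status = self._send(url)
            return status
        finally:
            # Runs on success and on failure, so failed calls are traced too.
            self._record({
                "type": "dependency",
                "target": url,
                "operation_id": operation_id,  # ties the call to the current trace
                "duration_ms": (time.monotonic() - start) * 1000.0,
                "success": status is not None and status < 400,
            })
```

Injecting `send` and `record` keeps the wrapper testable without network access, and registering it through dependency injection means activity functions only need to pass along the operation ID they are running under.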

If you have any ideas for how to more easily support this, it would probably be best to track that as a separate issue. The more details you put in the proposal, the more we have to work with as a development team to decide if and how we implement that functionality.

@ConnorMcMahon
Contributor

Closing the issue since this is now out officially in v2.4.0. I am sure there is still room for improvements, but we can track those issues as they come in.

@NickSevens

NickSevens commented Dec 1, 2020

Problem there is I'm using the Graph SDK for example, or SharePoint CSOM libraries, which do use HttpClient internally. Not sure how to implement custom logging in there... 😊

Thank you for the explanation though!

@ConnorMcMahon
Contributor

Interesting...

Unfortunately, it really depends on how those libraries retrieve their HTTP clients, so we may not have a way to hijack their http clients to add the telemetry ourselves.

I would still file an issue, because App Insights SDK may provide an extension point where we can intercept their outbound dependency tracking logic to add correlation automatically without the need to use a custom HttpClient.

@tommck

tommck commented Oct 7, 2021

Is this still in alpha after all this time?

@ConnorMcMahon
Contributor

It is still in preview, with plans to get it GA in the next few months. You can track the state of the improvements we are tracking for GA here.

@Shivam60

Is this still in alpha?

@cgillum
Member Author

cgillum commented Jan 18, 2022

Yes, this is still in alpha. We made the decision to revisit the implementation to make it more compatible with the alternate storage providers (the current implementation is tightly coupled to Azure Storage) and to make it compatible with OpenTelemetry (the current prototype is tightly coupled to Application Insights).

The work to update the implementation is currently in-progress (see Azure/durabletask#648), but the team hasn't yet committed to an ETA.

FYI @bachuv and @AnatoliB

@olitomlinson
Contributor

the current implementation is tightly coupled to Azure Storage) and to make it compatible with OpenTelemetry (the current prototype is tightly coupled to Application Insights)

Great move!

@santo2

santo2 commented Jan 19, 2022

we have been waiting for a looooong time ;-) I revisit this topic each month in the hopes that it's done!

@kepikoi

kepikoi commented Feb 2, 2022

We have also been waiting months if not years for this and the lack of trace correlation is still a serious problem. Makes debugging of custom (winston) logs in durable activities rocket science

@oising

oising commented Feb 2, 2022

Yes, this is still in alpha. We made the decision to revisit the implementation to make it more compatible with the alternate storage providers (the current implementation is tightly coupled to Azure Storage) and to make it compatible with OpenTelemetry (the current prototype is tightly coupled to Application Insights).

The work to update the implementation is currently in-progress (see Azure/durabletask#648), but the team hasn't yet committed to an ETA.

FYI @bachuv and @AnatoliB

Go @cgillum and team :) We know you're doing the right thing here, and it's painful to undo the appinsights stuff, but OT is The Way. Thank you!

@marnilss

How can this issue be closed when the feature is apparently still in alpha and you are working on it (see comments above)?
It is so difficult for us as end users of libraries to know what is supposed to work and what is not. I've been investigating and troubleshooting my own stuff because end-to-end distributed tracing does not work, and this issue was closed with the text: "Closing the issue since this is now out officially in v2.4.0. I am sure there is still room for improvements, but we can track those issues as they come in."
And after reading the comments (posted after it was closed) I find out that it never reached GA, but is still in an alpha state.

I think the documentation of Azure Functions, Durable Functions etc. is very hard to get a grip on. I've worked a lot more with ASP.NET Core and their documentation is in one place (docs....) and up to date. For functions you need to look at docs..., blog articles and read all the comments in GitHub issues.

@cgillum
Member Author

cgillum commented Feb 10, 2022

How can this issue be closed when the feature is apparently still in alpha and you are working on it (see comments above)?

The issue was closed when the planned feature was released. GitHub issues aren't necessarily used to distinguish between a feature's preview status and its GA status.

I think the documentation of Azure Functions, Durable Functions etc. is very hard to get a grip on.

I'd be interested to get your feedback on how we can improve the docs for Durable Functions. Would you mind opening a discussion thread for this?

@dcbrown16

Hi team, we have a customer who is calling an Angular single-page application. In their app, the client calls an HTTP Trigger Function, which calls an Orchestrator, which calls an Activity. They want to get the same operation_id across all of these Functions in data collected to Application Insights.
The latest progress thread linked (Azure/durabletask#648) points to a .NET library (https://github.com/Azure/azure-sdk-for-net/blob/main/sdk/monitor/Azure.Monitor.OpenTelemetry.Exporter/README.md). Will Angular/JavaScript also be supported by this feature at some point?

@cgillum
Member Author

cgillum commented Jun 16, 2022

Yes, this should work fine as long as the Angular client includes the necessary distributed tracing headers in the HTTP request that triggers the HTTP function.
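
For reference, the W3C Trace Context `traceparent` header has the form `version-traceid-parentid-flags`, in lowercase hex. The Python sketch below builds and validates one; the helper names `new_traceparent` and `parse_traceparent` are illustrative, and a browser client would normally rely on its telemetry SDK to emit this header rather than hand-rolling it.

```python
import re
import secrets

# version (2 hex) - trace-id (32 hex) - parent-id (16 hex) - flags (2 hex)
_TRACEPARENT = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


def new_traceparent(sampled: bool = True) -> str:
    """Build a version-00 traceparent value with random IDs."""
    trace_id = secrets.token_hex(16)   # 32 hex characters
    parent_id = secrets.token_hex(8)   # 16 hex characters
    flags = "01" if sampled else "00"  # lowest bit is the sampled flag
    return f"00-{trace_id}-{parent_id}-{flags}"


def parse_traceparent(header: str):
    """Return the header's fields as a dict, or None if it is malformed."""
    match = _TRACEPARENT.match(header.strip())
    if match is None:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # Version ff and all-zero IDs are invalid per the specification.
    if version == "ff" or trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return {
        "version": version,
        "trace_id": trace_id,
        "parent_id": parent_id,
        "sampled": int(flags, 16) & 1 == 1,
    }
```

The client would send the value as the `traceparent` request header on the call to the HTTP-triggered function, so the trace ID it contains can be carried forward by the functions that follow.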
