Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Process crashes and hangs when passing parameters that contain CancellationToken #1158

Open
cgillum opened this issue Aug 29, 2024 · 5 comments

Comments

@cgillum
Copy link
Member

cgillum commented Aug 29, 2024

There have been several reports of CLR crashes and process hangs when using DTFx, specifically when a call to an activity function, continue-as-new, etc. passes in a parameter that contains a reference to CancellationToken. This also happens when OrchestrationContext.CreateClient<T> is used and methods in T interface with CancellationToken in their signatures.

The root issue is that the serialization code used by DTFx will serialize CancellationToken objects in a way that can corrupt CLR memory, resulting in unexpected behavior.

The mitigation is to remove CancellationToken parameters from interfaces that are wrapped using OrchestrationContext.CreateClient<T> or from argument values that are passed to ScheduleTask*, ContinueAsNew, etc.

@Emeka-MSFT
Copy link

Could another workaround be to change the default serialization class and/or parameters?

@Emeka-MSFT
Copy link

By default, all DTFx async methods must support cancellation token as the last parameter. From what I noticed, it is a current implementation gap, i.e. bug.

@cgillum
Copy link
Member Author

cgillum commented Sep 17, 2024

Could another workaround be to change the default serialization class and/or parameters?

Yes, but this would be a breaking change under normal circumstances. It might be possible to add some customization to look for this case, specifically, and avoid serializing CancellationToken if we see it, though I'm not sure how exactly it would need to be implemented.

By default, all DTFx async methods must support cancellation token as the last parameter. From what I noticed, it is a current implementation gap, i.e. bug.

If this was in reference to my point about removing CancellationToken parameters from wrapped interfaces, I was referring to user-provided interfaces, not DTFx interfaces.

@mkraghavan
Copy link

By default, all DTFx async methods must support cancellation token as the last parameter. From what I noticed, it is a current implementation gap, i.e. bug.

Do we have an ETA for when this will be fixed? This is hitting us bad in our production workloads and we are forced to restart orchestrators intermittently to work around this issue.

@cgillum
Copy link
Member Author

cgillum commented Sep 23, 2024

@mkraghavan what fix are you referring to? The process crash mentioned in this issue can be worked around by removing CancellationToken from your list of parameters. Also, the sentence you're quoting is not related and not something that is going to be changed anytime soon.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants