Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API Proposal: Add Async validation #329

Closed
almostchristian opened this issue Jun 28, 2024 · 5 comments
Closed

API Proposal: Add Async validation #329

almostchristian opened this issue Jun 28, 2024 · 5 comments

Comments

@almostchristian
Copy link

The new validator should include an async API to avoid calls to TaskHelper.Await. During my performance profiling sessions, I found using TaskHelper.Await, or even Task.GetAwater().GetResult() makes the traces chaotic. Also, I feel it has a significant scalability impact when used with network bound resource resolvers.

Proposed new API

The three interfaces use the default interface method implementation for the new API. If the validator does not need to be async (i.e., it doesn't call any async api such as other other validators), it doesn't need to implement the new async method.

public class Validator
{
   Task<OperationOutcome> ValidateAsync(ElementNode instance, string? profile, CancellationToken cancellationToken);

   Task<OperationOutcome> ValidateAsync(Resource instance, string? profile, CancellationToken cancellationToken);
}

public interface IElementSchemaResolver
{
   ValueTask<ElementSchema?> GetSchemaAsync(Canonical schemaUri)
      => new(GetSchema(schemaUri));
}


public interface IValidatable
{
   ValueTask<ResultReport?> ValidateAsync(IScopedNode input, ValidationSettings vc, ValidationState state, CancellationToken cancellationToken)
      => new(Validate(input, vc, state));
}


public interface IGroupValidatable
{
   ValueTask<ResultReport?> ValidateAsync(IEnumerable<IScopedNode> input, ValidationSettings vc, ValidationState state, CancellationToken cancellationToken)
      => new(Validate(input, vc, state));
}

Performance

The async validate API also appears to be faster according to benchmarks:

Method Resource Mean Error StdDev Ratio
LegacyValidator Hippocrates.practitioner 3,438.0 μs 67.57 μs 109.11 μs 1.00
NewValidator Hippocrates.practitioner 464.2 μs 8.45 μs 16.48 μs 0.14
NewValidatorAsync Hippocrates.practitioner 170.4 μs 3.34 μs 5.58 μs 0.05
LegacyValidator Levin.patient 47,261.2 μs 931.72 μs 1,035.60 μs 1.00
NewValidator Levin.patient 1,009.2 μs 20.03 μs 47.60 μs 0.02
NewValidatorAsync Levin.patient 733.7 μs 13.67 μs 24.99 μs 0.02
LegacyValidator MainBundle.bundle 71,836.7 μs 1,367.22 μs 1,342.79 μs 1.00
NewValidator MainBundle.bundle 5,487.6 μs 116.04 μs 340.33 μs 0.08
NewValidatorAsync MainBundle.bundle 5,212.4 μs 122.50 μs 361.20 μs 0.07
LegacyValidator patient-clinicalTrial.xml 37,239.2 μs 673.64 μs 597.17 μs 1.00
NewValidator patient-clinicalTrial.xml 3,630.5 μs 71.53 μs 147.71 μs 0.10
NewValidatorAsync patient-clinicalTrial.xml 2,726.8 μs 54.42 μs 113.60 μs 0.07
LegacyValidator Weight.observation 55,374.9 μs 1,086.31 μs 1,958.84 μs 1.00
NewValidator Weight.observation 2,121.3 μs 71.24 μs 210.06 μs 0.04
NewValidatorAsync Weight.observation 1,707.1 μs 33.73 μs 85.87 μs 0.03
@ewoutkramer
Copy link
Member

This is interesting, as we initially had an async interface, but due to the fact that many validations (cardinality, string length, patterns, etc) were merely very short computes, the overhead of Task/await on all those nested validates became noticable, so we removed it. I guess it depends on the ratio of I/O bound validations and simple, compute validations. And as we know, async tends to spread everywhere, so we cannot have non-async validates for simple computes, and async validates for i/o bound ones. At least I think so, I'd love to be proven wrong ;-) As it stands, I don't feel like re-introducing async anytime soon after our previous experiences.

@almostchristian
Copy link
Author

In your previous testing, did you use ValueTask or Task for the validators? I agree that superfluous use of async can hurt performance and it may take testing many use cases to definitively say which one is better for performance (and in the end the answer may be 'it depends'). I think the validator should support both synchronous and asynchronous validation, and it will be up to the end users to validate which is the best one to use for their use case.

For now, the main advantage to using async is that it preserves the traces in flame graphs which makes it easier to diagnose performance issues. In the synchronous version, the occasional calls to TaskHeper.Await changes the stack traces

Flame graph (synchronous)
image

Flame graph (async)
Screenshot 2024-07-02 125520

@brianpos
Copy link

brianpos commented Jul 2, 2024

My QR validator has a hybrid of both approaches.
https://github.com/brianpos/fhir-net-web-api/blob/66e90f72c6adcdf36f436a313e8e886fa61abf9e/src/Hl7.Fhir.StructuredDataCapture/QuestionnaireResponseValidator.cs#L742
It has an async top level, with a List to track any async requestst that are made, which it then waits on before returning.
So anything that has an async result is added to the list and everything else just continues synchronously.
All issues are added into the OutcomeIssues list too, which is what is returned at the end.

@almostchristian
Copy link
Author

One benefit to an async validtion API is that potentially, validation could take a long time and it is possible that the client has cancelled the validation request. Respecting the cancellation token can help with scalability.

@mmsmits
Copy link
Member

mmsmits commented Aug 29, 2024

We think async will lead to too much overhead current, since most validation are so fast that starting up a task will take more time than actually validating the validation blocks.

@mmsmits mmsmits closed this as completed Aug 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants