Add support for sharing state across analyzer instances #6324

mavasani · 2015-10-26T18:05:52Z

Currently, our diagnostic analyzer API provides no support for sharing any state between diagnostic analyzers. This causes extreme pain or performance issues for our customers when they have some expensive computation and/or large data to share across analyzer instances. For example:

Group of analyzers that need identical configuration settings from an additional file - they have identical computation logic and resulting data, and would get a real performance benefit if they can share this data across analyzer instances without doing any complicated lifetime management.
StyleCop analyzers want to walk through all trees in a compilation and identify generated files - they want to do it just once per compilation across all analyzer instances. In absence of any API support, they ended up creating a static conditional weak table keyed on compilation, which caused GC related perf problems when executing in the IDE. See Improve state management in analyzers DotNetAnalyzers/StyleCopAnalyzers#1671 for the workaround they put to avoid CWT.

An ideal solution here is to provide an analyzer API to share per-compilation and per-additional file data across analyzer instances, so that analyzer authors don't have to explicitly deal with lifetime management for such state. I am going to propose an initial API for this feature, and we can iterate over the best design for it.

Principles:

API should guarantee easy state access/creation within all analyzer callbacks.
Analyzer author needs to handle concurrent access/update to shared state instance. (@JohnHamby proposes we should think about enhancing the API to have this as an opt-in rather a requirement)
[Implementation] No conditional weak tables for storing share state: CWT are evil in most instances, the shared state should be strongly referenced within CompilationWithAnalyzers/AnalyzerDriver instance.

Proposed API (per-compilation state):
Add the below API to each DiagnosticAnalysisContext parameter type for analyzer callbacks:

TState GetOrCreateSharedState<T, TState>(Key<T> key, Func<Key<T>, TState> createSharedState) where T: object, TState: object

where public sealed class Key<T> { } is a separate definition.

The text was updated successfully, but these errors were encountered:

mavasani · 2015-10-26T18:07:04Z

Tagging people who might be interested in this feature/design discussion: @sharwell @tmeschter @srivatsn @JohnHamby @heejaechang @pharring

msJohnHamby · 2015-10-26T18:16:43Z

Can we do better in terms of the safety of the concurrency model?

msJohnHamby · 2015-10-26T18:19:01Z

If we are going to require analyzers that share state to be in the same assembly, can't any analyzers that want to share state be written as a single analyzer with multiple actions?

msJohnHamby · 2015-10-26T18:20:41Z

Adding @gafter @mattwar

sharwell · 2015-10-26T18:27:28Z

Can't any analyzers that want to share state be written as a single analyzer with multiple actions?

We would prefer to avoid that, as it complicates the implementation of single analyzers. For example, it's currently easy to review the behavior of SA1018 (NullableTypeSymbolsMustNotBePrecededBySpace) against the documentation for that rule. It would not be nearly so easy if the analysis behavior for 150+ other violations was included in the same analyzer.

sharwell · 2015-10-26T18:33:59Z

📝 Another thing to consider here. In DotNetAnalyzers/StyleCopAnalyzers#1683 and DotNetAnalyzers/StyleCopAnalyzers#1689, we dramatically reduced the number of allocations required to register our analyzers with a new compilation. However, due to our use of shared state we are currently unable to remove the closure created here, which results in an allocation for every call we make to RegisterSyntaxNodeAction (there are perhaps 1000 such calls overall).

An ideal solution to state management would allow us to reduce or eliminate these per-compilation allocations.

heejaechang · 2015-10-26T18:44:16Z

@mavasani why limiting sharing state per assembly make threading model more safe? I thought we only serialize analyzer execution not assembly? are we saying if an analyzer from a assembly register shared state, we will serialize all analyzer executions from that assembly regardless whether a particular analyzer from the assembly uses shared state or not?

sharwell · 2015-10-26T18:45:27Z

Allow state sharing only between analyzer instances from the same assembly - this is needed to ensure a secure threat model, we don't want unrelated analyzers to be able to access/manipulate each other's state.

In my opinion, we don't need this goal. The intended outcome has questionable value at best, but the impact could result in very real limitations for analyzer developers. For example, in the future we may want to provide a common API for detecting generated code in a StyleCop-compliant manner, which many analyzers then use.

API should guarantee easy state access/manipulation within all analyzer callbacks. However, analyzer author needs to handle concurrent access/update to shared state instance.

We've found that simply having a singly-initialized value associated with a compilation would be sufficient. For example, all we need for generated files support is to associate a single (initially empty) instance of ConcurrentDictionary<SyntaxTree, bool> with a Compilation instance, and then access that dictionary in each analyzer. We never need to modify the dictionary instance which is associated with the compilation, even though we do have code to modify the contents of that instance.

I believe the priorities for the API should be:

Allow lazily- or singly-initialized object to be associated with a compilation (allocating one dictionary even eagerly would not be a problem)
Reduce the number of allocations required when registering analyzer callbacks for the case where shared state is used

[Implementation] No conditional weak tables for storing share state: CWT are evil in most instances, the shared state should be strongly referenced within CompilationWithAnalyzers/AnalyzerDriver instance.

More specifically, after the data is initialized we need access to this data to be lock-free. The implementation of CWT does not allow us to meet this requirement.

heejaechang · 2015-10-26T18:47:34Z

@mavasani ah, I misinterpret the meaning of thread model. sorry. ignore my previous comment.

sharwell · 2015-10-26T18:47:39Z

@heejaechang

why limiting sharing state per assembly make threading model more safe?

The original post says threat, which I interpreted as a discussion of security, as opposed to thread which would be a discussion of concurrency.

heejaechang · 2015-10-26T18:50:04Z

@mavasani so this reduce repeated work per each analyzer but analyzer need to repeat the same work per compilation, right? I think that is good goal. we should clear the shared state as soon as all analysis of a compilation has finished. otherwise we could get into memory leak.

mavasani · 2015-10-26T18:51:48Z

@heejaechang @sharwell We need some kind of threat model/security to ensure unrelated analyzers cannot view/modify state or worse throw exceptions based on the assumption about the data type of the state. I was proposing we start with drawing a boundary around the assembly in which analyzers exists, and can extend the API later to support sharing data across assemblies. However, we do need some measures to prevent unrelated analyzers from accidentally/purposefully corrupt shared data.

heejaechang · 2015-10-26T18:55:50Z

@mavasani @sharwell 2 questions.

I think we can change "key" from string to object. by doing it, the author can decide whether they want to share their state or not by making the object key shared between analyzers. since we don't restrict the state as a serializable object, I think we can just use object as key.
why limit TState as object if we are not going to use CWT?

sharwell · 2015-10-26T18:55:59Z

so this reduce repeated work per each analyzer but analyzer need to repeat the same work per compilation, right?

We've actually been successful in accomplishing this, but it's been a challenge with many pitfalls. We've completely changed the way we manage per-compilation state twice as a result of performance problems that snuck in. The primary benefit of a common API for us will be the elimination of many additional allocations (several hundred at least) which occur during the registration of our analyzers.

We need some kind of threat model/security to ensure unrelated analyzers cannot view/modify state or worse throw exceptions based on the assumption about the data type of the state.

If you change the key to be a Type which is compared by reference (as opposed to a string), then the simple use of a type which is not publicly exposed as a key would be sufficient to address this concern.

pharring · 2015-10-26T18:56:04Z

If you use "object" as the key instead of "string", then you can leave it up to the analyzer authors to determine "sharability". If you need private state, then use a "private static readonly object" key. If you need something visible to other analyzers in the same assembly, then use "internal", etc.

#5542 had that

mavasani · 2015-10-26T19:03:14Z

@pharring that is surely a better approach, let me update the original comment accordingly.

heejaechang · 2015-10-26T19:04:11Z

@mavasani can we remove this API.
bool TryGetSharedState(string key, out TState value) where TState: object

can one always use GetOrCreate ? that will remove people to assume there is any dependency we guarantee.

heejaechang · 2015-10-26T19:07:49Z

@mavasani also, can we remove "State" from the API and rather put "Cache" in them. so that people doesn't use it as a state but use as a "Cache". implication is, "state" can break functionality but cache can't. this also remove assumption on dependency and also, we don't need to re-think our statefull and stateless analyzer story.

mavasani · 2015-10-26T19:14:52Z

can we remove this API.
bool TryGetSharedState(string key, out TState value) where TState: object

Yes, probably but API design probably want to decide that. If the analyzer wants to have null as an allowable value for the data for exceptional circumstances, then a reader invoking GetOrCreate needs to do an additional null check. Just a minor API preference..

can we remove "State" from the API and rather put "Cache" in them. so that people doesn't use it as a state but use as a "Cache".

I thought we explicitly want it to be strongly referenced state, that lives the lifetime of the compilation. We don't want to create conditional weak table or weakly referenced property bags or any kind of cache, that has already proven to be bad for IDE perf. Am I misunderstanding your comment?

gafter · 2015-10-26T19:26:12Z

I would prefer this API be typesafe, for example as in #5708.

heejaechang · 2015-10-26T19:26:46Z

@mavasani

"Yes, probably but API design probably want to decide that. If the analyzer wants to have null as an allowable value for the data for exceptional circumstances, then a reader invoking GetOrCreate needs to do an additional null check. Just a minor API preference.."

user can still return null from GetOrCreateXXX if they want to.

"I thought we explicitly want it to be strongly referenced state, that lives the lifetime of the compilation. We don't want to create conditional weak table or weakly referenced property bags or any kind of cache, that has already proven to be bad for IDE perf. Am I misunderstanding your comment?"

this is not what I meant. implementation-wise, leave it exactly as you suggested.

what I meant was just remove "State" from API and rather put "Catch" in them. and tell them you can't depends on the data for your functional correctness. so stateless or statefull analyzer both still can use this API without making them stateful analyzer. conjunction with the removing of "TryXXXX" API, all user of the API, which uses GetOrCreateXXX should be able to create the cache if it doesn't exist.

sharwell · 2015-10-26T19:37:56Z

I like the simplicity of a single GetOrCreateXXX method, and it resolves every single case we have where we need to register per-compilation analyzer callbacks as opposed to calling back to static methods which are registered once for the lifetime of the analyzer assembly.

For this to work, it is essential that the fast path (case where the item already exists) be very efficient in a concurrent environment, because it could be invoked in each of the analysis callbacks (potentially dozens of times per SyntaxNode depending on the number of analyzers enabled).

I would prefer this API be typesafe, for example as in #5708.

We could get this by replacing the object key in the proposed API with Key<T>.

mavasani · 2015-10-26T19:38:07Z

@gafter I have updated the proposed API to take a TKey instead of object key, hopefully this is what you meant.

@heejaechang

tell them you can't depends on the data for your functional correctness

Why would we do so? The fact that you want per-compilation shared state makes you a stateful analyzer, and we should be able to guarantee functional correctness by keeping this object alive as long as the underlying compilation lives. We can just store the state on CompilationWithAnalyzers instance, which has strong reference to the underlying compilation on which analyzers are executed. Analyzers can safely assume that we will initialize this state only once per compilation.

gafter · 2015-10-26T19:41:16Z

@mavasani No, it doesn't enforce any relationship between the type of the key and the type of the value. The implementation of TryGetSharedState will contain a cast that can fail. Please see #5708 for a model of type safety for this kind of API.

sharwell · 2015-10-26T19:42:37Z

I have updated the proposed API to take a TKey instead of object key, hopefully this is what you meant.

Define a class Key<T> as follows:

public sealed class Key<T> { }

Then update the API to require the key's type match the created value's type:

TState GetOrCreateSharedState<TState>(Key<TState> key, Func<object, TState> createSharedState)
    where TState: class

gafter · 2015-10-26T19:43:41Z

@sharwell Yes, almost like that. get rid of the object:

TState GetOrCreateSharedState<TState>(Key<TState> key, Func<Key<TState>, TState> createSharedState)
    where TState : class

heejaechang · 2015-10-26T19:46:09Z

@mavasani first analyzer become stateful analyzer. you get one benefit and you lost one - concurrency since state basically implies dependency.

since it is state that they relies on for functionality, the order of state mutation must be correct and observable from all callers.

making it not state is much simpler and stateless analyzer can use it as well.

msJohnHamby · 2015-12-04T23:13:39Z

Here's a proposal to cover reading an external resource from multiple analyzers. It does not cover sharing mutable state between analyzers.

A significant aspect of the issues discussed here is a need for multiple analyzers to read configuration, options, or directive context from external resources. Reading and parsing a resource can be expensive, motivating a desire to read a given resource once per compilation or per analysis session rather than once per analyzer per compilation. Here is a proposed API with the following goals:
-- Minimize the number of times a resource is read and parsed.
-- Guarantee that a resource will be reread if it changes between accesses.

To be added to:
-- AnalysisContext
-- CompilationStartAnalysisContext
-- CompilationAnalysisContext

We propose:
bool TryGetResource<T>(Key<T> key, SourceText text, Func<SourceText, T> reader, out T result)

Semantics:
-- At first invocation for a given key value and a given text value, invoke the reader function with the text value. If the reader returns normally, set result to the value returned by the reader and return true. If the reader terminates with an exception, set result to Default(T) and return false.
-- A subsequent invocation with an equivalent key value and equivalent text value is permitted (and is likely) to bypass invoking the reader, set result to the previous result, and return the same value as the prior invocation. However, it is also permitted to invoke the reader function again.
-- A subsequent invocation with a different key value or a different text value must invoke the reader.

Caching can cross contexts. For example, if there is a call to AnalysisContext.TryGetResource in one analyzer and an equivalent call to CompilationStartAnalysisContext.TryGetResource in another, the resource will likely be read only once. The lifetime of a cached result is arbitrary, but is expected to be as long as practical without introducing excessive memory pressure.

Expectations on clients:
-- Every invocation of a given reader with equivalent text will produce equivalent results.
-- All invocations of these methods for a given key will supply readers with equivalent behavior and text values intended to represent (versions of) the same resource.
-- The value returned by a reader will not be mutated carelessly.
-- For a given key, a single analyzer will make no more than one invocation of AnalysisContext.TryGetResource and no more than one invocation per compilation of CompilationStartAnalysisContext.TryGetResource or CompilationAnalysisContext.TryGetResource, and will invoke no more than one of these methods.

Violating these expectations can produce difficult-to-explain behavioral inconsistencies.

sharwell · 2015-12-05T01:55:27Z

For a given key, a single analyzer will make no more than one invocation of AnalysisContext.TryGetResource and no more than one invocation per compilation of CompilationStartAnalysisContext.TryGetResource or CompilationAnalysisContext.TryGetResource, and will invoke no more than one of these methods.

I don't understand the need for either of these limitations. Can you clarify how this benefits the implementation and the reason it needs to be different from the behavior described by your other expectations?

msJohnHamby · 2015-12-05T03:37:01Z

Violating these restrictions might produce a surprise. If the resource changes between invocations, the result you get back will also change. If the resource has not changed and you supply an equivalent reader delegate, you might get back an identical object from two invocations, or you might get different (but equivalent) results. If you supply a different delegate to two invocations with the same key, the second invocation might return the results of the first and not call the delegate supplied with the second invocation.

The implementation won't enforce these restrictions. It also won't do anything to guarantee predictable behavior if they are violated.

sharwell · 2015-12-05T05:26:40Z

If the resource changes between invocations, the result you get back will also change. If the resource has not changed and you supply an equivalent reader delegate, you might get back an identical object from two invocations, or you might get different (but equivalent) results. If you supply a different delegate to two invocations with the same key, the second invocation might return the results of the first and not call the delegate supplied with the second invocation.

Based on the description in the previous comment, none of these behaviors is unexpected. I think what you currently have documented as an expectation on clients would be better worded as a recommendation that care be taken, along with clear examples like you provided in the follow-up. With the expectation in place, clients will not be able to leverage this functionality even if the developers understand the special considerations and know that they would not negatively impact their analyzers.

mavasani · 2015-12-09T23:15:41Z

Thanks @JohnHamby for the proposal.
@sharwell @paul1956 Apart from the minor modifications to the wording of the semantics/expectations, does the proposal seem adequate for your requirements?

sharwell · 2015-12-09T23:22:08Z

Apart from the minor modifications to the wording of the semantics/expectations, does the proposal seem adequate for your requirements?

Overall, yes. Some questions remain which may or may not be in scope of this:

Where does the SourceText instance come from (second parameter of proposed API)?
Would a similar API be added for code fix contexts?

mavasani · 2015-12-10T22:11:40Z

Where does the SourceText instance come from (second parameter of proposed API)?

Additional files: http://source.roslyn.io/Microsoft.CodeAnalysis/R/b33eb1a9beed263a.html
Regular source files: http://source.roslyn.io/Microsoft.CodeAnalysis/R/5967ebe19d531e5d.html

Would a similar API be added for code fix contexts?

Seems reasonable to me. I'll file a separate issue for it.

…ross analyzer actions and also across different analyzer instances. Fixes dotnet#6324

mavasani added Area-Analyzers Concept-API This issue involves adding, removing, clarification, or modification of an API. Feature Request labels Oct 26, 2015

srivatsn added Story Feature Request and removed Feature Request Story labels Nov 24, 2015

srivatsn mentioned this issue Nov 29, 2015

Analyzer API improvements for enabling performant analyzers #7098

Closed

2 tasks

srivatsn assigned mavasani Nov 30, 2015

jmarolf mentioned this issue Jan 13, 2016

Problems with referencing Microsoft.CodeAnalysis.Workspaces from an analyzer #7929

Closed

This was referenced Jan 23, 2016

Keeping Analyzer state between projects of the same solution #7766

Closed

"Go to Implementation" across projects with an Analyzer? #8144

Closed

mavasani mentioned this issue Feb 4, 2016

Enable per-file state caching in analyzer. The state can be shared ac… #8375

Merged

mavasani added a commit to mavasani/roslyn that referenced this issue Feb 5, 2016

Enable per-file state caching in analyzer. The state can be shared ac…

c42a411

…ross analyzer actions and also across different analyzer instances. Fixes dotnet#6324

mavasani closed this as completed in #8375 Feb 6, 2016

Karvan mentioned this issue Mar 29, 2018

GCop140 Geeksltd/GCop#6

Closed

jzabroski mentioned this issue Jul 18, 2018

SQL translation should be more generic and operate on end results dotnet/efcore#12552

Closed

allantargino mentioned this issue May 3, 2024

Attribute Binding Roslyn Analyzers microsoft/durabletask-dotnet#301

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add support for sharing state across analyzer instances #6324

Add support for sharing state across analyzer instances #6324

mavasani commented Oct 26, 2015

mavasani commented Oct 26, 2015

msJohnHamby commented Oct 26, 2015

msJohnHamby commented Oct 26, 2015

msJohnHamby commented Oct 26, 2015

sharwell commented Oct 26, 2015

sharwell commented Oct 26, 2015

heejaechang commented Oct 26, 2015

sharwell commented Oct 26, 2015

heejaechang commented Oct 26, 2015

sharwell commented Oct 26, 2015

heejaechang commented Oct 26, 2015

mavasani commented Oct 26, 2015

heejaechang commented Oct 26, 2015

sharwell commented Oct 26, 2015

pharring commented Oct 26, 2015

mavasani commented Oct 26, 2015

heejaechang commented Oct 26, 2015

heejaechang commented Oct 26, 2015

mavasani commented Oct 26, 2015

gafter commented Oct 26, 2015

heejaechang commented Oct 26, 2015

sharwell commented Oct 26, 2015

mavasani commented Oct 26, 2015

gafter commented Oct 26, 2015

sharwell commented Oct 26, 2015

gafter commented Oct 26, 2015

heejaechang commented Oct 26, 2015

msJohnHamby commented Dec 4, 2015

sharwell commented Dec 5, 2015

msJohnHamby commented Dec 5, 2015

sharwell commented Dec 5, 2015

mavasani commented Dec 9, 2015

sharwell commented Dec 9, 2015

mavasani commented Dec 10, 2015

Add support for sharing state across analyzer instances #6324

Add support for sharing state across analyzer instances #6324

Comments

mavasani commented Oct 26, 2015

mavasani commented Oct 26, 2015

msJohnHamby commented Oct 26, 2015

msJohnHamby commented Oct 26, 2015

msJohnHamby commented Oct 26, 2015

sharwell commented Oct 26, 2015

sharwell commented Oct 26, 2015

heejaechang commented Oct 26, 2015

sharwell commented Oct 26, 2015

heejaechang commented Oct 26, 2015

sharwell commented Oct 26, 2015

heejaechang commented Oct 26, 2015

mavasani commented Oct 26, 2015

heejaechang commented Oct 26, 2015

sharwell commented Oct 26, 2015

pharring commented Oct 26, 2015

mavasani commented Oct 26, 2015

heejaechang commented Oct 26, 2015

heejaechang commented Oct 26, 2015

mavasani commented Oct 26, 2015

gafter commented Oct 26, 2015

heejaechang commented Oct 26, 2015

sharwell commented Oct 26, 2015

mavasani commented Oct 26, 2015

gafter commented Oct 26, 2015

sharwell commented Oct 26, 2015

gafter commented Oct 26, 2015

heejaechang commented Oct 26, 2015

msJohnHamby commented Dec 4, 2015

sharwell commented Dec 5, 2015

msJohnHamby commented Dec 5, 2015

sharwell commented Dec 5, 2015

mavasani commented Dec 9, 2015

sharwell commented Dec 9, 2015

mavasani commented Dec 10, 2015