-
Notifications
You must be signed in to change notification settings - Fork 47
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Clarify use (or extend design) of SARIF to express hierarchical diagnostics. #572
Comments
Context: in Visual Studio we're working on hierarchical diagnostics to present compiler errors in a more structured way than plain text. Clang is also working on a similar project. A C++ example: struct dog{};
struct cat{};
void pet(dog);
void pet(cat);
template <class T>
concept pettable = requires(T t) { t.pet(); };
template <pettable T>
void pet(T);
struct donkey {};
int main() {
pet(donkey{});
} MSVC outputs an error message like this in plain text:
However, there's a logical structure which isn't represented here, which looks something like:
In order to represent this hierarchy, we want to be able to encode it in SARIF. However, there's currently no facility for this. The way we do this today is by placing the nested diagnostics in a For example: {
"ruleId":"C2665",
"level":"error",
"message":{
"text":"'pet': no overloaded function could convert all the argument types"
},
"locations":[
{
"physicalLocation":{
"artifactLocation":{
"uri":"diag.cpp"
},
"region":{
"startLine":35,
"startColumn":4
}
}
}
],
"relatedLocations":[
{
"id":0,
"physicalLocation":{
"artifactLocation":{
"uri":"diag.cpp"
},
"region":{
"startLine":11,
"startColumn":5
}
},
"message":{
"text":"could be 'void __cdecl pet(struct dog)'"
}
},
{
"id":1,
"physicalLocation":{
"artifactLocation":{
"uri":"diag.cpp"
},
"region":{
"startLine":35,
"startColumn":4
}
},
"message":{
"text":"'void __cdecl pet(struct dog)': cannot convert argument 1 from 'int' to 'struct dog'"
},
"properties":{
"nestingLevel":1
}
},
{
"id":2,
"physicalLocation":{
"artifactLocation":{
"uri":"diag.cpp"
},
"region":{
"startLine":35,
"startColumn":8
}
},
"message":{
"text":"No constructor could take the source type, or constructor overload resolution was ambiguous"
},
"properties":{
"nestingLevel":1
}
}
]
} I'd like for all C++ compilers to agree on how to encode hierarchical diagnostics, and having it standardised in SARIF seems to be the best way to achieve this. |
Clang is in the middle of working out how to restructure its diagnostics internally, which is driven by the desire for this. |
Sorry for the delay on this topic. We are discussing at tomorrow's TC meeting. |
There's some appetite in the ISO C++ standards group for converging on SARIF with an extension like this. Has this item been discussed by the SARIF group yet? |
Sy posted this updated proposal to SG15: P3358R0 SARIF for Structured Diagnostics; I'm taking a look from the GCC side. |
@TartanLlama Hi, thanks for your various proposals. FWIW I now have a working (but messy) prototype of your latest proposal in GCC: As I understand it, every I'm about to try adding support for capturing "include chains" to GCC's SARIF output, which is going to mean adding lots of So I think this might be better expressed as a new "kind" within FWIW I'm also interested in capturing information about macro expansions, which might be another kind of location relationship; see #650. Your proposal also mentions using JSON-RPC for sending sarifResult objects one at a time. FWIW I've filed this RFE to implement something like this in GCC (no promises though!) |
To clarify: clearly you want to preserve the ordering of the messages, but if, say, there are include chains in some of the messages, we need to refer to the locations of the If we omit But ideally if a bunch of locations appear in one Sorry if the above is unclear |
Thank for these thoughts, @davidmalcolm (sorry for the delay in responding, I've been on vacation). Yeah, everything you wrote makes sense to me, I can see how the current design can have issues in these cases. Let's say we did as you suggest and made a new "result": {
"message": {
"text": "'pet': no overloaded function could convert all the argument types"
},
"locations": [
{
"id": 0,
"relationships": [
{
"target": 1,
"kinds": [
"contains"
]
}
]
}
],
"relatedLocations": [
{
"id": 1,
"message": {
"text": "or 'void pet(_T0)'"
}
"relationships": [
{
"target": 2,
"kinds": [
"contains"
]
}
]
},
/*...*/
]
} Is that what you had in mind? Do you see any potential issues with this design? I'll ask the other folks at Microsoft who have been working on this. |
@TartanLlama Thanks for the feedback. FWIW I'm now tracking this idea in GCC's bugtracker as an RFE as https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116253 I've attached my patch for G++ and its .sarif output there. This implements the original
Yes, very much so - thanks!
I'll take a look at implementing the relatedLocations version of the proposal to see how well it holds up from the GCC producer side of things. I'll try to put together a version of the testcase that uses
Thanks again |
@TartanLlama I had a go at doing this, chopping up your test case into https://gist.github.com/davidmalcolm/bc84ca043be0c0ef3a3a4d1e9d748c2b It turned out my GCC implementation of your So I think it happens to work for my implementation, but wouldn't if a producer happened to walk the graph of relationships in a different way when writing out the SARIF. |
@TartanLlama FWIW we're having a meeting of the SARIF technical committee on Thursday; I'll try to raise this issue there. |
@TartanLlama There's a whole other aspect to your proposal (beyond nested diagnostics) that I found interesting, which is that in 3.1.1. MSVC you document that MSVC can look for SARIF_OUTPUT_PIPE and send its diagnostics as JSON-RPC notifications there. This seems really useful for IDE integration, and maybe there's an appetite for standardizing it? What you document seems to be very Windows-specific (using a UTF-16-encoded integer representation of a HANDLE). I've had a go at implementing it for GCC on Linux. I'm tracking that in GCC's bug tracker as RFE: add support for sending SARIF output via IPC. My first attempt was using a named FIFO, but that had the limitation of only being for a single tool: once the FIFO is closed, we're done. The latest patch uses unix domain sockets: If the environment has SARIF_SOCKET set to a path to a socket, then GCC diagnostics are emitted as JSON-RPC notifications to the socket in a manner akin to that for "SARIF_OUTPUT_PIPE" in P3358R0, albeit with Unix domain sockets. The patch also implements a trivial "sarif-listener" program that creates the socket and listens for notifications. I can run that in one terminal, and run various builds using GCC in another, and see the diagnostics from those builds appear as SARIF notifications in the first terminal. My patch currently adds (and requires) the command-line option The precise form of the RPC messages isn't quite clear to me. For example, should there be any trailing newlines after the JSON itself? I also wonder if we might want separate notifications for "newResult" and "endOfRun", where the latter could contain information on the tool etc (and maybe also a startOfRun notification?), so that when a connection comes in to the socket it gets metadata as well as just diagnostics. Hope this is constructive |
Makes sense, I guess we could specify that a producer would have to output locations in this manner, but it seems pretty fragile, better to have a more robust encoding method.
Would it help if I were to join as well?
For the SARIF standard, yeah, I think it could make sense for the spec to outline a streaming format for SARIF results, potentially based on JSON RPC. The pipe communication seems perhaps outside the purview of the SARIF format though, what do you think? For SG15, the idea is for all of this to end up in the Ecosystem IS. I think after the initial paper has been discussed by SG15, we can decide the best course of action there.
There should not be any trailing newlines after the JSON itself. Here's an example of what's sent along the pipe:
At least for our use cases we have other ways to extract the toolset information we need (e.g. from the CMake info), but this is certainly something to consider if use cases for it come up! |
I suppose the question here is about when the precise ordering within Another approach might be to amend the original nestingLevel proposal so that nesting levels only apply to locations with
I asked about this, but unfortunately, OASIS no longer allows guests in meetings, only members. I'm happy to have an informal videochat with you if it would be helpful, or maybe it's better to keep things within the issue tracker? Not sure.
I confess I'm unfamiliar with the workflows/relationships of SG15 and the Ecosystem IS, but I can at least talk about all this with the SARIF TC.
Am I right in thinking that MSVC is effectively reusing the Base Protocol part of the LSP for "framing" the JSON-RPC notifications?
For your |
Thanks. For my reference, I had a look at the boundary between the two notifications in that output, here:
where the 2nd |
Note to self: It appears that MSVC is using the |
Maybe you can discuss with the SARIF TC and I can talk to SG15 and then we can schedule a videocall?
SG15 have teleconferences between in-person WG21 meetings, the next should be in the next week or so. The Ecosystem IS is a planned international standard that SG15 is working on that standardises tooling elements of C++ that until now have lived outside the purview of the C++ standard. SARIF output and communication is part of that.
Yeah, as you note in a later message, it's part of the existing vs-streamjsonrpc format.
Currently MSBuild creates a pipe per compiler invocation and passes back information to the IDE via existing VS <-> MSBuild communication channels |
Adding my very quick comments here summarized from the TC discussion today (caveating that I'm not a heavy user of these features). Also citing to leverage Aditya for CodeQL in particular, as it has more hierarchical data that what I've leveraged historically. Overall, it feels like the result object already has properties that support similar structure of data (codeflows, graphs, graphtraversals, stacks). The related locations list, to me, is best served being kept simple as a reference material for the select properties that could reference it. This data looks like you want a |
Adding my own summary of the 2024-08-08 TC discussion about this: We brainstormed a bit on the various ways that SARIF could be extended to support nested messages:
IIRC the consensus was that we hadn't yet got enough implementation experience with the idea to be able to recommend a particular approach (but I could be misremembering), and I said I'll continue to experiment with doing it from the GCC side. After the meeting another idea occurred to me: to extend 3.11 I hope I'm correctly remembering/summarizing the discusssion. |
I'll try to summarize the various unknowns raised by P3358R0 SARIF for Structured Diagnostics here; the wording is my own, though: How should nested/hierarchical diagnostics be expressed in SARIF?See e.g. #572 (comment); there's also been discussion about using the Do we need to generalize SARIF to cover "dynamic" or "progressive" notifications about results, rather than dealing with a log that is already complete?I've filed this as #661 How should an IDE interact with a compiler via SARIF?This feels like a superset of #661, but also supposes some form of IPC. |
After another look at this, my inclination would currently be to figure out how to t5ransform your relatedLocations reference within the graphtraversal objects. Nodes have locations today: https://docs.oasis-open.org/sarif/sarif/v2.1.0/csprd01/sarif-v2.1.0-csprd01.html#_Toc10541213 |
I've tried converting the current A couple questions:
|
Really appreciate you taking a shot at that transform - looks promising! I think to both of your questions, it comes to client consumption / rendering. I don't think you want to go the path of having custom rendering for a particular id or description - this won't scale to any other producer trying to use your rendering. When folks encode information in a graph, that graph should be referenced or leveraged in some way so that consumption of it can consistently be done across any type of issue. Graphs visualization is not unlike any other other element on a I'll summon @aditya on this one for codeflows - if we have prior art on how consumption chooses to pull these in or not. Another idea that comes to mind would be having a unique language element to allow referencing other properties of the |
FWIW I've implemented some of this for GCC 15, albeit using Here's an example of the nested diagnostics on Compiler Explorer: Trunk with Text output (via SARIF output (via |
No description provided.
The text was updated successfully, but these errors were encountered: