Fingerprint in gitlab report should not use location in code as part of the hashing algorithm #7159

gregersn · 2023-09-05T15:00:33Z

ruff 0.0.287

Using the location in the code file as part of the hashing for the fingerprint causes a comparison between reports to show one issue as fixed and a new one as introduced whenever other parts of the file change and shift the line number.

As a comparison, this is how eslint-formatter-gitlab calculates its hash: https://gitlab.com/remcohaszing/eslint-formatter-gitlab/-/blob/main/index.js?ref_type=heads#L72

In its current form, Ruff's output produces a lot of noise in GitLab's code quality report if, say, a line is inserted above all the linting issues in a file, moving them one line down. They all get new fingerprints, and the fingerprints from the old report disappear.
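To make the effect concrete, here is a minimal sketch, not Ruff's actual code. The `Diagnostic` struct and both function names are hypothetical, and `DefaultHasher` merely stands in for whatever hash the real report uses. It contrasts a fingerprint that includes the row with one that does not.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical, simplified diagnostic; not Ruff's real message type.
struct Diagnostic<'a> {
    path: &'a str,
    rule: &'a str,
    body: &'a str,
    row: u32,
}

/// Fingerprint that includes the row: changes whenever the line moves.
fn fingerprint_with_location(d: &Diagnostic<'_>) -> u64 {
    let mut hasher = DefaultHasher::new();
    d.path.hash(&mut hasher);
    d.rule.hash(&mut hasher);
    d.body.hash(&mut hasher);
    d.row.hash(&mut hasher);
    hasher.finish()
}

/// Fingerprint built only from path, rule and message: unaffected by
/// lines being inserted or removed elsewhere in the file.
fn fingerprint_without_location(d: &Diagnostic<'_>) -> u64 {
    let mut hasher = DefaultHasher::new();
    d.path.hash(&mut hasher);
    d.rule.hash(&mut hasher);
    d.body.hash(&mut hasher);
    hasher.finish()
}
```

For the same violation reported once at row 3 and once at row 4, `fingerprint_with_location` yields two different values, while `fingerprint_without_location` yields the same value both times. Note that `DefaultHasher` is not guaranteed to be stable across Rust releases, so a real report would want a fixed hash function.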

MichaReiser · 2023-09-05T19:46:15Z

Thanks for reporting this and linking to an alternative implementation. ESLint's approach is interesting because it provides a more stable result. The only shortcoming that I can think of right now (if I understand it correctly) is that it won't report a lint if an error for the same rule in the same file was fixed but a new violation, identical except for its location, was introduced. But this seems less of a problem than Ruff over-reporting newly introduced and fixed diagnostics for every PR.
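The shortcoming described above comes from identical violations in one file hashing to the same value once the location is dropped. One possible way to keep fingerprints unique within a single report is to re-hash on collision. The sketch below is an assumption for illustration, not a description of what Ruff or the ESLint formatter actually does; `base_fingerprint` and `unique_fingerprints` are hypothetical names.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Location-independent fingerprint from (path, rule, message).
fn base_fingerprint(path: &str, rule: &str, body: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    path.hash(&mut hasher);
    rule.hash(&mut hasher);
    body.hash(&mut hasher);
    hasher.finish()
}

/// Assigns one fingerprint per diagnostic, in report order. When a
/// fingerprint is already taken, the value is hashed again until it
/// is unique within this report.
fn unique_fingerprints(diagnostics: &[(&str, &str, &str)]) -> Vec<u64> {
    let mut seen = HashSet::new();
    let mut result = Vec::with_capacity(diagnostics.len());
    for &(path, rule, body) in diagnostics {
        let mut fingerprint = base_fingerprint(path, rule, body);
        // `insert` returns false if the value was already present.
        while !seen.insert(fingerprint) {
            let mut hasher = DefaultHasher::new();
            fingerprint.hash(&mut hasher);
            fingerprint = hasher.finish();
        }
        result.push(fingerprint);
    }
    result
}
```

With this scheme, two identical violations in the same file get distinct fingerprints, but the second one's value depends on its position among the duplicates. If the first duplicate is fixed, the remaining one takes over its fingerprint, so a fix followed by an identical new violation still looks unchanged to a report comparison.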

The one thing that's unclear to me as someone who hasn't used GitLab extensively is whether this is a breaking change. I would assume that GitLab will mark ALL diagnostics as changed when upgrading from the current hash function to the new hash function. Do you know if that's correct and how disruptive that would be?

gregersn · 2023-09-05T19:56:35Z

Depends on your definition of a breaking change.
And how intrusive it would be would depend on how people treat their tooling environments.
The upgrade would, in all likelihood, mark all of them as changed for the one pipeline run of main/master after the upgrade, and after that everything would be back to normal. There is a place to view the reports for each pipeline run, but it is loaded for that run only and doesn't really show any of the previous ones.

So, how disruptive? Not more than is worth it, I would say, to avoid having almost every run report new and fixed errors that are just the result of lines being moved around.

MichaReiser · 2023-09-06T15:08:33Z

@gregersn would you be interested in contributing the change? I can point you to the relevant code and provide support.

@zanieb a potential use case for --preview

gregersn · 2023-09-06T15:54:48Z

I have never used Rust, but I am taking a look at it now.

MichaReiser · 2023-09-06T16:00:34Z

> I have never used Rust, but I am taking a look at it now.

Happy to help you get started.

The relevant code is

ruff/crates/ruff/src/message/gitlab.rs

Lines 55 to 100 in ea72d5f

    
    impl Serialize for SerializedMessages<'_> {
        fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
        where
            S: Serializer,
        {
            let mut s = serializer.serialize_seq(Some(self.messages.len()))?;

            for message in self.messages {
                let start_location = message.compute_start_location();
                let end_location = message.compute_end_location();

                let lines = if self.context.is_notebook(message.filename()) {
                    // We can't give a reasonable location for the structured formats,
                    // so we show one that's clearly a fallback
                    json!({
                        "begin": 1,
                        "end": 1
                    })
                } else {
                    json!({
                        "begin": start_location.row,
                        "end": end_location.row
                    })
                };

                let path = self.project_dir.as_ref().map_or_else(
                    || relativize_path(message.filename()),
                    |project_dir| relativize_path_to(message.filename(), project_dir),
                );

                let value = json!({
                    "description": format!("({}) {}", message.kind.rule().noqa_code(), message.kind.body),
                    "severity": "major",
                    "fingerprint": fingerprint(message, &start_location, &end_location),
                    "location": {
                        "path": path,
                        "lines": lines
                    }
                });

                s.serialize_element(&value)?;
            }

            s.end()
        }
    }

Let me know if I can help you in any way, and feel free to ping me on Discord.

MichaReiser added the needs-decision (Awaiting a decision from a maintainer) label Sep 5, 2023

MichaReiser added the breaking (Breaking API change) and help wanted (Contributions especially welcome) labels and removed the needs-decision (Awaiting a decision from a maintainer) label Sep 5, 2023

gregersn mentioned this issue Sep 6, 2023

Do not use code location for Gitlab fingerprints. #7203

Merged

MichaReiser closed this as completed in #7203 Sep 8, 2023

fin-gal mentioned this issue Dec 2, 2024

Gitlab code quality fingerprint limitation: multiple errors with identical message #14732

Open
