Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fingerprint in gitlab report should not use location in code as part of the hashing algorithm #7159

Closed
gregersn opened this issue Sep 5, 2023 · 5 comments · Fixed by #7203
Labels
breaking Breaking API change help wanted Contributions especially welcome

Comments

@gregersn
Copy link
Contributor

gregersn commented Sep 5, 2023

ruff 0.0.287

Using the location in the code file as part of the hashing for the fingerprint causes a comparison between reports to show one issue as fixed, and a new one as having come to, if other parts of a file is changed, causing the line number to change.

As a comparison, this is how eslint-gitlab-reports calculate hashing: https://gitlab.com/remcohaszing/eslint-formatter-gitlab/-/blob/main/index.js?ref_type=heads#L72

In its (ruff) current for, the output in gitlab will show a lot of noise when reporting on code quality if, say, a line is inserted above all the linting issues in a code file, and moving them one line down. They will all get new fingerprints, and the fingerprints that were in the old report will be gone.

@MichaReiser MichaReiser added the needs-decision Awaiting a decision from a maintainer label Sep 5, 2023
@MichaReiser
Copy link
Member

Thanks for reporting this and linking to an alternative implementation. ESlint's approach is interesting because it provides a more stable result. The only shortcoming that I can think of right now (if I understand it correctly) is that it won't report a lint if an error for the same rule in the same file was fixed but a new (otherwise than the location identical) violation was introduced. But this seems less of an error compared to Ruff over-reporting on newly introduced and fixed diagnostics for every PR.

The one thing that's unclear to me as someone who hasn't used GitLab extensively is whether this is a breaking change. I would assume that GitLab will mark ALL diagnostics as changed when upgrading from the current hash function to the new hash function. Do you know if that's correct and how disruptive that would be?

@gregersn
Copy link
Contributor Author

gregersn commented Sep 5, 2023

Depends on your definition of a breaking change.
And how intrusive it would be would depend on how people treat their tooling environments.
The upgrade would, with all likely hood trigger all of them changed, for that one pipeline run of main/master after the upgrade, and then after that all would be back to normal. There is a place to view the reports for each pipeline run, but it is loaded for that run, and doesn't really show any of the previous.

So as for how disruptive? Not more than what I would say was worth it to not have almost every run report on new and fixed errors, that are just results of lines being moved around.

@MichaReiser MichaReiser added breaking Breaking API change help wanted Contributions especially welcome and removed needs-decision Awaiting a decision from a maintainer labels Sep 5, 2023
@MichaReiser
Copy link
Member

@gregersn would you be interested in contributing the change? I can point you to the relevant code and provide support.

@zanieb a potential use case for --preview

@gregersn
Copy link
Contributor Author

gregersn commented Sep 6, 2023

I have never used Rust, but I am taking a look at it now.

@MichaReiser
Copy link
Member

I have never used Rust, but I am taking a look at it now.

Happy to help you get started.

The relevant code is

impl Serialize for SerializedMessages<'_> {
fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
where
S: Serializer,
{
let mut s = serializer.serialize_seq(Some(self.messages.len()))?;
for message in self.messages {
let start_location = message.compute_start_location();
let end_location = message.compute_end_location();
let lines = if self.context.is_notebook(message.filename()) {
// We can't give a reasonable location for the structured formats,
// so we show one that's clearly a fallback
json!({
"begin": 1,
"end": 1
})
} else {
json!({
"begin": start_location.row,
"end": end_location.row
})
};
let path = self.project_dir.as_ref().map_or_else(
|| relativize_path(message.filename()),
|project_dir| relativize_path_to(message.filename(), project_dir),
);
let value = json!({
"description": format!("({}) {}", message.kind.rule().noqa_code(), message.kind.body),
"severity": "major",
"fingerprint": fingerprint(message, &start_location, &end_location),
"location": {
"path": path,
"lines": lines
}
});
s.serialize_element(&value)?;
}
s.end()
}
}

Let me know if I can help you in anyway and feel free to ping me on Discord.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking Breaking API change help wanted Contributions especially welcome
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants