Add new lint hashset_insert_after_contains #12873

Merged 4 commits on Jul 4, 2024


lochetti
Contributor

This PR closes #11103.

This is my first PR creating a new lint (and my second attempt at this PR; I was not able to continue the first one for personal reasons). Thanks for the patience :)

The idea of the lint is to find insert calls on hash sets inside if statements that check whether the set contains the same value that is being inserted. The check is not necessary, since you can simply call insert and test the returned bool if you still need the if statement.
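
To make the pattern concrete, here is a minimal sketch of the two forms the description contrasts. The function names `add_checked` and `add_direct` are illustrative only and do not appear in the PR.

```rust
use std::collections::HashSet;

/// The shape the lint looks for: a `contains` check guarding an `insert`
/// of the same value, so the value is hashed and looked up twice.
pub fn add_checked(set: &mut HashSet<u32>, value: u32) -> bool {
    if !set.contains(&value) {
        set.insert(value);
        return true;
    }
    false
}

/// The suggested shape: `HashSet::insert` already returns `true` when the
/// value was newly added and `false` when it was already present.
pub fn add_direct(set: &mut HashSet<u32>, value: u32) -> bool {
    set.insert(value)
}
```

Both functions leave the set in the same state and report the same result; the second performs a single lookup.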

changelog: new lint: [hashset_insert_after_contains]

@rustbot
Collaborator

rustbot commented May 31, 2024

r? @llogiq

rustbot has assigned @llogiq.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot rustbot added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties label May 31, 2024
Contributor

@llogiq llogiq left a comment

Thanks for taking on this lint. This is a good start. I left a few notes. We'd want a lint check before merging it though.

I'm currently not at my desk, will look into running the check some time next week when I get around to it.

value: &'tcx Expr<'tcx>,
span: Span,
}
fn try_parse_contains<'tcx>(cx: &LateContext<'_>, expr: &'tcx Expr<'_>) -> Option<ContainsExpr<'tcx>> {
Contributor

try_parse_insert and try_parse_contains are equal on all but two things: The UnOp (Not vs. Deref) and the call sym (insert vs. contains). Please factor out a try_parse_op_call method to use in both cases (oh, and the contains result also containing the span, but we can live with ignoring that for the insert case later on).

Contributor Author

@lochetti lochetti Jun 1, 2024

@llogiq thanks for your comment. I was trying to take a look at it and I just have one doubt that I wanted to validate with you.

There is another difference between try_parse_insert and try_parse_contains: in try_parse_insert I peel the expr itself while it is a Not, at the beginning, whereas in try_parse_contains I peel the value of the expr (if it is a method call) while it is a Deref. So the UnOp is applied to different things (the expr vs. the value of the method call expr). To make a generic function, I think I would need to always do both peelings. I don't think that would be a problem in terms of correctness, but it is probably not necessary, so I just wanted to double-check with you.

Does it make sense?

Contributor Author

Great, did it! 2b0cad6
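
The "always do both peelings" idea discussed in this thread can be sketched on a toy expression tree. This is not the code from commit 2b0cad6 and does not use rustc's HIR types: the `Expr` enum, the `OpCall` struct, and the helpers `peel_not` and `peel_deref` are hypothetical stand-ins invented for illustration. Only the name `try_parse_op_call` comes from the review comment.

```rust
/// Toy stand-in for an expression tree (hypothetical, not `rustc_hir::Expr`).
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Path(String),
    Not(Box<Expr>),
    Deref(Box<Expr>),
    MethodCall { receiver: Box<Expr>, name: String, arg: Box<Expr> },
}

/// Result of a successful parse: the receiver and the (peeled) argument.
#[derive(Debug, PartialEq)]
pub struct OpCall<'a> {
    pub receiver: &'a Expr,
    pub value: &'a Expr,
}

/// Strip any number of leading `!` operators.
fn peel_not(mut e: &Expr) -> &Expr {
    while let Expr::Not(inner) = e {
        e = &**inner;
    }
    e
}

/// Strip any number of leading `*` operators.
fn peel_deref(mut e: &Expr) -> &Expr {
    while let Expr::Deref(inner) = e {
        e = &**inner;
    }
    e
}

/// One helper for both `contains` and `insert`: peel `!` from the whole
/// expression, match a call to `method`, then peel `*` from its argument.
pub fn try_parse_op_call<'a>(expr: &'a Expr, method: &str) -> Option<OpCall<'a>> {
    if let Expr::MethodCall { receiver, name, arg } = peel_not(expr) {
        if name.as_str() == method {
            return Some(OpCall { receiver: &**receiver, value: peel_deref(arg) });
        }
    }
    None
}
```

In this model, applying both peelings unconditionally is harmless: an expression without a leading `!` or an argument without a leading `*` passes through the corresponding loop untouched.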

span_lint_and_then(
cx,
HASHSET_INSERT_AFTER_CONTAINS,
expr.span,
Contributor

It might be a good idea to use a MultiSpan here containing the !contains and the insert call each (simply call span_lint with a vec![contains_span, insert_span]). That will reduce the visual clutter especially if there is more code within the if expression.

Contributor Author

Nice! I think I did it as you mentioned, here 2b0cad6

{
span_lint_and_then(
cx,
HASHSET_INSERT_AFTER_CONTAINS,
Contributor

How about SET_CONTAINS_OR_INSERT? Yes, for now this only works on HashSets, but we can easily extend it to also work on BTreeSets or the respective Map types.

Contributor Author

Ok, sounds more generic! Renamed it here 2b0cad6
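
As a small illustration of why the more generic name fits, the same two shapes can be written against `BTreeSet`, whose `insert` also returns a `bool`. The function names below are hypothetical, and the thread states that the lint only handles HashSets for now, so this only shows the code shape an extended lint might target.

```rust
use std::collections::BTreeSet;

/// `contains` followed by `insert` on a `BTreeSet`: two tree lookups.
pub fn btree_add_checked(set: &mut BTreeSet<u32>, value: u32) -> bool {
    if !set.contains(&value) {
        set.insert(value);
        return true;
    }
    false
}

/// `BTreeSet::insert` reports whether the value was newly added,
/// so a single call gives the same answer.
pub fn btree_add_direct(set: &mut BTreeSet<u32>, value: u32) -> bool {
    set.insert(value)
}
```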

let borrow_set = &mut set;
if !borrow_set.contains(&value) {
borrow_set.insert(value);
}
Contributor

How about

if set.contains(&value) {
    println!("value is already in set");
} else {
    set.insert(value);
}

Contributor Author

Good point. Right now the code only searches for the insert in the then part of the if. I think it would make sense to search in the else part as well. I am just not sure whether we should skip the warning when the insert is in the else and the then has "a lot of things" (meaning it was probably a stylistic choice by the developer), or keep it simple and warn whenever we find the insert in the else as well... I think keeping it simple would be the best choice, but I am certainly open to suggestions.
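
For reference, the else-branch shape from the review comment and one possible single-call rewrite can be sketched as follows. The function names and the returned message strings are illustrative assumptions, not part of the PR or its test suite.

```rust
use std::collections::HashSet;

/// The shape from the review comment: the `insert` sits in the `else` branch.
pub fn add_with_else(set: &mut HashSet<String>, value: String) -> &'static str {
    if set.contains(&value) {
        "value is already in set"
    } else {
        set.insert(value);
        "inserted"
    }
}

/// A single-call equivalent: branch on the `bool` returned by `insert`.
pub fn add_with_insert(set: &mut HashSet<String>, value: String) -> &'static str {
    if set.insert(value) {
        "inserted"
    } else {
        "value is already in set"
    }
}
```

One caveat with the rewrite: `insert` takes the value by move, so a then branch that still needs `value` afterwards cannot be converted this directly.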

///
/// ### Why is this bad?
/// Using just `insert` and checking the returned `bool` is more efficient.
///
Contributor

I know of one possible false positive: If the value is only borrowed & expensive to clone or impossible to clone twice, we may opt to check with contains before inserting to avoid the clone. There should be a "known problems" section mentioning this.

Contributor Author

Ok, did it here 2b0cad6
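
The false positive described above can be illustrated with a set of owned strings that is queried with a borrowed `&str`. This is a sketch under assumptions: the function names `intern` and `intern_eager` are hypothetical, and the thread does not say whether the lint as merged fires on this exact shape.

```rust
use std::collections::HashSet;

/// Checks first, so a `String` is allocated only when the name is absent.
/// `HashSet<String>::contains` accepts `&str` because `String: Borrow<str>`.
pub fn intern(set: &mut HashSet<String>, name: &str) -> bool {
    if !set.contains(name) {
        set.insert(name.to_owned());
        return true;
    }
    false
}

/// A naive "just call `insert`" rewrite: same result, but it allocates a new
/// `String` on every call, even when the name is already in the set.
pub fn intern_eager(set: &mut HashSet<String>, name: &str) -> bool {
    set.insert(name.to_owned())
}
```

Here the extra `contains` lookup is the cheaper path whenever most names are already present, which is why the check may be deliberate.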

#[clippy::version = "1.80.0"]
pub HASHSET_INSERT_AFTER_CONTAINS,
nursery,
"unnecessary call to `HashSet::contains` followed by `HashSet::insert`"
Contributor

Suggested change
- "unnecessary call to `HashSet::contains` followed by `HashSet::insert`"
+ "call to `HashSet::contains` followed by `HashSet::insert`"

As stated, we cannot know whether the call is necessary.

Contributor Author

Fixed on 2b0cad6

@bors
Contributor

bors commented Jun 11, 2024

☔ The latest upstream changes (presumably #12849) made this pull request unmergeable. Please resolve the merge conflicts.

@llogiq
Contributor

llogiq commented Jun 15, 2024

Sorry this is taking so long, I am unfortunately quite busy at the moment. I'll ping you once I've run the check so you don't need to rebase more than once.

///
/// ### Known problems
/// In case the value that wants to be inserted is borrowed and also expensive or impossible
/// to clone. In such scenario, the developer might want to check with `contain` before inserting,
Contributor

Suggested change
- /// to clone. In such scenario, the developer might want to check with `contain` before inserting,
+ /// to clone. In such a scenario, the developer might want to check with `contains` before inserting,

@llogiq
Contributor

llogiq commented Jul 3, 2024

I finally came around to do a lintcheck run, and it looked OK. So r=me after a rebase and the docs fix @bitfield suggested.

@lochetti
Contributor Author

lochetti commented Jul 3, 2024

@bors r=llogiq

@bors
Contributor

bors commented Jul 3, 2024

@lochetti: 🔑 Insufficient privileges: Not in reviewers

@lochetti
Contributor Author

lochetti commented Jul 3, 2024

> I finally came around to do a lintcheck run, and it looked OK. So r=me after a rebase and the docs fix @bitfield suggested.

Oops. It looks like I can't r=you... :)

@llogiq
Contributor

llogiq commented Jul 4, 2024

No problem.

@bors r+

@bors
Contributor

bors commented Jul 4, 2024

📌 Commit 4e71fc4 has been approved by llogiq

It is now in the queue for this repository.

@bors
Contributor

bors commented Jul 4, 2024

⌛ Testing commit 4e71fc4 with merge d2400a4...

@bors
Contributor

bors commented Jul 4, 2024

☀️ Test successful - checks-action_dev_test, checks-action_remark_test, checks-action_test
Approved by: llogiq
Pushing d2400a4 to master...

@nyurik
Contributor

nyurik commented Jul 4, 2024

Awesome work! One question -- would it make sense to generalize the name of this lint to insert_after_contains ? This way it can also cover HashMap and possibly other dictionary-like patterns?

@llogiq
Contributor

llogiq commented Jul 4, 2024

@nyurik: Currently the lint only catches HashSets, but I'd welcome a PR to change that along with the name.

@nyurik
Contributor

nyurik commented Jul 5, 2024

I could do a quick PR to rename the lint, but implementing it for other types might take a bit longer, and might not even make the cutoff for the next release (at which point the lint name would become permanent)

@nyurik
Contributor

nyurik commented Jul 5, 2024

Oh, my apologies, this was already renamed in the code to set_contains_or_insert. Some options of a generalized name:

  • contains_or_insert (short and to the point?)
  • collection_contains_or_insert
  • ???

@nyurik
Contributor

nyurik commented Jul 5, 2024

Renamed in #13053

Labels
S-waiting-on-review Status: Awaiting review from the assignee but also interested parties
Development

Successfully merging this pull request may close these issues.

Optimize HashSet contains+insert usage
6 participants