Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lint for undesirable, implicit copies #45683

Open
nikomatsakis opened this issue Nov 1, 2017 · 20 comments
Open

Lint for undesirable, implicit copies #45683

nikomatsakis opened this issue Nov 1, 2017 · 20 comments
Assignees
Labels
A-lint Area: Lints (warnings about flaws in source code) such as unused_mut. C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC S-tracking-design-concerns Status: There are blocking design concerns. T-lang Relevant to the language team, which will review and decide on the PR/issue.

Comments

@nikomatsakis
Copy link
Contributor

nikomatsakis commented Nov 1, 2017

As part of #44619, one topic that keeps coming up is that we have to find some way to mitigate the risk of large, implicit copies. Indeed, this risk exists today even without any changes to the language:

let x = [0; 1024 * 10];
let y = x; // maybe you meant to copy 10K words, but maybe you didn't.

In addition to performance hazards, implicit copies can have surprising semantics. For example, there are several iterator types that would implement Copy, but we were afraid that people would be surprised. Another, clearer example is a type like Cell<i32>, which could certainly be copy, but for this interaction:

let x = Cell::new(22);
let y = x;
x.set(23);
println!("{}", y.get()); // prints 22

For a time before 1.0, we briefly considered introducing a new Pod trait that acted like Copy (memcpy is safe) but without the implicit copy semantics. At the time, @eddyb argued persuasively against this, basically saying (and rightly so) that this is more of a linting concern than anything else -- the implicit copies in the example above, after all, don't lead to any sort of unsoundness, they just may not be the semantics you expected.

Since then, a number of use cases have arisen where having some kind of warning against implicit, unexpected copies would be useful:

  • Iterators implementing Copy
  • Copy/clone closures (closures that are copy can be surprising just as iterators can be)
  • Cell, RefCell, and other types with interior mutability implementing Copy
  • Behavior of #[derive(PartiallEq)] and friends with packed structs
  • A coercion from &T to T (i.e., it'd be nice to be able to do foo(x) where foo: fn(u32) and x: &u32)

I will writeup a more specific proposal in the thread below.

@nikomatsakis
Copy link
Contributor Author

I think the lint would work roughly like this. We define a set of types that should not be implicitly copied. This would include any type whose size is >N bytes (where N is some arbitrary number, perhaps tunable via an attribute on the crate), but also any struct that is tagged with #[must_clone]. The name of #[must_clone] is probably bad; it is meant to be reminiscent of #[must_use] -- basically it's saying that, to copy this type, one ought to clone it (e.g., let y = x.clone() rather than let y = x), so that the semantics are clear to the reader.

We would then comb code for implicit copies. This is (somewhat) easy to define at the MIR level: any use of a user-defined variable (not a temporary) that is of a type implementing Copy. Here are some examples where this would trigger. Let's assume x is a variable of "must clone" type:

let y = x; 
foo(x);
let y = [x; 22]; // at least I think this would count =)

One interesting case that would, by this definition, be linted against is this:

let z = move || use(x); // copies `x` into the closure body

This one is interesting because, unlike the others, there is no way to readily silence the lint, except by adding an #[allow(implicit_copy)]. That is usually a bad sign -- we may not want to lint on this case. One can argue, certainly, that the move is sufficient to warn the reader. Not sure if I buy that. =)

It is also interesting that the use cases here perhaps breakdown into two categories:

  • Big things that may not want to be copied at all.
  • Things where it's ok to copy as a last-use (e.g., Cell<i32>, iterators, closures).

In fact, I would argue that the latter is the majority case, so maybe we just want to separate this into two lints, or else say that structs with the #[must_clone] tag only warn if -- after the implicit copy -- there is a subsequent use. This would actually solve the move || use(x) problem as well, since one could rewrite the closure expression to:

let z = {
    let x0 = x.clone();
    move || use(x0) // this is a last-use of `x0`, hence no warning
};
use(x); // uses original `x`

@eddyb
Copy link
Member

eddyb commented Nov 1, 2017

I would prefer it if this was more specific, e.g. for loops warning if the iteratee is used afterwards.

@arielb1
Copy link
Contributor

arielb1 commented Nov 2, 2017

@nikomatsakis

You don't want to warn on any use - that would warn on moves too. You want to warn on uses where the variable is live after the use.

@nikomatsakis
Copy link
Contributor Author

@arielb1 indeed.

@nikomatsakis
Copy link
Contributor Author

@eddyb do you have a specific example of something you would want to warn, and something you would not want?

@eddyb
Copy link
Member

eddyb commented Nov 8, 2017

@nikomatsakis Copying a Range around shouldn't warn (just like any other pair). Using a Copy variable implementing IntoIterator/Iterator (e.g. a Range) after iterating on it with a for loop should warn, as previously discussed.

@nikomatsakis
Copy link
Contributor Author

nikomatsakis commented Jan 23, 2018

@eddyb it seems like we could generalize then to a lint that works like this:

  • Types can be marked as #[must_clone]:
    • Examples: Cell
    • Warning arises if some Place X (of such a type) is copied and then used again later.
  • Methods can be marked as #[must_clone]:
    • Example: IntoIterator::into_iter
    • Warning arises if the method is called on some Place X and then X is used again later.

Maybe a better name than #[must_clone], especially for the second case.

@nikomatsakis
Copy link
Contributor Author

Anyway, regardless of the nitty gritty details, let me leave a few pointers here for how we might implement this. @cramertj is hopefully gonna take a look.

First off, let's look a bit at MIR. MIR definition is here.. A crucial concept I referenced above is a Place, which represents a path:

rust/src/librustc/mir/mod.rs

Lines 1231 to 1243 in 4e3901d

/// A path to a value; something that can be evaluated without
/// changing or disturbing program state.
#[derive(Clone, PartialEq, Eq, Hash, RustcEncodable, RustcDecodable)]
pub enum Place<'tcx> {
/// local variable
Local(Local),
/// static or static mut variable
Static(Box<Static<'tcx>>),
/// projection out of a place (access a field, deref a pointer, etc)
Projection(Box<PlaceProjection<'tcx>>),
}

The arguments to MIR statements are always operands, which explicitly identify a copy:

rust/src/librustc/mir/mod.rs

Lines 1399 to 1403 in 4e3901d

/// Copy: The value must be available for use afterwards.
///
/// This implies that the type of the place must be `Copy`; this is true
/// by construction during build, but also checked by the MIR type checker.
Copy(Place<'tcx>),

I think right now we will produce a Copy operand whenever the input is of copy type, and a Move operand whenever the input may not be of Copy type. It is an error to use a value after it is moved. So our analysis would look for Copy operands.

One way to do that is with the MIR visitor, which lets you quickly walk the IR. In particular we'd probably want to override the visit_operand method:

fn visit_operand(&mut self,
operand: & $($mutability)* Operand<'tcx>,
location: Location) {
self.super_operand(operand, location);
}

There is code for computing liveness -- basically, whether a variable may be used again. That code lives in liveness.rs. You can use liveness_of_locals to compute what is live and where:

/// Compute which local variables are live within the given function
/// `mir`. The liveness mode `mode` determines what sorts of uses are
/// considered to make a variable live (e.g., do drops count?).
pub fn liveness_of_locals<'tcx>(mir: &Mir<'tcx>, mode: LivenessMode) -> LivenessResult {

but that only gives you values at basic block boundaries. You will want simulate_block to walk through and find out what is live at any particular point:

/// Walks backwards through the statements/terminator in the given
/// basic block `block`. At each point within `block`, invokes
/// the callback `op` with the current location and the set of
/// variables that are live on entry to that location.
pub fn simulate_block<'tcx, OP>(&self, mir: &Mir<'tcx>, block: BasicBlock, mut callback: OP)
where
OP: FnMut(Location, &LocalSet),

So I guess that roughly the idae would be to identify copies where the value's type is marked to be linted. We would then compute the liveness at the point of copy and see if the Place being moved may be referenced again.

Some annoying challenges:

  • liveness at present is computed only for local variables;
    • so if somebody moves from a.b and then uses a.c, we can't really tell that
    • the move related code from dataflow is better here I guess, but I don't think we currently have an analysis for all uses, only for moves
    • still, woulnd't be hard to adapt, right? Maybe there's a reason it would be? I'm not sure why we didn't use dataflow in the first place here.

Still, this is a lint after all, it doesn't have to be totally precise. I think that for the first version of the code I would only worry about local variables and ignore more complex Place values. That seems like a simple thing to fix up later.

@cramertj cramertj self-assigned this Jan 23, 2018
@jkordish jkordish added A-lint Area: Lints (warnings about flaws in source code) such as unused_mut. C-future-incompatibility Category: Future-incompatibility lints labels Jan 25, 2018
@nikomatsakis
Copy link
Contributor Author

@qmx may take a look at this, since @cramertj never had time

@nikomatsakis nikomatsakis assigned qmx and unassigned cramertj Feb 16, 2018
@cramertj
Copy link
Member

@qmx I started https://github.com/cramertj/rust/tree/copy-lint if you want to take a look. Not sure how useful it is-- I only got the skeleton.

@RalfJung
Copy link
Member

RalfJung commented Mar 28, 2018

RefCell cannot be Copy. The problem is that if you have &RefCell<T>, you could end up copying it while another thread has a mutable reference to the content of the RefCell. That'd be unsound.

Same for Mutex, RwLock.

EDIT: As @rkruppe pointed out, "thread" doesn't make sense here. That doesn't change the problem though, see below.

@hanna-kruppe
Copy link
Contributor

RefCell is not Sync so how could you have a &RefCell<T> in one thread while another thread can access the referent?

@RalfJung
Copy link
Member

This can all happen in one thread.

@RalfJung
Copy link
Member

let r = RefCell::new(42);
let r1 = &r;
let r2 = &r;
let _mutref = r1.borrow_mut();
let illegal_copy = *r2; // copying while there is an outstanding mutable reference

@hanna-kruppe
Copy link
Contributor

Oh yeah you're right. The mention of another thread really mislead me there.

@RalfJung
Copy link
Member

Sorry for that, not sure what I thought. "Another function" is really what I meant (though it can be the same, as in my counterexample).

@nikomatsakis
Copy link
Contributor Author

@RalfJung in retrospect, the RefCell problem is obvious -- basically, of course copying is bad, for the same reason that borrow() isn't infallible. Silly me.

@qmx
Copy link
Member

qmx commented Apr 20, 2018

Just found out about rust-lang/rfcs#2407 - that would be a nice suggestion for the linter in the future, wouldn't it?

@Centril Centril added C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC and removed C-future-incompatibility Category: Future-incompatibility lints labels Apr 1, 2019
@est31
Copy link
Member

est31 commented Feb 27, 2021

Just as a data point, this issue affects real world programs. Recently rav1e has eliminated an implicit copy of a large type and gained speed thanks to it.

Changing &table to table in this statement (and the similar statement in ac_q) eliminated an implicit copy onto the stack, reducing the encoding time by 2.5%.

PR link

@pnkfelix
Copy link
Member

Just wanted to note that we did add a lint that highlights all moves larger than a configured limit #83519

@joshtriplett joshtriplett added the S-tracking-design-concerns Status: There are blocking design concerns. label Mar 16, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-lint Area: Lints (warnings about flaws in source code) such as unused_mut. C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC S-tracking-design-concerns Status: There are blocking design concerns. T-lang Relevant to the language team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests