Lint for undesirable, implicit copies #45683

nikomatsakis · 2017-11-01T13:30:57Z

As part of #44619, one topic that keeps coming up is that we have to find some way to mitigate the risk of large, implicit copies. Indeed, this risk exists today even without any changes to the language:

let x = [0; 1024 * 10];
let y = x; // maybe you meant to copy 10K words, but maybe you didn't.

In addition to performance hazards, implicit copies can have surprising semantics. For example, there are several iterator types that would implement Copy, but we were afraid that people would be surprised. Another, clearer example is a type like Cell<i32>, which could certainly be copy, but for this interaction:

let x = Cell::new(22);
let y = x;
x.set(23);
println!("{}", y.get()); // prints 22

For a time before 1.0, we briefly considered introducing a new Pod trait that acted like Copy (memcpy is safe) but without the implicit copy semantics. At the time, @eddyb argued persuasively against this, basically saying (and rightly so) that this is more of a linting concern than anything else -- the implicit copies in the example above, after all, don't lead to any sort of unsoundness, they just may not be the semantics you expected.

Since then, a number of use cases have arisen where having some kind of warning against implicit, unexpected copies would be useful:

Iterators implementing Copy
Copy/clone closures (closures that are copy can be surprising just as iterators can be)
Cell, ~~RefCell, and other types with interior mutability implementing Copy~~
Behavior of #[derive(PartiallEq)] and friends with packed structs
A coercion from &T to T (i.e., it'd be nice to be able to do foo(x) where foo: fn(u32) and x: &u32)

I will writeup a more specific proposal in the thread below.

The text was updated successfully, but these errors were encountered:

nikomatsakis · 2017-11-01T13:39:51Z

I think the lint would work roughly like this. We define a set of types that should not be implicitly copied. This would include any type whose size is >N bytes (where N is some arbitrary number, perhaps tunable via an attribute on the crate), but also any struct that is tagged with #[must_clone]. The name of #[must_clone] is probably bad; it is meant to be reminiscent of #[must_use] -- basically it's saying that, to copy this type, one ought to clone it (e.g., let y = x.clone() rather than let y = x), so that the semantics are clear to the reader.

We would then comb code for implicit copies. This is (somewhat) easy to define at the MIR level: any use of a user-defined variable (not a temporary) that is of a type implementing Copy. Here are some examples where this would trigger. Let's assume x is a variable of "must clone" type:

let y = x; 
foo(x);
let y = [x; 22]; // at least I think this would count =)

One interesting case that would, by this definition, be linted against is this:

let z = move || use(x); // copies `x` into the closure body

This one is interesting because, unlike the others, there is no way to readily silence the lint, except by adding an #[allow(implicit_copy)]. That is usually a bad sign -- we may not want to lint on this case. One can argue, certainly, that the move is sufficient to warn the reader. Not sure if I buy that. =)

It is also interesting that the use cases here perhaps breakdown into two categories:

Big things that may not want to be copied at all.
Things where it's ok to copy as a last-use (e.g., Cell<i32>, iterators, closures).

In fact, I would argue that the latter is the majority case, so maybe we just want to separate this into two lints, or else say that structs with the #[must_clone] tag only warn if -- after the implicit copy -- there is a subsequent use. This would actually solve the move || use(x) problem as well, since one could rewrite the closure expression to:

let z = {
    let x0 = x.clone();
    move || use(x0) // this is a last-use of `x0`, hence no warning
};
use(x); // uses original `x`

eddyb · 2017-11-01T21:28:32Z

I would prefer it if this was more specific, e.g. for loops warning if the iteratee is used afterwards.

arielb1 · 2017-11-02T12:53:07Z

@nikomatsakis

You don't want to warn on any use - that would warn on moves too. You want to warn on uses where the variable is live after the use.

nikomatsakis · 2017-11-08T18:20:27Z

@arielb1 indeed.

nikomatsakis · 2017-11-08T18:21:13Z

@eddyb do you have a specific example of something you would want to warn, and something you would not want?

eddyb · 2017-11-08T20:03:01Z

@nikomatsakis Copying a Range around shouldn't warn (just like any other pair). Using a Copy variable implementing IntoIterator/Iterator (e.g. a Range) after iterating on it with a for loop should warn, as previously discussed.

nikomatsakis · 2018-01-23T21:55:09Z

@eddyb it seems like we could generalize then to a lint that works like this:

Types can be marked as #[must_clone]:
- Examples: Cell
- Warning arises if some Place X (of such a type) is copied and then used again later.
Methods can be marked as #[must_clone]:
- Example: IntoIterator::into_iter
- Warning arises if the method is called on some Place X and then X is used again later.

Maybe a better name than #[must_clone], especially for the second case.

nikomatsakis · 2018-01-23T22:50:59Z

Anyway, regardless of the nitty gritty details, let me leave a few pointers here for how we might implement this. @cramertj is hopefully gonna take a look.

First off, let's look a bit at MIR. MIR definition is here.. A crucial concept I referenced above is a Place, which represents a path:

rust/src/librustc/mir/mod.rs

Lines 1231 to 1243 in 4e3901d

    
           /// A path to a value; something that can be evaluated without 
        
           /// changing or disturbing program state. 
        
           #[derive(Clone, PartialEq, Eq, Hash, RustcEncodable, RustcDecodable)] 
        
           pub enum Place<'tcx> { 
        
               /// local variable 
        
               Local(Local), 
        
               /// static or static mut variable 
        
               Static(Box<Static<'tcx>>), 
        
               /// projection out of a place (access a field, deref a pointer, etc) 
        
               Projection(Box<PlaceProjection<'tcx>>), 
        
           }

The arguments to MIR statements are always operands, which explicitly identify a copy:

rust/src/librustc/mir/mod.rs

Lines 1399 to 1403 in 4e3901d

    
           /// Copy: The value must be available for use afterwards. 
        
           /// 
        
           /// This implies that the type of the place must be `Copy`; this is true 
        
           /// by construction during build, but also checked by the MIR type checker. 
        
           Copy(Place<'tcx>),

I think right now we will produce a Copy operand whenever the input is of copy type, and a Move operand whenever the input may not be of Copy type. It is an error to use a value after it is moved. So our analysis would look for Copy operands.

One way to do that is with the MIR visitor, which lets you quickly walk the IR. In particular we'd probably want to override the visit_operand method:

rust/src/librustc/mir/visit.rs

Lines 142 to 146 in 4e3901d

    
           fn visit_operand(&mut self, 
        
                            operand: & $($mutability)* Operand<'tcx>, 
        
                            location: Location) { 
        
               self.super_operand(operand, location); 
        
           }

There is code for computing liveness -- basically, whether a variable may be used again. That code lives in liveness.rs. You can use liveness_of_locals to compute what is live and where:

rust/src/librustc_mir/util/liveness.rs

Lines 117 to 120 in 4e3901d

    
           /// Compute which local variables are live within the given function 
        
           /// `mir`. The liveness mode `mode` determines what sorts of uses are 
        
           /// considered to make a variable live (e.g., do drops count?). 
        
           pub fn liveness_of_locals<'tcx>(mir: &Mir<'tcx>, mode: LivenessMode) -> LivenessResult {

but that only gives you values at basic block boundaries. You will want simulate_block to walk through and find out what is live at any particular point:

rust/src/librustc_mir/util/liveness.rs

Lines 161 to 167 in 4e3901d

    
           /// Walks backwards through the statements/terminator in the given 
        
           /// basic block `block`.  At each point within `block`, invokes 
        
           /// the callback `op` with the current location and the set of 
        
           /// variables that are live on entry to that location. 
        
           pub fn simulate_block<'tcx, OP>(&self, mir: &Mir<'tcx>, block: BasicBlock, mut callback: OP) 
        
           where 
        
               OP: FnMut(Location, &LocalSet),

So I guess that roughly the idae would be to identify copies where the value's type is marked to be linted. We would then compute the liveness at the point of copy and see if the Place being moved may be referenced again.

Some annoying challenges:

liveness at present is computed only for local variables;
- so if somebody moves from a.b and then uses a.c, we can't really tell that
- the move related code from dataflow is better here I guess, but I don't think we currently have an analysis for all uses, only for moves
- still, woulnd't be hard to adapt, right? Maybe there's a reason it would be? I'm not sure why we didn't use dataflow in the first place here.

Still, this is a lint after all, it doesn't have to be totally precise. I think that for the first version of the code I would only worry about local variables and ignore more complex Place values. That seems like a simple thing to fix up later.

nikomatsakis · 2018-02-16T00:35:10Z

@qmx may take a look at this, since @cramertj never had time

cramertj · 2018-02-16T00:52:57Z

@qmx I started https://github.com/cramertj/rust/tree/copy-lint if you want to take a look. Not sure how useful it is-- I only got the skeleton.

RalfJung · 2018-03-28T16:28:02Z

RefCell cannot be Copy. The problem is that if you have &RefCell<T>, you could end up copying it while another thread has a mutable reference to the content of the RefCell. That'd be unsound.

Same for Mutex, RwLock.

EDIT: As @rkruppe pointed out, "thread" doesn't make sense here. That doesn't change the problem though, see below.

hanna-kruppe · 2018-03-28T16:31:36Z

RefCell is not Sync so how could you have a &RefCell<T> in one thread while another thread can access the referent?

RalfJung · 2018-03-28T19:44:21Z

This can all happen in one thread.

RalfJung · 2018-03-28T19:45:51Z

let r = RefCell::new(42);
let r1 = &r;
let r2 = &r;
let _mutref = r1.borrow_mut();
let illegal_copy = *r2; // copying while there is an outstanding mutable reference

hanna-kruppe · 2018-03-28T19:46:42Z

Oh yeah you're right. The mention of another thread really mislead me there.

RalfJung · 2018-03-28T19:47:13Z

Sorry for that, not sure what I thought. "Another function" is really what I meant (though it can be the same, as in my counterexample).

nikomatsakis · 2018-04-03T20:59:49Z

@RalfJung in retrospect, the RefCell problem is obvious -- basically, of course copying is bad, for the same reason that borrow() isn't infallible. Silly me.

qmx · 2018-04-20T20:20:55Z

Just found out about rust-lang/rfcs#2407 - that would be a nice suggestion for the linter in the future, wouldn't it?

est31 · 2021-02-27T18:22:33Z

Just as a data point, this issue affects real world programs. Recently rav1e has eliminated an implicit copy of a large type and gained speed thanks to it.

Changing &table to table in this statement (and the similar statement in ac_q) eliminated an implicit copy onto the stack, reducing the encoding time by 2.5%.

PR link

pnkfelix · 2021-10-26T15:35:05Z

Just wanted to note that we did add a lint that highlights all moves larger than a configured limit #83519

nikomatsakis added E-needs-mentor T-lang Relevant to the language team, which will review and decide on the PR/issue. labels Nov 1, 2017

aturon mentioned this issue Nov 1, 2017

Tracking issue for experiments around coercions, generics, and Copy type ergonomics #44619

Closed

nikomatsakis mentioned this issue Nov 1, 2017

What should we do about #[derive(...)] on #[repr(packed)] structs? #39696

Closed

nikomatsakis mentioned this issue Jan 10, 2018

Add missing Copy/Clone impls for Iterators #47304

Closed

cramertj self-assigned this Jan 23, 2018

jkordish added A-lint Area: Lints (warnings about flaws in source code) such as unused_mut. C-future-incompatibility Category: Future-incompatibility lints labels Jan 25, 2018

nikomatsakis assigned qmx and unassigned cramertj Feb 16, 2018

cramertj mentioned this issue Feb 22, 2018

Tracking issue for RFC 2132: copy_closures/clone_closures #44490

Closed

3 tasks

nikomatsakis mentioned this issue Mar 27, 2018

Autoreferencing copy types #44763

Closed

Centril added C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC and removed C-future-incompatibility Category: Future-incompatibility lints labels Apr 1, 2019

kennytm mentioned this issue May 16, 2019

Warn on large structure move or large return structure tikv/tikv#4702

Open

jackh726 removed the WG-compiler-middle label Jan 29, 2022

joshtriplett added the S-tracking-design-concerns Status: There are blocking design concerns. label Mar 16, 2022

gadmm mentioned this issue Jan 1, 2023

Unboxed types (version 2) ocaml/RFCs#34

Open

jyn514 removed the E-needs-mentor label Jun 30, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Lint for undesirable, implicit copies #45683

Lint for undesirable, implicit copies #45683

nikomatsakis commented Nov 1, 2017 •

edited by RalfJung

Loading

nikomatsakis commented Nov 1, 2017

eddyb commented Nov 1, 2017

arielb1 commented Nov 2, 2017

nikomatsakis commented Nov 8, 2017

nikomatsakis commented Nov 8, 2017

eddyb commented Nov 8, 2017

nikomatsakis commented Jan 23, 2018 •

edited

Loading

nikomatsakis commented Jan 23, 2018

nikomatsakis commented Feb 16, 2018

cramertj commented Feb 16, 2018

RalfJung commented Mar 28, 2018 •

edited

Loading

hanna-kruppe commented Mar 28, 2018

RalfJung commented Mar 28, 2018

RalfJung commented Mar 28, 2018

hanna-kruppe commented Mar 28, 2018

RalfJung commented Mar 28, 2018

nikomatsakis commented Apr 3, 2018

qmx commented Apr 20, 2018

est31 commented Feb 27, 2021

pnkfelix commented Oct 26, 2021

Lint for undesirable, implicit copies #45683

Lint for undesirable, implicit copies #45683

Comments

nikomatsakis commented Nov 1, 2017 • edited by RalfJung Loading

nikomatsakis commented Nov 1, 2017

eddyb commented Nov 1, 2017

arielb1 commented Nov 2, 2017

nikomatsakis commented Nov 8, 2017

nikomatsakis commented Nov 8, 2017

eddyb commented Nov 8, 2017

nikomatsakis commented Jan 23, 2018 • edited Loading

nikomatsakis commented Jan 23, 2018

nikomatsakis commented Feb 16, 2018

cramertj commented Feb 16, 2018

RalfJung commented Mar 28, 2018 • edited Loading

hanna-kruppe commented Mar 28, 2018

RalfJung commented Mar 28, 2018

RalfJung commented Mar 28, 2018

hanna-kruppe commented Mar 28, 2018

RalfJung commented Mar 28, 2018

nikomatsakis commented Apr 3, 2018

qmx commented Apr 20, 2018

est31 commented Feb 27, 2021

pnkfelix commented Oct 26, 2021

nikomatsakis commented Nov 1, 2017 •

edited by RalfJung

Loading

nikomatsakis commented Jan 23, 2018 •

edited

Loading

RalfJung commented Mar 28, 2018 •

edited

Loading