Handle conflicting autofix instructions #660

charliermarsh · 2022-11-08T14:43:30Z

If we have a line like:

x: Union[int, Option[str]]

Then if the relevant rules and Python version permit it, we'd like to autofix that line to:

x: int | str | None

However, as-is, that requires two overlapping fixes, which we don't support. So Ruff will fix the outer rule (to x: int | Option[str]), then fix the inner rule if you re-run it (x: int | str | None).

We need to implement an autofix strategy that's amenable to multiple "concurrent" fixes.

This will also apply to import sorting: we want to be able to remove unused imports and re-sort imports in a single pass.

The text was updated successfully, but these errors were encountered:

charliermarsh · 2022-11-08T14:47:36Z

I don't know how to implement this yet. I need to do some research. But a few ideas come to mind:

Structure it as a CRDT-like concurrent editing problem. Use CRDTs or similar to enable us to reconcile these kinds of concurrent edits.
Re-write rules to batch these changes. For example, above, maybe we just surface a single error with the autofix for the entire expression, rather than emitting an error for Union and an error for Option. This is pretty limiting, as it wouldn't work for (e.g.) the unused import + re-order import case.
Make the fix API more AST-aware? Enable some kind of composition semantics, such that if we know we're replacing a list of Statements with a list of Statements, then we should allow any edits of the form Statement -> Statement.

ljodal · 2022-11-08T21:53:01Z

Have you worked with LibCST? I like how that handles modification to the ast/cst. Basically you implement a visitor that takes two nodes as input, node and updated_node, and returns a node. The node parameter is the parsed node from source and the updated_node parameter is the output from the previous visitor. So depending on the rule you can decide to consider previous modifications or not completely ignore them.

I also think that would work fine for the example above here. You could have two visitors; one that does Optional[X] -> X | None and a one that does Union[A, B] to A | B. It wouldn’t really matter what order these are run in, the output should always be the same. Either Union[A, Optional[B]] -> A | Optional[B] -> A | B | None or Union[A, Optional[B] -> Union[A, B | None] -> A | B | None

This would probably not fix the remove unused import issue, though that could maybe be considered a separate issue? If I’m not mistaken libcst handles that through a special hook where you can mark imports as unused/potentially unused after a modification has been made. I assume those imports are then removed in a second pass, which could run before the sorting runs

charliermarsh · 2022-11-08T22:09:38Z

Yeah! We use LibCST internally to power some (not all) of our autofix operations. However, I think that LibCST's visitor API is entirely defined on the Python side, so I haven't used it before (we just use the CST parser and the ability to generate code from a modified CST struct)...

I do like what you're describing here. The main challenge, as-is, is that we don't actually create a CST for every module -- we create them eagerly, since CST generation is quite expensive. When we want to autofix something via LibCST, we extract the source code (using the delimiters defined by the AST), pass it to LibCST, generate the modified source, and then treat that source-to-source replacement as the "fix".

In other words, as-implemented, the individual fix operations have access to a CST, but there's no concept of passing the modified code or CST from check to check. (That's not to say we can't use this approach; rather, that it would take work.)

I do think we probably need to do something AST-aware (or CST-aware) in order to solve this problem.

charliermarsh · 2022-11-08T22:12:41Z

In the Union[A, Optional[B]], another way to look at it would be as a series of sequential edits.

So, to get rid of the Union:

"Remove Union[
"Retain A"
"Remove ,"
"Insert |"
"Retain Option[B]"
"Remove ]"

Then, to get rid of the Option, you do a similar thing.

If fixes were defined as those kinds of atomic operations, I believe you could reconcile them using CRDT-like operations. But it may not apply as cleanly to over kinds of collisions.

ljodal · 2022-11-08T22:24:38Z

Ah, then I'd misunderstood how ruff works. I naively assumed it built up a cst of the source code and then traversed that. As a developer that's at least an interface that is great to work with and one benefit is that you can share a lot of code between linting and fixing.

The visitor pattern in libcst is indeed implemented in Python. It's documented here and basically mirrors the visitors from the ast library with extensions for modifying the tree.

charliermarsh · 2022-11-09T00:08:41Z

Yeah, I did a bunch of exploration around using the CST everywhere, but it was ~10x slower which brought execution time into the multiple seconds range.

charliermarsh · 2022-11-09T00:09:37Z

On CRDTs -- what I'm trying to describe is something like this:

use operational_transform::OperationSeq;

pub fn main() {
    let s = "Union[int, Option[str]]";
    let mut a = OperationSeq::default();
    a.delete("Union[".len() as u64);
    a.retain("int".len() as u64);
    a.delete(",".len() as u64);
    a.insert(" |");
    a.retain(" Option[str]".len() as u64);
    a.delete("]".len() as u64);

    let mut b = OperationSeq::default();
    b.retain("Union[int, ".len() as u64);
    b.delete("Option[".len() as u64);
    b.retain("str".len() as u64);
    b.insert(" | None");
    b.delete("]".len() as u64);
    b.retain("]".len() as u64);

    let (_, b_prime) = a.transform(&b).unwrap();
    let ab_prime = a.compose(&b_prime).unwrap();
    let s = ab_prime.apply(s).unwrap();
    println!("{}", s); // int | str | None
}

I'm not convinced that this is a good idea since we could easily start producing invalid Python.

charliermarsh · 2022-11-09T01:49:59Z

The way that pyupgrade handles this is somewhat interesting. They have a roundtrip tokenizer, and they implement all fixes as lazy transformations on the token stream. I don't see how that's completely safe, but it does work for the cases they implement.

ljodal · 2022-11-09T07:48:51Z

Hmm, does it have to be lazy?

My (new) understanding is that ruff builds up an ast of the file being checked, if that's correct it would be great to be able to build both checkers and fixers on top of the ast.

If we can keep both the tokens and the ast in memory and link them, then it should based on my naive understanding be possible to use the ast to detect where changes are needed, get the tokens for that ast and update them, and then based on the updated tokens update the ast representation again.

I'm a bit rusty when it comes to compiler theory, but I assume the ast is built up from the tokens, so we should have everything available. The hard part here would probably be to deal with certain things that don't map direly to ast nodes like white space, commas etc

andersk · 2022-11-20T08:02:31Z

Maybe we just want to apply a non-conflicting subset of the fixes, then repeatedly rerun the entire linter on the file until it converges?

I doubt we’re ever going to be able to guarantee that enough operations commute to make any other strategy work correctly.

charliermarsh · 2022-11-20T15:07:25Z

Yeah, I think that's very likely what we'll have to do.

charliermarsh · 2022-11-22T21:38:18Z

Should be fixed by #875.

charliermarsh added the enhancement label Nov 8, 2022

charliermarsh changed the title ~~Implement conflicting autofixes~~ Handle conflicting autofix instructions Nov 8, 2022

charliermarsh mentioned this issue Nov 11, 2022

Import-sorting (a ruff approximation of isort) #465

Closed

This was referenced Nov 21, 2022

[Bug] unused typing imports are not auto-fixed when pyupgrade option is selected #866

Closed

Upgrade autofix to fix-until-stable #868

Closed

Apply autofixes iteratively until code is stabilized #875

Merged

charliermarsh closed this as completed Nov 22, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Handle conflicting autofix instructions #660

Handle conflicting autofix instructions #660

charliermarsh commented Nov 8, 2022 •

edited

Loading

charliermarsh commented Nov 8, 2022

ljodal commented Nov 8, 2022

charliermarsh commented Nov 8, 2022

charliermarsh commented Nov 8, 2022

ljodal commented Nov 8, 2022

charliermarsh commented Nov 9, 2022

charliermarsh commented Nov 9, 2022

charliermarsh commented Nov 9, 2022

ljodal commented Nov 9, 2022

andersk commented Nov 20, 2022

charliermarsh commented Nov 20, 2022

charliermarsh commented Nov 22, 2022

Handle conflicting autofix instructions #660

Handle conflicting autofix instructions #660

Comments

charliermarsh commented Nov 8, 2022 • edited Loading

charliermarsh commented Nov 8, 2022

ljodal commented Nov 8, 2022

charliermarsh commented Nov 8, 2022

charliermarsh commented Nov 8, 2022

ljodal commented Nov 8, 2022

charliermarsh commented Nov 9, 2022

charliermarsh commented Nov 9, 2022

charliermarsh commented Nov 9, 2022

ljodal commented Nov 9, 2022

andersk commented Nov 20, 2022

charliermarsh commented Nov 20, 2022

charliermarsh commented Nov 22, 2022

charliermarsh commented Nov 8, 2022 •

edited

Loading