-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Handle conflicting autofix instructions #660
Comments
I don't know how to implement this yet. I need to do some research. But a few ideas come to mind:
|
Have you worked with LibCST? I like how that handles modification to the ast/cst. Basically you implement a visitor that takes two nodes as input, I also think that would work fine for the example above here. You could have two visitors; one that does This would probably not fix the remove unused import issue, though that could maybe be considered a separate issue? If I’m not mistaken libcst handles that through a special hook where you can mark imports as unused/potentially unused after a modification has been made. I assume those imports are then removed in a second pass, which could run before the sorting runs |
Yeah! We use LibCST internally to power some (not all) of our autofix operations. However, I think that LibCST's visitor API is entirely defined on the Python side, so I haven't used it before (we just use the CST parser and the ability to generate code from a modified CST struct)... I do like what you're describing here. The main challenge, as-is, is that we don't actually create a CST for every module -- we create them eagerly, since CST generation is quite expensive. When we want to autofix something via LibCST, we extract the source code (using the delimiters defined by the AST), pass it to LibCST, generate the modified source, and then treat that source-to-source replacement as the "fix". In other words, as-implemented, the individual fix operations have access to a CST, but there's no concept of passing the modified code or CST from check to check. (That's not to say we can't use this approach; rather, that it would take work.) I do think we probably need to do something AST-aware (or CST-aware) in order to solve this problem. |
In the So, to get rid of the
Then, to get rid of the If fixes were defined as those kinds of atomic operations, I believe you could reconcile them using CRDT-like operations. But it may not apply as cleanly to over kinds of collisions. |
Ah, then I'd misunderstood how ruff works. I naively assumed it built up a cst of the source code and then traversed that. As a developer that's at least an interface that is great to work with and one benefit is that you can share a lot of code between linting and fixing. The visitor pattern in libcst is indeed implemented in Python. It's documented here and basically mirrors the visitors from the |
Yeah, I did a bunch of exploration around using the CST everywhere, but it was ~10x slower which brought execution time into the multiple seconds range. |
On CRDTs -- what I'm trying to describe is something like this: use operational_transform::OperationSeq;
pub fn main() {
let s = "Union[int, Option[str]]";
let mut a = OperationSeq::default();
a.delete("Union[".len() as u64);
a.retain("int".len() as u64);
a.delete(",".len() as u64);
a.insert(" |");
a.retain(" Option[str]".len() as u64);
a.delete("]".len() as u64);
let mut b = OperationSeq::default();
b.retain("Union[int, ".len() as u64);
b.delete("Option[".len() as u64);
b.retain("str".len() as u64);
b.insert(" | None");
b.delete("]".len() as u64);
b.retain("]".len() as u64);
let (_, b_prime) = a.transform(&b).unwrap();
let ab_prime = a.compose(&b_prime).unwrap();
let s = ab_prime.apply(s).unwrap();
println!("{}", s); // int | str | None
} I'm not convinced that this is a good idea since we could easily start producing invalid Python. |
The way that |
Hmm, does it have to be lazy? My (new) understanding is that ruff builds up an ast of the file being checked, if that's correct it would be great to be able to build both checkers and fixers on top of the ast. If we can keep both the tokens and the ast in memory and link them, then it should based on my naive understanding be possible to use the ast to detect where changes are needed, get the tokens for that ast and update them, and then based on the updated tokens update the ast representation again. I'm a bit rusty when it comes to compiler theory, but I assume the ast is built up from the tokens, so we should have everything available. The hard part here would probably be to deal with certain things that don't map direly to ast nodes like white space, commas etc |
Maybe we just want to apply a non-conflicting subset of the fixes, then repeatedly rerun the entire linter on the file until it converges? I doubt we’re ever going to be able to guarantee that enough operations commute to make any other strategy work correctly. |
Yeah, I think that's very likely what we'll have to do. |
Should be fixed by #875. |
If we have a line like:
Then if the relevant rules and Python version permit it, we'd like to autofix that line to:
However, as-is, that requires two overlapping fixes, which we don't support. So Ruff will fix the outer rule (to
x: int | Option[str]
), then fix the inner rule if you re-run it (x: int | str | None
).We need to implement an autofix strategy that's amenable to multiple "concurrent" fixes.
This will also apply to import sorting: we want to be able to remove unused imports and re-sort imports in a single pass.
The text was updated successfully, but these errors were encountered: