Dump results of analysis phase as CSV (DXR support in Rust) #13222
Conversation
```rust
// option. This file may not be copied, modified, or distributed
// except according to those terms.

// Output a CSV file containing the output from rustc's analysis. The data is
```
This could be a doc comment with `//!`.

(Do you happen to have a DXR instance displaying the output of this?)
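For reference, the suggestion is to switch the leading `//` comments to Rust's inner doc comment syntax. A minimal standalone sketch (the function and strings are illustrative, not the actual compiler code):

```rust
//! Output a CSV file containing the output from rustc's analysis.
//!
//! An inner doc comment (`//!`) documents the enclosing item (here, the
//! crate/module), so this text becomes part of the generated documentation;
//! a plain `//` comment does not.

/// Outer doc comments (`///`) document the item that follows them.
fn documented() -> &'static str {
    "doc comments use //! and ///"
}

fn main() {
    assert_eq!(documented(), "doc comments use //! and ///");
    println!("{}", documented());
}
```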
@huonw I'm working on it :-) Need to land the changes to DXR and then set up the instance, hopefully later this week, all being well. I hope I can get it up soon so you can appreciate the joy this patch will bring :-)
It seems unfortunate to bake this support directly in to the compiler itself. A suggestion by @thestinger on IRC was to put this support in a separate tool instead.
I fear I have been selling this badly - please think of this not as DXR support, but just like dumping the JSON representation of the AST, but with semantic info too (and in CSV format, not JSON). There is nothing actually DXR-specific here, and if there is by mistake, then I should change it. DXR is just one way to use this info. That said, I wouldn't mind moving this into a different crate if you think that is a better place for it.
What relationship does this have to the analysis already included in the crate metadata?
I'd love to see this support land somewhere. I don't have a strong opinion about whether it lands in rustc or as an external tool. In principle, it'd be nice if rustc included a nice API for accessing this sort of data in a reliable way -- but that's a long way off. I guess having semantic tools like dxr and rustdoc in tree at least gives us a clear idea of what the requirements are.
@brson - there is some overlap - if I knew then what I know now I would have leveraged some of the metadata code for this. But it is not duplication; in particular, this info contains info about expressions in function bodies, which metadata misses, and it also has more detail about intermediate forms.

@nikomatsakis Just to clarify, most of DXR is out of tree (in the DXR repo); what is here is just a dump of analysis info in CSV format.
This type of thing is fine with me I think as long as it doesn't impose much maintenance burden.
Let's move this forward. It's very self-contained. No big deal.

@nick29581 I agree with Huon about the doc comment. Can you convert it?

With those 'rebase' commits I'm guessing the intermediate commits are broken. The history here looks a bit messy anyway, so can you rewrite it into one or a few commits?
```rust
// Return all non-empty prefixes of a path.
// For each prefix, we return the span for the last segment in the prefix and
// a str representation of the entire prefix.
fn process_path_prefixes(&self, path: &ast::Path) -> ~[(Span, ~str)] {
```
These `~[]` should all be `Vec`.
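For context, `~[T]` was the pre-1.0 owned-vector type that `Vec<T>` replaced. A rough modern sketch of the same prefix-collecting idea using `Vec` (this simplified function works on string segments and is a stand-in, not the actual compiler code, which works on `ast::Path` and spans):

```rust
// For each non-empty prefix of a `::`-separated path, return the index of
// the last segment in the prefix and a string for the entire prefix.
fn process_path_prefixes(segments: &[&str]) -> Vec<(usize, String)> {
    let mut prefixes = Vec::new();
    let mut acc = String::new();
    for (i, seg) in segments.iter().enumerate() {
        if !acc.is_empty() {
            acc.push_str("::");
        }
        acc.push_str(seg);
        prefixes.push((i, acc.clone()));
    }
    prefixes
}

fn main() {
    let p = process_path_prefixes(&["std", "io", "fs"]);
    assert_eq!(p.len(), 3);
    assert_eq!(p[0].1, "std");
    assert_eq!(p[2].1, "std::io::fs");
    println!("{:?}", p);
}
```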
```rust
match fs::mkdir_recursive(&root_path, io::UserRWX) {
    Err(_) => {
        sess.bug(format!("Could not create directory {}",
```
I think this is just an error (not a bug), since it's almost certainly a problem on the user's side. Also, this should print the error too:

```rust
Err(e) => sess.err(format!("Could not create directory {}: {}",
                           root_path.display(), e).as_slice())
```
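In modern std terms (the original used the long-removed `fs::mkdir_recursive`), the suggested error reporting can be sketched like this; `ensure_output_dir` is a hypothetical helper, not compiler code:

```rust
use std::fs;
use std::path::Path;

// Report a failed directory creation as a user-facing error that includes
// the underlying OS error, rather than treating it as an internal bug.
fn ensure_output_dir(root_path: &Path) -> Result<(), String> {
    fs::create_dir_all(root_path)
        .map_err(|e| format!("Could not create directory {}: {}", root_path.display(), e))
}

fn main() {
    // Creating an already-existing directory succeeds.
    let tmp = std::env::temp_dir().join("save_analysis_demo");
    assert!(ensure_output_dir(&tmp).is_ok());
    println!("directory ready: {}", tmp.display());
}
```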
I'm in favour of this, and I think the patch can be landed (modulo a few final nits) and then iteratively improved/extended/refactored if necessary, especially since it's so self-contained. However, it's quite large, so someone else might have another suggestion for how to handle it.

I would be super-interested in seeing a regularly updated version of this hosted somewhere (part of the nightly build? I guess that adds yet another dependency (dxr) to the whole process. :( ). Presumably that would be assisted by targets in the makefile too, e.g.:

```
make source-analysis
make source-analysis-stage2
make source-analysis-stage2-std
make source-analysis-stage2-rustc
# etc.
```

(Seems fine to postpone.)
@huonw the plan is that Mozilla (web eng) will host an index of Rust and Servo, which will be re-indexed daily. It's going to take another few weeks though. Do you also want to be able to download the raw data? We could probably arrange that. Given that indexing is pretty much orthogonal to the make target, is there any advantage to having a make target for this as well as a -Z flag?
Awesome!

(I personally have no particular use for the raw data, but I imagine some people might. shrug)

r=me assuming there's agreement for landing this.
@huonw couldn't you just use
Oh, I guess so... seems a little bit of a hack, but I suppose it works fine.
I don't know what this means:

```
recipe for target `i686-pc-mingw32/stage1/bin/rustlib/i686-pc-mingw32/lib/stamp.rustc' failed
```

There are no errors I can see in the output, so I don't see why this is failing - is this just Windows build stupidity?
Weird. Let's just retry...

(my win32 machine compiled this pull request well)

An error like that normally means that the windows bots segfaulted while building. If you push to try you'll get a similar environment which should have the same results, and I can help you see what the stack trace is at the time of the segfault.

It looks like #14604 has suffered a similar build problem, which I am currently investigating. It looks like it also requires rustc itself to be built without optimizations.

I looked into the failure cause of #14604, and it appears to fail because of OOM. Perhaps this patch is triggering a large number of allocations?

Note that I have un-gated the builder that this PR failed on (#14791), so a rebase will likely allow this to land.
Adds the option -Zsave-analysis which will dump the results of syntax and type checking into CSV files. These can be interpreted by tools such as DXR to provide semantic information about Rust programs for code search, cross-reference, etc. Authored by Nick Cameron and Peter Elmers (@pelmers; including enums, type parameters/generics).
Adds a -Z flag `save-analysis` which runs after the analysis phase of the compiler and saves a bunch of info into a CSV file for the crate. This is designed to work with the DXR code browser, but is frontend-independent; that is, this info should be useful for all kinds of code browsers, IDEs, or other tools. I need to squash commits before landing (there will probably be a fair few to come); please ignore them for now and just comment on the changes.
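As a rough illustration of the idea (the actual `-Zsave-analysis` CSV schema is defined inside the compiler and is not shown here), dumping analysis records as CSV rows could be sketched like this; `csv_row` and the field names are hypothetical:

```rust
// Emit one CSV row per analysis record: a record kind followed by
// name/value field pairs. Values are quoted (with embedded quotes doubled)
// so that commas inside qualified names or file paths don't break parsing.
fn csv_row(kind: &str, fields: &[(&str, &str)]) -> String {
    let mut row = String::from(kind);
    for (name, value) in fields {
        row.push_str(&format!(",{},\"{}\"", name, value.replace('"', "\"\"")));
    }
    row
}

fn main() {
    let row = csv_row("function", &[("qualname", "krate::foo"), ("file", "src/lib.rs")]);
    assert_eq!(row, "function,qualname,\"krate::foo\",file,\"src/lib.rs\"");
    println!("{}", row);
}
```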
\o/ impressive work!
Remove redundant `resolve_obligations_as_possible` call

Hi! I was looking for a "good first issue" and saw this one: rust-lang/rust-analyzer#7542. I like searching for performance improvements, so I wanted to try to find something useful there.

There are two tests in integrated_benchmarks.rs; I looked at `integrated_highlighting_benchmark` (not the one discussed in the issue above). The profile from that test looks like this:

```
$ RUN_SLOW_BENCHES=1 cargo test --release --package rust-analyzer --lib -- integrated_benchmarks::integrated_highlighting_benchmark --exact --nocapture
    Finished release [optimized] target(s) in 0.06s
     Running unittests src/lib.rs (target/release/deps/rust_analyzer-a80ca6bb8f877458)

running 1 test
workspace loading: 358.45ms
initial: 9.60s
change: 13.96µs
cpu profiling is disabled, uncomment `default = [ "cpu_profiler" ]` in Cargo.toml to enable.
  273ms - highlight
      143ms - infer:wait @ per_query_memory_usage
          143ms - infer_query
              0    - crate_def_map:wait (3165 calls)
              4ms  - deref_by_trait (967 calls)
              96ms - resolve_obligations_as_possible (22106 calls)
              0    - trait_solve::wait (2068 calls)
      21ms - Semantics::analyze_impl (18 calls)
      0    - SourceBinder::to_module_def (20 calls)
      36ms - classify_name (19 calls)
      19ms - classify_name_ref (308 calls)
      0    - crate_def_map:wait (461 calls)
      4ms  - descend_into_macros (628 calls)
      0    - generic_params_query (4 calls)
      0    - impl_data_with_diagnostics_query (1 calls)
      45ms - infer:wait (37 calls)
      0    - resolve_obligations_as_possible (2 calls)
      0    - source_file_to_def (1 calls)
      0    - trait_solve::wait (42 calls)
after change: 275.23ms
test integrated_benchmarks::integrated_highlighting_benchmark ... ok
```

22106 calls to `resolve_obligations_as_possible` seem like the main issue there. One thing I noticed (and fixed in this PR) is that `InferenceContext::resolve_ty_shallow` first calls `resolve_obligations_as_possible`, and then calls `InferenceTable::resolve_ty_shallow`.
But `InferenceTable::resolve_ty_shallow` [inside](https://github.com/rust-lang/rust-analyzer/blob/2e9f1204ca01c3e20898d4a67c8b84899d394a88/crates/hir-ty/src/infer/unify.rs#L372) again calls `resolve_obligations_as_possible`. `resolve_obligations_as_possible` contains a while loop, which runs until it can't find any new helpful information, so calling the function a second time does nothing and one of the calls can safely be removed.

`InferenceContext::resolve_ty_shallow` is actually quite a hot place, and after fixing it, the total number of `resolve_obligations_as_possible` calls in this test is reduced to 15516 (from 22106). The "after change" time also improves from ~270ms to ~240ms, which is not a huge win, but still something measurable.

Same profile after the PR:

```
$ RUN_SLOW_BENCHES=1 cargo test --release --package rust-analyzer --lib -- integrated_benchmarks::integrated_highlighting_benchmark --exact --nocapture
    Finished release [optimized] target(s) in 0.06s
     Running unittests src/lib.rs (target/release/deps/rust_analyzer-a80ca6bb8f877458)

running 1 test
workspace loading: 339.86ms
initial: 9.28s
change: 10.69µs
cpu profiling is disabled, uncomment `default = [ "cpu_profiler" ]` in Cargo.toml to enable.
  236ms - highlight
      110ms - infer:wait @ per_query_memory_usage
          110ms - infer_query
              0    - crate_def_map:wait (3165 calls)
              4ms  - deref_by_trait (967 calls)
              64ms - resolve_obligations_as_possible (15516 calls)
              0    - trait_solve::wait (2068 calls)
      21ms - Semantics::analyze_impl (18 calls)
      0    - SourceBinder::to_module_def (20 calls)
      34ms - classify_name (19 calls)
      18ms - classify_name_ref (308 calls)
      0    - crate_def_map:wait (461 calls)
      3ms  - descend_into_macros (628 calls)
      0    - generic_params_query (4 calls)
      0    - impl_data_with_diagnostics_query (1 calls)
      45ms - infer:wait (37 calls)
      0    - resolve_obligations_as_possible (2 calls)
      0    - source_file_to_def (1 calls)
      0    - trait_solve::wait (42 calls)
after change: 238.15ms
test integrated_benchmarks::integrated_highlighting_benchmark ... ok
```

The performance of this test could be further improved, but at the cost of making the code more complicated, so I wanted to check if such a change is desirable before sending another PR. `resolve_obligations_as_possible` is actually called a lot of times even when no new information was provided. As I understand it, `resolve_obligations_as_possible` can only do something useful if some variables/values were unified since the last check. We can store a boolean variable inside `InferenceTable` which indicates whether `try_unify` was called after the last `resolve_obligations_as_possible`. If it wasn't called, we can safely skip calling `resolve_obligations_as_possible` again. I tested this change locally, and it reduces the number of `resolve_obligations_as_possible` calls to several thousand (it is not shown in the profile anymore, so I don't know the exact number), and the total time is reduced to ~180ms.
Here is a generated profile:

```
$ RUN_SLOW_BENCHES=1 cargo test --release --package rust-analyzer --lib -- integrated_benchmarks::integrated_highlighting_benchmark --exact --nocapture
    Finished release [optimized] target(s) in 0.06s
     Running unittests src/lib.rs (target/release/deps/rust_analyzer-a80ca6bb8f877458)

running 1 test
workspace loading: 349.92ms
initial: 8.56s
change: 11.32µs
cpu profiling is disabled, uncomment `default = [ "cpu_profiler" ]` in Cargo.toml to enable.
  175ms - highlight
      21ms - Semantics::analyze_impl (18 calls)
      0    - SourceBinder::to_module_def (20 calls)
      33ms - classify_name (19 calls)
      17ms - classify_name_ref (308 calls)
      0    - crate_def_map:wait (461 calls)
      3ms  - descend_into_macros (628 calls)
      0    - generic_params_query (4 calls)
      0    - impl_data_with_diagnostics_query (1 calls)
      97ms - infer:wait (38 calls)
      0    - resolve_obligations_as_possible (2 calls)
      0    - source_file_to_def (1 calls)
      0    - trait_solve::wait (42 calls)
after change: 177.04ms
test integrated_benchmarks::integrated_highlighting_benchmark ... ok
```

Let me know if adding a new bool field seems like a reasonable tradeoff, so I can send a PR.
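The proposed dirty-flag idea can be sketched as follows; the struct and method names mimic rust-analyzer's, but this is an illustrative model (with a call counter standing in for the expensive fixpoint loop), not the real implementation:

```rust
// Track whether any unification happened since the last obligation-resolution
// pass, and skip the pass when nothing has changed.
struct InferenceTable {
    unify_since_last_resolve: bool,
    resolve_calls: usize, // counts the expensive passes actually performed
}

impl InferenceTable {
    fn new() -> Self {
        InferenceTable { unify_since_last_resolve: true, resolve_calls: 0 }
    }

    fn try_unify(&mut self) {
        // Real unification would happen here; record that state changed.
        self.unify_since_last_resolve = true;
    }

    fn resolve_obligations_as_possible(&mut self) {
        if !self.unify_since_last_resolve {
            return; // nothing new to learn, skip the fixpoint loop
        }
        self.resolve_calls += 1;
        self.unify_since_last_resolve = false;
    }
}

fn main() {
    let mut table = InferenceTable::new();
    table.resolve_obligations_as_possible(); // does work
    table.resolve_obligations_as_possible(); // skipped: no unification since
    table.try_unify();
    table.resolve_obligations_as_possible(); // does work again
    assert_eq!(table.resolve_calls, 2);
    println!("expensive passes: {}", table.resolve_calls);
}
```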
Use a deterministic number of digits in rustc_tools_util commit hashes

Using `git rev-parse --short` in rustc_tools_util causes nondeterministic compilation of projects that use `setup_version_info!` and `get_version_info!` when built from the exact same source code and git commit. The number of digits printed by `--short` is sensitive to how many other branches and tags in the repository have been fetched so far, what other commits have been worked on in other branches, how recently you had run `git gc`, platform-specific variation in git's default configuration, and platform differences in the sequence of steps performed by the release pipeline. Someone can compile a tool from a particular commit, switch branches to work on a different commit (or simply do a `git fetch`), go back to the first commit, and be unable to reproduce the binary that was built from it previously.

Currently, variation in short commit hashes causes Clippy version strings to be out of sync between different targets. On x86_64-unknown-linux-gnu:

```console
$ clippy-driver +1.80.0 --version
clippy 0.1.80 (0514789 2024-07-21)
```

Whereas on aarch64-apple-darwin:

```console
$ clippy-driver +1.80.0 --version
clippy 0.1.80 (0514789 2024-07-21)
```

---

changelog: none
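One deterministic alternative is to take a fixed number of digits from the full 40-character hash instead of relying on `--short`, whose output length depends on repository state. A minimal sketch (the hash value here is made up, and `short_hash` is a hypothetical helper):

```rust
// Truncate a full commit hash to a fixed number of hex digits. Unlike
// `git rev-parse --short`, the result depends only on the hash itself,
// never on branches, tags, or gc state of the local repository.
fn short_hash(full: &str, digits: usize) -> &str {
    &full[..digits.min(full.len())]
}

fn main() {
    let full = "0514789205147892051478920514789205147892"; // illustrative hash
    assert_eq!(short_hash(full, 7), "0514789");
    assert_eq!(short_hash(full, 10), "0514789205");
    println!("{}", short_hash(full, 7));
}
```

The trade-off is that a fixed-length prefix is not guaranteed to be unambiguous in every repository, but for a version string that is usually acceptable.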