Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

debuginfo: Generate cross-crate unique type identifiers for debuginfo types #14819

Closed

Conversation

michaelwoerister
Copy link
Member

With this change, rustc creates a unique type identifier for types in debuginfo. These type identifiers are used by LLVM to correctly handle link-time-optimization scenarios but also help rustc with dealing with inlining from other crates. For more information, see the documentation block at the top of librustc/middle/trans/debuginfo.rs and also my blog post about the topic. This should fix the LTO issues that have been popping up lately.

The changes to the debuginfo module's inner workings are also improved by this. Metadata uniquing of pointer types is not handled explicitly instead of relying on LLVM doing the right thing behind the scenes, and region parameters on types should not lead to metadata duplication anymore.

There are two things that I'd like to get some feedback on:

  1. IDs for named items consist of two parts: The Strict Version Hash of their defining crate and the AST node id of their definition within that crate. My question is: Is the SVH a good choice for identifying the crate? Is it even going to stay? The crate-id RFC got me confused.
  2. Unique Type Identifiers can be arbitrary strings and right now the format is rather verbose. For debugging this is nice, because one can infer a lot about a type from the type id alone (it's more or less a signature). For deeply nested generics, id strings could get rather long though. One option to limit the id size would be to use some hashcode instead of the full id (anything that avoids collision as much as possible). Another option would be to use a more compact representation, like ty_encode. This reduces size but also readability.
    Since these ID's only show up in LLVM IR, I'm inclined to just leave in the verbose format for now, and only act if sizes of rlibs become a problem.

@huonw
Copy link
Member

huonw commented Jun 11, 2014

  1. IDs for named items consist of two parts: The Strict Version Hash of their defining crate and the AST node id of their definition within that crate. My question is: Is the SVH a good choice for identifying the crate? Is it even going to stay? The crate-id RFC got me confused.

AIUI SVH is just designed as a internal compiler thing for identifying when two crates are different (e.g. detect when they've been compiled with an different, incompatible version of rustc). @alexcrichton would know more.

@pnkfelix
Copy link
Member

If you read #10207 you can see the original motivation for SVH spelled out pretty clearly.

I'm not sure it is the right choice for what @michaelwoerister wants here. But then again, I am hard-pressed to think of a case where it would be wrong to use the SVH here. (After all, you shouldn't be trying to mix debug-info from a different compilation of a given crate, right?)

@michaelwoerister
Copy link
Member Author

Thanks for your comments, @huonw and @pnkfelix!
The SVH-approach should always provide something that works, but it might disambiguate types a bit too much.
Take the following case as example: I have two versions of a crate that only slightly differ: A' and A''. And I have one dependency that links to A' and another one that links to A'':

A' <--- D1 <--+
                \
                 +-- MyExe
                /
A'' <-- D2 <--+

Although 99% of the types in A' and A'' might be exactly the same, they will all be distinct as far as debuginfo is concerned, leading to a lot of duplication. Apart from the wasted space, the debuginfo should be fine though. And the same kind of duplication should happen for everything else in the two crates, as rustc won't check if A'::TypeX is the same as A''::TypeX, right?

@alexcrichton
Copy link
Member

Is the SVH a good choice for identifying the crate? Is it even going to stay?

Yes, as @pnkfelix said this seems like the right thing to use. It is also not going away (not affected by crate ids)

I'm inclined to just leave in the verbose format for now, and only act if sizes of rlibs become a problem.

Sounds good to me!

@michaelwoerister
Copy link
Member Author

Good to know. Thanks, everyone!

… types.

With this change, rustc creates a unique type identifier for types in debuginfo. These type identifiers are used by LLVM to correctly handle link-time-optimization scenarios but also help rustc with dealing with inlining from other crates. For more information, see the documentation block at the top of librustc/middle/trans/debuginfo.rs.

Fixes rust-lang#13681.
@michaelwoerister
Copy link
Member Author

Rebased and added a fix for #14385.

bors added a commit that referenced this pull request Jun 13, 2014
…ichton

With this change, rustc creates a unique type identifier for types in debuginfo. These type identifiers are used by LLVM to correctly handle link-time-optimization scenarios but also help rustc with dealing with inlining from other crates. For more information, see the documentation block at the top of librustc/middle/trans/debuginfo.rs and also [my blog post about the topic](http://michaelwoerister.github.io/2014/06/05/rust-debuginfo-and-unique-type-identifiers.html). This should fix the LTO issues that have been popping up lately. 

The changes to the debuginfo module's inner workings are also improved by this. Metadata uniquing of pointer types is not handled explicitly instead of relying on LLVM doing the right thing behind the scenes, and region parameters on types should not lead to metadata duplication anymore.

There are two things that I'd like to get some feedback on:
1. IDs for named items consist of two parts: The [Strict Version Hash](https://github.com/mozilla/rust/blob/0.10/src/librustc/back/svh.rs#L11) of their defining crate and the AST node id of their definition within that crate. My question is: Is the SVH a good choice for identifying the crate? Is it even going to stay? The [crate-id RFC](rust-lang/rfcs#109) got me confused.
2. Unique Type Identifiers can be arbitrary strings and right now the format is rather verbose. For debugging this is nice, because one can infer a lot about a type from the type id alone (it's more or less a signature). For deeply nested generics, id strings could get rather long though. One option to limit the id size would be to use some hashcode instead of the full id (anything that avoids collision as much as possible). Another option would be to use a more compact representation, like ty_encode. This reduces size but also readability.
Since these ID's only show up in LLVM IR, I'm inclined to just leave in the verbose format for now, and only act if sizes of rlibs become a problem.
@bors bors closed this Jun 13, 2014
@nrc
Copy link
Member

nrc commented Jun 13, 2014

After pulling master, I am getting "error: internal compiler error: Type metadata for ty::t '&[core::fmt::rt::Piece<>]' is already in the TypeMap!", I think due to this. Is it something I can resolve locally or a bug?

@michaelwoerister
Copy link
Member Author

@nick29581 That seems to be a error on my side. I'll look into it.

@michaelwoerister
Copy link
Member Author

@nick29581 Yeah, it seems that this is an error in the TypeMap consistency check. A quick hack to get rid of the error is to just comment out the two Session::bug() calls at https://github.com/mozilla/rust/blob/master/src/librustc/middle/trans/debuginfo.rs#L239 and https://github.com/mozilla/rust/blob/master/src/librustc/middle/trans/debuginfo.rs#L252. A proper fix will take a bit longer and I won't have time for it before next Wednesday. I'll open an issue so I don't forget. Sorry for the inconvenience!

bors added a commit to rust-lang-ci/rust that referenced this pull request Jun 5, 2023
fix(analysis-stats): divided by zero error

## What does this PR try to resolve?

2023-05-15 rust-analyzer suffers from

```
 thread 'main' panicked at 'attempt to divide by zero', crates/rust-analyzer/src/cli/analysis_stats.rs:230:56
```

This commit <rust-lang/rust-analyzer@51e8b8f> might be the culprit.

This PR uses `percentage` function to avoid the classic “division by zero” bug.

## Reproducer

```console
cargo new ra-test
pushd ra-test
echo "pub type Foo = u32;" >> src/lib.rs
rust-analyzer analysis-stats .
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants