Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rustdoc-search: add support for traits and associated types #116085

Merged
merged 2 commits into from
Nov 21, 2023

Conversation

notriddle
Copy link
Contributor

@notriddle notriddle commented Sep 23, 2023

Summary

Trait associated type queries work in rustdoc's type driven search. The data is included in the search-index.js file, and the queries are designed to "do what I mean" when users type them in, so, for example, Iterator<Item=T> -> Option<T> includes Iterator::next in the SERP1, and Iterator<T> -> Option<T> also includes Iterator::next in the SERP.

Sample searches

Motivation

My primary motivation for working on search.js at all is to make it easier to use highly generic APIs, like the Iterator API. The type signature describes these functions pretty well, while the names are almost arbitrary.

Before this PR, type bindings were not consistently included in search-index.js at all (you couldn't find Iterator::next by typing in its function signature) and you couldn't explicitly search for them. This PR fixes both of these problems.

Guide-level explanation

Excerpt from the Rustdoc book, included in this PR.

Function signature searches can query generics, wrapped in angle brackets, and traits will be normalized like types in the search engine if no type parameters match them. For example, a function with the signature fn my_function<I: Iterator<Item=u32>>(input: I) -> usize can be matched with the following queries:

  • Iterator<Item=u32> -> usize
  • Iterator<u32> -> usize (you can leave out the Item= part)
  • Iterator -> usize (you can leave out iterator's generic entirely)
  • T -> usize (you can match with a generic parameter)

Each of the above queries is progressively looser, except the last one would not match dyn Iterator, since that's not a type parameter.

Reference-level explanation

Inside the angle brackets, you can choose whether to write a name before the parameter and the equal sign. This syntax is called GenericArgsBinding in the Rust Reference, and it allows you to constrain a trait's associated type.

As a convenience, you don't actually have to put the name in (Rust requires it, but Rustdoc Search doesn't). This works about the same way unboxing already works in Search: the terse Iterator<u32> is a match for Iterator<Item=u32>, but the opposite is not true, just like u32 is a match for Iterator<u32>.

When converting a trait method for the search index, the trait is substituted for Self, and all associated types are bound to generics. This way, if you have the following trait definition:

pub trait MyTrait {
    type Output;
    fn method(self) -> Self::Output;
}

The following queries will match its method:

  • MyTrait<Output=T> -> T
  • MyTrait<T> -> T
  • MyTrait -> T

But these queries will not match it:

  • MyTrait<Output=u32> -> u32
  • MyTrait<Output> -> Output
  • MyTrait -> MyTrait::Output

Drawbacks

It's a little bit bigger:

$ du before/search-index1.74.0.js after/search-index1.74.0.js
4020    before/search-index1.74.0.js
4068    after/search-index1.74.0.js

Rationale and alternatives

I don't want to just not do this. On it's own, it's not terribly useful, but in addition to searching by normal traits, this is also intended as a desugaring target for closures. That's why it needs to actually distinguish the two: it allows the future desugaring to distinguish function output and input.

The other alternative would be to not allow users to leave out the name, so iterator<u32> doesn't work. That would be unfortunate, because mixing up which ones have out params and which ones are plain generics is an easy enough mistake that the Rust compiler itself helps people out with it.

Prior art

Unresolved questions

Okay, but how do we want to handle closures? I know the system will desugar FnOnce(T) -> U into trait:FnOnce<Output=U, primitive:tuple<T>>, but what if I don't know what trait I'm looking for? This PR can merge with nothing, but it'd be nice to have a plan.

Specifically, how should the special form used to handle all varieties of basic callable: primitive:fn (function pointers), and trait:Fn, trait:FnOnce, and trait:FnMut should all be searchable using a single syntax, because I'm always forgetting which one is used in the function I'm looking for.

The essential question is how closely we want to copy Rust's own syntax. The tersest way to expression Option::map might be:

Option<T>, (T -> U) -> Option<U>

That's the approach I would prefer, but nobody's going to attempt it without being told, so maybe this would be better?

Option<T>, (fn(T) -> U) -> Option<U>

It does require double parens, but at least it's mostly unambiguous. Unfortunately, it looks like the syntax you'd use for function pointers, implying that if you specifically wanted to limit your search to function pointers, you'd need to use primitive:fn(T) -> U. Then again, searching is normally case-insensitive, so you'd want that anyway to disambiguate from trait:Fn(T) -> U.

Future possibilities

This thing really needs a ranking algorithm

That is, this PR increases the number of matches for some type-based queries. They're usually pretty good matches, but there's still more of them, and it's evident that if you have two functions, foo(MyTrait<u8>) and bar(MyTrait<Item=u8>), if the user typed MyTrait<u8> then foo should show up first.

A design choice that these PRs have followed is that adding more stuff to the search query always reduces the number of functions that get matched. The advantage of doing it that way is that you can rank them by just counting how many atoms are in the function's signature (lowest score goes on top). Since it's impossible for a matching function to have fewer atoms than the search query, if there's a function with exactly the same set of atoms in it, then that'll be on top.

More complicated ranking algos tend to penalize long documents anyway, if the distance metrics I found through Flipboard (and postgresql's ts_rank_cd) are anything to go by. Real-world data sets tend to have weird outliers, like they have God Functions with zillions of arguments of all sorts of types, and Rustdoc ought to put such a function at the bottom.

The other natural choice would be interleaving with unifyFunctionTypes to count the number of unboxings and reorderings. This would compute a distance function, and would do a fine job of ranking the results, as described here by the Hoogle dev, but is more complicated than it sounds. The current algorithm returns when it finds a result that exists at all, but a distance function should find an optimal solution to find the smallest sequence of edits.

This could also use a benchmark suite and some optimization

This approach also lends itself to layering a bloom filter in front of the backtracking unification engine.

  • At load time, hash the typeNameIdMap ID for each atom and set the matching entry in a fixed-size byte array for each function to 1. Call it fnType.bloomFilter
  • At search time, do the same for the atoms in the query (excluding special forms like [] that can match more than one thing). Call it parsedQuery.bloomFilter
  • For each function, if (fnType.bloomFilter | (~parsedQuery.bloomFilter) !== ~0) { return false; }

There's also room to optimize the unification engine itself, by using stacks and persistent data structures instead of copying arrays around, or by using hashing instead of linear scans (the current algorithm was rewritten from one that tried to do that, but was too much to fit in my head and had a bunch of bugs). The advantage of Just Backtracking Better over the bloom filter is that it doesn't require the engine to retain any special algebraic properties.

But, first, we need a set of benchmarks to be able to judge if such a thing will actually help.

Referring to associated types by path

I don't want to implement this one, but if I did, this is how I'd do it.

In Rust, this is represented by a structure called a qualified path, or QPath. They look like this:

<Self as Iterator>::Item
<F as FnOnce>::Output

They can also, if it's unambiguous, use a plain path and just let the system figure it out:

Self::Item
F::Output

In Rustdoc Type-Driven Search, we don't want to force people to be unambiguous. Instead, we should try all reasonable interpretations, return results whenever any of them match, and let users make their query more specific if too many results are matches.

To enable associated type path searches in Rustdoc, we need to:

  1. When lowering a trait method to a search-index.js function signature, Self should be explicitly represented as a generic argument. It should always be assigned -1, so that if the user uses Self in their search query, we can ensure it always matches the real Self and not something else. Any functions that don't have a Self should drop a 0 into the first position of the where clause, to express that there isn't one and reserve the -1 position.
    • Reminder: generics are negative, concrete types are positive, and zero is a reserved sentinel.
    • Right now, Iterator::next is lowered as if it were fn next<T>(self: Iterator<Item=T>) -> Option<T>.
      It should become fn next<Self, T>(self: Self) -> Option<T> where Self: Iterator<Item=T> instead.
  2. Add another backtracking edge to the unification engine, so that when the user writes something like some::thing, the interpretation where some is a module and thing is a standalone item becomes one possible match candidate, while the interpretation where some is a trait and thing is an associated type is a separate match candidate. The backtracking engine is basically powerful enough to do this already, since unboxing generic type parameters into their traits already requires the ability to do this kind of thing.
    • When interpreting some::thing where some is a trait and thing is an associated type, it should be treated equivalently to <self as some>::thing. If you want to bind it to some generic parameter other than Self, you need to explicitly say so.
    • If no trait called some actually exists, treat it as a generic type parameter instead. Track every trait mentioned in the current working function signature, and add a match candidate for each one.
    • A user that explicitly wants the trait-associated-type interpretation could write a qpath (like <self as trait>::type), and a user that explicitly wants the module-item interpretation should use an item type filter (like struct:module::type).
  3. To actually do the matching, maintain a Map<(QueryGenericParamId, TraitId), FnGenericParamId> alongside the existing Map<QueryGenericParamId, FnGenericParamId> that is already used to handle plain generic parameters. This works, because, when a trait function signature is lowered to search-index.js, the rustdoc backend always generates an FnGenericParamId for every trait associated type it sees mentioned in the function's signature.
  4. Parse QPaths. Specifically,
    • QueryElem adds three new fields. isQPath is a boolean flag, and traitNameId contains an entry for typeNameIdMap corresponding to the trait part of a qpath, and parentId may contain either a concrete type ID or a negative number referring to a generic type parameter. The actual id of the query elem will always be a negative number, because this is essentially a funny way to add a generic type constraint.
    • If it's a QPath, then both of those IDs get filled in with the respective parts of the map. The unification engine will check the where clause to ensure the trait actually applies to the generic parameter in question, will check the type parameter constraint, and will add a mapping to mgens recording this as a solution.
    • If it's just a regular path, then isQPath is false, and the parser will fill in both traitNameId and parentId based on the same path. The unification engine, seeing isQPath is false and that these IDs were filled in, will try all three solutions: the path might be part of a concrete type name, or it might be referring to a trait, or it might be referring to a generic type parameter.

Why not implement QPath searches?

I'm not sure if anybody really wants to write such complicated queries. You can do a pretty good job of describing the generic functions in the standard library without resorting to FQPs.

These two queries, for example, would both match the Iterator::map function if we added support for higher order function queries and a rule that allows a type to match its notable traits.

// I like this version, because it's identical to how `Option::map` would be written.
// There's a reason why Iterator::map and Option::map have the same name.
Iterator<T>, (T -> U) -> Iterator<U>

// This version explicitly uses the type parameter constraints.
Iterator<Item=T>, (T -> U) -> Iterator<Item=U>

If I try to write this one using FQP, however, the results seem worse:

// This one is less expressive than the versions that don't use associated type paths.
// It matches `Iterator::filter`, while the above two example queries don't.
Iterator, (Iterator::Item -> Iterator::Item) -> Iterator

// This doesn't work, because the return type of `Iterator::map` is not a generic
// parameter with an `Iterator` trait bound. It's a concrete type that
// implements `Iterator`. Return-Position-Impl-Trait is the same way.
//
// There's a difference between something like `map`, whose return value
// implements Iterator, and something like `collect`, where the caller
// gets to decide what the concrete type is going to be.
//Self, (Self::Item -> I::Item) -> I where Self: Iterator, I: Iterator

// This works, but it seems subjectively ugly, complex, and counterintuitive to me.
Self, (<Self as Iterator>::Item -> T) -> Iterator<Item=T>

Footnotes

  1. search engine results page

@rustbot
Copy link
Collaborator

rustbot commented Sep 23, 2023

r? @GuillaumeGomez

(rustbot has picked a reviewer for you, use r? to override)

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue. labels Sep 23, 2023
@rustbot
Copy link
Collaborator

rustbot commented Sep 23, 2023

Some changes occurred in HTML/CSS/JS.

cc @GuillaumeGomez, @jsha

Some changes occurred in src/librustdoc/clean/types.rs

cc @camelid

@notriddle notriddle marked this pull request as draft September 23, 2023 01:51
@GuillaumeGomez
Copy link
Member

The feature and the code both look good to me. Great work!

One thing that has been bugging me though is that the search doesn't make difference between an associated item name and an associated type. For example:

pub trait Trait<Foo> {
    fn f() -> Foo;
}

pub struct Foo;
pub struct Bar;

impl Trait for Bar<Foo> {
    ...
}

Then when searching Trait<Foo>, it will return two results: the trait itself and its implementation for Bar.

@notriddle
Copy link
Contributor Author

@GuillaumeGomez I don’t really understand your example, since it doesn’t compile, even after I replace the with an implementation.

Do you have a more complete test case, or a real example from the standard library or gtk-rs?

@GuillaumeGomez
Copy link
Member

Sorry, here is a valid code:

pub trait Trait<Foo> {
    type Item;

    fn f() -> Foo;
    fn f2() -> Self::Item;
}

pub struct Foo;
pub struct Bar;
pub struct Item;

impl Trait<Foo> for Bar {
    type Item = Item;
    fn f() -> Foo { Foo }
    fn f2() -> Self::Item { Item }
}

So if you look for Trait<Item>, you'll get results for type Item but also for Trait when either the generic or the associated type is set to Item. Same goes for Foo (which can be both the generic or a struct name).

@notriddle
Copy link
Contributor Author

When I run a search for Trait<Item> against that sample code, I don't get any of that. What sort of results were you expecting?

@GuillaumeGomez
Copy link
Member

Interestingly enough, looking for trait<yoyoyo> also works. I was sure that the generic was supposed to match an existing impl, a generic name or an associated type inside the trait or the impl. I suppose I completely misunderstood then.

@notriddle
Copy link
Contributor Author

notriddle commented Sep 25, 2023

The In Names tab is the one that comes up when I run that search, and it isn’t touched by this PR at all. Name-based search doesn’t really support generics. Only function signature search does.

https://rust-lang.zulipchat.com/#narrow/stream/266220-t-rustdoc/topic/search.20syntax.20discussion.20.28for.20.2390630.29/near/275924527

@GuillaumeGomez
Copy link
Member

Should have precised: I was mostly wondering about the current situation, not about this PR. It was more thoughts about the future of rustdoc search and its complexity rather than a problem I encountered in this PR (which I already approved as I agree with it 😉 ).

@notriddle notriddle marked this pull request as ready for review September 26, 2023 21:42
@rustbot
Copy link
Collaborator

rustbot commented Sep 26, 2023

Some changes occurred in src/librustdoc/clean/types.rs

cc @camelid

Some changes occurred in HTML/CSS/JS.

cc @GuillaumeGomez, @jsha

@GuillaumeGomez
Copy link
Member

Is there anything new I need to look at? If not we can start the FCP. :)

@notriddle
Copy link
Contributor Author

For the feature, not really. The feature spec should be ready for an FCP.

I'd like you to closely read through the code, though. See if the backtracking approach makes sense, and if there's any major corner cases that aren't being tested.

@GuillaumeGomez
Copy link
Member

Will do! Give me a few days so I can take enough time to do it.

@rust-log-analyzer

This comment has been minimized.

@notriddle notriddle force-pushed the notriddle/search-associated-types branch from 52c2d72 to d0c1dd6 Compare October 4, 2023 16:24
@rust-log-analyzer

This comment has been minimized.

@notriddle notriddle force-pushed the notriddle/search-associated-types branch from d0c1dd6 to ae21fa7 Compare October 4, 2023 17:05
@bors
Copy link
Contributor

bors commented Oct 6, 2023

☔ The latest upstream changes (presumably #116483) made this pull request unmergeable. Please resolve the merge conflicts.

@notriddle notriddle force-pushed the notriddle/search-associated-types branch from ae21fa7 to 1063b1c Compare October 6, 2023 17:24
@rust-log-analyzer

This comment has been minimized.

@compiler-errors
Copy link
Member

spurious. some random hash ending in the number f is was appended to add.

@bors retry

@rfcbot
Copy link

rfcbot commented Nov 10, 2023

🔔 This is now entering its final comment period, as per the review above. 🔔

@bors
Copy link
Contributor

bors commented Nov 16, 2023

☔ The latest upstream changes (presumably #117955) made this pull request unmergeable. Please resolve the merge conflicts.

@notriddle notriddle force-pushed the notriddle/search-associated-types branch from f28eca5 to dd988b7 Compare November 16, 2023 16:04
@rust-log-analyzer

This comment has been minimized.

@notriddle notriddle force-pushed the notriddle/search-associated-types branch from dd988b7 to 07b3a75 Compare November 16, 2023 16:11
@rust-log-analyzer

This comment has been minimized.

@notriddle notriddle force-pushed the notriddle/search-associated-types branch from 07b3a75 to 45b0fc3 Compare November 16, 2023 17:22
@rustbot
Copy link
Collaborator

rustbot commented Nov 16, 2023

Some changes occurred in GUI tests.

cc @GuillaumeGomez

@bors
Copy link
Contributor

bors commented Nov 19, 2023

☔ The latest upstream changes (presumably #118024) made this pull request unmergeable. Please resolve the merge conflicts.

@notriddle notriddle force-pushed the notriddle/search-associated-types branch from 45b0fc3 to acc74c0 Compare November 20, 2023 01:54
@rfcbot rfcbot added finished-final-comment-period The final comment period is finished for this PR / Issue. to-announce Announce this issue on triage meeting and removed final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. labels Nov 20, 2023
@rfcbot
Copy link

rfcbot commented Nov 20, 2023

The final comment period, with a disposition to merge, as per the review above, is now complete.

As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed.

This will be merged soon.

@GuillaumeGomez
Copy link
Member

Thanks!

@bors r+ rollup

@bors
Copy link
Contributor

bors commented Nov 21, 2023

📌 Commit acc74c0 has been approved by GuillaumeGomez

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 21, 2023
bors added a commit to rust-lang-ci/rust that referenced this pull request Nov 21, 2023
Rollup of 6 pull requests

Successful merges:

 - rust-lang#116085 (rustdoc-search: add support for traits and associated types)
 - rust-lang#117522 (Remove `--check-cfg` checking of command line `--cfg` args)
 - rust-lang#118029 (Expand Miri's BorTag GC to a Provenance GC)
 - rust-lang#118035 (Fix early param lifetimes in generic_const_exprs)
 - rust-lang#118083 (Remove i686-apple-darwin cross-testing)
 - rust-lang#118091 (Remove now deprecated target x86_64-sun-solaris.)

r? `@ghost`
`@rustbot` modify labels: rollup
@bors bors merged commit e62f7be into rust-lang:master Nov 21, 2023
11 checks passed
@rustbot rustbot added this to the 1.76.0 milestone Nov 21, 2023
rust-timer added a commit to rust-lang-ci/rust that referenced this pull request Nov 21, 2023
Rollup merge of rust-lang#116085 - notriddle:notriddle/search-associated-types, r=GuillaumeGomez

rustdoc-search: add support for traits and associated types

# Summary

Trait associated type queries work in rustdoc's type driven search. The data is included in the search-index.js file, and the queries are designed to "do what I mean" when users type them in, so, for example, `Iterator<Item=T> -> Option<T>` includes `Iterator::next` in the SERP[^SERP], and `Iterator<T> -> Option<T>` also includes `Iterator::next` in the SERP.

[^SERP]: search engine results page

## Sample searches

* [`iterator<Item=T>, fnmut -> T`][iterreduce]
* [`iterator<T>, fnmut -> T`][iterreduceterse]

[iterreduce]: http://notriddle.com/rustdoc-html-demo-5/associated-types/std/index.html?search=iterator%3CItem%3DT%3E%2C%20fnmut%20-%3E%20T&filter-crate=std
[iterreduceterse]: http://notriddle.com/rustdoc-html-demo-5/associated-types/std/index.html?search=iterator%3CT%3E%2C%20fnmut%20-%3E%20T&filter-crate=std

# Motivation

My primary motivation for working on search.js at all is to make it easier to use highly generic APIs, like the Iterator API. The type signature describes these functions pretty well, while the names are almost arbitrary.

Before this PR, type bindings were not consistently included in search-index.js at all (you couldn't find Iterator::next by typing in its function signature) and you couldn't explicitly search for them. This PR fixes both of these problems.

# Guide-level explanation

*Excerpt from [the Rustdoc book](http://notriddle.com/rustdoc-html-demo-5/associated-types/rustdoc/read-documentation/search.html), included in this PR.*

> Function signature searches can query generics, wrapped in angle brackets, and traits will be normalized like types in the search engine if no type parameters match them. For example, a function with the signature `fn my_function<I: Iterator<Item=u32>>(input: I) -> usize` can be matched with the following queries:
>
> * `Iterator<Item=u32> -> usize`
> * `Iterator<u32> -> usize` (you can leave out the `Item=` part)
> * `Iterator -> usize` (you can leave out iterator's generic entirely)
> * `T -> usize` (you can match with a generic parameter)
>
> Each of the above queries is progressively looser, except the last one would not match `dyn Iterator`, since that's not a type parameter.

# Reference-level explanation

Inside the angle brackets, you can choose whether to write a name before the parameter and the equal sign. This syntax is called [`GenericArgsBinding`](https://doc.rust-lang.org/reference/paths.html#paths-in-expressions) in the Rust Reference, and it allows you to constrain a trait's associated type.

As a convenience, you don't actually have to put the name in (Rust requires it, but Rustdoc Search doesn't). This works about the same way unboxing already works in Search: the terse `Iterator<u32>` is a match for `Iterator<Item=u32>`, but the opposite is not true, just like `u32` is a match for `Iterator<u32>`.

When converting a trait method for the search index, the trait is substituted for `Self`, and all associated types are bound to generics. This way, if you have the following trait definition:

```rust
pub trait MyTrait {
    type Output;
    fn method(self) -> Self::Output;
}
```

The following queries will match its method:

  * `MyTrait<Output=T> -> T`
  * `MyTrait<T> -> T`
  * `MyTrait -> T`

But these queries will not match it:

  * <i>`MyTrait<Output=u32> -> u32`</i>
  * <i>`MyTrait<Output> -> Output`</i>
  * <i>`MyTrait -> MyTrait::Output`</i>

# Drawbacks

It's a little bit bigger:

```console
$ du before/search-index1.74.0.js after/search-index1.74.0.js
4020    before/search-index1.74.0.js
4068    after/search-index1.74.0.js
```

# Rationale and alternatives

I don't want to just not do this. On it's own, it's not terribly useful, but in addition to searching by normal traits, this is also intended as a desugaring target for closures. That's why it needs to actually distinguish the two: it allows the future desugaring to distinguish function output and input.

The other alternative would be to not allow users to leave out the name, so `iterator<u32>` doesn't work. That would be unfortunate, because mixing up which ones have out params and which ones are plain generics is an easy enough mistake that the Rust compiler itself helps people out with it.

# Prior art

  * <http://neilmitchell.blogspot.com/2020/06/hoogle-searching-overview.html>

    The current Rustdoc algorithm, both before this PR and after it, has a fairly expensive matching algorithm over a fairly simple file format. Luckily, we aren't trying to scale to all of crates.io, so it's usable, but it's not great when I throw it at docs.servo.org

# Unresolved questions

Okay, but *how do we want to handle closures?* I know the system will desugar `FnOnce(T) -> U` into `trait:FnOnce<Output=U, primitive:tuple<T>>`, but what if I don't know what trait I'm looking for? This PR can merge with nothing, but it'd be nice to have a plan.

Specifically, how should the special form used to handle all varieties of basic callable: primitive:fn (function pointers), and trait:Fn, trait:FnOnce, and trait:FnMut should all be searchable using a single syntax, because I'm always forgetting which one is used in the function I'm looking for.

The essential question is how closely we want to copy Rust's own syntax. The tersest way to expression Option::map might be:

    Option<T>, (T -> U) -> Option<U>

That's the approach I would prefer, but nobody's going to attempt it without being told, so maybe this would be better?

    Option<T>, (fn(T) -> U) -> Option<U>

It does require double parens, but at least it's mostly unambiguous. Unfortunately, it looks like the syntax you'd use for function pointers, implying that if you specifically wanted to limit your search to function pointers, you'd need to use `primitive:fn(T) -> U`. Then again, searching is normally case-insensitive, so you'd want that anyway to disambiguate from `trait:Fn(T) -> U`.

# Future possibilities

## This thing really needs a ranking algorithm

That is, this PR increases the number of matches for some type-based queries. They're usually pretty good matches, but there's still more of them, and it's evident that if you have two functions, `foo(MyTrait<u8>)` and `bar(MyTrait<Item=u8>)`, if the user typed `MyTrait<u8>` then `foo` should show up first.

A design choice that these PRs have followed is that adding more stuff to the search query always reduces the number of functions that get matched. The advantage of doing it that way is that you can rank them by just counting how many atoms are in the function's signature (lowest score goes on top). Since it's impossible for a matching function to have fewer atoms than the search query, if there's a function with exactly the same set of atoms in it, then that'll be on top.

More complicated ranking algos tend to penalize long documents anyway, if the [distance metrics](https://www.benfrederickson.com/distance-metrics/?utm_source=flipboard&utm_content=other) I found through [Flipboard](https://flipboard.com/`@arnie0426/building-recommender-systems-nvue3iqtgrn10t45)` (and postgresql's `ts_rank_cd`) are anything to go by. Real-world data sets tend to have weird outliers, like they have God Functions with zillions of arguments of all sorts of types, and Rustdoc ought to put such a function at the bottom.

The other natural choice would be interleaving with `unifyFunctionTypes` to count the number of unboxings and reorderings. This would compute a distance function, and would do a fine job of ranking the results, as [described here](https://ndmitchell.com/downloads/slides-hoogle_finding_functions_from_types-16_may_2011.pdf) by the Hoogle dev, but is more complicated than it sounds. The current algorithm returns when it finds a result that *exists at all*, but a distance function should find an *optimal solution* to find the smallest sequence of edits.

## This could also use a benchmark suite and some optimization

This approach also lends itself to layering a bloom filter in front of the backtracking unification engine.

* At load time, hash the typeNameIdMap ID for each atom and set the matching entry in a fixed-size byte array for each function to 1. Call it `fnType.bloomFilter`
* At search time, do the same for the atoms in the query (excluding special forms like `[]` that can match more than one thing). Call it `parsedQuery.bloomFilter`
* For each function, `if (fnType.bloomFilter | (~parsedQuery.bloomFilter) !== ~0) { return false; }`

There's also room to optimize the unification engine itself, by using stacks and persistent data structures instead of copying arrays around, or by using hashing instead of linear scans (the current algorithm was rewritten from one that tried to do that, but was too much to fit in my head and had a bunch of bugs). The advantage of Just Backtracking Better over the bloom filter is that it doesn't require the engine to retain any special algebraic properties.

But, first, we need a set of benchmarks to be able to judge if such a thing will actually help.

## Referring to associated types by path

*I don't want to implement this one, but if I did, this is how I'd do it.*

In Rust, this is represented by a structure called a qualified path, or QPath. They look like this:

    <Self as Iterator>::Item
    <F as FnOnce>::Output

They can also, if it's unambiguous, use a plain path and just let the system figure it out:

    Self::Item
    F::Output

In Rustdoc Type-Driven Search, we don't want to force people to be unambiguous. Instead, we should try *all reasonable interpretations*, return results whenever any of them match, and let users make their query more specific if too many results are matches.

To enable associated type path searches in Rustdoc, we need to:

1. When lowering a trait method to a search-index.js function signature, Self should be explicitly represented as a generic argument. It should always be assigned `-1`, so that if the user uses `Self` in their search query, we can ensure it always matches the real Self and not something else. Any functions that don't *have* a Self should drop a `0` into the first position of the where clause, to express that there isn't one and reserve the `-1` position.
   * Reminder: generics are negative, concrete types are positive, and zero is a reserved sentinel.
   * Right now, `Iterator::next` is lowered as if it were `fn next<T>(self: Iterator<Item=T>) -> Option<T>`.
     It should become `fn next<Self, T>(self: Self) -> Option<T> where Self: Iterator<Item=T>` instead.
3. Add another backtracking edge to the unification engine, so that when the user writes something like `some::thing`, the interpretation where `some` is a module and `thing` is a standalone item becomes one possible match candidate, while the interpretation where `some` is a trait and `thing` is an associated type is a separate match candidate. The backtracking engine is basically powerful enough to do this already, since unboxing generic type parameters into their traits already requires the ability to do this kind of thing.
   * When interpreting `some::thing` where `some` is a trait and `thing` is an associated type, it should be treated equivalently to `<self as some>::thing`. If you want to bind it to some generic parameter other than `Self`, you need to explicitly say so.
   * If no trait called `some` actually exists, treat it as a generic type parameter instead. Track every trait mentioned in the current working function signature, and add a match candidate for each one.
   * A user that explicitly wants the trait-associated-type interpretation could write a qpath (like `<self as trait>::type`), and a user that explicitly wants the module-item interpretation should use an item type filter (like `struct:module::type`).
4. To actually do the matching, maintain a `Map<(QueryGenericParamId, TraitId), FnGenericParamId>` alongside the existing `Map<QueryGenericParamId, FnGenericParamId>` that is already used to handle plain generic parameters. This works, because, when a trait function signature is lowered to search-index.js, the `rustdoc` backend always generates an FnGenericParamId for every trait associated type it sees mentioned in the function's signature.
5. Parse QPaths. Specifically,
   * QueryElem adds three new fields. `isQPath` is a boolean flag, and `traitNameId` contains an entry for `typeNameIdMap` corresponding to the trait part of a qpath, and `parentId` may contain either a concrete type ID or a negative number referring to a generic type parameter. The actual `id` of the query elem will always be a negative number, because this is essentially a funny way to add a generic type constraint.
   * If it's a QPath, then both of those IDs get filled in with the respective parts of the map. The unification engine will check the where clause to ensure the trait actually applies to the generic parameter in question, will check the type parameter constraint, and will add a mapping to `mgens` recording this as a solution.
   * If it's just a regular path, then `isQPath` is false, and the parser will fill in both `traitNameId` and `parentId` based on the same path. The unification engine, seeing isQPath is false and that these IDs were filled in, will try all three solutions: the path might be part of a concrete type name, or it might be referring to a trait, or it might be referring to a generic type parameter.

### Why not implement QPath searches?

I'm not sure if anybody really wants to write such complicated queries. You can do a pretty good job of describing the generic functions in the standard library without resorting to FQPs.

These two queries, for example, would both match the Iterator::map function if we added support for higher order function queries and a rule that allows a type to match its *notable traits*.

    // I like this version, because it's identical to how `Option::map` would be written.
    // There's a reason why Iterator::map and Option::map have the same name.
    Iterator<T>, (T -> U) -> Iterator<U>

    // This version explicitly uses the type parameter constraints.
    Iterator<Item=T>, (T -> U) -> Iterator<Item=U>

If I try to write this one using FQP, however, the results seem worse:

    // This one is less expressive than the versions that don't use associated type paths.
    // It matches `Iterator::filter`, while the above two example queries don't.
    Iterator, (Iterator::Item -> Iterator::Item) -> Iterator

    // This doesn't work, because the return type of `Iterator::map` is not a generic
    // parameter with an `Iterator` trait bound. It's a concrete type that
    // implements `Iterator`. Return-Position-Impl-Trait is the same way.
    //
    // There's a difference between something like `map`, whose return value
    // implements Iterator, and something like `collect`, where the caller
    // gets to decide what the concrete type is going to be.
    //Self, (Self::Item -> I::Item) -> I where Self: Iterator, I: Iterator

    // This works, but it seems subjectively ugly, complex, and counterintuitive to me.
    Self, (<Self as Iterator>::Item -> T) -> Iterator<Item=T>
@notriddle notriddle deleted the notriddle/search-associated-types branch November 21, 2023 17:31
@apiraino apiraino removed the to-announce Announce this issue on triage meeting label Dec 14, 2023
wip-sync pushed a commit to NetBSD/pkgsrc-wip that referenced this pull request Feb 18, 2024
Pkgsrc changes:
 * Adapt checksums and patches.

Upstream chnages:

Version 1.76.0 (2024-02-08)
==========================

Language
--------
- [Document Rust ABI compatibility between various types]
  (rust-lang/rust#115476)
- [Also: guarantee that char and u32 are ABI-compatible]
  (rust-lang/rust#118032)
- [Warn against ambiguous wide pointer comparisons]
  (rust-lang/rust#117758)

Compiler
--------
- [Lint pinned `#[must_use]` pointers (in particular, `Box<T>`
  where `T` is `#[must_use]`) in `unused_must_use`.]
  (rust-lang/rust#118054)
- [Soundness fix: fix computing the offset of an unsized field in
  a packed struct]
  (rust-lang/rust#118540)
- [Soundness fix: fix dynamic size/align computation logic for
  packed types with dyn Trait tail]
  (rust-lang/rust#118538)
- [Add `$message_type` field to distinguish json diagnostic outputs]
  (rust-lang/rust#115691)
- [Enable Rust to use the EHCont security feature of Windows]
  (rust-lang/rust#118013)
- [Add tier 3 {x86_64,i686}-win7-windows-msvc targets]
  (rust-lang/rust#118150)
- [Add tier 3 aarch64-apple-watchos target]
  (rust-lang/rust#119074)
- [Add tier 3 arm64e-apple-ios & arm64e-apple-darwin targets]
  (rust-lang/rust#115526)

Refer to Rust's [platform support page][platform-support-doc]
for more information on Rust's tiered platform support.

Libraries
---------
- [Add a column number to `dbg!()`]
  (rust-lang/rust#114962)
- [Add `std::hash::{DefaultHasher, RandomState}` exports]
  (rust-lang/rust#115694)
- [Fix rounding issue with exponents in fmt]
  (rust-lang/rust#116301)
- [Add T: ?Sized to `RwLockReadGuard` and `RwLockWriteGuard`'s Debug impls.]
  (rust-lang/rust#117138)
- [Windows: Allow `File::create` to work on hidden files]
  (rust-lang/rust#116438)

Stabilized APIs
---------------
- [`Arc::unwrap_or_clone`]
  (https://doc.rust-lang.org/stable/std/sync/struct.Arc.html#method.unwrap_or_clone)
- [`Rc::unwrap_or_clone`]
  (https://doc.rust-lang.org/stable/std/rc/struct.Rc.html#method.unwrap_or_clone)
- [`Result::inspect`]
  (https://doc.rust-lang.org/stable/std/result/enum.Result.html#method.inspect)
- [`Result::inspect_err`]
  (https://doc.rust-lang.org/stable/std/result/enum.Result.html#method.inspect_err)
- [`Option::inspect`]
  (https://doc.rust-lang.org/stable/std/option/enum.Option.html#method.inspect)
- [`type_name_of_val`]
  (https://doc.rust-lang.org/stable/std/any/fn.type_name_of_val.html)
- [`std::hash::{DefaultHasher, RandomState}`]
  (https://doc.rust-lang.org/stable/std/hash/index.html#structs)
  These were previously available only through `std::collections::hash_map`.
- [`ptr::{from_ref, from_mut}`]
  (https://doc.rust-lang.org/stable/std/ptr/fn.from_ref.html)
- [`ptr::addr_eq`](https://doc.rust-lang.org/stable/std/ptr/fn.addr_eq.html)

Cargo
-----

See [Cargo release notes]
(https://github.com/rust-lang/cargo/blob/master/CHANGELOG.md#cargo-176-2024-02-08).

Rustdoc
-------
- [Don't merge cfg and doc(cfg) attributes for re-exports]
  (rust-lang/rust#113091)
- [rustdoc: allow resizing the sidebar / hiding the top bar]
  (rust-lang/rust#115660)
- [rustdoc-search: add support for traits and associated types]
  (rust-lang/rust#116085)
- [rustdoc: Add highlighting for comments in items declaration]
  (rust-lang/rust#117869)

Compatibility Notes
-------------------
- [Add allow-by-default lint for unit bindings]
  (rust-lang/rust#112380)
  This is expected to be upgraded to a warning by default in a future Rust
  release. Some macros emit bindings with type `()` with user-provided spans,
  which means that this lint will warn for user code.
- [Remove x86_64-sun-solaris target.]
  (rust-lang/rust#118091)
- [Remove asmjs-unknown-emscripten target]
  (rust-lang/rust#117338)
- [Report errors in jobserver inherited through environment variables]
  (rust-lang/rust#113730)
  This [may warn](rust-lang/rust#120515)
  on benign problems too.
- [Update the minimum external LLVM to 16.]
  (rust-lang/rust#117947)
- [Improve `print_tts`](rust-lang/rust#114571)
  This change can break some naive manual parsing of token trees
  in proc macro code which expect a particular structure after
  `.to_string()`, rather than just arbitrary Rust code.
- [Make `IMPLIED_BOUNDS_ENTAILMENT` into a hard error from a lint]
  (rust-lang/rust#117984)
- [Vec's allocation behavior was changed when collecting some iterators]
  (rust-lang/rust#110353)
  Allocation behavior is currently not specified, nevertheless
  changes can be surprising.
  See [`impl FromIterator for Vec`]
  (https://doc.rust-lang.org/nightly/std/vec/struct.Vec.html#impl-FromIterator%3CT%3E-for-Vec%3CT%3E)
  for more details.
- [Properly reject `default` on free const items]
  (rust-lang/rust#117818)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. finished-final-comment-period The final comment period is finished for this PR / Issue. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants