[api-documenter] Fix ae-ambiguous-ids meta issue #1308

octogonz · 2019-06-02T20:46:45Z

This issue tracks the fix for a set of related bugs tagged by ae-ambiguous-ids. These issues all involve IDs that are generated using a surjective mapping that results in naming collisions (e.g. a file being ovewritten) or instability (e.g. a hyperlink being broken after a SemVer MINOR change).

The problem was summarized in #1252 (comment):

we need a naming pattern that ideally would satisfy several requirements:

be unambiguous, i.e. provide a one-to-one mapping for all possible inputs

be stable, e.g. adding a new function overload should not cause the UID to change for a preexisting function overload (since that could lead to broken hyperlinks)

look "nice" to humans (particularly for the name field)

(Elsewhere I pointed out that the TSDoc declaration reference notation meets 1 and 2, but perhaps not 3.)

Examples of problematic TypeScript constructs:

a static member with the same name as an instance member

a function with multiple overloads

a merged declarations (e.g. an enum X { ... } alongside namespace X { ... })

an indexer (defined using symbols without any name) (e.g. [key: string] : string; as a member of an interface)

Certain IDs are case-insensitive, which means that e.g. noTallPeople and NotAllPeople may collide.

And here's a summary of the specific kinds of IDs that we need to solve this for:

Markdown URLs: case insensitive, no special characters
DocFX YAML URLs: case insensitive, no special characters
DocFX YAML IDs: case sensitive, allows some special characters
Titles that appear in docs: needs to look nice to a human

The text was updated successfully, but these errors were encountered:

octogonz · 2019-06-02T21:26:57Z

Here's a pathological input that illustrates most of the sources of ambiguity, along with the corresponding TSDoc declaration reference expressions that would uniquely identify these members (without relying on @label tags):

// TSDoc: '(A:class)'
export class A {
  // TSDoc: '(A:class).(:constructor,0)'
  public constructor(c: number, d: number);
  // TSDoc: '(A:class).(:constructor,1)'
  public constructor(c: string);
  // not visible API
  public constructor(c: number|string, d?: number) {
  }

  // TSDoc: '(A:class).(constructor:instance)'
  public "constructor": number; // TypeScript keyword confusingly used as member name

  // TSDoc: '(A:class).("(constructor:instance)":instance)'
  public "(constructor:instance)": number; // TSDoc expression confusingly used as member name

  // TSDoc: '(A:class).(memberA:instance,0)'
  public memberA(e: number, f: number): void;
  // TSDoc: '(A:class).(memberA:instance,1)'
  public memberA(e: string): void;
  // not visible API
  public memberA(e: number|string, f?: number): void {
  }

  // TSDoc: '(A:class).(memberA:static,0)'
  public static memberA(g: boolean): void { // static member conflicting with instance member
  }

  // TSDoc: '(A:class).(MemberA:static,0)'
  public static MemberA(): void {  // differs by case only
  }
}

// TSDoc: '(B:namespace)'
export namespace B {
  // TSDoc: '(B:namespace).simpleSymbol'
  export const simpleSymbol: unique symbol = Symbol("B.simpleSymbol");
}

// TSDoc: '(B:enum)'
export enum B { // merged declaration
  // TSDoc: '(B:enum).One'
  One
}

// TSDoc: '(C:namespace)'
export namespace C {
  // TSDoc: '(C:namespace).(memberC:0)'
  export function memberC(): void {
  }
}

// TSDoc: '(C:interface)'
export interface C { // merged declaration
  // TSDoc: '(C:interface).memberC'
  memberC: number;  // "instance side" conflicts with "static side" from namespace

  // TSDoc: '(C:interface).(:new,0)'
  new(h: number): number;

  // TSDoc: '(C:interface).(:call,0)'
  (i: string): string;

  // TSDoc: '(C:interface).(:index,0)'
  [j: string]: number;

  // TSDoc: '(C:interface).[(B:namespace).simpleSymbol]'
  [B.simpleSymbol](k: boolean, l: string): void;
}

octogonz · 2019-06-02T22:41:12Z

be unambiguous, i.e. provide a one-to-one mapping for all possible inputs

be stable, e.g. adding a new function overload should not cause the UID to change for a preexisting function overload (since that could lead to broken hyperlinks)

look "nice" to humans (particularly for the name field)

Thinking about the above code, it's obvious that we can't have both 2 and 3. Maybe we can relax the "stable" requirement like this:

"If a new API member is added that makes an existing ID ambiguous, it's okay to change the ID, but an old ID should never get reassigned to a different target."

For example, if we use ID a.f to refer to a member function, and then we add an overload, it's better to replace the original ID with a.f_1 (breaking hyperlinks pointing to a.f), versus preserving it and having it accidentally point to the wrong overload. This design also ensures that a web server could remember the old URLs and be able to redirect old URLs to the new disambiguated URL.

rbuckton · 2019-06-03T20:09:19Z

Would it be better to form ids for method signatures based on their parameter types, not their overload order? You see this naming scheme in the .NET documentation for methods with overloads, i.e.:

CountdownEvent.AddCount() is System_Threading_CountdownEvent_AddCount
CountdownEvent.AddCount(int) is System_Threading_CountdownEvent_AddCount_System_Int32_

Adding a net-new overload to an existing overload list would break the "stable" requirement:

declare class Foo {
  method(): void; // (Foo:class).method
  // method(x: string): void;
  method(x: number): void; // (Foo:class).method_1
}

Uncommenting the above line would give that overload the name (Foo:class).method_1, replacing the existing link.

octogonz · 2019-06-03T23:36:44Z

Would it be better to form ids for method signatures based on their parameter types, not their overload order? You see this naming scheme in the .NET documentation for methods with overloads

People seemed to like this idea when we discussed it in #881 (comment).

However, by itself it's not a complete solution. It doesn't handle other sources of ambiguity (e.g. static vs instance, character case differences). It doesn't even handle all forms of overloading, for example:

export class Foo {
    /**
     * First overload
     * @param x - a boolean
     */
    public method(x: boolean): void;

    /**
     * Second overload
     * @param x - an array of strings
     */
    public method(x: string[]): void;

    public method(x: boolean | string[]): void {
      // . . .      
    }  
}

Perhaps we can think of ID assignment as a kind of mapping function:

f(item, context, history) --> identifier

The output of f() is an identifier string. If our scenario is a URL, then the identifier could be api/foo.method_1. If our scenario is a doc title, then the identifier could be Foo.method(x) #1. The item is the API declaration that the identifier refers to. The context is the set of other declarations whose mapped identifier must not conflict with item.

We could also consider enhancing API Documenter to remember a history of previous outputs (e.g. saved as a JSON file), which could be consulted by the function to avoid "broken links" (i.e. changing an identifier to something new) and/or "shuffled links" (i.e. reusing an existing identifier to reference a different target). This additional tech might make it easier to provide nice-looking identifiers, while still providing a solution for high traffic websites who are concerned about URLs getting broken or shuffled.

acchou · 2019-06-04T00:00:26Z

I'd like to register a preference for overloaded functions specifically, to simply merge all of the identifiers that collide into a single file. This is natural (and in my opinion, preferable) for overloads because they are documenting the behavior of a single function that may take different inputs -- duplicating the documentation for each overload is not user friendly.

octogonz · 2019-06-04T01:14:02Z

I'd like to register a preference for overloaded functions specifically, to simply merge all of the identifiers that collide into a single file.

@acchou I suspect this depends on the how the overloads are used in your API. We've heard from others that documenting individual overloads was an important need.

FYI we recently started work on an api-documenter.json settings file, so maybe the default handling of overloads could be configurable. (Feel free to open a GitHub issue tracking this, since it's somewhat offtopic here.)

We also proposed to add a new TSDoc tag @partOf that says "this declaration shouldn't get its own page; instead, show it as part of the signature for that declaration over there." (See microsoft/tsdoc#137) Then overloads could be merged like this:

/**
 * Adds some numbers together
 */
declare function add(x: number, y: number): number;

/** {@partOf (add:1)} */
declare function add(x: number, y: number, z: number): number;

octogonz · 2019-06-04T01:32:04Z

We could also consider enhancing API Documenter to remember a history of previous outputs (e.g. saved as a JSON file), which could be consulted by the function to avoid "broken links" (i.e. changing an identifier to something new) and/or "shuffled links" (i.e. reusing an existing identifier to reference a different target). This additional tech might make it easier to provide nice-looking identifiers, while still providing a solution for high traffic websites who are concerned about URLs getting broken or shuffled.

It's theoretically possible to provide a "no shuffling" guarantee without resorting to history. But every approach I can come up with leads to awkward, ugly identifiers. It seems that (optionally) tracking history in a JSON file is the right way to go: It's "opt-in" for people who don't care about broken links, and it's relatively easy to incorporate into a mapping algorithm.

Regarding the mapping algorithm, here's some more thoughts about how we can implement f():

Each hierarchical declaration should get solved independently. For example, in my code sample, if we're mapping A.MemberA to the URL /api/a.membera.md, the parts A-->a and MemberA-->membera should be calculated independently. I like this because something like /api/a-2.membera-1.md is more understandable than /api/a.membera-6.md.
The operation should start with a friendly mapping, e.g. lowering case and discarding bad characters: MemberA --> membera
For Unicode, it seems reasonable for the friendly mapping to include any normal writing characters, since this is the year 2019 and most systems now support IRI
For non-alphanumeric characters (e.g. public "&^@%#@"(x: string): void {) we should just throw them away rather than trying to encode them
Next, we handle any ambiguities introduced by the friendly mapping. This happens by computing f() for all siblings at a given level of the hierarchy, and then checking for collisions.
If a collision is encountered, we'll apply a series of disambiguation transformations. Since shuffled links are worse than broken links, when a disambiguation transformation is applied, it should be applied to all conflicting members. For example, if we disambiguate (A:class).(memberA:static,0) by mapping it to /api/a.member-static.md then (A:class).(memberA:instance,0) should also become /api/a.member-instance.md and not simply /api/a.member.md.

The transformations will depend on the particular identifier scenario. Recall that we're tackling 4 scenarios:

Markdown URLs
DocFX YAML URLs
DocFX YAML IDs
Titles that appear in docs

Since the URLs come from filenames, they shouldn't use characters that are bad for filenames. And although the f() mappings will be somewhat different, we should aim for the resulting identifiers to be as consistent as possible. Probably I'll create a single PR that implements all 4 mapping functions.

octogonz · 2019-06-04T01:59:22Z

What characters should be allowed in the URLs? In #1303 @acchou found that ( ) caused problems for Docusaurus. We also need to be careful with characters that cause problems in filenames.

I found this thread discussing safe filename characters, and it concludes that [0-9a-zA-Z\-._] is the safest set.

Suppose we start with a TSDoc canonical declaration reference like this:

(A:class).(memberA:instance,0)

The ( and ) are mostly redundant (and TSDoc considered making them optional), so let's drop them. We can't use :, but we can replace it with - which is not a valid identifier character. We can also map , to - without introducing ambiguity. Lastly, we'll discard the "index selector" ,0 because f() will reintroduce its own numbering if needed. This converts the above longform expression to a URL like this:

/api/a-class.membera-instance.md

If there are no ambiguities, we can reduce the canonical reference by removing the redundant parts before applying the character substitutions.

Or, if there still ambiguities after replacing characters, let's resolve them by reintroducing numeric suffixes after each dotted part, like this:

/api/a-class-1234.membera-instance-2345.md

Lastly, for the 4 anonymous member kinds (constructor, call, index, new) our friendly name can treat these as regular identifiers (a.constructor) instead of as TSDoc member selectors (a.-constructor).

There's other possible improvements, but this actually seems like a pretty reasonable starting point. BTW I thought about appending function parameter names, but then I realized that it means the URL will break if someone renames a parameter, which maybe isn't worth it. (?)

Here's a fully worked out result of the above algorithm:

// URL '/api/a.md'
export class A {
  // URL '/api/a.constructor-1.md'
  public constructor(c: number, d: number);
  // URL '/api/a.constructor-2.md'
  public constructor(c: string);
  // not visible API
  public constructor(c: number|string, d?: number) {
  }

  // URL '/api/a.constructor-3.md'
  public "constructor": number; // TypeScript keyword confusingly used as member name

  // URL '/api/a.constructorinstance.md'
  public "(constructor:instance)": number; // TSDoc expression confusingly used as member name

  // URL '/api/a.membera-instance-1.md'
  public memberA(e: number, f: number): void;
  // URL '/api/a.membera-instance-2.md'
  public memberA(e: string): void;
  // not visible API
  public memberA(e: number|string, f?: number): void {
  }

  // URL '/api/a.membera-static-1.md'
  public static memberA(g: boolean): void { // static member conflicting with instance member
  }

  // URL '/api/a.membera-static-2.md'
  public static MemberA(): void {  // differs by case only
  }
}

// URL '/api/b-namespace.md'
export namespace B {
  // URL '/api/b-namespace.simplesymbol.md'
  export const simpleSymbol: unique symbol = Symbol("B.simpleSymbol");
}

// URL '/api/b-enum.md'
export enum B { // merged declaration
  // URL '/api/b-enum.one.md'
  One
}

// URL '/api/c-namespace.md'
export namespace C {
  // URL '/api/c-namespace.memberc.md'
  export function memberC(): void {
  }
}

// URL '/api/c-interface.md'
export interface C { // merged declaration
  // URL '/api/c-interface.memberc.md'
  memberC: number;  // "instance side" conflicts with "static side" from namespace

  // URL '/api/c-interface.new.md'
  new(h: number): number;

  // URL '/api/c-interface.call.md'
  (i: string): string;

  // URL '/api/c-interface.index.md'
  [j: string]: number;

  // URL '/api/c-interface.bnamespacesimplesymbol.md'
  [B.simpleSymbol](k: boolean, l: string): void;
}

The handling of ECMAScript symbols may get awkward (e.g. (ClassM2:class).([(WellknownSymbolsM1:namespace).(toNumberPrimitive:variable)]:static) is possible in TSDoc). If we allow ourselves to use [ and ] characters in the file name, that would greatly improve the handling of ECMAScript symbols.

octogonz · 2019-06-04T02:06:52Z

Lemme know if you guys like this direction. If so, I can start work on a PR when I get some time.

It's tempting to put the mapping implementation in api-extractor-model so it can be reused by other tools besides API Documenter...

octogonz · 2019-06-04T02:17:36Z

Would it be better to form ids for method signatures based on their parameter types, not their overload order?

@rbuckton I just realized I misread your sentence as "parameter names". Parameter types won't work in TypeScript because the parameter types can be arbitrarily large expressions, whereas in C# they are generally a simple dotted name.

Examples from real code:

export function mergeStyles(...args: (IStyle | IStyleBaseArray | false | null | undefined)[]): string;

export function mergeStyleSets<
  TStyleSet1 extends IStyleSet<TStyleSet1>,
  TStyleSet2 extends IStyleSet<TStyleSet2>,
  TStyleSet3 extends IStyleSet<TStyleSet3>,
  TStyleSet4 extends IStyleSet<TStyleSet4>
>(
  styleSet1: TStyleSet1 | false | null | undefined,
  styleSet2: TStyleSet2 | false | null | undefined,
  styleSet3: TStyleSet3 | false | null | undefined,
  styleSet4: TStyleSet4 | false | null | undefined
): IProcessedStyleSet<TStyleSet1 & TStyleSet2 & TStyleSet3 & TStyleSet4>;

export function serializeRuleEntries(ruleEntries: { [key: string]: string | number }): string;

rbuckton · 2019-06-05T13:01:42Z

The operation should start with a friendly mapping, e.g. lowering case and discarding bad characters: MemberA --> membera

What about cases where this causes a collision, since JavaScript/TypeScript property names are case sensitive?

For non-alphanumeric characters (e.g. public "&^@%#@"(x: string): void {) we should just throw them away rather than trying to encode them.

Not opposed, but this can introduce collisions as well.

For methods, docfx is very specific about how uids for methods should be derived for ManagedReference documents (joining the uids of the parameter types). It seems best to follow suit to avoid the shuffled links concern.

Next, we handle any ambiguities introduced by the friendly mapping. This happens by computing f() for all siblings at a given level of the hierarchy, and then checking for collisions.

I concur with this, as it is something I had to implement myself in a custom YamlDocumenter for one of my projects: https://github.com/esfx/esfx/blob/master/scripts/docs/yamlDocumenter.js#L601

Keep in mind that the canonical UID you generate does not necessarily have to be a valid path/URL, but you could instead pass it through another function to generate a valid URL. Also, URLs already have an encoding strategy for invalid characters.

[…] replace it with - which is not a valid identifier character […]

This worries me because it is perfectly valid in a quoted method/property name. You would need some mechanism for "quoting" a name with invalid characters to avoid collisions. Another option is to double up - characters in a name:

a() -> a-method
"a-method"() -> a--method-method

[…] BTW I thought about appending function parameter names, […]

I would argue for parameter types rather than names, as that is the DocFX standard for ManagedReference documents.

Another way to deal with the shuffled links concern would be to have something like a @uid foo tag that could be used to explicitly set the uid (possibly for historical purposes).

Parameter types won't work in TypeScript because the parameter types can be arbitrarily large expressions, whereas in C# they are generally a simple dotted name.

C# UIDs can be quite large as well. (IStyle|IStyleBaseArray|false|null|undefined)[] isn't a very large UID compared to System.Collections.Generic.Dictionary{System.String,System.Collections.Generic.List{System.Int32}} (taken from the DocFX documentation).

For some cases, an arbitrarily complex type (such as an anonymous type) is better broken down into an alias and a reference:

  …
  - uid: some-package.some-class.some-method
    name: method()
    fullName: method()
    syntax:
      return:
        type:
        - some-package.some-class.some-method.anonymous-0
  ...
references:
  - uid: some-package.some-class.some-method.anonymous-0
    name.typeScript: '{ a: A, b: B, c: C, d: D }`
    fullName.typeScript: '{ a: A, b: B, c: C, d: D }`
    spec.typeScript:
    - name: '{ a: '
      fullName: '{ a: '
    - uid: A
      name: A
      fullName: A
    …

rbuckton · 2019-06-05T13:08:13Z

Regarding the parameter types and complex aliases, I posted a comment on an issue about complex types with additional context here: #867 (comment)

postspectacular · 2019-11-17T13:11:40Z

I understand this all is still in flux, but may I ask what is the currently supported syntax to link to an overloaded function? It seems there's some missing information and discrepancies between different parts of the documentation and the actual tooling/implementation. E.g. using the canonical references created by api-extractor in a {@link} tag cause the warnings for ambiguous references to disappear, but then still no link at all is showing up in the generated markdown...

Example:

Generated .api.json snippet:

{
  "kind": "Function",
  "canonicalReference": "@thi.ng/arrays!quickSort:function(1)",
  ...
}

Neither of these work (excluded from api-documenter markdown output):

{@link @thi.ng/arrays!quickSort:function(1)}
{@link @thi.ng/arrays#quickSort:function(1)}

Any (interim) advice is highly appreciated! Thanks!

octogonz · 2019-11-18T04:00:57Z

@postspectacular Let's discuss your question with #1240 since it's really about @link syntax, which is not ambiguous (in either of the two versions of the notation heheh).

kasperisager · 2021-04-07T09:25:52Z

Is there any news on this or something I can help with to move things along?

octogonz · 2021-06-28T20:18:11Z

@kasperisager which subproblem were you interested in?

kasperisager · 2021-06-29T06:21:46Z

This issue is affecting us quite a bit in our attempts to land Siteimprove/alfa#290, especially in relation to #1830 and #1921 that both seem to be manifestations of this underlying issue. I don't believe there are any patterns in the Alfa codebase that haven't already been outlined in the previous comments, so there's probably nothing new I can add on that front.

Also, I do apologise for this new "Tracked in [...]" notice now appearing fairly prominently on all the Rushstack issues we've referenced from Siteimprove/alfa#290. I don't know why GitHub all of a sudden decided it would be a good idea to broadcast that to the world 🙈

octogonz added ae-ambiguous-ids This issue belongs to a set of bugs tracked by Pete bug Something isn't working as intended labels Jun 2, 2019

rbuckton mentioned this issue Jun 11, 2019

[api-documenter] Add logic to disambiguate duplicate UID, file, or toc entries #1331

Closed

jsamr mentioned this issue Jun 15, 2019

Document library with API Extractor and TSDoc typeskill/typer#10

Open

This was referenced Jun 22, 2019

[api-documenter] Hyperlinking support for "complex" types (YAML generator) #867

Closed

[api-extractor] Deeply linked types #1337

Merged

octogonz mentioned this issue Sep 25, 2019

[api-documenter] Disambiguate output files with colliding names #1536

Merged

postspectacular mentioned this issue Nov 14, 2019

[all][docs] Problems in documentation test repo thi-ng/umbrella#174

Open

This was referenced Nov 18, 2019

[api-extractor] Linking to overloads generates warnings, but no way to specify which overload #1240

Closed

[api-extractor]: Add support for @throws #1632

Merged

JCMais mentioned this issue May 14, 2020

Research ways to create some API Documentation JCMais/node-libcurl#223

Closed

octogonz mentioned this issue Jun 8, 2020

[api-extractor] Incorrect handling of interfaces enhancing a class with the same name #1921

Open

2 tasks

kasperisager mentioned this issue Jul 3, 2020

Set up API Extractor for generating API documentation Siteimprove/alfa#290

Merged

9 tasks

octogonz mentioned this issue May 22, 2022

[api-extractor] Better support for handling unexported base classes #3430

Open

octogonz mentioned this issue Jul 25, 2022

[api-extractor] Option to include un-exported types in doc model output #3513

Closed

zelliott mentioned this issue Sep 7, 2022

[api-extractor] Fix #3593 Incorrect canonical reference to aliased class in .api.json #3602

Merged

iclanton added this to Bug Triage Feb 26, 2024

github-project-automation bot moved this to Needs triage in Bug Triage Feb 26, 2024

iclanton moved this from Needs triage to AE/AD in Bug Triage Feb 26, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[api-documenter] Fix ae-ambiguous-ids meta issue #1308

[api-documenter] Fix ae-ambiguous-ids meta issue #1308

octogonz commented Jun 2, 2019

octogonz commented Jun 2, 2019 •

edited

Loading

octogonz commented Jun 2, 2019

rbuckton commented Jun 3, 2019 •

edited

Loading

octogonz commented Jun 3, 2019

acchou commented Jun 4, 2019

octogonz commented Jun 4, 2019

octogonz commented Jun 4, 2019

octogonz commented Jun 4, 2019 •

edited

Loading

octogonz commented Jun 4, 2019

octogonz commented Jun 4, 2019

rbuckton commented Jun 5, 2019

rbuckton commented Jun 5, 2019

postspectacular commented Nov 17, 2019

octogonz commented Nov 18, 2019

kasperisager commented Apr 7, 2021

octogonz commented Jun 28, 2021

kasperisager commented Jun 29, 2021

[api-documenter] Fix ae-ambiguous-ids meta issue #1308

[api-documenter] Fix ae-ambiguous-ids meta issue #1308

Comments

octogonz commented Jun 2, 2019

octogonz commented Jun 2, 2019 • edited Loading

octogonz commented Jun 2, 2019

rbuckton commented Jun 3, 2019 • edited Loading

octogonz commented Jun 3, 2019

acchou commented Jun 4, 2019

octogonz commented Jun 4, 2019

octogonz commented Jun 4, 2019

octogonz commented Jun 4, 2019 • edited Loading

octogonz commented Jun 4, 2019

octogonz commented Jun 4, 2019

rbuckton commented Jun 5, 2019

rbuckton commented Jun 5, 2019

postspectacular commented Nov 17, 2019

octogonz commented Nov 18, 2019

kasperisager commented Apr 7, 2021

octogonz commented Jun 28, 2021

kasperisager commented Jun 29, 2021

octogonz commented Jun 2, 2019 •

edited

Loading

rbuckton commented Jun 3, 2019 •

edited

Loading

octogonz commented Jun 4, 2019 •

edited

Loading