Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

working perf test and API test injection #2

Merged
merged 5 commits into from
May 21, 2023

Conversation

jerch
Copy link

@jerch jerch commented May 20, 2023

No description provided.

@jerch
Copy link
Author

jerch commented May 20, 2023

@PerBothner Oh wait, still need to rename the benchmark test (initially copied over from serialize addon)...

@jerch
Copy link
Author

jerch commented May 20, 2023

Now you can run the basic benchmark test by doing this:

$> cd addons/xterm-addon-unicode-graphemes
$> yarn benchmark

You can alter the perf test or add perf related tests in benchmark/UnicodeGraphemeAddon.benchmark.ts.

@PerBothner
Copy link
Owner

Most of this I'm not able to evaluate, but have no objection to.

However, note that UnicodeProperties.ts is a generated file that comes from the unicode-properties project. So any changes here should also go upstream - at some point.

If we change the implementation in UnicodeProperties to use a simple Uint8Array for BMP and some other data structure for other chanracter that should also go upsteam - at some point.

@jerch
Copy link
Author

jerch commented May 20, 2023

Most of this I'm not able to evaluate, but have no objection to.

Well I tested it locally in a separate repo checkout, until things worked. There are still things to fine-tune (those nasty TODOs), but we can prolly fix those in the other PR?

However, note that UnicodeProperties.ts is a generated file that comes from the unicode-properties project. So any changes here should also go upstream - at some point.

Oh sorry, was not aware if this. In terms of "it should run in any JS env" I'd suggest to revisit the generated code for nodeJS incompat's. This got much better with newer nodeJS versions, but there are still cases, where an explicit nodeJS code path is needed (like here for base64 decoding).

If we change the implementation in UnicodeProperties to use a simple Uint8Array for BMP and some other data structure for other chanracter that should also go upsteam - at some point.

Yeah, I guess this will already make big difference perfwise. And the "wasted" memory of 65KB is cake these days.

@PerBothner
Copy link
Owner

Is "is cake" a German expression? It is not a common English phrase.

Note we gain the 64k back if unicode-graphemes becomes the default and UnicodeV6 is removed or moved to an addon.

@PerBothner
Copy link
Owner

The unicode-trie implementation does 2 lookup operations in the Uint32array if a BMP character:

  get(codePoint: number): number {
    let index;
    if ((codePoint < 0) || (codePoint > 0x10ffff)) {
      return this.errorValue;
    }

    if ((codePoint < 0xd800) || ((codePoint > 0xdbff) && (codePoint <= 0xffff))) {
      index = (this.data[codePoint >> SHIFT_2] << INDEX_SHIFT) + (codePoint & DATA_MASK);
      return this.data[index];
    }
  ... otherwise non-BMP ...
}

Replacing that with a single lookup in a Uint8Array is presumably slightly faster, but it seems likely there are others things contributing to the slowdown. If cache misses is a major factor, then perhaps special-casing ASCII (or perhaps below 0300) might be worthwhile. Though I would expect the cache needed for just ASCII would be minor.

@jerch
Copy link
Author

jerch commented May 20, 2023

Is "is cake" a German expression? It is not a common English phrase.

You sure? Came across it several times in the meaning of "something easily done" or "something not being a big thing". Even merriam webster lists it? (And nope, in German you would say "not a big thing", almost lit. translated, or Kinderspiel - "child's play").

Note we gain the 64k back if unicode-graphemes becomes the default and UnicodeV6 is removed or moved to an addon.

Right. Btw this was a bin search in older versions (see xtermjs#798 and xtermjs#1785), which was much slower.

@PerBothner
Copy link
Owner

"a piece of cake" is a traditional phrase, but "cake" without "piece" seems weird to me (and to my husband, who grew up here). Perhaps the language has evolved to drop the "piece" and we haven't noticed. There is also the phrase "easy as pie" (which is misleading because making a pie isn't that easy).

@jerch
Copy link
Author

jerch commented May 20, 2023

Ah sorry for creating confusion - my English is not that good and highly "poisoned" by the internet, movies and TV series. There is a high chance that I picked up some uncommon slang abbreviation without grasping its full contextual meaning (seems "cake" can have also very unpleasant meanings, just looked it up).

Btw the first post here explains, where the "pie" saying may come from - seems it is a "false friend" borrowing in Australia (coming from the Maori word PAI, which means GOOD, and turned into "pie" in English).

@PerBothner PerBothner merged commit e87fa16 into PerBothner:clusters May 21, 2023
@PerBothner
Copy link
Owner

This simple (if slightly ugly) tweak speeds the benchmark up by about 20%:

  public charProperties(codepoint: number, preceding: UnicodeCharProperties): UnicodeCharProperties {
    if ((codepoint >= 32 && codepoint < 127) && (preceding >> 3) == 0)
      return 1;
   ...

@jerch
Copy link
Author

jerch commented May 21, 2023

Wow that pretty impressive, given how small the change is.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants