From microsoft/vscode#94

I think this also impacts Atom on Mac/Linux. Possible repro steps for atom / first-mate:

I believe the root cause of the slowness is the charOffset <-> byteOffset conversion:
This makes any string containing multi-byte characters match in O(N^2), since at every advancing step the code loops back to recompute the conversion.
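To illustrate the cost (a minimal hypothetical TypeScript sketch, not node-oniguruma's actual code):

```typescript
// UTF-8 byte length contributed by one UTF-16 code unit. A surrogate pair
// is counted entirely at the high surrogate (4 bytes), so the low
// surrogate contributes 0.
function utf8LengthOfUnit(code: number): number {
  if (code < 0x80) return 1;
  if (code < 0x800) return 2;
  if (code >= 0xd800 && code < 0xdc00) return 4; // high surrogate
  if (code >= 0xdc00 && code < 0xe000) return 0; // low surrogate
  return 3;
}

// Naive conversion: O(N) per call. Doing this at every advancing step of a
// scan over a string of length N makes the whole match O(N^2).
function charOffsetToByteOffset(str: string, charOffset: number): number {
  let byteOffset = 0;
  for (let i = 0; i < charOffset; i++) {
    byteOffset += utf8LengthOfUnit(str.charCodeAt(i));
  }
  return byteOffset;
}

// Memoized conversion: build the whole table once in O(N); every later
// lookup is O(1). Computing this table up-front is what the OnigString
// proposal below amounts to.
function buildByteOffsetTable(str: string): Uint32Array {
  const table = new Uint32Array(str.length + 1);
  let byteOffset = 0;
  for (let i = 0; i < str.length; i++) {
    table[i] = byteOffset;
    byteOffset += utf8LengthOfUnit(str.charCodeAt(i));
  }
  table[str.length] = byteOffset;
  return table;
}
```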
In my opinion, the only way to make this conversion fast is to compute it once up-front and memoize it. However, the current caching mechanism is suboptimal, since each string is cached per OnigScanner: if a line needs 20 OnigScanners to tokenize, the conversion still happens 20 times. The current caching is also "leakish", as each OnigScanner keeps a reference to the last matched string. Moreover, it forces users of node-oniguruma to reuse a given OnigScanner instance as much as possible in order to benefit from caching.
I think that the users of node-oniguruma know best when and how they want to match a certain string.
I would like, when I get some free time, to work on a PR that does the following (a sketch of the resulting API shape follows the list):
- introduces & exposes to JS a new type called OnigString
- OnigString computes the conversion up-front in its constructor
- OnigString provides cache slots for all OnigRegExps
- OnigString has its lifecycle controlled by v8, so when JavaScript dereferences it, all cache slots and the cached conversion are GCed
- makes OnigRegExp::Search cached by accepting only an OnigString; each individual regular expression then uses the cache (no more need for the trick of an OnigScanner with precisely one regular expression)
- makes OnigScanner::FindNextMatchSync accept either a v8 String or an OnigString; if called with a v8 String, it immediately constructs an OnigString, i.e. does no caching across calls
- removes all other caching done through OnigStringContext
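To make the proposal concrete, here is a rough TypeScript sketch of the API shape it implies; all names and signatures below are tentative illustrations, not the library's current surface:

```typescript
// Tentative API sketch; these declarations describe the proposal, not an
// existing API.
interface OnigCaptureIndex {
  start: number;
  end: number;
  length: number;
}

interface OnigMatch {
  index: number; // which pattern won
  captureIndices: OnigCaptureIndex[];
}

declare class OnigString {
  // The charOffset <-> byteOffset conversion is computed once, here.
  constructor(content: string);
  readonly content: string;
}

declare class OnigRegExp {
  constructor(source: string);
  // Accepts only an OnigString, so every search hits its cache slots.
  searchSync(str: OnigString, startPosition?: number): OnigCaptureIndex[] | null;
}

declare class OnigScanner {
  constructor(patterns: string[]);
  // Accepts either type; a plain string is wrapped in a throw-away
  // OnigString, i.e. no caching across calls.
  findNextMatchSync(str: string | OnigString, startPosition?: number): OnigMatch | null;
}
```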
This would change the semantics & performance characteristics of node-oniguruma, requiring a new major version. After this change, JavaScript users will be able to use the library with or without caching:
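Something along these lines (hypothetical usage, assuming the API sketched above):

```typescript
const scanner = new OnigScanner(['\\w+', '"']);
const other = new OnigScanner(['\\d+']);

// Without caching: pass a plain string; the conversion is recomputed on
// every call.
scanner.findNextMatchSync('först line', 0);

// With caching: wrap the line once; every scanner or regexp that matches
// against it reuses the up-front conversion and its cache slots, and
// everything is GCed once `line` becomes unreachable.
const line = new OnigString('först line');
scanner.findNextMatchSync(line, 0);
other.findNextMatchSync(line, 0);
```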
I wanted to check with you @zcbenz @kevinsawicki if this is a change that takes node-oniguruma in a good direction and that you would agree with such a change ... before I invest time in it.
Thanks!