fix #2844 #3911; add --spellsuggest to suggest symbols in scope with similar spellings on undefined symbol error #16067
In my experience with NimEdit the "edit distance" is really quite bad for this. See https://norvig.com/spell-correct.html for ideas on how to do it.
@Araq Norvig's algorithm is unsuitable in the context of compiler spell correction:
as shown in this simplified (but realistic performance-wise) example, it would take:
import std/sets
import std/math
import std/times
import std/editdistance
import std/random
import std/sugar

proc main() =
  var r = initRand(987)
  var words: seq[string]
  let n = 1_000
  var buf: string
  let alphabet = {'a'..'z', '0'..'9'}
  # generate n random identifiers with lengths cycling through 1..20
  for i in 0..<n:
    let m = (i mod 20) + 1
    buf.setLen 0
    for j in 0..<m: buf.add r.sample(alphabet)
    words.add buf
  var query = "editDistances"
  block: # Norvig-style: one hash-set lookup per candidate edit of the query
    let editDist = 3 # param
    let t0 = epochTime()
    var t: HashSet[string]
    for a in words: t.incl a
    let W = query.len
    # number of candidate strings within editDist edits (36-character alphabet)
    let m = pow(2*W.float*36, editDist.float).int
    let t1 = epochTime()
    var c = 0
    var prefix = "foobar"
    for j in 0..<m:
      if (prefix & $j) in t:
        c.inc
    let t2 = epochTime()
    # t1 is the lookup time, t2 is the time to build the hash set
    echo (t1: t2 - t1, t2: t1 - t0, candidates: m, dummy: c)
  block: # this PR: one editDistance call per symbol in scope
    let t1 = epochTime()
    var c = 0
    for ai in words:
      c += editDistance(ai, query)
    let t2 = epochTime()
    echo (t: t2 - t1, dummy: c)

main()

With -d:danger, this shows that Norvig's approach would take more than 1 minute, which is impractical, whereas the algorithm in this PR would take less than 1 millisecond. In other words, you'd need 400 million symbols in scope to break even. And even then, there are trivial ways to dramatically speed up the existing editdistance-based algorithm (I can go into details if needed). And that's only considering edit distances <= 3, and not even considering Unicode, for which Norvig's approach wouldn't work (prohibitive or at least unclear; the naive UTF-8 analog has issues). Norvig's algorithm is only useful for:
Try this PR; the heuristic I used (show all symbols at the shortest edit distance, using scope as a tie breaker) does a pretty good job IMO. One improvement (to be done as future work) is to add the Damerau–Levenshtein distance algorithm to std/editdistance. Once this is implemented, it'll be trivial to use that instead.
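For readers who want to see the heuristic in isolation, here is a minimal sketch, not the compiler's actual code. The `Candidate` type, its `depth` field (0 = innermost scope) and the `suggest` proc are hypothetical names made up for this illustration; only `editDistance` and `sort` are real standard library calls.

```nim
import std/[algorithm, editdistance]

type Candidate = object
  name: string
  depth: int  # hypothetical: 0 = innermost scope, larger = further out

proc suggest(query: string, cands: seq[Candidate]): seq[string] =
  ## Returns every candidate at the smallest edit distance from `query`,
  ## with symbols from inner scopes listed first.
  var scored: seq[tuple[dist, depth: int, name: string]]
  for c in cands:
    scored.add((dist: editDistance(query, c.name), depth: c.depth, name: c.name))
  scored.sort()  # tuples compare field by field: dist, then depth, then name
  for s in scored:
    if s.dist != scored[0].dist: break  # keep only the shortest distance
    result.add s.name

when isMainModule:
  let inScope = @[
    Candidate(name: "tool", depth: 2),
    Candidate(name: "title", depth: 0),
    Candidate(name: "tot", depth: 1),
    Candidate(name: "total", depth: 0)]
  echo suggest("totl", inScope)  # @["total", "tot", "tool"]
```

All three results are one edit away from `totl`; `title` is two edits away and is dropped, and the scope depth decides the order among the ties.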
This is what I was getting at and even nimsuggest uses basic identifier counting for suggestions. But maybe you're right and the current scopes provide sufficient context. However, another thing that you should really do -- if you haven't already -- is to limit the edit-distance.
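One way to read "limit the edit-distance" is a cutoff inside the distance computation itself. The sketch below is an assumption about what such a limit could look like, not something taken from this PR: `boundedEditDistance` is a hypothetical, byte-wise (ASCII) Levenshtein that gives up as soon as the result is known to exceed `limit`.

```nim
proc boundedEditDistance(a, b: string, limit: int): int =
  ## Byte-wise Levenshtein distance between `a` and `b`.
  ## Returns `limit + 1` as soon as the distance is known to exceed `limit`.
  if abs(a.len - b.len) > limit:
    return limit + 1  # the length difference is a lower bound on the distance
  var prev = newSeq[int](b.len + 1)
  var cur = newSeq[int](b.len + 1)
  for j in 0 .. b.len: prev[j] = j
  for i in 1 .. a.len:
    cur[0] = i
    var rowMin = cur[0]
    for j in 1 .. b.len:
      let cost = if a[i-1] == b[j-1]: 0 else: 1
      cur[j] = min(min(prev[j] + 1, cur[j-1] + 1), prev[j-1] + cost)
      rowMin = min(rowMin, cur[j])
    # row minima never decrease, so the final distance is at least rowMin
    if rowMin > limit:
      return limit + 1
    swap(prev, cur)
  result = min(prev[b.len], limit + 1)

when isMainModule:
  echo boundedEditDistance("kitten", "sitting", 3)  # 3
  echo boundedEditDistance("a", "abcdef", 2)        # 3, i.e. limit + 1
```

The early exits mean that most unrelated symbols in scope are rejected after a length check or a few rows of the table.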
@Araq PTAL
Instead I now have:
and furthermore, a limit on the number of suggestions is more intuitive than a limit on edit distance (the ranking criterion might change); see also future work in PR msg item
friendly ping @Araq
Accepted but blocked by #15935. Sorry.
What's the progress of this PR?
Well, it was ready and green for a while but (as I kind of predicted) broke after IC PR #15935 (ditto for other PRs I care a lot about such as
@Araq PTAL, this PR now works again (I had to use In addition:
the 32-bit Linux CI failure is unrelated and tracked in another issue
…s in scope with similar spellings on undefined symbol error (nim-lang#16067) * add --spellsuggest to suggest symbols in scope with similar spellings on undefined symbol errors * implement --spellsuggest with 0 arguments
* followup nim-lang#16067 --spellSuggest * enable --spellSuggest by default * fixup
fix #2844
fix #3911
fix #9197
saves time when perplexed about a symbol that you think should exist but was misspelt.
Some other languages have a similar feature too, and it helps.
note
example 1
example 2
future work
make this flag also suggest when a module import is misspelt, including things like import exitprocs instead of import std/exitprocs
also use spellsuggestions in more contexts, eg unrecognized pragmas, compiler flags/commands etc
add Damerau–Levenshtein distance algorithm to std/editdistance, which allows adjacent transpositions with cost 1 instead of cost 2, and use that instead (see misc std/editdistance timotheecour/Nim#397)
assign low cost (or cost 1) to moves: abcDef => defAbc; eg for pathJoin vs joinPath
I've left open for future work the currently un-allowed --spellsuggest (without argument), which will use some secret sauce based on length of query + max edit distance, but even with that secret sauce, --spellsuggest:N will still be useful (EDIT: now implemented)
I don't believe this adds any meaningful overhead so --spellsuggest can become a default once the 0 arg version is implemented => followup #16067 --spellSuggest #17401
make this work with dotExpr:
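Two of the future-work items above concern the distance metric itself, and a rough sketch may help show what they mean. Both procs are hypothetical illustrations, not code from this PR or from std/editdistance: `osaDistance` is the optimal string alignment variant of Damerau–Levenshtein (byte-wise, adjacent transposition at cost 1), and `isWordMove` is one possible way to detect moves such as pathJoin vs joinPath by comparing camelCase words.

```nim
import std/[algorithm, editdistance, strutils]

proc osaDistance(a, b: string): int =
  ## Optimal string alignment distance (restricted Damerau-Levenshtein),
  ## byte-wise: an adjacent transposition costs 1 instead of 2.
  var d = newSeq[seq[int]](a.len + 1)
  for i in 0 .. a.len:
    d[i] = newSeq[int](b.len + 1)
    d[i][0] = i
  for j in 0 .. b.len: d[0][j] = j
  for i in 1 .. a.len:
    for j in 1 .. b.len:
      let cost = if a[i-1] == b[j-1]: 0 else: 1
      d[i][j] = min(min(d[i-1][j] + 1, d[i][j-1] + 1), d[i-1][j-1] + cost)
      if i > 1 and j > 1 and a[i-1] == b[j-2] and a[i-2] == b[j-1]:
        d[i][j] = min(d[i][j], d[i-2][j-2] + 1)  # adjacent transposition
  result = d[a.len][b.len]

proc camelWords(ident: string): seq[string] =
  ## Splits "joinPath" into @["join", "path"] (lowercased).
  var cur = ""
  for ch in ident:
    if ch.isUpperAscii and cur.len > 0:
      result.add cur
      cur = ""
    cur.add ch.toLowerAscii
  if cur.len > 0: result.add cur

proc isWordMove(a, b: string): bool =
  ## True when `a` and `b` consist of the same words in a different order.
  let wa = camelWords(a)
  let wb = camelWords(b)
  result = wa != wb and sorted(wa) == sorted(wb)

when isMainModule:
  echo editDistanceAscii("lenght", "length")  # 2 (two substitutions)
  echo osaDistance("lenght", "length")        # 1 (one transposition)
  echo isWordMove("pathJoin", "joinPath")     # true
```

A ranking could then treat a detected word move as a low-cost edit, as the list item suggests; how to weigh it against ordinary edits is left open here.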
links