Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix KeyCaseSensitive and Stripkey Problem #45

Merged
merged 5 commits into from
Jun 24, 2021

Conversation

songxiaocheng
Copy link
Contributor

@songxiaocheng songxiaocheng commented Jun 23, 2021

  1. 调用者可以强行设置 searchOptions,在不强行设置searchOptions的情况下,遵重词典设置
  2. 在大小写不敏感情况下,如果存在精确匹配,则返回精确匹配的条目

通过了所有test测试,debug也不报错,example还没有实验。

我在使用该库过程中,之前无法匹配大小写不一致的css、js等,该 PR 可以解决我遇到的问题。

参见 #41 的讨论。

其它实现相关:

  1. compareFn this 移除,作为 _reduceWordKeyBlock 和 _binarySearh 的入参。因为 KeyCaseSensitive 为 No 时,先用大小写敏感的 compareFn,查不到再用大小写不敏感的 compareFn
  2. _stripKey 根据 Stripkey 提供 _s 函数,compareFn 根据 KeyCaseSensitive 设置,两者互相独立,因此删去 _stripKey 中的 KeyCaseSensitive 相关的部分。
  3. 修改过程中,为复用共同逻辑,进行了轻微重构。

@songxiaocheng
Copy link
Contributor Author

提交有点乱了,我整理下

@terasum
Copy link
Owner

terasum commented Jun 23, 2021

我觉得这个代码还是有两个地方还需要明确一下:

  1. 是否需要 在大小写不敏感情况下,如果存在精确匹配,则返回精确匹配的条目,否则返回不精确匹配? 也就是说 查大写是否要返回小写
  2. css/js 今天我在测试的时候发现的一个问题是解析 keyText 的时候用的 UTF8 解码器可能导致检索不到,现在master分支应该是可以检索的到的

另外 example 可以 用 yarn run example 跑一下的

@songxiaocheng
Copy link
Contributor Author

我觉得这个代码还是有两个地方还需要明确一下:

  1. 是否需要 在大小写不敏感情况下,如果存在精确匹配,则返回精确匹配的条目,否则返回不精确匹配? 也就是说 查大写是否要返回小写
  2. css/js 今天我在测试的时候发现的一个问题是解析 keyText 的时候用的 UTF8 解码器可能导致检索不到,现在master分支应该是可以检索的到的

另外 example 可以 用 yarn run example 跑一下的

  1. 至少MDD文件是非常需要,对于MDX文件来说,也没什么坏处。
  2. 之前的代码给MDD设置了utf-16的解码器,但是后来的代码给覆盖了,实际上还是用了utf-8,应该用utf-16,这个我也改过来了。

我整理完提交试试example。

@songxiaocheng songxiaocheng reopened this Jun 23, 2021
@songxiaocheng songxiaocheng force-pushed the PR/keycasesensitive-stripkey branch from f71045c to f9fb170 Compare June 23, 2021 18:34
@songxiaocheng
Copy link
Contributor Author

songxiaocheng commented Jun 23, 2021

@terasum 已更新,目前可以跑通所有 test 和 jest,debug 和 example 均不报错。

该PR包含了 PR #34 和 PR #38 的内容。

Copy link
Owner

@terasum terasum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

我先合并,后面再细看下我觉得有问题的两个地方

if (!_s || _s == undefined) {
// eslint-disable-next-line
_s = this._stripKey();
}
let left = 0;
let right = list.length;
let mid = 0;
while (left < right) {
while (left <= right) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

如果这里非要改成 <= 的话,上面的 right = list.length 应该改成 right= list.length -1

Copy link
Contributor Author

@songxiaocheng songxiaocheng Jun 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

我认为不是,1. left 可以被加到等于 list.length,此时返回 undefined. 2. right 可能被减。
其实你原来是

while (left < right) {
      ...
      const compareResult = compareFn(_s(word), _s(list[mid].keyText));
      if (compareResult > 0) {
        left = mid + 1;
      } else if (compareResult == 0) {
        return mid;
      } else {
        right = mid - 1;
      }
}
return left;

造成 最后返回的 left 从未参与过比较
只需比较一下最后的 left,将上面直接改为

while (left < right) {
      ...
      const compareResult = compareFn(_s(word), _s(list[mid].keyText));
      ...
}
return compareFn(_s(word), _s(list[mid].keyText)) == 0 ? left : undefined

也是可以的,但是这样 compareFn 部分写在俩不同地方,于是我给合并到上面 while 中变为

while (left <= right) {
      ...
      const compareResult = compareFn(_s(word), _s(list[mid].keyText));
      ...
}
return undefined

自始至终和 let right = list.length; 无关。

Copy link
Contributor Author

@songxiaocheng songxiaocheng Jun 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

但是似乎 right = list.length 改成 right= list.length - 1 也没啥问题?毕竟从不直接用 right 访问,只影响第一个 mid 的值,但是初始的 mid = left + ((right - left) >> 1) 似乎一样。

最初,left 为 0,rightlist.length,因而 mid 等于 list.length >> 1,这里这些都是非负的,>>1 相当于整除以2(或除以2后向下取整):若 length 为奇数,则等于 floor(length/2);若 length 为偶数,则等于 length/2。两者都在 [0, length-1] 中,即 mid 总是可取到的。除此之外,初始的 right 值似乎没有其他影响了。

你可以看看这里的试验:https://jsfiddle.net/songxiaocheng/3d4Ls01a/5/ 手动改一下 right 初始值,结果显示,减不减1结果似乎都是一样的。既然如此,可考虑和其他地方出现的二分查找代码统一写法。

Copy link
Owner

@terasum terasum Jun 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

我看了一下你的试验代码,上面改为 right = list.length 是会出现越界的情况的:

function _binarySearh(list, a) {
    let left = 0;
    let right = list.length;
    let mid = 0;
    while (left <= right) {
      mid = left + ((right - left) >> 1);
      console.log("> " + mid+" " + list[mid]);
      if (a>list[mid]) {
        left = mid + 1;
      } else if (a==list[mid]) {
        return mid;
      } else {
        right = mid - 1;
      }
    }
    return undefined
}

list1 = [1,2,3,4]
list2 = [1,2,3,4,5]

console.log(_binarySearh(list1, 5)==undefined)

输出

> 2 3
> 3 4
> 4 undefined
true

list[mid] 在越界的时候会变成 undefiend 可能会出现非预期行为

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

改成 let right = list.length-1; 就不会有问题了

}

return this.searchOptions.stripKey ||
common.isTrue(this.header.StripKey || (this._version >= 2.0 ? '' : 'yes'))
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这行的 (this._version >= 2.0 ? '' : 'yes') 我觉得应该要保留的

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

你在 _readHeader() 中已经保证了 this.header.StripKey 有值了:

this.header.StripKey = this.header.StripKey || 'Yes';

@terasum terasum merged commit daca55b into terasum:master Jun 24, 2021
terasum pushed a commit that referenced this pull request Oct 7, 2024
terasum pushed a commit that referenced this pull request Oct 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants