Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow build index on property that cardinality is list or set #1466

Closed
wants to merge 8 commits into from

Conversation

jadepeng
Copy link
Contributor

支持对cardinality 为list或set 的属性建索引
see: #893

@CLAassistant
Copy link

CLAassistant commented May 26, 2021

CLA assistant check
All committers have signed the CLA.

Copy link
Contributor

@javeme javeme left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice commit! thank you very much for your contribution. there still some minor comments before merging the commit.

@codecov
Copy link

codecov bot commented May 26, 2021

Codecov Report

Merging #1466 (b8f8b6a) into master (718daa6) will increase coverage by 0.46%.
The diff coverage is 85.36%.

❗ Current head b8f8b6a differs from pull request most recent head 5a9331b. Consider uploading reports for the commit 5a9331b to get more accurate results
Impacted file tree graph

@@             Coverage Diff              @@
##             master    #1466      +/-   ##
============================================
+ Coverage     61.73%   62.20%   +0.46%     
- Complexity     5842     5879      +37     
============================================
  Files           381      381              
  Lines         32177    32206      +29     
  Branches       4504     4515      +11     
============================================
+ Hits          19866    20035     +169     
+ Misses        10265    10116     -149     
- Partials       2046     2055       +9     
Impacted Files Coverage Δ
...a/com/baidu/hugegraph/backend/query/Condition.java 77.89% <0.00%> (-0.57%) ⬇️
...n/java/com/baidu/hugegraph/schema/PropertyKey.java 76.47% <50.00%> (-0.74%) ⬇️
...du/hugegraph/schema/builder/IndexLabelBuilder.java 87.08% <66.66%> (-0.27%) ⬇️
.../baidu/hugegraph/backend/query/ConditionQuery.java 86.66% <85.71%> (+0.66%) ⬆️
...du/hugegraph/backend/tx/GraphIndexTransaction.java 81.48% <100.00%> (+0.33%) ⬆️
...a/com/baidu/hugegraph/api/traversers/CountAPI.java 0.00% <0.00%> (ø)
...va/com/baidu/hugegraph/backend/cache/RamCache.java 72.41% <0.00%> (+0.57%) ⬆️
... and 7 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 718daa6...5a9331b. Read the comment docs.

Comment on lines 47 to 48
import afu.org.checkerframework.checker.oigj.qual.O;

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

u can use this IDEA-style-config to simplify code style problem(include align / import rule & auto remove useless.. etc)

Copy link
Member

@imbajin imbajin May 27, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

另外, 多谢贡献.

方便的话可以评论区/文档/博客 简单说一下你的实现思路, 也可以探讨一下不同方案的优劣. 或者及时沟通, IM 交流讨论一下都行 😄 (📬 )

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个方案,主要思路就是为collection 的每一项,以及所有组合项,都建立索引
组合项建立索引是一种省事的方案
缺点是索引数会多一点,不过这种属性一般用来存储tag等,数量相对有限
好处是在查询多个item时,不用找每个item的id列表求交集,一次查询即可

还有个方案就是像全文索引那样,每一个item建立索引就行

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

刚看到这个回复, 那理解应该是接近的, 考虑到图使用方式的各不相同, 很难限定大家都只有少量的 set 元素, 一旦有同学基于稍多一点元素的集合建立了这类索引, 代价都太大

就算选择单个 item 建索引的方案, 也有几个必要问题需要一起确认和思考, 对应增删改查有 4 个问题:

  1. 写索引的时候怎么做
  2. 读索引的时候怎么查
  3. set / list 元素可能经常更新, 此时索引更新的方案 (比如新增, 减少了一个元素)
  4. 删除点边时这个索引怎么删

PS: 另外 IM 上留了个言~

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

谢谢回复,今天抽空改了下,当前是单个item构建索引,查询时取交集(新增了JointIdHolderList)

另外,看了下相关的更新代码,rocksdb索引更新是直接删除旧索引

另外,IM是指哪个IM?

@@ -500,7 +503,14 @@ public void registerResultsFilter(Function<HugeElement, Boolean> filter) {
public static String concatValues(List<Object> values) {
List<Object> newValues = new ArrayList<>(values.size());
for (Object v : values) {
newValues.add(convertNumberIfNeeded(v));
if(v instanceof Collection){
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in what scene that go here with collection v? seem the v in condition contains(v) is not collection type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image

collection index 在build index和查询时,可以传入collection

collection index在构建时,会生成collection数据的所有组合
[1,2,3] => [1,2,3,[1,2],[1,3],[2,3],[1,2,3]]

所以建立的索引里,既有具体的每一个item,还包含item的相互组合
这里Collections.sort是确保查询和构建索引时,保持一致的顺序

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

话说我们需要支持查询的时候, 允许传入集合么, 如果照这样设计, 给一个 list / set 属性设置了索引
有近乎指数级的写 + 存索引成本, 似乎代价非常高, 不知道理解有误么

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这样代价确实比较大,方案已更新

valid = value instanceof Set;
valid = valid && this.checkDataType((Set<?>) value);
valid = value instanceof Set && this.checkDataType((Set<?>) value);
valid = valid || this.checkDataType(value);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

before calling checkValueType when query with contains(value), wrap value as collection like: propertyKey.newValue().add(value), so we can keep the common code clear.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里之前修改是针对collection索引字段,查询时可以传single值或者collection,
hugegraph-example/src/main/java/com/baidu/hugegraph/example/TestCollectionIndexExample.java

@@ -500,7 +503,14 @@ public void registerResultsFilter(Function<HugeElement, Boolean> filter) {
public static String concatValues(List<Object> values) {
List<Object> newValues = new ArrayList<>(values.size());
for (Object v : values) {
newValues.add(convertNumberIfNeeded(v));
if(v instanceof Collection){
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

话说我们需要支持查询的时候, 允许传入集合么, 如果照这样设计, 给一个 list / set 属性设置了索引
有近乎指数级的写 + 存索引成本, 似乎代价非常高, 不知道理解有误么

Comment on lines +473 to +474
" key " +
"'%s' whose cardinality is list or set",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

combine 2 lines & prefer leave space in the end of previous line

@@ -33,6 +33,7 @@
import java.util.function.Function;
import java.util.stream.Collectors;

import org.apache.commons.lang3.StringUtils;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still useful?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

暂时是有意义的,collection search index 需要

@jadepeng
Copy link
Contributor Author

jadepeng commented Jun 1, 2021

通过讨论,社区版本延用单属性查询,这个pr关闭,另推送一个单属性查询的

@jadepeng jadepeng closed this Jun 1, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants