Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

implement billion scale data #612

Merged
merged 15 commits into from
Sep 24, 2020
Merged

Conversation

kmrmt
Copy link
Contributor

@kmrmt kmrmt commented Aug 5, 2020

Description:

implement billion scale dataset loader and download job.
If you have a good package name, please suggest that.

Related Issue:

Nothing.

How Has This Been Tested?:

There is not test but benchmark.

Environment:

  • Go Version: 1.14.4
  • Docker Version: 19.03.8
  • Kubernetes Version: 1.18.2
  • NGT Version: 1.12.0

Types of changes:

  • Bug fix [type/bug]
  • New feature [type/feature]
  • Add tests [type/test]
  • Security related changes [type/security]
  • Add documents [type/documentation]
  • Refactoring [type/refactoring]
  • Update dependencies [type/dependency]
  • Update benchmarks and performances [type/bench]
  • Update CI [type/ci]

Changes to Core Features:

  • Have you added an explanation of what your changes do and why you'd like us to include them?
  • Have you written new tests for your core changes, as applicable?
  • Have you successfully ran tests with your changes locally?

Checklist:

  • I have read the CONTRIBUTING document.
  • I have checked open Pull Requests for the similar feature or fixes?
  • I have added tests and benchmarks to cover my changes.
  • I have ensured all new and existing tests passed.
  • I have commented my code, particularly in hard-to-understand areas
  • I have updated the documentation accordingly.

@pull-assistant
Copy link

pull-assistant bot commented Aug 5, 2020

Score: 0.95

Best reviewed: commit by commit


Optimal code review plan (2 warnings)

implement billion scale data loader

...ssets/x1b/loader_test_bench.go 44% changes removed in fix benchmark

     fix dataset to datasets

implement billion scale loader

.../e2e/strategy/stream_remove.go 50% changes removed in fix benchmark timer

     fix benchmark timer

     fix error handling

     fix benchmark timers

     🤖 Update license headers / Format go codes and yaml files

     add x1b for docker build

     regenerate tests

     regenerate tests

     fix generated test

     by reviewdog

     by comments

     stop benchmark if error is occurred

     fix benchmark

Powered by Pull Assistant. Last update c3e2cb3 ... 7a57e13. Read the comment docs.

@vdaas-ci
Copy link
Collaborator

vdaas-ci commented Aug 5, 2020

[CHATOPS:HELP] ChatOps commands.

  • 🙆‍♀️ /approve - approve
  • 💌 /changelog - add changelog comment
  • 🍱 /format - format codes and add licenses
  • /gen-test - generate test codes
  • 🏷️ /label - add labels
  • /rebase - rebase master

@codecov
Copy link

codecov bot commented Aug 5, 2020

Codecov Report

Merging #612 into master will decrease coverage by 0.20%.
The diff coverage is 0.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #612      +/-   ##
==========================================
- Coverage   14.53%   14.32%   -0.21%     
==========================================
  Files         417      419       +2     
  Lines       19432    19420      -12     
==========================================
- Hits         2824     2782      -42     
- Misses      16371    16404      +33     
+ Partials      237      234       -3     
Impacted Files Coverage Δ
hack/benchmark/assets/x1b/loader.go 0.00% <0.00%> (ø)
...k/benchmark/core/benchmark/strategy/bulk_insert.go 0.00% <0.00%> (ø)
...mark/core/benchmark/strategy/bulk_insert_commit.go 0.00% <0.00%> (ø)
hack/benchmark/core/benchmark/strategy/insert.go 0.00% <0.00%> (ø)
...benchmark/core/benchmark/strategy/insert_commit.go 0.00% <0.00%> (ø)
hack/benchmark/core/benchmark/strategy/search.go 0.00% <0.00%> (ø)
hack/benchmark/core/benchmark/strategy/util.go 0.00% <0.00%> (ø)
hack/benchmark/internal/e2e/strategy/insert.go 0.00% <0.00%> (ø)
hack/benchmark/internal/e2e/strategy/remove.go 0.00% <0.00%> (ø)
hack/benchmark/internal/e2e/strategy/search.go 0.00% <0.00%> (ø)
... and 15 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update dd60c10...7a57e13. Read the comment docs.

@kmrmt kmrmt requested a review from rinx August 6, 2020 05:08
Makefile.d/bench.mk Outdated Show resolved Hide resolved
rinx
rinx previously approved these changes Aug 11, 2020
Copy link
Contributor

@rinx rinx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

@kmrmt kmrmt marked this pull request as draft August 14, 2020 05:16
@kmrmt kmrmt force-pushed the feature/benchmark/billion_scale_dataset branch from d7955c5 to c79fddd Compare August 21, 2020 04:53
@github-actions github-actions bot added team/set SET team size/XXL and removed size/L labels Aug 21, 2020
@kmrmt
Copy link
Contributor Author

kmrmt commented Aug 21, 2020

@hlts2 I want your advice how way to fix e2e benchmark.
I fixed Dataset interface to have same one small dataset and large dataset.
In this case, it is difficult for me to fix the bulk_insert benchmark.

@kmrmt kmrmt marked this pull request as ready for review August 24, 2020 08:42
@kmrmt kmrmt requested a review from hlts2 August 25, 2020 04:08
hack/benchmark/assets/x1b/loader_test_bench.go Outdated Show resolved Hide resolved
hack/benchmark/assets/x1b/loader_test_bench.go Outdated Show resolved Hide resolved
hack/benchmark/assets/x1b/loader.go Outdated Show resolved Hide resolved
hack/benchmark/assets/x1b/loader_test_bench.go Outdated Show resolved Hide resolved
hack/benchmark/core/benchmark/strategy/bulk_insert.go Outdated Show resolved Hide resolved
@kmrmt
Copy link
Contributor Author

kmrmt commented Aug 27, 2020

/rebase
/format

@vdaas-ci
Copy link
Collaborator

[REBASE] Rebase triggered by kmrmt for branch: feature/benchmark/billion_scale_dataset

@vdaas-ci vdaas-ci force-pushed the feature/benchmark/billion_scale_dataset branch from ef8733d to 9655fcd Compare August 27, 2020 07:12
@vdaas-ci
Copy link
Collaborator

[FORMAT] Updating license headers and formatting go codes triggered by kmrmt.

@kmrmt kmrmt requested a review from hlts2 August 27, 2020 08:00
@vdaas-ci vdaas-ci force-pushed the feature/benchmark/billion_scale_dataset branch from c8c1198 to 0023a92 Compare September 24, 2020 05:46
@vdaas-ci
Copy link
Collaborator

[FORMAT] Updating license headers and formatting go codes triggered by vankichi.

vdaas-ci
vdaas-ci previously approved these changes Sep 24, 2020
Copy link
Collaborator

@vdaas-ci vdaas-ci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[APPROVED] This PR is approved by vankichi.

@vankichi
Copy link
Contributor

/rebase
/format
/approve

@vdaas-ci
Copy link
Collaborator

[REBASE] Rebase triggered by vankichi for branch: feature/benchmark/billion_scale_dataset

kmrmt and others added 15 commits September 24, 2020 05:51
Signed-off-by: Kosuke Morimoto <[email protected]>
Signed-off-by: Kosuke Morimoto <[email protected]>
Signed-off-by: Kosuke Morimoto <[email protected]>
Signed-off-by: Kosuke Morimoto <[email protected]>
Signed-off-by: Kosuke Morimoto <[email protected]>
Signed-off-by: Kosuke Morimoto <[email protected]>
Signed-off-by: Kosuke Morimoto <[email protected]>
Signed-off-by: Kosuke Morimoto <[email protected]>
Signed-off-by: Kosuke Morimoto <[email protected]>
Signed-off-by: Kosuke Morimoto <[email protected]>
Signed-off-by: Kosuke Morimoto <[email protected]>
Signed-off-by: Kosuke Morimoto <[email protected]>
@vdaas-ci vdaas-ci force-pushed the feature/benchmark/billion_scale_dataset branch from 0023a92 to 7a57e13 Compare September 24, 2020 05:51
@vdaas-ci
Copy link
Collaborator

[FORMAT] Updating license headers and formatting go codes triggered by vankichi.

Copy link
Collaborator

@vdaas-ci vdaas-ci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[APPROVED] This PR is approved by vankichi.

Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remaining comments which cannot be posted as a review comment to avoid GitHub Rate Limit

golangci

pkg/tools/cli/loadtest/assets/large_dataset_test.go|1024| pkg/tools/cli/loadtest/assets/large_dataset_test.go:1024: Line contains TODO/BUG/FIXME: "TODO test cases" (godox)
pkg/tools/cli/loadtest/assets/large_dataset_test.go|1098| pkg/tools/cli/loadtest/assets/large_dataset_test.go:1098: Line contains TODO/BUG/FIXME: "TODO test cases" (godox)
pkg/tools/cli/loadtest/assets/large_dataset_test.go|1114| pkg/tools/cli/loadtest/assets/large_dataset_test.go:1114: Line contains TODO/BUG/FIXME: "TODO test cases" (godox)
pkg/tools/cli/loadtest/assets/large_dataset_test.go|1188| pkg/tools/cli/loadtest/assets/large_dataset_test.go:1188: Line contains TODO/BUG/FIXME: "TODO test cases" (godox)
pkg/tools/cli/loadtest/assets/large_dataset_test.go|1204| pkg/tools/cli/loadtest/assets/large_dataset_test.go:1204: Line contains TODO/BUG/FIXME: "TODO test cases" (godox)
pkg/tools/cli/loadtest/assets/small_dataset_test.go|51| pkg/tools/cli/loadtest/assets/small_dataset_test.go:51: Line contains TODO/BUG/FIXME: "TODO test cases" (godox)
pkg/tools/cli/loadtest/assets/small_dataset_test.go|66| pkg/tools/cli/loadtest/assets/small_dataset_test.go:66: Line contains TODO/BUG/FIXME: "TODO test cases" (godox)
pkg/tools/cli/loadtest/assets/small_dataset_test.go|128| pkg/tools/cli/loadtest/assets/small_dataset_test.go:128: Line contains TODO/BUG/FIXME: "TODO test cases" (godox)
pkg/tools/cli/loadtest/assets/small_dataset_test.go|140| pkg/tools/cli/loadtest/assets/small_dataset_test.go:140: Line contains TODO/BUG/FIXME: "TODO test cases" (godox)
pkg/tools/cli/loadtest/assets/dataset.go|28| File is not gofumpt-ed (gofumpt)
pkg/tools/cli/loadtest/assets/dataset.go|29 col 2| exported var ErrOutOfBounds should have comment or be unexported (golint)
pkg/tools/cli/loadtest/assets/large_dataset.go|1 col 1| package comment should be of the form "Package assets ..." (golint)
pkg/tools/cli/loadtest/assets/large_dataset.go|54 col 27| error strings should not be capitalized or end with punctuation or a newline (golint)
pkg/tools/cli/loadtest/assets/small_dataset.go|90 col 19| G404: Use of weak random number generator (math/rand instead of crypto/rand) (gosec)
pkg/tools/cli/loadtest/assets/small_dataset.go|91 col 19| G404: Use of weak random number generator (math/rand instead of crypto/rand) (gosec)
pkg/tools/cli/loadtest/assets/hdf5_loader_test.go|300| line is 155 characters (lll)
pkg/tools/cli/loadtest/assets/dataset.go|89 col 2| return statements should not be cuddled if block has more than two lines (wsl)
pkg/tools/cli/loadtest/assets/large_dataset.go|67 col 4| only one cuddle assignment allowed before range statement (wsl)
pkg/tools/cli/loadtest/assets/large_dataset.go|70 col 4| append only allowed to cuddle with appended value (wsl)
pkg/tools/cli/loadtest/assets/large_dataset.go|61 col 3| for statement without condition should never be cuddled (wsl)
pkg/tools/cli/loadtest/assets/small_dataset.go|89 col 4| only one cuddle assignment allowed before range statement (wsl)
pkg/tools/cli/loadtest/assets/small_dataset.go|86 col 3| only one cuddle assignment allowed before range statement (wsl)
pkg/tools/cli/loadtest/assets/large_dataset.go|63 col 7| err113: do not compare errors directly, use errors.Is() instead: "err == ErrOutOfBounds" (goerr113)
pkg/tools/cli/loadtest/assets/hdf5_loader_test.go|16 col 9| package should be assets_test instead of assets (testpackage)
pkg/tools/cli/loadtest/assets/large_dataset_test.go|16 col 9| package should be assets_test instead of assets (testpackage)

@vankichi vankichi merged commit b96c5d0 into master Sep 24, 2020
@vankichi vankichi deleted the feature/benchmark/billion_scale_dataset branch September 24, 2020 06:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants