Add Feature Filtering in Model Validation #1258
Signed-off-by: Kaituo Li <[email protected]>
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #1258      +/-   ##
============================================
+ Coverage     71.83%   77.05%   +5.22%
- Complexity     4898     5287     +389
============================================
  Files           518      517       -1
  Lines         22879    22890      +11
  Branches       2245     2239       -6
============================================
+ Hits          16434    17639    +1205
+ Misses         5410     4216    -1194
  Partials       1035     1035
WhiteSource Security Check is failing, and the same is happening in other repos (e.g., KNN). Error: "Oops! An error occurred while running the Security Check. The Scanner was unable to connect to the repository or branch and clone it." I have reached out to the infra team to get this resolved.
Signed-off-by: Kaituo Li <[email protected]>
(cherry picked from commit 63dacaa)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Signed-off-by: Kaituo Li <[email protected]>
* Consider feature filter in model validation (#1258)
  Signed-off-by: Kaituo Li <[email protected]>
* follow opensearch-project/k-NN#1795
  Signed-off-by: Kaituo Li <[email protected]>
---------
Signed-off-by: Kaituo Li <[email protected]>
Description
Previously, we used a date histogram aggregation for feature sparsity validation and interval recommendation. However, a feature can define its own filters in addition to the detector's filter. Because the previous approach did not take those feature-level filters into account, it could produce incorrect interval recommendations, and we could fail to identify the feature definition itself as the root cause of data sparsity. This PR improves feature sparsity validation and interval recommendation by incorporating the feature aggregation, which prevents incorrect interval recommendations and helps identify whether a feature definition is causing data sparsity. The same date range aggregation used in cold start is now used to retrieve data and verify sparsity.
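To make the idea concrete, here is a minimal sketch in Python of a date range query that nests a feature aggregation inside each interval bucket, so that sparsity is measured after the feature's own filter is applied. This is not the plugin's actual code: the function names, the `intervals` and `feature` aggregation names, and the field names are hypothetical, and the feature is assumed to be a `filter` aggregation wrapping a metric.

```python
from typing import Any, Dict, List


def build_date_range_query(
    time_field: str,
    start_ms: int,
    end_ms: int,
    interval_ms: int,
    detector_filter: Dict[str, Any],
    feature_agg: Dict[str, Any],
) -> Dict[str, Any]:
    """Build one date_range bucket per interval and nest the feature aggregation in it."""
    ranges = [
        {"from": lo, "to": min(lo + interval_ms, end_ms)}
        for lo in range(start_ms, end_ms, interval_ms)
    ]
    return {
        "size": 0,
        "query": detector_filter,  # detector-level filter applies to every bucket
        "aggs": {
            "intervals": {  # hypothetical aggregation name
                "date_range": {
                    "field": time_field,
                    "format": "epoch_millis",
                    "ranges": ranges,
                },
                # feature-level filter and metric are evaluated per bucket
                "aggs": {"feature": feature_agg},
            }
        },
    }


def non_empty_ratio(buckets: List[Dict[str, Any]]) -> float:
    """Fraction of buckets in which the feature's own filter matched any document."""
    if not buckets:
        return 0.0
    hits = sum(1 for b in buckets if b["feature"]["doc_count"] > 0)
    return hits / len(buckets)


# Hypothetical feature: sum of "latency" restricted to documents with status "error".
example_feature = {
    "filter": {"term": {"status": "error"}},
    "aggs": {"value": {"sum": {"field": "latency"}}},
}
example_query = build_date_range_query(
    "timestamp", 0, 180_000, 60_000, {"match_all": {}}, example_feature
)
```

A bucket can contain documents that pass the detector filter while none pass the feature filter. Counting only the outer `doc_count` would report such a bucket as populated, whereas `non_empty_ratio` reads the nested `feature` count and treats it as empty.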
Detailed Changes
Testing done:
Issues Resolved
List any issues this PR will resolve, e.g. Closes [...].
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.