Add stats for radial search #1684

Merged 5 commits on May 14, 2024

@junqiu-lei junqiu-lei commented May 2, 2024

Description

Adds radial search stats to the k-NN plugin stats API. The new stats are:

  • max_distance_query_requests
  • min_score_query_requests
  • max_distance_query_with_filter_requests
  • min_score_query_with_filter_requests

Stats API example

GET /_plugins/_knn/stats HTTP/1.1
Host: localhost:9200
Response details
{
    "_nodes": {
        "total": 1,
        "successful": 1,
        "failed": 0
    },
    "cluster_name": "integTest",
    "circuit_breaker_triggered": false,
    "model_index_status": null,
    "nodes": {
        "nwOR9EKUQEGWmVhT56gDlg": {
            "max_distance_query_with_filter_requests": 3,
            "graph_memory_usage_percentage": 1.211015E-5,
            "graph_query_requests": 12,
            "graph_memory_usage": 2,
            "cache_capacity_reached": false,
            "load_success_count": 1,
            "training_memory_usage": 0,
            "indices_in_cache": {
                "knn-index-test": {
                    "graph_memory_usage_percentage": 1.211015E-5,
                    "graph_memory_usage": 2,
                    "graph_count": 1
                }
            },
            "script_query_errors": 0,
            "hit_count": 11,
            "knn_query_requests": 0,
            "total_load_time": 4045208,
            "miss_count": 1,
            "min_score_query_requests": 6,
            "knn_query_with_filter_requests": 0,
            "training_memory_usage_percentage": 0.0,
            "lucene_initialized": false,
            "max_distance_query_requests": 7,
            "graph_index_requests": 1,
            "faiss_initialized": true,
            "load_exception_count": 0,
            "training_errors": 0,
            "min_score_query_with_filter_requests": 1,
            "eviction_count": 0,
            "nmslib_initialized": false,
            "script_compilations": 0,
            "script_query_requests": 0,
            "graph_stats": {
                "merge": {
                    "current": 0,
                    "total": 0,
                    "total_time_in_millis": 0,
                    "current_docs": 0,
                    "total_docs": 0,
                    "total_size_in_bytes": 0,
                    "current_size_in_bytes": 0
                },
                "refresh": {
                    "total": 1,
                    "total_time_in_millis": 31
                }
            },
            "graph_query_errors": 0,
            "indexing_from_model_degraded": false,
            "graph_index_errors": 0,
            "training_requests": 0,
            "script_compilation_errors": 0
        }
    }
}
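To fetch only the four new counters rather than the full payload above, the stat names can be appended to the stats path. This is a minimal sketch, assuming the k-NN stats endpoint accepts a comma-separated list of stat names after `/stats/` and that the node is reachable at `localhost:9200` as in the example request. The class and method names are hypothetical, and the request is built but not sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.List;

class RadialStatsRequest {
    // The four stat names introduced by this PR.
    static final List<String> RADIAL_STATS = List.of(
        "max_distance_query_requests",
        "min_score_query_requests",
        "max_distance_query_with_filter_requests",
        "min_score_query_with_filter_requests"
    );

    // Builds (but does not send) a GET request for only the radial search stats.
    // Assumption: stat names may be appended to the path as a comma-separated list.
    static HttpRequest buildRequest(String baseUrl) {
        URI uri = URI.create(baseUrl + "/_plugins/_knn/stats/" + String.join(",", RADIAL_STATS));
        return HttpRequest.newBuilder(uri).GET().build();
    }

    public static void main(String[] args) {
        HttpRequest request = buildRequest("http://localhost:9200");
        System.out.println(request.method() + " " + request.uri());
    }
}
```

Sending the request with `java.net.http.HttpClient` against a running cluster should return a response of the same shape as above, restricted to the requested stats.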

Issues Resolved

[List any issues this PR will resolve]

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed as per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@@ -566,4 +566,23 @@ private static void validateSingleQueryType(Integer k, Float distance, Float sco
        throw new IllegalArgumentException(String.format("[%s] requires exactly one of k, distance or score to be set", NAME));
    }
}

private static void updateQueryStats(Integer k, Float minScore, Float maxDistance, QueryBuilder filter) {
Collaborator

nit: you can clean this up a bit more to reduce the if/else branches. As written, every new parameter we add that is similar to k or maxDistance will add another branch to the if/else statements.

validateSingleQuery()
updateQueryStats(k, ...)
updateQueryStats(minScore, ...)
updateQueryStats(maxDistance, ...)

static <T> void updateQueryStats(T queryParam, QueryBuilder filter, queryCounter, filterCounter) {
    if (queryParam != null) {
        queryCounter.increment();
        if (filter != null) {
            filterCounter.increment();
        }
    }
}
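A self-contained version of the helper sketched above might look like the following. It is an illustration only: `LongAdder` stands in for the plugin's counters, a plain `Object` stands in for `QueryBuilder`, and the class and field names are hypothetical. Because `T` is inferred separately at each call site, the same method accepts an `Integer` k and a `Float` minScore or maxDistance.

```java
import java.util.concurrent.atomic.LongAdder;

class QueryStatsSketch {
    // LongAdder is a stand-in for the plugin's counters (assumption for this sketch).
    static final LongAdder knnRequests = new LongAdder();
    static final LongAdder knnWithFilterRequests = new LongAdder();
    static final LongAdder minScoreRequests = new LongAdder();
    static final LongAdder minScoreWithFilterRequests = new LongAdder();
    static final LongAdder maxDistanceRequests = new LongAdder();
    static final LongAdder maxDistanceWithFilterRequests = new LongAdder();

    // Increments queryCounter when queryParam is set, and filterCounter when a filter is also present.
    static <T> void updateQueryStats(T queryParam, Object filter, LongAdder queryCounter, LongAdder filterCounter) {
        if (queryParam == null) {
            return;
        }
        queryCounter.increment();
        if (filter != null) {
            filterCounter.increment();
        }
    }

    public static void main(String[] args) {
        Integer k = null;
        Float minScore = null;
        Float maxDistance = 10.0f;
        Object filter = new Object(); // placeholder for a QueryBuilder

        // One call per query parameter; only the parameter that is set updates its counters.
        updateQueryStats(k, filter, knnRequests, knnWithFilterRequests);
        updateQueryStats(minScore, filter, minScoreRequests, minScoreWithFilterRequests);
        updateQueryStats(maxDistance, filter, maxDistanceRequests, maxDistanceWithFilterRequests);

        System.out.println("max_distance_query_requests=" + maxDistanceRequests.sum());
        System.out.println("max_distance_query_with_filter_requests=" + maxDistanceWithFilterRequests.sum());
    }
}
```

The trade-off is three calls at the call site in exchange for a helper that does not grow when a new query parameter is added.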

Member Author

Thanks @shatejas. This suggestion would shorten the function itself, but it looks like we would still need to call it multiple times at the same place. Besides, k is an Integer while maxDistance and minScore are Floats, so we couldn't directly use a single parameter T queryParam for all of them.

Member

@junqiu-lei Why not use Object?

Member Author

@VijayanB I don't see an obvious benefit to introducing Object for this internally used function. Besides, if we introduce Object, we would still need a condition check somewhere else to decide which query type and query counter to use.

Collaborator

@junqiu-lei Yeah, extra calls are sometimes a good trade-off for maintainability. The cost here shouldn't be high, but correct me if I'm missing something.

@VijayanB On the Object suggestion, a few thoughts: the method itself is doing quite a few things (parsing, building the query builder, validating, and adding stats). That's generally an indication that it should be broken down. What we can do instead is build the query builder while parsing; once the KNNQueryBuilder is built, we can validate the entire object and then update the stats for it as a whole. That way you don't need an extra object.

Pseudocode to illustrate:

KNNQueryBuilder queryBuilder = parseAndBuild(xContentParser); // Note: there are no validations here
validate(queryBuilder); // Throws if any of the validations fail
updateStats(queryBuilder); // Update stats

Having said this, it's too much of a refactor, so I wouldn't want to bundle it with this PR; it can be punted until absolutely needed.
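The three-step flow in the pseudocode above can be made concrete with a small stand-alone sketch. Everything here is hypothetical: `BuiltQuery` is a stand-in for the plugin's `KNNQueryBuilder`, the method arguments replace real `XContentParser` parsing, and a map of `LongAdder` values stands in for the plugin's counters. Only the stat names and the validation message are taken from this PR.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

class ParseValidateStatsSketch {
    // Hypothetical stand-in for the built query; not the plugin's KNNQueryBuilder.
    static final class BuiltQuery {
        final Integer k;
        final Float maxDistance;
        final Float minScore;
        final boolean hasFilter;

        BuiltQuery(Integer k, Float maxDistance, Float minScore, boolean hasFilter) {
            this.k = k;
            this.maxDistance = maxDistance;
            this.minScore = minScore;
            this.hasFilter = hasFilter;
        }
    }

    // Counters keyed by stat name; a stand-in for the plugin's own counters.
    static final Map<String, LongAdder> STATS = new ConcurrentHashMap<>();

    static long count(String stat) {
        LongAdder adder = STATS.get(stat);
        return adder == null ? 0 : adder.sum();
    }

    // Step 1: build only. Real code would read these values from an XContentParser.
    static BuiltQuery parseAndBuild(Integer k, Float maxDistance, Float minScore, boolean hasFilter) {
        return new BuiltQuery(k, maxDistance, minScore, hasFilter);
    }

    // Step 2: validate the object as a whole; throws if the check fails.
    static void validate(BuiltQuery query) {
        int set = 0;
        if (query.k != null) set++;
        if (query.maxDistance != null) set++;
        if (query.minScore != null) set++;
        if (set != 1) {
            throw new IllegalArgumentException("requires exactly one of k, distance or score to be set");
        }
    }

    // Step 3: update stats once, after validation has passed.
    static void updateStats(BuiltQuery query) {
        String prefix = query.k != null ? "knn" : query.maxDistance != null ? "max_distance" : "min_score";
        STATS.computeIfAbsent(prefix + "_query_requests", name -> new LongAdder()).increment();
        if (query.hasFilter) {
            STATS.computeIfAbsent(prefix + "_query_with_filter_requests", name -> new LongAdder()).increment();
        }
    }

    public static void main(String[] args) {
        BuiltQuery query = parseAndBuild(null, 10.0f, null, true);
        validate(query);
        updateStats(query);
        System.out.println(STATS);
    }
}
```

Keeping validation and stats as separate steps over the finished object means neither step needs to know how the query was parsed.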

Member Author

@junqiu-lei junqiu-lei May 9, 2024

Just had an offline sync with @shatejas. I've updated this method to use a generic parameter for queryParam.

Collaborator

Please check my comment: #1684 (comment)

Member Author

Updated the PR per your comment, @navneet1v.

@junqiu-lei junqiu-lei requested review from shatejas and navneet1v May 9, 2024 19:36
@junqiu-lei junqiu-lei requested a review from navneet1v May 9, 2024 21:49
@junqiu-lei junqiu-lei requested a review from navneet1v May 10, 2024 02:12
Signed-off-by: Junqiu Lei <[email protected]>
Collaborator

@navneet1v navneet1v left a comment

One minor comment but overall looks good to me. Great job on abstracting the QueryType.
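The merged code is not shown in this conversation, so the following is only a guess at what abstracting the query type could look like: an enum in which each query type carries its own pair of counters, so that updating stats no longer needs an if/else chain. The enum, its constants, and its methods are hypothetical, and `LongAdder` stands in for the plugin's counters. The stat names are the ones listed in this PR and in the stats response.

```java
import java.util.concurrent.atomic.LongAdder;

class QueryTypeSketch {
    // Hypothetical enum: each query type owns a request counter and a with-filter counter.
    enum QueryType {
        K("knn_query_requests", "knn_query_with_filter_requests"),
        MAX_DISTANCE("max_distance_query_requests", "max_distance_query_with_filter_requests"),
        MIN_SCORE("min_score_query_requests", "min_score_query_with_filter_requests");

        final String requestsStat;
        final String withFilterStat;
        private final LongAdder requests = new LongAdder();
        private final LongAdder withFilterRequests = new LongAdder();

        QueryType(String requestsStat, String withFilterStat) {
            this.requestsStat = requestsStat;
            this.withFilterStat = withFilterStat;
        }

        // Counts one query of this type, and one filtered query if a filter was supplied.
        void recordQuery(boolean hasFilter) {
            requests.increment();
            if (hasFilter) {
                withFilterRequests.increment();
            }
        }

        long requests() {
            return requests.sum();
        }

        long withFilterRequests() {
            return withFilterRequests.sum();
        }
    }

    public static void main(String[] args) {
        QueryType.MAX_DISTANCE.recordQuery(true);
        QueryType.MIN_SCORE.recordQuery(false);
        for (QueryType type : QueryType.values()) {
            System.out.println(type.requestsStat + "=" + type.requests()
                + ", " + type.withFilterStat + "=" + type.withFilterRequests());
        }
    }
}
```

With this shape, adding a new query type means adding one enum constant rather than another branch.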

@junqiu-lei junqiu-lei merged commit 9a52b2b into opensearch-project:main May 14, 2024
48 of 50 checks passed
@junqiu-lei junqiu-lei deleted the radial-stat branch May 14, 2024 00:09
@opensearch-trigger-bot
Contributor

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-2.x 2.x
# Navigate to the new working tree
cd .worktrees/backport-2.x
# Create a new branch
git switch --create backport/backport-1684-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 9a52b2bcd4d7e0a05368d8d689b50971f44c6489
# Push it to GitHub
git push --set-upstream origin backport/backport-1684-to-2.x
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-1684-to-2.x.

junqiu-lei added a commit to junqiu-lei/k-NN that referenced this pull request May 14, 2024
Signed-off-by: Junqiu Lei <[email protected]>

(cherry picked from commit 9a52b2b)
Signed-off-by: Junqiu Lei <[email protected]>
junqiu-lei added a commit that referenced this pull request May 14, 2024
(cherry picked from commit 9a52b2b)

Signed-off-by: Junqiu Lei <[email protected]>
luyuncheng pushed a commit to luyuncheng/k-NN-1 that referenced this pull request May 22, 2024
luyuncheng pushed a commit to luyuncheng/k-NN-1 that referenced this pull request May 22, 2024