Dense Vector Feature as a param to a Mustache script score template #465

Open
razilevin opened this issue May 16, 2023 · 2 comments

@razilevin

I am trying to use embeddings to compute cosine similarity. The problem is that there seems to be no way to pass the embedding as a param when invoking the following feature during logging.

{
    "name": "vector_simularity",
    "params": [
        "embedding"
    ],
    "template_language": "mustache",
    "template": {
        "function_score": {
            "script_score": {
                "script": {
                    "source": "1 + cosineSimilarity(params.query_vector, doc['base_name_vector'])",
                    "params": {
                        "query_vector": "{{#toJson}}embedding{{/toJson}}"
                    }
                }
            }
        }
    }
}

I got the idea to use the toJson mustache function from another post, #338, which seems to match what I am trying to do.

I get the following error when running the query:

{
  "error": {
    "root_cause": [
      {
        "type": "script_exception",
        "reason": "runtime error",
        "script_stack": [
          "1 + cosineSimilarity(params['query_vector'], doc['base_name_vector'])",
          "                           ^---- HERE"
        ],
        "script": "1 + cosineSimilarity(params['query_vector'], doc['base_name_vector'])",
        "lang": "painless",
        "position": {
          "offset": 27,
          "start": 0,
          "end": 69
        }
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "semantic_search",
        "node": "57-FXL1dQwOjKxaOn62-Dw",
        "reason": {
          "type": "script_exception",
          "reason": "runtime error",
          "script_stack": [
            "1 + cosineSimilarity(params['query_vector'], doc['base_name_vector'])",
            "                           ^---- HERE"
          ],
          "script": "1 + cosineSimilarity(params['query_vector'], doc['base_name_vector'])",
          "lang": "painless",
          "position": {
            "offset": 27,
            "start": 0,
            "end": 69
          },
          "caused_by": {
            "type": "class_cast_exception",
            "reason": "class java.lang.String cannot be cast to class java.util.List (java.lang.String and java.util.List are in module java.base of loader 'bootstrap')"
          }
        }
      }
    ]
  },
  "status": 400
}
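
The class_cast_exception above is consistent with the rendered parameter arriving as a string instead of a list. A minimal Python sketch of that effect follows. It does not call Elasticsearch or the plugin; `json.dumps` is only a stand-in for what `{{#toJson}}` emits (an assumption), and the three-element `embedding` is a made-up value.

```python
import json

# Hypothetical 3-dimensional embedding, used only for illustration.
embedding = [0.1, 0.2, 0.3]

# Stand-in for the text produced by {{#toJson}}embedding{{/toJson}} (assumption).
rendered = json.dumps(embedding)

# Tag wrapped in double quotes, as in the feature definition above:
# the array text ends up inside a JSON string.
quoted = json.loads('{"query_vector": "' + rendered + '"}')

# Tag substituted as raw JSON, with no surrounding quotes:
# the value parses to a real list.
unquoted = json.loads('{"query_vector": ' + rendered + '}')

print(type(quoted["query_vector"]).__name__)    # str
print(type(unquoted["query_vector"]).__name__)  # list
```

In the quoted case the script would receive a String where `cosineSimilarity` expects a list, which matches the "String cannot be cast to List" message.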

Please note that a query like the following works as expected:

{
  "size": 36,
  "query": {
    "script_score": {
      "query": {
        "match_all": {}
      },
      "script": {
        "source": "cosineSimilarity(params.queryVector, doc['base_name_vector']) + 1.0",
        "params": {
          "queryVector": query_embedding
        }
      }
    }
  }
}
@razilevin

I hacked it like this to make it work. Here is the definition of the feature. Any feedback?

{
    "name": "vector_simularity",
    "params": [
        "embedding"
    ],
    "template_language": "mustache",
    "template": {
        "function_score": {
            "script_score": {
                "script": {
                    "source": """
                    // Parse a string such as [0.1,0.2] into a list of floats.
                    List parseArrayOfFloats(def aryOfFloats) {
                        // Strip the surrounding square brackets.
                        def x = aryOfFloats.substring(1, aryOfFloats.length() - 1);
                        def z = new StringTokenizer(x, ",");
                        def y = new ArrayList();

                        while (z.hasMoreTokens()) {
                            y.add(Float.parseFloat((String)z.nextToken()));
                        }

                        return y;
                    }

                    return cosineSimilarity(parseArrayOfFloats(params.query_vector), 'base_name_vector') + 1.0;
                    """,
                    "params": {
                        "query_vector": "{{#toJson}}embedding{{/toJson}}"
                    }
                }
            }
        }
    }
}

@jhinch-at-atlassian-com

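I believe the problem is that the original feature is not structured correctly. The template can be either a nested query object or a string. In the object form, the toJson tag sits inside a quoted JSON string, so the rendered array reaches the script as a String, which matches the class_cast_exception above. For toJson to work correctly, the template needs to be a string: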

{
    "name": "vector_simularity",
    "params": [
        "embedding"
    ],
    "template_language": "mustache",
    "template": "{\"function_score\": {\"script_score\": {\"script\": {\"source\": \"1 + cosineSimilarity(params.query_vector, doc['base_name_vector'])\", \"params\": {\"query_vector\": {{#toJson}}embedding{{/toJson}}}}}}}"
}

Note that all the double quotes (") inside the template are escaped and that {{#toJson}}embedding{{/toJson}} is not enclosed in quotes.
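
Escaping the template string by hand is easy to get wrong, especially the closing braces. One possible way around it, sketched in Python under the assumption that the feature definition is assembled client-side before it is uploaded: build the query as a dict with a placeholder, serialize it with `json.dumps`, then swap the quoted placeholder for the unquoted mustache tag. `PLACEHOLDER` and `build_feature` are hypothetical names, not part of the plugin or of any client library.

```python
import json

PLACEHOLDER = "__EMBEDDING__"  # hypothetical token, not part of any API
TO_JSON_TAG = "{{#toJson}}embedding{{/toJson}}"


def build_feature(name, vector_field):
    """Return a feature definition whose template is a JSON string."""
    query = {
        "function_score": {
            "script_score": {
                "script": {
                    "source": f"1 + cosineSimilarity(params.query_vector, doc['{vector_field}'])",
                    "params": {"query_vector": PLACEHOLDER},
                }
            }
        }
    }
    # json.dumps writes the placeholder as a quoted string; replace it,
    # quotes included, so the mustache tag ends up unquoted.
    template = json.dumps(query).replace(json.dumps(PLACEHOLDER), TO_JSON_TAG)
    return {
        "name": name,
        "params": ["embedding"],
        "template_language": "mustache",
        "template": template,
    }


feature = build_feature("vector_simularity", "base_name_vector")
# Serializing the whole feature escapes the inner double quotes automatically.
print(json.dumps(feature, indent=4))

# Sanity check: simulate rendering, with json.dumps standing in for toJson (assumption).
rendered = feature["template"].replace(TO_JSON_TAG, json.dumps([0.1, 0.2, 0.3]))
params = json.loads(rendered)["function_score"]["script_score"]["script"]["params"]
print(params["query_vector"])  # [0.1, 0.2, 0.3]
```

The final check only confirms that the template becomes valid JSON with a list-valued `query_vector` once the tag is replaced by an array; it says nothing about how the plugin itself renders the template.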
