We need an evaluation metric that gives us some quantitative sense of how well a model is performing.
From a performance standpoint, what we care about is whether a search query is correctly mapped to the code that goes with that query. So for the test/evaluation set we can count the number of correctly and incorrectly matched examples.
Given an evaluation example (Qi, Ci), where Qi is the query and Ci is the code that matches it, the example is correctly classified if
distance(Qi, Ci) <= distance(Qi, Cj) for all j != i, over some sampled set of code examples Cj
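As a minimal sketch, assuming we already have embedding vectors for the queries and code snippets (the function names and the choice of Euclidean distance here are assumptions; any distance consistent with the lookup step would work), the metric above could be computed like this:

```python
import numpy as np

def distance(a, b):
    # Euclidean distance between two embedding vectors (an assumption;
    # substitute whatever distance the retrieval step actually uses).
    return np.linalg.norm(a - b)

def eval_accuracy(query_embs, code_embs):
    """Fraction of (Qi, Ci) pairs where the matching code is at least
    as close to the query as every non-matching code Cj (j != i)."""
    n = len(query_embs)
    correct = 0
    for i in range(n):
        d_match = distance(query_embs[i], code_embs[i])
        d_others = [distance(query_embs[i], code_embs[j])
                    for j in range(n) if j != i]
        if all(d_match <= d for d in d_others):
            correct += 1
    return correct / n

# Toy sanity check: each query embedding sits exactly on its code embedding,
# so every example should be classified correctly.
rng = np.random.default_rng(0)
codes = rng.normal(size=(5, 8))
queries = codes.copy()
print(eval_accuracy(queries, codes))  # 1.0 for perfectly matched embeddings
```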
So with this metric you're taking the mean distance to all the non-matching examples (that you sample) and asking whether the distance to the matching example is much less? That's a good measure. You can also relate it to the distance you plan to use when looking up queries. Also, maybe a typo: I think you meant distance(Qi, Cj) on the right-hand side.
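If the comparison is instead against the mean distance over sampled negatives, as the comment above reads it, that variant could be sketched as follows (the `mean_margin` name and the toy vectors are made up for illustration):

```python
import numpy as np

def mean_margin(query_emb, match_emb, negative_embs):
    """Mean distance to sampled non-matching code minus distance to the
    matching code; a large positive margin means the match is much closer."""
    d_match = np.linalg.norm(query_emb - match_emb)
    d_neg = np.mean([np.linalg.norm(query_emb - n) for n in negative_embs])
    return d_neg - d_match

q = np.array([0.0, 0.0])
c_match = np.array([0.1, 0.0])
negatives = [np.array([1.0, 0.0]), np.array([0.0, 2.0])]
print(mean_margin(q, c_match, negatives))  # positive: the match is closer
```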
Related to #239 Train a high quality model