[ML] DFA exploration: ROC chart not showing for Classification jobs with non default results field #96603

Closed
alvarezmelissa87 opened this issue Apr 8, 2021 · 3 comments · Fixed by #96890
Labels: bug, Feature:Data Frame Analytics, :ml, v7.13.0

Comments

@alvarezmelissa87
Contributor

Found in the latest Kibana.

Describe the bug:
When viewing the exploration page of a DFA classification job whose results_field is set to something other than the default "ml", the ROC chart fails to load and shows an error callout.

Steps to reproduce:

  1. Use the mushroom dataset to create a Classification job via the DFA wizard
  2. Set dependent variable to edibility
  3. Set results_field to something other than the default value (in my example I set it to "bob")
  4. Open the results view and see that the ROC chart doesn't load
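For reference, the wizard steps above produce a job whose destination config carries the custom results field. The sketch below models only the relevant parts of that config. The interface, the resultsFieldOf helper and the source index name "mushrooms" are hypothetical placeholders. The destination index matches the one in the request below, and "ml" is the default results field name.

```typescript
// Hypothetical, minimal shape of a DFA classification job config.
// Only the fields relevant to this bug are modelled.
interface DfaJobConfig {
  source: { index: string };
  dest: { index: string; results_field?: string };
  analysis: { classification: { dependent_variable: string } };
}

// Default results field used when dest.results_field is not set.
const DEFAULT_RESULTS_FIELD = "ml";

// Hypothetical helper: resolve the effective results field of a job.
function resultsFieldOf(config: DfaJobConfig): string {
  return config.dest.results_field ?? DEFAULT_RESULTS_FIELD;
}

// Roughly what the reproduction steps create ("mushrooms" is a placeholder).
const reproConfig: DfaJobConfig = {
  source: { index: "mushrooms" },
  dest: { index: "mushroom-class-01", results_field: "bob" },
  analysis: { classification: { dependent_variable: "edibility" } },
};

console.log(resultsFieldOf(reproConfig)); // "bob"
```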

Expected behavior:
The ROC chart should load correctly.

Errors in browser console (if relevant):
Request sent to the _evaluate endpoint:

{
   "index":"mushroom-class-01",
   "query":{
      "bool":{
         "must":[]
      }
   },
   "evaluation":{
      "classification":{
         "actual_field":"edibility",
         "predicted_field":"bob.edibility_prediction",
         "metrics":{
            "accuracy":{},
            "recall":{},
            "auc_roc":{
               "include_curve":true,
               "class_name":"e"
            }
         }
      }
   }
}

Error message returned from the _evaluate endpoint:

{
   "statusCode":400,
   "error":"Bad Request",
   "message":"[status_exception]: No documents found containing all the required fields [edibility, bob.edibility_prediction, ml.top_classes.class_name, ml.top_classes.class_probability]",
   "attributes":{
      "body":{
         "error":{
            "root_cause":[
               {
                  "type":"status_exception",
                  "reason":"No documents found containing all the required fields [edibility, bob.edibility_prediction, ml.top_classes.class_name, ml.top_classes.class_probability]"
               }
            ],
            "type":"status_exception",
            "reason":"No documents found containing all the required fields [edibility, bob.edibility_prediction, ml.top_classes.class_name, ml.top_classes.class_probability]"
         },
         "status":400
      }
   }
}
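The required-fields list in this error mixes "bob.*" paths with "ml.top_classes.*" paths, so no document can contain all of them. The sketch below models that behaviour. It is inferred from the error message and is not Elasticsearch's implementation. The helper name requiredFields and the assumption that an omitted top_classes_field falls back to "ml.top_classes" are both hypothetical.

```typescript
// Simplified model of the classification evaluation body,
// limited to the fields that matter for this bug.
interface ClassificationEvaluation {
  actual_field: string;
  predicted_field: string;
  top_classes_field?: string;
}

// Assumed fallback, inferred from the error message above.
const DEFAULT_TOP_CLASSES_FIELD = "ml.top_classes";

// Hypothetical helper: fields a document must contain for the
// metrics in this request (which include auc_roc).
function requiredFields(evaluation: ClassificationEvaluation): string[] {
  const topClasses = evaluation.top_classes_field ?? DEFAULT_TOP_CLASSES_FIELD;
  return [
    evaluation.actual_field,
    evaluation.predicted_field,
    `${topClasses}.class_name`,
    `${topClasses}.class_probability`,
  ];
}

// The evaluation block that was actually sent (no top_classes_field).
const sent: ClassificationEvaluation = {
  actual_field: "edibility",
  predicted_field: "bob.edibility_prediction",
};

// Reproduces the field list from the 400 error.
console.log(requiredFields(sent).join(", "));
```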
@alvarezmelissa87 added the bug label Apr 8, 2021
@botelastic bot added the needs-team label Apr 8, 2021
@elasticmachine
Contributor

Pinging @elastic/ml-ui (:ml)

@alvarezmelissa87
Contributor Author

@dimitris-athanasiou - I took a look at the _evaluate docs and don't see any examples for when results_field is not the default 'ml' value.

It also looks like the request is the same for Classification jobs whether results_field is set to the default or to a non-default value. Do you have any insight into what we might be missing from the request we send to _evaluate?

@dimitris-athanasiou
Contributor

dimitris-athanasiou commented Apr 9, 2021

For classification, when the results field is not the default, you also need to provide the path to the top_classes field in top_classes_field. Because the _evaluate API is decoupled from the job config (so that it can be used with indices not created by a DFA job), it does not know the job's results field and looks under ml.top_classes instead. This is why the error above asks for ml.top_classes.class_name and ml.top_classes.class_probability. The request should be:

{
   "index":"mushroom-class-01",
   "query":{
      "bool":{
         "must":[]
      }
   },
   "evaluation":{
      "classification":{
         "actual_field":"edibility",
         "predicted_field":"bob.edibility_prediction",
         "top_classes_field": "bob.top_classes",
         "metrics":{
            "accuracy":{},
            "recall":{},
            "auc_roc":{
               "include_curve":true,
               "class_name":"e"
            }
         }
      }
   }
}

Note that the UI could always set top_classes_field to {results_field}.top_classes, where results_field is "ml" when the job uses the default, to avoid if-logic.
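The suggestion above can be sketched as a request builder that always derives top_classes_field from the results field, so the default "ml" case needs no special branch. This is a minimal sketch, not Kibana's actual code. The function name and parameters are hypothetical. It also assumes the default prediction field name, <dependent_variable>_prediction, which matches "bob.edibility_prediction" in the requests above.

```typescript
// Hypothetical shape of the _evaluate body built by the UI.
interface EvaluateRequest {
  index: string;
  query: { bool: { must: unknown[] } };
  evaluation: {
    classification: {
      actual_field: string;
      predicted_field: string;
      top_classes_field: string;
      metrics: {
        accuracy: Record<string, never>;
        recall: Record<string, never>;
        auc_roc: { include_curve: boolean; class_name: string };
      };
    };
  };
}

function buildClassificationEvaluateRequest(
  index: string,
  resultsField: string, // "ml" when the job uses the default results field
  dependentVariable: string,
  className: string
): EvaluateRequest {
  return {
    index,
    query: { bool: { must: [] } },
    evaluation: {
      classification: {
        actual_field: dependentVariable,
        // Assumes the default prediction field name.
        predicted_field: `${resultsField}.${dependentVariable}_prediction`,
        // Always set, so "ml" and custom results fields take the same path.
        top_classes_field: `${resultsField}.top_classes`,
        metrics: {
          accuracy: {},
          recall: {},
          auc_roc: { include_curve: true, class_name: className },
        },
      },
    },
  };
}

const body = buildClassificationEvaluateRequest("mushroom-class-01", "bob", "edibility", "e");
console.log(JSON.stringify(body, null, 2));
```

With resultsField set to "bob", this yields the corrected request from the comment above. With "ml", it yields ml.top_classes, which matches the path the API already uses for jobs with the default results field.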
