Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ML] specifying missing_field_value value and using it instead of empty_string #53108

Conversation

benwtrent
Copy link
Member

@benwtrent benwtrent commented Mar 4, 2020

For analytics, we need a consistent way of indicating when a value is missing. Inheriting from anomaly detection, analysis sent "" when a field is missing. This works fine with numbers, but the underlying analytics process actually treats "" as a category in categorical values.

Consequently, you end up with this situation in the resulting model

{
              "frequency_encoding" : {
                "field" : "RainToday",
                "feature_name" : "RainToday_frequency",
                "frequency_map" : {
                  "" : 0.009844409027270245,
                  "No" : 0.6472019970785184,
                  "Yes" : 0.6472019970785184
                }
              }
            }

For inference this is a problem, because inference will treat missing values as null. And thus not include them on the infer call against the model.

This PR takes advantage of our new missing_field_value option added in elastic/ml-cpp#1034 and supplies the default value, \0.

@elasticmachine
Copy link
Collaborator

Pinging @elastic/ml-core (:ml)

Copy link
Contributor

@dimitris-athanasiou dimitris-athanasiou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@benwtrent benwtrent merged commit f66fbb4 into elastic:master Mar 5, 2020
@benwtrent benwtrent deleted the feature/ml-analytics-supply-null-value-option branch March 5, 2020 13:35
benwtrent added a commit to benwtrent/elasticsearch that referenced this pull request Mar 5, 2020
…ty_string (elastic#53108)

For analytics, we need a consistent way of indicating when a value is missing. Inheriting from anomaly detection, analysis sent `""` when a field is missing. This works fine with numbers, but the underlying analytics process actually treats `""` as a category in categorical values. 

Consequently, you end up with this situation in the resulting model
```
{
              "frequency_encoding" : {
                "field" : "RainToday",
                "feature_name" : "RainToday_frequency",
                "frequency_map" : {
                  "" : 0.009844409027270245,
                  "No" : 0.6472019970785184,
                  "Yes" : 0.6472019970785184
                }
              }
            }
```
For inference this is a problem, because inference will treat missing values as `null`. And thus not include them on the infer call against the model.

This PR takes advantage of our new `missing_field_value` option and supplies `\0` as the value.
benwtrent added a commit that referenced this pull request Mar 5, 2020
…ty_string (#53108) (#53165)

For analytics, we need a consistent way of indicating when a value is missing. Inheriting from anomaly detection, analysis sent `""` when a field is missing. This works fine with numbers, but the underlying analytics process actually treats `""` as a category in categorical values. 

Consequently, you end up with this situation in the resulting model
```
{
              "frequency_encoding" : {
                "field" : "RainToday",
                "feature_name" : "RainToday_frequency",
                "frequency_map" : {
                  "" : 0.009844409027270245,
                  "No" : 0.6472019970785184,
                  "Yes" : 0.6472019970785184
                }
              }
            }
```
For inference this is a problem, because inference will treat missing values as `null`. And thus not include them on the infer call against the model.

This PR takes advantage of our new `missing_field_value` option and supplies `\0` as the value.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants