ESQL: ClassCastException in stats group by field that is missing in some indices #100186

Closed
craigtaverner opened this issue Oct 3, 2023 · 4 comments · Fixed by #100208
Labels
:Analytics/ES|QL AKA ESQL >bug Team:QL (Deprecated) Meta label for query languages team

Comments

craigtaverner commented Oct 3, 2023

Description

The following query:

FROM logs-* | STATS count=count(user_agent.name) BY user_agent.name

yields this error:

class org.elasticsearch.compute.data.ConstantNullBlock cannot be cast to 
    class org.elasticsearch.compute.data.BytesRefBlock
(org.elasticsearch.compute.data.ConstantNullBlock and org.elasticsearch.compute.data.BytesRefBlock are in unnamed module of loader java.net.FactoryURLClassLoader @15a8cebd)

The server logs claim the exception is thrown on line 56 of BytesRefBlockHash.java.

Data

The data used for this is the elastic/logs track in benchmarks, where many benchmark queries look at all indices, using GET /logs-*/_search, so I expect ESQL queries on logs-* should also work.

Further investigation of the data:

  • Omitting the group by (FROM logs-* | STATS count=count(user_agent.name)) works and returns a count of 55494, which matches the number of non-null user agent names (a bit over 10%, see below).
  • Omitting the aggregating function (FROM logs-* | STATS by user_agent.name) returns a different error, "ValuesSources are mismatched", hinting that the issue might relate to different mappings in the different indices.
  • Searching for values of user_agent.name that are not null (from logs-* | keep user_agent.name | WHERE user_agent.name IS NOT NULL | LIMIT 20) yields many values like Chrome, Go-http-client, etc.
  • Doing the stats with a non-null predicate also works:
FROM logs-* 
| WHERE user_agent.name IS NOT NULL 
| STATS count=count(user_agent.name) BY user_agent.name 
| SORT count DESC

returns:

count,user_agent.name
33896,Other
10697,Go-http-client
6547,Chrome
1245,Firefox
993,Apache-HttpClient
658,Java
547,Safari
440,Opera
122,LINE
105,aws-sdk-nodejs
90,curl
76,IE
33,Edge
17,Zune
9,Chrome Frame
5,WebKit Nightly
5,BountyBot
3,Mobile Safari
3,Slackbot
1,Android
1,Nimbostratus-Bot
1,Chrome Mobile iOS

Counting how many are NULL vs NOT NULL:

  • FROM logs-* | WHERE user_agent.name IS NOT NULL | STATS count(@timestamp) -> 55494
  • FROM logs-* | WHERE user_agent.name IS NULL | STATS count(@timestamp) -> 406506
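As a quick sanity check on these numbers, the per-group counts returned by the filtered query can be summed and compared with the two totals. This is only arithmetic over figures quoted in this issue; the variable names are illustrative.

```python
# Per-group counts copied from the filtered
# STATS count=count(user_agent.name) BY user_agent.name result above.
grouped_counts = {
    "Other": 33896, "Go-http-client": 10697, "Chrome": 6547,
    "Firefox": 1245, "Apache-HttpClient": 993, "Java": 658,
    "Safari": 547, "Opera": 440, "LINE": 122, "aws-sdk-nodejs": 105,
    "curl": 90, "IE": 76, "Edge": 33, "Zune": 17, "Chrome Frame": 9,
    "WebKit Nightly": 5, "BountyBot": 5, "Mobile Safari": 3,
    "Slackbot": 3, "Android": 1, "Nimbostratus-Bot": 1,
    "Chrome Mobile iOS": 1,
}

# Sum of all groups: documents that have a user_agent.name value.
non_null_total = sum(grouped_counts.values())

# The two count(@timestamp) totals reported in the list above.
all_docs = 55494 + 406506

share = non_null_total / all_docs
print(non_null_total)   # 55494
print(f"{share:.1%}")   # 12.0%
```

The grouped counts add up to 55494, the same value that count(user_agent.name) returns without the group by, which is about 12% of the 462000 documents.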

While of course, trying the same with a group by fails:

  • FROM logs-* | STATS count(@timestamp) BY user_agent.name -> ValuesSources are mismatched

Index mappings

Finally, looking at the mappings files, there are 13 index mappings defined in elastic/logs:

  • 8 files have no user_agent mapping at all
  • 2 files have empty user_agent mappings (I’ve never seen this before!)
  • 3 files have user_agent with sub-field name as keyword

The correctly mapped files contain the name like this:

"name": {
  "ignore_above": 1024,
  "type": "keyword"
}

The empty user_agent mapping looks like this:

"user_agent": {
  "properties": {}
}
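The three mapping shapes listed above can be told apart mechanically. The following is a minimal sketch, assuming each mapping file has been loaded into a dict with a top-level "properties" key; the function name and the returned labels are hypothetical and not part of any Elasticsearch API.

```python
def classify_user_agent_mapping(mapping: dict) -> str:
    """Classify how an index mapping treats user_agent.name.

    Returns "missing" when there is no user_agent mapping at all,
    "empty" when user_agent is mapped but has no name sub-field,
    and "mapped:<type>" when user_agent.name has a concrete type.
    """
    properties = mapping.get("properties", {})
    if "user_agent" not in properties:
        return "missing"
    sub_fields = properties["user_agent"].get("properties", {})
    name = sub_fields.get("name")
    if name is None:
        return "empty"
    # Assumption: a sub-field without an explicit type is an object.
    return "mapped:" + name.get("type", "object")


# One sample per shape described in this issue.
no_user_agent = {"properties": {"message": {"type": "text"}}}
empty_user_agent = {"properties": {"user_agent": {"properties": {}}}}
keyword_user_agent = {
    "properties": {
        "user_agent": {
            "properties": {
                "name": {"ignore_above": 1024, "type": "keyword"}
            }
        }
    }
}

for sample in (no_user_agent, empty_user_agent, keyword_user_agent):
    print(classify_user_agent_mapping(sample))
```

Running it over the 13 mapping files would be one way to confirm the 8 / 2 / 3 split described above.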
craigtaverner changed the title from "ESQL: ClassCastException in stats group by field that is null in some indices" to "ESQL: ClassCastException in stats group by field that is missing in some indices" Oct 3, 2023
dnhatn commented Oct 3, 2023

@craigtaverner Could you please rerun the query with error_trace=true to get a full stack trace?
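For reference, a rerun with error_trace=true could be prepared roughly as follows. This is a sketch under assumptions: the cluster address (http://localhost:9200), the use of the /_query endpoint for ES|QL, and the helper name are not taken from this issue. The request is only built here, not sent.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_esql_request(query: str, base_url: str = "http://localhost:9200") -> Request:
    """Build (but do not send) an ES|QL request that asks for a stack trace.

    base_url and the /_query path are assumptions for illustration.
    """
    params = urlencode({"error_trace": "true"})
    body = json.dumps({"query": query}).encode("utf-8")
    return Request(
        f"{base_url}/_query?{params}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_esql_request(
    "FROM logs-* | STATS count=count(user_agent.name) BY user_agent.name"
)
print(request.get_method(), request.full_url)
```

Passing the resulting object to urllib.request.urlopen would send it; with error_trace=true the error response is expected to include the server-side stack trace.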

dnhatn commented Oct 3, 2023

I've opened #100208

dnhatn added a commit that referenced this issue Oct 3, 2023
We should remove the ConstantNullBlock implementation, but it will take 
some time to do so. This PR ensures that BlockHash handles cases where
all keys are null.

Closes #100186
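The idea in the commit message, a grouping hash that tolerates a block whose keys are all null, can be illustrated with a toy model. This is not the code from #100208 and not Elasticsearch's BlockHash: the class, its method, and the choice to give null keys no group ordinal are all assumptions made for illustration.

```python
from typing import List, Optional, Sequence


class ToyBlockHash:
    """Toy stand-in for a group-by key hash (hypothetical, not Elasticsearch code)."""

    def __init__(self) -> None:
        self._ordinals = {}  # key -> group ordinal, in first-seen order

    def add(self, keys: Optional[Sequence[Optional[str]]], length: int) -> List[Optional[int]]:
        """Return one group ordinal per row, or None for rows with a null key.

        keys=None models a block in which every key is null, for example
        when an index has no mapping for the grouping field.
        """
        if keys is None:
            # Handle the all-null block explicitly instead of treating it
            # as a block of real key values.
            return [None] * length
        ordinals = []
        for key in keys:
            if key is None:
                ordinals.append(None)
            else:
                ordinals.append(self._ordinals.setdefault(key, len(self._ordinals)))
        return ordinals


block_hash = ToyBlockHash()
print(block_hash.add(["Chrome", "Firefox", "Chrome"], 3))  # [0, 1, 0]
print(block_hash.add(None, 2))                             # [None, None]
print(block_hash.add(["Firefox", None], 2))                # [1, None]
```

The point of the sketch is the explicit branch for the all-null block: without it, code that assumes every incoming block holds real key values fails on indices where the field is unmapped.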