Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GrokTests unit tests fail with "Unable to find pattern [] in Grok's pattern dictionary" #43673

Closed
alpar-t opened this issue Jun 27, 2019 · 16 comments · Fixed by #48346
Closed
Assignees
Labels
:Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP Team:Data Management Meta label for data/management team >test-failure Triaged test failures from CI v8.0.0-alpha1

Comments

@alpar-t
Copy link
Contributor

alpar-t commented Jun 27, 2019

Example build failure

https://scans.gradle.com/s/iepue6nvnpjco/tests/vmr3zh2fll5n6-jtrpes23gv6dm

Reproduction line

does not reproduce locally

./gradlew :libs:elasticsearch-grok:test --tests "org.elasticsearch.grok.GrokTests.testDatePattern" -Dtests.seed=7783EF4A23023948 -Dtests.security.manager=true -Dtests.locale=fur -Dtests.timezone=Europe/Kiev -Dcompiler.java=12 -Druntime.java=12
HomeTestsGrokTests » testDatePatternClose test output (esc)

Example relevant log:

java.lang.IllegalArgumentException: Unable to find pattern [] in Grok's pattern dictionary

Frequency

reproduces ~ 6 times in the past 30 days

/cc @talevy

@alpar-t alpar-t added :Search/Search Search-related issues that do not fall into other categories >test-failure Triaged test failures from CI v8.0.0 labels Jun 27, 2019
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search

@jimczi jimczi added :Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP and removed :Search/Search Search-related issues that do not fall into other categories labels Jul 1, 2019
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-core-features

@jimczi
Copy link
Contributor

jimczi commented Jul 1, 2019

It is related to the ingest grok processor, @talevy can you take a look and reassign if needed ?

@talevy
Copy link
Contributor

talevy commented Jul 1, 2019

I investigated this issue in the past and wasn't sure how to resolve it. I'll take a look.

@cbuescher cbuescher assigned cbuescher and talevy and unassigned cbuescher Jul 8, 2019
@andrershov andrershov changed the title org.elasticsearch.grok.GrokTests.testDatePattern unit test fails GrokTests unit tests fail with "Unable to find pattern [] in Grok's pattern dictionary" Aug 8, 2019
@andrershov
Copy link
Contributor

@jaymode
Copy link
Member

jaymode commented Sep 20, 2019

I hit this locally on master but wasn't able to reproduce.

java.lang.IllegalArgumentException: Unable to find pattern [] in Grok's pattern dictionary
at __randomizedtesting.SeedInfo.seed([1186E7D07D363562:26D61FAD77849B19]:0)
at org.elasticsearch.grok.Grok.toRegex(Grok.java:193)
at org.elasticsearch.grok.Grok.<init>(Grok.java:104)
at org.elasticsearch.grok.Grok.<init>(Grok.java:82)
at org.elasticsearch.grok.GrokTests.testApacheLog(GrokTests.java:315)
at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:567)
at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1750)
at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:938)
at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:974)
at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:988)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:817)
at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:468)
at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:947)
at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:832)
at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:883)
at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:894)
at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41)
at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:54)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
at java.lang.Thread.run(Thread.java:835)

REPRODUCE WITH: ./gradlew ':libs:elasticsearch-grok:test' --tests "org.elasticsearch.grok.GrokTests.testApacheLog" -Dtests.seed=1186E7D07D363562 -Dtests.security.manager=true -Dtests.locale=ml -Dtests.timezone=Europe/Berlin -Dcompiler.java=12 -Druntime.java=12

@talevy
Copy link
Contributor

talevy commented Sep 20, 2019

ugh sorry folks. I'll get back to this. I still have no clue how this is happening

@martijnvg
Copy link
Member

A while ago jcodings was upgraded (#43334), but joni wasn't. I've opened a pr to upgrade joni (#47374). This may not be causing the grok failures, but it isn't ideal either. The version jcodings (1.0.44) was upgraded to wasn't the version joni (2.1.6) was build against. The 2.1.29 joni version is build against jcodings `1.0.44.

@talevy
Copy link
Contributor

talevy commented Oct 7, 2019

thanks Jack, I was hoping some code I changed would expose the error here, but it doesn't look like it

@dnhatn
Copy link
Member

dnhatn commented Oct 12, 2019

martijnvg added a commit to martijnvg/elasticsearch that referenced this issue Oct 21, 2019
Add more information to the error message to figure out what is going on.

Relates to elastic#43673
@martijnvg
Copy link
Member

Looks like this failure is caused by the fact that the thread that executes this test suite is interrupted. The Matcher#searchInterruptible(...) returns -2 in toRegex(...) method and we happily execute the if statement after that, which causes the exception. I think we should only execute that if statement if result variable is 0 or higher.

I found this, because I opened #48284, that adds more info to an exception message.

I think that this failure happens, because the test framework is reusing threads and perhaps the interrupted status isn't reset, which causes this test to fail (joni checks whether a thread is interrupted and then stops searching for matches). I think we need to check the current thread's interrupted status before executing these tests and unset the interrupted status.

martijnvg added a commit to martijnvg/elasticsearch that referenced this issue Oct 22, 2019
There is a watchdog in order to avoid long running (and expensive)
grok expressions. Currently the watchdog is thread based, threads
that run grok expressions are registered and after completion unregister.
If these threads stay registered for too long then the watch dog interrupts
these threads. Joni (the library that powers grok expressions) has a
mechanism that checks whether the current thread is interrupted and
if so abort the pattern matching.

Newer versions have an additional method to abort long running pattern
matching inside joni. Instead of checking the thread's interrupted flag,
joni now also checks a volatile field that can be set via a `Matcher`
instance. This is more efficient method for aborting long running matches.
(joni checks each 30k iterations whether interrupted flag is set vs.
just checking a volatile field)

Recently we upgraded to a recent joni version (elastic#47374), and this PR
is a followup of that PR.

This change should also fix elastic#43673, since it appears when unit tests
are ran the a test runner thread's interrupted flag may already have
been set, due to some thread reuse.
martijnvg added a commit that referenced this issue Oct 24, 2019
…48346)

There is a watchdog in order to avoid long running (and expensive)
grok expressions. Currently the watchdog is thread based, threads
that run grok expressions are registered and after completion unregister.
If these threads stay registered for too long then the watch dog interrupts
these threads. Joni (the library that powers grok expressions) has a
mechanism that checks whether the current thread is interrupted and
if so abort the pattern matching.

Newer versions have an additional method to abort long running pattern
matching inside joni. Instead of checking the thread's interrupted flag,
joni now also checks a volatile field that can be set via a `Matcher`
instance. This is more efficient method for aborting long running matches.
(joni checks each 30k iterations whether interrupted flag is set vs.
just checking a volatile field)

Recently we upgraded to a recent joni version (#47374), and this PR
is a followup of that PR.

This change should also fix #43673, since it appears when unit tests
are ran the a test runner thread's interrupted flag may already have
been set, due to some thread reuse.
martijnvg added a commit that referenced this issue Oct 24, 2019
…48346)

There is a watchdog in order to avoid long running (and expensive)
grok expressions. Currently the watchdog is thread based, threads
that run grok expressions are registered and after completion unregister.
If these threads stay registered for too long then the watch dog interrupts
these threads. Joni (the library that powers grok expressions) has a
mechanism that checks whether the current thread is interrupted and
if so abort the pattern matching.

Newer versions have an additional method to abort long running pattern
matching inside joni. Instead of checking the thread's interrupted flag,
joni now also checks a volatile field that can be set via a `Matcher`
instance. This is more efficient method for aborting long running matches.
(joni checks each 30k iterations whether interrupted flag is set vs.
just checking a volatile field)

Recently we upgraded to a recent joni version (#47374), and this PR
is a followup of that PR.

This change should also fix #43673, since it appears when unit tests
are ran the a test runner thread's interrupted flag may already have
been set, due to some thread reuse.
@droberts195
Copy link
Contributor

This problem occurred in the 7.5 branch in https://gradle-enterprise.elastic.co/s/rck3jol2wuye6

Is it worth backporting the fix to 7.5?

tlrx added a commit that referenced this issue Feb 6, 2020
@tlrx
Copy link
Member

tlrx commented Feb 6, 2020

Tests are failing regularly on 7.5 branch so I muted the whole GrokTests class that branch in a508b10.

Here is the build scan of today's failure (https://gradle-enterprise.elastic.co/s/y4nfudrjljl3s) but you can find ~10 for the last 30 days in the build stats.

Also, these tests are timing out regularly too in both 7.5 and 7.5 branches (builds scans:https://gradle-enterprise.elastic.co/s/mhly7mq3el6vy, https://gradle-enterprise.elastic.co/s/2xdryigg2jsn6)

@tlrx tlrx reopened this Feb 6, 2020
@rjernst rjernst added the Team:Data Management Meta label for data/management team label May 4, 2020
@probakowski
Copy link
Contributor

I'm going to close it as there are no failures since February and we are done with 7.5 branch

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP Team:Data Management Meta label for data/management team >test-failure Triaged test failures from CI v8.0.0-alpha1
Projects
None yet
Development

Successfully merging a pull request may close this issue.