[Bug] fix case sensitivity for wildcard queries #5462
Gradle Check (Jenkins) Run Completed with:
|
@@ -162,7 +186,7 @@ public Query wildcardQuery(String value, MultiTermQuery.RewriteMethod method, bo
         }

         Term term;
-        if (getTextSearchInfo().getSearchAnalyzer() != null) {
+        if (getTextSearchInfo().getSearchAnalyzer() != null && normalizeIfAnalyzed) {
@nknize I think we still have a conflict here: it makes sense to apply the caseInsensitive hint when the user does not specify the analyzer to be used (in this case getTextSearchInfo().getSearchAnalyzer() is the default one), but I think we should not do that if analyzer and/or analyze_wildcard are provided [1].
[1] https://www.elastic.co/guide/en/elasticsearch/reference/7.10/query-dsl-query-string-query.html
That's why QueryStringQueryParser calls the normalizedWildcardQuery method. It ignores the caseInsensitive parameter for query string queries.
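The condition in the diff above, together with the two entry points mentioned in this thread, can be sketched roughly as follows. This is a minimal illustration, not the OpenSearch source: the class name, the `searchNormalizer` field, and the string descriptors returned in place of Lucene queries are all hypothetical, and the assumption that the plain wildcard path passes `normalizeIfAnalyzed = false` is inferred from the PR description rather than shown in the visible diff.

```java
import java.util.Locale;
import java.util.function.UnaryOperator;

// Hypothetical stand-in for a string field type; none of this is the OpenSearch source.
class WildcardNormalizationSketch {
    // Stand-in for the search analyzer's normalization step; null means "no analyzer".
    private final UnaryOperator<String> searchNormalizer;

    WildcardNormalizationSketch(UnaryOperator<String> searchNormalizer) {
        this.searchNormalizer = searchNormalizer;
    }

    // Mirrors the condition in the diff: normalize only when an analyzer exists
    // AND the caller explicitly asks for normalization.
    String wildcardQuery(String value, boolean caseInsensitive, boolean normalizeIfAnalyzed) {
        String pattern = value;
        if (searchNormalizer != null && normalizeIfAnalyzed) {
            pattern = searchNormalizer.apply(value);
        }
        // A string descriptor stands in for the real query object.
        return (caseInsensitive ? "caseInsensitiveWildcard(" : "wildcard(") + pattern + ")";
    }

    // Assumed entry point for the wildcard term query: the user's flag is honored
    // and the pattern is left exactly as typed.
    String wildcardQuery(String value, boolean caseInsensitive) {
        return wildcardQuery(value, caseInsensitive, false);
    }

    // Entry point described for QueryStringQueryParser:
    // caseInsensitive = false, normalizeIfAnalyzed = true.
    String normalizedWildcardQuery(String value) {
        return wildcardQuery(value, false, true);
    }

    public static void main(String[] args) {
        WildcardNormalizationSketch field =
            new WildcardNormalizationSketch(s -> s.toLowerCase(Locale.ROOT));
        System.out.println(field.wildcardQuery("Foo*", false));    // wildcard(Foo*)
        System.out.println(field.wildcardQuery("Foo*", true));     // caseInsensitiveWildcard(Foo*)
        System.out.println(field.normalizedWildcardQuery("Foo*")); // wildcard(foo*)
    }
}
```

In this sketch only the query-string path ever lowercases the pattern, which is the behavior the PR description says it preserves.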
I think that is what caused my confusion: we already have 2 variants of the wildcardQuery method, and now a new one called normalizedWildcardQuery. Could we drop normalizedWildcardQuery, since it just delegates to wildcardQuery?
We sure can.. it's just syntactic sugar. I just thought it was a more descriptive call in the QueryStringQueryParser than calling wildcardQuery(value, method, false, true, context), but I can throw javadocs there to describe the logic.
ah wait.. nvm.. I did it this way because the delegation for normalizedWildcardQuery is specific to StringFieldType only... It's not the same for keyword and constant fields.
I think this is a non-issue, since you have this new method:
public Query wildcardQuery(
    String value,
    MultiTermQuery.RewriteMethod method,
    boolean caseInsensitive,
    boolean normalizeIfAnalyzed,
    QueryShardContext context
)
Only StringFieldType could implement it; others won't. Another option I was thinking of: maybe we could embed this into TextSearchInfo / MappedFieldType as a property, since, as you mentioned, it is specific to StringFieldType only?
What I mean is that QueryStringQueryParser calls the generic MappedFieldType.wildcardQuery method. I can eliminate the normalized version of this method call by bumping the new method up the class hierarchy, but I'd prefer not to do that since normalizeIfAnalyzed is only relevant to StringFieldType (and I don't like the idea of having to keep track of what the caseInsensitive and normalizeIfAnalyzed boolean logic means if we add another field type that doesn't care).
I think this API is cleaner? But I should add javadocs to describe that normalizedWildcardQuery normalizes to lower case whereas wildcardQuery optionally lowers case.
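The API shape argued for here can be illustrated with a small hierarchy. This is only a sketch under stated assumptions: every class below is hypothetical, the base class exposes just the two generic calls, and what a non-string field type does in normalizedWildcardQuery is invented for illustration, since the thread only says it differs for keyword and constant fields.

```java
import java.util.Locale;

// Hypothetical miniature of the hierarchy under discussion; names only echo the thread.
abstract class FieldTypeSketch {
    // Generic API every field type exposes.
    abstract String wildcardQuery(String value, boolean caseInsensitive);

    // What a query_string-style parser calls; each field type decides what "normalized" means.
    abstract String normalizedWildcardQuery(String value);
}

class StringFieldTypeSketch extends FieldTypeSketch {
    @Override
    String wildcardQuery(String value, boolean caseInsensitive) {
        return wildcardQuery(value, caseInsensitive, false);
    }

    @Override
    String normalizedWildcardQuery(String value) {
        return wildcardQuery(value, false, true);
    }

    // The extra boolean lives only on the string type, so other field types never see it.
    String wildcardQuery(String value, boolean caseInsensitive, boolean normalizeIfAnalyzed) {
        // Lowercasing stands in for running the pattern through the search analyzer.
        String pattern = normalizeIfAnalyzed ? value.toLowerCase(Locale.ROOT) : value;
        return (caseInsensitive ? "caseInsensitiveWildcard(" : "wildcard(") + pattern + ")";
    }
}

class OtherFieldTypeSketch extends FieldTypeSketch {
    @Override
    String wildcardQuery(String value, boolean caseInsensitive) {
        return (caseInsensitive ? "caseInsensitiveWildcard(" : "wildcard(") + value + ")";
    }

    // Invented behavior: this type has nothing to normalize, so it reuses the plain query.
    @Override
    String normalizedWildcardQuery(String value) {
        return wildcardQuery(value, false);
    }
}

class ParserSketch {
    // The parser depends only on the base type and never passes the two booleans itself.
    static String parse(FieldTypeSketch fieldType, String value) {
        return fieldType.normalizedWildcardQuery(value);
    }

    public static void main(String[] args) {
        System.out.println(parse(new StringFieldTypeSketch(), "Foo*")); // wildcard(foo*)
        System.out.println(parse(new OtherFieldTypeSketch(), "Foo*"));  // wildcard(Foo*)
    }
}
```

The trade-off is the one raised in the thread: one more method name on the base type, in exchange for keeping the normalizeIfAnalyzed flag out of field types that do not care about it.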
Javadocs would help for sure. I don't think my suggestion (to have an overloaded wildcardQuery) is any better than the new normalizedWildcardQuery: it removes confusion in one place but adds it in another. What do you think about enriching TextSearchInfo for StringFieldType? (Then we don't need this normalizeIfAnalyzed argument at all.) I believe this could be exactly the right place to do so:
/**
 * Encapsulates information about how to perform text searches over a field
 *
 * @opensearch.internal
 */
public class TextSearchInfo {
    ...
}
"What do you think about enriching TextSearchInfo for StringFieldType?"
The problem is that QueryStringQueryParser needs to change the normalization behavior at search runtime, while TextSearchInfo is defined at index creation time to encapsulate Lucene fieldType parameters (e.g., norms, offsets, positions). This could be hacked around by adding a setter to dynamically change TextSearchInfo based on the query, but that changes the purpose of TextSearchInfo, and I think it adds confusion about when to do this and for what type of queries. It might be worth exploring, but I think in a follow-up enhancement issue, since the surface area impacted is larger than the scope of this bug fix? This may also highlight that TextSearchInfo isn't the best class name.
I see, thanks @nknize
Codecov Report
@@ Coverage Diff @@
## main #5462 +/- ##
============================================
- Coverage 71.10% 70.93% -0.17%
+ Complexity 58238 58154 -84
============================================
Files 4711 4711
Lines 277573 277579 +6
Branches 40180 40182 +2
============================================
- Hits 197358 196912 -446
- Misses 64078 64530 +452
Partials 16137 16137
Fixes the wildcard query to not normalize the pattern when case_insensitive is set by the user. This is achieved by creating a new normalizedWildcardQuery method so that query_string queries (which do not support case sensitivity) can still normalize the pattern when the default analyzer is used; maintaining existing behavior. Signed-off-by: Nicholas Walter Knize <[email protected]>
8cea795 to 7e73510
Fixes the wildcard query to not normalize the pattern when case_insensitive is set by the user. This is achieved by creating a new normalizedWildcardQuery method so that query_string queries (which do not support case sensitivity) can still normalize the pattern when the default analyzer is used; maintaining existing behavior. Signed-off-by: Nicholas Walter Knize <[email protected]> (cherry picked from commit ce25dec) Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Description
Fixes the wildcard query to not normalize the pattern when case_insensitive is set by the user. This is achieved by creating a new normalizedWildcardQuery method so that query_string queries (which do not support case sensitivity) can still normalize the pattern when the default analyzer is used; maintaining existing behavior.
Issues Resolved
closes #5461
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.