Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TermQuery with empty string value is mistakenly treated as ConditionLess and excluded from search request body #281

Closed
david-alpert-nl opened this issue Jul 15, 2023 · 14 comments · Fixed by #283
Labels
bug Something isn't working

Comments

@david-alpert-nl
Copy link

david-alpert-nl commented Jul 15, 2023

What is the bug?

When I try to use the .MustNot(...) fluent expression to add a must_not clause to my search query the must_not part of the expression is missing from the resulting request body.

How can one reproduce the bug?

I tried to add a unit test to this project's source code but could not figure out the testing infrastructure in this project.

Instead, consider this scenario (adapted from an existing reproduce test):

public class GithubIssueMustNotMissingFromRequestBody
{
	public class SampleDomainObject
	{
		[JsonPropertyName("first_name")]
		public string? FirstName { get; set; }

		[JsonPropertyName("last_name")]
		public string? LastName { get; set; }
	}

	[Fact]
	public void MustNotClauseIsIncludedInSearchRequestBody()
	{
		var connectionSettings = new ConnectionSettings(new InMemoryConnection()).DisableDirectStreaming();
		var client = new OpenSearchClient(connectionSettings);

		var action = () =>
			client.Search<SampleDomainObject>(s => s
				.Query(q => q
					.Bool(b => b
						.Must(m => m.Exists(e => e.Field("last_name")))
						.MustNot(m => m.Term(t => t.Field("last_name.keyword").Value(string.Empty)))
					)
				)
				.Index("index")
				.Source(sfd => null)
			);

		var response = action.Should().NotThrow().Subject;

		var json = Encoding.UTF8.GetString(response.ApiCall.RequestBodyInBytes);
		json.Should().Be(@"{""query"":{""bool"":{""must"":[{""exists"":{""field"":""last_name""}}],""must_not"":[{""term"":{""last_name.keyword"":{""value"":""""}}}]}}}");
	}
}

What is the expected behavior?

I expect this unit test to pass, successfully converting the .MustNot(...) call into a must_not clause in the search request.

Instead it fails returning the following result:

Xunit.Sdk.XunitException
Expected json to be "{"query":{"bool":{"must":{"exists":{"field":"last_name"}},"must_not":{"term":{"last_name.keyword":{"value":""}}}}}}" with a length of 105, but "{"query":{"bool":{"must":[{"exists":{"field":"last_name"}}]}}}" has a length of 61, differs near "[{"" (index 25).

What is your host/environment?

I cloned this repo to my MacOS laptop:

  • OS: macOS Monterey v12.6
  • MacBook Pro (16-inch, 2021)
  • Chip: Apple M1 Max

I then locally added a new Tests.Unit C# project with the following nuget packages attempting to reproduce the existing Tests.Reproduce dependencies but without the complicated test discovery infrastructure which seems to confuse my JetBrains Rider IDE:

  • xunit/2.5.0
  • xunit.runner.visualstudio/2.5.0
  • Microsoft.NET.Test.Sdk/17.6.3
  • FluentAssertions/6.10.0

Do you have any screenshots?

image

NOTE: I have updated the code slightly since taking that screenshot to use e => e.Field("last_name") instead of e => e.Field(f => f.LastName)) but the failures still stand.

Do you have any additional context?

I created this test case to reproduce an issue where I can submit the following query to an AWS OpenSearch engine using curl or the OpenSearch dashboard:

GET /my-index/_search
{
     "query": {
        "bool": {
          "must": [{
            "exists": { "field": "last_name"}
          }],
          "must_not": [{
            "term": {"last_name.keyword": { "value": "" }}
          }]
        }
      },
}

and the query returns predictably different results whether I include that must_not clause.

my data set has several documents which have empty last_name values and when the query includes the must_not clause those are filtered out; when the query is missing the must_not clause like below the documents with empty last_name values are included

GET /my-index/_search
{
     "query": {
        "bool": {
          "must": {
            "exists": { "field": "last_name"}
          }
        }
      },
}

now I am trying to express that same query with the must_not clause using the OpenSearch.SearchAsync<>(...) method and the fluent query builder and despite trying several different syntax patterns I am unable to generate a request that includes the must_not clause.

@david-alpert-nl david-alpert-nl added bug Something isn't working untriaged labels Jul 15, 2023
@david-alpert-nl
Copy link
Author

david-alpert-nl commented Jul 15, 2023

here is another test which I believe is narrowing down the problem

	[Fact]
	public void MustNotClauseIsPresentInMemory()
	{
		Func<SearchDescriptor<SampleDomainObject>, ISearchRequest> selector = s => s
			.Query(q => q
				.Bool(b => b
					.Must(m => m.Exists(e => e.Field("last_name")))
					.MustNot(m => m.Term(t => t.Field("last_name.keyword").Value(string.Empty)))
				)
			)
			.Index("index")
			.Source(sfd => null);

		var searchRequest = selector.Invoke(new SearchDescriptor<SampleDomainObject>());
		var query = searchRequest.Query as IQueryContainer;
		
		// this is fine
		query.Bool.Must.Should().NotBeEmpty();
		query.Bool.Must.First().Should().NotBeNull("Must");
		
		// this too...
		query.Bool.MustNot.Should().NotBeEmpty();
		// ... but this is fails as the collection contains one `<null>` entry which explains why it's missing from the serialized request JSON
		query.Bool.MustNot.First().Should().NotBeNull("MustNot");
	}

fails with

Xunit.Sdk.XunitException
Expected query.Bool.MustNot.First() not to be <null> because MustNot.
   at FluentAssertions.Execution.XUnit2TestFramework.Throw(String message)

@david-alpert-nl
Copy link
Author

david-alpert-nl commented Jul 15, 2023

the .MustNot method seems to work fine as this test passes

	[Fact]
	public void MustNotExistsClauseShouldNotBeNull()
	{
		Func<SearchDescriptor<SampleDomainObject>, ISearchRequest> selector = s => s
			.Query(q => q
				.Bool(b => b
					.Must(m => m.Exists(e => e.Field("last_name")))
					.MustNot(m => m.Exists(e => e.Field("last_name")))
				)
			)
			.Index("index")
			.Source(sfd => null);

		var searchRequest = selector.Invoke(new SearchDescriptor<SampleDomainObject>());
		var query = searchRequest.Query as IQueryContainer;

		// this is fine
		query.Bool.Must.Should().NotBeEmpty();
		query.Bool.Must.First().Should().NotBeNull("Must");

		// MustNot ... Exists seems to work
		query.Bool.MustNot.Should().NotBeEmpty();
		query.Bool.MustNot.First().Should().NotBeNull("MustNot");
	}

@david-alpert-nl david-alpert-nl changed the title fluent .MustNot(...) expressions missing from the request body fluent .MustNot(mn => mn.Term(...)) expressions missing from the request body Jul 15, 2023
@david-alpert-nl david-alpert-nl changed the title fluent .MustNot(mn => mn.Term(...)) expressions missing from the request body fluent .MustNot(mn => mn.Term(...).Value(string.Empty)) expressions missing from the request body Jul 15, 2023
@david-alpert-nl
Copy link
Author

david-alpert-nl commented Jul 15, 2023

in fact, a MustNot with a Term phrase and a non-empty Value parameter serializes fine; set the Value parameter to string.Empty and the entire must_not clause goes missing from the request.

so this may be a serialization configuration issue where objects with value of string.Empty are treated as default/empty values and ignored during serialization... 🤔

potentially an issue in the OpenSearch.Client.Extensions where the code uses str.IsNullOrEmpty() ? null : ... instead of str == null ? null : ...?

@david-alpert-nl
Copy link
Author

david-alpert-nl commented Jul 15, 2023

this succeeds

	[Fact]
	public void TermQueryWithNonEmptyValueSerializesToNonNullResult()
	{
		Func<QueryContainerDescriptor<SampleDomainObject>, QueryContainer> termQuery =
			m => m.Term(t => t.Field("last_name.keyword").Value("doe"));

		var result = termQuery.Invoke(new QueryContainerDescriptor<SampleDomainObject>());

		result.Should().NotBeNull();
	}

but this fails

	[Fact]
	public void TermQueryWithEmptyValueSerializesToNonNullResult()
	{
		Func<QueryContainerDescriptor<SampleDomainObject>, QueryContainer> termQuery =
			m => m.Term(t => t.Field("last_name.keyword").Value(string.Empty));

		var result = termQuery.Invoke(new QueryContainerDescriptor<SampleDomainObject>());

		result.Should().NotBeNull();
	}

with

Xunit.Sdk.XunitException
Expected result not to be <null>.
   at FluentAssertions.Execution.XUnit2TestFramework.Throw(String message)

@david-alpert-nl
Copy link
Author

david-alpert-nl commented Jul 15, 2023

think I found it:

internal static bool IsConditionless(ITermQuery q) => q.Value == null || q.Value.ToString().IsNullOrEmpty() || q.Field.IsConditionless();

I cannot unit test that internal static method but it's clear that a TermQuery with a value of String.Empty or "" will evaluate to null which makes it impossible to replicate this valid OpenSearch query

GET /my-index/_search
{
     "query": {
        "bool": {
          "must": [{
            "exists": { "field": "last_name"}
          }],
          "must_not": [{
            "term": {"last_name.keyword": { "value": "" }}
          }]
        }
      },
}

@david-alpert-nl
Copy link
Author

david-alpert-nl commented Jul 15, 2023

confirmed that's my problem

	[Theory]
	[InlineData("null", null, true)]
	[InlineData("non-empty string", "doe", false)]
	[InlineData("empty string", "", false)]
	public void TermQueryIsConditionless(string scenario, string? val, bool expected)
	{
		bool SimulateIsConditionless(ITermQuery q)
		{
			return q.Value == null || string.IsNullOrEmpty(q.Value.ToString());
		}

		var temrQuery = new TermQuery { Value = val };

		var result = SimulateIsConditionless(temrQuery);

		result.Should().Be(expected, $"{scenario} should be conditionless: ${expected}");
	}

image

I would be happy to contribute a fix if someone can

  • confirm that this is in fact a bug in the code; and
  • show me how to run the tests in this project (when I get latest and I run build.sh I have failing tests)

@david-alpert-nl david-alpert-nl changed the title fluent .MustNot(mn => mn.Term(...).Value(string.Empty)) expressions missing from the request body TermQuery with empty string value is mistakenly treated as ConditionLess and excluded from search request body Jul 15, 2023
@david-alpert-nl
Copy link
Author

it seems that the fix is to update

internal static bool IsConditionless(ITermQuery q) => q.Value == null || q.Value.ToString().IsNullOrEmpty() || q.Field.IsConditionless();

to:

 internal static bool IsConditionless(ITermQuery q) => q.Value?.ToString() == null || q.Field.IsConditionless(); 

davidalpert added a commit to davidalpert/opensearch-net that referenced this issue Jul 16, 2023
… missing from the request body

More properly this test, if accepted as a legit failing expectation
should live inside the Tests.Reproduce project but I could not figure
out how to run tests in that project. If accepted this can move or be
recreated wherever it needs to live
@Xtansia
Copy link
Collaborator

Xtansia commented Jul 16, 2023

Hi @david-alpert-nl,

Thanks for putting in such a concerted effort to investigate the issue and submit a fix, it's greatly appreciated!

@Xtansia Xtansia removed the untriaged label Jul 16, 2023
@david-alpert-nl
Copy link
Author

I may have discovered a workaround.

* By default an empty term is conditionless so will be rewritten. Sometimes sending an empty term to
* match nothing makes sense. You can either use the `ConditionlessQuery` construct from OSC to provide a fallback or make the
* query verbatim as followed:

led me to this

/// <summary>
/// Whether the query should be treated as verbatim. A verbatim query will be serialized as part of the request,
/// irrespective
/// of whether it is <see cref="Conditionless" /> or not.
/// </summary>
[IgnoreDataMember]
bool IsVerbatim { get; set; }

and now when I add .Verbatim() to my TermQuery it gets serialized as expected.

I am going to close this issue as resolved, working as designed, and close GH-283 also.

@wbeckler
Copy link

wbeckler commented Jul 17, 2023 via email

davidalpert added a commit to davidalpert/opensearch-net that referenced this issue Jul 17, 2023
davidalpert added a commit to davidalpert/opensearch-net that referenced this issue Jul 17, 2023
Further investigation shows that a TermQuery with an empty
string value is expected to be 'conditionless' in this
codebase and the TermQuery implementation includes a
.Verbatim() method to allow the client to serialize
the query clause even though it evaluates to
conditionless

this allows me to create a query like this

    GET /index/_search
    {
      "query": {
        "bool": {
          "must": [
            {"exists": { "field": "last_name"}}
          ],
          "must_not": [
            {"term": {"last_name.keyword": {"value": ""}}}
          ]
        }
      }
    }

using the following syntax

    client.Search<SampleDomainObject>(s => s
      .Query(q => q
        .Bool(b => b
          .Must(m => m.Exists(e => e.Field("last_name")))
          .MustNot(m => m.Term(t => t.Verbatim().Field("last_name.keyword").Value(string.Empty)))
        )
      )
      .Index("index")
      .Source(sfd => null)
    );

thus resolving that opensearch-projectGH-281 is not a bug and is working as designed.

Signed-off-by: David Alpert <[email protected]>
@david-alpert-nl
Copy link
Author

great idea; I will look for an opportunity to make that more clear in the docs.

@david-alpert-nl
Copy link
Author

@dblock
Copy link
Member

dblock commented Jul 17, 2023

Thanks @david-alpert-nl.

Btw, we've been trying to merge docs closer to the code into https://github.com/opensearch-project/opensearch-net/blob/main/USER_GUIDE.md and will eventually cleanup duplicates. Is this useful here?

@david-alpert-nl
Copy link
Author

perfect; I can update USER_GUIDE.md in my open PR that adds tests documenting this issue.

davidalpert added a commit to davidalpert/opensearch-net that referenced this issue Jul 17, 2023
…y string serializes as <null>

Signed-off-by: David Alpert <[email protected]>
davidalpert added a commit to davidalpert/opensearch-net that referenced this issue Jul 17, 2023
Further investigation shows that a TermQuery with an empty
string value is expected to be 'conditionless' in this
codebase and the TermQuery implementation includes a
.Verbatim() method to allow the client to serialize
the query clause even though it evaluates to
conditionless

this allows me to create a query like this

    GET /index/_search
    {
      "query": {
        "bool": {
          "must": [
            {"exists": { "field": "last_name"}}
          ],
          "must_not": [
            {"term": {"last_name.keyword": {"value": ""}}}
          ]
        }
      }
    }

using the following syntax

    client.Search<SampleDomainObject>(s => s
      .Query(q => q
        .Bool(b => b
          .Must(m => m.Exists(e => e.Field("last_name")))
          .MustNot(m => m.Term(t => t.Verbatim().Field("last_name.keyword").Value(string.Empty)))
        )
      )
      .Index("index")
      .Source(sfd => null)
    );

thus resolving that opensearch-projectGH-281 is not a bug and is working as designed.

Signed-off-by: David Alpert <[email protected]>
davidalpert added a commit to davidalpert/opensearch-net that referenced this issue Jul 20, 2023
…y string serializes as <null>

Signed-off-by: David Alpert <[email protected]>
davidalpert added a commit to davidalpert/opensearch-net that referenced this issue Jul 20, 2023
Further investigation shows that a TermQuery with an empty
string value is expected to be 'conditionless' in this
codebase and the TermQuery implementation includes a
.Verbatim() method to allow the client to serialize
the query clause even though it evaluates to
conditionless

this allows me to create a query like this

    GET /index/_search
    {
      "query": {
        "bool": {
          "must": [
            {"exists": { "field": "last_name"}}
          ],
          "must_not": [
            {"term": {"last_name.keyword": {"value": ""}}}
          ]
        }
      }
    }

using the following syntax

    client.Search<SampleDomainObject>(s => s
      .Query(q => q
        .Bool(b => b
          .Must(m => m.Exists(e => e.Field("last_name")))
          .MustNot(m => m.Term(t => t.Verbatim().Field("last_name.keyword").Value(string.Empty)))
        )
      )
      .Index("index")
      .Source(sfd => null)
    );

thus resolving that opensearch-projectGH-281 is not a bug and is working as designed.

Signed-off-by: David Alpert <[email protected]>
davidalpert added a commit to davidalpert/opensearch-net that referenced this issue Jul 20, 2023
davidalpert added a commit to davidalpert/opensearch-net that referenced this issue Jul 20, 2023
davidalpert added a commit to davidalpert/opensearch-net that referenced this issue Aug 11, 2023
Xtansia pushed a commit that referenced this issue Aug 14, 2023
… ignored as "conditionless" (#283)

* test: reproduce GH-281 TermQuery with Value of empty string serializes as <null>

Signed-off-by: David Alpert <[email protected]>

* test: demonstrate GH-128 is a feature, not a bug

Further investigation shows that a TermQuery with an empty
string value is expected to be 'conditionless' in this
codebase and the TermQuery implementation includes a
.Verbatim() method to allow the client to serialize
the query clause even though it evaluates to
conditionless

this allows me to create a query like this

    GET /index/_search
    {
      "query": {
        "bool": {
          "must": [
            {"exists": { "field": "last_name"}}
          ],
          "must_not": [
            {"term": {"last_name.keyword": {"value": ""}}}
          ]
        }
      }
    }

using the following syntax

    client.Search<SampleDomainObject>(s => s
      .Query(q => q
        .Bool(b => b
          .Must(m => m.Exists(e => e.Field("last_name")))
          .MustNot(m => m.Term(t => t.Verbatim().Field("last_name.keyword").Value(string.Empty)))
        )
      )
      .Index("index")
      .Source(sfd => null)
    );

thus resolving that GH-281 is not a bug and is working as designed.

Signed-off-by: David Alpert <[email protected]>

* refactor: address PR feedback

GH-281

Signed-off-by: David Alpert <[email protected]>

* docs: update USAGE.md with an example of IsVerbatim/Verbatim()

GH-281

Signed-off-by: David Alpert <[email protected]>

---------

Signed-off-by: David Alpert <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
4 participants