
[BUG] DLS performance has regressed with new serialization format #3776

Closed
peternied opened this issue Nov 30, 2023 · 14 comments · Fixed by #4264
Labels: bug, triaged

@peternied (Member) commented Nov 30, 2023

What is the bug?
Some users of OpenSearch have seen a performance decrease associated with DLS queries. Larger DLS queries (more characters)
have a larger impact.

How can one reproduce the bug?

  1. Check out my fork with the repro: git clone https://github.com/peternied/security.git
  2. Check out the branch with the new tests: git checkout dls-perf
  3. Execute the new tests: ./gradlew integrationTest --tests org.opensearch.security.DlsTests.testDlsLargerQueryScenarios -x jacocoTestReport
  4. Open the test report: ./build/reports/tests/integrationTest/classes/org.opensearch.security.DlsTests.html
  5. View the standard output
Creating 5 indices with 1 document
User, Count, Avg, Max, Min, Std ms
reader, 100, 11.07, 17, 8, 1.67
Attached READER with role DLS_ONLY_LONG_VALUE
User, Count, Avg, Max, Min, Std ms
reader, 100, 8.45, 17, 6, 1.64
Finished checks in 5282ms
Creating 50 indices with 1 document
User, Count, Avg, Max, Min, Std ms
reader, 100, 13.68, 21, 10, 2.27
Attached READER with role DLS_ONLY_LONG_VALUE
User, Count, Avg, Max, Min, Std ms
reader, 100, 252.04, 538, 228, 31.20
Finished checks in 30905ms
Creating 100 indices with 1 document
User, Count, Avg, Max, Min, Std ms
reader, 100, 12.56, 19, 10, 1.78
Attached READER with role DLS_ONLY_LONG_VALUE
User, Count, Avg, Max, Min, Std ms
reader, 100, 984.57, 1027, 916, 21.51
Finished checks in 107909ms

What is the expected behavior?
After the DLS_ONLY_LONG_VALUE role is attached, the average latency should not jump up so much.

Do you have any additional context?
You can collect comparison numbers from the 2.3 build by running git checkout dls-perf on that version's branch.

This issue was introduced in the following PR, which improved serialization in many scenarios but appears to impact DLS queries as stored in headers.

@peternied added the bug and untriaged labels and removed the untriaged label Nov 30, 2023
@peternied (Member Author)

I've tracked the problem to this serialization code that seems to take ~1ms in 2.3, whereas it takes ~13ms in 2.11

threadContext.putHeader(
    ConfigConstants.OPENDISTRO_SECURITY_DLS_QUERY_HEADER,
    Base64Helper.serializeObject((Serializable) dlsQueries)
);
if (log.isDebugEnabled()) {
    log.debug("attach DLS info: {}", dlsQueries);
}
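
For reference, a quick way to confirm the cost is to time that same call in place; a minimal sketch using the identifiers from the snippet above (not the actual test code):

long start = System.nanoTime();
// Serialize once and measure wall-clock time for building the header payload.
final String serialized = Base64Helper.serializeObject((Serializable) dlsQueries);
final long elapsedMs = (System.nanoTime() - start) / 1_000_000L;
log.info("DLS header serialization took {}ms for {} serialized chars", elapsedMs, serialized.length());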

@krishna-ggk (Contributor)

Is there a JDK version difference between 2.3 and 2.11 in the test?

@peternied (Member Author)

It uses the default JDK on your machine; to run the tests with a different JDK, set export JAVA_HOME= accordingly.

@peternied (Member Author)

Previous testing wasn't properly switching between using only the custom or only the JDK (de)serializers, which led to incomplete results. It's now clear that the custom serialization process is producing the issue, with a 981ms response time compared to 63ms.

Test methodology

Modified Base64Helper.java by deleting the other references to Base64JDKHelper or Base64CustomHelper to force a single serialization process.

Ran with ./gradlew integrationTest --tests org.opensearch.security.DlsTests.testDlsLargerQueryScenarios -x jacocoTestReport

public static String serializeObject(final Serializable object, final boolean useJDKSerialization) {
    return useJDKSerialization ? Base64JDKHelper.serializeObject(object) : Base64CustomHelper.serializeObject(object);
}

public static String serializeObject(final Serializable object) {
    return serializeObject(object, false);
}

public static Serializable deserializeObject(final String string) {
    return deserializeObject(string, false);
}

public static Serializable deserializeObject(final String string, final boolean useJDKDeserialization) {
    return useJDKDeserialization ? Base64JDKHelper.deserializeObject(string) : Base64CustomHelper.deserializeObject(string);
}
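
(For the JDK-only runs, the assumption is that the same two single-argument overloads were flipped to pass true instead, e.g.:)

public static String serializeObject(final Serializable object) {
    return serializeObject(object, true);   // force JDK serialization
}

public static Serializable deserializeObject(final String string) {
    return deserializeObject(string, true); // force JDK deserialization
}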

Custom Serialization Enabled Results

Creating 5 indices with 1 document
User, Count, Avg, Max, Min, Std ms
reader, 100, 10.45, 18, 8, 1.75
Attached READER with role DLS_ONLY_LONG_VALUE
User, Count, Avg, Max, Min, Std ms
reader, 100, 8.85, 20, 6, 1.92
Finished checks in 5305ms
Creating 50 indices with 1 document
User, Count, Avg, Max, Min, Std ms
reader, 100, 21.29, 310, 10, 34.49
Attached READER with role DLS_ONLY_LONG_VALUE
User, Count, Avg, Max, Min, Std ms
reader, 100, 249.17, 278, 225, 9.03
Finished checks in 31874ms
Creating 100 indices with 1 document
User, Count, Avg, Max, Min, Std ms
reader, 100, 12.75, 21, 10, 1.97
Attached READER with role DLS_ONLY_LONG_VALUE
User, Count, Avg, Max, Min, Std ms
reader, 100, 981.53, 1050, 950, 17.50
Finished checks in 107491ms

JDK Serialization Enabled Results

Creating 5 indices with 1 document
User, Count, Avg, Max, Min, Std ms
reader, 100, 11.67, 20, 9, 1.90
Attached READER with role DLS_ONLY_LONG_VALUE
User, Count, Avg, Max, Min, Std ms
reader, 100, 8.62, 13, 6, 1.20
Finished checks in 5354ms
Creating 50 indices with 1 document
User, Count, Avg, Max, Min, Std ms
reader, 100, 15.4, 23, 12, 2.31
Attached READER with role DLS_ONLY_LONG_VALUE
User, Count, Avg, Max, Min, Std ms
reader, 100, 33.01, 81, 21, 11.79
Finished checks in 9024ms
Creating 100 indices with 1 document
User, Count, Avg, Max, Min, Std ms
reader, 100, 17.73, 30, 14, 3.03
Attached READER with role DLS_ONLY_LONG_VALUE
User, Count, Avg, Max, Min, Std ms
reader, 100, 63.54, 69, 59, 1.88
Finished checks in 11701ms

@peternied changed the title from "[BUG] DLS performance has regressed since 2.3" to "[BUG] DLS performance has regressed with new serialization format" Nov 30, 2023
@peternied (Member Author)

@parasjain1 - can you look into whether this setting can be disabled or fixed? I think this would justify a patch release - happy to help drive that process.

@mgodwan (Member) commented Nov 30, 2023

Test carried out for a map with duplicate values: https://gist.github.com/mgodwan/52801a25c13a9b0a00c6de0e7cbb0a73

The size of the data is smaller for JDK serialization because it uses ObjectOutputStream, which does not write duplicate objects again, only handles (back-references) to them.

I agree that it would help to have a config to control the serialization format on the fly, while also seeing if we can apply a similar optimization to the OpenSearch core streams we use. I can create an issue for that.
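
A standalone sketch (not from the linked gist) that makes the handle behavior visible: the map below holds the same String instance under two keys, and ObjectOutputStream writes the string bytes only once, emitting a back-reference for the second occurrence.

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.util.HashMap;
import java.util.Map;

public class HandleDemo {
    public static void main(String[] args) throws IOException {
        final String query = "x".repeat(10_000);      // one large DLS-like query string
        final Map<String, String> perIndex = new HashMap<>();
        perIndex.put("index-1", query);               // same String instance under
        perIndex.put("index-2", query);               // two different keys

        final ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(perIndex);
        }
        // Prints roughly 10KB, not 20KB: the duplicate value is written as a handle.
        System.out.println("serialized bytes: " + bos.size());
    }
}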

@cwperks (Member) commented Nov 30, 2023

Would it make sense to update the GENERIC case of serialization/de-serialization to use JDK serialization?

i.e.

For Serialization

case GENERIC:
    streamOutput.writeByte((byte) CustomSerializationFormat.GENERIC.id);
    streamOutput.writeBytes(SerializationUtils.serialize(object));
    break;

For Deserialization

case GENERIC:
    obj = SerializationUtils.deserialize(streamInput.readByteArray());
    break;
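
For context: SerializationUtils here would be Apache Commons Lang's org.apache.commons.lang3.SerializationUtils, which wraps the JDK's ObjectOutputStream/ObjectInputStream, so this change would effectively route the GENERIC case through JDK serialization while keeping the custom format's framing byte.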

@parasjain1 (Contributor)

I had an idea to de-dupe the strings before serializing the objects, and I ran a few serialization benchmarks with this optimization for the Map<String, Set> object that holds the DLS/FLS queries. The results showed that although it brought a major improvement, it was still a lot slower than JDK serialization.

Sharing results below -
Test env - Mac M1 Pro (non-isolated system), JMH, JDK 11.0.7

Custom without de-dupe
 Benchmark                      (type)  Mode  Cnt          Score          Error  Units
 Base64HelperBench.deserialize     map  avgt   20  909312747.900 ± 95012981.669  ns/op
 Base64HelperBench.serialize       map  avgt   20  946489912.550 ±  7850240.160  ns/op

Custom with de-dupe
 Benchmark                      (type)  Mode   Cnt        Score       Error  Units
 Base64HelperBench.deserialize     map  avgt  1000  4494519.993 ± 17543.639  ns/op
 Base64HelperBench.serialize       map  avgt  1000  5656989.656 ± 34793.074  ns/op

JDK
 Benchmark                      (type)  Mode   Cnt       Score      Error  Units
 Base64HelperBench.deserialize     map  avgt  1000  268348.242 ± 7420.959  ns/op
 Base64HelperBench.serialize       map  avgt  1000  199770.825 ± 7573.412  ns/op
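
For illustration, a minimal sketch of the de-dupe idea (hypothetical names, not the benchmarked code): each distinct string is written once, and any repeat is encoded as an index into a running string table.

import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.HashMap;
import java.util.Map;

final class DedupingStringWriter {
    private final Map<String, Integer> seen = new HashMap<>();
    private final DataOutputStream out;

    DedupingStringWriter(OutputStream os) {
        this.out = new DataOutputStream(os);
    }

    void writeString(String s) throws IOException {
        final Integer index = seen.get(s);
        if (index != null) {
            out.writeBoolean(true);   // marker: back-reference to an earlier string
            out.writeInt(index);
        } else {
            out.writeBoolean(false);  // marker: literal string follows
            out.writeUTF(s);          // note: writeUTF caps strings at 64KB
            seen.put(s, seen.size());
        }
    }
}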

@parasjain1 (Contributor)

In response to @cwperks's suggestion to use JDK serde in the GENERIC case: this would introduce complexity with regard to handling an additional serialization protocol, which might be temporary until #11430 is resolved.

I've concluded that we can introduce a way to enable/disable custom serialization. The idea is to allow users with limited or no DLS/FLS usage to continue leveraging the custom serialization protocol, which is faster in other scenarios.

Below is the approach that I'll be implementing. This assumes that the changes will be part of a patch release, version 2.11.1.

  • Introduce a new boolean node attribute and a dynamic cluster setting - plugins.security.custom_serialization.enabled. It is to be noted that with this change the 2.11.1 nodes will have this node attribute present while the 2.11.0 nodes won’t.

    • This property will allow users to opt for the type of serialization protocol they want to use.
    • Default will be false for now, i.e. use JDK serialization. Can be changed to true once we’ve resolved this regression issue.
  • Requests sent via 2.11.1 nodes will have an additional boolean header plugins.security.custom_serialization.enabled with the value same as the node attribute. These will be used during message reception as we won’t have a reference to the remote node that has sent the message.

  • For requests sent from the 2.11.1 nodes to - (see the sketch after this list)

    • Case 1 : a node running OS version < 2.11
      • always use JDK to serialize the headers [this check is already present]
    • Case 2: a node running OS 2.11.0
      • while sending a request we have reference to the remote node and can check for the presence of the node attribute plugins.security.custom_serialization.enabled. In this case the attribute will not be present on the remote node. Thus, we serialize via custom serialization.
    • Case 3: a node running OS >= 2.11.1
      • The remote node will have the attribute plugins.security.custom_serialization.enabled set - if the attribute value is true serialize via custom serialization, JDK otherwise.
  • For requests received on the 2.11.1 nodes from -

    • Case 1 : a node running OS version < 2.11
      • always use JDK to deserialize the headers [this check is already present]
    • Case 2: a node running OS 2.11.0
      • Request header plugins.security.custom_serialization.enabled will be absent, always use custom serialization
    • Case 3: a node running OS >= 2.11.1
      • Request header plugins.security.custom_serialization.enabled will be present - if header value is true deserialize via custom serialization, JDK otherwise
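
A minimal sketch of the send-side decision described above; the attribute name matches the proposal, but the class and method are illustrative only, not the actual plugin wiring:

import org.opensearch.Version;
import org.opensearch.cluster.node.DiscoveryNode;

final class SerializationPicker {
    static boolean useCustomSerialization(DiscoveryNode remoteNode) {
        if (remoteNode.getVersion().before(Version.V_2_11_0)) {
            return false; // Case 1: pre-2.11 node, always JDK serialization
        }
        final String attr = remoteNode.getAttributes().get("plugins.security.custom_serialization.enabled");
        if (attr == null) {
            return true;  // Case 2: 2.11.0 node (attribute absent), custom serialization
        }
        return Boolean.parseBoolean(attr); // Case 3: >= 2.11.1, honor the attribute
    }
}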

@parasjain1 (Contributor)

Have raised a PR - #3789

@stephen-crawford (Contributor)

[Triage] Marking this as triaged since it is being addressed by Paras' PR.

@stephen-crawford added the triaged label Dec 11, 2023
@nibix (Collaborator) commented Feb 13, 2024

It's not a quick fix, but I just wanted to point out that #3870 would fix this, as the concept proposes a new DLS/FLS implementation which no longer relies on the serialization of DLS/FLS configurations.

@pmarjou22 commented Aug 22, 2024

I confirm the DLS performance problem still exists in version 2.16. We had to roll back to 2.10; this is a blocker for our production scenarios, as the OpenSearch cluster becomes completely unusable after the upgrade once non-administrator users start using the application.

We are able to reproduce the problem locally on a vanilla OpenSearch Docker image.

Using 100 indexes (sample ecommerce clones with 2K documents), we confirmed with JMeter the same behavior as explained in this bug:

| Version | Admin Rights (Calls/sec) | Limited Rights via DLS (Calls/sec) |
|---------|--------------------------|------------------------------------|
| 2.10    | 33.2                     | 21.7                               |
| 2.13    | 32.6                     | 10.2                               |
| 2.14    | 46.65                    | 3.3                                |
| 2.15    | 45                       | 3.1                                |
| 2.16    | 36.2                     | 3.2                                |

DLS filter:
{"bool":{"filter":[{"match_all":{}},{"bool":{"should":[{"match_phrase":{"currency":"EUR"}},{"match_phrase":{"currency":"molestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestiemolestie"}}],"minimum_should_match":1}}]}}

POST path /sample_data_ecommerce/_search?request_cache=false
POST body
{"size":500,"query":{"bool":{"filter":[{"bool":{"should":[{"match_phrase":{"currency":"EUR"}},{"match_phrase":{"currency":"${randomVar}"}},{"match_phrase":{"currency":"CAD"}},{"match_phrase":{"currency":"POR"}},{"match_phrase":{"currency":"REE"}},{"match_phrase":{"currency":"DSS"}},{"match_phrase":{"currency":"FDF"}},{"match_phrase":{"currency":"KJK"}},{"match_phrase":{"currency":"NBV"}},{"match_phrase":{"currency":"AZE"}},{"match_phrase":{"currency":"WXW"}},{"match_phrase":{"currency":"JDS"}},{"match_phrase":{"currency":"MLA"}},{"match_phrase":{"currency":"TRE"}}],"minimum_should_match":1}}],"must_not":[{"match_phrase":{"customer_full_name":"Frances Davidson"}},{"match_phrase":{"customer_first_name":"Frances"}},{"match_phrase":{"user":"elyssa"}}]}}}

@peternied (Member Author)

@pmarjou22 Thanks for posting; could you create a new issue in this repo with these details? It looks like there might be a second performance-impacting issue: the drop from 2.10 -> 2.11 is explained by #2802; however, the drop from 2.13 -> 2.14 might be another issue.

Could you help us pin down the root cause by running JProfiler on the OpenSearch v2.16+ node processing the request?
