Allow total memory to be overridden #78750

Merged
merged 31 commits into from
Oct 16, 2021
Changes from 17 commits
Commits
cd22163
Allow total memory to be overridden
droberts195 Oct 6, 2021
0f1dd57
Fix formatting
droberts195 Oct 6, 2021
26c5579
Fix test
droberts195 Oct 6, 2021
431c8bd
Fix more tests
droberts195 Oct 6, 2021
7d09701
Merge branch 'master' into add_memory_override
droberts195 Oct 6, 2021
6acad57
Adding packaging test
droberts195 Oct 6, 2021
0f27a97
Update docs/changelog/78750.yaml
droberts195 Oct 6, 2021
824019a
Fix changelog
droberts195 Oct 6, 2021
898fec6
Update docs/changelog/78750.yaml
droberts195 Oct 6, 2021
82c7937
Fix test
droberts195 Oct 6, 2021
d1d58e6
Adding a launcher test for option parse failure
droberts195 Oct 6, 2021
017263b
Addressing comments related to the launcher code
droberts195 Oct 7, 2021
2a74474
Add missing word
droberts195 Oct 7, 2021
36410d2
Merge branch 'master' into add_memory_override
droberts195 Oct 7, 2021
a4bc21a
Adjust comment
droberts195 Oct 7, 2021
795f155
Adding a packaging test
droberts195 Oct 7, 2021
42a000a
Adding an archive test and fixing package test
droberts195 Oct 7, 2021
3c1eabd
Move packaging test out of systemd section
droberts195 Oct 11, 2021
8df4836
Merge branch 'master' into add_memory_override
droberts195 Oct 11, 2021
b4c646c
Fix test
droberts195 Oct 11, 2021
4e7b3c1
Merge branch 'master' into add_memory_override
droberts195 Oct 13, 2021
449cf2a
Using always present adjusted_total instead of optional total_override
droberts195 Oct 13, 2021
ee59b9f
Fixing tests
droberts195 Oct 13, 2021
2f71542
Merge branch 'master' into add_memory_override
droberts195 Oct 13, 2021
7dcd8ec
Merge branch 'master' into add_memory_override
droberts195 Oct 15, 2021
be14364
Address code review comments
droberts195 Oct 15, 2021
8741c98
Adapt to packaging test framework changes
droberts195 Oct 15, 2021
9731c2d
Merge branch 'master' into add_memory_override
droberts195 Oct 15, 2021
71f7a2a
Packaging tests need https now
droberts195 Oct 15, 2021
1cd19bf
Set up security after calling install()
droberts195 Oct 15, 2021
becace8
Merge branch 'master' into add_memory_override
elasticmachine Oct 16, 2021
@@ -123,7 +123,6 @@ private List<String> jvmOptions(final Path config, Path plugins, final String es
throws InterruptedException, IOException, JvmOptionsFileParserException {

final List<String> jvmOptions = readJvmOptionsFiles(config);
- final MachineDependentHeap machineDependentHeap = new MachineDependentHeap(new DefaultSystemMemoryInfo());

if (esJavaOpts != null) {
jvmOptions.addAll(
@@ -132,6 +131,9 @@ private List<String> jvmOptions(final Path config, Path plugins, final String es
}

final List<String> substitutedJvmOptions = substitutePlaceholders(jvmOptions, Collections.unmodifiableMap(substitutions));
final MachineDependentHeap machineDependentHeap = new MachineDependentHeap(
new OverridableSystemMemoryInfo(substitutedJvmOptions, new DefaultSystemMemoryInfo())
Member

I like the idea of having two different memory info types here rather than always wrapping the default.

Contributor

That makes testing more brittle, because the logic to determine which to use then has to live in JvmOptionsParser, which is poor coupling IMO. The benefit of this approach is that neither the options parser nor the machine-dependent heap has to know anything about this magic system property. It is completely encapsulated in OverridableSystemMemoryInfo, which can then be unit tested on its own with ease.

);
substitutedJvmOptions.addAll(machineDependentHeap.determineHeapSettings(config, substitutedJvmOptions));
final List<String> ergonomicJvmOptions = JvmErgonomics.choose(substitutedJvmOptions);
final List<String> systemJvmOptions = SystemJvmOptions.systemJvmOptions();
@@ -0,0 +1,48 @@
/*
* Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
* or more contributor license agreements. Licensed under the Elastic License
* 2.0 and the Server Side Public License, v 1; you may not use this file except
* in compliance with, at your election, the Elastic License 2.0 or the Server
* Side Public License, v 1.
*/

package org.elasticsearch.tools.launchers;

import java.util.List;
import java.util.Objects;

/**
* A {@link SystemMemoryInfo} which returns a user-overridden memory size if one
* has been specified using the {@code es.total_memory_bytes} system property, or
* else returns the value provided by a fallback provider.
*/
public final class OverridableSystemMemoryInfo implements SystemMemoryInfo {

private final List<String> userDefinedJvmOptions;
private final SystemMemoryInfo fallbackSystemMemoryInfo;

public OverridableSystemMemoryInfo(final List<String> userDefinedJvmOptions, SystemMemoryInfo fallbackSystemMemoryInfo) {
this.userDefinedJvmOptions = Objects.requireNonNull(userDefinedJvmOptions);
this.fallbackSystemMemoryInfo = Objects.requireNonNull(fallbackSystemMemoryInfo);
}

@Override
public long availableSystemMemory() throws SystemMemoryInfoException {

return userDefinedJvmOptions.stream()
.filter(option -> option.startsWith("-Des.total_memory_bytes="))
.map(totalMemoryBytesOption -> {
try {
long bytes = Long.parseLong(totalMemoryBytesOption.split("=", 2)[1]);
if (bytes < 0) {
throw new IllegalArgumentException("Negative memory size specified in [" + totalMemoryBytesOption + "]");
}
return bytes;
} catch (NumberFormatException e) {
throw new IllegalArgumentException("Unable to parse number of bytes from [" + totalMemoryBytesOption + "]", e);
}
})
.reduce((previous, current) -> current) // this is effectively findLast(), so that ES_JAVA_OPTS overrides jvm.options
Member

Shouldn’t we fail if the option is specified more than once?

Contributor Author

I agree that Elasticsearch usually fails if duplicate settings are provided, but not for settings that are specified using Java system properties.

For example, if I run this:

echo '-Des.index.max_number_of_shards=1111' > config/jvm.options.d/shards.options
bin/elasticsearch

then Elasticsearch runs with maximum shards per index set to 1111.

If I run this:

echo '-Des.index.max_number_of_shards=1111' > config/jvm.options.d/shards.options
ES_JAVA_OPTS=-Des.index.max_number_of_shards=1212 bin/elasticsearch

then Elasticsearch runs with maximum shards per index set to 1212.

So with system properties, specifying them multiple times isn't an error and ES_JAVA_OPTS takes precedence over jvm.options.

I am matching that behaviour here.

If it's undesirable and needs to be changed then I think that should be done for all system properties rather than just this one.
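The last-one-wins behaviour described in this comment can be sketched in isolation. This is a minimal illustration, not the launcher's code: the class `LastOptionWins` and the helper `lastValue` are hypothetical names, and the only assumption carried over from the diff is that options read from `jvm.options` come earlier in the list than those from `ES_JAVA_OPTS`.

```java
import java.util.List;
import java.util.Optional;

final class LastOptionWins {

    // Hypothetical helper: returns the value of the last "-D<name>=" entry, if any.
    static Optional<String> lastValue(List<String> options, String name) {
        final String prefix = "-D" + name + "=";
        return options.stream()
            .filter(option -> option.startsWith(prefix))
            .map(option -> option.substring(prefix.length()))
            // Always keeping the right-hand operand leaves the final match in list order.
            .reduce((previous, current) -> current);
    }

    public static void main(String[] args) {
        // Assumed ordering: the jvm.options entry first, the ES_JAVA_OPTS entry appended after it.
        List<String> options = List.of(
            "-Des.index.max_number_of_shards=1111",
            "-Xss1m",
            "-Des.index.max_number_of_shards=1212"
        );
        // prints 1212
        System.out.println(lastValue(options, "es.index.max_number_of_shards").orElse("unset"));
    }
}
```

Because the reduction only ever keeps the later element, a duplicate is never an error here; rejecting duplicates would need an explicit count of the matches instead.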

.orElse(fallbackSystemMemoryInfo.availableSystemMemory());
}
}
@@ -0,0 +1,86 @@
/*
* Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
* or more contributor license agreements. Licensed under the Elastic License
* 2.0 and the Server Side Public License, v 1; you may not use this file except
* in compliance with, at your election, the Elastic License 2.0 or the Server
* Side Public License, v 1.
*/

package org.elasticsearch.tools.launchers;

import org.elasticsearch.tools.launchers.SystemMemoryInfo.SystemMemoryInfoException;

import java.util.List;

import static org.hamcrest.Matchers.is;
import static org.junit.Assert.assertThat;
import static org.junit.Assert.fail;

public class OverridableSystemMemoryInfoTests extends LaunchersTestCase {

private static final long FALLBACK = -1L;

public void testNoOptions() throws SystemMemoryInfoException {
final SystemMemoryInfo memoryInfo = new OverridableSystemMemoryInfo(List.of(), fallbackSystemMemoryInfo());
assertThat(memoryInfo.availableSystemMemory(), is(FALLBACK));
}

public void testNoOverrides() throws SystemMemoryInfoException {
final SystemMemoryInfo memoryInfo = new OverridableSystemMemoryInfo(List.of("-Da=b", "-Dx=y"), fallbackSystemMemoryInfo());
assertThat(memoryInfo.availableSystemMemory(), is(FALLBACK));
}

public void testValidSingleOverride() throws SystemMemoryInfoException {
final SystemMemoryInfo memoryInfo = new OverridableSystemMemoryInfo(
List.of("-Des.total_memory_bytes=123456789"),
fallbackSystemMemoryInfo()
);
assertThat(memoryInfo.availableSystemMemory(), is(123456789L));
}

public void testValidOverrideInList() throws SystemMemoryInfoException {
final SystemMemoryInfo memoryInfo = new OverridableSystemMemoryInfo(
List.of("-Da=b", "-Des.total_memory_bytes=987654321", "-Dx=y"),
fallbackSystemMemoryInfo()
);
assertThat(memoryInfo.availableSystemMemory(), is(987654321L));
}

public void testMultipleValidOverridesInList() throws SystemMemoryInfoException {
final SystemMemoryInfo memoryInfo = new OverridableSystemMemoryInfo(
List.of("-Des.total_memory_bytes=123456789", "-Da=b", "-Des.total_memory_bytes=987654321", "-Dx=y"),
fallbackSystemMemoryInfo()
);
assertThat(memoryInfo.availableSystemMemory(), is(987654321L));
}

public void testNegativeOverride() throws SystemMemoryInfoException {
final SystemMemoryInfo memoryInfo = new OverridableSystemMemoryInfo(
List.of("-Da=b", "-Des.total_memory_bytes=-123", "-Dx=y"),
fallbackSystemMemoryInfo()
);
try {
memoryInfo.availableSystemMemory();
fail("expected to fail");
} catch (IllegalArgumentException e) {
assertThat(e.getMessage(), is("Negative memory size specified in [-Des.total_memory_bytes=-123]"));
}
}

public void testUnparsableOverride() throws SystemMemoryInfoException {
final SystemMemoryInfo memoryInfo = new OverridableSystemMemoryInfo(
List.of("-Da=b", "-Des.total_memory_bytes=invalid", "-Dx=y"),
fallbackSystemMemoryInfo()
);
try {
memoryInfo.availableSystemMemory();
fail("expected to fail");
} catch (IllegalArgumentException e) {
assertThat(e.getMessage(), is("Unable to parse number of bytes from [-Des.total_memory_bytes=invalid]"));
}
}

private static SystemMemoryInfo fallbackSystemMemoryInfo() {
return () -> FALLBACK;
}
}
6 changes: 6 additions & 0 deletions docs/changelog/78750.yaml
@@ -0,0 +1,6 @@
pr: 78750
summary: Allow total memory to be overridden
area: Packaging
type: enhancement
issues:
- 65905
11 changes: 11 additions & 0 deletions docs/reference/cluster/nodes-stats.asciidoc
@@ -1036,6 +1036,17 @@ Total amount of physical memory.
(integer)
Total amount of physical memory in bytes.

`total_override`::
(<<byte-units,byte value>>)
If the amount of physical memory has been overridden using the `es.total_memory_bytes`
system property then this reports the overridden value. Otherwise it is not present.

`total_override_in_bytes`::
(integer)
If the amount of physical memory has been overridden using the `es.total_memory_bytes`
system property then this reports the overridden value in bytes. Otherwise it is not
present.

`free`::
(<<byte-units,byte value>>)
Amount of free physical memory.
12 changes: 12 additions & 0 deletions docs/reference/cluster/stats.asciidoc
@@ -916,6 +916,18 @@ Total amount of physical memory across all selected nodes.
(integer)
Total amount, in bytes, of physical memory across all selected nodes.

`total_override`::
(<<byte-units,byte value>>)
If the amount of physical memory has been overridden using the `es.total_memory_bytes`
system property on all selected nodes then this reports the sum of the overridden
Contributor

What if the total memory was only overridden on some of the selected nodes?

Contributor Author (droberts195, Oct 7, 2021)

Yes, this is a tricky one. I opted not to report any override at all at the cluster level if some nodes have overrides and others don't. (You can still get all the values from the node stats.) I guess the alternative would be to report the sum of overrides on the nodes that have overrides, plus un-overridden total on the nodes that don't, but only if at least one node has an override. Maybe that's better - I'd be interested to hear if subsequent reviewers have any thoughts on this.

Contributor Author

This also ties in with #78750 (comment)

Member

I wonder if we need to have any distinct concept of override in the stats at all. With available disk space, when limited through cgroups, we do not show what the real disk has available. Why not just show this as “this is the memory available”?
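The two cluster-level policies weighed in this thread can be compared side by side. This is a sketch under stated assumptions: `NodeMem`, `allOrNothing` and `mixedSum` are hypothetical names for illustration and are not Elasticsearch classes. The first method mirrors the "report nothing unless every node has an override" rule; the second is the alternative floated in the comment above.

```java
import java.util.Arrays;
import java.util.List;

final class ClusterMemoryOverride {

    /** Hypothetical per-node view: the total is always known, the override may be absent (null). */
    static final class NodeMem {
        final long totalBytes;
        final Long overrideBytes;

        NodeMem(long totalBytes, Long overrideBytes) {
            this.totalBytes = totalBytes;
            this.overrideBytes = overrideBytes;
        }
    }

    // Policy A: report a sum only if every node has an override, otherwise report nothing.
    static Long allOrNothing(List<NodeMem> nodes) {
        long sum = 0;
        for (NodeMem node : nodes) {
            if (node.overrideBytes == null) {
                return null;
            }
            if (node.overrideBytes > 0) {
                sum += node.overrideBytes;
            }
        }
        return sum;
    }

    // Policy B: use the override where present and the real total elsewhere,
    // but only report a value if at least one node has an override.
    static Long mixedSum(List<NodeMem> nodes) {
        boolean anyOverride = false;
        long sum = 0;
        for (NodeMem node : nodes) {
            if (node.overrideBytes != null) {
                anyOverride = true;
                sum += node.overrideBytes;
            } else {
                sum += node.totalBytes;
            }
        }
        return anyOverride ? sum : null;
    }

    public static void main(String[] args) {
        List<NodeMem> nodes = Arrays.asList(new NodeMem(1000L, 800L), new NodeMem(2000L, null));
        System.out.println(allOrNothing(nodes)); // prints null
        System.out.println(mixedSum(nodes));     // prints 2800
    }
}
```

With one overridden node and one plain node, policy A hides the override at cluster level while policy B reports a blended figure. Which is less surprising to users is the open question in the thread.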

values. Otherwise it is not present.

`total_override_in_bytes`::
(integer)
If the amount of physical memory has been overridden using the `es.total_memory_bytes`
system property on all selected nodes then this reports the sum of the overridden
values in bytes. Otherwise it is not present.

`free`::
(<<byte-units, byte units>>)
Amount of free physical memory across all selected nodes.
@@ -346,6 +346,27 @@ public void test73CustomJvmOptionsDirectoryFilesWithoutOptionsExtensionIgnored()
}
}

public void test74CustomJvmOptionsTotalMemoryOverride() throws Exception {
final Path heapOptions = installation.config(Paths.get("jvm.options.d", "total_memory.options"));
try {
setHeap(null); // delete default options
// Work as though total system memory is 850MB
append(heapOptions, "-Des.total_memory_bytes=891289600\n");

startElasticsearch();

final String nodesStatsResponse = makeRequest(Request.Get("http://localhost:9200/_nodes/stats"));
assertThat(nodesStatsResponse, containsString("\"total_override_in_bytes\":891289600"));
final String nodesResponse = makeRequest(Request.Get("http://localhost:9200/_nodes"));
// 40% of 850MB
assertThat(nodesResponse, containsString("\"heap_init_in_bytes\":356515840"));

stopElasticsearch();
} finally {
rm(heapOptions);
}
}
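The byte values in these tests follow from simple arithmetic, sketched here. The 40% ratio and the rounding down to whole megabytes are taken only from the test comments in this PR; the real heap sizing logic may apply other rules for other memory sizes or node roles, so treat `HeapArithmetic` as a hypothetical helper and not as the launcher's implementation.

```java
final class HeapArithmetic {

    private static final long MB = 1024L * 1024L;

    // Assumption from the test comments: heap is 40% of total memory, truncated to whole megabytes.
    static long expectedHeapMegabytes(long totalMemoryBytes) {
        return totalMemoryBytes * 4 / 10 / MB;
    }

    public static void main(String[] args) {
        System.out.println(expectedHeapMegabytes(891289600L)); // 850 MB override, prints 340
        System.out.println(expectedHeapMegabytes(799014912L)); // 762 MB override, prints 304
        System.out.println(expectedHeapMegabytes(942L * MB));  // 942 MB container limit, prints 376
    }
}
```

340 MB is 356515840 bytes, which is the `heap_init_in_bytes` value asserted above.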

public void test80RelativePathConf() throws Exception {

withCustomConfig(tempConf -> {
@@ -868,22 +868,45 @@ public void test140CgroupOsStatsAreAvailable() throws Exception {
* logic sets the correct heap size, based on the container limits.
*/
public void test150MachineDependentHeap() throws Exception {
final List<String> xArgs = machineDependentHeapTest("942m", List.of());

// This is roughly 0.4 * 942
assertThat(xArgs, hasItems("-Xms376m", "-Xmx376m"));
}

/**
* Check that when available system memory is constrained by a total memory override as well as Docker,
* the machine-dependent heap sizing logic sets the correct heap size, preferring the override to the
* container limits.
*/
public void test151MachineDependentHeapWithSizeOverride() throws Exception {
final List<String> xArgs = machineDependentHeapTest(
"942m",
// 799014912 = 762m
List.of("-Des.total_memory_bytes=799014912")
);

// This is roughly 0.4 * 762, in particular it's NOT 0.4 * 942
assertThat(xArgs, hasItems("-Xms304m", "-Xmx304m"));
}

private List<String> machineDependentHeapTest(final String containerMemory, final List<String> extraJvmOptions) throws Exception {
Contributor

We should make sure we're testing this scenario in our ArchiveTests and PackageTests as well to get full coverage across packaging types.

// Start by ensuring `jvm.options` doesn't define any heap options
final Path jvmOptionsPath = tempDir.resolve("jvm.options");
final Path containerJvmOptionsPath = installation.config("jvm.options");
copyFromContainer(containerJvmOptionsPath, jvmOptionsPath);

- final List<String> jvmOptions = Files.readAllLines(jvmOptionsPath)
- .stream()
- .filter(line -> (line.startsWith("-Xms") || line.startsWith("-Xmx")) == false)
- .collect(Collectors.toList());
final List<String> jvmOptions = Stream.concat(
Files.readAllLines(jvmOptionsPath).stream().filter(line -> (line.startsWith("-Xms") || line.startsWith("-Xmx")) == false),
extraJvmOptions.stream()
).collect(Collectors.toList());

Files.writeString(jvmOptionsPath, String.join("\n", jvmOptions));

// Now run the container, being explicit about the available memory
runContainer(
distribution(),
- builder().memory("942m").volume(jvmOptionsPath, containerJvmOptionsPath).envVar("ELASTIC_PASSWORD", PASSWORD)
builder().memory(containerMemory).volume(jvmOptionsPath, containerJvmOptionsPath).envVar("ELASTIC_PASSWORD", PASSWORD)
);
waitForElasticsearch(installation, USERNAME, PASSWORD);

@@ -897,12 +920,9 @@ public void test150MachineDependentHeap() throws Exception {
final JsonNode jsonNode = new ObjectMapper().readTree(jvmArgumentsLine.get());

final String argsStr = jsonNode.get("message").textValue();
- final List<String> xArgs = Arrays.stream(argsStr.substring(1, argsStr.length() - 1).split(",\\s*"))
return Arrays.stream(argsStr.substring(1, argsStr.length() - 1).split(",\\s*"))
.filter(arg -> arg.startsWith("-X"))
.collect(Collectors.toList());

- // This is roughly 0.4 * 942
- assertThat(xArgs, hasItems("-Xms376m", "-Xmx376m"));
}

/**
@@ -298,6 +298,32 @@ public void test81CustomPathConfAndJvmOptions() throws Exception {
cleanup();
}

public void test82JvmOptionsTotalMemoryOverride() throws Exception {
assumeTrue(isSystemd());
Member

Why would this test only work with systemd?

Contributor Author

I copied this bit from the test immediately above, which is the most similar one in the file. There's a comment on line 256 saying all the tests below are for systemd only. I'll rename this test and move it higher up the file so that it's not in the systemd-only section.


install();

assertPathsExist(installation.envFile);
stopElasticsearch();

withCustomConfig(tempConf -> {
// Work as though total system memory is 850MB
append(installation.envFile, "ES_JAVA_OPTS=\"-Des.total_memory_bytes=891289600\"");

startElasticsearch();

final String nodesStatsResponse = makeRequest(Request.Get("http://localhost:9200/_nodes/stats"));
assertThat(nodesStatsResponse, containsString("\"total_override_in_bytes\":891289600"));

// 40% of 850MB
assertThat(sh.run("ps auwwx").stdout, containsString("-Xmx340m -Xms340m"));
Contributor

This test is failing in CI - I had a look and I didn't see these options being applied.

I'd probably refine this assertion, to first find the Elasticsearch invocation, then break it into arguments, then do some kind of assertThat(arguments, hasItems("-Xmx340m", "-Xms340m")) assertion.

Contributor Author

Yes, the command line has -Xmx1g -Xms1g. That implies that the ergonomic heap sizing code is applying defaults. Presumably that’s because it cannot determine the node roles. I’ll have a closer look next week.

Contributor Author

I found the cause. It wasn't node roles. PackagingTestCase has a @Before method that explicitly sets the heap to 1g. I've added a setHeap(null) to try to get rid of that in my test.
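The refinement suggested earlier in this thread (find the Elasticsearch invocation, split it into arguments, then assert on the individual items) might look roughly like the sketch below. `ProcessArgs` and `argumentsOf` are hypothetical names, the sample `ps` output is made up for illustration, and using the main class name as the marker is an assumption. Plain `containsAll` stands in for Hamcrest's `hasItems` so the example has no test-library dependency.

```java
import java.util.Arrays;
import java.util.List;

final class ProcessArgs {

    // Hypothetical helper: takes the first line of `ps` output that mentions the marker
    // and splits it on whitespace; returns an empty list if no line matches.
    static List<String> argumentsOf(String psOutput, String marker) {
        return psOutput.lines()
            .filter(line -> line.contains(marker))
            .findFirst()
            .map(line -> Arrays.asList(line.trim().split("\\s+")))
            .orElse(List.of());
    }

    public static void main(String[] args) {
        // Made-up sample output; real `ps auwwx` lines carry more columns.
        String psOutput = "root 1 /sbin/init\n"
            + "elastic 4242 java -Xms340m -Xmx340m -Des.total_memory_bytes=891289600 "
            + "org.elasticsearch.bootstrap.Elasticsearch\n";
        List<String> arguments = argumentsOf(psOutput, "org.elasticsearch.bootstrap.Elasticsearch");
        // prints true
        System.out.println(arguments.containsAll(List.of("-Xmx340m", "-Xms340m")));
    }
}
```

Checking membership per argument avoids depending on the order in which `-Xms` and `-Xmx` appear on the command line, which a single `containsString("-Xmx340m -Xms340m")` check does.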


stopElasticsearch();
});

cleanup();
}

public void test83SystemdMask() throws Exception {
try {
assumeTrue(isSystemd());
@@ -266,20 +266,33 @@ private OsStats(List<NodeInfo> nodeInfos, List<NodeStats> nodeStatsList) {
this.allocatedProcessors = allocatedProcessors;

long totalMemory = 0;
Long totalMemoryOverride = 0L;
long freeMemory = 0;
for (NodeStats nodeStats : nodeStatsList) {
if (nodeStats.getOs() != null) {
- long total = nodeStats.getOs().getMem().getTotal().getBytes();
org.elasticsearch.monitor.os.OsStats.Mem mem = nodeStats.getOs().getMem();
long total = mem.getTotal().getBytes();
if (total > 0) {
totalMemory += total;
}
// Only report a total memory override for the whole cluster if every node has overridden total memory
if (totalMemoryOverride != null) {
if (mem.getTotalOverride() != null) {
long totalOverride = mem.getTotalOverride().getBytes();
if (totalOverride > 0) {
totalMemoryOverride += totalOverride;
}
} else {
totalMemoryOverride = null;
}
}
long free = nodeStats.getOs().getMem().getFree().getBytes();
if (free > 0) {
freeMemory += free;
}
}
}
- this.mem = new org.elasticsearch.monitor.os.OsStats.Mem(totalMemory, freeMemory);
this.mem = new org.elasticsearch.monitor.os.OsStats.Mem(totalMemory, totalMemoryOverride, freeMemory);
}

public int getAvailableProcessors() {