[ML] Improve hard_limit audit message #42086

Merged on May 17, 2019 (5 commits)

edsavage

Improve the hard_limit memory audit message by reporting how many bytes
over the configured memory limit the job was at the point of the last
allocation failure.

Previously the model memory usage was reported. However, this was
inaccurate and hence of limited use, primarily because the total
memory used by the model can decrease significantly after the model's
status is changed to hard_limit but before the model size stats are
reported from autodetect to ES.

While this PR contains the changes to the format of the hard_limit audit
message, it depends on modifications to the ml-cpp backend to
send additional data fields in the model size stats message. These
changes will follow in a subsequent PR. Note that this PR
must be merged before the ml-cpp one to keep CI tests passing.

Relates #38034

@elasticmachine

Pinging @elastic/ml-core

@@ -148,6 +162,14 @@ public long getModelBytes() {
return modelBytes;
}

public long getModelBytesExceeded() {

The return value needs to be Long, otherwise a user could get an NPE.

return modelBytesExceeded;
}

public long getModelBytesMemoryLimit() {

The return value needs to be Long, otherwise a user could get an NPE.

@@ -219,6 +242,8 @@ public boolean equals(Object other) {

private final String jobId;
private long modelBytes;
private long modelBytesExceeded;
private long modelBytesMemoryLimit;

These two need to be Long, otherwise they'll default to 0 and zeroes will then propagate into objects that should really contain nulls.
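
A minimal sketch of what this suggestion amounts to, assuming a simplified stand-in class: `ModelSizeStatsSketch` and its builder are hypothetical names for illustration, not the actual `ModelSizeStats` from the PR. The two new fields are boxed so that "absent" stays distinguishable from "zero", and the getters return `Long` so a null field is never auto-unboxed.

```java
// Simplified stand-in for the stats class under review; not the actual Elasticsearch ModelSizeStats.
final class ModelSizeStatsSketch {
    private final String jobId;
    private final long modelBytes;             // always reported, so a primitive is fine
    private final Long modelBytesExceeded;     // null when the source document had no such field
    private final Long modelBytesMemoryLimit;  // null when the source document had no such field

    private ModelSizeStatsSketch(Builder builder) {
        this.jobId = builder.jobId;
        this.modelBytes = builder.modelBytes;
        this.modelBytesExceeded = builder.modelBytesExceeded;
        this.modelBytesMemoryLimit = builder.modelBytesMemoryLimit;
    }

    String getJobId() { return jobId; }

    long getModelBytes() { return modelBytes; }

    // Returning Long (not long) avoids auto-unboxing a null field, which would throw an NPE.
    Long getModelBytesExceeded() { return modelBytesExceeded; }

    Long getModelBytesMemoryLimit() { return modelBytesMemoryLimit; }

    static final class Builder {
        private final String jobId;
        private long modelBytes;             // defaults to 0
        private Long modelBytesExceeded;     // defaults to null
        private Long modelBytesMemoryLimit;  // defaults to null

        Builder(String jobId) { this.jobId = jobId; }

        Builder setModelBytes(long value) { this.modelBytes = value; return this; }

        Builder setModelBytesExceeded(Long value) { this.modelBytesExceeded = value; return this; }

        Builder setModelBytesMemoryLimit(Long value) { this.modelBytesMemoryLimit = value; return this; }

        ModelSizeStatsSketch build() { return new ModelSizeStatsSketch(this); }
    }
}
```

With this shape, a default-constructed builder yields 0 for `getModelBytes()` and null for the two new getters, which is the inconsistency accepted in the test comments.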

@@ -31,6 +31,8 @@
public void testDefaultConstructor() {
ModelSizeStats stats = new ModelSizeStats.Builder("foo").build();
assertEquals(0, stats.getModelBytes());
assertEquals(0, stats.getModelBytesExceeded());
assertEquals(0, stats.getModelBytesMemoryLimit());

These two will change to null if the other changes I recommended are made. So that makes the values set by the default constructed builder inconsistent. But this is better than filling in values that didn't really exist in a JSON document.

if (in.getVersion().onOrAfter(Version.V_7_2_0)) {
modelBytesMemoryLimit = in.readOptionalLong();
} else {
modelBytesMemoryLimit = 0L;

I think it should be null, so we remember that the field didn't exist.
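
A rough, self-contained sketch of the version-gated read with a null fallback. It uses plain `java.io` streams rather than the Elasticsearch stream classes, and the numeric version ids, class name and method names are all hypothetical. The optional value is encoded here as a presence flag followed by the long, which is an assumption made for this sketch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class VersionGatedReadSketch {
    // Hypothetical numeric version ids, standing in for Version.V_7_2_0 and an older release.
    static final int V_7_1_0 = 70100;
    static final int V_7_2_0 = 70200;

    // Only streams at or after 7.2.0 carry the field; older streams get nothing written.
    static void writeMemoryLimit(DataOutputStream out, int streamVersion, Long limit) throws IOException {
        if (streamVersion >= V_7_2_0) {
            out.writeBoolean(limit != null);
            if (limit != null) {
                out.writeLong(limit);
            }
        }
    }

    // For older streams return null, not 0, so we remember that the field did not exist.
    static Long readMemoryLimit(DataInputStream in, int streamVersion) throws IOException {
        if (streamVersion >= V_7_2_0) {
            if (in.readBoolean()) {
                return in.readLong();
            }
            return null;
        }
        return null;
    }

    // Convenience helper for trying the pair of methods above in memory.
    static Long roundTrip(int streamVersion, Long limit) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes)) {
                writeMemoryLimit(out, streamVersion, limit);
            }
            try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return readMemoryLimit(in, streamVersion);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Round-tripping a value at the older version yields null, so downstream code can tell "not sent" apart from a genuine limit of zero.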

@@ -192,6 +224,14 @@ public long getModelBytes() {
return modelBytes;
}

public long getModelBytesExceeded() {

The return value needs to be Long, otherwise a user could get an NPE.

return modelBytesExceeded;
}

public long getModelBytesMemoryLimit() {

The return value needs to be Long, otherwise a user could get an NPE.

@@ -262,6 +304,8 @@ public boolean equals(Object other) {

private final String jobId;
private long modelBytes;
private long modelBytesExceeded;
private long modelBytesMemoryLimit;

These two need to be Long, otherwise they'll default to 0 and zeroes will then propagate into objects that should really contain nulls.

@@ -22,6 +22,8 @@
public void testDefaultConstructor() {
ModelSizeStats stats = new ModelSizeStats.Builder("foo").build();
assertEquals(0, stats.getModelBytes());
assertEquals(0, stats.getModelBytesExceeded());
assertEquals(0, stats.getModelBytesMemoryLimit());

These two will change to null if the other changes I recommended are made. So that makes the values set by the default constructed builder inconsistent. But this is better than filling in values that didn't really exist in a JSON document.

@@ -31,6 +31,8 @@
public void testDefaultConstructor() {
ModelSizeStats stats = new ModelSizeStats.Builder("foo").build();
assertEquals(0, stats.getModelBytes());
assertEquals(null, stats.getModelBytesExceeded());
assertEquals(null, stats.getModelBytesMemoryLimit());

You can use assertNull for these two.

edsavage added 2 commits May 17, 2019 08:10
Use an appropriate hard_limit audit message when model size stats
originate from a version prior to 7.2
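
As an illustration only, the message selection described in this commit could look roughly like the sketch below. The class name, method name and message wording are assumptions for this example, not the strings used in the PR. Null values for the two new fields are taken to mean that the stats came from a version that did not send them.

```java
// Hypothetical helper; the message wording is illustrative, not the text used by Elasticsearch.
final class HardLimitAuditSketch {
    static String hardLimitMessage(long modelBytes, Long modelBytesMemoryLimit, Long modelBytesExceeded) {
        // Stats from an older version lack the new fields, so fall back to reporting model bytes only.
        if (modelBytesMemoryLimit == null || modelBytesExceeded == null) {
            return "Job memory status changed to hard_limit at " + modelBytes + " bytes";
        }
        // Newer stats report the configured limit and how far over it the job was.
        return "Job memory status changed to hard_limit; job exceeded model memory limit of "
            + modelBytesMemoryLimit + " bytes by " + modelBytesExceeded + " bytes";
    }
}
```

The fallback branch relies on the new fields being nullable, which is why the review asks for `Long` in place of `long`.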

droberts195 left a comment

LGTM

edsavage merged commit 8c01a8d into elastic:master on May 17, 2019
edsavage added a commit that referenced this pull request May 17, 2019
edsavage added a commit to edsavage/ml-cpp that referenced this pull request May 18, 2019
Add the current model memory limit and the number of bytes in
excess of that at the point of the last allocation failure to the model
size stats. These will be used to construct a (hopefully) more
informative hard_limit audit message.

The reported memory usage is also scaled to take into account the byte
limit margin, which is in play in the initial period of a job's lifetime
and is used to scale down the high memory limit. This should give a more
accurate representation of how close the memory usage is to the high
limit.

relates elastic/elasticsearch#42086

closes elastic/elasticsearch#38034
edsavage added a commit to elastic/ml-cpp that referenced this pull request May 18, 2019
edsavage added a commit that referenced this pull request May 19, 2019
Muting a number of AutoDetectMemoryLimitIT tests to give CI a chance to
settle before easing in required backend changes.

relates elastic/ml-cpp#486
relates #42086
edsavage deleted the improve_hard_limit_audit_message branch on May 22, 2019 09:17
gurkankaymak pushed a commit to gurkankaymak/elasticsearch that referenced this pull request May 27, 2019
gurkankaymak pushed a commit to gurkankaymak/elasticsearch that referenced this pull request May 27, 2019
matriv added a commit to matriv/elasticsearch that referenced this pull request Apr 16, 2020
Fix MINIMUM_SCALE, MAXIMUM_SCALE and SQL_DATETIME_SUB
ODBC metadata for the DATE & TIME data types.

Fixes: elastic#42086
matriv added a commit that referenced this pull request Apr 16, 2020
matriv added a commit to matriv/elasticsearch that referenced this pull request Apr 16, 2020
Fix MINIMUM_SCALE, MAXIMUM_SCALE and SQL_DATETIME_SUB
ODBC metadata for the DATE & TIME data types.

Fixes: elastic#42086
(cherry picked from commit c23677c)