[ML] Improve hard_limit audit message #42086
Improve the hard_limit memory audit message by reporting how many bytes over the configured memory limit the job was at the point of the last allocation failure.

Previously the model memory usage was reported. This was inaccurate and hence of limited use, primarily because the total memory used by the model can decrease significantly after the model's status is changed to hard_limit but before the model size stats are reported from autodetect to ES.

While this PR contains the changes to the format of the hard_limit audit message, it depends on modifications to the ml-cpp backend to send additional data fields in the model size stats message. Those changes will follow in a subsequent PR. Note that this PR must be merged before the ml-cpp one to keep CI tests passing.

Relates elastic#38034
Pinging @elastic/ml-core
@@ -148,6 +162,14 @@ public long getModelBytes() {
        return modelBytes;
    }

    public long getModelBytesExceeded() {
The return value needs to be Long, otherwise a user could get an NPE.
        return modelBytesExceeded;
    }

    public long getModelBytesMemoryLimit() {
The return value needs to be Long, otherwise a user could get an NPE.
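The NPE risk raised in these two comments can be reproduced in isolation. The following is a minimal sketch, assuming the backing field is a nullable Long as requested elsewhere in this review. The class and method names (UnboxingSketch, getExceededPrimitive, getExceededBoxed) are hypothetical and are not part of the PR.

```java
final class UnboxingSketch {
    // Hypothetical nullable field standing in for modelBytesExceeded.
    private final Long modelBytesExceeded;

    UnboxingSketch(Long modelBytesExceeded) {
        this.modelBytesExceeded = modelBytesExceeded;
    }

    // Primitive return type: a null field is auto-unboxed here,
    // which throws NullPointerException.
    long getExceededPrimitive() {
        return modelBytesExceeded;
    }

    // Boxed return type: null is handed to the caller unchanged.
    Long getExceededBoxed() {
        return modelBytesExceeded;
    }

    public static void main(String[] args) {
        UnboxingSketch stats = new UnboxingSketch(null);
        System.out.println("boxed getter returned: " + stats.getExceededBoxed());
        try {
            stats.getExceededPrimitive();
        } catch (NullPointerException e) {
            System.out.println("primitive getter threw NullPointerException");
        }
    }
}
```

With a boxed return type the caller decides how to treat a missing value, instead of the getter failing at the unboxing step.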
@@ -219,6 +242,8 @@ public boolean equals(Object other) {

        private final String jobId;
        private long modelBytes;
        private long modelBytesExceeded;
        private long modelBytesMemoryLimit;
These two need to be Long, otherwise they'll default to 0 and zeroes will then propagate into objects that should really contain nulls.
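The difference between the two field types can be shown with a short sketch. The class below is hypothetical and only mirrors the shape of the builder fields under review: an unset primitive long reads as 0, while an unset Long reads as null, so "never reported" stays distinguishable from "reported as zero".

```java
final class DefaultValueSketch {
    // Hypothetical builder with a primitive field: implicitly 0 when never set.
    static final class PrimitiveBuilder {
        long modelBytesExceeded;
    }

    // Hypothetical builder with a boxed field: null when never set.
    static final class BoxedBuilder {
        Long modelBytesExceeded;
    }

    static long unsetPrimitive() {
        return new PrimitiveBuilder().modelBytesExceeded;
    }

    static Long unsetBoxed() {
        return new BoxedBuilder().modelBytesExceeded;
    }

    public static void main(String[] args) {
        System.out.println("unset primitive field: " + unsetPrimitive());
        System.out.println("unset boxed field: " + unsetBoxed());
    }
}
```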
...nt/rest-high-level/src/main/java/org/elasticsearch/client/ml/job/process/ModelSizeStats.java
@@ -31,6 +31,8 @@
    public void testDefaultConstructor() {
        ModelSizeStats stats = new ModelSizeStats.Builder("foo").build();
        assertEquals(0, stats.getModelBytes());
        assertEquals(0, stats.getModelBytesExceeded());
        assertEquals(0, stats.getModelBytesMemoryLimit());
These two will change to null if the other changes I recommended are made. So that makes the values set by the default-constructed builder inconsistent. But this is better than filling in values that didn't really exist in a JSON document.
        if (in.getVersion().onOrAfter(Version.V_7_2_0)) {
            modelBytesMemoryLimit = in.readOptionalLong();
        } else {
            modelBytesMemoryLimit = 0L;
I think it should be null, so we remember that the field didn't exist.
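The suggestion can be sketched with plain java.io streams rather than the Elasticsearch stream classes. Everything here is an assumption made for illustration: the integer version id, the presence-flag wire layout, and the helper names. The point is only that a reader talking to an older sender returns null instead of inventing a 0.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class VersionGatedReadSketch {
    // Hypothetical numeric id of the first version that sends the field.
    static final int FIRST_VERSION_WITH_FIELD = 70200;

    // Assumed layout: a presence flag, then the value if present.
    static void writeOptionalLong(DataOutputStream out, Long value) throws IOException {
        out.writeBoolean(value != null);
        if (value != null) {
            out.writeLong(value);
        }
    }

    static Long readOptionalLong(DataInputStream in) throws IOException {
        return in.readBoolean() ? Long.valueOf(in.readLong()) : null;
    }

    // Older senders never wrote the field, so record its absence as null.
    static Long readMemoryLimit(DataInputStream in, int senderVersion) throws IOException {
        if (senderVersion >= FIRST_VERSION_WITH_FIELD) {
            return readOptionalLong(in);
        }
        return null;
    }

    // Serialises as a sender of the given version would, then reads it back.
    static Long roundTrip(Long value, int senderVersion) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            if (senderVersion >= FIRST_VERSION_WITH_FIELD) {
                writeOptionalLong(out, value);
            }
            out.flush();
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
            return readMemoryLimit(in, senderVersion);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("new sender: " + roundTrip(1024L, 70200));
        System.out.println("old sender: " + roundTrip(1024L, 70100));
    }
}
```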
@@ -192,6 +224,14 @@ public long getModelBytes() {
        return modelBytes;
    }

    public long getModelBytesExceeded() {
The return value needs to be Long, otherwise a user could get an NPE.
        return modelBytesExceeded;
    }

    public long getModelBytesMemoryLimit() {
The return value needs to be Long, otherwise a user could get an NPE.
@@ -262,6 +304,8 @@ public boolean equals(Object other) {

        private final String jobId;
        private long modelBytes;
        private long modelBytesExceeded;
        private long modelBytesMemoryLimit;
These two need to be Long, otherwise they'll default to 0 and zeroes will then propagate into objects that should really contain nulls.
@@ -22,6 +22,8 @@
    public void testDefaultConstructor() {
        ModelSizeStats stats = new ModelSizeStats.Builder("foo").build();
        assertEquals(0, stats.getModelBytes());
        assertEquals(0, stats.getModelBytesExceeded());
        assertEquals(0, stats.getModelBytesMemoryLimit());
These two will change to null if the other changes I recommended are made. So that makes the values set by the default-constructed builder inconsistent. But this is better than filling in values that didn't really exist in a JSON document.
…e_hard_limit_audit_message
@@ -31,6 +31,8 @@
    public void testDefaultConstructor() {
        ModelSizeStats stats = new ModelSizeStats.Builder("foo").build();
        assertEquals(0, stats.getModelBytes());
        assertEquals(null, stats.getModelBytesExceeded());
        assertEquals(null, stats.getModelBytesMemoryLimit());
You can use assertNull for these two.
Use an appropriate hard_limit audit message when model size stats originate from a version prior to 7.2
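A fallback of this kind might look like the sketch below. It assumes that stats from older versions carry null for the two new fields, as the review above requests. The method name and both message strings are hypothetical and are not the wording used in Elasticsearch.

```java
final class HardLimitMessageSketch {
    // Picks the detailed message only when both new fields are present.
    // Message wording is illustrative, not the actual audit text.
    static String hardLimitMessage(long modelBytes, Long memoryLimit, Long bytesExceeded) {
        if (memoryLimit == null || bytesExceeded == null) {
            // Stats from an older version: fall back to reporting model memory usage.
            return "Job memory status changed to hard_limit at " + modelBytes + " bytes";
        }
        return "Job memory status changed to hard_limit; model memory limit of "
            + memoryLimit + " bytes exceeded by " + bytesExceeded + " bytes";
    }

    public static void main(String[] args) {
        System.out.println(hardLimitMessage(5_000_000L, null, null));
        System.out.println(hardLimitMessage(5_000_000L, 4_000_000L, 250_000L));
    }
}
```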
LGTM
Add the current model memory limit, and the number of bytes in excess of that limit at the point of the last allocation failure, to the model size stats. These will be used to construct a (hopefully) more informative hard_limit audit message. The reported memory usage is also scaled to take into account the byte limit margin, which is in play in the initial period of a job's lifetime and is used to scale down the high memory limit. This should give a more accurate representation of how close the memory usage is to the high limit. relates elastic/elasticsearch#42086 closes elastic/elasticsearch#38034
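One way to read the scaling described in this commit message is sketched below. It is a guess at the arithmetic, not the ml-cpp implementation (which is C++). It assumes the margin is a factor in (0, 1] that multiplies the high limit, and that reported usage is divided by the same factor so it can be compared against the unscaled limit. The names are hypothetical.

```java
final class MarginScalingSketch {
    // Assumption: effective limit = highLimit * margin, with margin in (0, 1].
    // Dividing usage by the margin keeps the usage-to-limit ratio unchanged
    // when the result is compared against the unscaled high limit.
    static long scaledUsage(long usageBytes, double margin) {
        if (margin <= 0.0 || margin > 1.0) {
            throw new IllegalArgumentException("margin must be in (0, 1]");
        }
        return Math.round(usageBytes / margin);
    }

    public static void main(String[] args) {
        long highLimit = 100L * 1024 * 1024; // 100 MB configured limit
        long usage = 40L * 1024 * 1024;      // 40 MB actually used
        double margin = 0.5;                 // early in the job's lifetime
        // 40 MB against an effective 50 MB limit is 80% full, so the
        // scaled figure is 80 MB against the configured 100 MB.
        System.out.println(scaledUsage(usage, margin) + " of " + highLimit + " bytes");
    }
}
```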
Muting a number of AutoDetectMemoryLimitIT tests to give CI a chance to settle before easing in required backend changes. relates elastic/ml-cpp#486 relates #42086
Fix MINIMUM_SCALE, MAXIMUM_SCALE and SQL_DATETIME_SUB ODBC metadata for the DATE & TIME data types. Fixes: elastic#42086