-
Notifications
You must be signed in to change notification settings - Fork 62
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ML] Improve the accuracy of model memory control #122
Conversation
a function of elapsed time not buckets and assert on memory used vs target in limit test.
764246f
to
f927102
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Left a few minor comments.
~CDataGatherer(); | ||
CDataGatherer(const CDataGatherer&) = delete; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a cool language feature!
result += core::CMemory::dynamicSize(window); | ||
// The 0.3 is a rule-of-thumb estimate of the worst case | ||
// compression ratio we achieve on the test state. | ||
result += 0.3 * core::CMemory::dynamicSize(window); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't we only compress on persist?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact, no. As of #100, we compress the raw bytes of some this object we actually hold in memory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I recall you saying we'd do that but I missed the fact it's already in. Cool.
lib/model/CAnomalyDetectorModel.cc
Outdated
@@ -95,7 +95,7 @@ CAnomalyDetectorModel::CAnomalyDetectorModel(bool isForPersistence, | |||
: // The copy of m_DataGatherer is a shallow copy. This would be unacceptable | |||
// if we were going to persist the data gatherer from within this class. | |||
// We don't, so that's OK, but the next issue is that another thread will be | |||
// modifying the data gatherer m_DataGatherer points to whilst this object | |||
// modifying the data gatherer m_DataGatherer points too whilst this object |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a typo
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was a typo before, the to[o] in this context means as well which is too rather than to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's still kind of a weird sentence but fair enough!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wait a sec, I'm sorry you're actually completely right. I'd somehow (repeatedly) misread!
@@ -478,9 +478,7 @@ void CEventRateModel::debugMemoryUsage(core::CMemoryUsage::TMemoryUsagePtr mem) | |||
} | |||
|
|||
std::size_t CEventRateModel::memoryUsage() const { | |||
return this->CIndividualModel::memoryUsage() + | |||
core::CMemory::dynamicSize(m_InterimBucketCorrector); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where is the memory for the interim bucket corrected accounted for now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is getting accounted for in this->CEventRateModel::computeMemoryUsage()
. This is all tied up with the model memory estimation process we use, i.e. measuring the computed memory usage periodically then using a regression on those measurements. The extra memory used is effectively accounted for in that regressions' parameters.
// will be the overwhelmingly common source of additional memory | ||
// so the model memory should be accurate (on average) in this | ||
// time frame. | ||
double scale{1.0 - static_cast<double>(elapsedTime) / |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems the scale is fixed for a given bucket span. Should we consider setting it on construction?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could do given current usage. Although in the current usage this is called at the bucket frequency this doesn't feel like it has to be the case and we'd lose the ability to do this if that changed. Also, I think if we move this away from the API to adjust the margin it hides an important part of the implementation from the caller. If I changed how this function is called I think it would be easy to overlook that I need to change the value computed in the constructor. I think on balance I prefer to keep as is for that reason. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. This gives us flexibility to change the way it works if needed in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
…31289) To avoid temporary failures, this also disables these tests until elastic/ml-cpp#122 is committed.
…31289) To avoid temporary failures, this also disables these tests until elastic/ml-cpp#122 is committed.
This makes a number of changes targeting our current memory control functionality. Specifically, these are:
CAnomalyDetectorModel
object and one held by aCAnomalyDetector
object. However, the memory was only accounted for byCAnomalyDetectorModel
. Since the reference count is two we were effectively halving its accounted memory. I've changed theCResourceMonitor
to work in terms ofCAnomalyDetector
objects and now account for both references to the data gatherer. This incidentally also means we account for the static size ofCAnomalyDetector
which was also be lost by the resource monitor. The impact can be large, especially for population models, for example inCAnomalyJobLimitTest::testModelledEntityCountForFixedMemoryLimit
we model 45% fewer over field values as a result.CDataGatherer
. For example, we have access to the partition field name via the search key so don't need a copy in this class as well. (Note that this also reduces the number of parameters to constructor, which had quite a lot of fallout on the unit tests.)As a result we now have accurate control of the true memory (I measured consistently within 5% in a unit test with a variety of realistic data characteristics).
Note I factored out the test changes, which are mainly fallout from the
CDataGatherer
constructor signature change and some tidy ups, from the functional code changes.This affects results for memory limited jobs only.