-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ML] Clean left behind model state docs #30659
[ML] Clean left behind model state docs #30659
Conversation
Pinging @elastic/ml-core |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like it will fix the known problem.
I just left a suggestion on how we could improve encapsulation and extend it to any type of orphaned state document.
while (stateDocIdsIterator.hasNext()) { | ||
Deque<String> stateDocIds = stateDocIdsIterator.next(); | ||
for (String stateDocId : stateDocIds) { | ||
int modelStateSuffixIndex = stateDocId.indexOf("_model_state_"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to encapsulate this in the ModelState
class - say have a method jobIdFromDocId
or something like that.
for (String stateDocId : stateDocIds) { | ||
int modelStateSuffixIndex = stateDocId.indexOf("_model_state_"); | ||
if (modelStateSuffixIndex < 0) { | ||
// e.g. quantiles, etc. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Theoretically we could also get orphaned quantiles and categorizer state documents (due to bugs we don't know about yet or people meddling with the contents of other indices).
So it would be good to add jobIdFromDocId
methods to the Quantiles
and CategorizerState
classes too. The pattern could be they return null
if they don't understand the format. Then only continue
if none of the classes we store in the state index can find the job ID from the document ID.
} | ||
|
||
private void executeDeleteUnusedStateDocs(BulkRequestBuilder deleteUnusedStateRequestBuilder, ActionListener<Boolean> listener) { | ||
LOGGER.info("Found {} unused model state documents; attempting to delete", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By convention the {}
should be in square brackets.
Also, if we delete all types of orphaned documents then model state
-> just state
(and in the other message below too).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
As one final robustness improvement, you could use lastIndexOf()
instead of indexOf()
in the 3 new extractJobId
methods. It would make a difference if somebody named their job detect_anomalies_in_my_quantiles
or something like that. But maybe I'm being too paranoid!
@droberts195 No, definitely a good idea. Paranoid is good! |
It is possible for model state documents to be left behind in the state index. This may be because of bugs or uncontrollable scenarios. In any case, those documents may take up quite some disk space when they add up. This commit adds a step in the expired data deletion that is part of the daily maintenance service. The new step searches for state documents that do not belong to any of the current jobs and deletes them. Closes elastic#30551
245d0bc
to
84f4923
Compare
It is possible for state documents to be left behind in the state index. This may be because of bugs or uncontrollable scenarios. In any case, those documents may take up quite some disk space when they add up. This commit adds a step in the expired data deletion that is part of the daily maintenance service. The new step searches for state documents that do not belong to any of the current jobs and deletes them. Closes #30551
It is possible for state documents to be left behind in the state index. This may be because of bugs or uncontrollable scenarios. In any case, those documents may take up quite some disk space when they add up. This commit adds a step in the expired data deletion that is part of the daily maintenance service. The new step searches for state documents that do not belong to any of the current jobs and deletes them. Closes #30551
…ngs-to-true * elastic/master: (25 commits) [DOCS] Replace X-Pack terms with attributes [ML] Clean left behind model state docs (elastic#30659) Correct typos filters agg docs duplicated 'bucket' word removal (elastic#30677) top_hits doc example description update (elastic#30676) [Docs] Replace InetSocketTransportAddress with TransportAdress (elastic#30673) [TEST] Account for increase in ML C++ memory usage (elastic#30675) User proper write-once semantics for GCS repository (elastic#30438) Remove bogus file accidentally added Add detailed assert message to IndexAuditUpgradeIT (elastic#30669) Adjust fast forward for token expiration test (elastic#30668) Improve explanation in rescore (elastic#30629) Deprecate `nGram` and `edgeNGram` names for ngram filters (elastic#30209) Watcher: Fix watch history template for dynamic slack attachments (elastic#30172) Fix _cluster/state to always return cluster_uuid (elastic#30656) [Tests] Add debug information to CorruptedFileIT Preserve REST client auth despite 401 response (elastic#30558) [test] packaging: add windows boxes (elastic#30402) Make xpack modules instead of a meta plugin (elastic#30589) Mute ShrinkIndexIT ...
* 6.x: Mute testCorruptFileThenSnapshotAndRestore Plugins: Remove meta plugins (#30670) Upgrade to Lucene-7.4.0-snapshot-59f2b7aec2 (#30726) Docs: Add uptasticsearch to list of clients (#30738) [TEST] Reduce forecast overflow to disk test memory limit (#30727) [DOCS] Removes redundant index.asciidoc files (#30707) [DOCS] Moves X-Pack configurationg pages in table of contents (#30702) [ML][TEST] Fix bucket count assertion in ModelPlotsIT (#30717) [ML][TEST] Make AutodetectMemoryLimitIT less fragile (#30716) [Build] Add test admin when starting gradle run with trial license and [ML] provide tmp storage for forecasting and possibly any ml native jobs #30399 Tests: Fail if test watches could not be triggered (#30392) Watcher: Prevent duplicate watch triggering during upgrade (#30643) [ML] add version information in case of crash of native ML process (#30674) Add detailed assert message to IndexAuditUpgradeIT (#30669) Preserve REST client auth despite 401 response (#30558) Make TransportClusterStateAction abide to our style (#30697) [DOCS] Fixes edit URLs for stack overview (#30583) [DOCS] Add missing callout in IndicesClientDocumentationIT Backport get settings API changes to 6.x (#30494) Silence sleep based watcher test [DOCS] Replace X-Pack terms with attributes Improve explanation in rescore (#30629) [test] packaging: add windows boxes (#30402) [ML] Clean left behind model state docs (#30659) filters agg docs duplicated 'bucket' word removal (#30677) top_hits doc example description update (#30676) MovingFunction Pipeline agg backport to 6.x (#30658) [Docs] Replace InetSocketTransportAddress with TransportAdress (#30673) [TEST] Account for increase in ML C++ memory usage (#30675) User proper write-once semantics for GCS repository (#30438) Deprecate `nGram` and `edgeNGram` names for ngram filters (#30209) Watcher: Fix watch history template for dynamic slack attachments (#30172) Fix _cluster/state to always return cluster_uuid (#30656)
* master: Scripting: Remove getDate methods from ScriptDocValues (#30690) Upgrade to Lucene-7.4.0-snapshot-59f2b7aec2 (#30726) [Docs] Fix single page :docs:check invocation (#30725) Docs: Add uptasticsearch to list of clients (#30738) [DOCS] Removes out-dated x-pack/docs/en/index.asciidoc [DOCS] Removes redundant index.asciidoc files (#30707) [TEST] Reduce forecast overflow to disk test memory limit (#30727) Plugins: Remove meta plugins (#30670) [DOCS] Moves X-Pack configurationg pages in table of contents (#30702) TEST: Add engine log to testCorruptFileThenSnapshotAndRestore [ML][TEST] Fix bucket count assertion in ModelPlotsIT (#30717) [ML][TEST] Make AutodetectMemoryLimitIT less fragile (#30716) Default copy settings to true and deprecate on the REST layer (#30598) [Build] Add test admin when starting gradle run with trial license and This implementation lazily (on 1st forecast request) checks for available diskspace and creates a subfolder for storing data outside of Lucene indexes, but as part of the ES data paths. Tests: Fail if test watches could not be triggered (#30392) [ML] add version information in case of crash of native ML process (#30674) Make TransportClusterStateAction abide to our style (#30697) Change required version for Get Settings transport API changes to 6.4.0 (#30706) [DOCS] Fixes edit URLs for stack overview (#30583) Silence sleep based watcher test [TEST] Adjust version skips for movavg/movfn tests [DOCS] Replace X-Pack terms with attributes [ML] Clean left behind model state docs (#30659) Correct typos filters agg docs duplicated 'bucket' word removal (#30677) top_hits doc example description update (#30676) [Docs] Replace InetSocketTransportAddress with TransportAdress (#30673) [TEST] Account for increase in ML C++ memory usage (#30675) User proper write-once semantics for GCS repository (#30438) Remove bogus file accidentally added Add detailed assert message to IndexAuditUpgradeIT (#30669) Adjust fast forward for token expiration test (#30668) Improve explanation in rescore (#30629) Deprecate `nGram` and `edgeNGram` names for ngram filters (#30209) Watcher: Fix watch history template for dynamic slack attachments (#30172) Fix _cluster/state to always return cluster_uuid (#30656) [Tests] Add debug information to CorruptedFileIT # Conflicts: # test/framework/src/main/java/org/elasticsearch/indices/analysis/AnalysisFactoryTestCase.java
It is possible for state documents to be left behind in the state index. This may be because of bugs or uncontrollable scenarios. In any case, those documents may take up quite some disk space when they add up. This commit adds a step in the expired data deletion that is part of the daily maintenance service. The new step searches for state documents that do not belong to any of the current jobs and deletes them. Closes elastic#30551
It is possible for model state documents to be
left behind in the state index. This may be
because of bugs or uncontrollable scenarios.
In any case, those documents may take up quite
some disk space when they add up. This commit
adds a step in the expired data deletion that
is part of the daily maintenance service. The
new step searches for state documents that
do not belong to any of the current jobs and
deletes them.
Closes #30551