[Transform] improve performance by using point in time API for search #74984

Merged
hendrikmuhs merged 9 commits into elastic:master from transform-pit on Jul 14, 2021

Conversation

hendrikmuhs

Use the point in time (pit) API for every checkpoint in transform. Using point in time reduces pressure on the source indexes, e.g. fewer refreshes. If pit isn't supported (e.g. when searching remote clusters), transform falls back to ordinary search requests as before.
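A minimal sketch of the fallback control flow described above, not the code from this PR. `SearchBackend`, `PitFallbackSearcher`, and their methods are hypothetical stand-ins for the real client calls; only the field name `disablePit` is taken from the review discussion. The idea is to try opening a pit once and, if that fails, remember that pit is unavailable and keep using ordinary searches.

```java
import java.util.Optional;

/** Hypothetical stand-in for the search client; not an Elasticsearch API. */
interface SearchBackend {
    /** Returns a pit id, or throws UnsupportedOperationException if pit is unsupported. */
    String openPointInTime();

    /** Runs a search, optionally bound to a pit id. */
    String search(String query, Optional<String> pitId);
}

/** Sketch: use pit when possible, otherwise fall back to a plain search. */
class PitFallbackSearcher {
    private final SearchBackend backend;
    private String pitId;        // null while no pit is open
    private boolean disablePit;  // set once pit turned out to be unsupported

    PitFallbackSearcher(SearchBackend backend) {
        this.backend = backend;
    }

    String search(String query) {
        if (!disablePit && pitId == null) {
            try {
                pitId = backend.openPointInTime();
            } catch (UnsupportedOperationException e) {
                // Assumption for this sketch: an unsupported pit surfaces as this exception.
                disablePit = true;
            }
        }
        // With no pit available this is an ordinary search request.
        return backend.search(query, Optional.ofNullable(pitId));
    }

    boolean isPitDisabled() {
        return disablePit;
    }
}
```

Once `disablePit` is set, the sketch never retries opening a pit; whether and when the real indexer retries is not something this example tries to model.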

closes #73481

@elasticmachine
Collaborator

Pinging @elastic/ml-core (Team:ML)

@hendrikmuhs hendrikmuhs force-pushed the transform-pit branch 2 times, most recently from 691872c to 4737c80 Compare July 8, 2021 18:25
@hendrikmuhs
Author

retest this please

Contributor

@przemekwitek przemekwitek left a comment

LGTM

Comment on lines +124 to +126
if (getNextCheckpoint().getCheckpoint() != pitCheckpoint) {
    closePointInTime();
}
Member

It seems to me that we should not move this execution thread forward until the pit is closed.

It is conceivable (though unlikely), right, that this closePointInTime() is still executing while doSearch is being handled, and that we consequently close the wrong PIT and leave one left over?

Author

@hendrikmuhs hendrikmuhs Jul 13, 2021

pit is "copied" (the reference, not the object) and set to null in the synchronous part of closePointInTime(), see line 470++. So you are right that we might open a new pit while still closing the other; however, that's allowed and I don't see a race condition that could lead to mixing up the two.

Member

@hendrikmuhs 100%, I misread the method. Setting a local variable synchronously should avoid that problem :).
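A minimal sketch of the pattern discussed in this thread, under stated assumptions: `PitHolder` and its methods are hypothetical, and the asynchronous close is simulated with a queue of callbacks rather than a real client call. It shows why copying the reference to a local and nulling the field synchronously keeps a late close from touching a newer pit.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical holder that detaches the pit synchronously before closing it asynchronously. */
class PitHolder {
    private String pit;                                              // current pit id, null if none
    final List<String> closed = new ArrayList<>();                   // ids that were actually closed
    private final List<Runnable> pendingCloses = new ArrayList<>();  // simulated async close calls

    void openPointInTime(String id) {
        pit = id;
    }

    String currentPit() {
        return pit;
    }

    void closePointInTime() {
        if (pit == null) {
            return;
        }
        String oldPit = pit;  // copy the reference ...
        pit = null;           // ... and detach it before any async work starts
        // The callback captures oldPit, so it cannot see a pit opened later.
        pendingCloses.add(() -> closed.add(oldPit));
    }

    /** Runs the simulated async closes, possibly long after a new pit was opened. */
    void runPendingCloses() {
        List<Runnable> toRun = new ArrayList<>(pendingCloses);
        pendingCloses.clear();
        toRun.forEach(Runnable::run);
    }
}
```

Opening `pit-2` between `closePointInTime()` and `runPendingCloses()` still closes only `pit-1`, which is the situation the thread concludes is safe.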

Comment on lines +479 to +482
ActionListener.wrap(response -> { logger.trace("[{}] closed pit search context [{}]", getJobId(), oldPit); }, e -> {
    // note: closing the pit should never throw, even if the pit is invalid
    logger.error(new ParameterizedMessage("[{}] Failed to close point in time reader", getJobId()), e);
})
Member

I think the logger.trace should have a message supplier like () -> new ParameterizedMessage to prevent strings from being created when trace is disabled.

Not a huge deal as this is not a "hot path"

Author

I don't think this applies. This is only a problem if one or more arguments need to be constructed, e.g. if getJobId() built the id and therefore executed something. This is not the case.

The message string itself is only constructed after the check whether trace is enabled or not.

I wish we had static code analysis for this; it's such a common problem.
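To make the distinction in this thread concrete, here is a small sketch with a hypothetical `TinyLogger` that only mimics level-checked, parameterized logging; it is not Log4j and makes no claim about Log4j internals. `expensiveId()` is likewise made up to stand in for an argument that costs something to build. With the parameterized form the argument is evaluated at the call site but no message is formatted while trace is off; with the supplier form even the argument is not evaluated.

```java
import java.util.function.Supplier;
import java.util.regex.Matcher;

/** Hypothetical minimal logger; a stand-in for illustration, not a real logging API. */
class TinyLogger {
    boolean traceEnabled = false;
    int messagesFormatted = 0;

    /** Parameterized form: formatting happens only after the level check. */
    void trace(String pattern, Object... args) {
        if (!traceEnabled) {
            return;
        }
        messagesFormatted++;
        String message = pattern;
        for (Object arg : args) {
            message = message.replaceFirst("\\{\\}", Matcher.quoteReplacement(String.valueOf(arg)));
        }
        System.out.println(message);
    }

    /** Supplier form: nothing inside the lambda runs unless trace is enabled. */
    void trace(Supplier<String> supplier) {
        if (!traceEnabled) {
            return;
        }
        messagesFormatted++;
        System.out.println(supplier.get());
    }
}

class LazyLoggingDemo {
    static int expensiveCalls = 0;

    /** Stand-in for an argument that is costly to construct. */
    static String expensiveId() {
        expensiveCalls++;
        return "job-" + expensiveCalls;
    }
}
```

In the PR's case the arguments are a plain getter result and an existing reference, so the parameterized form already avoids the work that matters; the supplier form would only help if an argument had to be built first.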

pit = new PointInTimeBuilder(response.getPointInTimeId()).setKeepAlive(PIT_KEEP_ALIVE);
searchRequest.source().pointInTimeBuilder(pit);
pitCheckpoint = getNextCheckpoint().getCheckpoint();
logger.trace("[{}] using pit search context with id [{}]", getJobId(), pit.getEncodedId());
Member

Similar comment: () -> new ParameterizedMessage seems better to me for trace.

private static final Logger logger = LogManager.getLogger(ClientTransformIndexer.class);

private final Client client;
private final AtomicBoolean oldStatsCleanedUp = new AtomicBoolean(false);

private final AtomicReference<SeqNoPrimaryTermAndIndex> seqNoPrimaryTermAndIndex;
private PointInTimeBuilder pit;
Member

Do pit and disablePit need to be volatile? They are accessed from separate threads in different execution paths.

Author

@hendrikmuhs hendrikmuhs Jul 13, 2021

pit is always accessed by the indexer thread. Even the onStop call originates from the indexer, not from the _stop transport, if that's what you mean.

But I am unsure; the async behavior of the indexer might indeed be problematic in this case. I will check other variables, too.
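For illustration only: if a field like pit did turn out to be touched from more than one thread, one option would be to hold it in an `AtomicReference` and detach it with `getAndSet(null)`. This is an assumption about a possible alternative, not what the PR does, and `AtomicPitHolder` is a hypothetical name.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical alternative: a pit reference that is safe to publish across threads. */
class AtomicPitHolder {
    private final AtomicReference<String> pit = new AtomicReference<>();

    void set(String id) {
        pit.set(id);
    }

    /** Atomically detaches and returns the current pit id, or null if none is set. */
    String detach() {
        return pit.getAndSet(null);
    }
}
```

Because `getAndSet` is a single atomic operation, two concurrent callers of `detach()` cannot both receive the same id, so a pit would be closed at most once.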

@hendrikmuhs
Author

@elasticmachine update branch

@hendrikmuhs
Author

Test results

Using pit, the number of refreshes on the source index can be reduced significantly:

[image: chart not preserved]

This chart compares a baseline (indexing without doing any searches/transforms), transform without pit (<7.15) and transform using pit (>=7.15).

Using pit reduces the number of refreshes significantly. Correspondingly, the number of merges goes down from 1440 to 560 and the time spent merging from 17.1 minutes to 9.1 minutes.
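As a quick check of what those figures mean in relative terms, here is a small sketch; `RelativeReduction` is a hypothetical helper and the inputs are just the numbers quoted above.

```java
/** Hypothetical helper for turning before/after figures into a percentage reduction. */
class RelativeReduction {
    static double percentReduction(double before, double after) {
        return (before - after) / before * 100.0;
    }
}
```

For the quoted numbers, `percentReduction(1440, 560)` is about 61% fewer merges and `percentReduction(17.1, 9.1)` is about 47% less time spent merging.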

It depends: This is not a representative benchmark. The benefit of using pit depends on data, ingest rates, the transform configuration and other query executors like dashboards that use the same source index.

In summary, you might not see a resource reduction of the same order of magnitude with your data. However, pit should reduce overhead for all use cases where the source index isn't static (continuous transform).

@hendrikmuhs hendrikmuhs merged commit 15a3b35 into elastic:master Jul 14, 2021
@hendrikmuhs hendrikmuhs deleted the transform-pit branch July 14, 2021 10:00
hendrikmuhs pushed a commit that referenced this pull request Jul 14, 2021
…earch (#75333)

Use point in time API for every checkpoint in transform. Using point in time reduces pressure
on the source indexes, e.g. less refreshes. In case, pit isn't supported (e.g. when searching
remote clusters) it falls back to ordinary search requests as before.

closes #73481
backport #74984
masseyke pushed a commit to masseyke/elasticsearch that referenced this pull request Jul 16, 2021
…elastic#74984)

Use point in time API for every checkpoint in transform. Using point in time reduces pressure
on the source indexes, e.g. less refreshes. In case, pit isn't supported (e.g. when searching
remote clusters) it falls back to ordinary search requests as before.

closes elastic#73481
hendrikmuhs pushed a commit that referenced this pull request Jul 22, 2021
Fix a unreleased regression introduced in #74984. In case a pit search context disappeared the listener was called twice and the transform fails.
elasticsearchmachine pushed a commit to elasticsearchmachine/elasticsearch that referenced this pull request Jul 22, 2021
…c#75615)

Fix a unreleased regression introduced in elastic#74984. In case a pit search context disappeared the listener was called twice and the transform fails.
hendrikmuhs pushed a commit that referenced this pull request Jul 22, 2021
…75615) (#75619)

Fix a unreleased regression introduced in #74984. In case a pit search context disappeared the listener was called twice and the transform fails.
ywangd pushed a commit to ywangd/elasticsearch that referenced this pull request Jul 30, 2021
…c#75615)

Fix a unreleased regression introduced in elastic#74984. In case a pit search context disappeared the listener was called twice and the transform fails.
Successfully merging this pull request may close these issues.

[Transform] Use point in time search, optimize query execution
5 participants