
Optimize kinesis ingestion task assignment after resharding #12235

Conversation

@AmatyaAvadhanula (Contributor) commented Feb 7, 2022

Description

  • When a Kinesis stream is resharded, the original shards are closed. The process may also create a large number of intermediate shards, which are eventually closed as well.
  • If a shard is closed before any records are ever put into it, it is ideal to ignore it during ingestion: skipping such shards avoids wasting task resources on unnecessary allocations.
  • Although shards are fetched from Kinesis frequently, both open and closed shards are returned, and it is expensive to determine whether a closed shard was ever written to, since that requires polling each such shard for its records.
  • Skipping these "bad" shards when counting partitions also helps, since the autoscaler may then allocate fewer task slots.
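The expensive "was this closed shard ever written to?" check mentioned above can be sketched as follows. This is a minimal illustration, not the actual KinesisRecordSupplier code: the `ShardReader` interface and its `readFromStart` method are hypothetical stand-ins for the AWS Kinesis client calls (obtain a TRIM_HORIZON shard iterator, then fetch records from it).

```java
import java.util.Collections;
import java.util.List;

class ClosedShardCheck
{
    /**
     * Hypothetical stand-in for the AWS Kinesis client: returns the records
     * readable from the start (TRIM_HORIZON) of the given shard.
     */
    interface ShardReader
    {
        List<byte[]> readFromStart(String shardId);
    }

    /** A closed shard is "ignorable" iff polling it from the start yields no records. */
    static boolean isClosedShardEmpty(ShardReader reader, String shardId)
    {
        return reader.readFromStart(shardId).isEmpty();
    }

    public static void main(String[] args)
    {
        // Fake reader: shard 1 was written to before closing, shard 2 never was
        ShardReader fake = shardId ->
            shardId.equals("shardId-000000000001")
            ? List.of("row".getBytes())
            : Collections.emptyList();

        System.out.println(isClosedShardEmpty(fake, "shardId-000000000001")); // false
        System.out.println(isClosedShardEmpty(fake, "shardId-000000000002")); // true
    }
}
```

Against the real stream this poll is a network round trip per shard, which is exactly why the PR caches its result.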

KinesisRecordSupplier is used to get the list of all shards during Kinesis ingestion.
Additional methods have been added to it to determine which shards are closed and empty.
Repeated calls to Kinesis for shards' records are avoided by maintaining an in-memory cache in the supervisor.

Proposed Design:

  • An in-memory cache is implemented in KinesisSupervisor to track closed shards (empty and non-empty, separately) and avoid redundant expensive calls.
  • When the flag skipIgnorableShards is set to true, the cache is utilized and updated. "Ignorable" shards then no longer participate in task allocation for ingestion or in autoscaler estimations.
  • When a closed shard is discovered for the first time, it is polled once and added to the cache so that no further expensive calls are made for it.
  • The set of active closed shards is computed on each update to eliminate stale cache entries corresponding to expired shards.
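The cache behavior described in the bullets above can be sketched roughly as follows. This is a simplified illustration with hypothetical names, not the code in KinesisSupervisor; the `isShardEmpty` predicate stands in for the expensive per-shard poll against Kinesis.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Predicate;

class ClosedShardCache
{
    // Closed shards that have been polled once, split by whether they ever held records
    private final Set<String> emptyClosedShardIds = new TreeSet<>();
    private final Set<String> nonEmptyClosedShardIds = new TreeSet<>();

    /**
     * Refreshes the cache from the latest set of active closed shards.
     * Synchronized because, as discussed in the review below is not implied here,
     * multiple supervisor code paths may reach this concurrently.
     *
     * @param activeClosedShardIds closed shards currently reported for the stream
     * @param isShardEmpty         the (expensive) per-shard poll, invoked at most
     *                             once per newly discovered closed shard
     */
    synchronized void updateClosedShardCache(
        Set<String> activeClosedShardIds,
        Predicate<String> isShardEmpty
    )
    {
        // Eliminate stale entries for shards that have since expired
        emptyClosedShardIds.retainAll(activeClosedShardIds);
        nonEmptyClosedShardIds.retainAll(activeClosedShardIds);

        // Poll each newly discovered closed shard exactly once
        for (String shardId : activeClosedShardIds) {
            if (!emptyClosedShardIds.contains(shardId) && !nonEmptyClosedShardIds.contains(shardId)) {
                if (isShardEmpty.test(shardId)) {
                    emptyClosedShardIds.add(shardId);
                } else {
                    nonEmptyClosedShardIds.add(shardId);
                }
            }
        }
    }

    /** Shards that can be skipped for task allocation and autoscaler counts. */
    synchronized Set<String> getIgnorableShardIds()
    {
        return new HashSet<>(emptyClosedShardIds);
    }

    public static void main(String[] args)
    {
        ClosedShardCache cache = new ClosedShardCache();
        cache.updateClosedShardCache(Set.of("shard-0", "shard-1"), id -> id.equals("shard-0"));
        System.out.println(cache.getIgnorableShardIds()); // [shard-0]
    }
}
```

Note that a shard already present in either set is never polled again, and `retainAll` prunes entries for expired shards on every refresh.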

Limitations:

  • All closed shards (not just the empty ones) must be polled once, which adds overhead.
  • Since the metadata is not updated for these shards, every supervisor restart incurs the above overhead again.

Alternative design:

Update the metadata with the end offsets of closed and empty shards. This may be simpler to implement since it doesn't require a cache, but it would waste resources, since a task would still have to run just to update the metadata.

Key changed/added classes in this PR
  • KinesisSupervisor

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@AmatyaAvadhanula AmatyaAvadhanula marked this pull request as draft February 7, 2022 05:10
@AmatyaAvadhanula AmatyaAvadhanula marked this pull request as ready for review February 8, 2022 09:13
@kfaraz (Contributor) reviewed:

Overall LGTM. Minor comments.

@kfaraz (Contributor) reviewed:

Approach looks good to me. Need some minor changes.

@@ -88,6 +91,11 @@
private final AWSCredentialsConfig awsCredentialsConfig;
private volatile Map<String, Long> currentPartitionTimeLag;

// Maintain sets of currently closed shards to find "bad" (closed and empty) shards
// Poll closed shards once and store the result to avoid redundant costly calls to kinesis
private final Set<String> emptyClosedShardIds = new TreeSet<>();
@zachjsh (Contributor) commented:

Should these be in a thread-safe container, or should access be protected with a lock?

@kfaraz commented Feb 17, 2022:
Thanks for pointing this out, @zachjsh.
This code would be executed by the SeekableStreamSupervisor while executing a RunNotice (scheduled when status of a task changes) as well as a DynamicAllocationTasksNotice (scheduled for auto-scaling). There is a possibility of contention between these two executions.

We can make the part where the caches are updated synchronized.
Just changing these two caches to a Concurrent version might not be enough as a whole new list of active shards is fetched in updateClosedShardCache() and the caches must be updated with this new state before any other action is performed.

cc: @AmatyaAvadhanula

A contributor replied:
Synchronizing the whole method updateClosedShardCache would actually be preferable because the state returned by two subsequent calls to recordSupplier.getShards() can be different.
So this call should happen inside the synchronized block, as should the calls to recordSupplier.isClosedShardEmpty().

I hope this doesn't cause bottlenecks though.

@AmatyaAvadhanula (Author) replied:
The whole method has been synchronized. Thanks!

@@ -2291,9 +2291,30 @@ protected boolean supportsPartitionExpiration()
return false;
}

protected boolean shouldSkipIgnorablePartitions()
A contributor asked:
Should we not do something similar for Kafka? Why is this not an issue with Kafka?

A contributor replied:
We haven't encountered something similar in Kafka yet. But the API has been put in place so that if KafkaSupervisor needs to do something similar, it can override the methods shouldSkipIgnorablePartitions() and getIgnorablePartitionIds().

For Kinesis, the ignorable partitions translate to empty and closed shards, which is a concept specific to Kinesis.
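The extension point described in this reply can be illustrated with a minimal sketch. The class names and the `countUsablePartitions` helper are hypothetical simplifications; in Druid, the hooks live on SeekableStreamSupervisor, and only KinesisSupervisor currently overrides them.

```java
import java.util.Collections;
import java.util.Set;

abstract class StreamSupervisorSketch
{
    // Defaults: no partitions are ever skipped (the Kafka behavior today)
    protected boolean shouldSkipIgnorablePartitions()
    {
        return false;
    }

    protected Set<String> computeIgnorablePartitionIds()
    {
        return Collections.emptySet();
    }

    /** Number of partitions that participate in task allocation and autoscaler counts. */
    final int countUsablePartitions(Set<String> allPartitionIds)
    {
        if (shouldSkipIgnorablePartitions()) {
            Set<String> ignorable = computeIgnorablePartitionIds();
            return (int) allPartitionIds.stream().filter(id -> !ignorable.contains(id)).count();
        }
        return allPartitionIds.size();
    }

    public static void main(String[] args)
    {
        StreamSupervisorSketch kinesis = new KinesisSupervisorSketch(Set.of("closedEmptyShard"));
        System.out.println(kinesis.countUsablePartitions(Set.of("openShard", "closedEmptyShard"))); // 1
    }
}

/** Kinesis-style subclass: skips shards that are both closed and empty. */
class KinesisSupervisorSketch extends StreamSupervisorSketch
{
    private final Set<String> emptyClosedShardIds;

    KinesisSupervisorSketch(Set<String> emptyClosedShardIds)
    {
        this.emptyClosedShardIds = emptyClosedShardIds;
    }

    @Override
    protected boolean shouldSkipIgnorablePartitions()
    {
        return true;
    }

    @Override
    protected Set<String> computeIgnorablePartitionIds()
    {
        return emptyClosedShardIds;
    }
}
```

A Kafka-style supervisor would simply inherit the defaults, which is why no Kafka changes were needed in this PR.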

@kfaraz (Contributor) approved:
Thanks for the changes, @AmatyaAvadhanula . LGTM.

* @return set of shards ignorable by kinesis ingestion
*/
@Override
protected Set<String> getIgnorablePartitionIds()
A contributor commented:
nit - this should be called computeIgnorablePartitionIds() or loadIgnorablePartitionIds()

Follow-up comment:
The current verb indicates that it is just a getter, but behind the scenes it can make network calls etc. to fetch the partition ids.

@AmatyaAvadhanula (Author) replied:
done

@zachjsh (Contributor) approved:
Thanks for making changes. LGTM

@kfaraz kfaraz merged commit 1ec57cb into apache:master Feb 18, 2022
@abhishekagarwal87 abhishekagarwal87 added this to the 0.23.0 milestone May 11, 2022
AmatyaAvadhanula added a commit to AmatyaAvadhanula/druid that referenced this pull request Oct 13, 2022
AmatyaAvadhanula added a commit that referenced this pull request Oct 28, 2022
* Revert "Improve kinesis task assignment after resharding (#12235)"

This reverts commit 1ec57cb.
4 participants