Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

partition_allocator: exclude internal topics from per node partition limit checks #11249

Merged

Conversation

bharathv
Copy link
Contributor

@bharathv bharathv commented Jun 7, 2023

Internal topics are now excluded from partition allocator limit checks. This means that a given allocation node can host any number of such internal topic partitions. A topic is considered internal in the following cases:

  • It belongs to redpanda or kafka_internal namespaces or
  • It belongs to kafka namespace but included in kafka_nodelete_topics

The rationale for do so is that these topics can potentially be created lazily, for example transactions related topics are created when a user uses transactions for the first time. This could be problematic if the user has already exhausted topic_partitions_per_shard limits and the creation of these internal topics just hangs. To avoid such scenarios we exclude these internal topics from limit checks. This should not be problematic in practice because total number of such internal partitions in a typical deployment is fairly small compared to total number of non internal partitions.

Fixes: #10995

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v23.1.x
  • v22.3.x
  • v22.2.x

Release Notes

  • none

@bharathv bharathv marked this pull request as ready for review June 8, 2023 04:17
ztlpn
ztlpn previously approved these changes Jun 29, 2023
Comment on lines 34 to 41
ss::sharded<members_table>& /*members_table*/,
config::binding<std::optional<size_t>> /*memory_per_partition*/,
config::binding<std::optional<int32_t>> /*fds_per_partition*/,
config::binding<uint32_t> /*partitions_per_shard*/,
config::binding<uint32_t> /*partitions_reserve_shard0*/,
config::binding<
std::vector<ss::sstring>> /*kafka_topics_skipping_allocation*/,
config::binding<bool> /*enable_rack_awareness*/);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: comments are unnecessary, these can just be parameter names

return [](const allocation_node& node) { return !node.is_full(); };
make_evaluator(const model::ntp& ntp, const replicas_t&) const final {
return [&ntp](const allocation_node& node) {
return !node.is_full(ntp);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: not necessary to fix right now, but I have a feeling that allocation_state should be just "dumb state" (e.g. for full node check it should just provide the number of currently allocated partitions) and all logic should be in evaluators. The reason is that 1) evaluators already contain some of the logic, why not all? and 2) they have more context available (existing replicas, previous node for that replica (not available right now, but we'll have to add it soon)) so we'll either have to pass it all to allocation_state or just decide everything in the evaluators.

Copy link
Contributor Author

@bharathv bharathv Jun 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right.. this thought did cross my mind, I agree with you. Makes sense to move to constraint once the context becomes more complicated.

internal_topics.cbegin(),
internal_topics.cend(),
[&ntp](const ss::sstring& topic) {
return ntp.ns() == model::kafka_namespace && topic == ntp.tp.topic();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I guess namespace check can be hoisted out of the any_of loop

@bharathv bharathv self-assigned this Jun 29, 2023
Now that the constraint check depends on the ntp being allocated, this
propagates ntp into hard_constraints evaluator.
These topics are meant to be excluded from the reservation checks.
To be used in the next set of commits.
@bharathv
Copy link
Contributor Author

Rebased to fix conflicts and addressed comments.

@bharathv bharathv requested a review from ztlpn June 30, 2023 16:06
@piyushredpanda piyushredpanda merged commit e6191cf into redpanda-data:dev Jul 2, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

controller: create internal topics at startup, or reserve partition allocator space for them
3 participants