Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add forward iterator to async_for_each #16676

Merged
merged 1 commit into from
Feb 23, 2024

Conversation

travisdowns
Copy link
Member

async_for_each originally assumed random access iterators, but we can also support forward iterators at the cost of some complexity.

The main limitation with forward iterators is that we cannot do arithmetic on the iterators to determine the size of the range and to pre-calculate end iterators. So we use counted loops instead when those iterators are used (we dispatch to a separate path for random access iterators).

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v23.3.x
  • v23.2.x
  • v23.1.x

Release Notes

  • none

async_for_each originally assumed random access iterators, but we
can also support forward iterators at the cost of some complexity.

The main limitation with forward iterators is that we cannot do
arithmetic on the iterators to determine the size of the range and
to pre-calculate end iterators. So we use counted loops instead when
those iterators are used (we dispatch to a separate path for random
access iterators).
if (counter.count >= Traits::interval) {
co_await ss::coroutine::maybe_yield();
counter.count = 0;
Traits::yield_called();
}
} while (begin != end);

counter.count += FIXED_COST;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this was only really needed in the fast case?

Copy link
Member Author

@travisdowns travisdowns Feb 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do it both cases both before and after this change. The old logic was:

fast() {

  if (fast case applies) {
     do_fast_case
     coutner += .... + FIXED_COST
     return;
  }
  
  return slow_case();
}

So we needed the increment in slow case since we do the fast case only if it applies.

New logic is like:

fast() {

     do_fast_case(... first chunk ...)
     coutner += .... + FIXED_COST
     if (fast_case did the whole thing) {
        return;
     }
  
  return slow_case();
}

That is, we do the fast case unconditionally and then check if it was enough. So FIXED_COST is already always added before slow_case executes so we do not need to do it again here.

That said, handling this "right" in the slow case doesn't matter much because FIXED_COST = 1 and so only makes a meaningful difference for small inputs (especially length=0) which would be handled by the fast case. So if it's useful we can be a bit sloppy in the math. The main to avoid is that processing a very long run of zero-length inputs does yield now and then.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New logic is about 20% slower though for small cases, not sure what's up with that.


// using container_types = ::testing::Types<std::vector<int>>;
using container_types
= ::testing::Types<std::vector<int>, std::unordered_map<int, int>>;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make all tests except one execute for both vector and unordered_map (latter of which has only a forward iterator).


task_counter t_counter;
ssx::async_for_each_counter<test_traits<3>>(
a_counter, v.begin(), v.end(), add_one)
ssx::async_for_each_counter<test_traits<4>>(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor math changes, related to exactly if we yield this time or the next when we are right at a boundary. I don't think it matters (people should not rely on the exact yielding semantics which are timing dependent anyway).

@vbotbuildovich
Copy link
Collaborator

vbotbuildovich commented Feb 22, 2024

@travisdowns travisdowns merged commit 1bd3b60 into redpanda-data:dev Feb 23, 2024
16 checks passed
@vbotbuildovich
Copy link
Collaborator

/backport v23.3.x

@vbotbuildovich
Copy link
Collaborator

/backport v23.2.x

@vbotbuildovich
Copy link
Collaborator

Failed to create a backport PR to v23.3.x branch. I tried:

git remote add upstream https://github.com/redpanda-data/redpanda.git
git fetch --all
git checkout -b backport-pr-16676-v23.3.x-887 remotes/upstream/v23.3.x
git cherry-pick -x c667749ed2c1ae3f042d20e0c0218217b1096b3c

Workflow run logs.

@vbotbuildovich

This comment was marked as outdated.

@travisdowns
Copy link
Member Author

/backport v23.3.x

@travisdowns
Copy link
Member Author

/backport v23.2.x

@vbotbuildovich

This comment was marked as outdated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants