-
Notifications
You must be signed in to change notification settings - Fork 119
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(s2n-quic-xdp): add async tasks #1730
Conversation
f5176b5
to
35ce613
Compare
let mut new_value = self.consumer().load(Ordering::Acquire); | ||
|
||
// Our cached copy has the size added so we also need to add the size here when comparing | ||
// | ||
// See `Self::init_producer` for more details | ||
new_value += self.size; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was a fun bug to hunt down... The optimization that I put in to compare the new_value
against the cached value backfired and caused a deadlock when the consumer incremented the cursor by size
(meaning they consumed the entire ring).
I added a kani proof to make sure the fix worked as well.
Without
SUMMARY:
** 1 of 1720 failed (9 unreachable)
Failed Checks: attempt to add with overflow
File: "/home/cameron/Projects/aws/s2n-quic/tools/xdp/s2n-quic-xdp/src/ring/cursor.rs", line 154, in ring::cursor::Cursor::<u32>::acquire_producer
VERIFICATION:- FAILED
Verification Time: 65.64434s
With
SUMMARY:
** 0 of 1719 failed (9 unreachable)
VERIFICATION:- SUCCESSFUL
Verification Time: 72.64367s
// let the poller know how many items we consumed | ||
poller.release(comp, sent); | ||
|
||
// if all of the queues are closed then shut down the task | ||
if closed == txs.len() { | ||
trace!("all tx queues closed; shutting down"); | ||
return Poll::Ready(()); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Initially I had the ordering of these two reversed, which caused the fuzz tests to randomly fail at the end and drop a few descriptors.
debug_assert!(frame_size.is_power_of_two()); | ||
|
||
let shift = frame_size.trailing_zeros(); | ||
|
||
debug_assert_eq!( | ||
frame_size, | ||
2u32.pow(shift), | ||
"computing the square root of a power of two is counting the trailing zeros" | ||
); | ||
|
||
Self { shift } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea here is if we know that the frame_size is a power of two, it's much cheaper to shift bits than perform regular integer division.
let queues = queues as u64; | ||
debug_assert!(queues.is_power_of_two()); | ||
let mask = queues - 1; | ||
Self { mask } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here. We can mask off the final result and it's equivalent to doing a mod
operation.
} | ||
|
||
/// The number of items to send through the test queues | ||
pub const TEST_ITEMS: usize = 10_000; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before pushing I set this to 10_000_000_000
and let it run for about 30 minutes so I think we can be pretty confident in it holding the invariants defined in the test harness.
/// | ||
/// This value is purposefully low to more frequently trigger corner cases of | ||
/// queues wrapping and/or getting full. | ||
pub const QUEUE_SIZE: usize = 16; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we also be testing with larger queue sizes than this? I don't expect this is close to what we would actually want the queue size to be.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes this is much smaller than what the production queue sizes will be. However, I think after a certain size the implementation will behave the same. As long as you have a few primes and even numbers in the range I think we should be fine. That being said, it's not hard to add more at larger sizes if we'd like.
As mentioned in the comments it's set low enough where we're running into the edge cases more frequently so the testing time is better utilized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added large queue sizes in badcca6
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still reviewing, I've looked at everything except for src/task/*
|
||
#[test] | ||
fn rx_tx_test() { | ||
let _ = rx_tx(16); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was 16
chosen here for the same reason as QUEUE_SIZE_SMALL
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one is arbitrary. Just needs to be a power of two and relatively small so we don't consume a bunch of memory in the unit test
|
||
pub fn attach_umem(&self, umem: &crate::umem::Umem) -> Result<()> { | ||
umem.attach(self)?; | ||
// TODO store the umem |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you need an issue for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nah. I'm going to be addressing it with the next PRs
frame_size.is_power_of_two(), | ||
tx_queues.len().is_power_of_two(), | ||
) { | ||
(0, _, _) => panic!("invalid tx_queues size"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(0, _, _) => panic!("invalid tx_queues size"), | |
(0, _, _) => unreachable!("tx_queues must be non-zero length"), |
/// which TX queues get which descriptors. This trait takes in a descriptor and decides if it | ||
/// pertains to a worker index or not. | ||
pub trait Assign: Unpin { | ||
fn assign(&self, desc: UmemDescriptor, idx: u64) -> bool; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would is_assigned
be more appropriate here since its not actually assigning anything?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah that's fine
"the number of actual items should not exceed what was acquired" | ||
); | ||
|
||
let len = len.min(actual); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does the comment from completion_to_tx.rs
apply here as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one is a bit different. We made a debug assertion above it so this should be the same as just doing let len = actual
, assuming that assertion holds. Just a little defensive here.
a4a97f0
to
919e7ce
Compare
Co-authored-by: Wesley Rosenblum <[email protected]>
Description of changes:
This PR adds a set of async tasks responsible for managing ring buffer and queue state
Fundamentally, each task takes a set of input sources and routes them to one or more output
queues. Each task is generic over the execution environment, meaning it can be using in
something driven by polling for events, like
tokio
, or spawned on its own thread in a busypoll loop.
Call-outs:
These new tests caught an issue in the ring cursor implementation that deadlocked if the consumer consumed the entire ring from being completely full. Very fun to debug that one...
Testing:
The ordering of operations in each of the tasks is critical for correctness. It's very easy to
get into a deadlock if things aren't exactly right. As such, each task has a fuzz test that
tries to show the tasks working properly, even in extreme cases.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.