Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(s2n-quic-xdp): add async tasks #1730

Merged
merged 5 commits into from
May 2, 2023
Merged

feat(s2n-quic-xdp): add async tasks #1730

merged 5 commits into from
May 2, 2023

Conversation

camshaft
Copy link
Contributor

@camshaft camshaft commented Apr 26, 2023

Description of changes:

This PR adds a set of async tasks responsible for managing ring buffer and queue state

Fundamentally, each task takes a set of input sources and routes them to one or more output
queues. Each task is generic over the execution environment, meaning it can be using in
something driven by polling for events, like tokio, or spawned on its own thread in a busy
poll loop.

Call-outs:

These new tests caught an issue in the ring cursor implementation that deadlocked if the consumer consumed the entire ring from being completely full. Very fun to debug that one...

Testing:

The ordering of operations in each of the tasks is critical for correctness. It's very easy to
get into a deadlock if things aren't exactly right. As such, each task has a fuzz test that
tries to show the tasks working properly, even in extreme cases.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

Comment on lines 146 to 151
let mut new_value = self.consumer().load(Ordering::Acquire);

// Our cached copy has the size added so we also need to add the size here when comparing
//
// See `Self::init_producer` for more details
new_value += self.size;
Copy link
Contributor Author

@camshaft camshaft Apr 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a fun bug to hunt down... The optimization that I put in to compare the new_value against the cached value backfired and caused a deadlock when the consumer incremented the cursor by size (meaning they consumed the entire ring).

I added a kani proof to make sure the fix worked as well.

Without

SUMMARY:
 ** 1 of 1720 failed (9 unreachable)
Failed Checks: attempt to add with overflow
 File: "/home/cameron/Projects/aws/s2n-quic/tools/xdp/s2n-quic-xdp/src/ring/cursor.rs", line 154, in ring::cursor::Cursor::<u32>::acquire_producer

VERIFICATION:- FAILED
Verification Time: 65.64434s

With

SUMMARY:
 ** 0 of 1719 failed (9 unreachable)

VERIFICATION:- SUCCESSFUL
Verification Time: 72.64367s

Comment on lines +310 to +319
// let the poller know how many items we consumed
poller.release(comp, sent);

// if all of the queues are closed then shut down the task
if closed == txs.len() {
trace!("all tx queues closed; shutting down");
return Poll::Ready(());
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initially I had the ordering of these two reversed, which caused the fuzz tests to randomly fail at the end and drop a few descriptors.

Comment on lines 54 to 65
debug_assert!(frame_size.is_power_of_two());

let shift = frame_size.trailing_zeros();

debug_assert_eq!(
frame_size,
2u32.pow(shift),
"computing the square root of a power of two is counting the trailing zeros"
);

Self { shift }
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea here is if we know that the frame_size is a power of two, it's much cheaper to shift bits than perform regular integer division.

Comment on lines +105 to +109
let queues = queues as u64;
debug_assert!(queues.is_power_of_two());
let mask = queues - 1;
Self { mask }
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here. We can mask off the final result and it's equivalent to doing a mod operation.

}

/// The number of items to send through the test queues
pub const TEST_ITEMS: usize = 10_000;
Copy link
Contributor Author

@camshaft camshaft Apr 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before pushing I set this to 10_000_000_000 and let it run for about 30 minutes so I think we can be pretty confident in it holding the invariants defined in the test harness.

@camshaft camshaft marked this pull request as ready for review April 27, 2023 02:50
@WesleyRosenblum WesleyRosenblum self-requested a review April 27, 2023 18:16
///
/// This value is purposefully low to more frequently trigger corner cases of
/// queues wrapping and/or getting full.
pub const QUEUE_SIZE: usize = 16;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also be testing with larger queue sizes than this? I don't expect this is close to what we would actually want the queue size to be.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes this is much smaller than what the production queue sizes will be. However, I think after a certain size the implementation will behave the same. As long as you have a few primes and even numbers in the range I think we should be fine. That being said, it's not hard to add more at larger sizes if we'd like.

As mentioned in the comments it's set low enough where we're running into the edge cases more frequently so the testing time is better utilized.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added large queue sizes in badcca6

Copy link
Contributor

@WesleyRosenblum WesleyRosenblum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still reviewing, I've looked at everything except for src/task/*

tools/xdp/s2n-quic-xdp/Cargo.toml Show resolved Hide resolved

#[test]
fn rx_tx_test() {
let _ = rx_tx(16);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was 16 chosen here for the same reason as QUEUE_SIZE_SMALL?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is arbitrary. Just needs to be a power of two and relatively small so we don't consume a bunch of memory in the unit test


pub fn attach_umem(&self, umem: &crate::umem::Umem) -> Result<()> {
umem.attach(self)?;
// TODO store the umem
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you need an issue for this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nah. I'm going to be addressing it with the next PRs

tools/xdp/s2n-quic-xdp/src/task.rs Show resolved Hide resolved
frame_size.is_power_of_two(),
tx_queues.len().is_power_of_two(),
) {
(0, _, _) => panic!("invalid tx_queues size"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
(0, _, _) => panic!("invalid tx_queues size"),
(0, _, _) => unreachable!("tx_queues must be non-zero length"),

/// which TX queues get which descriptors. This trait takes in a descriptor and decides if it
/// pertains to a worker index or not.
pub trait Assign: Unpin {
fn assign(&self, desc: UmemDescriptor, idx: u64) -> bool;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would is_assigned be more appropriate here since its not actually assigning anything?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that's fine

tools/xdp/s2n-quic-xdp/src/task/rx.rs Show resolved Hide resolved
"the number of actual items should not exceed what was acquired"
);

let len = len.min(actual);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does the comment from completion_to_tx.rs apply here as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is a bit different. We made a debug assertion above it so this should be the same as just doing let len = actual, assuming that assertion holds. Just a little defensive here.

WesleyRosenblum
WesleyRosenblum previously approved these changes May 2, 2023
tools/xdp/s2n-quic-xdp/src/task/completion_to_tx.rs Outdated Show resolved Hide resolved
@camshaft camshaft merged commit e4314d2 into main May 2, 2023
@camshaft camshaft deleted the camshaft/xsk-tasks branch May 2, 2023 16:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants