Timeouts, overlapped operations, and composability #2545
Labels: discuss, methodology, ooni/probe-engine, priority/medium, question, research prototype
TL;DR

The work on beacons (see #2531) led me to reflect on how a composable, channel-based pattern could help us preserve the DSL's composability while enabling overlapped operations in light of our timeout policy, thus allowing us to use the available time more efficiently.

Background

My initial prototype for beacons (see #2531) had this structure:
Where:
- `GenerateTactics` was a function that internally called a resolver and generated tactics for the union of (a) the resolved and hardcoded IP addresses and (b) the acceptable SNIs;
- `UseTactics` was a function that walked the list of tactics and attempted to connect and TLS handshake.

In other words, if `Resolver` has this interface:

the `GenerateTactics` interface was:

This initial design stemmed from the observation that, by replacing a `Resolver` with a `GenerateTactics`, and by adapting the TCP and TLS dial accordingly, we could implement the desired beacons functionality.

In fact, the initial implementation of `GenerateTactics` was just a wrapper for a `Resolver` that converted the resolved IP addresses to tactics; and the initial implementation of `UseTactics` was refactored from a trivial loop that tries each available IP address with TCP connect and TLS handshake until one IP address works or all have failed.

However, quite soon I modified `GenerateTactics` to become:

This issue is here to explain (1) why I applied this change and (2) how we can stretch this design change to achieve beneficial outcomes in terms of efficiency (i.e., how many attempts we can pack in N seconds) and composability.
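The snippets referenced in this section are not reproduced here, so what follows is only a minimal sketch of the kind of change being described: a tactics generator that returns a channel instead of a slice, so a consumer can start on the hardcoded addresses while the DNS lookup is still running. `Tactic`, `LookupFunc`, `generateTactics`, `slowLookup`, and port 443 are all hypothetical and are not the actual code from #2531.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

// Tactic is a hypothetical pairing of an endpoint and the SNI to use with it.
type Tactic struct {
	Endpoint string
	SNI      string
}

// LookupFunc is a hypothetical stand-in for a resolver's lookup method.
type LookupFunc func(ctx context.Context, domain string) ([]string, error)

// generateTactics streams tactics on the returned channel: first the ones
// built from hardcoded addresses, then the ones built from resolved addresses.
// The lookup runs concurrently, so the consumer never waits for DNS before
// receiving the first tactic. The channel is closed when no more tactics exist.
func generateTactics(ctx context.Context, lookup LookupFunc, domain string,
	hardcoded, snis []string) <-chan Tactic {
	out := make(chan Tactic)
	resolved := make(chan []string, 1) // buffered: the lookup never blocks on send
	go func() {
		addrs, _ := lookup(ctx, domain) // on error addrs is nil and we emit nothing
		resolved <- addrs
	}()
	go func() {
		defer close(out)
		seen := make(map[string]bool) // avoid emitting the same address twice
		emit := func(addrs []string) bool {
			for _, addr := range addrs {
				if seen[addr] {
					continue
				}
				seen[addr] = true
				for _, sni := range snis {
					tactic := Tactic{Endpoint: net.JoinHostPort(addr, "443"), SNI: sni}
					select {
					case out <- tactic:
					case <-ctx.Done():
						return false
					}
				}
			}
			return true
		}
		if !emit(hardcoded) {
			return
		}
		select {
		case addrs := <-resolved:
			emit(addrs)
		case <-ctx.Done():
		}
	}()
	return out
}

// slowLookup simulates a DNS lookup that takes a while to complete.
func slowLookup(ctx context.Context, domain string) ([]string, error) {
	select {
	case <-time.After(50 * time.Millisecond):
		return []string{"192.0.2.1", "192.0.2.2"}, nil
	case <-ctx.Done():
		return nil, ctx.Err()
	}
}

// collectDemo drains the channel; a real consumer would start dialing instead.
func collectDemo() []Tactic {
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()
	var got []Tactic
	tactics := generateTactics(ctx, slowLookup, "example.org",
		[]string{"192.0.2.1"}, []string{"a.example", "b.example"})
	for tactic := range tactics {
		got = append(got, tactic)
	}
	return got
}

func main() {
	for _, tactic := range collectDemo() {
		fmt.Printf("%s (SNI %s)\n", tactic.Endpoint, tactic.SNI)
	}
}
```

In this sketch a lookup failure just means that no resolved tactics are emitted; the consumer still receives everything derived from the hardcoded addresses.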
Efficiency

I applied this change because I realized that I wanted `UseTactics` to start running as soon as possible (i.e., using the already known beacons addresses) without waiting for the underlying DNS lookup performed by `GenerateTactics` to complete successfully or return an error. My reasoning was that the first attempt could start right away while the DNS lookup was still in progress. After thinking a bit more about this, I realized that, by applying this pattern systematically, we could pack more timeout-bound attempts into a fixed number of seconds, even factoring in happy eyeballs. (In this context, happy eyeballs is the process of staggering the tactics such that they do not all start immediately; crucially, though, we don't wait for attempt N to fail before starting attempt N+1.)

Let us now abstract from the specific use case I was working on, and focus instead on Web Connectivity LTE. There, we roughly have the following structure:
As you can see, endpoint measurements need to wait for three DNS resolvers to complete. This fact reduces the measurement efficiency in light of timeouts. For example, if DNS over HTTPS times out, that timeout is likely four seconds, and it adds to any further timeouts we may see down the line (e.g., during TCP connect).
Crucially, in DNSOverUDP we also want to check whether there are additional IP addresses returned by late replies, which are usually caused by censorship (the GFW, for example, works like this). While we currently support collecting these late replies and including them as measurements in Web Connectivity v0.5, it is not very practical for the code to wait for them before returning IP addresses to the DNSBarrier state.
Imagine, instead, that there was no DNSBarrier, just a channel that streams resolved IP addresses. In such a case we would be able to start testing early. This means that we would be able to overlap more operations in the presence of timeouts and start measuring addresses from late replies (if not duplicates) as soon as they become available.
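A minimal sketch of the "no barrier" idea, under stated assumptions: each resolver streams addresses on its own channel, and a fan-in stage forwards each new address as soon as any resolver produces it, dropping duplicates. `resolverFunc`, `mergeResolvers`, `fakeResolver`, and the delays are hypothetical and only simulate a fast resolver, a UDP resolver with a late reply, and a slow DNS-over-HTTPS resolver; this is not how Web Connectivity is implemented.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// resolverFunc is a hypothetical resolver that streams addresses as they
// become available and closes the channel when it has nothing more to say.
type resolverFunc func(ctx context.Context, domain string) <-chan string

// mergeResolvers fans in all the resolvers' outputs into a single channel.
// Each address is forwarded once, as soon as the first resolver reports it,
// so consumers never wait for the slowest resolver.
func mergeResolvers(ctx context.Context, domain string, resolvers ...resolverFunc) <-chan string {
	out := make(chan string)
	var (
		mu   sync.Mutex
		seen = make(map[string]bool)
		wg   sync.WaitGroup
	)
	for _, resolver := range resolvers {
		wg.Add(1)
		go func(resolver resolverFunc) {
			defer wg.Done()
			for addr := range resolver(ctx, domain) {
				mu.Lock()
				duplicate := seen[addr]
				seen[addr] = true
				mu.Unlock()
				if duplicate {
					continue
				}
				select {
				case out <- addr:
				case <-ctx.Done():
					return
				}
			}
		}(resolver)
	}
	go func() {
		wg.Wait() // close the output only after every resolver is done
		close(out)
	}()
	return out
}

// fakeResolver simulates a resolver that waits for delay before each address,
// which lets us model both slow resolvers and late replies.
func fakeResolver(delay time.Duration, addrs ...string) resolverFunc {
	return func(ctx context.Context, domain string) <-chan string {
		ch := make(chan string)
		go func() {
			defer close(ch)
			for _, addr := range addrs {
				select {
				case <-time.After(delay):
				case <-ctx.Done():
					return
				}
				select {
				case ch <- addr:
				case <-ctx.Done():
					return
				}
			}
		}()
		return ch
	}
}

// collectMerged drains the merged stream; a real consumer would start
// endpoint measurements for each address instead of collecting them.
func collectMerged() []string {
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()
	merged := mergeResolvers(ctx, "example.org",
		fakeResolver(10*time.Millisecond, "192.0.2.1"),                 // fast resolver
		fakeResolver(20*time.Millisecond, "192.0.2.1", "198.51.100.7"), // UDP with a late reply
		fakeResolver(80*time.Millisecond, "192.0.2.1"),                 // slow DoH
	)
	var got []string
	for addr := range merged {
		got = append(got, addr)
	}
	return got
}

func main() {
	for _, addr := range collectMerged() {
		fmt.Println("would start measuring", addr)
	}
}
```

The consumer sees the first address after the fastest resolver answers and sees the late reply's address when it arrives, without waiting for the slow resolver to finish or time out.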
Composability

The DSL (`./internal/dslx`) composes functions; for example:

creates a composed function that performs a TCP connect followed by a TLS handshake. Now, channels are also very composable in Go (and composing channels is probably as idiomatic as composing functions, if not more).
So, this interface:
could become something like:
While still being composable, this pattern has the benefit that we can have overlapped operations as mentioned above.
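To illustrate how channel-based stages can remain composable, here is a hedged sketch that does not use the actual `./internal/dslx` API. `Stage`, `Compose`, `Apply`, `fakeConnect`, and `fakeHandshake` are hypothetical names, and the connect and handshake functions are fakes that only simulate success and failure.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
	"strings"
	"sync"
)

// Stage is a hypothetical channel-based building block: it consumes values
// of type A and streams values of type B as they become available.
type Stage[A, B any] func(in <-chan A) <-chan B

// Compose chains two stages, in the same way one would compose functions.
func Compose[A, B, C any](f Stage[A, B], g Stage[B, C]) Stage[A, C] {
	return func(in <-chan A) <-chan C {
		return g(f(in))
	}
}

// Apply lifts a fallible function into a Stage. Inputs are processed
// concurrently, so a slow input does not delay the others. For brevity this
// sketch drops failures; real measurement code would record them.
func Apply[A, B any](fn func(A) (B, error)) Stage[A, B] {
	return func(in <-chan A) <-chan B {
		out := make(chan B)
		go func() {
			defer close(out)
			var wg sync.WaitGroup
			for a := range in {
				wg.Add(1)
				go func(a A) {
					defer wg.Done()
					if b, err := fn(a); err == nil {
						out <- b
					}
				}(a)
			}
			wg.Wait()
		}()
		return out
	}
}

type tcpConn struct{ Endpoint string }
type tlsConn struct{ Endpoint string }

// fakeConnect pretends that endpoints in 198.51.100.0/24 are unreachable.
func fakeConnect(endpoint string) (tcpConn, error) {
	if strings.HasPrefix(endpoint, "198.51.100.") {
		return tcpConn{}, errors.New("connect: timeout")
	}
	return tcpConn{Endpoint: endpoint}, nil
}

// fakeHandshake pretends that every TLS handshake succeeds.
func fakeHandshake(conn tcpConn) (tlsConn, error) {
	return tlsConn{Endpoint: conn.Endpoint}, nil
}

// runPipeline feeds endpoints to the composed stage and returns the sorted
// list of endpoints for which both steps succeeded.
func runPipeline(endpoints []string) []string {
	pipeline := Compose(Apply(fakeConnect), Apply(fakeHandshake))
	in := make(chan string)
	go func() {
		defer close(in)
		for _, endpoint := range endpoints {
			in <- endpoint
		}
	}()
	var done []string
	for conn := range pipeline(in) {
		done = append(done, conn.Endpoint)
	}
	sort.Strings(done) // completion order is not deterministic
	return done
}

func main() {
	endpoints := []string{"192.0.2.1:443", "198.51.100.7:443", "192.0.2.2:443"}
	for _, endpoint := range runPipeline(endpoints) {
		fmt.Println("connect + handshake ok:", endpoint)
	}
}
```

Because the input is itself a channel, such a pipeline could be fed directly by a stage that streams resolved addresses, which is where the overlap between DNS and endpoint operations would come from.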
What we should do

The `./internal/dslx` package should be refactored to use a channel-based pattern. This package is not heavily used yet, and I am still convinced we should use it to rewrite experiments because it has the functional property that we can decouple what and how. We also have completed the work of writing good QA tests with netem, which means we're now well positioned to start rewriting tests using the DSL. Using a channel-based refactoring for the DSL is a good idea before starting to rewrite because it opens up the possibility, later on, to go down the stack and apply channel-based patterns to other building blocks (e.g., the DNS-over-UDP resolver, such that we can always deliver to a consumer the additional IP addresses discovered by parsing late DNS replies).