More clearly define how seller trusted signals fetching works #1230

Merged: 37 commits merged into WICG:main on Sep 6, 2024

Conversation

morlovich (Collaborator) commented on Jul 22, 2024:

This explicitly represents a parallel process picking sets of requests to merge, and what rules such merging must follow.
(It does not specify how such subsets are selected, since implementations have considerable freedom in how to do it).
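
Very roughly, the shape of what this adds, as a minimal sketch with invented names (the spec constrains what may be merged, not how subsets are picked, so the selection heuristic below is purely illustrative):

#include <map>
#include <string>
#include <vector>

struct SignalsRequest { std::string render_url; };

struct BatchKey {
  std::string signals_url;
  bool operator<(const BatchKey& other) const { return signals_url < other.signals_url; }
};

using FetchFn = void (*)(const BatchKey&, const std::vector<SignalsRequest>&);

// One pass of the parallel batching step: for each key, an implementation may
// pick any non-empty subset of the pending requests and merge that subset
// into a single fetch; this sketch simply takes everything under a key.
void BatchOnce(std::map<BatchKey, std::vector<SignalsRequest>>& pending, FetchFn fetch) {
  for (auto& [key, requests] : pending) {
    if (requests.empty()) continue;
    fetch(key, requests);   // one merged trusted-signals fetch per chosen subset
    requests.clear();
  }
}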


morlovich (Collaborator, Author):

@MattMenke2 Would appreciate your review on this as well, in part to see if it moves things to where specifying V2 would be easier. (Unfortunately bidder stuff is totally different...)

MattMenke2 (Contributor) left a comment:

Not quite a full review - want to poke a bit more tomorrow.

spec.bs (outdated excerpt under review):
:: A [=list=] of [=trusted scoring signals requests=] that hasn't yet been organized to aid
batching.
: <dfn>request map</dfn>
:: A [=map=] from a tuple of [=script fetcher=], [=URL=], {{unsigned short}} or null, and
MattMenke2 (Contributor):

Why is script fetcher here? Currently, it looks like there's a 1:1 mapping a batch to fetcher, so we could make it a batcher-wide value. Even if we wanted to merge requests for different fetchers, I assume we wouldn't want it as a key in the map, because then it would prevent merging requests. Am I missing something?

morlovich (Collaborator, Author):

We do want it to prevent merging requests, because if the script fetchers for the batch are not the same, the cross-origin header check gets weird.

I don't have a good answer as to what happens cross-auction, though. Or if we somehow have two component sellers with the same script.

(Similarly, there is a question of how much reuse of bidder script sellers can actually happen; I think it ought to be implementation-defined within some bounds, since for us it depends on process limits!)
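
To make the keying concrete, a hypothetical sketch (field names invented; the actual tuple is the truncated list quoted above): requests whose keys differ in any component, including the script fetcher, can never land in the same batch.

#include <optional>
#include <string>
#include <tuple>

// Hypothetical sketch of the request-map key. Two requests may only be merged
// if every component compares equal, so keeping the script fetcher in the key
// keeps requests attached to different fetchers in separate batches.
struct RequestMapKey {
  const void* script_fetcher;                  // identity of the [=script fetcher=]
  std::string url;                             // the [=URL=]
  std::optional<unsigned short> maybe_short;   // the {{unsigned short}} or null
  // ... plus the remaining components of the tuple (elided in the quote above).

  bool operator<(const RequestMapKey& other) const {
    return std::tie(script_fetcher, url, maybe_short) <
           std::tie(other.script_fetcher, other.url, other.maybe_short);
  }
};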

MattMenke2 (Contributor):

We create a new batcher for each fetcher, though, so the fetcher is always the same, and could be a top-level pointer.

morlovich (Collaborator, Author):

Yeah, but this removes the decision from here?

MattMenke2 (Contributor):

I mean, it already is there, so I'm wondering why it's here as well? Do you plan to change things to use a global batcher? If we're partitioning by script fetcher anyways, I'm not sure what that would get us?

MattMenke2 (Contributor):

If we don't share fetches when the script fetcher is different, I'm not sure that gets us anything, though?

morlovich (Collaborator, Author):

We could make the batcher global (doesn't V2 require that for the cache?) and then maybe have the fetcher per auction, or have some optional way for fetchers to be shared (which is what impl does if there are concurrent auctions)?

MattMenke2 (Contributor):

V2 does indeed use a global batcher. In V2, I also have no plans to partition anything based on the fetcher, just on the seller origin (and main frame origin, and URL, and doubtless a couple other things). Guess perhaps I should have led with that.

morlovich (Collaborator, Author):

impl-wise, the partitioning by script fetcher roughly corresponds to different script URLs getting different worklet processes, and in cross-origin case those waiting for script headers before asking trusted signals for stuff. I guess in spec those two would eventually need to be separate stages of the batching process, with cross-origin things first waiting for the headers before being considered for merging, but that doesn't seem to be necessary yet.

The spec just does a bad job of representing this sort of thing at all; the previous CL you reviewed changed "we fetch the script every single time we call scoreAd" to "we fetch it once per component auction", which is closer but not 100%... and re-arranging lifetime of ScriptFetchers is basically about that.

(One of my TODO entries is doing that for bid scripts, since those still fetch for every generateBid()).

MattMenke2 (Contributor):

Different script URLs share worklet processes, just not objects within the process.

Anyhow, I'm OK with this as-is, just wanted to dive into it a bit. Should finish reviewing this by EOD.

spec.bs (outdated excerpt under review):

1. Until this object is no longer needed:
1. Wait until |batcher|'s [=trusted scoring signals batcher/request queue=] is no longer empty or
some heuristically chosen amount of time has passed.
MattMenke2 (Contributor):

"...or some time has passed and it's no longer empty", perhaps? I think we'd rather not think about running other steps with an empty queue.

morlovich (Collaborator, Author):

No, because this may be empty but the "request map" field might not be.

... Though the distinction between request queue and request map only exists because I have no idea how to express an atomic update to "request map", so the ideal solution would be to only have the request map, have things atomically inserted into it, and then the condition can be as you said.

MattMenke2 (Contributor):

If this is intended to trigger when the queue is empty, then "Wait until |batcher|'s [=trusted scoring signals batcher/request queue=] is no longer empty" doesn't work alone. I had assumed this was an exclusive or. I still think we want to rephrase this.

morlovich (Collaborator, Author):

Well... it basically does two things:

  1. Wake up when there is new stuff in the queue.
  2. Wake up when we feel like batching things based on some timer.

In either case we may send requests. Or might not. So I don't really understand what you mean by 'exclusive' here...

MattMenke2 (Contributor):

So I read this as do one or the other, not having two different triggers in the same implementation. i.e.:

option 1:

void ThreadFunc() {
  while (running()) {
    WaitForNonEmpty();
    DoStuff();
  }
}

or

void ThreadFunc() {
  while (running()) {
    WaitForHeuristic();
    DoStuff();
  }
}

As opposed to:

void ThreadFunc() {
  while (running()) {
    MagicallyCombineWaits(&WaitForNonEmpty, &WaitForHeuristic);
    DoStuff();
  }
}

I think we should make clear it's the latter. The heuristic wait is a mandatory part of the algorithm (at least if the other heuristic can leave the map non-empty).

morlovich (Collaborator, Author):

Thanks, yeah, that would be an issue. Do you perhaps have suggestions on clearer phrasing?

MattMenke2 (Contributor):

Maybe:

Wait until one of the following is true:

  1. |batcher|'s [=trusted scoring signals batcher/request queue=] is no longer empty.
  2. |batcher|'s [=trusted scoring signals batcher/request map=] is non-empty and some heuristically chosen amount of time has passed.

morlovich (Collaborator, Author):

Thanks, this is indeed much better; done, though I used "at least one"

MattMenke2 (Contributor) left a comment:

I think this looks good.

morlovich requested review from domfarolino and miketaylr and removed the review request for domfarolino on Jul 26, 2024.
JensenPaul added the "spec" label (Relates to the spec) on Jul 30, 2024.
morlovich (Collaborator, Author):

Ping, Mike? Or should I bounce this back to Dom?

miketaylr (Collaborator) left a comment:

Very sorry for the delay. Rubber stamp given Matt's review (which I think is sufficient going forward).

morlovich (Collaborator, Author):

@qingxinwu This probably warrants extra attention to the "batch and fetch trusted scoring signals" algorithm; it does some weird stuff.

(And I am reminded I was supposed to be reviewing your CL...)

spec.bs (outdated excerpt under review):
To <dfn>batch and fetch trusted scoring signals</dfn> given a [=trusted scoring signals batcher=]
|batcher|:

1. Until this object is no longer needed:
Collaborator:

"this object" means |batcher|? And I'm not sure what "no longer needed" means

morlovich (Collaborator, Author):

It means that the implementer gets to figure out memory management for this thing... (did change to |batcher|)

spec.bs (outdated excerpt under review):
1. |batcher|'s [=trusted scoring signals batcher/request queue=] [=map/is not empty=].
1. |batcher|'s [=trusted scoring signals batcher/request map=] [=map/is not empty=] and some
heuristically chosen amount of time has passed.
1. Atomically transfer all entries in |batcher|'s [=trusted scoring signals batcher/
Collaborator:

is this just a [=list/clone=], and then [=list/empty=] |batcher|'s [=trusted scoring signals batcher/request queue=]? I'm not sure if the "Atomically" makes much difference here spec wise?

morlovich (Collaborator, Author):

If it's not atomic, another thread may append the entry in between the two operations, and the entry would get lost.
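
For illustration, a minimal sketch of the difference in implementation terms (invented names; the spec only states the atomicity constraint, not this code):

#include <mutex>
#include <utility>
#include <vector>

struct Request {};

std::mutex queue_lock;
std::vector<Request> request_queue;  // appended to from other threads

// Clone-and-clear done as one indivisible step: nothing can be appended
// between the copy and the clear, so no entry is lost.
std::vector<Request> TakeAllPendingRequests() {
  std::lock_guard<std::mutex> guard(queue_lock);
  return std::exchange(request_queue, {});
}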

Collaborator:

Ah I see.

  1. Add a first step to this algorithm to assert this is running in parallel (e.g., https://github.com/WICG/turtledove/blob/main/spec.bs#L1635).
  2. Atomically do: 1. [=list/clone=] ... 1. [=list/empty=]... may be better, since it uses standard methods, instead of "transfer entries", which does not seem clear spec-wise.
  3. Do you know if there's a good way to spec "atomically"? I vaguely remembered that you asked about this in one PR. If not, I may want to tag Dominic about their ideas.

morlovich (Collaborator, Author):

Done. No idea on 3

Collaborator:

@domfarolino Is there a way to spec the concept of "atomically"? Not sure if the term itself is enough.

Collaborator:

Ah https://html.spec.whatwg.org/multipage/infrastructure.html#starting-a-new-parallel-queue (and the following example) may be also related.

Well, only if the two execution contexts are actually the "same", in which case they don't need to be distinct and in parallel to each other. But they do need to be distinct, I suppose, given @morlovich's comment.

qingxinwu (Collaborator) commented on Sep 5, 2024:

Right, just saw Maks's comment (that comment didn't show up yet when I commented 😂 ). So ignore my previous comment then.

morlovich (Collaborator, Author):

Note added. Using an actual queue like the parallel queue infra was considered, but it feels awkward to combine with the "just sleep for a bit, too, that's OK" stuff...

Collaborator:

Can you clarify what you mean by "just sleep for a bit, too, that's OK"? And what bearing it would have on your adoption of a parallel queue? (I'm not saying you should use a parallel queue here; I'm just trying to make sure you don't think using one requires arbitrary synchronous sleep steps.)

morlovich (Collaborator, Author):

So I wasn't really speaking of not using a parallel queue --- this does after all --- but rather of modelling the fetch request queue it uses on how the "start a new parallel queue" algorithm Qingxin linked to is written --- which is basically continuously spin-dequeueing. On further thought, that approach can actually work here, too, but I think it's less clear than talking about sleeping, since conceptually what this is saying is that implementations waiting for a few milliseconds to merge requests is an acceptable implementation choice. (As is not waiting!)
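
In code terms, the point is roughly this (a sketch, invented names): the wait step may return either because a request arrived or because a short, heuristically chosen delay elapsed, and waiting a few milliseconds purely to grow the batch is a legitimate choice, as is not waiting at all.

#include <chrono>
#include <condition_variable>
#include <mutex>
#include <vector>

struct Request {};

std::mutex queue_lock;
std::condition_variable wake;
std::vector<Request> request_queue;  // notified via |wake| when appended to

// One wait step of the batching loop: wake when something is enqueued, or
// after a heuristic delay, whichever comes first. The delay exists only to
// let more requests accumulate so they can be merged; zero delay is fine too.
void WaitForWorkOrTimeout(std::chrono::milliseconds heuristic_delay) {
  std::unique_lock<std::mutex> guard(queue_lock);
  wake.wait_for(guard, heuristic_delay, [] { return !request_queue.empty(); });
}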

spec.bs (outdated excerpt under review):
empty [=list=].
1. [=list/Append=] |request| to |batcher|'s [=trusted scoring signals batcher/request map=]
[|key|].
1. Some number of times, heuristically, select a |key| and a non-empty sublist of |batcher|'s
Collaborator:

"some number of times", maybe "[=iteration/While=] |batcher|'s [=trusted scoring signals batcher/request map=] [=map/is not empty=]"?

morlovich (Collaborator, Author):

That wouldn't do what's desired --- essentially an implementation can choose how much to batch; it doesn't have to handle everything it's got pending.

qingxinwu (Collaborator) commented on Aug 29, 2024:

then my understanding of the sentence here is:

for i in range(0, a heuristically chosen number) {
  select a key and a sublist of requestMap[key];
  handle the batch;
}

where after the loop, the request map does not need to be empty. Is that correct?
I thought after running the loop, we wanted to have all requests in requestMap handled (sure not in a single batch, but in whatever number of batches they chose). Otherwise, I'm looking for a place that ensures all pending requests are fetched in the end.

morlovich (Collaborator, Author):

Yup, since it might want to wait a bit to collect more stuff in request map to get a bigger batch.

Hmm, I guess we technically are not requiring eventual progress here. Not immediately sure how to do so.

Collaborator:

Maybe add a note that this does not need to be drained during the for loop and can wait for more requests to batch, but just needs to make sure all requests are batched (requestMap is empty) in the end?

morlovich (Collaborator, Author):

Rephrased the note there somewhat (not pushed yet, though, working on Dominic's suggestion, too).

morlovich (Collaborator, Author):

Hmm, build failure talking about [=fenced frame config/effective sandbox flags=]

morlovich (Collaborator, Author):

Looks to be because of WICG/fenced-frame@00676a5
I'll try to put together a minimal fix, though looks like there is a TODO to be TODONE by someone more familiar with this.

qingxinwu (Collaborator):

almost there. Just tagged Dom for one question, and #1230 (comment) is the last comment I have.

morlovich (Collaborator, Author):

Looks like another change in the fenced frame spec breaking our build; looking into it.

qingxinwu (Collaborator) left a comment:

LGTM

JensenPaul merged commit 5af3f7f into WICG:main on Sep 6, 2024. 2 checks passed.
Labels: spec (Relates to the spec)
6 participants