Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Profiler/instrumentation support and promise jobs #34

Merged
merged 22 commits into from
Mar 2, 2023

Conversation

mhofman
Copy link
Contributor

@mhofman mhofman commented Feb 4, 2023

This is the change included in Agoric/agoric-sdk#6920

When approved I plan on force pushing to master to avoid dealing with churning the upstream PR. Leaving this PR as "draft to indicate it won't me merged through GitHub)

I tried to cleanly merge @phoddie's fork, but there seemed to have been a lot of merge conflicts churn. Hopefully an actual merge commit will allow future syncs to succeed more easily. The merge with the most conflict resolution is a0627a8. I recommend reviewing with a tool that allows correctly visualizing merge commit and branching.

The instrumentation changes were disabled as I wasn't confident the fxIsConnected returning 1 unconditionally didn't have a side effect since it seems fxParseScript has an extra call in that case.

I also included a tweak to the run loop to process promise jobs according to a discussion I had with @phoddie and @patrick-soquet.

This also includes all the fixes of the psm branch which are already included on agoric-sdk master.

phoddie and others added 19 commits May 20, 2022 10:06
Rather than treating all unknown commands as a request for the worker
to silently exit (rc=0), we make a specific 'q' command for that
purpose. Unknown commands now cause an error exit (rc=2), with a
message (to stderr) about what the surprising command was.

This includes the characters that are only used in downstream
responses to upstream `issueCommand()` calls, which might help
diagnose desynchronization errors faster.
commit 'aa494cee6f6b5cda70f7c685535940dae4321252'
@mhofman mhofman force-pushed the mhofman/6759-update-xs branch from ccc51a6 to 2a42752 Compare February 7, 2023 23:46
@mhofman
Copy link
Contributor Author

mhofman commented Feb 7, 2023

Alright so apparently select fails with EBADF if the fd doesn't exist, which is still the case for 5 in our xsnap integration. I couldn't figure out how to dynamically test for the existence of the fd, so I gated the new select logic on mxInstrument.

Speaking of which, some checks in the code do #if mxInstrument and some others do #ifdef mxInstrument so my earlier disablement through -DmxInstrument=0 was not correct.

Btw, leaving mxInstrument=1 actually hangs indefinitely earlier if fd 5 is not connected, since it seems the create/login logic calls fxReceive which will hang. That's what I had understood when initially reading the code, and why I had attempted to disable the instrument logic.

To avoid churn I forced pushed the changes: ccc51a6..2a42752

@mhofman mhofman changed the base branch from master to mhofman/6759-update-xs-base February 8, 2023 16:01
@mhofman
Copy link
Contributor Author

mhofman commented Feb 8, 2023

Ugh apparently this repo requires approving reviews now before pushing to master, which is somewhat surprising given that agoric-sdk can just use any commit that exists in this repo. @warner do you want to relax the the requirement or should be land this as a merge commit, which would mean agoric-sdk is using the right-side merge parent instead of the merge commit itself?

@warner
Copy link
Collaborator

warner commented Feb 8, 2023

Let's see, we want agoric-sdk master (or, more precisely, the development version of agoric-sdk/packages/xsnap) to be using trunk of this repo. Let's do a full PR and review of this change to get it into xsnap-pub trunk, then let's update agoric-sdk to point at the merged/trunk commit instead of the right-side merge parent (even though the code will be the same).

@mhofman mhofman marked this pull request as ready for review February 8, 2023 20:34
@mhofman
Copy link
Contributor Author

mhofman commented Feb 8, 2023

Please review then :)

@mhofman mhofman changed the base branch from mhofman/6759-update-xs-base to master February 8, 2023 20:35
Copy link
Collaborator

@warner warner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One question for you, but otherwise looks good

void fxRunLoop(txMachine* the)
{
c_timeval tv;
txNumber when;
txJob* job;
txJob** address;
fxEndJob(the);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a potential behavioral change, but I can't remember why we needed this here. I think you added it in 84aa01a , but the comment just says "Platform runloop tweaks, Snapshot dump tweaks".

I think the command we send down to the worker gets its first turn of execution before we call fxRunLoop(). I think fxEndJob() basically does finalization callbacks and fxCheckUnhandledRejections. We probably don't do a lot during that first turn (https://github.com/Agoric/agoric-sdk/blob/2529e82427277d98ea777c5a35e6bd753e65e6f7/packages/SwingSet/src/supervisors/supervisor-helper.js#L32 does a deliberate stall, which would consume the first turn), but it's conceivable that this might trigger organic GC, and then some finalizers could become ready to run.

I'm ok with this change as long as you can't think of any reason why we need to maintain the previous behavior.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The previous behavior was "not expected" and buggy.

If the command job enqueued promises, it would not drain those, but instead run endJob first. Only at that point it would drain promise queues (for both jobs)

The expectation is that reactions form an I/O job are processed immediately, then come the "gc" jobs (and its reactions).

The previous addition was necessary because if the main command job didn't queue promises, endJob would not trigger. It was actually the wrong fix because the same thing would happen if a subsequent I/O job (e.g. setImmediate) didn't queue promises, it wouldn't run endJob either.

Now the logic does the expected thing:

  • Run the I/O job (command or timers), and drain promise reactions
  • Run GC jobs and check for unhandled errors, and drain promise reactions
  • Repeat

@mhofman mhofman merged commit 134c8a7 into master Mar 2, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants