Unbork Emerynet: Cleanup Auctioneers, PriceAuthorities, and QuoteNotifiers #10725

Open
7 tasks
toliaqat opened this issue Dec 17, 2024 · 3 comments
Labels
enhancement New feature or request

Comments

@toliaqat
Contributor

What is the Problem Being Solved?

Emerynet currently suffers from severe performance degradation due to multiple generations of auctioneer vats running simultaneously. This has led to the accumulation of thousands of ephemeral observers and QuoteNotifiers (QNs). Each additional price submission triggers massive computations and delays, causing push-price operations to run for hours. The problem is aggravated by old auctioneers continually waking up on timers, creating new observer chains and further inflating the backlog of QNs. This situation makes Emerynet slow, unresponsive, and difficult to use for testing and validation.

Description of the Design

The proposed solution involves a multi-step cleanup and reset process:

(Optional) Halt Auctioneer Timers (for Gen4/Gen5)

Attempt to stop timers associated with problematic auctioneers (e.g., v661 and v665) to prevent further growth of observers/QNs. If this isn't possible via existing governance tools, move directly to terminating these vats.

Terminate Old Auctioneer Vats

Use governance actions (coreProposals) to kill all old auctioneers (including gen1 through gen5) on Emerynet. Removing these vats will cascade a cleanup of their associated observers, QNs, and references in scaledPriceAuthorities and priceFeeds over time.

Submit Prices for All Denoms

Push updated prices for every denom to trigger the release and garbage collection of QN references in scaledPriceAuthorities (sPA) and priceFeeds (pf). While each price submission will take hours initially, this step eventually clears the backlog of QNs and observers.

Create a New, Clean Auctioneer

After full cleanup, instantiate a fresh auctioneer vat that references updated priceAuthorities without legacy baggage. This ensures a stable and clean testing environment going forward.

Monitor and Verify Stability

Observe Emerynet after the cleanup to ensure:

  • Regular auction cycles proceed without excessive overhead.
  • Price pushes complete in minutes, not hours.
  • No runaway growth of QNs or observers recurs.

Security Considerations

  • Terminating vats and pushing prices are governed actions.
  • Ensure termination and re-creation of vats follow proper governance proposals to maintain chain security and trust.
  • Verify that no active purses or valuable resources remain locked in terminated vats.

Scaling Considerations

  • The cleanup process itself may be slow and computationally expensive, involving many hours of block time.
  • Once complete, the load and resource usage should return to manageable levels, enabling Emerynet to scale normally again.
  • This operation provides insights into future upgrades and how to handle large-scale cleanup to avoid performance bottlenecks.

Test Plan

  • Before Cleanup:
    Measure the current backlog and record how long price submissions take.
  • During Cleanup:
    • Terminate old auctioneer vats and push prices.
    • Continuously monitor logs and system metrics to ensure processes are proceeding.
  • After Cleanup:
    • Confirm that QNs and observers have been dramatically reduced.
    • Push a new price and confirm it completes within minutes.
    • Run a series of standard Emerynet tests to ensure normal operation.
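The before/after comparison in this test plan could be automated along the following lines. This is only a minimal sketch: the snapshot fields (`qnCount`, `observerCount`, `pushSeconds`) and the thresholds (a 90% reduction, a 10-minute push limit) are assumptions chosen for illustration, not values or names taken from agoric-sdk or from this issue.

```javascript
// Hypothetical snapshot shape: { qnCount, observerCount, pushSeconds }.
// These field names are illustrative stand-ins for whatever metrics are
// actually collected (for example from kernel logs); they are not a real API.
function compareSnapshots(before, after, limits = {}) {
  // Assumed thresholds: "dramatically reduced" = 90%, "within minutes" = 600 s.
  const { maxPushSeconds = 600, minReduction = 0.9 } = limits;

  // Fractional reduction; a zero baseline is reported as no reduction.
  const reduction = (b, a) => (b === 0 ? 0 : (b - a) / b);

  const qnReduction = reduction(before.qnCount, after.qnCount);
  const observerReduction = reduction(before.observerCount, after.observerCount);
  const pushWithinLimit = after.pushSeconds <= maxPushSeconds;

  return {
    qnReduction,
    observerReduction,
    pushWithinLimit,
    passed:
      qnReduction >= minReduction &&
      observerReduction >= minReduction &&
      pushWithinLimit,
  };
}

// Made-up numbers, purely to exercise the function.
const before = { qnCount: 20000, observerCount: 8000, pushSeconds: 3 * 3600 };
const after = { qnCount: 200, observerCount: 40, pushSeconds: 120 };
const report = compareSnapshots(before, after);
console.log(report);
```

Keeping the thresholds as parameters lets the same check be reused if "within minutes" or "dramatically reduced" end up being defined differently.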

Upgrade Considerations

  • Future upgrades should include mechanisms to cleanly stop or replace auctioneers and related infrastructure without leaving behind accumulations of QNs and observers.
  • Consider revisiting the design of scaledPriceAuthorities and priceFeeds to avoid complex observer chains.
  • Document lessons learned to ensure mainnet upgrades (like U18/U19) won’t encounter similar issues.

Sub-Tasks:

  • Investigate Timer Controls for Auctioneers

    • Determine whether it's possible to halt timers in gen4+gen5 auctioneers. If not, proceed to termination.
  • Terminate Old Auctioneer Vats

    • Identify and terminate old auctioneers (v39, v353, v367, v661, v665).
    • Use a coreProposal to kill these vats and confirm dropImports/retireImports behavior in kernel logs.
  • Submit Prices for All Denoms

    • Push updated prices for each managed denom to trigger cleanup.
    • Monitor the process; expect hours of block time but ensure ultimate QN reduction.
  • Confirm QN and Observer Cleanup

    • Verify that the number of QNs/observers in sPA and pf vats decreases significantly.
    • Ensure no new QNs accumulate during or after the process.
  • Create a New Auctioneer

    • After full cleanup, instantiate a new, clean auctioneer via a coreProposal.
    • Confirm that it references a stable, simplified priceAuthority setup.
  • Monitor Post-Cleanup Stability

    • Run test pushes of prices.
    • Validate normal operation and confirm that no runaway QN growth occurs again.
toliaqat added the enhancement label Dec 17, 2024
@warner
Member

warner commented Dec 26, 2024

@siarhei-agoric : for reference, #9483 (comment) has a lot of notes about how to find the necessary objects to terminate the old vats. And @gibson042 probably has even more ideas.

@Chris-Hibbert
Contributor

> Determine whether it's possible to halt timers in gen4+gen5 auctioneers. If not, proceed to termination.

The auctioneers are not designed to be halted, but it is straightforward to give them a long schedule, for instance a week.

The issue with this approach is that it's not possible to reset the schedule to a shorter period until that time elapses. So choosing a very long time (like a year) effectively disables that auction, since if you wanted to restart it to a shorter period, you'd have to wait for that year to elapse before the auctioneer would pay attention to a shorter period.

It's possible that giving them inconsistent parameters would turn the schedule off until a new set of parameters is supplied, but I haven't tested this to be sure what parameters would have this effect. If turning the schedule off were a priority, I could run some tests and likely find a workable approach.
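The long-schedule behavior described above can be illustrated with a toy model. This is a sketch under the assumption that new schedule parameters are only read at the next wake-up; `ToyScheduler` and its methods are hypothetical names, not the auctioneer's actual scheduler code.

```javascript
// Toy model only: parameters set via setPeriod() take effect at the next
// wake-up, which is the behavior the comment above describes.
class ToyScheduler {
  constructor(periodSeconds) {
    this.period = periodSeconds;
    this.nextWake = periodSeconds; // first wake-up, measured from t = 0
    this.wakeTimes = [];
  }

  // Stores the new period; the already-scheduled wake-up is not moved.
  setPeriod(periodSeconds) {
    this.period = periodSeconds;
  }

  // Advances simulated time, firing every wake-up due at or before t.
  advanceTo(t) {
    while (this.nextWake <= t) {
      this.wakeTimes.push(this.nextWake);
      this.nextWake += this.period; // the current period is read here
    }
  }
}

const HOUR = 3600;
const WEEK = 7 * 24 * HOUR;

const sched = new ToyScheduler(HOUR);
sched.advanceTo(2 * HOUR); // wakes at 1h and 2h
sched.setPeriod(WEEK); // stretch the schedule
sched.advanceTo(3 * HOUR); // wakes at 3h, then schedules 3h + 1 week
sched.setPeriod(HOUR); // try to shorten it again
sched.advanceTo(3 * HOUR + WEEK - 1); // nothing fires during the long wait
const wakesWhileWaiting = sched.wakeTimes.length;
sched.advanceTo(3 * HOUR + WEEK + HOUR); // the short period finally applies
console.log(wakesWhileWaiting, sched.wakeTimes.length);
```

In this model the request to return to an hourly schedule is ignored for the whole week, which is why a very long period acts as an effective off switch.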

@siarhei-agoric
Contributor

siarhei-agoric commented Jan 3, 2025

Based on multiple conversations with @gibson042, @mhofman, and @Chris-Hibbert, along with some digging:

  • it should be safe to terminate old auctioneer vats (v39, v353, v367, v661, v665) in any order, regardless of any reference/message loops that they may have
  • at least one auctioneer vat (v39) does NOT have its adminFacet exported by its governor (v38, zcf-b1-9f877-auctioneer.governor)
  • the adminFacet is available within v38's baggage and can be accessed via the vat upgrade path
  • we should be able to use @gibson042's message injection / simulation tool to test out vat terminations via both core eval and upgrade paths without having to bring up the whole chain.
  • the actual unborking will be rolled out as an upgrade with a couple of special core evals within it to kill off the problematic vats.
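One way to keep track of the findings above is a small planning table. The sketch below uses the vat IDs from this thread, but the `adminFacetExported` field, the path labels, and the rule that an exported adminFacet implies the core eval path are assumptions made for illustration. Only v39's status is stated in the thread, so the other entries are deliberately left unknown.

```javascript
// Vat IDs come from the issue. Only v39's adminFacet status is stated there;
// the rest stay undefined until the remaining adminFacet refs are tracked down.
const oldAuctioneers = [
  { vatID: 'v39', adminFacetExported: false }, // governor v38 does not export it
  { vatID: 'v353' },
  { vatID: 'v367' },
  { vatID: 'v661' },
  { vatID: 'v665' },
];

// Hypothetical path labels:
//   'core-eval'           adminFacet reachable, terminate from a core eval
//   'governor-upgrade'    adminFacet only in the governor's baggage
//   'needs-investigation' status not yet known
function planTermination(vats) {
  return vats.map(({ vatID, adminFacetExported }) => {
    let path;
    if (adminFacetExported === true) {
      path = 'core-eval';
    } else if (adminFacetExported === false) {
      path = 'governor-upgrade';
    } else {
      path = 'needs-investigation';
    }
    return { vatID, path };
  });
}

const plan = planTermination(oldAuctioneers);
console.table(plan);
```

Since the vats can reportedly be terminated in any order, the table carries no ordering information; it only records which path each vat would need.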

Timeline (rough estimates):

  • track down all the remaining adminFacet refs: Jan. 3, 2025
  • use the message injection / simulation tool to test the vat killing: Jan. 6, 2025
  • write up core eval / proposal: Jan. 7, 2025
  • PR reviews: Jan. 8, 2025
  • cut a special release off u18/19 branch: Jan. 9, 2025
  • publish a governing action / proposal: Jan. 10, 2025
  • upgrade, vats killed: some time soon after the proposal is voted in and adopted.

siarhei-agoric added a commit that referenced this issue Jan 8, 2025
refs: #10725

Initial set of changes to get auctioneer governors terminated.