remove the explicit wal_checkpoint(FULL) #7069

warner · 2023-02-24T18:56:19Z

What is the Problem Being Solved?

The current swingstore DB code uses WAL mode, with synchronous=FULL:

agoric-sdk/packages/swing-store/src/swingStore.js

Lines 203 to 204 in 40dc287

    
           db.exec(`PRAGMA journal_mode=WAL`); 
        
           db.exec(`PRAGMA synchronous=FULL`);

and then during commit(), it does both a DB COMMIT and a wal_checkpoint(FULL):

agoric-sdk/packages/swing-store/src/swingStore.js

Lines 517 to 529 in 40dc287

    
             const sqlCommit = db.prepare('COMMIT'); 
        
             const sqlCheckpoint = db.prepare('PRAGMA wal_checkpoint(FULL)'); 
        
             /** 
        
              * Commit unsaved changes. 
        
              */ 
        
             async function commit() { 
        
               assert(db); 
        
               if (db.inTransaction) { 
        
                 sqlCommit.run(); 
        
                 sqlCheckpoint.run(); 
        
               } 
        
             }

When SQLite is in WAL mode, it records new changes in the separate .wal file, and then periodically does a "checkpoint" to merge these changes back into the main .sqlite file. Read performance is slightly degraded if the WAL file gets too large (readers must look in two places to make sure they're seeing all potential writes), and that only gets fixed by the checkpoint operation. By default, SQLite performs a checkpoint/merge when it reaches some size (1000 pages, probably 1 or 2 MB of data). There are some interlock considerations if there are other readers or writers, and you can choose a checkpoint mode to adjust the aggressiveness of the checkpoint operation. The default of PASSIVE basically does as much work as it can, but will give up on the merge if there are other readers or writers at work. That doesn't cause any immediate problems: eventually there won't be any contention, and the checkpoint can finish to completion.

I originally recommended using the wal_checkpoint(FULL) as a way to get durability safety despite power loss. Upon closer reading of the SQLite WAL-mode docs, I changed my mind, and figured that the synchronous=FULL was sufficient to get durability. But we didn't wind up removing the FULL from the checkpoint step.

@mhofman noticed that wal_checkpoint(FULL) causes a significant stall: it is defined to "until there is no database writer and all readers are reading from the most recent database snapshot". In practice, he found that it causes the process to busywait for 5 seconds, waiting for a timeout, if an export has aread transaction open in another connection. That's 5s during which nothing else happens in the host application, which is pretty painful.

Description of the Design

The fix is to just remove the forced checkpoint. Our choice of PRAGMA synchronous=full gets us a proper flush/fsync after every COMMIT, which means we don't need or care about checkpoints doing their own flush. We can let SQLite perform best-effort checkpoints automatically, and eventually it will manage to clean everything up.

Security Considerations

none

Scaling Considerations

none

Test Plan

none

The text was updated successfully, but these errors were encountered:

We're using `synchronous=FULL`, so we don't need to perform an explicit `wal_checkpoint(FULL)` to obtain durability in the face of power loss. And the FULL-mode checkpoints cause a 5s busy-wait when we have an external reader connection (like the upcoming swing-store exporter, which runs in a thread or subprocess, with it's own DB connection). This commit just removes the call to `PRAGMA wal_checkpoint(FULL)` entirely. closes #7069

warner added enhancement New feature or request SwingSet package: SwingSet labels Feb 24, 2023

warner self-assigned this Feb 24, 2023

warner mentioned this issue Feb 24, 2023

fix(swing-store): remove explicit wal_checkpoint #7070

Merged

mergify bot closed this as completed in #7070 Feb 25, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

remove the explicit wal_checkpoint(FULL) #7069

remove the explicit wal_checkpoint(FULL) #7069

warner commented Feb 24, 2023

remove the explicit wal_checkpoint(FULL) #7069

remove the explicit wal_checkpoint(FULL) #7069

Comments

warner commented Feb 24, 2023

What is the Problem Being Solved?

Description of the Design

Security Considerations

Scaling Considerations

Test Plan