Document main flows with state diagrams
ghostwords committed Sep 25, 2023
1 parent f187c31 commit e981f18

Runs distributed [Badger Sett](https://github.com/EFForg/badger-sett) scans on Digital Ocean. Yes, a group of badgers is called a _cete_, but "swarm" just sounds better.


## Setup

1. Check out this repository
7. Run `./main.sh` to initiate a new run.

Once you are told the run is resumable, you can stop the script with <kbd>Ctrl</kbd>-<kbd>C</kbd> and then later resume the in-progress run with `./main.sh -r`.


## Architecture

Badger Swarm converts a Badger Sett scan of X sites into N Badger Sett scans of X/N sites. This makes medium scans complete as quickly as small scans, and large scans complete in a reasonable amount of time.
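
For example, splitting a site list into N roughly equal chunks can be sketched with GNU `split` (the file names and counts below are illustrative, not Badger Swarm's actual ones):

```shell
# Generate a stand-in list of 10,000 sites (illustrative only)
seq 1 10000 | sed 's/^/site-/;s/$/.example/' > sitelist.txt

# Split it into 8 chunks of roughly equal line counts without
# breaking lines, one chunk per scan instance: chunk.00 .. chunk.07
split -n l/8 -d sitelist.txt chunk.

wc -l chunk.*
```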

Once a run is confirmed, scans get initialized in parallel. Each scan instance receives its portion of the site list.

```mermaid
stateDiagram-v2
[*] --> ConfirmRun
state fork1 <<fork>>
ConfirmRun --> fork1
fork1 --> BadgerInit1
fork1 --> BadgerInit2
fork1 --> BadgerInitN
state InitScans {
cr1: CreateDroplet
cr2: CreateDroplet
cr3: CreateDroplet
dep1: InstallDependencies
dep2: InstallDependencies
dep3: InstallDependencies
sta1: StartScan
sta2: StartScan
sta3: StartScan
state BadgerInit1 {
[*] --> cr1
cr1 --> dep1
dep1 --> UploadSiteList1
UploadSiteList1 --> sta1
sta1 --> [*]
}
--
state BadgerInit2 {
[*] --> cr2
cr2 --> dep2
dep2 --> UploadSiteList2
UploadSiteList2 --> sta2
sta2 --> [*]
}
--
state BadgerInitN {
[*] --> cr3
cr3 --> dep3
dep3 --> UploadSiteListN
UploadSiteListN --> sta3
sta3 --> [*]
}
}
state join1 <<join>>
BadgerInit1 --> join1
BadgerInit2 --> join1
BadgerInitN --> join1
join1 --> [*]
```
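
In shell terms, the fork/join above amounts to launching each scan's init steps as background jobs and waiting on all of them. A minimal sketch, where `init_scan` is a placeholder for the real droplet setup:

```shell
# Placeholder for the real per-scan init: create droplet,
# install dependencies, upload site list, start scan.
init_scan() {
  echo "scan $1 initialized" >> init.log
}

: > init.log            # start with an empty log
for i in 1 2 3; do      # fork: one background job per scan
  init_scan "$i" &
done
wait                    # join: block until every init finishes
cat init.log
```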

The run is now resumable. Scans are checked for progress and status (errored/stalled/complete) in parallel.

- If a scan fails, its instance is deleted and the scan gets reinitialized.
- When a scan fails to make progress for long enough, it is considered stalled. Stalled scans get restarted, which usually means they resume after skipping the site they got stuck on.
- When a scan finishes, the results are extracted and the instance is deleted.

This continues until all scans finish.

```mermaid
stateDiagram-v2
[*] --> PollForStatus
state fork2 <<fork>>
PollForStatus --> fork2
fork2 --> CheckBadgerScan1
fork2 --> CheckBadgerScan2
fork2 --> CheckBadgerScanN
state ManageScans {
state CheckBadgerScan1 {
[*] --> CheckForFailure
CheckForFailure --> CheckForStall
CheckForStall --> ExtractProgress
ExtractProgress --> [*]
}
--
state CheckBadgerScan2 {
[*] --> [*]
}
--
state CheckBadgerScanN {
[*] --> [*]
}
}
state join2 <<join>>
CheckBadgerScan1 --> join2
CheckBadgerScan2 --> join2
CheckBadgerScanN --> join2
join2 --> PrintProgress
state check1 <<choice>>
PrintProgress --> check1
check1 --> MergeResults : All scans finished
check1 --> PollForStatus : One or more scans still running
MergeResults --> [*]
```
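
The poll/check/merge loop can be sketched the same way. Everything below is simulated (progress counters stand in for real droplets); it only illustrates the control flow:

```shell
SCANS=3

# Simulated check: each scan "finishes" on its second poll.
# In Badger Swarm this would instead inspect a remote instance.
check_scan() {
  local n=$(( $(cat "scan_$1.progress" 2>/dev/null || echo 0) + 1 ))
  echo "$n" > "scan_$1.progress"
  [ "$n" -ge 2 ] && touch "scan_$1.done"
}

finished() { ls scan_*.done 2>/dev/null | wc -l; }

until [ "$(finished)" -eq "$SCANS" ]; do
  for i in $(seq 1 "$SCANS"); do
    check_scan "$i" &   # check all scans in parallel
  done
  wait                  # join before printing progress
  echo "scans still running: $(( SCANS - $(finished) ))"
done
echo "all scans finished, merging results"
```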

On completion, scan results are merged by Privacy Badger as if each one had been manually imported on the Manage Data tab of Privacy Badger's options page.
