[scanner integration] Enable landing page scanner and collect baseline scan data #518
Running.
Heads up: the cron job should not generate a ton of data, because the scanner should only add new rows when a scan result changes (see freedomofpress/securethenews#84 (comment)).
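To illustrate the "only add rows when a result changes" behavior described above, here is a minimal in-memory sketch. The `ScanStore` class and its field names are assumptions for illustration only; in the real project the comparison would be against the latest stored scan result in the database.

```python
class ScanStore:
    """Minimal in-memory sketch of append-only-on-change scan storage.

    Illustrative assumption, not the project's actual model: real
    results live in the database, keyed to directory entries.
    """

    def __init__(self):
        self.rows = []  # (entry_id, result) tuples, append-only

    def latest(self, entry_id):
        # Walk backwards to find the most recent result for this entry.
        for eid, result in reversed(self.rows):
            if eid == entry_id:
                return result
        return None

    def record(self, entry_id, result):
        if self.latest(entry_id) == result:
            return False  # unchanged since last scan: no new row
        self.rows.append((entry_id, result))
        return True
```

With this scheme, repeated cron runs against a stable site add no rows at all, which is why the table stays small.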
Running now, will report results.
🔴 Negative:
From that output it's not obvious which entry triggered the additional result. Re-running with
Just encountered this bug myself. We started allowing directory entries that have the same domain, but didn't update the scanner code to account for it. I see two plausible solutions here:
My preference right now is 1. I could imagine scenarios down the road where we might want the scanner to run separately on instances with the same domain. For example, if the scanner someday scans onion services as well as landing pages, we might have entries that share a domain but have different onion services, and we would therefore want them scanned separately.
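A sketch of what scanning per entry (rather than per domain) might look like, in line with option 1 above. The names `entries` and `scan_url` are placeholders, not the project's actual API; the point is that entries sharing a landing page are fetched once, while each entry still gets its own result.

```python
def scan_entries(entries, scan_url):
    """Scan every directory entry, even when entries share a landing page.

    `entries` is an iterable of (entry_id, landing_page_url) pairs and
    `scan_url` is whatever callable performs a single scan; both names
    are assumptions for illustration.
    """
    cache = {}    # one fetch per unique URL
    results = {}  # one result per entry, even for shared URLs
    for entry_id, url in entries:
        if url not in cache:
            cache[url] = scan_url(url)
        results[entry_id] = cache[url]
    return results
```

This keeps duplicate-URL entries from crashing the scan while avoiding redundant network requests.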
I'm confused by your use of "domain" here. We have entries that share the same onion URL ([org example redacted]), but AFAICT we have no entries that share the same landing page URL. So if the scanner currently dies in the [redacted] case, it most definitely should scan both, since they're different landing pages.
@eloquence You're right—we have shared onion URLs, not shared domains. Now I'm not totally sure again what's going on. I'll investigate more. 🕵🏻‍♀️
I've identified the cause of this error: [org example redacted] My favored solution right now, actually, is to tell
Thanks, @harrislapiroff. Since we currently don't have the problem you describe (at least as far as we know), now that the offending entry has been fixed, I'm assuming the
Re-running
Success!
Please inspect results on the live site now, @eloquence!
The results on the live site are mostly what I'd expect:
It would still be very useful to generate a CSV from the latest results, for more systematically assessing how common certain issues are (e.g., use of subdomains) across the entire set of sites in the directory. @harrislapiroff, could you give an initial estimate of how much work that would be?
Just making a note that the relevant PR that we're committed to finishing as part of the 8/8-8/22 sprint is #527. Would also be great to then do a prod run of the
@conorsch ran a fresh scan and provided baseline scan results (below, CSV). As such, this task is complete for the current sprint; closing. In principle we're ready to enable regular runs; filing a separate infra ticket for that. For now, will add observations to relevant tickets.
As one of the preparatory steps for #488, we should enable the existing landing page scanner and collect baseline data for all current SecureDrop directory entries. This will also put the scanner through its paces and help us discover operational issues that need to be prioritized prior to a full integration.
Provided the scanner is stable, as part of this task, we should generate a CSV of the scan results for all landing pages.
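A minimal sketch of what the CSV export could look like, using only the standard library. The field names below are placeholders for illustration; the real export would pull whichever columns the scan result model actually stores.

```python
import csv
import io

def results_to_csv(results, fieldnames):
    """Serialize scan results (a list of dicts) to CSV text.

    `fieldnames` selects and orders the columns; extra keys in a
    result dict are ignored rather than raising an error.
    """
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    for row in results:
        writer.writerow(row)
    return buf.getvalue()
```

This would make it straightforward to tally, say, how many landing pages use subdomains across the whole directory.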
Note that all code that displays information on SecureDrop.org based on scan results must be, or remain, disabled. Scan results should only be visible in the Wagtail admin interface and otherwise have no effect on the site.