-
Notifications
You must be signed in to change notification settings - Fork 687
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
journalist notifications: allow 30 minutes variance in reboot times #3479
Conversation
The race condition goes as follows: - machine reboots - notification sent - machine reboots 24h later but reboots 1 minute faster than the previous day - notification is sent 23h59 minutes after the last one and suppressed The notification will be sent the next day but the information it contains will not reflect what happened in the past 24h because the cron job manage.py were-there-submissions-today updates it daily, one hour before the machine reboots. To resolve this race condition (or make it extremely unlikely) we allow for notifications to be sent at most every 23h30. Case 1: machine is slower to reboot the next day: - machine reboots - notification sent - machine reboots 24h later but reboots 5 minutes slower than the previous day - notification is processed 24h05 minutes after the last one, it is more tha 23h30 and therefore the notification is sent Case 2: machine is faster to reboot the next day: - machine reboots - notification sent - machine reboots 24h later but reboots 8 minutes faster than the previous day - notification is processed 23h52 minutes after the last one, it is more tha 23h30 and therefore the notification is sent
Codecov Report
@@ Coverage Diff @@
## develop #3479 +/- ##
========================================
Coverage 85.93% 85.93%
========================================
Files 34 34
Lines 2175 2175
Branches 241 241
========================================
Hits 1869 1869
Misses 250 250
Partials 56 56 Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was a little quick on that approval. One change.
@@ -20,6 +20,6 @@ | |||
cron: | |||
name: Update the number of submissions in the past 24h | |||
job: "{{ securedrop_code }}/manage.py were-there-submissions-today" | |||
special_time: daily | |||
hour: "{{ daily_reboot_time - 1 }}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to have a | default(4)
because prod playbooks won't have that value in them, so current admins will have things break.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also. Why -1?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually this is wrong, it must be daily_reboot_time minus one hour (hence the - 1). Mapping [0-23] to 23,[0-22] with (daily_reboot_time + 23) % 24 would work don't you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to have a
| default(4)
because prod playbooks won't have that value in them, so current admins will have things break.
For other reviewers, this statement was false. There is a default in a vars file in ansible, so adding a default
filter is unnecessary.
This will need to be included in the release notes telling admins to run the script because we can't update prod instances, so the bug won't be "resolved" for them. |
Why would admins need to run the script? |
If the sysadmin chose to: a) activate journalist notifications b) modify the reboot time It is possible that the submissions count is older than expected. For instance, if the submissions count is done at midnight and the machine reboots at 11pm, the journalist will receive a submission count that is about 24h old and does not include submissions received in the past 23h00. The information sent to the journalist is correct, only it is older than one would expect. The submissions count is updated one hour before the machine reboots time so it is no more than one hour old, regardless of the reboot time chosen by the admin.
@@ -20,6 +20,7 @@ | |||
cron: | |||
name: Update the number of submissions in the past 24h | |||
job: "{{ securedrop_code }}/manage.py were-there-submissions-today" | |||
special_time: daily | |||
# 0 -> 23, 1 -> 0, 2 -> 1, ... 23 -> 22 | |||
hour: "{{ (daily_reboot_time + 23) % 24 }}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
>>> for x in range(0,24):
... print((x + 23) % 24)
...
23
0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
The crontab entry needs to be changed. |
I need to test the jinja2 expression is correct |
shows the jinja expression is correct. I was worried about subtle Ansible / Jinja rules I may have missed. |
@heartsucker right, the admin needs to run |
@dachary I think this is good to merge since there's not an easy way for us to manipulate the crontab other than through the admin script. Anything else is too brittle, and asking admins to fun |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Super clear changes, @dachary. Thanks for the detailed comments. @heartsucker already did the heavy lifting on review, so I'll add a 👍. Let's get this in.
@conorsch thanks for the review :-) |
@heartsucker thanks for the merge :-) |
Status
Ready for review
Description of Changes
Fixes: #3478
The race condition goes as follows:
The notification will be sent the next day but the information it
contains will not reflect what happened in the past 24h because the cron job
manage.py were-there-submissions-today
updates it daily, one hour before the machine reboots.
To resolve this race condition (or make it extremely unlikely) we
allow for notifications to be sent at most every 23h30.
Case 1: machine is slower to reboot the next day:
more tha 23h30 and therefore the notification is sent
Case 2: machine is faster to reboot the next day:
more tha 23h30 and therefore the notification is sent
Testing
Testinfra tests changing the time of the app server are implemented. I do not think manual testing is possible for this race condition but I'm open to ideas.
Deployment
./securedrop-admin install
must be run to fix the bugansible-base/roles/app/tasks/setup_cron.yml
updating the number of submissions in the past 24h will change to happen one hour before the reboot instead of midnightChecklist
If you made changes to the server application code:
make ci-lint
) and tests (make -C securedrop test
) pass in the development containerIf you made changes to documentation:
make docs-lint
) passed locally