Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

journalist notifications: allow 30 minutes variance in reboot times #3479

Merged
merged 2 commits into from Jun 5, 2018
Merged

Conversation

ghost
Copy link

@ghost ghost commented May 29, 2018

Status

Ready for review

Description of Changes

Fixes: #3478

The race condition goes as follows:

  • machine reboots
  • notification sent
  • machine reboots 24h later but reboots 1 minute faster than the previous day
  • notification is sent 23h59 minutes after the last one and suppressed

The notification will be sent the next day but the information it
contains will not reflect what happened in the past 24h because the cron job

manage.py were-there-submissions-today

updates it daily, one hour before the machine reboots.

To resolve this race condition (or make it extremely unlikely) we
allow for notifications to be sent at most every 23h30.

Case 1: machine is slower to reboot the next day:

  • machine reboots
  • notification sent
  • machine reboots 24h later but reboots 5 minutes slower than the previous day
  • notification is processed 24h05 minutes after the last one, it is
    more tha 23h30 and therefore the notification is sent

Case 2: machine is faster to reboot the next day:

  • machine reboots
  • notification sent
  • machine reboots 24h later but reboots 8 minutes faster than the previous day
  • notification is processed 23h52 minutes after the last one, it is
    more tha 23h30 and therefore the notification is sent

Testing

Testinfra tests changing the time of the app server are implemented. I do not think manual testing is possible for this race condition but I'm open to ideas.

Deployment

  • ./securedrop-admin install must be run to fix the bug
  • The cron job ansible-base/roles/app/tasks/setup_cron.yml updating the number of submissions in the past 24h will change to happen one hour before the reboot instead of midnight
  • As a result it is possible that the journalist notification does not reflect the

Checklist

If you made changes to the server application code:

  • Linting (make ci-lint) and tests (make -C securedrop test) pass in the development container

If you made changes to documentation:

  • Doc linting (make docs-lint) passed locally

The race condition goes as follows:

  - machine reboots
  - notification sent
  - machine reboots 24h later but reboots 1 minute faster than the previous day
  - notification is sent 23h59 minutes after the last one and suppressed

The notification will be sent the next day but the information it
contains will not reflect what happened in the past 24h because the cron job

   manage.py were-there-submissions-today

updates it daily, one hour before the machine reboots.

To resolve this race condition (or make it extremely unlikely) we
allow for notifications to be sent at most every 23h30.

Case 1: machine is slower to reboot the next day:

  - machine reboots
  - notification sent
  - machine reboots 24h later but reboots 5 minutes slower than the previous day
  - notification is processed 24h05 minutes after the last one, it is
    more tha 23h30 and therefore the notification is sent

Case 2: machine is faster to reboot the next day:

  - machine reboots
  - notification sent
  - machine reboots 24h later but reboots 8 minutes faster than the previous day
  - notification is processed 23h52 minutes after the last one, it is
    more tha 23h30 and therefore the notification is sent
@codecov-io
Copy link

codecov-io commented May 29, 2018

Codecov Report

Merging #3479 into develop will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff            @@
##           develop    #3479   +/-   ##
========================================
  Coverage    85.93%   85.93%           
========================================
  Files           34       34           
  Lines         2175     2175           
  Branches       241      241           
========================================
  Hits          1869     1869           
  Misses         250      250           
  Partials        56       56

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4bd7d8f...3ef83d5. Read the comment docs.

Copy link
Contributor

@heartsucker heartsucker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was a little quick on that approval. One change.

@@ -20,6 +20,6 @@
cron:
name: Update the number of submissions in the past 24h
job: "{{ securedrop_code }}/manage.py were-there-submissions-today"
special_time: daily
hour: "{{ daily_reboot_time - 1 }}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to have a | default(4) because prod playbooks won't have that value in them, so current admins will have things break.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also. Why -1?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually this is wrong, it must be daily_reboot_time minus one hour (hence the - 1). Mapping [0-23] to 23,[0-22] with (daily_reboot_time + 23) % 24 would work don't you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to have a | default(4) because prod playbooks won't have that value in them, so current admins will have things break.

For other reviewers, this statement was false. There is a default in a vars file in ansible, so adding a default filter is unnecessary.

@heartsucker
Copy link
Contributor

This will need to be included in the release notes telling admins to run the script because we can't update prod instances, so the bug won't be "resolved" for them.

@ghost
Copy link
Author

ghost commented May 29, 2018

telling admins to run the script

Why would admins need to run the script?

If the sysadmin chose to:

a) activate journalist notifications
b) modify the reboot time

It is possible that the submissions count is older than expected. For
instance, if the submissions count is done at midnight and the machine
reboots at 11pm, the journalist will receive a submission count that
is about 24h old and does not include submissions received in the past
23h00.

The information sent to the journalist is correct, only it is older
than one would expect.

The submissions count is updated one hour before the machine reboots
time so it is no more than one hour old, regardless of the reboot time
chosen by the admin.
@@ -20,6 +20,7 @@
cron:
name: Update the number of submissions in the past 24h
job: "{{ securedrop_code }}/manage.py were-there-submissions-today"
special_time: daily
# 0 -> 23, 1 -> 0, 2 -> 1, ... 23 -> 22
hour: "{{ (daily_reboot_time + 23) % 24 }}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

>>> for x in range(0,24):
...     print((x + 23) % 24)
... 
23
0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22

@heartsucker
Copy link
Contributor

Why would admins need to run the script?

The crontab entry needs to be changed.

@ghost ghost changed the title journalist notifications: allow 30 minutes variance in reboot times WIP: journalist notifications: allow 30 minutes variance in reboot times May 29, 2018
@ghost
Copy link
Author

ghost commented May 29, 2018

I need to test the jinja2 expression is correct

@ghost
Copy link
Author

ghost commented May 29, 2018

( echo --- ; echo daily_reboot_time: 0 ) > var.yml ; ansible [email protected] -ehour='{{( daily_reboot_time + 23) % 24 }}' localhost -m debug -a "var=hour"

shows the jinja expression is correct. I was worried about subtle Ansible / Jinja rules I may have missed.

@ghost ghost changed the title WIP: journalist notifications: allow 30 minutes variance in reboot times journalist notifications: allow 30 minutes variance in reboot times May 29, 2018
@ghost
Copy link
Author

ghost commented May 29, 2018

The crontab entry needs to be changed.

@heartsucker right, the admin needs to run ./securedrop-admin install to fix the problem good point

@heartsucker
Copy link
Contributor

@dachary I think this is good to merge since there's not an easy way for us to manipulate the crontab other than through the admin script. Anything else is too brittle, and asking admins to fun securedrop-admin install periodically to fix non-critical bugs seems reasonable. I think spending additional development time on tweaking postinst to make this work is not worth the benefits.

Copy link
Contributor

@conorsch conorsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Super clear changes, @dachary. Thanks for the detailed comments. @heartsucker already did the heavy lifting on review, so I'll add a 👍. Let's get this in.

@ghost
Copy link
Author

ghost commented Jun 5, 2018

@conorsch thanks for the review :-)

@heartsucker heartsucker merged commit 4a4122d into freedomofpress:develop Jun 5, 2018
@ghost
Copy link
Author

ghost commented Jun 6, 2018

@heartsucker thanks for the merge :-)

@emkll emkll mentioned this pull request Jun 13, 2018
12 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

journalist notifications: races against reboot time
3 participants