Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ossec: resolve journalist notification racing with reboots #3374

Merged
merged 4 commits into from May 9, 2018
Merged

ossec: resolve journalist notification racing with reboots #3374

merged 4 commits into from May 9, 2018

Conversation

ghost
Copy link

@ghost ghost commented May 5, 2018

Status

Ready for review

Description of Changes

Mitigates: #3368

The app server is rebooted every 24h and will send a notification at
boot time. The ossec server is also rebooted and will immediately send
the email to the journalist, regardless of when the previous mail was
sent (mail frequency is not a feature of ossec-maild). Always running
the localfile command at boot time is an undocumented OSSEC behavior
ossec/ossec-hids#1415 in 2.8.2 as well as
2.9.3.

This guarantees exactly one mail will be sent daily.

Setting the 25 hours frequency element is a safeguard:

  • against the following race a) command runs because the 24h period
    expires, b) the server reboots shortly after because it reboots
    every 24h, c) command runs again after the server is rebooted,
    causing two notifications to be sent in a row

  • in case the server does not reboot for some reason, the notification
    will still be sent every 25h

Fixes: #3367

Testing

  • Reboot the app server and verify the notification is sent

Deployment

N/A

Checklist

If you made changes to documentation:

  • Doc linting (make docs-lint) passed locally

@ghost ghost added this to the 0.7 milestone May 5, 2018
@ghost ghost requested review from conorsch and msheiny as code owners May 5, 2018 14:31
@codecov-io
Copy link

codecov-io commented May 5, 2018

Codecov Report

Merging #3374 into develop will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff            @@
##           develop    #3374   +/-   ##
========================================
  Coverage    85.79%   85.79%           
========================================
  Files           34       34           
  Lines         2154     2154           
  Branches       238      238           
========================================
  Hits          1848     1848           
  Misses         250      250           
  Partials        56       56

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 616f76e...0e5694b. Read the comment docs.

@ghost
Copy link
Author

ghost commented May 5, 2018

@emkll the second commit of this PR ossec: never send more than one journalist notification per 24h is good to have but I'm not sure if we want it for 0.7.0. My concern is about adding code that does not fix a bug, that close to the release. In practice journalists who could be inconvenienced would be the one associated with a sysadmin that not only configures journalists notifications but also manually reboots for some reason.

@@ -108,7 +108,7 @@
<localfile>
<log_format>full_command</log_format>
<command>head -1 /var/lib/securedrop/submissions_today.txt | grep '^[0-9]*$'</command>
<frequency>86400</frequency>
<frequency>90000</frequency> <!-- 25 hours -->
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question re: #3368 - does change mean that this localfile won't be considered a daily notification and thus do_not_group will not apply?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah okay nevermind, I see that the corresponding rule in local_rules.xml is using rule_id=400600

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly. It's a little confusing but becomes clearer when you realize the mail alerts preferences are only read and interpreted by ossec-maild when parsing the alert logs.

@emkll
Copy link
Contributor

emkll commented May 7, 2018

Thanks @dachary for the quick fixes! I agree that we should minimize changes as much as possible so close to release. I've tested the first commit of this PR(ed9f041), initial testing confirms that this will prevent duplicate alerts if by ensuring the scheduled notification never happens right before server reboot.

Notifications are still sent after each reboot, if and only if /var/lib/securedrop/submissions_today.txt contains a value. Because it is not populated at install time, it will take between 24 and 48h to get a first journalist notification (as the first run occurs via cron, and not on install), which perhaps why we didn't catch this reboot behavior when testing the PR.

Note that there is an ignore attribute that could be use as a server rule, but unfortunately maxes out at 9999 seconds, making it ineffective in this scenario. Let's discuss this at standup today.

@ghost
Copy link
Author

ghost commented May 7, 2018

@redshiftzero @emkll removed the extra commit (and test does not need to run because it already did on the first commit ;-)

@redshiftzero
Copy link
Contributor

Okay after looking at this more and taking a step back, the choices are:

First commit only: Minimal change that indeed resolves #3367. This is a clever fix. The thing is it also produces behavior worthy of another bug report "Additional reboots send submission notifications to journalists" (I know that a note in the documentation is added in this PR, but some admins will likely miss the note and it is odd and unexpected behavior unless one is familiar with the implementation). The other issue is that these additional notifications might actually be incorrect - indicating that submissions have or have not occurred in the last 24 hours when the opposite is true (since the submissions_today.txt file has not been updated on the application server).

Both commits: Much larger diff in functionality and thus high risk given the time we have to test before release. It does resolve both issues however.

Am I understanding this correctly? If I am, I think we actually have to merge both commits to resolve all issues prior to release....

@ghost
Copy link
Author

ghost commented May 7, 2018

@redshiftzero I'd rather stick to the simpler fix.

  • The risk of introducing a regression with the larger fix are too high this close to the release, despite testing. The functionality being optional, it won't impact all SecurDrop instance. And in the subset of instances that activate it, it will only affect the even smaller subset (those that reboot more than every day).
  • If we go for the larger patch, we take the risk of announcing a functionality that is broken for everyone trying it because we missed something.

That being said, I'm deferring to your better judgement :-)

@redshiftzero
Copy link
Contributor

The risk of introducing a regression with the larger fix are too high this close to the release, despite testing.

Totally agree.

Well, the silver lining to pushing back the release by a week due to other significant issues (#3316), is that we now have time to carefully QA both commits, and this will fix all outstanding issues (if something doesn't work as we expect, we should have enough time to resolve it). Thanks for your work on this @dachary.

@ghost
Copy link
Author

ghost commented May 8, 2018

The fix enforcing a 24h delay was re-pushed

@emkll
Copy link
Contributor

emkll commented May 8, 2018

Thanks @dachary, seems like the first commit is working as intended, I receive emails upon every reboot, as well as at 4AM local time when my instance is scheduled to reboot.

I've been testing the 2nd commit of this PR (856b180), but I still receive more than 1 notification per 24h
screenshot from 2018-05-08 10-18-20

I suspect this is due to permissions on the /var/ossec/tmp/ folder, owned by root, for which the ossec group has no write permissions, and as such can't write the stamp file:

root@mon:/var/ossec/tmp# ls -lah
total 8.0K
dr-xr-x---   2 root ossec 4.0K May  8  10:16 .
dr-xr-x--- 14 root ossec 4.0K May  8  09:49 ..

It also seems like /var/ossec/tmp/journalist_notification_sent.stamp is not being explicitely ignored by syscheck in ossec.conf, so I suspect it would trigger some alerts if/when the .stamp file changes.

@ghost
Copy link
Author

ghost commented May 8, 2018

Added the make journalist notifications resilient to double ossec alerts commit to mitigate #3368 since we're unable to reproduce it.

@ghost
Copy link
Author

ghost commented May 8, 2018

@emkll good catch! I moved it to the logs directory because it's already writable by the ossec user.

docs/install.rst Outdated
@@ -75,6 +75,10 @@ worth checking the *Journalist Interface*. For this you will need:
the GPG private key, it is not possible to specify multiple
GPG keys.

.. note:: The notification is sent after the daily reboot of the
*Application Server*. If it is manually rebooted, additional
notifications will be sent.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: notification -> journalist notification (for clarity since this section of the documentation talks about OSSEC alerts nearby and people can get confused)

Also: We should remove "If it is manually rebooted, additional notifications will be sent." since it is resolved in 06b18b2

@ghost
Copy link
Author

ghost commented May 9, 2018

@redshiftzero thanks for catching the documentation inconsistency. Fixed and repushed :-)

Loic Dachary added 4 commits May 9, 2018 10:45
The app server is rebooted every 24h and will send a notification at
boot time. The ossec server is also rebooted and will immediately send
the email to the journalist, regardless of when the previous mail was
sent (mail frequency is not a feature of ossec-maild). Always running
the localfile command at boot time is an undocumented OSSEC behavior
ossec/ossec-hids#1415 in 2.8.2 as well as
2.9.3.

This guarantees exactly one mail will be sent daily.

Setting the 25 hours frequency element is a safeguard:

* against the following race a) command runs because the 24h period
  expires, b) the server reboots shortly after because it reboots
  every 24h, c) command runs again after the server is rebooted,
  causing two notifications to be sent in a row

* in case the server does not reboot for some reason, the notification
  will still be sent every 25h

Fixes: #3367
Under some circumstances daily journalist notifications may be grouped
with other ossec alerts. In all cases where this transient error was
observed, a well formed journalist notification alert was also
included in the payload. By changing the regular expression we make
the script resilient to payloads that contain unrelated content.

Mitigates #3368
@emkll
Copy link
Contributor

emkll commented May 9, 2018

Thanks @dachary for the quick fixes. I tested the latest changes and can confirm this addresses the issues we've been seeing in the 0.7.0 RCs.

One behavior that I've noticed is as follows:
Admin adds journalist notifications, instance reboots before daily reboot time, the notification triggers. Following notification occurs only at the 2nd daily reboot time. It seems that addressing this edge case would introduce a lot of complexity and I perhaps we should leave it as-is.

I'd appreciate if someone else could do a quick sanity-check/review of this PR, but consider it approved from my perspective.

Copy link
Contributor

@conorsch conorsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. While I have not tested the resolution of the race in VMs, the changes appear well structured. Thanks very much for the verbose commit messages, @dachary—will make future changes manageable.

shopt -s -o xtrace
PS4='${BASH_SOURCE[0]}:$LINENO: ${FUNCNAME[0]}: '

echo BUGOUS | main test_send_encrypted_alarm | \
echo BUGOUS | handle_notification test_send_encrypted_alarm | \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: should this be "BOGUS", as in a fake submission? (I realize this came in via #2803, just noting for clarity's sake.)

grep --count 'notification suppressed' /var/log/syslog > /tmp/after
test $(cat /tmp/before) -lt $(cat /tmp/after)
""")

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great tests! Thanks for the careful attention here!

@emkll
Copy link
Contributor

emkll commented May 9, 2018

Thanks for the quick review @conorsch , merging!

@emkll emkll merged commit fb3772f into freedomofpress:develop May 9, 2018
@ghost
Copy link
Author

ghost commented May 9, 2018

@conorsch @emkll thanks for the careful review :-)

Admin adds journalist notifications, instance reboots before daily reboot time, the notification triggers. Following notification occurs only at the 2nd daily reboot time. It seems that addressing this edge case would introduce a lot of complexity and I perhaps we should leave it as-is.

I second your opinion @emkll

@emkll emkll mentioned this pull request May 10, 2018
21 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants