Description
In the journalist interface, if you select a large number of sources and delete them, the operation can take longer than the Apache timeout. Some sources' store directories will have been moved to the shredder before the failure. Thereafter, if you try to delete them, a ValueError will be thrown at line 246 of store.py, in move_to_shredder. The unhandled exception prevents the deletion of the source record, and this is how we get the zombie apocalypse.
Steps to Reproduce
In an environment using Apache (staging, prod VMs, QA hardware):
1. Add 500 sources with qa_loader.py --source-count 500.
2. Log in as a journalist.
3. Select all sources and click delete.
Expected Behavior
That the sources would be deleted without error.
Actual Behavior
You get a gateway timeout. Navigating back to the source list and trying again results in an internal server error, with a stacktrace in /var/log/apache2/journalist-error.log.
Comments
The fix might be as simple as checking for the existence of the source's store directory in journalist_app.utils.delete_collection and only calling move_to_shredder if it still exists. While there, the key pair deletion should be checked as well, so that if it's already gone, the source database record is still deleted.
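For illustration, the guard could look something like this. This is a minimal sketch only, with made-up paths and a made-up helper name, not the actual store.py or delete_collection code:

```python
import os
import shutil


def move_to_shredder_if_present(store_dir: str, shredder_dir: str) -> None:
    """Only shred a source's store directory if it still exists, so a retry
    after a timed-out request doesn't blow up on the missing path."""
    if not os.path.isdir(store_dir):
        # A previous (timed-out) request already moved this directory to
        # the shredder; treat that as success rather than raising.
        return
    shutil.move(store_dir, shredder_dir)
```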
Since it's not a regression, the agreement today was to undertake a timeboxed (~2 hour) attempt to resolve it for 1.3.0; if it ends up being more complex, it will likely be bumped to 1.4.0.
To no one's surprise, it turned out to be more complex. When the original deletion request times out, the mod_wsgi process still sees it through to completion, because it doesn't care that Apache has given up.
Any workaround that presses on past nonexistent source store directories or keypairs can still fail in the final steps of journalist_app.utils.delete_collection, when the source record is queried and has already been deleted. A journalist could keep backing up to the source list, refreshing, selecting all, and deleting again, and it might look like their efforts are making headway as the list shrinks, but in fact they'll just be tripping over these failures while the initial request is still churning through the deletions.
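Even then, the retry path would need a guard around that final database step along these lines. A sketch only; the db/Source imports and the filesystem_id lookup follow the app's conventions but are assumptions here:

```python
from db import db            # assumed: the app's SQLAlchemy instance
from models import Source    # assumed: the existing Source model


def delete_source_record(filesystem_id: str) -> None:
    """Tolerate the race where the original, still-running request has
    already removed the row by the time a retry reaches this step."""
    source = Source.query.filter_by(filesystem_id=filesystem_id).one_or_none()
    if source is None:
        # Already deleted by the original request; nothing left to do.
        return
    db.session.delete(source)
    db.session.commit()
```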
We could lengthen the Apache timeout, or introduce a request timeout in the mod_wsgi configuration, but neither is a sure fix, and both could introduce other problems. The right thing to do is ensure we generate a response to this request in a reasonable timeframe.
To that end, I'm going to look at adding a deleted flag to Source. Updating that should be quick. Another background worker process will periodically scan for deleted sources and do the work that is currently done in delete_collection. For now, we'll just omit deleted sources from the journalist interface.
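Roughly what I have in mind, sketched under some assumptions: the db/Source imports mirror the app's existing layout, the deleted column is the new one proposed here, and the delete_collection call stands in for whatever the worker ends up doing. None of this is a final design.

```python
from db import db                     # assumed: the app's SQLAlchemy instance
from models import Source             # assumed: the existing Source model
from journalist_app.utils import delete_collection


def mark_source_deleted(source: Source) -> None:
    # Fast path, inside the request: flip the (new) flag and return a
    # response well before the Apache timeout.
    source.deleted = True
    db.session.commit()


def purge_deleted_sources() -> None:
    # Slow path, run periodically by a background worker outside the
    # request cycle: do the cleanup that delete_collection does today.
    for source in Source.query.filter_by(deleted=True).all():
        delete_collection(source.filesystem_id)
```

The journalist interface would then filter its source queries on the flag, so deleted sources disappear immediately even though the real cleanup happens later.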
For the sake of clarity, per chat w/ Jen: we'll keep this on the 5/6-5/20 sprint for John to continue investigating, but it stays on the 1.4.0 (not 1.3.0) milestone, so it is not subject to the release/QA timeline pressure.