Misc fixes for LOD syncing and upgrade Morango #11525
These are the logs after a single LoD import: it synced OK for about 90 minutes and then ran into the usual wall 🤷🏽♀️
Hi @bjester, adding logs for three consecutively created Android LoDs: https://drive.google.com/drive/folders/1__X6gsdgSh-s6UtdwvSk8qg-NiUG69FJ?usp=sharing I'm still not able to pinpoint the exact cause; hopefully the logs will give you a better picture of what's going on!
With the latest assets from yesterday, the first LoD (Android13) synced perfectly (user import, initial and new assignment downloads, added resources, etc.) for about 2 hours. But about 20 minutes after I synced another LoD (Android12, with the user created on the server's facility this time), the first one stopped, as in previous tests... 😞 On the second LoD (Android12), the class assignment cards appeared on the Home screen much faster (in less than a minute), but the resources did not download at all, even though both the Coach side and the LoD showed the status as Synced. Every time I tapped a card to open the lesson or quiz, it redirected me to the Library screen. Eventually the second device also stopped syncing, less than an hour after setup, and I never got to see or interact with any of the resources. Android11-michc-db-logs.zip (this is actually Android 12; I forgot it went through an upgrade)
Since yesterday, @rtibbles and I have spent some time figuring out some subtleties of using
I am seeing this sequence in the logs:
The first is the current soud_sync_processing task completing successfully; the next is the enqueueing of the follow-up task (now that its prerequisite has completed). However, I never see that task get processed. Unless something very strange is happening with WorkManager, the only explanation I can think of is that we are somehow setting the delay to a huge value.
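To make the "huge delay" hypothesis concrete, here is a hypothetical guard one could put around follow-up scheduling. It is a sketch only: `follow_up_delay` and `MAX_FOLLOW_UP_DELAY` are made-up names, the 5-minute cap is an arbitrary assumption, and it does not reflect how Kolibri or WorkManager actually compute delays. A negative delay runs immediately, and an implausibly large one (for example, from a milliseconds/seconds mix-up) is clamped instead of parking the task for hours.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical upper bound on how far in the future a follow-up task may be scheduled.
MAX_FOLLOW_UP_DELAY = timedelta(minutes=5)


def follow_up_delay(scheduled_at: datetime, now: datetime) -> timedelta:
    """Return how long to wait before running a follow-up task (hypothetical helper).

    A scheduled time that is already in the past yields zero, and implausibly
    large delays are clamped to MAX_FOLLOW_UP_DELAY so that a unit mix-up
    cannot push the task far into the future.
    """
    delay = scheduled_at - now
    if delay < timedelta(0):
        return timedelta(0)
    return min(delay, MAX_FOLLOW_UP_DELAY)
```

For instance, 30 seconds accidentally supplied as 30,000 would otherwise schedule the task more than 8 hours out; the clamp caps it at 5 minutes.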
Just to confirm: even the very beginning of the learner import process on another LoD seems to interrupt syncing on the LoD that was previously syncing correctly. LoD with the learner
Android 9 tablet is currently still syncing...
Kolibri logs from the server and the two LoDs in this latest round of syncing: Windows10-server-db-logs.zip
This is the log sequence that correlates with the logcat logs (note that the Python logs are in UTC, while the logcat logs are in CET, hence the time discrepancy):
So this shows the same story: the task completes successfully and the follow-up task is enqueued, but the follow-up never runs.
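When correlating the two sets of logs, converting the UTC Python timestamps into CET avoids doing the offset arithmetic by hand. A minimal sketch, assuming timestamps in `YYYY-MM-DD HH:MM:SS` form (the format string would need adjusting to the actual log lines) and standard time (CET, UTC+1); during summer time the offset is UTC+2 (CEST). The function name `utc_log_time_to_cet` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# CET is UTC+1; during summer time (CEST) the offset would be UTC+2 instead.
CET = timezone(timedelta(hours=1), "CET")


def utc_log_time_to_cet(timestamp: str) -> datetime:
    """Parse a naive 'YYYY-MM-DD HH:MM:SS' UTC timestamp and express it in CET."""
    naive = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")
    # Mark the parsed value as UTC, then convert to the fixed CET offset.
    return naive.replace(tzinfo=timezone.utc).astimezone(CET)
```

For example, `utc_log_time_to_cet("2023-11-20 09:15:00").isoformat()` returns `"2023-11-20T10:15:00+01:00"`.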
Here are my logs from today: https://drive.google.com/drive/folders/1KFFHncnxAsezcemNOxIDr1Ti0qMv2hkZ?usp=sharing
Promising results with the latest asset on a non-Xiaomi device: leaving the Kolibri app in the background did not interrupt syncing, and syncing stayed consistent across various backgrounding intervals. 👏🏽 After that, I left both devices overnight (approximately 6 to 7 hours) without closing the app. On resuming, the server reported the last sync as 7 hours ago, and the app showed Not recently synced. I re-engaged with a resource, but syncing did not resume. Only after the server was restarted did syncing resume and the most recent interactions appear in class activity.
I started by testing with only one Android LoD; unfortunately, it stopped syncing after several hours of working just fine, with no obvious reason.
All makes sense to me.
I think the only other thing we can do on the Kolibri side here is to make our job management more robust to job runner failure (which would be useful for both the Android app and Cloud installations). Some sort of "last updated" timestamp on a job might give us a heuristic for resetting tasks that report as running but actually are not.
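One way the "last updated" idea could work is a heartbeat: the worker refreshes a timestamp while the job runs, and a periodic sweep requeues any job that still claims to be running but has gone quiet for too long. This is a minimal, in-memory sketch; `Job`, `JobState`, `heartbeat`, `reset_stale_jobs`, and the 10-minute `STALE_AFTER` threshold are all hypothetical and are not Kolibri's job API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from enum import Enum


class JobState(Enum):
    QUEUED = "QUEUED"
    RUNNING = "RUNNING"


@dataclass
class Job:
    job_id: str
    state: JobState
    last_updated: datetime  # hypothetical heartbeat field, refreshed while the job runs


# Hypothetical threshold: a running job silent for longer than this is presumed dead.
STALE_AFTER = timedelta(minutes=10)


def heartbeat(job: Job, now: datetime) -> None:
    """Called periodically by the worker that is executing the job."""
    job.last_updated = now


def reset_stale_jobs(jobs: list[Job], now: datetime) -> list[str]:
    """Requeue jobs that claim to be running but have not reported recently.

    Returns the IDs of the jobs that were reset.
    """
    reset_ids = []
    for job in jobs:
        if job.state is JobState.RUNNING and now - job.last_updated > STALE_AFTER:
            job.state = JobState.QUEUED
            reset_ids.append(job.job_id)
    return reset_ids
```

The threshold trades off falsely resetting slow-but-alive jobs against how long a dead job blocks the queue, so it should comfortably exceed the heartbeat interval.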
```python
# Since there should only ever be one processing job running at a time, if we encounter any in
# the queue that are marked as syncing, we should reset their status to pending because it must
# mean that the previous job was terminated unexpectedly.
SyncQueue.objects.filter(status=SyncQueueStatus.Syncing).update(
```
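The same startup reset can be illustrated outside Django. This is a minimal sketch using the standard-library `sqlite3` module; the `sync_queue` table, its schema, and the `'SYNCING'`/`'PENDING'` status strings are hypothetical stand-ins for the `SyncQueue` model and `SyncQueueStatus` values, and `reset_interrupted_syncs` is a made-up name.

```python
import sqlite3

# Hypothetical stand-in for the SyncQueue table; the real one is a Django model.
SCHEMA = "CREATE TABLE sync_queue (id INTEGER PRIMARY KEY, status TEXT NOT NULL)"


def reset_interrupted_syncs(conn: sqlite3.Connection) -> int:
    """Move any 'SYNCING' rows back to 'PENDING' before processing starts.

    Only one processing job should ever run at a time, so a row still marked
    'SYNCING' at startup must belong to a job that was terminated unexpectedly.
    Returns the number of rows reset.
    """
    with conn:  # commits on success, rolls back if the UPDATE raises
        cursor = conn.execute(
            "UPDATE sync_queue SET status = 'PENDING' WHERE status = 'SYNCING'"
        )
    return cursor.rowcount
```

Doing this as a single UPDATE keeps the reset atomic, which matters if anything else might read the queue while the processing job starts up.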
👍
@bjester here are the logs for an Android LoD that stopped syncing again after about 2 hours of syncing correctly: https://drive.google.com/drive/folders/1kfOwOcq7wrJhVKtXO49ykUx3Din4sJtL
@pcenov Richard noticed this in the Android logs
Did you perhaps silence or disable the Kolibri notifications?
@bjester yes, I think I had notifications disabled on this particular Samsung device. But looking at this more closely, it could also be an issue on Samsung's end: I don't actually see an option to turn the notifications back on, which could be caused by a recent update to Samsung's One UI 5.1 (Android 13).
Summary
JobRunning
Syncing
since there should only ever be one job running that code at a time, and records with that status indicate it was terminated unexpectedly

References
Resolves #11524
Includes learningequality/morango#205
Reviewer guidance
Sync
Testing checklist
PR process
Reviewer checklist