Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add side loading for single user sync of data that does not fall within single user scope #8079

Closed
jamalex opened this issue May 12, 2021 · 7 comments · Fixed by #8219
Closed
Assignees
Labels
P0 - critical Priority: Release blocker or regression
Milestone

Comments

@jamalex
Copy link
Member

jamalex commented May 12, 2021

This issue shall serve as the tech spec for #8038

The partitions for Lessons, Quizzes, and their respective Assignments do not fall within the scope of an individual user syncing certificate, and hence need to be synced across to a SoUD through a parallel (non-Morango) mechanism. However, this can still happen within the context of a Morango sync session, and use the open sync session as a means of authentication.

The proposal is to add a new public API endpoint in Kolibri, /api/public/assignment_sync/<user_id>, that can handle HEAD requests (to check the server's current ETag value before sending a bunch of data), GET requests (to retrieve the current set of assigned resources), and POST requests (to send the current set of assigned resources).

When the SoUD is the client, initiating a sync with a full-facility server:

  • After the "pull" phase of the sync, the client calculates its own ETag, then sends an ETag-conditional GET request to /api/public/assignment_sync/<user_id>.
  • The server calculates its own ETag, and if it matches the client-provided value, returns a 304 Not Modified.
  • Otherwise, it returns the serialized data for the assigned resources. If this happens, the client removes all Lesson/Quiz assignments it has stored for the user, and creates the records that it received.

When the full-facility device is the client, initiating a sync with the SoUD:

  • After the "push" phase of the sync, the client (full-facility device) calculates its own ETag, then sends a HEAD request to the SoUD's /api/public/assignment_sync/<user_id> to check its ETag.
  • If they match, it continues merrily.
  • If they don't match, it makes a GET request to the endpoint, then removes and replaces its local resources with the response.

All of these requests will include a header with the Sync Session ID. The server should validate that:

  • The corresponding Sync Session exists and is still active. Otherwise, return a 401 Unauthorized.
  • The user_id is for a user that is within scope of the Sync Sessions certificates. Otherwise, return a 403 Forbidden.
  • When it is validating records received, ensure the partition is correct, and the ID matches the computed ID. Otherwise, return a 400 Bad Request.

The proposed structure of the serialized payload is:

{
    "lessons": [{<lesson_1_serialization>}, {<lesson_2_serialization>}, ...],
    "quizzes": [{<quiz_1_serialization>}, {<quiz_2_serialization>}, ...]
}

There should be no need to sync the *Assignment objects themselves, as they are implicit in the lists of resources (these are all-inclusive lists of the resources currently assigned, so assignments can be deleted or created based on presence of a resource in here).

Note: when quiz/lesson resource and assignment model instances are created on a SoUD, they should be marked as "not dirty" to prevent unnecessary serialization (that could also cause issues if the device later synced the full facility). Similarly, the signal that tracks model deletion should be bypassed when they are removed.

@jamalex
Copy link
Member Author

jamalex commented Jun 21, 2021

We realized an issue with the current specifications, for some more complex scenarios involving SoUDs syncing with more than one full-facility servers that may themselves not be fully synced with one another (and used by different teachers). This is a tangible reality for S2S, where there are 3 teachers per school each with their own laptop for different subjects, with the same facilities and user accounts across them (but only syncing with one another intermittently). Learner tablets will connect with a particular teacher laptop depending on which teacher they're currently visiting.

The scenario of concern is as follows:

  • Student is assigned Lesson X on Laptop A
  • Student tablet syncs with Laptop A, and Lesson X is created on the tablet
  • Student is assigned Lesson Y on Laptop B
  • Before Laptop A and B have synced with each other, student tablet syncs with Laptop B.
  • Because Lesson X isn't assigned to the student on Laptop B, it is removed from the tablet.

The proposed solution is to go back to an earlier-brainstormed approach that involves creating a syncable record of each assignment that is specific to the individual student, and is scoped into their single-user read-only partition.

The proposed approach is:

  • Create a new model, IndividualAssignment, with the following fields:
    • kind: "lesson" or "exam" (or "quiz"? we seem to still use both terms across the backend)
    • user: foreign key to FacilityUser to which thing is assigned
    • target_id: ID of Exam or Lesson that has been assigned
    • serialized: the output of target.serialize() where target is the Lesson or Exam instance
  • During an individual user sync, at some stage prior to data being synced onto the limited device, the full device scans through all the Lesson/Quiz and IndividualAssignment objects, for the specific user being synced, that exist in its database, and does the following:
    • Creates new IndividualAssignment objects for any Lessons/Exams that are assigned to the user but don't currently have an IndividualAssignment on this device. The Lesson or Exam objects are serialized into the serialized field.
    • Removes any IndividualAssignment objects that exist on this device for Exams/Lessons not currently assigned to the user on this device.
  • These IndividualAssignment objects (or the deletion thereof) will then be synced to the limited device through standard Morango mechanisms along with the rest of the user data.
  • Then, the limited device will to the inverse: look for any ExamAssignment/Exam and LessonAssignment/Lesson objects it needs to create or delete based on the current set of IndividualAssignment objects it has. When creating/deleting these, it skips setting the dirty bits (and bypasses the deletion signal listener) so that these won't be reserialized later into the Morango Store.

The approach above gains its efficiency/correctness from being able to lean on Morango's "history tracking", to know when a record on one side vs the other has changed relative to a common ancestor. Specifically, it would address the scenario described above in the following way (where "IA" is shorthand for "IndividualAssignment"):

  • Student is assigned Lesson X on Laptop A
  • Student tablet syncs with Laptop A, triggering the creation of IA X, which syncs to the tablet and thereby creates and assigns Lesson X
  • Student is assigned Lesson Y on Laptop B
  • Before Laptop A and B have synced with each other, student tablet syncs with Laptop B, triggering the creation of IA Y, which is synced to the tablet, creating and assigning Lesson Y.
  • Because Laptop B did not have IA X, it did not modify or delete IA X, and hence it doesn't sync anything to the tablet that impacts the assignment of Lesson X.

On the other hand, if Laptop A and B had already synced, and then a teacher un-assigned Lesson X on Laptop B, before syncing with the tablet, the IA X that had already been synced to Laptop B would now be deleted when the tablet syncs with Laptop B, and the deletion would be propagated to the tablet as desired.

Working through a slight variation in the scenario:

  • Student is assigned Lesson X on Laptop A
  • Laptop A syncs to Laptop B, so now the assignment exists on both laptops (but no IAs have yet been created for it)
  • Tablet syncs with Laptop A, triggering the creation of IA X, which syncs to the tablet.
    • If the tablet now syncs with Laptop B, a new instance of IA X will be created and synced across there, but it will merge smoothly with the existing IA.
    • If the assignment were deleted on Laptop B prior to syncing with the tablet, then based on the design being proposed here, no IA would be affected/deleted, so the action of unassigning would not be propagated across to the tablet (until later, when Laptop A and B had synced with each other, bringing the IA and assignment deletion onto each of them, and then the tablet again synced with one of them).

So the tradeoff with the new approach (as proposed here) is that assigned materials may not disappear from limited devices as quickly. This seems better than the alternative (having stuff disappear prematurely), particularly because it's a much less likely scenario for our currently known primary use case, as the multiple school servers for S2S will largely be used to assign disjoint sets of lessons (as they're managed by different teachers, of different subjects). And worst case, it just means having some lessons stick around a bit longer. However, it's still not 100% ideal. The only approach I can think of that would work around this completely would be to have IA creation happen at the time of assigning or unassigning (rather than on-demand during syncing, for that particular user), based on signal listeners or similar. My main concerns with doing that would be increased database overhead every time an operation happens that impacts Lesson/Quiz or Assignment objects, rather than being as focal or on-demand.

@jamalex
Copy link
Member Author

jamalex commented Jun 21, 2021

A parallel question for @bjester around how best to tie something like this into the middleware/scenario operations sequencing stuff: what would be the best way to have specific operations that occur on the full and limited devices (a different operation on each) as part of an individual-user-sync process, regardless of which side initiated? For whichever side is the client in the relationship, I think I could imagine from what you showed how to inject some operations into that sequence, but I'm not sure the symmetry of these scenarios in terms of being able to specify custom stuff that happens on the server side at different parts of the sync process.

@jamalex
Copy link
Member Author

jamalex commented Jun 21, 2021

The only approach I can think of that would work around this completely would be to have IA creation happen at the time of assigning or unassigning (rather than on-demand during syncing, for that particular user), based on signal listeners or similar.

Another alternative would be to have IA creation/deletion also happen during full-facility syncs between the full devices (the laptops, in the scenarios above). This still adds a bit of overhead, but much less frequently than if we were doing it whenever a change happened. We'd then still need to also do it as part of the individual user sync in case further changes happened on that full device prior to the individual user sync. Given the extra bloat and overhead, we'd want to ensure we only did this for users likely to be individually synced (so maybe we'd set a flag on FacilityUser after that user has had an individual user sync).

@bjester
Copy link
Member

bjester commented Jun 22, 2021

but I'm not sure the symmetry of these scenarios in terms of being able to specify custom stuff that happens on the server side at different parts of the sync process.

@jamalex I think this can be handled through the default middleware configuration which will be configurable through settings. The default middleware is used by both client and server (by default), so the only requirement would be that it is able to detect from the context object that it should actually do something. So for instance, a custom middleware operation could be prepended to the cleanup stage. I would expect this operation to always return False allowing the default functionality of cleanup to be executed, but since it would be first in the list, it would get called first and therefore can handle anything related to side loading

@indirectlylit
Copy link
Contributor

indirectlylit commented Jun 24, 2021

Overall this seems fine to me, and I prefer it to the API-based side channel. (If we ever revisit that, recommend considering /api/public/assignments/<user_id> or possibly /api/public/users/<user_id>/assignments/ because the endpoint is not actually sync-specific.)

An edge-case limitation of the Morango IndividualAssignment strategy described above:

  • Full device A syncs a new assignment to Limited device
  • Limited device syncs with Full device B (which has not recently synced with Full device A)
  • Full device B does not get a copy of the new assignment

This does not seem particularly problematic because typically (in S2S at least) devices A and B will be used by different coaches for different classes, so they likely won't expect or need to see assignments from each other.

I do have a more general concern about the long-term added complexity of a new serialization and deserialization layer, but I understand that it may be unavoidable. One alternative variation we discussed might alleviate this:

  • on Kolibri upgrade, we migrate all assignments to be IndividualAssignments
  • IndividualAssignments use "soft" references rather than foreign keys to handle Morango partition boundary issues

I think this was deemed infeasible for backward-compatibility reasons, and we briefly discussed what it would take to address that but didn't come up with specific proposals.

Thanks for the thorough analysis all!

@jamalex
Copy link
Member Author

jamalex commented Jun 24, 2021

This does not seem particularly problematic because typically (in S2S at least) devices A and B will be used by different coaches for different classes, so they likely won't expect or need to see assignments from each other.

Yes, exactly -- @rtibbles and I chatted about this, and agree for the S2S use case it's not a concern. And we can easily add the "create IA's on full-facility-sync for users that have done single-user-syncs previously" in future versions if it starts to look like a practical issue.

migrate all assignments to be IndividualAssignments

The main thing that makes this messy is that many ways that a user can come to be assigned something:

  • A new assignment is created to a collection they're in
  • They're added to a collection that already has stuff assigned to it
  • A collection they're in is moved into a collection to which something is assigned (in practice we don't do this)
  • Assignments are activated/deactivated for a collection

So one thing @rtibbles and I talked about was that if we did essentially go with the "IA's should always represent current assignment state", we still need the current type of collection-level assignment structures so that we can re-compute the IA's when stuff changes (on local activity as well as stuff syncing in). And that's compatible with the direction this is heading in -- but for now we're only generating the IA's at the time a single-user sync happens.

@rtibbles
Copy link
Member

Fixed in #8219

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P0 - critical Priority: Release blocker or regression
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants