Fix up diverging structure in bookmark trees #19

linabutler · 2019-01-09T14:54:41Z

In a well-formed tree:

* Each item exists in exactly one folder. Two different folders'
  children should never reference the same item.
* Each folder contains existing children. A folder's children should
  never reference tombstones, or items that don't exist in the tree at all.
* Each item has a parentid that agrees with its parent's children. In
  other words, if item B's parentid is A, then A's children should
  contain B.
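As a rough illustration, the three rules can be checked mechanically over a flat map of records. This is a minimal sketch that assumes a hypothetical `Record` type with `parent_id`, `children`, `is_folder`, and `is_tombstone` fields. It isn't Dogear's actual tree representation.

```rust
use std::collections::HashMap;

/// A hypothetical, simplified Sync bookmark record (not Dogear's type).
#[derive(Debug, Clone)]
pub struct Record {
    /// Mirrors Sync's `parentid`.
    pub parent_id: Option<String>,
    /// Only meaningful for folders.
    pub children: Vec<String>,
    pub is_folder: bool,
    pub is_tombstone: bool,
}

/// Returns true if `records` (keyed by GUID) satisfies all three rules.
/// `root` has no parent, so rule 3 skips it.
pub fn is_well_formed(records: &HashMap<String, Record>, root: &str) -> bool {
    // Maps each child GUID to the folder whose `children` mention it.
    let mut parent_of: HashMap<&str, &str> = HashMap::new();
    for (guid, record) in records {
        if !record.is_folder {
            continue;
        }
        for child in &record.children {
            // Rule 2: children must exist and must not be tombstones.
            match records.get(child) {
                Some(c) if !c.is_tombstone => {}
                _ => return false,
            }
            // Rule 1: no child may appear in two folders' `children`.
            if parent_of.insert(child.as_str(), guid.as_str()).is_some() {
                return false;
            }
        }
    }
    // Rule 3: each live, non-root item's `parent_id` names the folder that
    // lists it. This also rejects items that no folder mentions.
    for (guid, record) in records {
        if record.is_tombstone || guid == root {
            continue;
        }
        match (record.parent_id.as_deref(), parent_of.get(guid.as_str())) {
            (Some(p), Some(&q)) if p == q => {}
            _ => return false,
        }
    }
    true
}
```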

Because of Reasons, things are (a lot) messier in practice.

Structure inconsistencies

Sync stores structure in two places: a parentid property on each item,
which points to its parent's GUID, and a list of ordered children on the
item's parent. They're duplicated because, historically, Sync clients didn't
stage incoming records. Instead, they applied records one at a time,
directly to the live local tree. This meant that, if a client saw a child
before its parent, it would first use the parentid to decide where to keep
the child, then fix up parents and positions using the parent's children.

This is also why moving an item into a different folder uploads records for
the item, old folder, and new folder. The item has a new parentid, and the
folders have new children. Similarly, deleting an item uploads a tombstone
for the item, and a record for the item's old parent.
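A toy model of which records a client uploads for each kind of change might look like this. It assumes a hypothetical `Change` enum keyed by GUID strings. Real clients track changes very differently.

```rust
/// A hypothetical local change, identified by GUIDs.
pub enum Change<'a> {
    Move { item: &'a str, old_parent: &'a str, new_parent: &'a str },
    Delete { item: &'a str, old_parent: &'a str },
}

/// Returns the GUIDs to upload so that both `parentid` and `children`
/// stay consistent on the server.
pub fn records_to_upload<'a>(change: &Change<'a>) -> Vec<&'a str> {
    match *change {
        // The item gets a new `parentid`; both folders get new `children`.
        Change::Move { item, old_parent, new_parent } => vec![item, old_parent, new_parent],
        // A tombstone for the item, plus the old parent's new `children`.
        Change::Delete { item, old_parent } => vec![item, old_parent],
    }
}
```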

Unfortunately, bugs (bug 1258127) and missing features (bug 1253051) in
older clients sometimes caused them to upload invalid or incomplete changes.
For example, a client might have:

* Uploaded a moved child, but not its parents. This means the child now
  appears in multiple parents. In the most extreme case, an item might be
  referenced in two different sets of children, and have a third,
  completely unrelated parentid.
* Deleted a child, and tracked the deletion, but didn't flag the parent for
  reupload. The parent folder now has a tombstone child.
* Tracked and uploaded items that shouldn't exist on the server at all,
  like the left pane or reading list roots (bug 1309255).
* Missed new folders created during a sync, creating holes in the tree.

Newer clients shouldn't do this, but we might still have inconsistent
records on the server that will confuse older clients. Additionally, Firefox
for iOS includes a much stricter bookmarks engine that refuses to sync if
it detects inconsistencies.

Divergences

To work around this, our tree lets the structure diverge. This allows:

* Items with multiple parents.
* Items with missing parentids.
* Folders with children whose parentids don't match the folder.
* Items whose parentids don't mention the item in their children.
* Items with parentids that point to nonexistent or deleted folders.
* Folders with nonexistent children.
* Non-syncable items, like custom roots.
* Any combination of these.

Each item in the tree has an EntryParentFrom tag that indicates where
its structure comes from. Structure from children is validated and
resolved at insert time, so trying to add an item to a parent that
doesn't exist or isn't a folder returns a MissingParent or
InvalidParent error.

Structure from parentid, if it diverges from children, is resolved at
merge time, when the merger walks the complete tree. You can think of this
distinction as similar to early vs. late binding. The parentid, if
different from the parent's children, might not exist in the tree at
insert time, either because the parent hasn't been added yet, or because
it doesn't exist in the tree at all.
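The early vs. late binding split can be sketched as two insert paths. Everything here except the `EntryParentFrom`, `MissingParent`, and `InvalidParent` names is hypothetical, and even those are simplified: in this sketch, `Children` holds an entry index, and `Item` holds the unvalidated `parentid`.

```rust
use std::collections::HashMap;

/// Where an entry's parent comes from (simplified, hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
pub enum EntryParentFrom {
    /// From a folder's `children`: the parent is already in the tree.
    Children(usize),
    /// From the item's `parentid`: resolved later, when walking the tree.
    Item(String),
}

#[derive(Debug, PartialEq)]
pub enum InsertError {
    MissingParent(String),
    InvalidParent(String),
}

pub struct Entry {
    pub guid: String,
    pub is_folder: bool,
    pub parent: Option<EntryParentFrom>,
}

#[derive(Default)]
pub struct Tree {
    entries: Vec<Entry>,
    index_by_guid: HashMap<String, usize>,
}

impl Tree {
    pub fn insert_root(&mut self, guid: &str) -> usize {
        self.push(guid, true, None)
    }

    /// Early binding: the parent must already exist, and be a folder.
    pub fn insert_child_of(&mut self, parent_guid: &str, guid: &str, is_folder: bool) -> Result<usize, InsertError> {
        let &parent = self
            .index_by_guid
            .get(parent_guid)
            .ok_or_else(|| InsertError::MissingParent(parent_guid.to_string()))?;
        if !self.entries[parent].is_folder {
            return Err(InsertError::InvalidParent(parent_guid.to_string()));
        }
        Ok(self.push(guid, is_folder, Some(EntryParentFrom::Children(parent))))
    }

    /// Late binding: the `parentid` isn't checked, since the parent might
    /// not be inserted yet, or might not exist at all.
    pub fn insert_with_parentid(&mut self, parentid: &str, guid: &str, is_folder: bool) -> usize {
        self.push(guid, is_folder, Some(EntryParentFrom::Item(parentid.to_string())))
    }

    fn push(&mut self, guid: &str, is_folder: bool, parent: Option<EntryParentFrom>) -> usize {
        let index = self.entries.len();
        self.entries.push(Entry { guid: guid.to_string(), is_folder, parent });
        self.index_by_guid.insert(guid.to_string(), index);
        index
    }
}
```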

Resolving divergences

Walking the tree, using Tree::node_for_guid, Node::parent, and
Node::children, resolves divergences using these rules:

1. Items that appear in multiple children, and items with mismatched
   parentids, use the chronologically newer parent, based on the parent's
   last modified time. We always prefer structure from children over
   parentid, because children also gives us the item's position.
2. Items that aren't mentioned in any parent's children, but have a
   parentid that references an existing folder in the tree, are reparented
   to the end of that folder, after the folder's children.
3. Items that reference a nonexistent or non-folder parentid, or don't
   have a parentid at all, are reparented to the default folder, after any
   children and items from rule 2.
4. If the default folder isn't set, or doesn't exist, items from rule 3 are
   reparented to the root instead.

The result is a well-formed tree structure that can be merged. The merger
detects if the structure diverged, and flags affected items for reupload.
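A sketch of the four rules applied to a single item follows. It assumes hypothetical `Candidate` and `Resolved` types and an `is_existing_folder` callback. It reads the last sentence of rule 1 as "a parent from children always wins over the parentid", and it leaves out positions and the divergence flag.

```rust
/// A folder whose `children` mention the item (hypothetical type).
pub struct Candidate<'a> {
    pub guid: &'a str,
    pub last_modified: u64,
}

#[derive(Debug, PartialEq)]
pub enum Resolved<'a> {
    /// Rule 1: from `children`, keeping the item's position.
    Child(&'a str),
    /// Rule 2: appended to its `parentid` folder.
    ParentId(&'a str),
    /// Rules 3 and 4: appended to the default folder, or the root.
    Fallback(&'a str),
}

pub fn resolve_parent<'a>(
    parents_from_children: &[Candidate<'a>],
    parent_id: Option<&'a str>,
    is_existing_folder: impl Fn(&str) -> bool,
    default_folder: Option<&'a str>,
    root: &'a str,
) -> Resolved<'a> {
    // Rule 1: prefer `children`, picking the most recently modified parent.
    if let Some(newest) = parents_from_children.iter().max_by_key(|c| c.last_modified) {
        return Resolved::Child(newest.guid);
    }
    // Rule 2: no folder lists the item, but its `parentid` is a folder.
    if let Some(p) = parent_id.filter(|&p| is_existing_folder(p)) {
        return Resolved::ParentId(p);
    }
    // Rules 3 and 4: the default folder if it exists, otherwise the root.
    Resolved::Fallback(default_folder.filter(|&d| is_existing_folder(d)).unwrap_or(root))
}
```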

Closes #18.

codecov-io · 2019-01-18T03:07:02Z

Codecov Report

Merging #19 into master will increase coverage by 0.25%.
The diff coverage is 92.11%.

@@            Coverage Diff             @@
##           master      #19      +/-   ##
==========================================
+ Coverage   96.16%   96.41%   +0.25%     
==========================================
  Files           6        6              
  Lines        1797     3709    +1912     
==========================================
+ Hits         1728     3576    +1848     
- Misses         69      133      +64

Impacted Files	Coverage Δ
src/lib.rs	`0% <ø> (ø)`	⬆️
src/tests.rs	`99.92% <100%> (+0.1%)`	⬆️
src/error.rs	`33.33% <20%> (-9.53%)`	⬇️
src/tree.rs	`91.35% <91.9%> (+6.01%)`	⬆️
src/guid.rs	`93.54% <94.73%> (+2.24%)`	⬆️
src/merge.rs	`98.28% <95.8%> (+0.06%)`	⬆️

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 2ea28a3...07fd161.

linabutler · 2019-01-18T07:13:23Z

Whew! 😅 I think this is finally ready for review, sorry it took so long! The merger changes are pretty minimal, but the tree got a complete overhaul. The diff view for tree.rs isn't likely to be helpful; it's probably easier to read through it from scratch.

This needs a lot more tests, and there are still likely bugs (and TODOs), but I'm feeling comfortable with the algorithm itself. Rust makes this kind of refactoring far less error-prone. I edited the PR description with an explanation of how this all works now, which is also in the rustdoc comment for Tree.

The TL;DR is, the tree structure can diverge now, which is a fancy way of saying parents can have missing children, and children can have multiple parents, no parents, and mismatched parents. We manage this by storing two different sets of structure: from children, where we already have the parent in the tree, and from parentid, where we might not. When the merger walks the tree, the diverging structures will be automatically resolved. The merger notes that the structure diverged, and flags the nodes for reupload.

In the best (and, hopefully, common 🤞) case, the tree won't diverge at all, and parent and child lookups will be just as simple and fast as before. In the worst case, we should only need to do all this once...unless another client (👀 Android) scrambles the server again.

I've flagged the usual suspects for review. @bbangert and @rnewman, you two might be interested as well. 📓

Originally, I'd intended for the tree to expose diverging structure to the merger, and for the merger to fix it. That was part of why I split `Item` into `Existing` and `Missing` variants. However, it turns out it's simpler to resolve divergences in the tree, and expose a well-formed structure to the merger instead. Changing `Item` to an enum means we now need a pattern match for _every_ item, when all we really want is a flag on the parent that says "this folder has diverged because it has a missing child". `Entry::divergence` already flags diverging structure for multiple parents, so let's use it to also flag parents with missing children, instead of forcing the merger to handle invalid structure for this one case.

* Add missing doc comments.
* Rename reparenting methods for clarity.
* Move `Tree::{children, parent}_for_entry` to `Node::{children, parent}`.
* Remove optional return value from `Node::level`.
* Add `Node::is_{root, default_parent_for_orphans}`.
* Iterate over `Tree::entries` instead of `Tree::entry_index_by_guid`. The two should point to the same entries, but `entries` is more direct.
* Don't mark trees with orphans as equal.

This reverts commit b5e38d4.

linabutler · 2019-01-20T00:01:12Z

I was noodling 🍝 on this some more...

The tricky part of fixing up divergences is handling missing children and missing parentids. For other cases, we can find the right folder for the item. Reuploading the item, its new folder, and any other parents that mention the item should be enough to fix up the structure on other devices.

However, for missing parentids, we move the item to unfiled, and, for missing children, we reupload the parent with a new list. That risks orphaning those items if they exist on other devices.

In theory, this won't confuse older clients, even if they have the missing items locally. Since they parent based on the parentid, and only use children for ordering (I think Android does this, too, but I'm not sure), they'll move the items to match the server's structure, and keep the missing item around. We can also try something like always preferring the local state if the remote structure diverged, since we know it's consistent, even if it's out of date.

This is the problem that Firefox for iOS faces: it doesn't know whether the missing records will never be uploaded, in which case fixing the server is the right thing to do, or just haven't been uploaded yet, in which case it should definitely not fix the server. When the iOS implementation was built, the server didn't support batch uploads, and Desktop and Android didn't track or merge items properly, so opting to wait for the missing records was a perfectly prudent thing to do.

Four years later, we've fixed many of those consistency issues. We also know that not syncing bookmarks at all is worse than syncing and getting them wrong. And we know that Android can get confused and scramble bookmarks on other devices. I think that all leaves us in a better position to fix the server, instead of refusing to sync or applying questionable changes and letting the local tree diverge.

The other concern with fixing up the server is getting into Sync battles with other clients, where each thinks its view is right, and reuploads the same records over and over again. As @thomcc suggested, we can track this in the mirror (and we know that, if merged_node.merge_state is LocalWithNewStructure(node) or RemoteWithNewStructure(node), but node.needs_merge == false, the node's structure was changed by the merger, not by the user), and emit telemetry.

Anything we do is going to be Hella Hacky and Probably Wrong™, so let's do the thing that has a chance of being right.
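The check described above, for spotting structure that the merger changed rather than the user, might look like this. `MergeState` and `Node` are pared-down, hypothetical stand-ins. Only the `LocalWithNewStructure` and `RemoteWithNewStructure` variants and the `needs_merge` flag come from the comment. The `Local` and `Remote` variants are assumptions.

```rust
/// Hypothetical, minimal node: only the flag this check needs.
#[derive(Debug, Clone, Copy)]
pub struct Node {
    pub needs_merge: bool,
}

/// Hypothetical, reduced merge state.
#[derive(Debug, Clone, Copy)]
pub enum MergeState {
    Local(Node),
    Remote(Node),
    LocalWithNewStructure(Node),
    RemoteWithNewStructure(Node),
}

/// True if the node got new structure even though it had no changes to
/// merge, meaning the merger (not the user) changed it. Counting these in
/// telemetry could reveal clients fighting over the same records.
pub fn structure_changed_by_merger(state: MergeState) -> bool {
    match state {
        MergeState::LocalWithNewStructure(node) | MergeState::RemoteWithNewStructure(node) => !node.needs_merge,
        MergeState::Local(_) | MergeState::Remote(_) => false,
    }
}
```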

thomcc

This is very impressive. These are my first comments on the implementation, mostly written to help familiarize myself with the pieces and the tree.rs changes. I'll try to get another pass in tomorrow for the overall algorithm, but it might take some discussion.

thomcc · 2019-01-24T00:57:49Z

src/tree.rs

+enum Divergence {
+    /// The node's structure is already correct, and doesn't need to be
+    /// reuploaded.
+    Ok,


I don't love reusing Ok (which usually refers to a Result) here. Eh, it's probably fine though...

Consistent? 😄

thomcc · 2019-01-24T00:59:40Z

src/tree.rs

+impl ParentGuidFrom {
+    /// Notes that the parent `guid` comes from an item's parent's `children`.
+    pub fn children(self, guid: &Guid) -> ParentGuidFrom {
+        ParentGuidFrom(Some(guid.to_owned()), self.1)


(In this whole file)

I find the to_owned use here confusing. Isn't this just a clone? Using to_owned implies a conversion is happening (it's usually used for "the conversion we want to do is too fancy for clone"), but AFAICT this is just Clone... That said, maybe I'm missing something?

If it is just a clone(), I think that would be a lot clearer.
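For any `T: Clone`, the standard library's blanket `impl<T: Clone> ToOwned for T` implements `to_owned` by calling `clone`, so the two produce the same value. `clone` just states the intent more directly. Here is a small demonstration with a hypothetical `Guid` newtype:

```rust
/// A hypothetical GUID newtype; the real `Guid` type differs.
#[derive(Debug, Clone, PartialEq)]
pub struct Guid(String);

pub fn with_to_owned(guid: &Guid) -> Guid {
    // Resolves to the blanket `ToOwned` impl, which calls `clone`.
    guid.to_owned()
}

pub fn with_clone(guid: &Guid) -> Guid {
    guid.clone()
}
```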

thomcc · 2019-01-24T01:20:17Z

src/tree.rs

+        }
+    }
+
+    fn indices(&self) -> Vec<Index> {


You can write this without allocating (e.g. returning an Iterator) as follows

    fn indices<'a>(&'a self) -> impl Iterator<Item = Index> + 'a {
        let entry_parents: &[EntryParentFrom] = match self {
            EntryParents::Root => &[],
            EntryParents::One(parent_from) => std::slice::from_ref(parent_from),
            EntryParents::Many(parents_from) => parents_from,
        };
        entry_parents.iter().filter_map(move |parent_from| match parent_from {
            EntryParentFrom::Children(index) => Some(*index),
            EntryParentFrom::Item(_) => None,
        })
    }

Not sure if it's worth it though (you need to do a slight change to part of insert to avoid issues with borrowck), but it seems like a shame to allocate for the common case of one parent.

guids below can be done using the same technique too.

Oh, good idea, thanks!
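A self-contained version of the suggestion is below, with hypothetical stand-ins for `Index`, `EntryParentFrom`, and `EntryParents`. The explicit `&[EntryParentFrom]` annotation lets all three match arms coerce to the same slice type. As a result, the `One` case borrows its single parent through `std::slice::from_ref` instead of collecting into a `Vec`.

```rust
/// Hypothetical stand-ins mirroring the shapes in the review comment.
pub type Index = usize;

pub enum EntryParentFrom {
    Children(Index),
    Item(String),
}

pub enum EntryParents {
    Root,
    One(EntryParentFrom),
    Many(Vec<EntryParentFrom>),
}

impl EntryParents {
    /// Yields indices of parents known from `children`, without allocating:
    /// every variant is viewed as a borrowed slice first.
    pub fn indices<'a>(&'a self) -> impl Iterator<Item = Index> + 'a {
        let entry_parents: &'a [EntryParentFrom] = match self {
            EntryParents::Root => &[],
            EntryParents::One(parent_from) => std::slice::from_ref(parent_from),
            EntryParents::Many(parents_from) => parents_from,
        };
        entry_parents.iter().filter_map(|parent_from| match parent_from {
            EntryParentFrom::Children(index) => Some(*index),
            EntryParentFrom::Item(_) => None,
        })
    }
}
```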

thomcc · 2019-01-24T01:22:41Z

src/tree.rs

+    }
+
+    fn is(&self, other: &Entry) -> bool {
+        self as *const _ == other as *const _


nit: std::ptr::eq(self, other) should work here. Also, probably worth an #[inline]
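To see how identity and value equality differ, here's a sketch with a hypothetical `Entry` that derives `PartialEq`. Two distinct entries with the same GUID compare equal by value, but `std::ptr::eq` only returns true for the same entry. Whether that's why the tree compares entries by address is an assumption here.

```rust
/// A hypothetical tree entry; two distinct entries can hold equal values.
#[derive(Debug, PartialEq)]
pub struct Entry {
    pub guid: String,
}

/// Identity check: true only if both references point at the same entry.
#[inline]
pub fn is_same_entry(a: &Entry, b: &Entry) -> bool {
    std::ptr::eq(a, b)
}
```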

thomcc · 2019-01-24T01:43:33Z

src/tree.rs

+    }
+
+    pub fn diverged(&self) -> bool {
+        match &self.2 {


nit: self.2 == Divergence::Diverged seems simpler to me but I don't really care.
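The `==` form only compiles if `Divergence` implements `PartialEq`. Here is a sketch that uses the later `Consistent` name, with a hypothetical tuple `Node` whose third field holds the divergence:

```rust
/// Deriving `PartialEq` is what makes `==` on variants compile.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Divergence {
    Consistent,
    Diverged,
}

/// Hypothetical node: the first two fields are placeholders.
pub struct Node(pub String, pub usize, pub Divergence);

impl Node {
    pub fn diverged(&self) -> bool {
        self.2 == Divergence::Diverged
    }
}
```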

thomcc · 2019-01-24T01:51:07Z

src/tree.rs


 impl<'t> Node<'t> {
+    /// Returns an iterator for all resolved children of this node, including
+    /// reparented orphans.
    pub fn children<'n>(&'n self) -> impl Iterator<Item = Node<'t>> + 'n {


I think you can avoid the allocations in this function, but I guess they should only happen when there are orphans, which should be rare enough that it doesn't matter.

Since I move self into the closure, and handle orphans and child_indices slightly differently...I guess I could zip the orphans and child_indices with an iter::repeat that passes along a tag (Option<Divergence>?).

Or, since everything in orphan_indices_by_parent_guid should already be Diverged, I can probably just chain orphans before filter_map.

thomcc · 2019-01-24T01:55:30Z

src/merge.rs

 pub struct Merger<'t> {
+    driver: &'t Driver,


Is there a reason to use a trait object here and not something like struct Merger<'t, D: Driver> or similar?

If so, nit: &'t dyn Driver to make it more obvious it's a trait object.

I tried using the generic form with a default parameter (D = DefaultDriver), but ran into rust-lang/rust#50822, and had to replace all uses of let merger = Merger::new with let merger = <Merger>::new for type inference to work. I didn't try it without the default type parameter, though.

Default params should still work. Possibly worth noting that you already use them frequently: HashMap<K, V> is actually HashMap<K, V, S = RandomState>. To avoid the issue you described, though, you'll probably have to add new (e.g. the constructor that doesn't take a Driver arg) to impl<'a> Merger<'a, DefaultDriver> instead of impl<'a, D: Driver> Merger<'a, D>.
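A minimal sketch of the suggested shape follows, assuming a hypothetical one-method `Driver` trait. Because `new` is defined only on `Merger<'t, DefaultDriver>`, `Merger::new()` infers the driver type the same way `HashMap::new()` infers `RandomState`, while `with_driver` stays generic.

```rust
/// Hypothetical driver trait; the real `Driver` has different methods.
pub trait Driver {
    fn name(&self) -> &'static str;
}

pub struct DefaultDriver;

impl Driver for DefaultDriver {
    fn name(&self) -> &'static str {
        "default"
    }
}

/// Generic over `D`, defaulting to `DefaultDriver`.
pub struct Merger<'t, D: Driver = DefaultDriver> {
    driver: &'t D,
}

// Only the concrete default type gets `new`, so `Merger::new()` needs no
// turbofish or `<Merger>::new` for inference.
impl<'t> Merger<'t, DefaultDriver> {
    pub fn new() -> Self {
        Merger { driver: &DefaultDriver }
    }
}

impl<'t, D: Driver> Merger<'t, D> {
    pub fn with_driver(driver: &'t D) -> Self {
        Merger { driver }
    }

    pub fn driver_name(&self) -> &'static str {
        self.driver.name()
    }
}
```

Since `DefaultDriver` is a unit struct, `&DefaultDriver` is promoted to a `'static` reference, so `new` doesn't need a driver argument.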

thomcc · 2019-01-24T01:59:03Z

src/merge.rs

@@ -64,6 +64,21 @@ enum ConflictResolution {
    Unchanged,
 }

+/// Controls merging behavior.
+pub trait Driver {


This is a bit weird; I'd almost expect it to be something like enum InvalidGuidHandling { Forbid, Allow, Replace }... (And even then, I'm skeptical of Allow...)

That makes sense; MergeDriver sure is a roundabout way to implement GUID regeneration. I did it this way to avoid pulling in dependencies on base64 and rand, and leaving it up to the caller. (For example, on Desktop, we'd use nsINavHistoryService::MakeGuid). I was thinking we could also extend MergeDriver to provide sinks for logging and telemetry events.

But maybe this is all premature abstraction, and we should just take the dependencies.

Eh, that's a decent point. It's probably not worth the dep. (I mean, do we really see these in the wild anymore? I don't think so, but I could be wrong.)

mhammond

This looks great - now I just need to understand it better ;)

mhammond · 2019-01-23T23:11:15Z

src/error.rs

@@ -83,6 +84,9 @@ impl fmt::Display for ErrorKind {
            },
            ErrorKind::UnmergedRemoteItems => {
                write!(f, "Merged tree doesn't mention all items from remote tree")
+            },
+            ErrorKind::GenerateGuid => {


I'm a little confused by this because I can't really see how it will be used in real code, but I note that the places lib will panic if it can't generate a GUID, so it doesn't have any error similar to this. OTOH though, at least in the default trait below, this error is actually used as "I decline to create a new GUID" - so I'm not sure if this is intended for "can't" or "won't".

I guess that I'm suggesting we add a comment below...

mhammond · 2019-01-23T23:51:44Z

src/merge.rs

+/// Controls merging behavior.
+pub trait Driver {
+    /// Generates a new GUID for the given invalid GUID.
+    fn generate_new_guid(&self, invalid_guid: &Guid) -> Result<Guid>;


... here :) Expanding the comment for this function might make sense, to indicate in what conditions the error is returned.
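One way to document the "can't vs. won't" ambiguity is sketched below, with hypothetical `Guid` and `ErrorKind` types. The variant uses the later `InvalidGuid` name. The default method declines, and a caller with its own GUID generator overrides it. `FixedGuidDriver` is a hypothetical test driver that produces predictable GUIDs instead of pulling in `rand` and `base64`.

```rust
/// Hypothetical GUID and error types for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub struct Guid(pub String);

#[derive(Debug, PartialEq)]
pub enum ErrorKind {
    /// The driver can't, or won't, replace this invalid GUID.
    InvalidGuid(Guid),
}

pub trait Driver {
    /// Generates a replacement for `invalid_guid`.
    ///
    /// Returning `Err(ErrorKind::InvalidGuid)` means the driver declines
    /// (or is unable) to generate one. The default always declines, so
    /// callers that want replacement must opt in by overriding this.
    fn generate_new_guid(&self, invalid_guid: &Guid) -> Result<Guid, ErrorKind> {
        Err(ErrorKind::InvalidGuid(invalid_guid.clone()))
    }
}

pub struct DefaultDriver;
impl Driver for DefaultDriver {}

/// A test driver that derives predictable replacement GUIDs.
pub struct FixedGuidDriver;
impl Driver for FixedGuidDriver {
    fn generate_new_guid(&self, invalid_guid: &Guid) -> Result<Guid, ErrorKind> {
        Ok(Guid(format!("new{}", invalid_guid.0)))
    }
}
```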

mhammond · 2019-01-24T22:02:00Z

src/merge.rs

-                let merged_child_node = self.two_way_merge(local_child_node, remote_child_node)?;
+                let mut merged_child_node = self.two_way_merge(local_child_node,
+                                                               remote_child_node)?;
+                if remote_child_node.diverged() || merged_node.guid != remote_parent_node.guid {


.diverged() || guids_different seems to be a very common pattern here, and is used more often than .diverged() alone. It seems like it might be easy to get this wrong; I wonder if there's scope to capture this using the diverged mechanism somehow?

(This is more a rhetorical question than a suggestion as I'm still getting my head around this code)
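One option is to name the repeated condition with a small helper. This is a hypothetical sketch with a pared-down `RemoteNode`, not an existing Dogear method.

```rust
/// Hypothetical, minimal node: a GUID plus the tree's divergence flag.
pub struct RemoteNode<'a> {
    pub guid: &'a str,
    pub diverged: bool,
}

/// True if the remote child needs new structure: either the tree flagged
/// it as diverged, or the merger put it under a different parent than the
/// remote tree did.
pub fn remote_structure_changed(child: &RemoteNode<'_>, remote_parent: &RemoteNode<'_>, merged_parent_guid: &str) -> bool {
    child.diverged || merged_parent_guid != remote_parent.guid
}
```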

mhammond · 2019-01-24T22:03:58Z

src/tests.rs

@@ -1808,6 +1811,14 @@ fn mismatched_incompatible_bookmark_kinds() {
 fn invalid_guids() {
    before_each();

+    struct AllowInvalidGuids;


Even though it's "just" test code I think a comment here might be useful to help people get their head around the intent behind generate_new_guid()

pjenvey · 2019-01-25T20:32:02Z

src/tree.rs

+    }
+
+    fn is(&self, other: &Entry) -> bool {
+        self as *const _ == other as *const _


It's not clear to me why this is a pointer comparison instead of the fields deriving PartialEq (can you have duplicate GUIDs that you don't want to match?). Can you clarify with a comment?

pjenvey · 2019-01-25T20:38:05Z

src/tree.rs

+                // check if we have any remaining orphans that reference
+                // nonexistent or non-folder parents (rule 3).
+                let needs_reparenting = |guid| {
+                    match self.entry_index_by_guid.get(guid) {


nit: map_or's maybe an improvement

Suggested change:

-                    match self.entry_index_by_guid.get(guid) {
+                    self.entry_index_by_guid
+                        .get(guid)
+                        .map_or(true, |&index| !self.entries[index].item.is_folder())
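A runnable version of the suggestion is below, with hypothetical `Tree` and `Entry` types reduced to the fields the closure touches (`is_folder` stands in for `item.is_folder()`). `Option::map_or` returns the default, `true`, when the GUID isn't in the tree, and otherwise applies the closure.

```rust
use std::collections::HashMap;

/// Hypothetical minimal entry: only whether the item is a folder.
pub struct Entry {
    pub is_folder: bool,
}

pub struct Tree {
    pub entries: Vec<Entry>,
    pub entry_index_by_guid: HashMap<String, usize>,
}

impl Tree {
    /// Rule 3: an orphan needs reparenting if its `parentid` is missing
    /// from the tree, or names an item that isn't a folder.
    pub fn needs_reparenting(&self, parent_guid: &str) -> bool {
        self.entry_index_by_guid
            .get(parent_guid)
            .map_or(true, |&index| !self.entries[index].is_folder)
    }
}
```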

* Clarify why `merge::Driver` exists.
* Make `Merger` generic over `Driver`, instead of using a trait object.
* Use `clone` instead of `to_owned`.
* Replace `Entry#is` with `std::ptr::eq`.
* Add `MergedNode::remote_guid_changed`.

@thomcc

In most cases, `node.2 == node.entry().divergence`, except for orphans, default orphans, and diverging `parentid`s. This is a surprising inconsistency that means `node_for_guid` and `children` need to do more work to figure out if the node has actually diverged. This commit:

* Changes the tree to flag divergent `parentid`s at `insert` time.
* Cleans up `Tree::structure_for_insert`, to clarify what happens when a `parentid` is or isn't provided.
* Moves the logic for checking default folder divergences into `Node::diverged`.
* Replaces `EntryParents::{indices, guids}` with `EntryParents::iter()`, which doesn't allocate (thanks, @thomcc!).

* Clean up optionals with `.map_or(...)` and `.filter(...).map(...)`.
* Explain why we use `ptr::eq` to compare entries.

* Shorten import paths.
* Rename `Child::Existing` to `Child::Exists`.
* Rename `Divergence::Ok` to `Divergence::Consistent`, since `Ok` might be confused with `Result`.

linabutler · 2019-01-26T01:31:35Z

src/tree.rs

+        };
+
+        self.entry().child_indices.iter()
+            .chain(orphans.into_iter())


@thomcc I thought about it some more, and I'm unsure how to avoid the allocation here. self.tree().orphan_indices_by_parent_guid borrows self (and default_orphans also borrows self in the closure passed to filter), so the returned iterator would outlive the temporary...and we also move self into filter_map's closure.

But, as you said, orphans should be rare...and we already do much more work for diverged nodes, anyway. An extra heap allocation is likely the least of that.

We need this to apply the merged tree, when we join to the local and remote trees.

* Fix infinite recursion in `fmt::Display::fmt()` for `Error`.
* Make `Store` generic over the error type. This allows callers to provide their own error types that unify with Dogear errors, `nsresult`s, and others, instead of requiring them to wrap their errors into `ErrorKind::Storage(...)`.
* Forward decoding errors from `Guid::from{utf8, utf16}()`.
* Rename `ErrorKind::GenerateGuid` to `ErrorKind::InvalidGuid`.
* Move `dogear::merge` into `Store::merge`.

This started out as a fix for structure corruption corner cases, grew into simplifying tree construction for callers, and turned into a full-blown rewrite. In a way, we've come full circle: the new tree stores a fully consistent structure, and relies on the new builder to resolve inconsistencies and flag diverging structure.

PR #19 added lots of complexity to the original tree. This came from storing two different sets of structure, one that's resolved at `insert` time, and the other when we actually walk the tree. This separation exists because we `insert` items one at a time, in the order based on the `children`, so the tree doesn't have a complete picture of all items in it.

However, we _do_ have that picture, in the mirror database. By the time we build the tree, we know exactly which `children` and `parentid`s exist, if an item has multiple parents, or no parents. Those entries might not be in the tree yet, but that's because our implementation requires the tree to always be valid.

This requirement also forces callers to go through unnecessary contortions. On Desktop, `Store::fetch_remote_tree` first buffers all items into a pseudo-tree, then walks it recursively to inflate the Dogear tree. That's a lot of busy-work to query and normalize the data, and assumes we have a complete structure, which might not be the case.

Instead, what if we built the tree in two passes? One to add all items, and one to set their parent-child relationships. The queries for these are simpler, more correct, and let us defer resolving inconsistencies until we're ready to build the tree. We can add optimizations for valid trees, and still handle every kind of diverging structure.

Closes #22.
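The two-pass idea can be sketched as a builder that records items and relationships in any order and resolves structure only in `build`. `Builder` and its methods are hypothetical, not Dogear's actual builder API. For brevity, this sketch picks the first of multiple parents instead of the newest, and sends unresolvable orphans to the root instead of the default folder.

```rust
use std::collections::HashMap;

#[derive(Default)]
pub struct Builder {
    /// Pass 1: GUID -> is_folder, for every item.
    items: HashMap<String, bool>,
    /// Pass 2: child GUID -> folders whose `children` mention it.
    parents_from_children: HashMap<String, Vec<String>>,
    /// Pass 2: child GUID -> its `parentid`.
    parent_ids: HashMap<String, String>,
}

impl Builder {
    pub fn item(&mut self, guid: &str, is_folder: bool) {
        self.items.insert(guid.to_string(), is_folder);
    }

    pub fn child(&mut self, parent: &str, child: &str) {
        self.parents_from_children.entry(child.to_string()).or_default().push(parent.to_string());
    }

    pub fn parent_id(&mut self, child: &str, parent: &str) {
        self.parent_ids.insert(child.to_string(), parent.to_string());
    }

    /// Resolves every non-root item to `(parent, diverged)`, now that all
    /// items and relationships are known.
    pub fn build(&self, root: &str) -> HashMap<String, (String, bool)> {
        let is_folder = |guid: &str| self.items.get(guid).copied().unwrap_or(false);
        let mut resolved = HashMap::new();
        for guid in self.items.keys().filter(|g| g.as_str() != root) {
            let from_children: &[String] = self.parents_from_children.get(guid).map(Vec::as_slice).unwrap_or(&[]);
            let parent_id = self.parent_ids.get(guid);
            let (parent, consistent) = match from_children {
                // One folder lists the item; consistent if `parentid` agrees.
                [only] => (only.clone(), parent_id == Some(only)),
                // Several folders list it: always diverged.
                [first, ..] => (first.clone(), false),
                // No folder lists it: use a valid `parentid`, else the root.
                [] => match parent_id.filter(|p| is_folder(p.as_str())) {
                    Some(p) => (p.clone(), false),
                    None => (root.to_string(), false),
                },
            };
            resolved.insert(guid.clone(), (parent, !consistent));
        }
        resolved
    }
}
```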

linabutler force-pushed the fixup-structure branch from 118fbed to ad363e4, January 18, 2019 03:02

linabutler force-pushed the fixup-structure branch from ad363e4 to a77e661, January 18, 2019 06:12

linabutler added 3 commits January 17, 2019 22:59

Add an Item::Missing representation for missing items.

b5e38d4

Rewrite the tree to support diverging structure.

f844722

Reupload diverged nodes.

eeaee8f

linabutler force-pushed the fixup-structure branch from a77e661 to eeaee8f, January 18, 2019 07:03

linabutler changed the title ~~[WIP] Fix up structure inconsistencies~~ Fix up diverging structure in bookmark trees Jan 18, 2019

linabutler requested review from thomcc and mhammond January 18, 2019 07:07

linabutler added 6 commits January 18, 2019 19:33

Change invalid item GUIDs when merging.

615707f

Implement a simpler, recursive PartialEq for trees and nodes.

ed65291

Remove Item::Missing.

d23fd74


Remove unused deps.

67b6cb7

thomcc reviewed Jan 24, 2019

mhammond approved these changes Jan 24, 2019

pjenvey reviewed Jan 25, 2019

linabutler added 4 commits January 25, 2019 16:35

Reviews from @thomcc and @mhammond.

a1ba30b


@pjenvey's review.

25b31ef


Naming is the hardest problem in computer science.

3842b40

linabutler commented Jan 26, 2019

linabutler added 2 commits January 29, 2019 14:25

Include nodes from both sides for each MergeState.

ceb45d6


linabutler merged commit 07fd161 into master Jan 30, 2019

linabutler deleted the fixup-structure branch January 30, 2019 04:09
