-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Handle more kinds of structure corruption #22
Comments
I think cycles are only a theoretical concern. Last time I looked we had never even seen telemetry reporting them. |
This started out as a fix for structure corruption corner cases, grew into simplifying tree construction for callers, and turned into a full-blown rewrite. In a way, we've come full circle: the new tree stores a fully consistent structure, and relies on the new builder to resolve inconsistencies and flag diverging structure. PR #19 added lots of complexity to the original tree. This came from storing two different sets of structure, one that's resolved at `insert` time, and the other when we actually walk the tree. This separation exists because we `insert` items one at a time, in the order based on the `children`, so the tree doesn't have a complete picture of all items in it. However, we _do_ have that picture, in the mirror database. By the time we build the tree, we know exactly which `children` and `parentid`s exist, if an item has multiple parents, or no parents. Those entries might not be in the tree yet, but that's because our implementation requires the tree to always be valid. This requirement also forces callers to go through unnecessary contortions. On Desktop, `Store::fetch_remote_tree` first buffers all items into a pseudo-tree, then walks it recursively to inflate the Dogear tree. That's a lot of busy-work to query and normalize the data, and assumes we have a complete structure, which might not be the case. Instead, what if we built the tree in two passes? One to add all items, and one to set their parent-child relationships. The queries for these are simpler, more correct, and let us defer resolving inconsistencies until we're ready to build the tree. We can add optimizations for valid trees, and still handle every kind of diverging structure. Closes #22.
This started out as a fix for corruption corner cases, grew into simplifying how we build the remote tree, and turned into a rewrite (again >.>). Instead of managing two different sets of structure at insert and merge time, we now store items and parent-child relationships in a tree builder, then build a consistent tree that flags divergent items. The old tree was complicated because it had to maintain a valid structure before and after every insert, but with references to potentially unknown parents and children. Since we couldn't insert an item without knowing its parent, we had to pick a canonical structure (parents by `children`), insert children in level order, and note divergent structure (by `parentid`) separately. This meant Desktop's `Store::fetch_remote_tree()` had to left-join `mirror.items` to `mirror.structure`, store the items in a pseudo-tree, then recursively walk children to inflate the complete tree. This was complicated enough with a valid tree, let alone orphans and multiple parents. With the new approach, we build the tree in three steps: 1. First, we add all items, without structure. 2. Next, we set parent-child relationships. Parents by `children` require the parent to exist, but not the child; this handles the case where a folder mentions nonexistent or deleted GUIDs in its `children`. Parents by `parentid` require the item to exist, but not its parent; this handles orphans that reference missing or non-folder parents. 3. Finally, once we've added all entries to the tree, we have a complete view of the structure, so we can resolve missing, multiple, and conflicting parents. For cases where we know the structure is valid and in level order, like Desktop's `Store::fetch_local_tree()`, we handle steps 1 and 2 at the same time: `builder.item(item)?.by_structure(&parent_guid)?`. For the remote tree, we insert all items and their `parentid`s first (`builder.item(item)?.by_parent_guid(&parent_guid)?`, where `parent_guid` might not be in the tree), then add structure from children later: `builder.parent_for(&child_guid)?.by_children(&parent_guid)?`, where `child_guid` might not be in the tree. Closes #22.
Sold. 👩⚖️ I added a check (and got to learn about the tortoise and hare cycle detection algorithm!) that fails the merge if we somehow produce a tree with cycles, instead of trying to make sense of them. |
Heh, the only experience I have with that algorithm is that I've heard of it as a famously cruel job interview question (how do you detect a cycle in a linked list with O(1) space). Of course, this is not the fault of the algorithm at all! The way I've usually done this is by doing a DFS but keeping a set containing which items I've seen. If we ever hit an item we've seen, we have a cycle. Although, the way I did it in the bookmark validator was actually a bit more complicated than this, since we wanted be able to produce the set of cycles that we've seen (for e.g. about:sync), and not just that we saw a cycle. I think this makes it the same as finding the strongly connected components, but am not certain. Anyway, that's neither here nor there, that code seems completely fine (even if thinking about traversing the bookmark tree from the children to the parents in that way threatens to melt my brain :p) |
This started out as a fix for corruption corner cases, grew into simplifying how we build the remote tree, and turned into a rewrite (again >.>). Instead of managing two different sets of structure at insert and merge time, we now store items and parent-child relationships in a tree builder, then build a consistent tree that flags divergent items. The old tree was complicated because it had to maintain a valid structure before and after every insert, but with references to potentially unknown parents and children. Since we couldn't insert an item without knowing its parent, we had to pick a canonical structure (parents by `children`), insert children in level order, and note divergent structure (by `parentid`) separately. This meant Desktop's `Store::fetch_remote_tree()` had to left-join `mirror.items` to `mirror.structure`, store the items in a pseudo-tree, then recursively walk children to inflate the complete tree. This was complicated enough with a valid tree, let alone orphans and multiple parents. With the new approach, we build the tree in three steps: 1. First, we add all items, without structure. 2. Next, we set parent-child relationships. Parents by `children` require the parent to exist, but not the child; this handles the case where a folder mentions nonexistent or deleted GUIDs in its `children`. Parents by `parentid` require the item to exist, but not its parent; this handles orphans that reference missing or non-folder parents. 3. Finally, once we've added all entries to the tree, we have a complete view of the structure, so we can resolve missing, multiple, and conflicting parents. For cases where we know the structure is valid and in level order, like Desktop's `Store::fetch_local_tree()`, we handle steps 1 and 2 at the same time: `builder.item(item)?.by_structure(&parent_guid)?`. For the remote tree, we insert all items and their `parentid`s first (`builder.item(item)?.by_parent_guid(&parent_guid)?`, where `parent_guid` might not be in the tree), then add structure from children later: `builder.parent_for(&child_guid)?.by_children(&parent_guid)?`, where `child_guid` might not be in the tree. Closes #22.
This started out as a fix for corruption corner cases, grew into simplifying how we build the remote tree, and turned into a rewrite (again >.>). Instead of managing two different sets of structure at insert and merge time, we now store items and parent-child relationships in a tree builder, then build a consistent tree that flags divergent items. The old tree was complicated because it had to maintain a valid structure before and after every insert, but with references to potentially unknown parents and children. Since we couldn't insert an item without knowing its parent, we had to pick a canonical structure (parents by `children`), insert children in level order, and note divergent structure (by `parentid`) separately. This meant Desktop's `Store::fetch_remote_tree()` had to left-join `mirror.items` to `mirror.structure`, store the items in a pseudo-tree, then recursively walk children to inflate the complete tree. This was complicated enough with a valid tree, let alone orphans and multiple parents. With the new approach, we build the tree in three steps: 1. First, we add all items, without structure. 2. Next, we set parent-child relationships. Parents by `children` require the parent to exist, but not the child; this handles the case where a folder mentions nonexistent or deleted GUIDs in its `children`. Parents by `parentid` require the item to exist, but not its parent; this handles orphans that reference missing or non-folder parents. 3. Finally, once we've added all entries to the tree, we have a complete view of the structure, so we can resolve missing, multiple, and conflicting parents. For cases where we know the structure is valid and in level order, like Desktop's `Store::fetch_local_tree()`, we handle steps 1 and 2 at the same time: `builder.item(item)?.by_structure(&parent_guid)?`. For the remote tree, we insert all items and their `parentid`s first (`builder.item(item)?.by_parent_guid(&parent_guid)?`, where `parent_guid` might not be in the tree), then add structure from children later: `builder.parent_for(&child_guid)?.by_children(&parent_guid)?`, where `child_guid` might not be in the tree. Closes #22.
This started out as a fix for corruption corner cases, grew into simplifying how we build the remote tree, and turned into a rewrite (again >.>). Instead of managing two different sets of structure at insert and merge time, we now store items and parent-child relationships in a tree builder, then build a consistent tree that flags divergent items. The old tree was complicated because it had to maintain a valid structure before and after every insert, but with references to potentially unknown parents and children. Since we couldn't insert an item without knowing its parent, we had to pick a canonical structure (parents by `children`), insert children in level order, and note divergent structure (by `parentid`) separately. This meant Desktop's `Store::fetch_remote_tree()` had to left-join `mirror.items` to `mirror.structure`, store the items in a pseudo-tree, then recursively walk children to inflate the complete tree. This was complicated enough with a valid tree, let alone orphans and multiple parents. With the new approach, we build the tree in three steps: 1. First, we add all items, without structure. 2. Next, we set parent-child relationships. Parents by `children` require the parent to exist, but not the child; this handles the case where a folder mentions nonexistent or deleted GUIDs in its `children`. Parents by `parentid` require the item to exist, but not its parent; this handles orphans that reference missing or non-folder parents. 3. Finally, once we've added all entries to the tree, we have a complete view of the structure, so we can resolve missing, multiple, and conflicting parents. For cases where we know the structure is valid and in level order, like Desktop's `Store::fetch_local_tree()`, we handle steps 1 and 2 at the same time: `builder.item(item)?.by_structure(&parent_guid)?`. For the remote tree, we insert all items and their `parentid`s first (`builder.item(item)?.by_parent_guid(&parent_guid)?`, where `parent_guid` might not be in the tree), then add structure from children later: `builder.parent_for(&child_guid)?.by_children(&parent_guid)?`, where `child_guid` might not be in the tree. Closes #22.
This started out as a fix for corruption corner cases, grew into simplifying how we build the remote tree, and turned into a rewrite (again >.>). Instead of managing two different sets of structure at insert and merge time, we now store items and parent-child relationships in a tree builder, then build a consistent tree that flags divergent items. The old tree was complicated because it had to maintain a valid structure before and after every insert, but with references to potentially unknown parents and children. Since we couldn't insert an item without knowing its parent, we had to pick a canonical structure (parents by `children`), insert children in level order, and note divergent structure (by `parentid`) separately. This meant Desktop's `Store::fetch_remote_tree()` had to left-join `mirror.items` to `mirror.structure`, store the items in a pseudo-tree, then recursively walk children to inflate the complete tree. This was complicated enough with a valid tree, let alone orphans and multiple parents. With the new approach, we build the tree in three steps: 1. First, we add all items, without structure. 2. Next, we set parent-child relationships. Parents by `children` require the parent to exist, but not the child; this handles the case where a folder mentions nonexistent or deleted GUIDs in its `children`. Parents by `parentid` require the item to exist, but not its parent; this handles orphans that reference missing or non-folder parents. 3. Finally, once we've added all entries to the tree, we have a complete view of the structure, so we can resolve missing, multiple, and conflicting parents. For cases where we know the structure is valid and in level order, like Desktop's `Store::fetch_local_tree()`, we handle steps 1 and 2 at the same time: `builder.item(item)?.by_structure(&parent_guid)?`. For the remote tree, we insert all items and their `parentid`s first (`builder.item(item)?.by_parent_guid(&parent_guid)?`, where `parent_guid` might not be in the tree), then add structure from children later: `builder.parent_for(&child_guid)?.by_children(&parent_guid)?`, where `child_guid` might not be in the tree. Closes #22.
The text was updated successfully, but these errors were encountered: