Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle more kinds of structure corruption #22

Closed
linabutler opened this issue Feb 2, 2019 · 3 comments · Fixed by #23
Closed

Handle more kinds of structure corruption #22

linabutler opened this issue Feb 2, 2019 · 3 comments · Fixed by #23
Assignees

Comments

@linabutler
Copy link
Member

  • Misparented roots. We've seen this in the wild, where the menu or toolbar ends up in unfiled. We can either have our tree mark the root as diverged, and always reparent to the Places root, or handle them in the merger.
  • Deleted roots. Unlike other deleted folders, we want to resurrect instead of taking the deletion and relocating orphans. If the root is deleted on both sides, that's odd, but technically okay. I think this is more of a theoretical concern; we haven't (IIRC) seen it come up.
  • Multiple parents with the same timestamp. We rely on the age and parent type to break the tie, but, if we have multiple parents with the same timestamp, could we still end up with children in multiple parents?
  • Cycles. The merger's cycle detection should stop us from walking subtrees multiple times, but we want to make sure we don't exclude them. We also want to make sure that trying to log a tree with cycles doesn't infinitely recurse.
@thomcc
Copy link
Contributor

thomcc commented Feb 2, 2019

I think cycles are only a theoretical concern. Last time I looked we had never even seen telemetry reporting them.

@linabutler linabutler self-assigned this Feb 4, 2019
linabutler added a commit that referenced this issue Feb 10, 2019
This started out as a fix for structure corruption corner cases, grew
into simplifying tree construction for callers, and turned into a
full-blown rewrite. In a way, we've come full circle: the new tree
stores a fully consistent structure, and relies on the new builder
to resolve inconsistencies and flag diverging structure.

PR #19 added lots of complexity to the original tree. This came from
storing two different sets of structure, one that's resolved at
`insert` time, and the other when we actually walk the tree. This
separation exists because we `insert` items one at a time, in the
order based on the `children`, so the tree doesn't have a complete
picture of all items in it.

However, we _do_ have that picture, in the mirror database. By the
time we build the tree, we know exactly which `children` and
`parentid`s exist, if an item has multiple parents, or no parents.
Those entries might not be in the tree yet, but that's because our
implementation requires the tree to always be valid.

This requirement also forces callers to go through unnecessary
contortions. On Desktop, `Store::fetch_remote_tree` first buffers all
items into a pseudo-tree, then walks it recursively to inflate the
Dogear tree. That's a lot of busy-work to query and normalize the
data, and assumes we have a complete structure, which might not be
the case.

Instead, what if we built the tree in two passes? One to add all
items, and one to set their parent-child relationships. The queries
for these are simpler, more correct, and let us defer resolving
inconsistencies until we're ready to build the tree. We can add
optimizations for valid trees, and still handle every kind of
diverging structure.

Closes #22.
linabutler added a commit that referenced this issue Feb 10, 2019
linabutler added a commit that referenced this issue Feb 13, 2019
This started out as a fix for corruption corner cases, grew into
simplifying how we build the remote tree, and turned into a rewrite
(again >.>). Instead of managing two different sets of structure at
insert and merge time, we now store items and parent-child
relationships in a tree builder, then build a consistent tree that
flags divergent items.

The old tree was complicated because it had to maintain a valid
structure before and after every insert, but with references to
potentially unknown parents and children. Since we couldn't insert
an item without knowing its parent, we had to pick a canonical
structure (parents by `children`), insert children in level
order, and note divergent structure (by `parentid`) separately.

This meant Desktop's `Store::fetch_remote_tree()` had to left-join
`mirror.items` to `mirror.structure`, store the items in a
pseudo-tree, then recursively walk children to inflate the complete
tree. This was complicated enough with a valid tree, let alone
orphans and multiple parents.

With the new approach, we build the tree in three steps:

1. First, we add all items, without structure.
2. Next, we set parent-child relationships. Parents by `children`
   require the parent to exist, but not the child; this handles the
   case where a folder mentions nonexistent or deleted GUIDs in
   its `children`. Parents by `parentid` require the item to exist,
   but not its parent; this handles orphans that reference missing
   or non-folder parents.
3. Finally, once we've added all entries to the tree, we have a
   complete view of the structure, so we can resolve missing, multiple,
   and conflicting parents.

For cases where we know the structure is valid and in level order,
like Desktop's `Store::fetch_local_tree()`, we handle steps 1 and 2
at the same time: `builder.item(item)?.by_structure(&parent_guid)?`.

For the remote tree, we insert all items and their `parentid`s
first (`builder.item(item)?.by_parent_guid(&parent_guid)?`, where
`parent_guid` might not be in the tree), then add structure from
children later:
`builder.parent_for(&child_guid)?.by_children(&parent_guid)?`,
where `child_guid` might not be in the tree.

Closes #22.
linabutler added a commit that referenced this issue Feb 13, 2019
@linabutler
Copy link
Member Author

Last time I looked we had never even seen telemetry reporting them.

Sold. 👩‍⚖️ I added a check (and got to learn about the tortoise and hare cycle detection algorithm!) that fails the merge if we somehow produce a tree with cycles, instead of trying to make sense of them.

@thomcc
Copy link
Contributor

thomcc commented Feb 13, 2019

and got to learn about the tortoise and hare cycle detection algorithm

Heh, the only experience I have with that algorithm is that I've heard of it as a famously cruel job interview question (how do you detect a cycle in a linked list with O(1) space). Of course, this is not the fault of the algorithm at all!

The way I've usually done this is by doing a DFS but keeping a set containing which items I've seen. If we ever hit an item we've seen, we have a cycle. Although, the way I did it in the bookmark validator was actually a bit more complicated than this, since we wanted be able to produce the set of cycles that we've seen (for e.g. about:sync), and not just that we saw a cycle. I think this makes it the same as finding the strongly connected components, but am not certain.

Anyway, that's neither here nor there, that code seems completely fine (even if thinking about traversing the bookmark tree from the children to the parents in that way threatens to melt my brain :p)

linabutler added a commit that referenced this issue Feb 17, 2019
This started out as a fix for corruption corner cases, grew into
simplifying how we build the remote tree, and turned into a rewrite
(again >.>). Instead of managing two different sets of structure at
insert and merge time, we now store items and parent-child
relationships in a tree builder, then build a consistent tree that
flags divergent items.

The old tree was complicated because it had to maintain a valid
structure before and after every insert, but with references to
potentially unknown parents and children. Since we couldn't insert
an item without knowing its parent, we had to pick a canonical
structure (parents by `children`), insert children in level
order, and note divergent structure (by `parentid`) separately.

This meant Desktop's `Store::fetch_remote_tree()` had to left-join
`mirror.items` to `mirror.structure`, store the items in a
pseudo-tree, then recursively walk children to inflate the complete
tree. This was complicated enough with a valid tree, let alone
orphans and multiple parents.

With the new approach, we build the tree in three steps:

1. First, we add all items, without structure.
2. Next, we set parent-child relationships. Parents by `children`
   require the parent to exist, but not the child; this handles the
   case where a folder mentions nonexistent or deleted GUIDs in
   its `children`. Parents by `parentid` require the item to exist,
   but not its parent; this handles orphans that reference missing
   or non-folder parents.
3. Finally, once we've added all entries to the tree, we have a
   complete view of the structure, so we can resolve missing, multiple,
   and conflicting parents.

For cases where we know the structure is valid and in level order,
like Desktop's `Store::fetch_local_tree()`, we handle steps 1 and 2
at the same time: `builder.item(item)?.by_structure(&parent_guid)?`.

For the remote tree, we insert all items and their `parentid`s
first (`builder.item(item)?.by_parent_guid(&parent_guid)?`, where
`parent_guid` might not be in the tree), then add structure from
children later:
`builder.parent_for(&child_guid)?.by_children(&parent_guid)?`,
where `child_guid` might not be in the tree.

Closes #22.
linabutler added a commit that referenced this issue Feb 17, 2019
linabutler added a commit that referenced this issue Feb 23, 2019
This started out as a fix for corruption corner cases, grew into
simplifying how we build the remote tree, and turned into a rewrite
(again >.>). Instead of managing two different sets of structure at
insert and merge time, we now store items and parent-child
relationships in a tree builder, then build a consistent tree that
flags divergent items.

The old tree was complicated because it had to maintain a valid
structure before and after every insert, but with references to
potentially unknown parents and children. Since we couldn't insert
an item without knowing its parent, we had to pick a canonical
structure (parents by `children`), insert children in level
order, and note divergent structure (by `parentid`) separately.

This meant Desktop's `Store::fetch_remote_tree()` had to left-join
`mirror.items` to `mirror.structure`, store the items in a
pseudo-tree, then recursively walk children to inflate the complete
tree. This was complicated enough with a valid tree, let alone
orphans and multiple parents.

With the new approach, we build the tree in three steps:

1. First, we add all items, without structure.
2. Next, we set parent-child relationships. Parents by `children`
   require the parent to exist, but not the child; this handles the
   case where a folder mentions nonexistent or deleted GUIDs in
   its `children`. Parents by `parentid` require the item to exist,
   but not its parent; this handles orphans that reference missing
   or non-folder parents.
3. Finally, once we've added all entries to the tree, we have a
   complete view of the structure, so we can resolve missing, multiple,
   and conflicting parents.

For cases where we know the structure is valid and in level order,
like Desktop's `Store::fetch_local_tree()`, we handle steps 1 and 2
at the same time: `builder.item(item)?.by_structure(&parent_guid)?`.

For the remote tree, we insert all items and their `parentid`s
first (`builder.item(item)?.by_parent_guid(&parent_guid)?`, where
`parent_guid` might not be in the tree), then add structure from
children later:
`builder.parent_for(&child_guid)?.by_children(&parent_guid)?`,
where `child_guid` might not be in the tree.

Closes #22.
linabutler added a commit that referenced this issue Feb 23, 2019
linabutler added a commit that referenced this issue Feb 23, 2019
This started out as a fix for corruption corner cases, grew into
simplifying how we build the remote tree, and turned into a rewrite
(again >.>). Instead of managing two different sets of structure at
insert and merge time, we now store items and parent-child
relationships in a tree builder, then build a consistent tree that
flags divergent items.

The old tree was complicated because it had to maintain a valid
structure before and after every insert, but with references to
potentially unknown parents and children. Since we couldn't insert
an item without knowing its parent, we had to pick a canonical
structure (parents by `children`), insert children in level
order, and note divergent structure (by `parentid`) separately.

This meant Desktop's `Store::fetch_remote_tree()` had to left-join
`mirror.items` to `mirror.structure`, store the items in a
pseudo-tree, then recursively walk children to inflate the complete
tree. This was complicated enough with a valid tree, let alone
orphans and multiple parents.

With the new approach, we build the tree in three steps:

1. First, we add all items, without structure.
2. Next, we set parent-child relationships. Parents by `children`
   require the parent to exist, but not the child; this handles the
   case where a folder mentions nonexistent or deleted GUIDs in
   its `children`. Parents by `parentid` require the item to exist,
   but not its parent; this handles orphans that reference missing
   or non-folder parents.
3. Finally, once we've added all entries to the tree, we have a
   complete view of the structure, so we can resolve missing, multiple,
   and conflicting parents.

For cases where we know the structure is valid and in level order,
like Desktop's `Store::fetch_local_tree()`, we handle steps 1 and 2
at the same time: `builder.item(item)?.by_structure(&parent_guid)?`.

For the remote tree, we insert all items and their `parentid`s
first (`builder.item(item)?.by_parent_guid(&parent_guid)?`, where
`parent_guid` might not be in the tree), then add structure from
children later:
`builder.parent_for(&child_guid)?.by_children(&parent_guid)?`,
where `child_guid` might not be in the tree.

Closes #22.
linabutler added a commit that referenced this issue Feb 23, 2019
linabutler added a commit that referenced this issue Feb 27, 2019
This started out as a fix for corruption corner cases, grew into
simplifying how we build the remote tree, and turned into a rewrite
(again >.>). Instead of managing two different sets of structure at
insert and merge time, we now store items and parent-child
relationships in a tree builder, then build a consistent tree that
flags divergent items.

The old tree was complicated because it had to maintain a valid
structure before and after every insert, but with references to
potentially unknown parents and children. Since we couldn't insert
an item without knowing its parent, we had to pick a canonical
structure (parents by `children`), insert children in level
order, and note divergent structure (by `parentid`) separately.

This meant Desktop's `Store::fetch_remote_tree()` had to left-join
`mirror.items` to `mirror.structure`, store the items in a
pseudo-tree, then recursively walk children to inflate the complete
tree. This was complicated enough with a valid tree, let alone
orphans and multiple parents.

With the new approach, we build the tree in three steps:

1. First, we add all items, without structure.
2. Next, we set parent-child relationships. Parents by `children`
   require the parent to exist, but not the child; this handles the
   case where a folder mentions nonexistent or deleted GUIDs in
   its `children`. Parents by `parentid` require the item to exist,
   but not its parent; this handles orphans that reference missing
   or non-folder parents.
3. Finally, once we've added all entries to the tree, we have a
   complete view of the structure, so we can resolve missing, multiple,
   and conflicting parents.

For cases where we know the structure is valid and in level order,
like Desktop's `Store::fetch_local_tree()`, we handle steps 1 and 2
at the same time: `builder.item(item)?.by_structure(&parent_guid)?`.

For the remote tree, we insert all items and their `parentid`s
first (`builder.item(item)?.by_parent_guid(&parent_guid)?`, where
`parent_guid` might not be in the tree), then add structure from
children later:
`builder.parent_for(&child_guid)?.by_children(&parent_guid)?`,
where `child_guid` might not be in the tree.

Closes #22.
linabutler added a commit that referenced this issue Feb 27, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants