Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fastx db] Update an owner-object index #149

Merged
merged 5 commits into from
Jan 11, 2022
Merged

[fastx db] Update an owner-object index #149

merged 5 commits into from
Jan 11, 2022

Conversation

gdanezis
Copy link
Collaborator

@gdanezis gdanezis commented Jan 10, 2022

This addresses issue #127

  • We add and owner_index to the database.
  • We update the index when we insert an object, or the effects of a transaction.
  • To make this a little more efficient we record in the transaction processing handler the old owners of deleted or mutated objects, to know which old index entries to delete.
  • Now use multi-get where possible, and minimise reads by generating the OrderInfoResponse objects.

@gdanezis gdanezis changed the title Update an owner-object index [fastx db] Update an owner-object index Jan 10, 2022
@gdanezis gdanezis linked an issue Jan 10, 2022 that may be closed by this pull request
@gdanezis gdanezis self-assigned this Jan 10, 2022
@gdanezis gdanezis requested a review from patrickkuo January 10, 2022 19:48
Copy link
Contributor

@huitseeker huitseeker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM, but I've left a few Qs.

Comment on lines 85 to 87
// TODO: We only need to read the read_only and owner field of the object,
// it's a bit wasteful to copy the entire object.
let _objects = self.get_objects(&ids[..]).await?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it faster to .get_objects or could you get_object by reference and only copy what you need?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think there is an option of getting an object by reference from the DB right now. As soon as we do get, the object is read from the db value (bytes are pinned) and then deserialized and allocated as a whole. I went down the rabbit hole of looking at serialization formats that allow cheap in place reads (zero copy), like capnpn, flatbuf etc -- but changing the DB to use those is a bigger job.

Comment on lines 97 to 100
// let object = self
// .object_state(&object_id)
// .await
// .map_err(|_| FastPayError::ObjectNotFound)?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the commented code still provide value?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch, that was just forgotten.

Comment on lines 179 to 189
let input_objects = order.input_objects();
let ids: Vec<_> = input_objects.iter().map(|(id, _, _)| *id).collect();
// Get a copy of the object.
// TODO: We only need to read the read_only and owner field of the object,
// it's a bit wasteful to copy the entire object.
let _objects = self.get_objects(&ids[..]).await?;

let mut inputs = vec![];
for (input_object_id, input_seq, _input_digest) in order.input_objects() {
let mut owner_index = HashMap::new();
for (object_ref, object) in input_objects.into_iter().zip(_objects) {
let (input_object_id, input_seq, _input_digest) = object_ref;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: you may want to extract a private function to cover this and the above.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My iterator-foo is not up to defining a function with a generic return signature of an iterator :)

Comment on lines 283 to 287
.filter(|(id, _, _)| {
let owner = owner_index.get(id);
owner.is_some() && *owner.unwrap() != temporary_store.objects()[id].owner
})
.map(|(id, _, _)| (owner_index[id], *id)),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: consider filter_map

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice to know this exists.

&self.owner_index,
_expired_object_owners
.into_iter()
.map(|(owner, id)| (owner, id)),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the map?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No reason any more, removed!

@@ -26,6 +26,20 @@ impl AuthorityTemporaryStore {
}
}

// Helpers to access private fields
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: You might like readonly.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like fewer dependencies :)

@@ -9,6 +9,7 @@ use typed_store::traits::Map;
pub struct AuthorityStore {
objects: DBMap<ObjectID, Object>,
order_lock: DBMap<ObjectRef, Option<TransactionDigest>>,
owner_index: DBMap<(FastPayAddress, ObjectID), ObjectRef>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice way to get out of the fact that it's unwieldy to update collections inside the DB ... but this is something that may require comment, either here or around get_objects:
DBMap<(FastPayAddress, ObjectID), ObjectRef> has far greater machine affinity than DBMap<FastPayAddress, SomeCollectionOf<ObjectRef>>,

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the standard way of using a persistent BTree to make a secondary index, I did not invent it :). Just to check -- I think our choice of serialization (bincode with fixint, bigendian, etc) was specifically selected to allow for these "composite" indexes. Am I correct in thinking this is true? It is not the case that this will break in a different architecture, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think our choice of serialization (bincode with fixint, bigendian, etc) was specifically selected to allow for these "composite" indexes. Am I correct in thinking this is true? It is not the case that this will break in a different architecture, right?

Yes, this is correct.

Copy link
Contributor

@patrickkuo patrickkuo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some commented out code and _ prefix, otherwise LGTM

.object_state(&object_id)
.await
.map_err(|_| FastPayError::ObjectNotFound)?;
// let object = self
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we remove comment out code?

// Get a copy of the object.
// TODO: We only need to read the read_only and owner field of the object,
// it's a bit wasteful to copy the entire object.
let _objects = self.get_objects(&ids[..]).await?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why _ prefix?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now the object is used so removed the _ indeed

// Get a copy of the object.
// TODO: We only need to read the read_only and owner field of the object,
// it's a bit wasteful to copy the entire object.
let _objects = self.get_objects(&ids[..]).await?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

@@ -221,9 +242,10 @@ impl AuthorityStore {
pub fn update_state(
&self,
temporary_store: AuthorityTemporaryStore,
_expired_object_owners: Vec<(FastPayAddress, ObjectID)>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't need _ prefix?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed!

)
.await

// self.make_order_info(&transaction_digest).await
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this intended?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, removed and commented instead.

@oxade
Copy link
Contributor

oxade commented Jan 11, 2022

Besides the minor comments, LGTM!
Thanks for the PR @gdanezis

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[fastx distsys] Add Object Owner index to AuthorityStore
4 participants