dist: Support a bincoded manifest file for performance reasons #2627

Closed
wants to merge 1 commit into from

kinnison
Contributor

@kinnison kinnison commented Jan 2, 2021

This goes some of the way to mitigating #2626 but isn't a "fix" per se.

Not least, we need to be sure of whether this is valid.

@kinnison kinnison changed the title from "dist: Trim the manifest toml to improve startup time" to "dist: Support a bincoded manifest file for performance reasons" Jan 9, 2021
@kinnison
Contributor Author

kinnison commented Jan 9, 2021

I've rewritten this to serialise the parsed manifest as a bincoded file. Performance is basically the same as toml parsing the trimmed manifest, but it avoids the trimming, whose correctness was debatable.

We need to introduce a version indicator for this so that, if our manifest structures change, we can detect that we should fall back to reading the toml and rewrite the bincode.
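As a rough illustration of the version indicator idea, here is a minimal sketch using only the standard library. The `CACHE_VERSION` constant and the `encode_cache` / `decode_cache` helpers are hypothetical names, not part of this PR, and the payload is treated as opaque bytes that some serialiser (bincode or otherwise) has already produced.

```rust
/// Hypothetical format version; bump it whenever the cached structures change.
const CACHE_VERSION: u32 = 1;

/// Prefix an already-serialised payload with a 4-byte little-endian version header.
fn encode_cache(payload: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(4 + payload.len());
    out.extend_from_slice(&CACHE_VERSION.to_le_bytes());
    out.extend_from_slice(payload);
    out
}

/// Return the payload only if the header carries the version we understand.
/// `None` signals the caller to fall back to the toml and rewrite the cache.
fn decode_cache(bytes: &[u8]) -> Option<&[u8]> {
    if bytes.len() < 4 {
        return None;
    }
    let (header, payload) = bytes.split_at(4);
    let mut raw = [0u8; 4];
    raw.copy_from_slice(header);
    if u32::from_le_bytes(raw) == CACHE_VERSION {
        Some(payload)
    } else {
        None
    }
}
```

A mismatched or truncated header is treated the same as a missing cache, so a structure change only costs one slow toml parse before the cache is rewritten.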

@rbtcollins
Contributor

@kinnison
Contributor Author

Is there a flatbuffers crate with serde support?


    file.sync_data()?;

    Ok(())
}

pub fn write_file(path: &Path, contents: &str) -> io::Result<()> {
Contributor

I don't think this pays for itself vs write_file(path, contents.as_bytes())?;

Contributor Author

Fair enough, I'll sort out a refactor commit alongside this which pushes that up to the call sites.

@@ -52,20 +52,24 @@ pub fn if_not_empty<S: PartialEq<str>>(s: S) -> Option<S> {
}
}

-pub fn write_file(path: &Path, contents: &str) -> io::Result<()> {
+pub fn write_file_bytes(path: &Path, contents: &[u8]) -> io::Result<()> {
Contributor

I note this is doing a sync_data - this is an important part of the contract of the function; if we're renaming it perhaps consider exposing that at the same time - e.g. namespacing it or adding _synced or something.

Contributor Author

I agree with this idea and will sort it out
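For reference, a minimal sketch of what the two review suggestions might look like together: a single byte-oriented writer whose name exposes the sync, with `&str` callers converting at the call site. The name `write_file_synced` and the helper `save_text` are hypothetical, and the use of `File::create` is an assumption about how the file is opened rather than a copy of the real function body.

```rust
use std::fs::File;
use std::io::{self, Write};
use std::path::Path;

/// Write `contents` to `path`, then ask the OS to flush the file data to
/// storage before returning. The `_synced` suffix (a hypothetical name)
/// makes that part of the contract visible to callers.
pub fn write_file_synced(path: &Path, contents: &[u8]) -> io::Result<()> {
    // Assumption: create or truncate the file; the real code may open it differently.
    let mut file = File::create(path)?;
    file.write_all(contents)?;
    file.sync_data()?;
    Ok(())
}

/// A call site holding a `&str` converts with `as_bytes()`, so no separate
/// string-taking wrapper is needed.
pub fn save_text(path: &Path, text: &str) -> io::Result<()> {
    write_file_synced(path, text.as_bytes())
}
```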

@rbtcollins
Contributor

There is for flexbuffers - https://github.com/google/flatbuffers/tree/master/rust/flexbuffers - but I'm not sure of the story for flatbuffers.

@kinnison
Contributor Author

Okay so flexbuffers look plausible vs. bincode, though as it's an internal cache implementation detail why are you adamant we shouldn't use bincode?

@rbtcollins
Contributor

If we need to debug it or introspect it, flatbuffers has more tooling available, as it isn't Rust-only with relatively few users. Ditto flexbuffers; flatbuffers is the schema'd version. I don't think the lack of serde support should be an issue, though I haven't looked into it closely. An alternative would be protobuf; the tower protobuf glue is pretty nice.

@kinnison
Contributor Author

I'm concerned about minimising the impact of the effort if we're to do this soon. I was thinking of treating the binary as a cache and, if it failed to load, falling back to the toml. The serde capability just means it's much less effort for us in terms of implementation.

Debuggability is a good argument against bincode though. Flexbuffers look plausible if a bit more awkward to implement than bincode, yaml, json, etc.
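The "treat the binary as a cache and fall back to the toml" approach described above could be sketched roughly as follows. This is only an illustration under assumptions: `load_with_fallback` and its closure parameters are hypothetical names, the encoding is left to the caller, and the cache is rewritten on a best-effort basis.

```rust
use std::fs;
use std::path::Path;

/// Load a value from a binary cache. If the cache is missing or fails to
/// decode, run the slow loader (e.g. parsing the toml) and refresh the cache.
/// All names here are hypothetical.
fn load_with_fallback<T>(
    cache_path: &Path,
    decode: impl Fn(&[u8]) -> Option<T>,
    encode: impl Fn(&T) -> Vec<u8>,
    slow_load: impl FnOnce() -> T,
) -> T {
    if let Ok(bytes) = fs::read(cache_path) {
        if let Some(value) = decode(&bytes) {
            return value;
        }
    }
    let value = slow_load();
    // Best effort: a failed cache write only costs speed on the next run.
    let _ = fs::write(cache_path, encode(&value));
    value
}
```

Because any decode failure just routes to the slow path, the choice of binary format stays an internal detail that can be swapped later.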

@bjorn3
Member

bjorn3 commented Feb 24, 2021

Cargo also uses bincode for certain caches like fingerprints. Flatbuffers having a schema would make the caches a bit bigger, I think, and would likely encourage others to inspect this implementation detail.


3 participants