Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initialize a crate for memory related tools #89

Merged
merged 25 commits into from
Jan 16, 2019

Conversation

cheme
Copy link
Collaborator

@cheme cheme commented Dec 10, 2018

This PR is a first step for heapsize removal/replacement openethereum/parity-ethereum#9953 . It uses a fork of malloc_size_of (unpublished crate from servo project).

This PR create a new crate parity-util-mem:

  • allows defining a global allocator (target should be jemalloc), by using a feature. (not that there is no priority so if multiple allocator feature are defined the compilation will panic in a global allocator redefinition error).
  • plugs selected allocator to their heapsize measurement (selection of allocator with a feature).
  • allows measurement alternative with only size_of or size_of_val : this can be activated quickly by setting estimate-heapsize feature.
  • move code from parity-ethereum mem crate (MemZero), and and clear_on_drop reexport. This is from a previous split commit, I can put if out of this pr if needed.

After this crate cration, the parity-ethereum base code will basically look like: openethereum/parity-ethereum@master...cheme:rem_heapsize . I may overuse the derive macro but I think it makes it more consistent (you do not need to check heapsize implementation when adding a field), there is also still some room for some implementations (especially if we want to use ConditionalMallocSizeOf, I need to test if there is some need).

Parity-common kvdb crate (otherwhise it will segfault for other allocator than malloc, because it still uses old heapsize directly).

The malloc_size_of usage is a bit awkward:
We copy code from servo project and patch it to remove some contents. We also switch servoarc to std arc.
I put a .sh file as documentation of the process and a .patch file to allow update from servo repo (I do not think there will be a lot).

The reason for this choice are :

  • using as an external crate does not allow defining trait for custom types (parity-types) in the crate. That is very problematic when it comes to update things (even with local projects the dependency on parity-types are really difficult to change. Therefore including the trait in the parity crate let us implement the trait for those types and have an easier switching (we can consider moving the implementation in fixed-hash and uint at some point but it is difficult as a first step (if we plan to I shall feature gate those implementations to make the switch possible without republishing the crate).
  • using as an external crate by specifying github address is also awkward due to the size of the servo github project (for malloc_sizeof_derive crate it is doable right now).
  • the crate itself is really small.

dvdplm
dvdplm previously requested changes Dec 11, 2018
Cargo.toml Outdated
@@ -17,5 +17,6 @@ members = [
"trace-time",
"trie-standardmap",
"triehash",
"uint"
"uint",
"mem",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mem is too generic. Can you change to measure-mem or malloc-size-of or some other more descriptive name?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will use the package name 'parity-util-mem'

mem/Cargo.toml Outdated
@@ -0,0 +1,41 @@
[package]
name = "parity-util-mem"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like it when the filename is the same as the package name.

mem/Cargo.toml Outdated
jemalloc-global = ["jemallocator"]
# implement additional types
ethereum-impls = ["ethereum-types", "elastic-array", "parking_lot"]
# Default malloc to conditional mettering
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this comment (and there's a typo: s/mettering/metering/).

mem/Cargo.toml Outdated
# implement additional types
ethereum-impls = ["ethereum-types", "elastic-array", "parking_lot"]
# Default malloc to conditional mettering
conditional-mettering = []
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
conditional-mettering = []
conditional-metering = []

mem/Cargo.toml Outdated
ethereum-impls = ["ethereum-types", "elastic-array", "parking_lot"]
# Default malloc to conditional mettering
conditional-mettering = []
# Use serde impl from mallocsizeof
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it malloc-size-of?

mem/src/lib.rs Outdated
/// This is a copy of patched crate `malloc_size_of` as a module.
/// We need to have it as an inner module to be able to define our own traits implementation,
/// if at some point the trait become standard enough we could use the right way of doing it
/// by implementing it in our type traits crates. At this time a move on this trait if implemented
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// by implementing it in our type traits crates. At this time a move on this trait if implemented
/// by implementing it in our type traits crates. At this time moving this trait

mem/src/lib.rs Outdated
/// We need to have it as an inner module to be able to define our own traits implementation,
/// if at some point the trait become standard enough we could use the right way of doing it
/// by implementing it in our type traits crates. At this time a move on this trait if implemented
/// at primitive types level would impact to much of the dependency to be easilly manageable.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// at primitive types level would impact to much of the dependency to be easilly manageable.
/// to the primitive types level would impact too much of the dependencies to be easily manageable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this information should be available in the readme.

mem/src/lib.rs Outdated
#[cfg(feature = "ethereum-impls")]
pub mod impls;

/// reexport clear_on_drop crate
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doc comments start with capital letter. Add a link to the crate.

// file at the top-level directory of this distribution and at
// http://rust-lang.org/COPYRIGHT.
//
// Licensed under the Apache License, Version 2.0 <LICENSE-APACHE or
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're mixing licenses here, MIT/Apache with GPL. Is that ok? Can you double check please?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like MIT is compatible and apache v2 is only compatible with GPLv3 (not gpl v2 but we are v3). So it looks fine.

//! `<Box<_> as MallocSizeOf>::size_of(field, ops)`.


// This file is patched at commit 5bdea7dc1c80790a852a3fb03edfb2b8fbd403dc DO NOT EDIT.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I mentioned above, I think we should considering either forking or write our own crate inspired by this one. It seems like 90% of the code here is irrelevant to our needs and that many types we might want to measure need specialized code anyway.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • forking: doable but we need to get src/impl.rs and src/sizeof.rs in the fork, I prefer using a single crate (in the fork scenario I would consider reexported through parity-util-mem). The thing that I do not like with the fork is that we fork servo repo (thinking twice after trimming all the other stuff it is the same as a smaller repo, just with a huge history : I do not know if cargo is doing shallow clone).
  • write our own crate: I am not comfortable with writing existing code, but this PR is in someway doing that.

@cheme cheme requested a review from tomaka December 12, 2018 15:17
@tomaka
Copy link
Contributor

tomaka commented Jan 2, 2019

The reason for this choice are :

Why not copy-paste Servo's crate (with the proper licensing of course), instead of cloning it and patching it? It would be the first crate we fork.

@cheme
Copy link
Collaborator Author

cheme commented Jan 2, 2019

The crate is actually copied, I just wanted to keep it synching remote change somehow easy (but it may really not be necessary). So I guess it is possible (just need to remove the script and a comment in the copied code).

Proper way to do it should be to fork the crate, but it seems a bit too much when you consider the servo crate size.

@tomaka
Copy link
Contributor

tomaka commented Jan 2, 2019

Shouldn't we remove everything specific to Servo, though?

I was most in favour of not using a trait and having direct methods on everything, but the approach of this PR may be better. I'm in favour of it except for the fact that the crate is very large.

parity-util-mem/malloc_size_of_derive/LICENSE-APACHE Outdated Show resolved Hide resolved
parity-util-mem/README.md Outdated Show resolved Hide resolved
@cheme
Copy link
Collaborator Author

cheme commented Jan 2, 2019

@tomaka I did change the patch to remove all servo related code (before it was deactivated by a feature), it makes the copied code smaller. Maybe some of the servo trait will prove useless (I did not plan to use the conditional one directly (need some testing to see if relevant)) and could also be removed.

Copy link
Member

@ordian ordian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we test estimate-heapsize on CI?

parity-util-mem/src/sizeof.rs Outdated Show resolved Hide resolved
parity-util-mem/src/sizeof.rs Outdated Show resolved Hide resolved
parity-util-mem/src/sizeof.rs Outdated Show resolved Hide resolved
parity-util-mem/src/malloc_size.rs Outdated Show resolved Hide resolved
parity-util-mem/Cargo.toml Outdated Show resolved Hide resolved
@cheme
Copy link
Collaborator Author

cheme commented Jan 11, 2019

I just realize that publishing this to crates.io (requires for dependency such as 'memorydb' (the only one that really need an update), means that I need to publish 'malloc_size_of_derive' (by design .
I see 3 scenarii here:

  • ask servo team to publish 'malloc_size_of_derive' : probably not suitable as 'malloc_size_of' is unpublished.
  • rename the copy of crate in the pr to 'parity-util-mem-derive' and publish it as a parity crate, I really don't like that because there is really no change to the code.
  • rewrite the proc macro (not super interesting) and publish it as a parity crate.

@tomaka
Copy link
Contributor

tomaka commented Jan 11, 2019

Since we will have our local version of malloc_size_of, it makes sense to me to have our local version of malloc_size_of_derive as well.
If Servo's trait diverges from us, the proc macro will likely diverge as well.

@ordian
Copy link
Member

ordian commented Jan 14, 2019

The windows CI failure is unrelated (I wonder whether we shouldn't have switched from appveyor).

@cheme Could you open a PR to switch parity-common to this crate before publishing it? I guess that would be a breaking change, so versions need to be bumped.

@cheme
Copy link
Collaborator Author

cheme commented Jan 14, 2019

@ordian , I have two ways to do it : the breaking one (I got a branch ready: cheme/parity-common@mem-tools2...cheme:rem_heapsize ), or just a deprecating one ( cheme/parity-common@mem-tools2...cheme:deprecate_heapsize for parity eth we need to update memory db but I can in a first time just deprecate 'mem_used' method and create a new one that uses this crate.

I can open both PR before publishing it (with reference to github branch), so we can check ci. I can also open the parity ethereum PR before publishing (the changes on common are not really likely to fail).

@cheme cheme removed the A1-onice label Jan 14, 2019
This was referenced Jan 14, 2019
@cheme cheme merged commit e388dcf into paritytech:master Jan 16, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants