Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace serialization functions #8867

Open
wants to merge 11 commits into
base: master
Choose a base branch
from

Conversation

vtnerd
Copy link
Contributor

@vtnerd vtnerd commented May 20, 2023

This replaces all serialization functions (epee binary, epee json, and zmq json), except the custom binary formats for transaction and block. The design is a DOMless vtable interface for reading and writing (2 separate interfaces). The base vtable class can be used to support multiple formats with minimal executable size OR the derived class can be used directly for performance. The design re-uses existing macros, although the newer macros should be lighter on compiler resources.

Memory consumption issues are handled by having all arrays provide a constraint - either a max element count or a min size per element (i.e. std::vector<std::string> ideally has a min size of sizeof(std::string) per element to prevent memory issues on unpack).

Most importantly, reading epee binary is ~81% faster and writing epee binary is ~56% faster (I re-used the benchmark by @jeffro256 on a ryzen 3 chip). This is primarily due to the removal of the DOM entirely - everything is done on the objects directly.

Unfortunately this diff is huge, and probably has no chance of ever being merged as-is. So I will try to break things into smaller chunks, with the caveat that if nobody wants the serialization to be replaced, its best to vote within this discussion thread here.

@vtnerd
Copy link
Contributor Author

vtnerd commented May 20, 2023

Oops, those performance numbers are from epee binary, the read for json should be less.

@vtnerd
Copy link
Contributor Author

vtnerd commented May 21, 2023

I forgot to document a benefit of this new scheme - anything marked as a blob, epee::byte_slice, epee::span<const uint8_t> or std::vector<uint8_t> is marked as binary and automatically converted to hex by the JSON output system. This means many of the manual std::string conversions that are performed no longer need to be. Additionally, this allows for epee binary, msgpack, and json to be efficiently by the same code.

@vtnerd vtnerd mentioned this pull request May 21, 2023
@selsta
Copy link
Collaborator

selsta commented May 22, 2023

One thing I noticed during testing is that checking /get_info doesn't work anymore

curl 127.0.0.1:18081/get_info
2023-05-22 09:12:02.656	I HTTP [127.0.0.1] GET /get_info
2023-05-22 09:12:02.656	D Schema expected object
2023-05-22 09:12:02.656	E Failed to parse json: 

Is this intentional?

@vtnerd
Copy link
Contributor Author

vtnerd commented May 22, 2023

It is expected behavior given the code, but is probably not desireable. This endpoint should work with an empty JSON object in the body of the http message. I didn't think about these endpoints; JSON-RPC now requires a params: {} for empty parameters, but that didn't seem so bad given that JSON-RPC requires additional fields.

I'll work out some sort of fix for this.

@vtnerd
Copy link
Contributor Author

vtnerd commented May 22, 2023

@selsta another update - assuming requiring a {} in the http body as not desirable, can we instead make it a requirement that the body be empty? The problem is DOMless mode has to assume/enforce a particular format (i.e. it cannot assume emptiness and {} at same time).

@vtnerd
Copy link
Contributor Author

vtnerd commented May 22, 2023

@selsta another update - I don't think this one is easily fixed, unless pay-for-RPC is removed. This endpoint has an optional field (client) that is mandatory when pay-for-RPC is enabled. This is currently supported by start_object(), then checking for client (which can be omitted), and then calling end_object().

It's a shame that such a simple thing could hold up this PR, but this one is difficult to fix without a DOM.

Maybe we re-enable DOM reading for requests but not for responses? It would be a shame to lose an ~81% improvement on these binary reads, but at least the requests are typically small anyway.

@UkoeHB
Copy link
Contributor

UkoeHB commented May 23, 2023

@vtnerd pay-for-RPC should already be removed: #8724

@selsta
Copy link
Collaborator

selsta commented May 23, 2023

We only removed it from the wallet side so far.

@vtnerd
Copy link
Contributor Author

vtnerd commented May 23, 2023

Basically the "root" read-function has to do either: std::is_empty<T>::value && buffer.empty() OR is_optional_root<T>::value && buffer.empty().

The latter is necessary currently because the login field yields std::is_empty<T>::value == false. For now, I will do:

template<typename T>
struct is_optional_root
  : std::is_empty<T>
{};

with overloads for a couple of these RPC functions.

@vtnerd vtnerd force-pushed the replace_serialization branch from 9ad29e0 to 7327eae Compare May 23, 2023 21:11
@vtnerd
Copy link
Contributor Author

vtnerd commented May 23, 2023

@selsta force pushed the fix I mentioned earlier. I ran your test, and it appears to work. Not all endpoints can be empty, only ones where std::is_empty<T>::value == true or where wire::is_optional_root<T>::value == true. I could allow all endpoints, which would mean that every endpoint can be called with an empty "body" where all of the values are default-initialized (i.e. basically 0, etc.).

@selsta @UkoeHB thoughts on always allowing an empty root?

@vtnerd vtnerd force-pushed the replace_serialization branch 2 times, most recently from 4feb4df to 8eb225a Compare May 23, 2023 22:42
@jeffro256
Copy link
Contributor

@selsta another update - assuming requiring a {} in the http body as not desirable, can we instead make it a requirement that the body be empty? The problem is DOMless mode has to assume/enforce a particular format (i.e. it cannot assume emptiness and {} at same time).

This could be fixed at the server mappers level: for all JSON requests, if the HTTP body is empty, just replace it with the string {}. That way we can have a good expected default behavior across all Monero RPC endpoints, but keep the underlying serialization code more strict.

@UkoeHB
Copy link
Contributor

UkoeHB commented May 24, 2023

@vtnerd I don't know nearly enough about serialization to comment.

@vtnerd
Copy link
Contributor Author

vtnerd commented May 24, 2023

@jeffro256 not a bad idea, but I prefer the current approach I just force pushed (but I might be persuaded otherwise). Making the string {} is JSON specific. Other formats, such as epee_binary, require a binary header, while formats like msgpack require a single binary tag. That approach would also make any type compatible with an empty buffer, whereas this only accepts empty structs and opt-in types.

@jeffro256
Copy link
Contributor

jeffro256 commented May 30, 2023

Memory consumption issues are handled by having all arrays provide a constraint - either a max element count or a min size per element (i.e. std::vectorstd::string ideally has a min size of sizeof(std::string) per element to prevent memory issues on unpack).

I might be misunderstanding here, but why would one want to put a minimum element size when deserializing std::vector<std::string>s? My system has sizeof(std::string) == 32, which would make for quite a large minimum bound. I can see it making sense for arrays of fixed-size POD-types, but is this bound applied for all regular strings?

@jeffro256
Copy link
Contributor

If one was expecting to deserialize an array of large blobs (like tx blobs) from a untrusted counterparty I could definitely see the utility in this.

@vtnerd
Copy link
Contributor Author

vtnerd commented May 30, 2023

I might be misunderstanding here, but why would one want to put a minimum element size when deserializing std::vectorstd::strings? My system has sizeof(std::string) == 32, which would make for quite a large minimum bound. I can see it making sense for arrays of fixed-size POD-types, but is this bound applied for all regular strings?

This constraint might be confusing, particular since I haven't seen it used before. Perhaps I should change the name to minimum_wire_size? I used the term element to make it clear that it was a per-element constraint in an array.

The goal is to provide a cap on the compression ratio between the "on-the-wire" format and the C/C++ type. As example: ["","",...,""] has roughly a 10-to-1 compression ratio on your system. Every 3-bytes read turns into 32-bytes in memory. If the min_element_size were 10, the average length of all the std::string read must be at least 10. The compression ratio (even considering an outside allocation), drops to 3-to-1 compression.

I used this where std::string is used to store an entire transaction or block - instead of capping the total number of transactions or blocks it requires that a minimum size be enforced on the blobs. The total-maximum read buffer can be raised or lowered without making changes to constraints of this type.

If one was expecting to deserialize an array of large blobs (like tx blobs) from a untrusted counterparty I could definitely see the utility in this.

This is replacing p2p serialization, so it is doing exactly that.

@vtnerd vtnerd force-pushed the replace_serialization branch from eff2620 to 9abfb43 Compare July 15, 2023 16:55
@vtnerd
Copy link
Contributor Author

vtnerd commented Jul 15, 2023

Rebased after recent changes caused merge conflicts.

EDIT: The int and unsigned overloads for the write interface were also dropped, as per other review.

jeffro256 added a commit to jeffro256/monero that referenced this pull request Aug 10, 2023
@vtnerd vtnerd force-pushed the replace_serialization branch from 5893932 to 8357858 Compare September 29, 2023 23:54
@vtnerd
Copy link
Contributor Author

vtnerd commented Sep 29, 2023

Just did a rebase.

@vtnerd vtnerd force-pushed the replace_serialization branch from e9edf20 to d9a7b74 Compare October 25, 2023 00:40
@vtnerd
Copy link
Contributor Author

vtnerd commented Oct 25, 2023

Another rebase, and updates based on @jeffro256 review (in the other PR #8970 ).

@vtnerd vtnerd force-pushed the replace_serialization branch from 0b0a3f5 to c9df851 Compare October 27, 2023 00:59
@vtnerd vtnerd force-pushed the replace_serialization branch from c9df851 to 7200698 Compare November 6, 2023 18:26
@vtnerd
Copy link
Contributor Author

vtnerd commented Nov 6, 2023

Another rebase, and a change to c++17 std::size from boost::size.

@vtnerd vtnerd force-pushed the replace_serialization branch from 7200698 to 0c848e9 Compare November 6, 2023 21:07
@vtnerd
Copy link
Contributor Author

vtnerd commented Nov 6, 2023

Decided against std::size, and instead added a compile-time check to prevent input iterators to be used with boost::size.

@vtnerd vtnerd force-pushed the replace_serialization branch from 1c6e1c1 to 87db8c7 Compare June 20, 2024 17:03
@vtnerd
Copy link
Contributor Author

vtnerd commented Jun 20, 2024

Just did a rebase, and changed around some of the array limits. See other review which was just updated too.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants