Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Canonicalize the metadata the same way as the solidity compiler #978

Closed
marcocastignoli opened this issue Apr 13, 2023 · 22 comments
Closed

Comments

@marcocastignoli
Copy link
Member

marcocastignoli commented Apr 13, 2023

We should canonicalize the metadata when we are generation "metadata variations" when we have a partial match. It could be that the verifier somehow mixed the orderings of the metadata or added whitespaces etc.

View in Huly HI-395

@marcocastignoli
Copy link
Member Author

The library used by the solidity compiler is the following: https://open-source-parsers.github.io/jsoncpp-docs/doxygen/class_json_1_1_stream_writer_builder.html

I searched a bit how this library handles the ordering of the keys. And it doesn't because it follows the JSON spec, as explained here: open-source-parsers/jsoncpp#237

The JSON spec at http://www.json.org/ says explicitly that you can't rely on ordering in an object.

An object is an unordered set of name/value pairs.

@kuzdogan kuzdogan changed the title JSON stringify metadata the same way as the solidity compiler Canonicalize the metadata the same way as the solidity compiler Apr 13, 2023
@marcocastignoli
Copy link
Member Author

I think that the order of the fields should be explicit in the solidity compiler, in this moment I believe it's not. Then we have to implement the same ordering in typescript with JSON.stringify.

@kuzdogan
Copy link
Member

For future ref, this is where the metadata is created
CompilerStack.cpp

@kuzdogan
Copy link
Member

@cameel My observation so far has been that the metadata is always canonicalized by:

  1. Alphabetically ordering the object keys (recursively)
  2. Removing whitespaces
    which I added to the docs Update Metadata Documentation solidity#14052

However we couldn't find where and how the alphabetical ordering is done in the compiler code. Can you please point out?

This is because we want to canonicalize the metadata given to Sourcify by the user in case they have changed the ordering, whitespaces etc.

@cameel
Copy link
Member

cameel commented Apr 13, 2023

As for whitespace, that's handled by our jsonPrint(). It sets indentation settings for jsoncpp and lets it handle the rest.

We don't specify ordering explicitly. It's whatever order jsoncpp's StreamWriter::write() happens to use. That's based on the iteration order of Json::Value::Members, which is std::vector<String>. Since it's a vector, it just keeps whatever order the items were inserted in. I'd have to dig deeper to say if keys are sorted on insertion or not but @marcocastignoli's quote above implies they're not.

So, unfortunately, I don't think we canonicalize the JSON in any way other than telling jsoncpp to pretty print it. It's surprising that we even manage to get that order stable and reproducible that way. I would not be surprised if it randomly changed between compiler versions. On top of it we have plans to replace jsoncpp with a better maintained JSON library, though it's unclear when we'll have time for that (attempts so far failed). This is bound to change small details about how the output is formatted.

@marcocastignoli
Copy link
Member Author

marcocastignoli commented Apr 13, 2023

@cameel in the code here (https://github.com/ethereum/solidity/blob/f0c0df2d8ce57896002950b0c043fdd29a22d67e/libsolidity/interface/CompilerStack.cpp#L1540) I see that ´version´ is, in theory, the first field

string CompilerStack::createMetadata(Contract const& _contract, bool _forIR) const
{
	Json::Value meta{Json::objectValue};
	meta["version"] = 1;
        ...

But all the metadata I've seen so far are ordered alphabetically, like @kuzdogan said above

Alphabetically ordering the object keys (recursively)

So I don't think it just keeps the order in which items are inserted.

@kuzdogan
Copy link
Member

kuzdogan commented Apr 13, 2023

Yes in fact we constantly observe the keys are always alphabetically ordered. Here are some examples:

Metadata files in Sourcify repo https://repo.sourcify.dev/contracts/full_match/11155111/0x51567Ac53024D9c958097aAe7C544612B11E3B99/metadata.json https://repo.sourcify.dev/contracts/full_match/11155111/0x76877ed3544ed6105bE08AB0dcb414a71bD28fBE/metadata.json https://repo.sourcify.dev/contracts/full_match/11155111/0xc3262f1B3f191699570Dea4220Bc1D6D5E15B8e3/metadata.json https://repo.sourcify.dev/contracts/full_match/11155111/0x7eEdCf7A13e58d1DCCc1C2bB8bc5Fbcb546cea79/metadata.json https://repo.sourcify.dev/contracts/full_match/11155111/0x18227911BE4e4AfAc96b7d5f29D1c4959B2d4AEa/metadata.json https://repo.sourcify.dev/contracts/full_match/11155111/0x412eC953708761C65708dd0d8A82A8A3FD377F0e/metadata.json https://repo.sourcify.dev/contracts/full_match/11155111/0x258643bFfFD36f47525e5A5C2Eb1d083Ecd72c64/metadata.json https://repo.sourcify.dev/contracts/full_match/11155111/0xE2c36Be76Cf410780b35c2CC4F3ddEF9e9F15abd/metadata.json https://repo.sourcify.dev/contracts/full_match/11155111/0xeA8BFB6270Cb8b95B283E4E78b2BD4214e90F224/metadata.json https://repo.sourcify.dev/contracts/full_match/11155111/0x9202087c8ace1945c022bA66afd6F9A5cb29D813/metadata.json https://repo.sourcify.dev/contracts/full_match/11155111/0xD8C537B4e675d617c9c0611503E47538e120BC0A/metadata.json https://repo.sourcify.dev/contracts/full_match/11155111/0x9fF78f4DD775B1b456307B8C4A1Fb0D22cD6a08d/metadata.json https://repo.sourcify.dev/contracts/full_match/11155111/0xEcC5653B39e7F95dFF667dC0B725867D90CdA5D0/metadata.json https://repo.sourcify.dev/contracts/full_match/11155111/0x7c2AFD1e0d560571CA116DabDe4Bc32555C6b246/metadata.json https://repo.sourcify.dev/contracts/full_match/11155111/0x8455AF1D9Db3b56a83302299727712C798df89B8/metadata.json https://repo.sourcify.dev/contracts/full_match/11155111/0xCa52f78913956D46Ed58279c6Fc03410AbeaE1C8/metadata.json https://repo.sourcify.dev/contracts/full_match/11155111/0x0b2a51febEfE4880f841C8974081DB6eC4e91706/metadata.json https://repo.sourcify.dev/contracts/full_match/11155111/0x566C10580782632A54ee8fBBfe034C621E3b8c43/metadata.json https://repo.sourcify.dev/contracts/full_match/11155111/0x753e9764A1499005Bd89001D8A8596eA36cAB57a/metadata.json https://repo.sourcify.dev/contracts/full_match/11155111/0x9CbFaFA327ABD11C7E475FC9Ae45f4A5de2f150C/metadata.json https://repo.sourcify.dev/contracts/full_match/11155111/0x7a0AB72902d982F18De76F7590C662a6966b245F/metadata.json https://repo.sourcify.dev/contracts/full_match/11155111/0x32CFEA5f1736A65014994a2128CB1D8d57e5e18A/metadata.json https://repo.sourcify.dev/contracts/full_match/11155111/0x5144255405A1e25E89b5439e1Cf38a88e3130a31/metadata.json https://repo.sourcify.dev/contracts/full_match/11155111/0xc56Fc90caa48c41E5b620771B853a4Ff1D0365cD/metadata.json https://repo.sourcify.dev/contracts/full_match/11155111/0x9F62bD838AAb80706626211687021F77b7657794/metadata.json https://repo.sourcify.dev/contracts/full_match/11155111/0x8af239CB6d38AB1988B4AED648C0C4EF91c9EA87/metadata.json https://repo.sourcify.dev/contracts/full_match/11155111/0xB44e8CfebB99c6AD470E5Ab31c2Abe54c758F12C/metadata.json https://repo.sourcify.dev/contracts/full_match/11155111/0x7890b098C859B97F3c59C8421543Ae5488c6e3B3/metadata.json https://repo.sourcify.dev/contracts/full_match/11155111/0xe29A723E0ac513777208E5086d3dD35891f5f0bc/metadata.json https://repo.sourcify.dev/contracts/full_match/11155111/0x8D274B5824CCeF101AB2BbBAf688eDC7980804a6/metadata.json
I can check with a simple script if we have any metadata files at all that does not follow the alphabetical ordering, if we don't know where this is coming from. Maybe we'll find some discrepancies between compiler versions.

@marcocastignoli
Copy link
Member Author

@cameel Instead of having to always canonicalize the metadata before hasing, I suggest using the multicodec dag-json (https://github.com/ipld/js-dag-json) to generate the IPFS CID of the metadata as a binary object. This method would not rely on the key order and would use the specific multicodec dag-json for more accurate IPFS hash calculations.

@cameel
Copy link
Member

cameel commented Apr 14, 2023

ok, looks like you're right, they are sorted. I can at least say that we don't order them explicitly.

Looks like I overlooked the fact that the members vector is filled on the spot by StreamWriter::write() and not stored in the object. The object instead stores members in a map and that has a guaranteed sort order (based on string comparison in this case). So this finally explains it.

@cameel
Copy link
Member

cameel commented Apr 14, 2023

I suggest using the multicodec dag-json (https://github.com/ipld/js-dag-json) to generate the IPFS CID of the metadata as a binary object.

That looks interesting. It's a JS library though, so we can't use it directly (is there a C++ implementation)? And the other complication is that we're focusing hard on the roadmap now so it's not something we'd do in the near future unless it either impacts you in a major way or we get someone to contribute that.

@kuzdogan
Copy link
Member

@cameel ok great at least now we know for sure, and the statement in the docs is correct :)

@marcocastignoli's suggestion is not for the immediate future but consideration for later. Since we made clear the JSON is sorted, we can say it's canonicalized and it's fine this way.

Is there a resource in Solidity for possible metadata updates (version 2?) that we can start gathering potential upgrades? or we can start one internally

@cameel
Copy link
Member

cameel commented Apr 14, 2023

I don't think we ever discussed updating the metadata format in the team. At least there's nothing that I'm aware of.

There may be some changes to the data inside (e.g. at some point we discussed whether libraries should even be included there), but I would not expect any format changes in the near future.

@marcocastignoli
Copy link
Member Author

I think that we can close this issue for now since javascript's JSON.stringify works as jsoncpp's StreamWriter::write(). It could just be luck that the two libraries implement sorting in the same way, but it is actually already canonicalize, infact we never had any problem. @kuzdogan what do you think?

@kuzdogan
Copy link
Member

This was not a problem in the first place but an additional step to add as a "variation" when trying to verify perfectly. I explained by editing your first message

@marcocastignoli
Copy link
Member Author

marcocastignoli commented Apr 21, 2023

Mine report regarding the ordering of the key was more an assumption that I wanted to check with the solidity team. Now that we know that the two libraries work in the javascript and cpp implementations in the same way, I think this is not a problem anymore. Of course to be 100% sure it would be better to implement something like this:

I suggest using the multicodec dag-json (https://github.com/ipld/js-dag-json) to generate the IPFS CID of the metadata as a binary object.

@kuzdogan
Copy link
Member

It is only the whitespaces part that is working the same way, no? Does JS JSON.stringify always do alphabetical reordering?

@marcocastignoli
Copy link
Member Author

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/JSON/stringify

For example, JSON.stringify on the same object will always produce the same string, and JSON.parse(JSON.stringify(obj)) would produce an object with the same key ordering as the original (assuming the object is completely JSON-serializable).

We never change the order, even with tryToFindOriginalMetadata

@kuzdogan
Copy link
Member

kuzdogan commented Apr 21, 2023

I'm not sure we are on the same page. Imagine a case when the user manually changes the alphabetical ordering of the metadata fields somehow. Say they put language before compiler:

{
  "language": "Solidity",
  "compiler": {
    "version": "0.8.7+commit.e28d00a7"
  },

Whereas alphabetically it should be:

{
  "compiler": {
    "version": "0.8.7+commit.e28d00a7"
  },
  "language": "Solidity",

I'm saying that, now that we know the compiler sorts the keys, as an additional step we should check the ordering of the keys recursively in the metadata object, before we even start the variations.

@marcocastignoli
Copy link
Member Author

Now I get why you changed to description of this issue, I thought of this issue more as research on the topic of metadata canonicity. What you are describing is more of a specific case for the metadata variation.

@kuzdogan
Copy link
Member

Could you add a test please 👉👈

@kuzdogan kuzdogan reopened this Apr 25, 2023
@marcocastignoli
Copy link
Member Author

Yep, cought in the act 👐 Tomorrow morning I'll fix this

@marcocastignoli
Copy link
Member Author

test added with the latest commit

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants