
StorageKey - why use WriteBytesWithGrouping()? #793

Closed
ixje opened this issue Jun 3, 2019 · 16 comments
Labels
Question Used in questions

Comments

@ixje
Contributor

ixje commented Jun 3, 2019

I'm doing an audit between neo-python and neo-cli and encountered an issue with one of the storage keys. Specifically, block 131179 of TestNet reports a new storage item being added:

{"state":"Added",
        "key":"2170e2e4a128ba6b287b14c1ad4be8d03bbf04f031313131313131313131313131313131003131313131313131313100000000000006",
        "value":"000f32323232323232323232323232323200"}]},

The key can be broken down into:

  • Scripthash: 2170e2e4a128ba6b287b14c1ad4be8d03bbf04f0
  • key_value: 313131313131313131313131313131310031313131313131313131
  • padding: 00000000000006 (to create 16-byte alignment)

If you look closely at the key_value you'll see there is a 00 inserted after the first 16 bytes. The actual key (as can be seen in the screenshot) does not have this 00 byte.
[Screenshot 2019-06-03 at 16 12 36: the raw key as stored, without the inserted 00 byte]
This deviation is caused by WriteBytesWithGrouping() in the code below:

void ISerializable.Serialize(BinaryWriter writer)
{
    writer.Write(ScriptHash);
    writer.WriteBytesWithGrouping(Key);
}
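
For reference, judging from the dump above, the grouping helper behaves roughly like this (a reconstruction/sketch, not the verbatim 2.x source):

using System.IO;

static class IOHelperSketch
{
    // Sketch of the grouping scheme, reconstructed from the dump above:
    // keys are written in 16-byte groups; each full group is followed by a
    // 0x00 marker, and the final group is zero-padded to 16 bytes and
    // followed by a single byte holding the number of padding bytes.
    public static void WriteBytesWithGrouping(this BinaryWriter writer, byte[] value)
    {
        int index = 0;
        int remain = value.Length;
        while (remain >= 16)
        {
            writer.Write(value, index, 16);
            writer.Write((byte)0);       // marker after a full group
            index += 16;
            remain -= 16;
        }
        if (remain > 0)
            writer.Write(value, index, remain);
        int padding = 16 - remain;
        for (int i = 0; i < padding; i++)
            writer.Write((byte)0);       // zero padding up to the group size
        writer.Write((byte)padding);     // count of padding bytes
    }
}

For the 26-byte key above (16 + 10 bytes of 0x31) this yields exactly the observed layout: the extra 00 after the first full group, then 6 bytes of zero padding, then the trailing 06.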

My question is: why are we using this method for serialization instead of a normal write?

It gives unexpected behaviour when you use it (as done here in the StatesDumper plugin):

state["key"] = trackable.Key.ToArray().ToHexString();

I believe the key should be the real key, not some modified form.

@shargon
Member

shargon commented Jun 3, 2019

I think the same as you; it should be a regular write.

@erikzhang
Member

erikzhang commented Jun 3, 2019

It is for "Find()". See #200
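
(Context for readers following along: with a length-prefixed encoding such as WriteVarBytes, two keys that share a prefix do not share a serialized prefix, so a byte-wise prefix scan over the DB breaks. A minimal illustration, hypothetical code:)

using System;
using System.IO;

class FindPrefixDemo
{
    // Simplified stand-in for a var-length encoding: one length byte + data.
    static byte[] LengthPrefixed(byte[] data)
    {
        using (var ms = new MemoryStream())
        using (var w = new BinaryWriter(ms))
        {
            w.Write((byte)data.Length);
            w.Write(data);
            return ms.ToArray();
        }
    }

    static void Main()
    {
        byte[] ab = LengthPrefixed(new byte[] { 0x61, 0x62 }); // "ab" -> 02-61-62
        byte[] a  = LengthPrefixed(new byte[] { 0x61 });       // "a"  -> 01-61
        // 02-61-62 does not start with 01-61, so scanning the DB for keys
        // beginning with the serialized "a" would miss "ab" entirely.
        Console.WriteLine(BitConverter.ToString(ab));
        Console.WriteLine(BitConverter.ToString(a));
    }
}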

@erikzhang erikzhang added the Question Used in questions label Jun 3, 2019
@ixje
Contributor Author

ixje commented Jun 4, 2019

Thanks 😞

I don't see this changing anymore for 2.x, but should we maybe change the approach for 3.0 to remove the confusing behaviour of ToArray() for StorageKey?

This would mean updating Storage_Put/Delete/Get instead of using WriteBytesWithGrouping.

@igormcoelho
Contributor

I always wanted to know this! hahaha :)
Well, is this a storage-specific thing, or is it necessarily part of the protocol? I mean, could other clients not implement this and still get the same results with Find?

@ixje
Contributor Author

ixje commented Jun 5, 2019

It's a client thing. Neo-python doesn't use it and Find works fine, because we fixed Put/Delete/Get instead.

@jsolman
Contributor

jsolman commented Jun 6, 2019

It has to do with the way the length is encoded. I had to deal with this in some of the plugins I wrote. It should be possible to get the length from LevelDB, though, rather than encoding it into the key. I didn't deep dive to fix it, but it should be fixable for NEO 3.0. As long as the storage engine can save and report the key length (which it can), there isn't a good reason to do what is being done.
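
(In other words: since the backing store returns each key together with its full length, and the ScriptHash part is a fixed 20 bytes, the raw layout round-trips without any embedded length. A hypothetical sketch:)

using System;

static class StorageKeyLayout
{
    // Hypothetical: recover the parts of a raw DB key by slicing off the
    // fixed-size 20-byte ScriptHash; the rest is the contract key as-is.
    public static (byte[] ScriptHash, byte[] Key) Split(byte[] rawDbKey)
    {
        byte[] scriptHash = new byte[20];
        byte[] key = new byte[rawDbKey.Length - 20];
        Array.Copy(rawDbKey, 0, scriptHash, 0, 20);
        Array.Copy(rawDbKey, 20, key, 0, key.Length);
        return (scriptHash, key);
    }
}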

@ixje ixje closed this as completed Jun 14, 2019
Thacryba pushed a commit to simplitech/neo that referenced this issue Feb 17, 2020
* fix issue neo-project#791

* follow up

* follow up

* add example
@roman-khimov
Contributor

May I ask to reopen this bug? I think the issue is still relevant for NEO 3, but with the #1383 changes it now has a deeper impact on node compatibility. In neo-go we have implemented a simpler key serialization scheme without grouping and it works fine (some conversion is needed to compare our 2.0 state with neo-storage-audit, but that's a minor annoyance affecting just the state change dumps). But if MPT is to be added using the grouping scheme, then we will have to either change our key serialization format or convert keys for MPT calculations. Neither option makes me happy, as I think it should be trivial to fix this in the C# code (and simplify the C# code at the same time).

@shargon
Member

shargon commented Apr 13, 2020

Yes, we should create a standard. Could you share your neo-go implementation?

@roman-khimov
Contributor

We just don't do any grouping; we append the byte representation of the key to the scripthash, like this:
https://github.com/nspcc-dev/neo-go/blob/405fa9c411637a1b77a98132139e177279598332/pkg/core/dao/dao.go#L466

And it works well for Find: if there are keys "ab" and "ac" in the DB, Find returns both when searching for the prefix "a".
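
(Roughly the same idea in C#, as a sketch; the leading namespace byte is an assumption based on how neo-go separates its DB buckets:)

using System;

static class SimpleKeys
{
    // Sketch of the neo-go scheme: no grouping and no length prefix, just
    // prefix byte + script hash + raw key bytes concatenated.
    public static byte[] MakeStorageKey(byte storagePrefix, byte[] scriptHash, byte[] key)
    {
        byte[] dbKey = new byte[1 + scriptHash.Length + key.Length];
        dbKey[0] = storagePrefix; // DB bucket/namespace byte (assumption)
        Array.Copy(scriptHash, 0, dbKey, 1, scriptHash.Length);
        Array.Copy(key, 0, dbKey, 1 + scriptHash.Length, key.Length);
        return dbKey;
    }
}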

@shargon
Member

shargon commented Apr 13, 2020

Please check this thread: #199. The problem for us is WriteVarBytes, which could produce undesired results. I think the store should return a fixed size, without using WriteVarBytes, but it was designed for working with streams.

@erikzhang
Member

but it was designed for working with streams.

Exactly!

@roman-khimov
Contributor

roman-khimov commented Apr 13, 2020

Please check this thread #199 the problem for us is WriteVarBytes

I've seen that, but we do not use WriteVarBytes for keys and I still can't see anything preventing C# code from doing the same thing.

but it was designed for working with streams.

Well, if it's just to make this StorageKey ISerializable then yes, it does make some sense, but then the question is: do we really need ISerializable here?

Update: I see DataCache needs it, but there should still be some way.

I mean, from a pure DB design perspective (especially given that after #1383 every implementation will have to deal with this format one way or another), this doesn't seem to be necessary to do the job, so making this transformation a standard required for a proper MPT implementation doesn't seem like a good solution to me; there should be some room for more efficient implementations.

@roman-khimov
Contributor

OK, so we have stream-based ISerializable and DataCache using it both for keys and values. All good, but we have a variable-length StorageKey that can't use our nice WriteVarBytes because of Storage.Find. The StorageKey is only used as a key for the DB.

What I'm thinking about is that every stream has an end, one way or another. So I can't see anything preventing us from creating a ReadTillEOF helper that can be used for StorageKey. Writing was never really a problem (Write will do it well); the only problem is reading. But as we never read a StorageKey from the network, only from our own DB, simply reading till EOF (or even until EndOfStreamException) shouldn't be an issue.
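
A rough sketch of what I mean (hypothetical code, not from any actual codebase):

using System.IO;

static class IOExtensions
{
    // Hypothetical helper: drain whatever is left in the stream.
    public static byte[] ReadTillEOF(this BinaryReader reader)
    {
        using (var ms = new MemoryStream())
        {
            reader.BaseStream.CopyTo(ms);
            return ms.ToArray();
        }
    }
}

StorageKey deserialization would then be just reading the fixed 20-byte ScriptHash followed by ReadTillEOF() for the rest.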

Pros:

  • ~5 lines for ReadTillEOF would allow us to remove some tens of lines of otherwise unused ReadBytesWithGrouping/WriteBytesWithGrouping code
  • shorter keys!
  • less processing required
  • simplifies MPT calculation

Cons:

  • sorry, I don't see any

What do you think?

@erikzhang
Member

What I'm thinking about is that every stream has an end, one way or another.

A network stream has no end.

@roman-khimov
Contributor

Luckily we're not dealing with networking in this particular case. I do understand that it makes StorageKey somewhat impure with regard to the general properties expected from a Serializable thing (for example, you can't put two StorageKey values into a stream and then read them back with till-EOF encoding), but at the same time it's a very special case where our normal encoding principles (like VarBytes) just don't work, and one way or another we have to do something special for it. I think the till-EOF solution is better in that it's simpler and solves the problem.

Also, again, it's one thing for an implementation to use whatever encoding scheme it wants, and it's completely another thing to standardize something at the protocol level. So what I'm really concerned about is the MPT calculation (which is soon to be a part of the protocol); I think it'd be nice to have a simpler encoding used for it, even if the BytesWithGrouping scheme is to remain in the C# implementation.

@erikzhang
Member

#1566
