-
Notifications
You must be signed in to change notification settings - Fork 3.8k
DAWN-167 ⁃ eosd dies when there is infinite notifications between two contracts #704
Comments
➤ Brian Johnson commented: (Comment removed, misunderstood the test scenario) |
➤ Dhanesh Valappil commented: Let me simplify the scenario. Contracts A and B are talking to each other. For the next 3 seconds, B receives plenty of messages. After that, seems during the 3 seconds context switch eosd becomes unresponsive and no more block generation |
➤ Brian Johnson commented: void chain_controller::process_message(const Transaction& trx, AccountName code, output.notify.reserve( apply_ctx.notified.size() ); for( uint32_t i = 0; i < apply_ctx.notified.size(); ++i ) { And the code in here will never matter since notified is not carried into the next context: void apply_context::require_recipient(const types::AccountName& account) { auto itr = boost::find_if(notified, [&account](const auto& recipient) { if (itr == notified.end()) { |
➤ Brian Johnson commented: Depending on what the use cases we need to accommodate, there could be several solutions to limit this. Easiest one I can think of is imposing a total transaction time limit (right now we have a transaction message time limit). With the limited time to release, I would like input from [~dan.larimer] and [~admin] |
➤ David Moss commented: [~brian.johnson] Agreed. Please proceed with imposing a total transaction time limit. |
➤ Jonathan Giszczak commented: There is an existing recursion limit for transactions of 4. This is a fundamental aspect of the blockchain, encoded in the genesis block. There is no similar limit for messages. Should we impose the same limit on message recursion or a different limit? |
➤ Christian Dunst commented: I brought this up in a casual conversation and I believe that the solution from Dan was to impose a total limit. If we don't do this, we could easily write a cascading contract which will bring the bc down. |
➤ Dhanesh Valappil commented: @christian Dunst, I already wrote a "malicious" contract for the same purpose and it could bring eosd down. |
➤ Jonathan Giszczak commented: [~dhanesh.balakrishnan] Please attach buildable versions of the contracts. I have a possible fix, but need to test it. |
➤ Dhanesh Valappil commented: Attaching two contracts herewith |
➤ Dhanesh Valappil commented: How to test each test cases: // the following one simulates endless transaction msgs |
➤ Dhanesh Valappil commented: Works well in the latest build. Here is the test result: bash-3.2$ ./eosc push message malicious notify '{"from":"malicious","to":"dummy", "amount":50}' --scope malicious,dummy |
➤ Kevin Heifner commented: Please get this contract into github and integrated into eosd_run_test.sh so it is run on every commit. |
➤ Dhanesh Valappil commented: Works as expected bash-3.2$ eosc push message malicious notify '{"from":"malicious","to":"dummy", "amount":50}' --scope malicious,dummy {"c":400,"msg":"{"code":400,"message":"Bad Request","details":"message exhausted allowed resources (3030021)\nMessage processing exceeded maximum inline recursion depth of 4\n\n\n\n\n\n\n\n"}"} {"server":"10.160.11.221","port":8888,"path":"/v1/chain/push_transaction","postdata":{"ref_block_num":2805,"ref_block_prefix":2023605566,"expiration":"2017-11-23T08:50:51","scope":["dummy","malicious"],"read_scope":[],"messages":[{"code":"malicious","type":"notify","authorization":[],"data":"0000c09a3ae4a29100000000002fa54e3200000000000000"}],"signatures":[]}} |
➤ Dhanesh Valappil commented: I am not sure whether it is due to malicious contract, but node 0 stopped responding after executing the malicious contract's ping pong messaging between malicious and dummy contracts: The last update at eosd is: Log is kept at /home/testnet/STAT/build/tn_data_00/stderr.txt.STAT-167 Interestingly the same test works well on testnet node 1 |
➤ Kevin Heifner commented: Please retest with latest testnet. |
➤ Dhanesh Valappil commented: Specific test passes again: {"c":400,"msg":"{"code":400,"message":"Bad Request","details":"message exhausted allowed resources (3030021)\nMessage processing exceeded maximum inline recursion depth of 4\n\n\n\n\n\n\n\n"}"} {"server":"10.160.11.221","port":8888,"path":"/v1/chain/push_transaction","postdata":{"ref_block_num":1300,"ref_block_prefix":3760663211,"expiration":"2017-11-28T02:17:22","scope":["dummy","malicious"],"read_scope":[],"messages":[{"code":"malicious","type":"notify","authorization":[],"data":"0000c09a3ae4a29100000000002fa54e3200000000000000"}],"signatures":[]}} But most of the nodes died.
|
➤ Brian Johnson commented: Just pulling the execution command out to be easier to see(comment window may add return characters): eosc push message malicious notify '{"from":"malicious","to":"dummy", "amount":50}' --scope malicious,dummy |
therefore sleeping for a second. #704
➤ Kevin Heifner commented: config.ini additions for testnet: block-interval-seconds = 5 |
➤ Kevin Heifner commented: Running a 4 producer mesh network with 1 sec default block time. I ran tests/eosd_run_test.sh and the eosds all pegged the cpu and spiked memory usage. I was able to connect to the eosd and got this stack trace: (gdb) where |
➤ Bart Wyatt commented: That callstack really looks like the message_buffer was destroyed and then the a call to the io_service message pump still wrote to it. Given that the message_buffer is owned by the connection and we have already seen issues where the connection is destroyed while asio calls are outstanding, this doesn't surprise me. We should probably audit all the lifetimes in that code asap. At the very least, the handler lambda should manage the lifetime of the buffers passed into async_read_some. The nuclear option is to have the lambda capture a shared_ptr to the connection but as we have no hard guarantees about when that lambda fires it makes the lifetime of our connection object harder for us to track. So, first option is to see if we can get a handle on all the things that may be used outside the lifetime of the connection. We fixed many yesterday, hopefully the message_buffer is part of a short list. |
➤ Kevin Heifner commented: With latest from Paul on morning of 11/30/2017. #0 0x00007fa5077c8377 in __libc_recvmsg (fd=17, msg=0x7ffdece2a470, flags=0) at ../sysdeps/unix/sysv/linux/recvmsg.c:28 |
➤ Kevin Heifner commented: I think we have a fix for the memory corruption issue. Creating PR for it now. Chasing a spam message send now. |
➤ Corey Lederer commented: [~dhanesh.balakrishnan] Can you please re-test with the examples you gave above? We believe this should be fixed. |
➤ Corey Lederer commented: Moving to finished, this was moved to closed-withdrawn by mistake. |
"Your email address [email protected] doesn't have access to blockone.atlassian.net could you add access to me ? |
@chenlian2015: blockone.atlassian.net is a private repository which is not available to the public. |
…r-safety stat 167 gh 704 read buffer safety (reapplied on noon branch)
I have developed two contracts to simulate this scenario:
The source code is as follows:
When the transaction happens, eosd will stop responding and wont be able to generate blocks anymore.
┆Attachments: CMakeLists.txt | CMakeLists (4643a848-7a02-44bd-8214-f2d1f6d6c6d8).txt | dummy.abi | dummy.cpp | dummy.hpp | malicious.abi | malicious.cpp | malicious.hpp
The text was updated successfully, but these errors were encountered: