Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gen_udp:send execution time increases with caller's queue length #6455

Closed
gonzalobf opened this issue Nov 9, 2022 · 6 comments
Closed

gen_udp:send execution time increases with caller's queue length #6455

gonzalobf opened this issue Nov 9, 2022 · 6 comments
Assignees
Labels
bug Issue is reported as a bug in progress team:PS Assigned to OTP team PS team:VM Assigned to OTP team VM
Milestone

Comments

@gonzalobf
Copy link
Contributor

gonzalobf commented Nov 9, 2022

Describe the bug
gen_udp:send gets slowed down by the message queue that the caller has.

I suspect the problem is inside the gen_udp:send implementation which calls to erlang:port_command and waits in a receive for the completion in this line. I don't think it puts any mark in the queue so the receive needs to do a full queue scan looking for the inet_reply.

We have been hit by this issue a few times in production recently. We have a process with the sole responsibility of receiving messages and forwarding them to a udp socket that built up a queue of 2M messages. At some point, the forwarder was getting messages at a regular rate but the queue wasn't going down. We found that the clients weren't receiving messages at the usual rate. In the server side, we found the process was spending all its time in prim_inet:do_sendto so we thought it was stuck in gen_udp:send, but it was actually processing them very slowly. Looking at my benchmark, it would take approximately 50ms to process each message with a queue of 2M as we had.

To Reproduce
I wrote a benchmark that reproduces the issue.

Expected behavior
I would expect gen_udp:send to take constant time or not be influenced so much by the caller's queue size.

Affected versions
I checked in OTP25.1 and master.

Additional context
I opened a related topic in erlang forums before opening this one where I posted the code and the debug I did in more details.

@gonzalobf gonzalobf added the bug Issue is reported as a bug label Nov 9, 2022
@RaimoNiskanen RaimoNiskanen added the team:PS Assigned to OTP team PS label Nov 9, 2022
@Polexy
Copy link

Polexy commented Nov 16, 2022

I think we got same issue for our app.
We use syslog_logger which uses gen_udp:send. we do experiments with cpu, and found that after increased it to ~85% it's start stocking on prim_inet:do_sendto with an endlessly growing queue even after reducing the load

[{registered_name,syslog_logger},
 {current_function,{prim_inet,do_sendto,4}},
 {initial_call,{proc_lib,init_p,5}},
 {status,running},
 {message_queue_len,121553},
 {links,[#Port<0.8>,<0.1031.0>]},
 {dictionary,[{'$initial_call',{syslog_logger,init,1}},
              {'$ancestors',[<0.1031.0>,<0.1030.0>]}]},
 {trap_exit,false},
 {error_handler,error_handler},
 {priority,normal},
 {group_leader,<0.1029.0>},
 {total_heap_size,7932638},
 {heap_size,4298223},
 {stack_size,28},
 {reductions,7802338929},
 {garbage_collection,[{max_heap_size,#{error_logger => true,kill => true,size => 0}},
                      {min_bin_vheap_size,46422},
                      {min_heap_size,233},
                      {fullsweep_after,65535},
                      {minor_gcs,2}]},
 {suspending,[]}]

we also collect msacc, top processes(https://gist.github.com/stolen/9a28ed9403c724541b0ee5fcd822613e), message queue.
maybe useful in attachments
stats.log

@KennethL
Copy link
Contributor

We know what the problem is (the receive in prim_inet:sendto) and have discussed an optimization.
The trick is to optimize the case when the sender have a long message_queue without slowing down the case when there is an empty or very short message queue. Receive from the port should (and can) be avoided.
This is not really a bug but a missing receive optimization.

@IngelaAndin IngelaAndin added the team:VM Assigned to OTP team VM label Nov 16, 2022
@gonzalobf
Copy link
Contributor Author

@KennethL Thank you for looking at this. I thought that maybe was a bug but you are right and enhancement might have been more appropriate.

Reading again the code of erlang:port_command, there is something that I don't totally understand and I was wondering if you could explain it:

Why does prim_inet needs to read the value from the queue instead just wait for erlang:port_command return value?

Looking at the underlying C function in erts_internal_port_command_3, it looks like it always returns true when the work is done so we could use that value as completion handler (and avoid putting the message in the callers queue ). Maybe I'm missing something obvious here.

Thank you :)

@gonzalobf
Copy link
Contributor Author

I ran the benchmark with socket as inet_backend and the problem goes away. There is not and increment in the execution time so we are going to use it as a workaround and try in production.

If anyone is interested in the numbers, I pasted the output at the end of the benchmark.

I didn't know about this option when I opened the bug so I think this option can be used as alternative. Maybe the bug can be closed but I'll let you to decide if still worth looking at it or you prefer to close it.

Thank you

RaimoNiskanen added a commit that referenced this issue Mar 16, 2023
* raimo/prim_inet-send-optimization/GH-6455:
  Use the 'local' option in term_to_binary
  Fail noisily for error between prim_inet and inet_drv
  Rework how caller is handled
  Fix whitebox test
  Pass a reference to port command for receive optimization of reply
@RaimoNiskanen RaimoNiskanen added this to the OTP-26.0-rc2 milestone Mar 17, 2023
@RaimoNiskanen
Copy link
Contributor

I have merged a solution into our 'master' branch for OTP-26.0-rc2 with internal ticket ID OTP-18520.
It utilizes the infamous receive reference optimization, and fixes all uses of port_command/* from prim_inet, so the optimization is for gen_{udp,tcp,sctp}:send/*.

Sadly enough there are more operations, such as recv/* that also could benefit from the same optimization, but I will leave that for the future...

@gonzalobf
Copy link
Contributor Author

@RaimoNiskanen Thank you :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Issue is reported as a bug in progress team:PS Assigned to OTP team PS team:VM Assigned to OTP team VM
Projects
None yet
Development

No branches or pull requests

6 participants