Add `Channel#try_send`, a non-blocking `Channel#send` #12694
Conversation
I think it might be good to add this behavioral variant. But the name …
I think instead we should change the code to avoid the `to_a` call. For example, there's no need to call `to_a` if the `Indexable` has a single element.
I tried to optimize this a bit, but the main issue is that even with the optimization the code is translated to create a … I think the right thing to do is, yes, to introduce some form of … In my mind …
I'm also surprised to learn that a … Then we can optimize for the case of 1 select action. In the end it will be as efficient as … It's more work for the compiler/us, but it's work that only needs to be done once by us, and it doesn't put the burden on developers.
Also, it seems an `Indexable` is passed to `select_impl`. In the actual implementation that's a tuple, but that's not good if we need to sort the values. So I would change the implementation to use a `StaticArray` and pass a slice to the method. Then the slice is mutable, so the data can be sorted in place. And maybe then the data can also be structs to avoid memory allocations. But I see some pointers being passed around, so it's a tricky refactor. Even though it'll take some time to get it right, it's probably the best thing to do.
Considering that a very common use case is to wrap the select in a loop, I could see a second variant of …
Co-authored-by: Sijawusz Pur Rahnama <[email protected]>
Hi @carlhoerberg, we'd like to move forward with this. What do you think of straight-shoota's observations? Do you want us to take it to completion? Thanks!
Really? Could you show an equivalent API for Go? And if Go doesn't have one, why do we need one? I think it might be better to fix …
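To answer the Go question for context: Go has no `TrySend` method on channels; the idiomatic equivalent is a `select` with a `default` branch. A minimal Go sketch — `TrySend` and `TryRecv` are illustrative helpers, not standard library API:

```go
package main

import "fmt"

// TrySend attempts a non-blocking send; it reports whether the value was sent.
func TrySend[T any](ch chan T, v T) bool {
	select {
	case ch <- v:
		return true
	default:
		return false
	}
}

// TryRecv attempts a non-blocking receive; ok is false if nothing was ready.
func TryRecv[T any](ch chan T) (v T, ok bool) {
	select {
	case v = <-ch:
		return v, true
	default:
		return v, false
	}
}

func main() {
	ch := make(chan int, 1)
	fmt.Println(TrySend(ch, 1)) // buffer has room: true
	fmt.Println(TrySend(ch, 2)) // buffer full, no receiver ready: false
	v, ok := TryRecv(ch)
	fmt.Println(v, ok) // 1 true
}
```

So the operation exists in Go, just as syntax rather than as a named method.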
Co-authored-by: Johannes Müller <[email protected]>
Isn't this less verbose than any `select`, fixed or not?
I guess that's right. My only concern is that eventually, if we fix …
The implementation seems correct, but I worry about increasing Channel's API. It's easy for things to go wrong, and not having a single code path for sending things through the channel seems like a bad idea. In this case the whole method is blocking so it should be safe, but in general things like … (see lines 453 to 457 in 1b93218)
We have … I also wonder if people would expect to write … If … From what I see, it seems that the culprit of the … (see lines 426 to 431 in 1b93218)
Did someone check if doing a special case to use a … If that is not enough, in order to keep the API as it is today we could have a … A bonus of either of the last approaches is that it will also apply to …
@bcardiff Sounds like a good assessment. 👍 Optimizing … Thus the lengthy form with a single …
Moving the proposed implementation to a `ChannelAction` helper in case there is a single channel in the select sounds like a good design, actually. It still has the downside of very different implementations, but at least the API is kept stable. Would this be the preferred approach?
I was thinking that something like master...bcardiff:crystal:select-optimizations would avoid the array heap allocation, and will cover single and dual action selects. WDYT? Essentially it applies the following logic to keep a tuple of ops for the locks when possible:

```crystal
def f(*ops)
  f_impl ops
end

def f(ops : Indexable)
  f_impl ops
end

def f_impl(ops : Indexable)
  if ops.is_a?(Tuple) && ops.size == 1
    f_impl_with_locks(ops, ops)
  elsif ops.is_a?(Tuple) && ops.size == 2
    ops0 = ops.fetch(0, nil).not_nil!
    ops1 = ops.fetch(1, nil).not_nil!
    case (ops0 <=> ops1)
    when 0
      f_impl_with_locks(ops, {ops0})
    when 1
      f_impl_with_locks(ops, {ops1, ops0})
    when -1
      f_impl_with_locks(ops, {ops0, ops1})
    else
      raise "unreachable"
    end
  else
    ops_locks = ops
      .to_a
      .uniq!
      .sort!
    f_impl_with_locks(ops, ops_locks)
  end
end

def f_impl_with_locks(ops : Indexable, ops_locks)
  puts({typeof(ops), typeof(ops_locks)})
end

f 1       # => {Tuple(Int32), Tuple(Int32)}
f 1, 2    # => {Tuple(Int32, Int32), Tuple(Int32, Int32)}
f 1, 2, 3 # => {Tuple(Int32, Int32, Int32), Array(Int32)}
f [1, 2]  # => {Array(Int32), Array(Int32)}
```

The runtime operations for the non-blocking send are basically the proposal of this PR, so I expect it to be as efficient.
I think it's easier than that. We can just turn the tuple into a … I have a patch for that ready that goes on top of #12814. It would work without it, but I'm batching it to avoid conflicts. On top of master the simplified diff would be this:

```diff
@@ -423,8 +423,12 @@ class Channel(T)
   private def self.select_impl(ops : Indexable(SelectAction), non_blocking)
     # Sort the operations by the channel they contain
     # This is to avoid deadlocks between concurrent `select` calls
-    ops_locks = ops
-      .to_a
+    if ops.is_a?(Tuple)
+      ops_locks = ops.to_static_array
+    else
+      ops_locks = ops.to_a
+    end
+    ops_locks
       .uniq!(&.lock_object_id)
       .sort_by!(&.lock_object_id)
```

The implementation of …
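For context on why that sort exists at all: acquiring the locks in a globally consistent order is the standard way to prevent deadlock between concurrent selects. A small Go illustration of the invariant — `idMutex`, `lockAll`, and `unlockAll` are hypothetical names, not Crystal's or Go's runtime API:

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// idMutex pairs a lock with a stable id, standing in for lock_object_id.
type idMutex struct {
	id int
	mu sync.Mutex
}

// lockAll sorts the locks by id and acquires them in that order.
// Two concurrent callers with overlapping lock sets always take the
// lower id first, so they can never deadlock on each other.
func lockAll(locks []*idMutex) {
	sort.Slice(locks, func(i, j int) bool { return locks[i].id < locks[j].id })
	for _, l := range locks {
		l.mu.Lock()
	}
}

func unlockAll(locks []*idMutex) {
	for _, l := range locks {
		l.mu.Unlock()
	}
}

func main() {
	a := &idMutex{id: 1}
	b := &idMutex{id: 2}
	var wg sync.WaitGroup
	for i := 0; i < 2; i++ {
		i := i
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Each goroutine passes the locks in a different order;
			// lockAll normalizes the order, so this cannot deadlock.
			set := []*idMutex{a, b}
			if i == 1 {
				set = []*idMutex{b, a}
			}
			lockAll(set)
			unlockAll(set)
		}()
	}
	wg.Wait()
	fmt.Println("no deadlock")
}
```

Without the normalization, one caller locking a-then-b while the other locks b-then-a is the textbook deadlock scenario.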
Something I wanted to avoid is having a union in `ops_locks`, so there is no multi-dispatch on each iteration.
I was wrong, we still have the allocation of the …
Okay, yeah, multi-dispatch is indeed an issue. But we can't really prevent that at scale, unless we multiply all possible combinations. For …
Actually, your implementation doesn't avoid multi-dispatch either, does it? The type of … Consequently:

```crystal
f 1          # => {Tuple(Int32), Tuple(Int32)}
f 1, 2u32    # => {Tuple(Int32, UInt32), Tuple(Int32 | UInt32, Int32 | UInt32)}
f 1, 2u32, 3 # => {Tuple(Int32, UInt32, Int32), Array(Int32 | UInt32)}
f [1, 2u32]  # => {Array(Int32 | UInt32), Array(Int32 | UInt32)}
```

To avoid those union types, we need to switch over the tuple size at compile time. This should work with macros:

```crystal
def f_impl(ops : Tuple(*T)) forall T
  {% if T.size == 1 %}
    f_impl_with_locks(ops, ops)
  {% elsif T.size == 2 %}
    case (ops[0] <=> ops[1])
    when 0
      f_impl_with_locks(ops, {ops[0]})
    when 1
      f_impl_with_locks(ops, {ops[1], ops[0]})
    when -1
      f_impl_with_locks(ops, ops)
    else
      raise "unreachable"
    end
  {% else %}
    f_impl(ops.to_a)
  {% end %}
end
```
I suppose we could also look into alternative solutions to maybe avoid sorting altogether, considering that the size of …
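One alternative in that spirit: for the tiny operation counts a select usually has, a fixed sorting network needs only a few compare-and-swaps and no allocation. A sketch in Go, assuming at most three elements (`sort3` is an illustrative helper, not anyone's actual implementation):

```go
package main

import "fmt"

// sort3 orders three values with three conditional swaps — a fixed
// sorting network: no loops, no allocation, branch-predictable.
func sort3(a, b, c int) (int, int, int) {
	if a > b {
		a, b = b, a
	}
	if b > c {
		b, c = c, b
	}
	if a > b {
		a, b = b, a
	}
	return a, b, c
}

func main() {
	fmt.Println(sort3(3, 1, 2)) // 1 2 3
}
```

The same pattern extends to four or five elements before the swap count makes a loop preferable.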
I was thinking of preventing the union in … by adding the following overloads and a …:

```crystal
def self.non_blocking_select(op1 : SelectAction)
  select_impl_with_locks({op1}, {op1}, true)
end

def self.non_blocking_select(op1 : SelectAction, op2 : SelectAction)
  case (op1 <=> op2)
  when 0
    select_impl_with_locks({op1, op2}, {op1}, true)
  when 1
    select_impl_with_locks({op1, op2}, {op2, op1}, true)
  when -1
    select_impl_with_locks({op1, op2}, {op1, op2}, true)
  else
    raise "unreachable"
  end
end
```

With that I think we prevent unions in many parts and cover single and dual select actions.
I don't follow that latest example. Where would those methods be called? Looks like they are overloads of the splat variant of …
I thought the compiler would generate those calls. Either way, doing a runtime dispatch depending on the indexable size, to a method that avoids unions in the `ops_locks`, can still be done. That's the core part of the idea.
Yeah, sure, that can be done. It's only feasible for very small sizes, though. Not even sure we would want to go higher than 2 or maybe 3.
I honestly wouldn't worry about multi-dispatch. We have LLVM here. Here's an example:

```crystal
struct Int32
  def foo
    self
  end
end

class String
  def foo
    bytesize
  end
end

def sum(values : Union(Int32, String)[2])
  values.sum(&.foo)
end

fun my_awesome_function : Int32
  values = uninitialized Union(Int32, String)[2]
  values[0] = 1
  values[1] = "hello"
  sum(values)
end

puts my_awesome_function
```

We are initializing a static array whose type is a union. We put an int in the first position and a string in the second one. Then we "sum" the values, which would in theory result in multi-dispatch. Compile the above program with …:

```llvm
define i32 @my_awesome_function() local_unnamed_addr #1 !dbg !25766 {
exit.1.i:
  ret i32 6, !dbg !25767
}
```

LLVM figured out what was in each position and completely avoided the multi-dispatch. Not only that, it computed everything at compile time.
@asterite I agree on not worrying too much about multi-dispatch, at least not at this stage. But this kind of optimization you're showing here won't be possible in …
Unrelated to the comment above (I'll reply to it soon), look at this: …

So Go decides to implement different … I think we could do the same. That is, instead of introducing … That way, at least for this issue, there's no need to optimize the general case.
This is a recap and summary of the current situation with this PR.

From the performance PoV, with #12814 we optimized part of it: the creation of an array from actions. What's missing is to avoid the creation of actions in the heap. The goal should be to have …

From a stdlib and language design PoV, this PR is missing what would be an action related to the method, or a better place for it so it's not being used as part of a …

Therefore, I'm closing it for now.
Instead of having to write … and paying the `to_a` price in `select_impl` when wanting to send to a `Channel` non-blocking. In one of our apps we use this pattern a lot: …

So the `flow_change` channel may or may not have receivers.
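For comparison, the "may or may not have receivers" pattern is commonly expressed in Go with a capacity-1 channel and a non-blocking send, which coalesces notifications when nobody is listening. A sketch, with `notify` as an illustrative helper and `flowChange` standing in for the `flow_change` channel:

```go
package main

import "fmt"

// notify performs a non-blocking send on a capacity-1 channel.
// If no receiver is ready and the buffer already holds a pending
// notification, the new one is dropped (coalesced into the old one).
func notify(ch chan struct{}) bool {
	select {
	case ch <- struct{}{}:
		return true
	default:
		return false
	}
}

func main() {
	flowChange := make(chan struct{}, 1)
	fmt.Println(notify(flowChange)) // true: buffered for a future receiver
	fmt.Println(notify(flowChange)) // false: coalesced, nobody listening
	<-flowChange                    // a receiver eventually drains it
	fmt.Println(notify(flowChange)) // true again after the drain
}
```

The sender never blocks, which is exactly the behavior the proposed `Channel#try_send` was meant to provide without a full `select` expansion.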