JDC no longer receives shares from Translator if two new blocks are found within a few seconds #920
Comments
I pushed a timeout as a temporary fix for this, but a more robust solution still has to be found.
Hi @GitGab19, how would I reproduce the issue?
We are working on adding an MG test that will be able to reproduce this case.
Hey @xyephy, what's your status on this? We have this marked as high priority on our Project Board (milestone 1.0.2), so in case you're not able to keep going, I'll take this over. Please let me know if you have already made some progress that could accelerate my efforts here.
@GitGab19, how is it that a share from a previous job (e.g.: …)? I have the impression that you just wanted to highlight that everything is working as expected up to this point in the message flow, but I want to be sure.
If we realize this will take too much time/effort, we should consider moving it to milestone 1.0.3.
Today I had a call with @rrybarczyk + @Shourya742 where we deep-dived into this issue and started drafting a visual diagram of its message flow (this diagram should be seen as a WIP and might contain inaccuracies). We left the call feeling quite confident about our progress, but revisiting the diagram a few hours later with fresh eyes, I now realize we need to sanity-check some of the assumptions we made during the call. What really stands out to me right now is that we assumed that a … However, if we carefully read the comments in JDC's stratum/roles/jd-client/src/main.rs (lines 69 to 94 at commit 7841021) …
This is also connected to the original issue description by @GitGab19 when he said:
This suggests that the root cause is related to … So our assumption that a … And I don't know whether this is a feature or a bug. We will definitely need more calls to deep-dive into this issue and refine the visual diagram before we even start writing MG tests or thinking about a long-term bugfix. And we should definitely move this issue away from …
Note: SV2 message ordering is a very dense topic. We are dealing with distributed-computing scenarios spread across 5 different entities, each doing fundamentally different things. It is strategically important to have frequent collective exercises where we go through the message flow while regularly sanity-checking each other. These exercises should give SRI contributors a solid foundation in SV2 message flows and also spread/decentralize this knowledge across the community.
This is definitely a feature: you want to start mining on a job as soon as you know it. You assume that the pool is going to accept the job (which is what should always happen if everything works). See also my comment here: stratum-mining/sv2-spec#80 (reply in thread)
We already discussed it during the last "diagramming" call, but I'm writing it here just for the record. I wanted to highlight that because it seems that, since the job to which the …
This PR addresses issue stratum-mining#920, where JDC stops receiving valid shares from the Translator when two new blocks are found within a few seconds. The problem occurs due to a race condition in the handling of SetCustomMiningJob messages, leading to the JDC not registering job IDs properly.
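The race can be reduced to a toy model. The sketch below is a minimal, single-threaded simulation under stated assumptions: `JdcState`, `on_new_template`, `on_custom_job_success` and `job_id_for_share` are hypothetical names, not SRI APIs, and the real JDC tracks far more state. It only shows the shape of the problem: a template-to-job-id entry exists only once a SetCustomMiningJob for that template has been sent and answered, so if a second prev_hash/template arrives before that message goes out for the first one, a late share for the superseded template has nothing to map to.

```rust
use std::collections::HashMap;

/// Hypothetical, heavily simplified model of the JDC state relevant to #920.
/// None of these names come from the SRI codebase.
#[derive(Default)]
struct JdcState {
    /// Template currently being mined downstream (already forwarded to the Translator).
    current_template: Option<u64>,
    /// template_id -> upstream job_id; filled only after SetCustomMiningJob succeeds.
    template_to_job_id: HashMap<u64, u32>,
    next_job_id: u32,
}

impl JdcState {
    /// A new template (with its prev_hash) arrives: downstream starts mining on it
    /// right away, before the pool has seen any SetCustomMiningJob for it.
    fn on_new_template(&mut self, template_id: u64) {
        self.current_template = Some(template_id);
    }

    /// Models "SetCustomMiningJob sent for the *current* template and accepted by the pool".
    fn on_custom_job_success(&mut self) {
        if let Some(template_id) = self.current_template {
            self.next_job_id += 1;
            self.template_to_job_id.insert(template_id, self.next_job_id);
        }
    }

    /// A share arrives from the Translator for `template_id`.
    fn job_id_for_share(&self, template_id: u64) -> Option<u32> {
        self.template_to_job_id.get(&template_id).copied()
    }
}

fn main() {
    let mut jdc = JdcState::default();

    // Normal case: template 1 is declared and accepted, so its shares map to a job id.
    jdc.on_new_template(1);
    jdc.on_custom_job_success();
    assert_eq!(jdc.job_id_for_share(1), Some(1));

    // Two blocks in quick succession: template 2 arrives, but template 3 (new prev_hash)
    // replaces it before a SetCustomMiningJob for template 2 is ever sent.
    jdc.on_new_template(2);
    jdc.on_new_template(3);
    jdc.on_custom_job_success();

    // A late share for template 2 has no job id; template 3 is fine.
    assert_eq!(jdc.job_id_for_share(2), None);
    assert_eq!(jdc.job_id_for_share(3), Some(2));
    println!("share for template 2 has no upstream job id");
}
```

In this toy model the lookup simply returns `None`. According to the original issue description, the real `UpstreamMiningNode::get_job_id` instead keeps looping while waiting for the entry, which is what makes JDC stop forwarding shares.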
While working on #901, I noticed that there is a particular case in which JDC stops receiving valid shares from Translator, so it no longer sends them to the pool.
This is the flow which causes this issue:
After this, no more valid shares are received by JDC, even if they are found by Translator.
I started to dig into it, and I'm pretty sure the point where JDC gets stuck is in downstream.rs.
The reason is that UpstreamMiningNode::get_job_id never returns in this case (because there's a loop in there), since the job_id is not inserted through this line. That, in turn, is because for that specific job a SetCustomMiningJob was not sent (in the meantime a new prev_hash and a new template had arrived; see the message flow).
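The first comment in this thread mentions that a timeout was pushed as a temporary fix; that change is not reproduced here. As a hedged illustration of the idea only, the sketch below turns an unbounded wait for a job id into a bounded one. It assumes a hypothetical `JobIdRegistry` (a `Mutex<HashMap>` plus a `Condvar`) shared between the code that records job ids and the code that forwards shares. Its `get_job_id` is not the SRI method of the same name, and std threads are used only to keep the example self-contained; the real JDC code may be async and structured differently.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::Duration;

/// Hypothetical registry mapping template_id -> upstream job_id.
#[derive(Default)]
struct JobIdRegistry {
    ids: Mutex<HashMap<u64, u32>>,
    cond: Condvar,
}

impl JobIdRegistry {
    /// Called when the upstream job id for a template becomes known.
    fn insert(&self, template_id: u64, job_id: u32) {
        self.ids.lock().unwrap().insert(template_id, job_id);
        // Wake up any share handler waiting for this (or another) job id.
        self.cond.notify_all();
    }

    /// Waits at most `timeout` for the job id instead of looping forever.
    /// `wait_timeout_while` keeps waiting while the closure returns true and
    /// handles spurious wakeups for us.
    fn get_job_id(&self, template_id: u64, timeout: Duration) -> Option<u32> {
        let ids = self.ids.lock().unwrap();
        let (ids, _timeout_result) = self
            .cond
            .wait_timeout_while(ids, timeout, |ids| !ids.contains_key(&template_id))
            .unwrap();
        ids.get(&template_id).copied()
    }
}

fn main() {
    let registry = Arc::new(JobIdRegistry::default());

    // Job id for template 1 shows up shortly after the share: the lookup succeeds.
    let writer = Arc::clone(&registry);
    let handle = thread::spawn(move || {
        thread::sleep(Duration::from_millis(20));
        writer.insert(1, 42);
    });
    assert_eq!(registry.get_job_id(1, Duration::from_secs(2)), Some(42));
    handle.join().unwrap();

    // No SetCustomMiningJob was ever sent for template 2: the lookup gives up.
    assert_eq!(registry.get_job_id(2, Duration::from_millis(50)), None);
    println!("late share for template 2 dropped after timeout");
}
```

Returning `None` lets the caller drop (or log) a share whose job was never declared upstream instead of blocking every later share. It does not address the root cause, which is why the thread still calls for a more robust solution.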