Cache System hangs when configured with fewer ways than threads #165
The easiest fix is to make sure there aren't more threads than cache ways. :) But, yes, you are correct about the deadlock potential in this configuration. It's fairly easy to reproduce with the existing design by setting the number of cache ways to 1 (effectively making the cache direct mapped). The lockup is generally unlikely, but when it does happen, it's catastrophic, so the processor needs to handle this properly.

I talk about the problem in the section on 'livelock' here: https://jbush001.github.io/2014/07/04/messy-details.html and discuss a few possible solutions in that post.

Another, simpler (albeit hackier) fix would be to add logic that detects the lockup and resolves it. Since the lockup is infrequent, this should have negligible performance impact. For example, the writeback stage could keep a counter of rollbacks from the memory pipeline. When a thread successfully retires an instruction, the counter would be reset to zero. If the counter reaches some threshold, the processor would temporarily allow only one thread to issue instructions. I know some hardware designs have 'starvation counters' to handle degenerate cases with dynamic scheduling, which is not exactly the same thing, but similar.

An even simpler approach would be to make thread scheduling random instead of round-robin. Assuming a decent PRNG, there would be a chance each cycle that the next scheduled thread is the one whose line was most recently filled, which would break the livelock.
But the constraint that cache_ways >= threads doesn't necessarily seem unreasonable. The solutions above would only be necessary if that constraint were suboptimal for a valid configuration, and given that the thread count is generally limited, that doesn't currently seem to be the case.
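As an illustration only, here is a minimal Python sketch of the rollback-counter idea from the reply above. None of these names exist in the Nyuzi sources and the thresholds are arbitrary; this is a behavioral model of the mechanism, not RTL.

```python
# Hypothetical behavioral model of a lockup detector: count consecutive
# rollbacks coming out of the memory pipeline, reset on any successful
# retirement, and fall back to single-thread issue when a threshold is hit.
# All names and constants are invented for this sketch.

LOCKUP_THRESHOLD = 64        # consecutive memory rollbacks before intervening
SINGLE_THREAD_CYCLES = 32    # how long to restrict issue to one thread


class LivelockBreaker:
    def __init__(self):
        self.rollback_count = 0
        self.single_thread_cycles_left = 0

    def writeback(self, retired_instruction: bool, memory_rollback: bool):
        """Update the counter once per cycle from the writeback stage."""
        if retired_instruction:
            # Any retirement proves forward progress, so start over.
            self.rollback_count = 0
        elif memory_rollback:
            self.rollback_count += 1
            if self.rollback_count >= LOCKUP_THRESHOLD:
                # Suspected livelock: let one thread run alone for a while so
                # its fill cannot be evicted by the other threads' misses.
                self.single_thread_cycles_left = SINGLE_THREAD_CYCLES
                self.rollback_count = 0

    def issue_mask(self, ready_threads):
        """Mask the thread-select stage down to one thread while recovering."""
        if self.single_thread_cycles_left == 0:
            return ready_threads
        self.single_thread_cycles_left -= 1
        mask = [False] * len(ready_threads)
        for t, ready in enumerate(ready_threads):
            if ready:
                mask[t] = True   # keep only the lowest-numbered ready thread
                break
        return mask
```

The random-scheduling alternative from the same reply would instead replace the round-robin pick in the thread-select logic with a PRNG- or LFSR-driven choice among the ready threads, so that each cycle there is some probability the thread whose line was just filled issues before that line is evicted.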
Imagine the following scenario: the L1 data cache is configured with fewer ways than threads_per_core. All threads execute a data load whose address maps to the same cache set, but with different tags. The requested lines are all present in the L2 cache.
The L1 cache system will get into a deadlock. The first thread generates a miss, gets rolled back, and requests a fill from the L2 cache. The corresponding response is written into the L1 cache a few cycles later. The same happens for the remaining threads. So when the first thread gets scheduled again, the load produces another miss, because the correct cache line has already been replaced before it could be read even once.
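A toy Python model of this timeline (not the Nyuzi RTL; the latencies and the convention that thread t's data carries tag t are invented for illustration) shows the fills evicting one another before they can be used:

```python
# Four threads, a single-way (direct-mapped) set, round-robin issue, a fixed
# L2 fill latency and a fixed pipeline-restart latency. A rolled-back thread
# reaches the cache again RESTART_LATENCY cycles after its fill arrives.

NUM_THREADS = 4
FILL_LATENCY = 6       # cycles from a miss until the line arrives from the L2
RESTART_LATENCY = 3    # cycles from wake-up until the reissued load reaches the cache

resident = None                                # the single way holds one tag at a time
fills = []                                     # (completion cycle, tag) requests in flight
access = {t: t for t in range(NUM_THREADS)}    # thread t first reaches the cache at cycle t

for cycle in range(40):
    # A completing fill overwrites the only way, evicting whatever was resident.
    for entry in [f for f in fills if f[0] == cycle]:
        resident = entry[1]
        fills.remove(entry)

    for t, when in list(access.items()):
        if when != cycle:
            continue
        if resident == t:                      # hit: the load finally retires
            print(f"cycle {cycle:2}: thread {t} hits and retires its load")
            del access[t]
        else:                                  # miss: roll back, request a fill, retry later
            print(f"cycle {cycle:2}: thread {t} misses (resident tag: {resident})")
            fills.append((cycle + FILL_LATENCY, t))
            access[t] = cycle + FILL_LATENCY + RESTART_LATENCY
```

With these particular numbers the earlier threads keep finding their line already replaced when they retry, exactly as described above, although in each round the thread whose fill landed last does get to use it, so every load eventually retires. Whether the thrashing resolves or repeats forever depends on the exact latencies and on whether the threads keep issuing further loads to the same set, which is why the lockup is characterized in the reply as unlikely but catastrophic when it does occur.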
Any idea how to fix this problem? Sure, another replacement strategy would fix this specific case, but I think it is a more general problem.