-
Notifications
You must be signed in to change notification settings - Fork 778
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[i2c,rtl] Predict target clock stretching in HOST mode #21813
[i2c,rtl] Predict target clock stretching in HOST mode #21813
Conversation
694967d
to
49e62b9
Compare
hw/ip/i2c/rtl/i2c_fsm.sv
Outdated
@@ -85,6 +85,8 @@ module i2c_fsm import i2c_pkg::*; | |||
logic load_tcount; // indicates counter must be loaded | |||
logic [31:0] stretch_idle_cnt; // counter for clock being stretched by target | |||
// or clock idle by host. | |||
|
|||
// This bit is active when the FSM is in a state where a TARGET might be trying to stretch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: We might want to add a little more so there is no confusion about the role this IP serves. :) For example...
// This bit is active when the FSM is in a state where a TARGET might be trying to stretch | |
// This bit is active when the FSM is in a state where a TARGET might be trying to stretch the | |
// clock, preventing the CONTROLLER FSM from continuing. |
...or "host FSM," if you prefer. The i2c and SMBus specs actually picked "controller" for this role, though. In SMBus, a "host" is a special kind of controller.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SGTM, I'll do a quick review of the spec's for terminology and try to align with them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with keeping terminology aligned with the spec. Please don't extend your terminology alignment efforts to beyond the things you're changing, though, because we may have to go through the entire spec and change terminology soon anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just did a quick flick through the specs, I couldn't see where the I2C spec uses the 'controller' term? It seems to just be Master/Slave. Unless I'm missing something.
Definitely in the medium term we need to update the docs to be unambiguous for both applications.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't have the latest with the cleaning of sensitive terms, hehe. Make sure you're looking at 7.0 (or v.7, depending on which moniker you are looking at).
They did a sweep to align with I3C.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes of course. We still link to the Rev 6. version in the docs. Yet another thing to update!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for making these changes. I agree with Alex that it would be nice to remove the second state variable if possible. Other than that just a few minor comments.
hw/ip/i2c/rtl/i2c_fsm.sv
Outdated
tcount_d = tcount_q - 1'b1; | ||
end else begin | ||
tcount_d = tcount_q; // pause timer if clock is stretched | ||
tcount_d = tcount_q; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we still need this else statement? It only covers the case when host and target mode are both disabled in which case we are in the Idle state anyways.
Re DV failures: We should keep the smoke test+seed run by CI green. Currently |
hw/ip/i2c/rtl/i2c_fsm.sv
Outdated
if (!rst_ni) begin | ||
stretch_predict_cnt_expired <= 1'b0; | ||
end else begin | ||
if (stretch_idle_cnt == 32'd3) begin |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, tClockPulse includes the rise time, so we need to allow for that as well, right?
if (stretch_idle_cnt == 32'd3) begin | |
if (stretch_idle_cnt == (32'd3 + t_r_i)) begin |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I agree with you, but I need to think about it a bit more to be sure. That would imply that we document a minimum
value thigh >= t_r + 4
to be safe I think. As before, probably only a downside for very low clk_i
input frequencies.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, no, it's still thigh >= 4
. Remember, tClockPulse
already includes t_r
as well. The condition you were satisfying was that the timeout happens before we could possibly change out of the state. In other words...
tStretchCheck < tClockPulse
-------------------------------------
t_r + 3 < t_r + t_high
-------------------------------------
3 < t_high
The value 3 comes from the round-trip delay through flops. So, in prose, we wait until the time when we must have seen a 1, and if it happens then, we continue to count out the normal t_high, keeping in mind that we already counted the cycles of delay.
If we don't see a 1, then we wait until we actually see it, then we count out the full t_high to be safe. Technically, this could probably be reduced by the 2 cycles of input delay.
So, t_high must be long enough to accommodate only that round-trip delay. Otherwise, we risk jumping to the next state without ever actually seeing the clock go high.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The value 3 comes from the round-trip delay through flops.
Ah, but I forgot about the glitch filter, which would add yet more delay to the input path. The predictor might expire early, then. If we don't handle it here, the core will still work, though--It just means this fast path might end up rejecting its speculation every time.
da9327e
to
223e6a4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks!
(modulo a lint error, hehe)
41b117b
to
3e171f0
Compare
Kokoro seems unhappy about: I guess it means that the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These changes make sense to me. Bar the Kokoro Lint error.
3e171f0
to
fad9c33
Compare
fad9c33
to
fe427b2
Compare
Rebased on |
The failing CW310 tests are phony (sourceforge.net problem). The failing |
fe427b2
to
0c8672b
Compare
The failure of the |
I'm OK with that; however now the |
0c8672b
to
1deaee5
Compare
Previously, the cycle counter for any 'thigh' state would only decrement once the FSM had observed scl_i = 1'b1 on it's input. When the FSM releases SCL to create the clock pulse, a minimum of 3 cycles is required to observe this effect on it's input. (1-cycle output flop, 2-cycle input synchronizer) This change allows the HOST-mode state machine to proceed if it does not observe SCL being stretched (4 + t_r) cycles after releasing SCL to try and create the next clock pulse. This removes the 3-cycle delay from every 'thigh' state, and in the absence of clock-stretching, brings the performance of the block in-line with a user's calculations based on the given timing parameters. This imposes a minimum 'thigh' of 4 cycles, which may limit the possible performance of the block when the frequency of clk_i is not significantly greater than scl. Signed-off-by: Harry Callahan <[email protected]>
1deaee5
to
12ddea4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for this work, @hcallahan-lowrisc. If CI passes (modulo the currently known unrelated problems), please go ahead with merging this PR.
CI failures are known-issues, and are unrelated to this PR. Hence merging. |
This is no longer necessary now that lowRISC#21813 has been merged. However, due to the 'PERFTHRESHOLD = 0.80' fudge-factor in comparing the exp/obs SCL period in this vseq, the check still passed. This perf check should be made more stringent in the future.
This is no longer necessary now that lowRISC#21813 has been merged. However, due to the 'PERFTHRESHOLD = 0.80' fudge-factor in comparing the exp/obs SCL period in this vseq, the check still passed. This perf check should be made more stringent in the future. Signed-off-by: Harry Callahan <[email protected]>
This is no longer necessary now that #21813 has been merged. However, due to the 'PERFTHRESHOLD = 0.80' fudge-factor in comparing the exp/obs SCL period in this vseq, the check still passed. This perf check should be made more stringent in the future. Signed-off-by: Harry Callahan <[email protected]>
From the commit message :
This approach seemed the simplest way to achieve the intended result, but one cost is the minimum 'thigh' of 4 as described above. I could not come up with a solution that avoided this without greatly increasing complexity, and I'm not even sure one is possible. If we advance on too quickly and reach the "next" tlow, the FSM drives SCL low again, and at this point if the target is stretching we cannot distinguish it from our own driver.
There are some DV gremlins from this change that I have still to resolve (similar to #21765). I am again proposing to fix these up as a follow-up item.
Testing the performance with this change + that from #21765, I see clock frequencies within the 4-cycle possible float as a result of the calculation
tlow + t_r + thigh + t_f
. This is as good as we can do without stacking the remainders of the quantization calcs, which may be possible in some cases to squeeze out an extra cycle or two.Closes #18962
Goes toward #18492