Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[i2c,rtl] Predict target clock stretching in HOST mode #21813

Conversation

hcallahan-lowrisc
Copy link
Contributor

@hcallahan-lowrisc hcallahan-lowrisc commented Mar 4, 2024

From the commit message :

Previously, the cycle counter for any 'thigh' state would only decrement once
the FSM had observed scl_i = 1'b1 on it's input. When the FSM releases SCL to
create the clock pulse, a minimum of 3 cycles is required to observe this effect
on it's input. (1-cycle output flop, 2-cycle input synchronizer)
This change allows the HOST-mode state machine to proceed if it does not
observe SCL being stretched 4 cycles after releasing SCL to try and create the
next clock pulse. This removes the 3-cycle delay from every 'thigh' state, and
in the absence of clock-stretching, brings the performance of the block in-line
with a user's calculations based on the given timing parameters.
This imposes a minimum 'thigh' of 4 cycles, which may limit the possible
performance of the block when the frequency of clk_i is not significantly
greater than scl.

This approach seemed the simplest way to achieve the intended result, but one cost is the minimum 'thigh' of 4 as described above. I could not come up with a solution that avoided this without greatly increasing complexity, and I'm not even sure one is possible. If we advance on too quickly and reach the "next" tlow, the FSM drives SCL low again, and at this point if the target is stretching we cannot distinguish it from our own driver.

There are some DV gremlins from this change that I have still to resolve (similar to #21765). I am again proposing to fix these up as a follow-up item.

Testing the performance with this change + that from #21765, I see clock frequencies within the 4-cycle possible float as a result of the calculation tlow + t_r + thigh + t_f. This is as good as we can do without stacking the remainders of the quantization calcs, which may be possible in some cases to squeeze out an extra cycle or two.

Closes #18962
Goes toward #18492

hw/ip/i2c/rtl/i2c_pkg.sv Outdated Show resolved Hide resolved
@@ -85,6 +85,8 @@ module i2c_fsm import i2c_pkg::*;
logic load_tcount; // indicates counter must be loaded
logic [31:0] stretch_idle_cnt; // counter for clock being stretched by target
// or clock idle by host.

// This bit is active when the FSM is in a state where a TARGET might be trying to stretch
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: We might want to add a little more so there is no confusion about the role this IP serves. :) For example...

Suggested change
// This bit is active when the FSM is in a state where a TARGET might be trying to stretch
// This bit is active when the FSM is in a state where a TARGET might be trying to stretch the
// clock, preventing the CONTROLLER FSM from continuing.

...or "host FSM," if you prefer. The i2c and SMBus specs actually picked "controller" for this role, though. In SMBus, a "host" is a special kind of controller.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM, I'll do a quick review of the spec's for terminology and try to align with them.

Copy link
Contributor

@andreaskurth andreaskurth Mar 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with keeping terminology aligned with the spec. Please don't extend your terminology alignment efforts to beyond the things you're changing, though, because we may have to go through the entire spec and change terminology soon anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just did a quick flick through the specs, I couldn't see where the I2C spec uses the 'controller' term? It seems to just be Master/Slave. Unless I'm missing something.
Definitely in the medium term we need to update the docs to be unambiguous for both applications.

Copy link
Contributor

@a-will a-will Mar 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't have the latest with the cleaning of sensitive terms, hehe. Make sure you're looking at 7.0 (or v.7, depending on which moniker you are looking at).

They did a sweep to align with I3C.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes of course. We still link to the Rev 6. version in the docs. Yet another thing to update!

hw/ip/i2c/rtl/i2c_fsm.sv Outdated Show resolved Hide resolved
Copy link
Contributor

@marnovandermaas marnovandermaas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for making these changes. I agree with Alex that it would be nice to remove the second state variable if possible. Other than that just a few minor comments.

tcount_d = tcount_q - 1'b1;
end else begin
tcount_d = tcount_q; // pause timer if clock is stretched
tcount_d = tcount_q;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still need this else statement? It only covers the case when host and target mode are both disabled in which case we are in the Idle state anyways.

hw/ip/i2c/rtl/i2c_fsm.sv Outdated Show resolved Hide resolved
@andreaskurth
Copy link
Contributor

Re DV failures: We should keep the smoke test+seed run by CI green. Currently lc_ctrl_jtag_smoke and chip_sw_i2c_host_tx_rx fail. Did you have a chance to debug these, @hcallahan-lowrisc?

if (!rst_ni) begin
stretch_predict_cnt_expired <= 1'b0;
end else begin
if (stretch_idle_cnt == 32'd3) begin
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, tClockPulse includes the rise time, so we need to allow for that as well, right?

Suggested change
if (stretch_idle_cnt == 32'd3) begin
if (stretch_idle_cnt == (32'd3 + t_r_i)) begin

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I agree with you, but I need to think about it a bit more to be sure. That would imply that we document a minimum
value thigh >= t_r + 4 to be safe I think. As before, probably only a downside for very low clk_i input frequencies.

Copy link
Contributor

@a-will a-will Mar 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, no, it's still thigh >= 4. Remember, tClockPulse already includes t_r as well. The condition you were satisfying was that the timeout happens before we could possibly change out of the state. In other words...

tStretchCheck < tClockPulse
-------------------------------------
      t_r + 3 < t_r + t_high
-------------------------------------
            3 < t_high

The value 3 comes from the round-trip delay through flops. So, in prose, we wait until the time when we must have seen a 1, and if it happens then, we continue to count out the normal t_high, keeping in mind that we already counted the cycles of delay.

If we don't see a 1, then we wait until we actually see it, then we count out the full t_high to be safe. Technically, this could probably be reduced by the 2 cycles of input delay.

So, t_high must be long enough to accommodate only that round-trip delay. Otherwise, we risk jumping to the next state without ever actually seeing the clock go high.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The value 3 comes from the round-trip delay through flops.

Ah, but I forgot about the glitch filter, which would add yet more delay to the input path. The predictor might expire early, then. If we don't handle it here, the core will still work, though--It just means this fast path might end up rejecting its speculation every time.

@hcallahan-lowrisc hcallahan-lowrisc force-pushed the i2c_predict_clock_stretching branch 4 times, most recently from da9327e to 223e6a4 Compare March 7, 2024 18:13
Copy link
Contributor

@a-will a-will left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks!

(modulo a lint error, hehe)

@hcallahan-lowrisc hcallahan-lowrisc force-pushed the i2c_predict_clock_stretching branch 2 times, most recently from 41b117b to 3e171f0 Compare March 7, 2024 19:27
@marnovandermaas
Copy link
Contributor

Kokoro seems unhappy about:
E ARITH_CONTEXT: i2c_fsm.sv:227 Bitlength of arithmetic operation '16'd3 + t_r_i' is self-determined in this context

I guess it means that the 16' is unnecessary because t_r_i is already 16 bits?

Copy link
Contributor

@marnovandermaas marnovandermaas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes make sense to me. Bar the Kokoro Lint error.

@andreaskurth
Copy link
Contributor

Rebased on master to use the first commit from the now merged PR #21765 and clean failing CI checks (which should have been fixed on master in the meantime).

@andreaskurth
Copy link
Contributor

The failing CW310 tests are phony (sourceforge.net problem). The failing chip_sw_i2c_host_tx_rx test, which is part of smoke tests, is real and happens before and after the rebase.

@hcallahan-lowrisc
Copy link
Contributor Author

The failure of the chip_sw_i2c_host_tx_rx test was due to the monitor losing it's lock on the bus again, similar to failures initially encountered in #21765 and patched to be rethought later as part of closing #21887. This time however we were losing the lock around the ACK bit.
I've weakened the checks to remove the setup time here, which fixes that particular test. Assuming the remaining test suite also passes, I think it may be best to expand #21887 to cover these changes too.

@andreaskurth
Copy link
Contributor

I've weakened the checks to remove the setup time here, which fixes that particular test. Assuming the remaining test suite also passes, I think it may be best to expand #21887 to cover these changes too.

I'm OK with that; however now the i2c_host_smoke test fails, apparently due to an RTL assertion that fails: (i2c_fsm.sv:1423) [ASSERT FAILED] SclInputGlitch_A.

Previously, the cycle counter for any 'thigh' state would only decrement once
the FSM had observed scl_i = 1'b1 on it's input. When the FSM releases SCL to
create the clock pulse, a minimum of 3 cycles is required to observe this effect
on it's input. (1-cycle output flop, 2-cycle input synchronizer)

This change allows the HOST-mode state machine to proceed if it does not
observe SCL being stretched (4 + t_r) cycles after releasing SCL to try and
create the next clock pulse. This removes the 3-cycle delay from every 'thigh'
state, and in the absence of clock-stretching, brings the performance of the
block in-line with a user's calculations based on the given timing parameters.

This imposes a minimum 'thigh' of 4 cycles, which may limit the possible
performance of the block when the frequency of clk_i is not significantly
greater than scl.

Signed-off-by: Harry Callahan <[email protected]>
Copy link
Contributor

@andreaskurth andreaskurth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for this work, @hcallahan-lowrisc. If CI passes (modulo the currently known unrelated problems), please go ahead with merging this PR.

@hcallahan-lowrisc
Copy link
Contributor Author

CI failures are known-issues, and are unrelated to this PR. Hence merging.

@hcallahan-lowrisc hcallahan-lowrisc merged commit 689a163 into lowRISC:master Mar 21, 2024
27 of 32 checks passed
@hcallahan-lowrisc hcallahan-lowrisc deleted the i2c_predict_clock_stretching branch March 22, 2024 13:54
hcallahan-lowrisc added a commit to hcallahan-lowrisc/opentitan that referenced this pull request Mar 27, 2024
This is no longer necessary now that lowRISC#21813 has been merged. However, due
to the 'PERFTHRESHOLD = 0.80' fudge-factor in comparing the exp/obs SCL
period in this vseq, the check still passed.
This perf check should be made more stringent in the future.
hcallahan-lowrisc added a commit to hcallahan-lowrisc/opentitan that referenced this pull request Mar 27, 2024
This is no longer necessary now that lowRISC#21813 has been merged. However, due
to the 'PERFTHRESHOLD = 0.80' fudge-factor in comparing the exp/obs SCL
period in this vseq, the check still passed.
This perf check should be made more stringent in the future.

Signed-off-by: Harry Callahan <[email protected]>
hcallahan-lowrisc added a commit that referenced this pull request Mar 28, 2024
This is no longer necessary now that #21813 has been merged. However, due
to the 'PERFTHRESHOLD = 0.80' fudge-factor in comparing the exp/obs SCL
period in this vseq, the check still passed.
This perf check should be made more stringent in the future.

Signed-off-by: Harry Callahan <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[i2c,rtl] Clock stretch causes timing mismatch
5 participants