Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

target/riscv: OpenOCD fails with assert during running "reset" command #895

Merged
merged 1 commit into from
Aug 1, 2023

Conversation

kr-sc
Copy link
Contributor

@kr-sc kr-sc commented Jul 25, 2023

OpenOCD fails in the presence of inactive/unresponsive cores

I faced with case when inactive core returns 0 while reading dtmcontrol. This leads to failure on assert: "addrbits != 0" in "dbus_scan".

Also change "read_bits","poll_target" funcs to avoid a lot lines in logs

Change-Id: If852126755317789602b7372c5c5732183fff6c5

@kr-sc kr-sc changed the title target: OpenOCD fails with assert during running "reset" command target/riscv: OpenOCD fails with assert during running "reset" command Jul 25, 2023
Copy link
Collaborator

@timsifive timsifive left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At a glance this looks OK.

What hardware are you connecting to that implements 0.11? I didn't realize there were implementations beyond HiFive1.

What does "inactive" mean in this case? Is that core held in reset or something? And I assume that condition gets resolved after time, allowing examine() to succeed eventually.

@kr-sc
Copy link
Contributor Author

kr-sc commented Jul 26, 2023

What hardware are you connecting to that implements 0.11? I didn't realize there were implementations beyond HiFive1.

OpenOCD detect spec version using version field in dtmcs register, When it try to read this register, it get 0 value (because hart is inactive) and interpret it as version = 0 . According to spec, it means that hardware implement 0.11

What does "inactive" mean in this case? Is that core held in reset or something? And I assume that condition gets resolved after time, allowing examine() to succeed eventually.

Something like that, yes. Some harts in the system are turned off by default, unless a proper initialization procedure is executed. Once a hart is up (if user initiates the sequence), the examine will eventually succeeds

@timsifive
Copy link
Collaborator

Thanks for explaining. I think it would be simpler to check if dtmcs (erroneously called dtmcontrol) is 0 in riscv_examine(). That would cover both versions, and lets you share the code/check. Mostly I'm saying that because 0.11 is rarely used, we and we don't have as good test coverage for it. I'd rather not change it unless there's a compelling reason.

@JanMatCodasip
Copy link
Collaborator

I think it would be simpler to check if dtmcs (erroneously called dtmcontrol) is 0 in riscv_examine()

Actually, the checks for dtmcs (dtmcontrol) != 0 are already present on the top of examine() functions for both 0.11 and 0.13:

@kr-sc Are these existing checks insufficient to cover your case? Should they be extended or improved somehow?

@kr-sc
Copy link
Contributor Author

kr-sc commented Jul 27, 2023

Actually, the checks for dtmcs (dtmcontrol) != 0 are already present on the top of examine() functions for both 0.11 and 0.13:

Yes, but if I understand corretcly there are no any checks in assert_reset() function, for example. Even if target was not examined, OpenOCD try to reset and it leads to failure in assertion. So, as for me, it is better to replace this assertion with this check to avoid such situations for other functions where may not be such checks too

Copy link
Collaborator

@timsifive timsifive left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. These changes do make the code better, so let's go with them.

I've checked that everything still passes on HiFive1.

@@ -426,7 +426,10 @@ static dbus_status_t dbus_scan(struct target *target, uint16_t *address_in,
.in_value = in
};

assert(info->addrbits != 0);
if (info->addrbits == 0) {
LOG_ERROR("scan failed; addrbits=0");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
LOG_ERROR("scan failed; addrbits=0");
LOG_TARGET_ERROR(target, "Can't access DMI because addrbits=0.");

Copy link
Contributor Author

@kr-sc kr-sc Jul 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

.interrupt = 0
};
if (read_bits(target, &bits) != ERROR_OK) {
LOG_ERROR("read_bits fails");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need this generic error message, since read_bits() already provides a more specific error message.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've decided to add those for consistency. Meaning, if we decide to refactor/change this code in the future, we can have a higher chance that at least some error message is printed. Excessive printing should not be a problem - since these kinds of messages indicate that something is terribly wrong anyway. I don't insist however, so if you think that it's better to remove them, I'll do it. Nevertheless, could you revise your comments?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are you being consistent with? Generally the approach is to print one error as specific as possible, ideally in the place that the error is first generated. Then the other layers can simply propagate the error up. There are exceptions where the caller has more information to add, but in this case the caller error message doesn't add any extra information.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just want to add more information in log (for example, we can understand what read_bits was called and failed in this function because of this error message), but if it's not necessary, I'll remove it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

.interrupt = 0
};
if (read_bits(target, &bits) != ERROR_OK) {
LOG_ERROR("read_bits fails");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove also.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -497,7 +497,10 @@ static dmi_status_t dmi_scan(struct target *target, uint32_t *address_in,
memset(in, 0, num_bytes);
memset(out, 0, num_bytes);

assert(info->abits != 0);
if (info->abits == 0) {
LOG_ERROR("scan failed; addrbits=0");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
LOG_ERROR("scan failed; addrbits=0");
LOG_TARGET_ERROR(target, "Can't access DMI because addrbits=0.");

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@kr-sc kr-sc force-pushed the kr-sc/reset-fails-with-assert branch 3 times, most recently from 158d569 to 76f273a Compare July 31, 2023 09:16
@timsifive
Copy link
Collaborator

These changes are failing checkpatch because of a merge from upstream. I believe if you rebase the changes on top of riscv, then that problem will be resolved.

1 similar comment
@timsifive
Copy link
Collaborator

These changes are failing checkpatch because of a merge from upstream. I believe if you rebase the changes on top of riscv, then that problem will be resolved.

OpenOCD fails in the presence of inactive/unresponsive cores

I faced with case when inactive core returns 0 while reading dtmcontrol.
This leads to failure on assert: "addrbits != 0" in "dbus_scan".

Also change "read_bits","poll_target" funcs to avoid a lot lines in logs

Change-Id: If852126755317789602b7372c5c5732183fff6c5
Signed-off-by: Kirill Radkin <[email protected]>
@kr-sc kr-sc force-pushed the kr-sc/reset-fails-with-assert branch from 76f273a to 16e4096 Compare July 31, 2023 16:28
@kr-sc
Copy link
Contributor Author

kr-sc commented Jul 31, 2023

These changes are failing checkpatch because of a merge from upstream. I believe if you rebase the changes on top of riscv, then that problem will be resolved.

Rebase done

@MarekVCodasip
Copy link
Collaborator

I checked this visually. It looks alright.

@timsifive timsifive merged commit ee5c5c2 into riscv-collab:riscv Aug 1, 2023
en-sc pushed a commit to en-sc/riscv-openocd that referenced this pull request Oct 19, 2023
…ommand

OpenOCD fails in the presence of inactive/unresponsive cores

I faced with case when inactive core returns 0 while reading dtmcontrol.
This leads to failure on assert: "addrbits != 0" in "dbus_scan".

Also change "read_bits","poll_target" funcs to avoid a lot lines in logs

Original commit: riscv-collab#895
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants