Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HW Loop Rd after Wr error #189

Closed
yaronbe1 opened this issue Nov 3, 2019 · 24 comments
Closed

HW Loop Rd after Wr error #189

yaronbe1 opened this issue Nov 3, 2019 · 24 comments
Assignees
Labels
Component:RTL For issues in the RTL (e.g. for files in the rtl directory) PARAM:PULP_XPULP Issue depends on the PULP_XPULP parameter Status:Resolved Issue has been resolved, but closure is pending on git merge and/or issuer confirmation Type:Bug For bugs in the RTL, Documentation, Verification environment or Tool and Build system

Comments

@yaronbe1
Copy link

yaronbe1 commented Nov 3, 2019

running the following code we encounter the following issue:

              csrrwi  x1, 0x7c0, 22
              lp.endi  x0, __end_
              csrr  x1, 0x7c0

In the simulation results I see next error message that is worrying me:
UVM_ERROR /proj/dbm10l/denisg/ws_dbm10l_test/blk/ri5cy/verif/classes/pulpino_scoreboard_c.sv(467) @ 2813000: uvm_test_top.pp_env.pp_scoreboard [SPIKE_DBG] EX stage. Register #1 aka ra update mismatch. Expected value is 0x16, actual value is 0x0

Using csrrwi instruction test wanted to update HW loop #0 start address. then we follow with lp.end instruction to write to the end register. the error is that the first (start) write is not executed as seen in the waves:

a

It is a mix of Direct CSR writes to the start register and Loop Instruction writes to the loop end register.

We know from the lp.setup issues that these two instructions update registers in different instruction pipe stages. From the waves we see that the lp.end write of the end register is in the same cycle as the previous direct write of the start register, the end write wins (start write not executed).

Is it a reasonable use case that such a mix of loop setup instructions will be used, or is there a restriction not to use hybrid types (direct csr wr + loop inst wr) of writes when updating the HW loops registers in order to avoid this conflict?

@davideschiavone
Copy link
Contributor

I would say this is a reasonable way to do it so if that is the behavior of the HW, this is an HW bug.

However, the compiler should never use this scheme, thus only explicit SW assembly routines may generate such conditions.

I would say it is not an urgent HW bug but to be fixed.

Before confirming it, I would ask @eflamand and or @haugoug a confirmation about my compiler statements.

Thanks
Davide

@haugoug
Copy link
Contributor

haugoug commented Nov 5, 2019

That's not something the compiler will generate, he's never directly accessing HW loop CSRs. These CSRs are only accessed when saving/restoring interrupted context. So to me it would be reasonable to forbid such code if it simplifies the HW.

@Silabs-ArjanB Silabs-ArjanB added Type:Bug For bugs in the RTL, Documentation, Verification environment or Tool and Build system Component:RTL For issues in the RTL (e.g. for files in the rtl directory) PARAM:PULP_XPULP Issue depends on the PULP_XPULP parameter and removed possible_bug labels Jul 13, 2020
@davideschiavone
Copy link
Contributor

Dear @jm4rtin and @haugoug ,

shall we make the CSR of the HWLoop READ ONLY?

There is no meaning to me that SW writes them explicitly. It should be allowed only via hwloop instructions in my opinion.

If you guys agree I make the RTL change.

@eflamand
Copy link

eflamand commented Nov 17, 2020 via email

@eflamand
Copy link

eflamand commented Nov 17, 2020 via email

@davideschiavone
Copy link
Contributor

Agree with you @eflamand !

@gautschimi
Copy link
Contributor

I don't agree.
if an interrupt arrives during a hardware loop, you need to store those registers. and when you return to the code you need to restore them. you could argue that it is not necessary to safe and restore the registers because hardware loops are not used in interrupt routines. (which I would also not agree on)
but if you want to reschedule an interrupted thread on a different core, you need the possibility to restore the hardware loop setup as it was before taking an interrupt.

@davideschiavone
Copy link
Contributor

davideschiavone commented Nov 17, 2020

Very good point @gautschimi - here the discussion can extend further as if we have an operating system that schedule 2 threads, both using HW loops, then the HWLoop should be part of the contex switch as well.
@eflamand was not saying you cannot write to them, but that you can write to them with lp instructions. Writing explicitly to those CSRs is not equivalent from an RTL point of view

so your ISR would look like this:

//read hwloop
csrr x10, 0x800

...
sw x10, somewhere in the stack

your ISR

...

lw x10, somwhere in the stack
lp.start x0, x10

@davideschiavone
Copy link
Contributor

Actually I was too quick too, there is no lp.start and lp.end instruction with the register as source.
Shall we define them?
I see funct3 has 110 and 111 free

https://core-v-docs-verif-strat.readthedocs.io/projects/cv32e40p_um/en/latest/instruction_set_extensions.html#id3

@eflamand

@gautschimi
Copy link
Contributor

ok if you can restore the hwloop context with lp.start/end/count instructions it is ok.
I don't remember what the arguments of those instructions are. if it is a register good, if it's an immediate, its not sufficient to restore the values.

@davideschiavone
Copy link
Contributor

Pinging to the discussion @jeremybennett as well as he is working on the compiler side

@eflamand
Copy link

eflamand commented Nov 17, 2020 via email

@davideschiavone
Copy link
Contributor

Hi @eflamand , yes that's why I proposed to you, @jeremybennett , and @jm4rtin to add the register version of lp.start and lp.end. funct3 has 2 places free :)

@jeremybennett
Copy link
Contributor

The compiler would only ever generate loops using CSRs directly if that led to better code (size of speed). Is there any circumstance where this is the case.

The CORE-V GCC is a work in progress (this is the rewrite for modern GCC). At present hardware loop is only used to optimize block moves. In principle you can generate hardware loop for other loops, but that is not there yet (known hard part of GCC - ask the ARC developers!)

@davideschiavone
Copy link
Contributor

Hi @jeremybennett - I personally prefer only lp.start or lp.end instructions . But this does not matter too much. I would just remove redundancies.

Davide

@jm4rtin
Copy link
Member

jm4rtin commented Nov 18, 2020

Is there a reason the hardware loop can't be saved and restored as part of the context through a CSR read/write as long as it is not mixed with the HW loop instructions? What makes a CSR write to these registers not equivalent to using the instructions? Is the only difference the timing of the CSR write that creates this hazard?

Are you opposed to leaving it up to software developers to avoid this illegal sequence of instructions? In such a case do you prefer an illegal instruction exception instead? If so, yes it may be better to forbid writing to the CSRs from an RTL complexity standpoint. A register version of lp.start and lp.end would also allow for a larger address range than the immediate version. One thing to note is this change could potentially break existing context switch code (if any).

How does the hardware loop react to interrupts (or exceptions)? Does it automatically return to normal operation? Does it know to resume the hardware loop when returning with mret? Is it valid to use the hardware loop in an interrupt context? If so, does the interrupt function attribute (compiler question) take into account the hardware loop before it is used/a function is called?

I'm not opposed to adding the instructions if there is a good reason to do so. I think they will fit into the encoding space but there should be some clarification as to how context switching is supposed to work with the hardware loops.

@jeremybennett
Copy link
Contributor

@davideschiavone @eflamand We are happy to add a register version of these instructions. I'd like @jm4rtin to confirm the encoding - IIRC HW loop uses CUSTOM-3, so can keep the existing encoding.

We should ask @jessicamills (GNU Tools Project Lead), @ASintzoff (LLVM Project Lead) to comment. It might be useful to get opinion from @flip1995, since has has just completed implementation of HW Loop in the LLVM Integrated Assembler for RISC-V.

@flip1995
Copy link
Member

For the current state of the LLVM implementation, only a few lines (+tests) would have to be added for those two new instructions.

@jessicamills
Copy link

For the GNU Tools project this is fine and easily implemented in binutils-gdb. GCC is a work in progress, however, we can incorporate these changes into the work. We are happy to add the two new instructions upon confirmation and with further encoding details.

@davideschiavone
Copy link
Contributor

davideschiavone commented Nov 19, 2020

So I would like to have a meeting with @jm4rtin and @mp-17 (he designed most of the today HWLoop HW together with me) for a technical HW discussion and then give you feedbacks on new encoding.

Happy to hear that this does not effect too much the work on the compiler side

@davideschiavone
Copy link
Contributor

Hi @jm4rtin

Is there a reason the hardware loop can't be saved and restored as part of the context through a CSR read/write as long as it is not mixed with the HW loop instructions?

No. Indeed this can be done.

What makes a CSR write to these registers not equivalent to using the instructions? Is the only difference the timing of the CSR write that creates this hazard?

They are not equivalent from an RTL point of view. Thus, it's safer to remove such differences.

Are you opposed to leaving it up to software developers to avoid this illegal sequence of instructions? In such a case do you prefer an illegal instruction exception instead? If so, yes it may be better to forbid writing to the CSRs from an RTL complexity standpoint. A register version of lp.start and lp.end would also allow for a larger address range than the immediate version. One thing to note is this change could potentially break existing context switch code (if any).

In PULP we do not usually write those registers except in save context parts. This code should be changed. Asking @haugoug to add comments on this part.

How does the hardware loop react to interrupts (or exceptions)? Does it automatically return to normal operation? Does it know to resume the hardware loop when returning with mret?

This is possible yes. That's why we restricted the HWLoop from an HW point of view. To make it possible. Several bugs have been found with jumps inside the HWLoop with compressed instructions or misaligned ones when handling interrupts in the middle of the loop.

Is it valid to use the hardware loop in an interrupt context? If so, does the interrupt function attribute (compiler question) take into account the hardware loop before it is used/a function is called?

Yes from an HW point of view (the core does not care which code it is handling). From a SW point of view, I remember @haugoug disabled it during ISR code.

I'm not opposed to adding the instructions if there is a good reason to do so. I think they will fit into the encoding space but there should be some clarification as to how context switching is supposed to work with the hardware loops.

Yes, let's have a meeting about it

@mp-17
Copy link
Contributor

mp-17 commented Nov 19, 2020

Hello everyone,

We have two problems here:

  1. The redundancy of writing those registers with hw-loop instructions and csr writes
  2. These two methods write the CSR in two different pipe-stages

We can force both the instructions to write the CSR in the same pipeline stage, but we would keep the redundancy. We can also remove the lp. instructions and keep only the csr_writes to deal with the HW loop CSRs, but we would lose the flexibility of the PC-relative addressing.

@davideschiavone
Copy link
Contributor

davideschiavone commented Nov 19, 2020

Another point, the bug related to this issue is not present:


     208:	00000797          	auipc	a5,0x0
     20c:	00c78793          	addi	a5,a5,12 # 214 <hwloop_pc> //a5 is now 0x214
     210:	1de78793          	addi	a5,a5,478 //0x1DE - now a5 contains 0x214 + 0x1DE = 0x3F2
00000214 <hwloop_pc>:
     214:	0ef0007b          	lp.starti	x0,3f2 <_access+0x26> //this is indeed 0x214 + 0x1DE =  0x3F2
     218:	80002773          	csrr	a4,0x800 //reading after write - a4 contains the start address of the HWLOOP

At the end of the simulation a4 and a5 have the same value.

So we are using this issue to discuss the realted ISA changes, but the bug is not present in the RTL

@davideschiavone
Copy link
Contributor

Discussion continues in #598

@pascalgouedo pascalgouedo added the Status:Resolved Issue has been resolved, but closure is pending on git merge and/or issuer confirmation label Oct 26, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component:RTL For issues in the RTL (e.g. for files in the rtl directory) PARAM:PULP_XPULP Issue depends on the PULP_XPULP parameter Status:Resolved Issue has been resolved, but closure is pending on git merge and/or issuer confirmation Type:Bug For bugs in the RTL, Documentation, Verification environment or Tool and Build system
Projects
None yet
Development

No branches or pull requests