Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Some of the Xpulp instructions use opcodes that are either reserved opcodes or standard opcodes #452

Closed
1 task
Silabs-ArjanB opened this issue Aug 19, 2020 · 32 comments
Assignees
Labels
Component:RTL For issues in the RTL (e.g. for files in the rtl directory) PARAM:PULP_XPULP Issue depends on the PULP_XPULP parameter Status:Resolved Issue has been resolved, but closure is pending on git merge and/or issuer confirmation Type:Bug For bugs in the RTL, Documentation, Verification environment or Tool and Build system

Comments

@Silabs-ArjanB
Copy link
Contributor

Some of the Xpulp opcodes have been chosen such that upstreaming of the related tool changes will become impossible unless we change some opcodes. Ideally only 'custom opcodes' are used for Xpulp extensions.

First question to answer:

  • Are there enough custom opcodes to host all the Xpulp extensions that are now spread in the reserved and standard opcodes?

Craig Blackmore's original message:

As per the actions of the minutes above, here is a summary of opcode issues followed by a breakdown of opcodes used by each CORE-V extension.

The issues fall into 4 categories:

  1. Use of standard opcode (e.g. OP, LOAD, BRANCH). Any CORE-V extension that uses these opcodes may conflict with a future RISC-V standard extension.
  2. Use of reserved opcode. Any CORE-V extension that uses a reserved opcode may conflict with a future RISC-V standard extension.
  3. Use of custom- opcode. The RISC-V Foundation does not allow any standard extension to use the custom opcodes. Therefore, any CORE-V extension that only uses custom opcodes is guaranteed not to conflict with future standard extensions. However, any custom opcodes must be changed before a CORE-V extension can be accepted as a standard extension.
  4. Incorrect use of standard opcode. The bitmanip immediate instructions (p.extract, p.extractu, p.insert, p.bclr, p.bset) use OP, but they should use OP-IMM (if we choose to continue using standard opcodes).

Any CORE-V extension that uses standard or reserved opcodes risks conflicting with future standard extensions. Using custom opcodes eliminates that risk, but those opcodes must be changed if the extension is to be standardized. I suspect we would also need a compelling reason to use a reserved opcode rather than fitting into the existing ones. Crucially, whatever encodings we choose now are subject to change during standardization.

Here is a summary of opcodes used by each CORE-V extension:

  • PULP bitmanip
    All instructions use OP.
    Immediate instructions (p.extract, p.extractu, p.insert, p.bclr, p.bset) should use OP-IMM.

  • PULP SIMD - reserved opcode.

  • Post-increment load/store - custom-0 opcode.

  • PULP general ALU - Add-shift instructions (p.add[u][R]N[r] and p.sub[u][R]N[r]) use custom-3 opcode.
    Multiply-accumulate instructions use custom-2 opcode.
    Remaining instructions use OP opcode.

  • PULP immediate branching - BRANCH opcode.

  • PULP HW loop - custom-3 opcode.

  • Xpulpcluster - LOAD opcode.

@Silabs-ArjanB Silabs-ArjanB added Type:Bug For bugs in the RTL, Documentation, Verification environment or Tool and Build system Component:RTL For issues in the RTL (e.g. for files in the rtl directory) PARAM:PULP_XPULP Issue depends on the PULP_XPULP parameter labels Aug 19, 2020
@jm4rtin
Copy link
Member

jm4rtin commented Aug 19, 2020

I haven't looked too close at how the branch is encoded but if you exclude it you might be able to squeeze it. It might be a tight fit and may make some encoding choices a bit odd due to tightly packing the space. If you add the branch back I'm not sure. There are a few instructions that are grouped together where some have an immediate and others have this immediate tied to 0. This is wasteful and they can be moved to a different encoding type but will no longer be logically grouped together. We may need to try remapping them and proposing something to see if some instructions need to be cut.

Spreadsheet I used for my rough estimation: pulp_encoding_blocks.xlsx

@davideschiavone
Copy link
Contributor

Thanks a lot @Silabs-ArjanB and @jm4rtin ! Very useful thread.
In my understanding, we should try to see whether we can squeeze all the RESERVED and STANDARD instructions into the custom space.

It's a big task :) However, if there is not enough space, we can think of getting rid of some "less-useful" instructions, and/or reducing the immediate where possible.
I am not saying it makes sense, but for sure we know that at this stage they won't get accepted to the standard GCC.

As this is a long term task (I don't think we will remap all of them over the next month), we need to be conscius that products and prototypes being developed now must rely on the PULP GCC, that is not maintained by OpenHW.
In addition, as we will modify the CV32E40P spec, this also means that OpenHW won't maintain the current ISA of the CV32E40P (with the current encoding etc)

Just a note

@jm4rtin
Copy link
Member

jm4rtin commented Aug 20, 2020

It should be possible to remap them all. Here is my first attempt at remapping them: pulp_encoding_blocks.xlsx

Some notes:

  • Decoding would start with the block (custom-0 ... custom-3)
  • Next would be the funct3 field
  • After that it depends on the block/plane:
    • For custom-0 it maps to an instruction
    • For custom-1 it either maps to an instruction or goes to Plane A or Plane B
    • For custom-2 it looks at f3 to determine the instruction
    • For custom-3 it uses a combination of funct5 and F to determine the instruction (this is the SIMD instructions moved directly to custom-3)
    • For Plane A it looks at funct7 to determine the instruction
    • For Plane B it looks at funct4 to determine the instruction (this plane is only used by the hardware loop right now)
  • If you exclude SIMD a vendor could re-purpose custom-3 for their own use
  • A grouping of instructions (e.g. bit manipulation) may no longer reside in a single block
  • A couple encodings for instructions have moved the bits around slightly
  • This may add a bit of complexity (read: longer combinatorial path) to a few instructions
  • Some of the instructions may need to be moved around in the encoding space a bit to simplify the mapping of bits in the instruction to control signals for the data path

Thoughts?

@Silabs-ArjanB
Copy link
Contributor Author

Hi @jm4rtin Thank you for this great proposal. Of course it is difficult to foresee if timing issues would arise from this; it would probably be easiest to just try it out to figure that out. I like that SIMD is only in custom-3 and that no other instructions are in that group. It would be nice if such a property was true for bit manipulation as well (as we know that a standard for that will be ratified), but I do not see that as a requirement. You could even argue that we should not focus on SIMD and bit manipulation too much and use those instructions to 'fill the holes' and that the focus should be on having a clean encoding for all the other instructions.

@jeremybennett
Copy link
Contributor

@jm4rtin

Nice work. One thing we don't have to worry about is SIMD and bitmanip, since we know these are going to be replaced with the official RISC-V versions in the future for CORE-V. That frees up all custom3 and some more of the space. Does this make the job easier?

I've taken your spreadsheet, adding some color coding for each of the instruction set extension groups. I've also looked how the instruction encodings map on to the standard R-/I-/S-/B-/U-/J-types defined in the standard. In a number of places we would need variants decoding, which may be a headache for the hardware engineers. Of perhaps we already have it for PULP anyway
pulp_encoding_blocks-jpb.xlsx

I'll get @craigblackmore to look and comment.

@davideschiavone
Copy link
Contributor

Well, the cv32e40p has bitmanipulation and simd and will support them.
So it's better to move all the extensions in the the custom space. With less priority sure, but still, maybe someone will extend the official GCC with these instructions too. Especially the SIMD onces, which are further than bitman to be standardized.

@jm4rtin
Copy link
Member

jm4rtin commented Aug 21, 2020

The encoding isn't really a headache for the hardware. Even before they got swapped around there was some other encodings for some of these instructions. It's more of a headache for the toolchain developer to create the custom relocation types.

One thing we don't have to worry about is SIMD and bitmanip, since we know these are going to be replaced with the official RISC-V versions in the future for CORE-V. That frees up all custom3 and some more of the space. Does this make the job easier?

In my opinion it doesn't really make it easier. As you can see from the color coding you provided the bit manipulation (yellow) is not off by itself. I wouldn't suggest spreading the other instructions out into the custom-3 block either. It may be better to move it to the end of the list so that when it is removed there isn't a large gap (Arjan's suggestion).

As for the removal of bit manipulation and SIMD, yeah that is planned for CV32E40. What isn't clear yet is what to do with CV32E40P. This instruction encoding topic appears to be on the docket for the TWG meeting at the end of the month.

@jeremybennett
Copy link
Contributor

Well, the cv32e40p has bitmanipulation and simd and will support them.
So it's better to move all the extensions in the the custom space. With less priority sure, but still, maybe someone will extend the official GCC with these instructions too. Especially the SIMD onces, which are further than bitman to be standardized.

Hi @davideschiavone

The tool chain for CORE-V won't be supporting them, so from an upstreaming perspective, where they are encoded is not an issue for us. They can stay where they are or be moved, whichever is less painful for the hardware team.

@jeremybennett
Copy link
Contributor

The encoding isn't really a headache for the hardware. Even before they got swapped around there was some other encodings for some of these instructions. It's more of a headache for the toolchain developer to create the custom relocation types.

I'm not a hardware engineer, so I know just enough to know it is a problem, but without detailed insight.

One thing we don't have to worry about is SIMD and bitmanip, since we know these are going to be replaced with the official RISC-V versions in the future for CORE-V. That frees up all custom3 and some more of the space. Does this make the job easier?

In my opinion it doesn't really make it easier. As you can see from the color coding you provided the bit manipulation (yellow) is not off by itself. I wouldn't suggest spreading the other instructions out into the custom-3 block either. It may be better to move it to the end of the list so that when it is removed there isn't a large gap (Arjan's suggestion).

As for the removal of bit manipulation and SIMD, yeah that is planned for CV32E40. What isn't clear yet is what to do with CV32E40P. This instruction encoding topic appears to be on the docket for the TWG meeting at the end of the month.

OK - as I noted to @davideschiavone from the perspective of upstreaming, the only opcodes which concern us are those which are to be supported in the upstreamed compiler tool chain. Anything else is a matter of making things as easy as possible for the hardware team.

@davideschiavone
Copy link
Contributor

Well, the cv32e40p has bitmanipulation and simd and will support them.
So it's better to move all the extensions in the the custom space. With less priority sure, but still, maybe someone will extend the official GCC with these instructions too. Especially the SIMD onces, which are further than bitman to be standardized.

Hi @davideschiavone

The tool chain for CORE-V won't be supporting them, so from an upstreaming perspective, where they are encoded is not an issue for us. They can stay where they are or be moved, whichever is less painful for the hardware team.

Hi @jeremybennett

Well they are not supported for now as long as I understood :)
you never now if some contributor wants to make that effort, but in anycase, better to have them in the custom space so that the eventual contributor can invest in upstreaming them

@davideschiavone
Copy link
Contributor

The encoding isn't really a headache for the hardware. Even before they got swapped around there was some other encodings for some of these instructions. It's more of a headache for the toolchain developer to create the custom relocation types.

I'm not a hardware engineer, so I know just enough to know it is a problem, but without detailed insight.

One thing we don't have to worry about is SIMD and bitmanip, since we know these are going to be replaced with the official RISC-V versions in the future for CORE-V. That frees up all custom3 and some more of the space. Does this make the job easier?

In my opinion it doesn't really make it easier. As you can see from the color coding you provided the bit manipulation (yellow) is not off by itself. I wouldn't suggest spreading the other instructions out into the custom-3 block either. It may be better to move it to the end of the list so that when it is removed there isn't a large gap (Arjan's suggestion).
As for the removal of bit manipulation and SIMD, yeah that is planned for CV32E40. What isn't clear yet is what to do with CV32E40P. This instruction encoding topic appears to be on the docket for the TWG meeting at the end of the month.

OK - as I noted to @davideschiavone from the perspective of upstreaming, the only opcodes which concern us are those which are to be supported in the upstreamed compiler tool chain. Anything else is a matter of making things as easy as possible for the hardware team.

Hi @jm4rtin ,

from "writing" the HW is not a problem, but if the encoding becomes very complicated, the HW complexity increases and the area and timing get worse. But this has less priority in my opinion than moving them to the custom space (but @Silabs-ArjanB can add more)

@jm4rtin
Copy link
Member

jm4rtin commented Aug 21, 2020

from "writing" the HW is not a problem, but if the encoding becomes very complicated, the HW complexity increases and the area and timing get worse. But this has less priority in my opinion than moving them to the custom space (but @Silabs-ArjanB can add more)

@davideschiavone Yes, of course. It should only be as complex as it needs to be and no more. The statement was a bit of a simplification. For example it would be a bad idea to move the destination register (rd) around randomly for different instructions. If you have some new field that is only used by the extension instructions it is more flexible and doesn't really cause a penalty. If there is more bits used to distinguish which instruction is which that also comes at a cost. There will be some impact on the hardware but yes it doesn't really make the work for a HW engineer much harder.

@jeremybennett If you have a proposal to reduce the number of encoding types / work needed for the toolchain, feel free to mention it.

@jeremybennett
Copy link
Contributor

@jm4rtin The number of encoding types/work for the tool chain is not a significant problem for the software tool chain. It's just rows in a table and potentially some extra relocations (which would almost certainly be needed irrespective of the encoding). All the hard work is above that level, and with limited resources available we've concentrated on those extensions which will require long-term support in the tools.

@craigblackmore
Copy link

Thanks @jm4rtin this spreadsheet is really helpful.

Another thing to consider is custom-2 and custom-3 are custom for rv32/rv64 but reserved for rv128, so putting everything except SIMD/bitmanip in custom-0/custom-1 would make extending their support to cv128 easier. If SIMD and bitmanip do need to be moved, perhaps they could go in custom-2/custom-3.

@jm4rtin
Copy link
Member

jm4rtin commented Aug 21, 2020

@craigblackmore Most of custom-0 and custom-1 are in use by the post-increment load/store and the immediate branches. Some of the bit manipulation, general ALU and MAC instructions need quite a few bits for various registers/immediates and take up most of custom-2. I don't think it's possible to move those out of custom-2. I think an rv128 core would look very different than what CV32E40P is today.

@Silabs-ArjanB
Copy link
Contributor Author

Once this issue has been resolved the assertions from #457 can be checked again to see if they pass.

@Suxiaojie-iot
Copy link

Recently, when I transplanted cv32e40p on the cluster side of Pulp, I found some problems. In addition to the interface mismatch, some internal logic is also different, including the fetching and decoding stages.
In this regard, I would like to ask how to implement Pulp's cluster kernel to cv32e40p.
Thank you.

@MikeOpenHWGroup
Copy link
Member

Hi @Suxiaojie-iot, the items you mention are deliberate changes, not problems. Please refer to the cv32e40p user manual.

Also, in the future, please do not mix multiple topics in a single GitHub issue. This issue is for Xpulp instruction opcodes. To discuss integration of "Pulp's cluster kernel to cv32e40p" you should open an new issue.

@Suxiaojie-iot
Copy link

Suxiaojie-iot commented Jan 22, 2021 via email

@pascalgouedo
Copy link

Hi,

Starting from the proposal made by @jm4rtin and amended by @jeremybennett in pulp_encoding_blocks-jpb.xlsx, here is a new PULP re-encoding proposal for CV32E40Pv2 pulp_encoding_blocks-OPHW-jm-jpb-pgo.xlsx.

It contains few corrections and modifications to improve instructions decoding.
Original proposal are commented in sheets renamed JM, final proposal are not renamed and PGO renamed ones.

Feel free to review and comment using this thread.

Best regards,
Pascal.

@davideschiavone
Copy link
Contributor

@jm4rtin , @jeremybennett any comment on this?

@pascalgouedo
Copy link

pascalgouedo commented Apr 6, 2022

Hi,
Here is a new version pulp_encoding_blocks-OPHW-jm-jpb-pgo-2022-04-06.xlsx
with X's removal and some bit manip instructions grouping in custom-3.

@pascalgouedo
Copy link

Last update on pack instructions pulp_encoding_blocks-OPHW-jm-jpb-pgo-2022-04-26.xlsx

pascalgouedo pushed a commit to pascalgouedo/cv32e40p that referenced this issue Apr 26, 2022
…pulp_encoding_blocks-OPHW-jm-jpb-pgo-2022-04-26.xlsx)

Signed-off-by: Pascal Gouedo <[email protected]>
@jm4rtin
Copy link
Member

jm4rtin commented Apr 26, 2022

It looks good to me. Thank you for taking the time to optimize the encodings. Will PR #700 include the RTL changes eventually or is it just for the documentation updates?

@pascalgouedo
Copy link

Thanks John.
#700 will only be documentation updates.
I will submit another PR with RTL changes.

pascalgouedo pushed a commit to pascalgouedo/cv32e40p that referenced this issue Apr 28, 2022
…pulp_encoding_blocks-OPHW-jm-jpb-pgo-2022-04-26.xlsx)

Signed-off-by: Pascal Gouedo <[email protected]>
@pascalgouedo
Copy link

#700 replaced by #704.

davideschiavone added a commit that referenced this issue Apr 29, 2022
PULP instructions re-encoding following PR #452 proposal
@pascalgouedo
Copy link

Final version of the document with removal of old JM encoding proposals pulp_encoding_blocks-OPHW-final-2022-06-22.xlsx

@pascalgouedo
Copy link

Hi
As always when starting to implement specification and verifying it, you find some problems you didn't anticipate.
Here things are that some ALU control signals are not generated in decoder file but in id one and were not analysed/taken into account in previous versions of re-encoding tables (at least by me).

Here is a new pulp_encoding_blocks-OPHW-2022-09-26.xlsx version taking into account ALU control signals.
First tab lists modifications with respect to older versions and each tab has a comment for each changed instructions.

Best regards,
Pascal.

@pascalgouedo
Copy link

Hi,
The very last version pulp_encoding_blocks-OPHW-2022-10-07.xlsx with final encoding of the 2 new Hardware Loops instructions.

@pascalgouedo
Copy link

Added missing cv.dotsp.sc.b in pulp_encoding_blocks-OPHW-2022-11-14.xlsx

pascalgouedo pushed a commit to pascalgouedo/cv32e40p that referenced this issue Nov 24, 2022
…p_encoding_blocks-OPHW-2022-11-14 from issue openhwgroup#452.

Changes wrt pulp_encoding_blocks-OPHW-jm-jpb-pgo-2022-04-26 :
- new encoding for cv.extract, cv.insert, cv.shuffle, cv.pack, cv.cplxmul, cv.subrotmj, cv.add/subb.div2/4/8
- description fix for cv.insert, cv.clb,  cv.bitrev, cv.avgu
- added missing SIMD cv.sdotsp.sc.b

Signed-off-by: Pascal Gouedo <[email protected]>
@davideschiavone
Copy link
Contributor

@pascalgouedo can we close this issue?

@pascalgouedo
Copy link

Resolved in cv32e40p_v1.2.0

@pascalgouedo pascalgouedo added the Status:Resolved Issue has been resolved, but closure is pending on git merge and/or issuer confirmation label Oct 26, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component:RTL For issues in the RTL (e.g. for files in the rtl directory) PARAM:PULP_XPULP Issue depends on the PULP_XPULP parameter Status:Resolved Issue has been resolved, but closure is pending on git merge and/or issuer confirmation Type:Bug For bugs in the RTL, Documentation, Verification environment or Tool and Build system
Projects
None yet
Development

No branches or pull requests

8 participants