-
Notifications
You must be signed in to change notification settings - Fork 125
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
X86Tables: Converts tables to be mostly consteval #3320
X86Tables: Converts tables to be mostly consteval #3320
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! Are some of the tables variables themselves also able to be made const/constexpr as well? (rather than just the lambda being consteval?)
Looks good either way though
for (size_t j = 0; j < TableSize; ++j) { | ||
X86TablesInfoStruct<OpcodeType> const &Op = LocalTable[j]; | ||
auto OpNum = Op.first; | ||
X86InstInfo const &Info = Op.Info; | ||
for (uint32_t i = 0; i < Op.second; ++i) { | ||
LOGMAN_THROW_AA_FMT(FinalTable[OpNum + i].Type == TYPE_UNKNOWN, "Duplicate Entry {}->{}", FinalTable[OpNum + i].Name, Info.Name); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure how I feel about dropping self-consistency checks for the sake of saving a few kilobytes of executable memory. How about re-adding the checks as a post=init verification step in InitializeInfoTables
?
Luckily this can even be done generically like here (look for CheckForInternalConflicts).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Post initialization verification is hard to do since there will be table entries that remain having their type set to TYPE_UNKNOWN
.
We're checking to ensure the values that we're setting are still TYPE_UNKNOWN
before overwriting it.
We can't do consteval asserts to match this behaviour as far as I can tell.
For your implementation of CheckForInternalConflicts
it is implemented with the comment of // TODO: Implement?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On second look I think CheckForInternalConflicts
only ever intended to check for conflicts within each individual LocalTable
but not between different once. The former makes the check trivial, hence the TODO.
We can't do consteval asserts to match this behaviour as far as I can tell.
Yeah, that would require one .cpp
file to have full visibility of all the various LocalTables, upon which it could concatenate them and then check for duplicates. And that's clearly contrary to the whole point of splitting these tables into different files in the first place.
I realized one thing though: Have you tried simply leaving the LOGMAN_THROW_AA_FMT(...)
in? You should actually be able to have non-constexpr code in a consteval
lambda as long as the code doesn't execute. It will only fail if the assert indeed is failing then. If for some reason it doesn't compile right now, I think there should be an easy way to make it work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I tried leaving them in and it results in a compile failure.
/mnt/Work/Work/work/FEXNew/FEXCore/Source/Interface/Core/X86Tables/SecondaryGroupTables.cpp:15:80: error: call to consteval function 'FEXCore::X86Tables::(anonymous class)::operator()' is not a constant expression
std::array<X86InstInfo, MAX_INST_SECOND_GROUP_TABLE_SIZE> SecondInstGroupOps = []() consteval {
^
/mnt/Work/Work/work/FEXNew/FEXCore/Source/Interface/Core/X86Tables/X86Tables.h:516:7: note: non-constexpr function 'AFmt<const char *, const char *>' cannot be used in a constant expression
LOGMAN_THROW_AA_FMT(FinalTable[OpNum + i].Type == TYPE_UNKNOWN, "Duplicate Entry {}->{}", FinalTable[OpNum + i].Name, Info.Name);
^
/mnt/Work/Work/work/FEXNew/FEXCore/include/FEXCore/Utils/LogManager.h:75:45: note: expanded from macro 'LOGMAN_THROW_AA_FMT'
#define LOGMAN_THROW_AA_FMT(pred, ...) do { LogMan::Throw::AFmt(pred, __VA_ARGS__); } while (0)
^
/mnt/Work/Work/work/FEXNew/FEXCore/Source/Interface/Core/X86Tables/SecondaryGroupTables.cpp:490:3: note: in call to 'GenerateTable(&Table._M_elems[0], &SecondaryExtensionOpTable[0], 384)'
GenerateTable(&Table.at(0), SecondaryExtensionOpTable, std::size(SecondaryExtensionOpTable));
^
/mnt/Work/Work/work/FEXNew/FEXCore/Source/Interface/Core/X86Tables/SecondaryGroupTables.cpp:15:80: note: in call to '&[]() {
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like it fails because that unconditionally calls LogMan::Throw::AFmt
. What happens if you wrap the calls in the actual assertion condition?
if (FinalTable[OpNum + i].Type != TYPE_UNKNOWN) {
LOGMAN_THROW_AA_FMT(FinalTable[OpNum + i].Type == TYPE_UNKNOWN, "Duplicate Entry {}->{}", FinalTable[OpNum + i].Name, Info.Name);
}
(Obviously we might as well just use an ERROR_AND_DIE_FMT
or similar at that point)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Turns out that does work. Thought I tried that.
I wrote a patch very similar to this but wasn't happy with the binary size overhead. Do you know what changed meanwhile, since I measured a size increase of relative 5% whereas you're mentioning a size decrease? I vaguely recall some flags being removed recently, but not sure that make such a huge difference. For reference, my patch is here: main...neobrain:FEX:opt_x86tables_init_next |
The file size increased a small amount (Half a megabyte I think?) but the VM size reduced, since it doesn't need to maintain both tables. |
That's interesting. Where did the binary increase go to? Why is VM size the decisive metric here? |
VM size is more interesting since that means how much memory will be allocated and consumed in a new process. File size is less interesting since when binfmt_misc is installed, the kernel is going to keep the file in memory regardless, it won't be accessing the disk. |
I'm not following. Could you explain what VM size actually is and how you measure it? It's not a web-searchable term. The PR description made me guess it's an ELF section, but it sounds like that's not the case. Regarding my original question, do you know which of the ELF sections in the FEX binary specifically has grown? |
https://github.com/google/bloaty Before:
After:
The primary file size change comes from the .data section going from 1.12Ki to 812Ki. |
5d75821
to
86ea8db
Compare
for (size_t j = 0; j < TableSize; ++j) { | ||
X86TablesInfoStruct<OpcodeType> const &Op = LocalTable[j]; | ||
auto OpNum = Op.first; | ||
X86InstInfo const &Info = Op.Info; | ||
for (uint32_t i = 0; i < Op.second; ++i) { | ||
LOGMAN_THROW_AA_FMT(FinalTable[OpNum + i].Type == TYPE_UNKNOWN, "Duplicate Entry {}->{}", FinalTable[OpNum + i].Name, Info.Name); | ||
if (FinalTable[OpNum + i].Type != TYPE_UNKNOWN) { | ||
LOGMAN_MSG_A_FMT("Duplicate Entry {}->{}", FinalTable[OpNum + i].Name, Info.Name); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no need to use debug-only primitives here anymore. Currently, this won't trigger a build failure in Release builds, since LOGMAN_MSG_A_FMT is no-opped there.
Reduces the ELF's VM size from 9.8MB down to 9.37MB and should reduce initialization time a smidge. Slammed this out while waiting for other PRs to get reviewed.
86ea8db
to
98f21a2
Compare
Forgot to respond to this. Currently none of the members can be converted to const but theoretically once we get the tables fully converted over to compile time generated then they can be. Not quite there yet. |
Only doing the single table for review purposes. Once reviewed I will hammer out the remaining tables. Similar to FEX-Emu#3320, most of the OpcodeDispatcher tables can be consteval and made to be a compile time constant. This just requires shuffling the code slightly. The idea is to get almost all of the table setup out of the `InstallOpcodeHandlers` function and instead only install the handlers that change based on 32-bit or 64-bit, just like the x86 tables we also did. This base table removal reduces the `InstallOpcodeHandlers` function from 981 instructions down to 852. It increases `InitializeBaseTables` from 65 instructions to 113. A net removal of 81 instructions. Savings will be more than that of course because it calls to memcpy, but just a general idea. This tables are constexpr and should be evaluated by the compiler just like the previous x86 tables.
Only doing the single table for review purposes. Once reviewed I will hammer out the remaining tables. Similar to FEX-Emu#3320, most of the OpcodeDispatcher tables can be consteval and made to be a compile time constant. This just requires shuffling the code slightly. The idea is to get almost all of the table setup out of the `InstallOpcodeHandlers` function and instead only install the handlers that change based on 32-bit or 64-bit, just like the x86 tables we also did. This base table removal reduces the `InstallOpcodeHandlers` function from 981 instructions down to 852. It increases `InitializeBaseTables` from 65 instructions to 113. A net removal of 81 instructions. Savings will be more than that of course because it calls to memcpy, but just a general idea. This tables are constexpr and should be evaluated by the compiler just like the previous x86 tables.
Only doing the single table for review purposes. Once reviewed I will hammer out the remaining tables. Similar to FEX-Emu#3320, most of the OpcodeDispatcher tables can be consteval and made to be a compile time constant. This just requires shuffling the code slightly. The idea is to get almost all of the table setup out of the `InstallOpcodeHandlers` function and instead only install the handlers that change based on 32-bit or 64-bit, just like the x86 tables we also did. This base table removal reduces the `InstallOpcodeHandlers` function from 981 instructions down to 852. It increases `InitializeBaseTables` from 65 instructions to 113. A net removal of 81 instructions. Savings will be more than that of course because it calls to memcpy, but just a general idea. This tables are constexpr and should be evaluated by the compiler just like the previous x86 tables.
Only doing the single table for review purposes. Once reviewed I will hammer out the remaining tables. Similar to FEX-Emu#3320, most of the OpcodeDispatcher tables can be consteval and made to be a compile time constant. This just requires shuffling the code slightly. The idea is to get almost all of the table setup out of the `InstallOpcodeHandlers` function and instead only install the handlers that change based on 32-bit or 64-bit, just like the x86 tables we also did.
Only doing the single table for review purposes. Once reviewed I will hammer out the remaining tables. Similar to FEX-Emu#3320, most of the OpcodeDispatcher tables can be consteval and made to be a compile time constant. This just requires shuffling the code slightly. The idea is to get almost all of the table setup out of the `InstallOpcodeHandlers` function and instead only install the handlers that change based on 32-bit or 64-bit, just like the x86 tables we also did.
Only doing the single table for review purposes. Once reviewed I will hammer out the remaining tables. Similar to FEX-Emu#3320, most of the OpcodeDispatcher tables can be consteval and made to be a compile time constant. This just requires shuffling the code slightly. The idea is to get almost all of the table setup out of the `InstallOpcodeHandlers` function and instead only install the handlers that change based on 32-bit or 64-bit, just like the x86 tables we also did.
Only doing the single table for review purposes. Once reviewed I will hammer out the remaining tables. Similar to FEX-Emu#3320, most of the OpcodeDispatcher tables can be consteval and made to be a compile time constant. This just requires shuffling the code slightly. The idea is to get almost all of the table setup out of the `InstallOpcodeHandlers` function and instead only install the handlers that change based on 32-bit or 64-bit, just like the x86 tables we also did.
Only doing the single table for review purposes. Once reviewed I will hammer out the remaining tables. Similar to FEX-Emu#3320, most of the OpcodeDispatcher tables can be consteval and made to be a compile time constant. This just requires shuffling the code slightly. The idea is to get almost all of the table setup out of the `InstallOpcodeHandlers` function and instead only install the handlers that change based on 32-bit or 64-bit, just like the x86 tables we also did.
Only doing the single table for review purposes. Once reviewed I will hammer out the remaining tables. Similar to FEX-Emu#3320, most of the OpcodeDispatcher tables can be consteval and made to be a compile time constant. This just requires shuffling the code slightly. The idea is to get almost all of the table setup out of the `InstallOpcodeHandlers` function and instead only install the handlers that change based on 32-bit or 64-bit, just like the x86 tables we also did.
Reduces the ELF's VM size from 9.8MB down to 9.37MB and should reduce initialization time a smidge.
Slammed this out while waiting for other PRs to get reviewed.