
X86Tables: Converts tables to be mostly consteval #3320

Merged (1 commit) on Dec 12, 2023


Sonicadvance1 (Member)

Reduces the ELF's VM size from 9.80 MiB down to 9.37 MiB and should reduce initialization time a smidge.

Slammed this out while waiting for other PRs to get reviewed.

lioncash (Contributor) left a comment

Nice! Can some of the table variables themselves also be made const/constexpr (rather than just the lambda being consteval)?

Looks good either way though

    for (size_t j = 0; j < TableSize; ++j) {
      X86TablesInfoStruct<OpcodeType> const &Op = LocalTable[j];
      auto OpNum = Op.first;
      X86InstInfo const &Info = Op.Info;
      for (uint32_t i = 0; i < Op.second; ++i) {
        LOGMAN_THROW_AA_FMT(FinalTable[OpNum + i].Type == TYPE_UNKNOWN, "Duplicate Entry {}->{}", FinalTable[OpNum + i].Name, Info.Name);

neobrain (Member), Dec 9, 2023

Not sure how I feel about dropping self-consistency checks for the sake of saving a few kilobytes of executable memory. How about re-adding the checks as a post-init verification step in InitializeInfoTables?

Luckily this can even be done generically like here (look for CheckForInternalConflicts).

Member Author

Post-initialization verification is hard to do, since some table entries legitimately remain set to TYPE_UNKNOWN.
We're checking that each slot we're about to write is still TYPE_UNKNOWN before overwriting it.
We can't do consteval asserts to match this behaviour as far as I can tell.

As for your CheckForInternalConflicts, isn't it only implemented as the comment // TODO: Implement?

Member

On second look, I think CheckForInternalConflicts was only ever intended to check for conflicts within each individual LocalTable, not between different ones. The former makes the check trivial, hence the TODO.

> We can't do consteval asserts to match this behaviour as far as I can tell.

Yeah, that would require one .cpp file to have full visibility of all the various LocalTables, so that it could concatenate them and then check for duplicates. That's clearly contrary to the whole point of splitting these tables into different files in the first place.

I realized one thing, though: have you tried simply leaving the LOGMAN_THROW_AA_FMT(...) in? You should actually be able to have non-constexpr code in a consteval lambda as long as that code isn't executed during evaluation. Compilation would then only fail if the assertion actually fails. If for some reason it doesn't compile right now, I think there should be an easy way to make it work.
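The within-table case described above is easy precisely because the whole table is visible in one place, so a `static_assert` can reject overlapping entries at no runtime cost. A minimal sketch, assuming a simplified hypothetical `Entry` whose `first`/`second` fields mirror the opcode/count pair in the snippet above; `HasInternalConflicts` and `ExampleTable` are illustrative names, not FEX code, and this does not address the cross-table case.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical simplified entry: covers opcodes [first, first + second).
// Assumes every entry covers at least one opcode (second > 0).
struct Entry {
  uint16_t first;
  uint16_t second;
  const char *Name;
};

// Returns true if two entries of the same local table cover a common opcode.
// A pairwise range-overlap test: O(n^2), intended for compile-time use.
template <std::size_t N>
constexpr bool HasInternalConflicts(const Entry (&Table)[N]) {
  for (std::size_t a = 0; a < N; ++a) {
    for (std::size_t b = a + 1; b < N; ++b) {
      const uint32_t AEnd = uint32_t{Table[a].first} + Table[a].second;
      const uint32_t BEnd = uint32_t{Table[b].first} + Table[b].second;
      if (Table[a].first < BEnd && Table[b].first < AEnd) {
        return true;
      }
    }
  }
  return false;
}

constexpr Entry ExampleTable[] = {
  {0x00, 6, "ADD"},
  {0x08, 6, "OR"},
  {0x90, 1, "NOP"},
};

// Breaks the build if an overlapping entry is ever added to ExampleTable.
static_assert(!HasInternalConflicts(ExampleTable), "Duplicate entry in ExampleTable");
```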

Member Author

Yes, I tried leaving them in, and it results in a compile failure:

/mnt/Work/Work/work/FEXNew/FEXCore/Source/Interface/Core/X86Tables/SecondaryGroupTables.cpp:15:80: error: call to consteval function 'FEXCore::X86Tables::(anonymous class)::operator()' is not a constant expression
std::array<X86InstInfo, MAX_INST_SECOND_GROUP_TABLE_SIZE> SecondInstGroupOps = []() consteval {
                                                                               ^
/mnt/Work/Work/work/FEXNew/FEXCore/Source/Interface/Core/X86Tables/X86Tables.h:516:7: note: non-constexpr function 'AFmt<const char *, const char *>' cannot be used in a constant expression
      LOGMAN_THROW_AA_FMT(FinalTable[OpNum + i].Type == TYPE_UNKNOWN, "Duplicate Entry {}->{}", FinalTable[OpNum + i].Name, Info.Name);
      ^
/mnt/Work/Work/work/FEXNew/FEXCore/include/FEXCore/Utils/LogManager.h:75:45: note: expanded from macro 'LOGMAN_THROW_AA_FMT'
#define LOGMAN_THROW_AA_FMT(pred, ...) do { LogMan::Throw::AFmt(pred, __VA_ARGS__); } while (0)
                                            ^
/mnt/Work/Work/work/FEXNew/FEXCore/Source/Interface/Core/X86Tables/SecondaryGroupTables.cpp:490:3: note: in call to 'GenerateTable(&Table._M_elems[0], &SecondaryExtensionOpTable[0], 384)'
  GenerateTable(&Table.at(0), SecondaryExtensionOpTable, std::size(SecondaryExtensionOpTable));
  ^
/mnt/Work/Work/work/FEXNew/FEXCore/Source/Interface/Core/X86Tables/SecondaryGroupTables.cpp:15:80: note: in call to '&[]() {

Member

Looks like it fails because the macro unconditionally calls LogMan::Throw::AFmt, even when the predicate holds. What happens if you wrap the call in a check of the assertion condition?

    if (FinalTable[OpNum + i].Type != TYPE_UNKNOWN) {
      LOGMAN_THROW_AA_FMT(FinalTable[OpNum + i].Type == TYPE_UNKNOWN, "Duplicate Entry {}->{}", FinalTable[OpNum + i].Name, Info.Name);
    }

(Obviously we might as well just use an ERROR_AND_DIE_FMT or similar at that point)

Member Author

Turns out that does work. I thought I had tried that.

neobrain (Member) commented Dec 9, 2023

I wrote a patch very similar to this but wasn't happy with the binary size overhead. Do you know what has changed in the meantime? I measured a relative size increase of 5%, whereas you're reporting a size decrease.

I vaguely recall some flags being removed recently, but I'm not sure that would make such a huge difference.

For reference, my patch is here: main...neobrain:FEX:opt_x86tables_init_next

Sonicadvance1 (Member Author)

> I wrote a patch very similar to this but wasn't happy with the binary size overhead. Do you know what has changed in the meantime? I measured a relative size increase of 5%, whereas you're reporting a size decrease.
>
> I vaguely recall some flags being removed recently, but I'm not sure that would make such a huge difference.
>
> For reference, my patch is here: main...neobrain:FEX:opt_x86tables_init_next

The file size increased a small amount (half a megabyte, I think?), but the VM size decreased, since it no longer needs to keep both the source tables and the generated tables in memory.

neobrain (Member)

> The file size increased a small amount (half a megabyte, I think?), but the VM size decreased, since it no longer needs to keep both the source tables and the generated tables in memory.

That's interesting. Where did the binary size increase go? Why is VM size the decisive metric here?

Sonicadvance1 (Member Author)

> > The file size increased a small amount (half a megabyte, I think?), but the VM size decreased, since it no longer needs to keep both the source tables and the generated tables in memory.
>
> That's interesting. Where did the binary size increase go? Why is VM size the decisive metric here?

VM size is more interesting because it reflects how much memory will be allocated and consumed in a new process. File size is less interesting because, when binfmt_misc is installed, the kernel keeps the file in memory regardless, so it won't be accessing the disk.

neobrain (Member) commented Dec 11, 2023

> VM size is more interesting because it reflects how much memory will be allocated and consumed in a new process. File size is less interesting because, when binfmt_misc is installed, the kernel keeps the file in memory regardless, so it won't be accessing the disk.

I'm not following. Could you explain what VM size actually is and how you measure it? It's not a web-searchable term. The PR description made me guess it's an ELF section, but it sounds like that's not the case.

Regarding my original question, do you know which of the ELF sections in the FEX binary specifically has grown?

Sonicadvance1 (Member Author)

> > VM size is more interesting because it reflects how much memory will be allocated and consumed in a new process. File size is less interesting because, when binfmt_misc is installed, the kernel keeps the file in memory regardless, so it won't be accessing the disk.
>
> I'm not following. Could you explain what VM size actually is and how you measure it? It's not a web-searchable term, and it sounds like I guessed wrong about its meaning.
>
> Regarding my original question, do you know which of the ELF sections in the FEX binary specifically has grown?

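https://github.com/google/bloaty
Bloaty can analyze an ELF and report the size of each of its components.
File size is how many bytes each component occupies in the file on disk.
VM size is how much memory each component occupies once loaded into memory. For example, .bss takes no space in the file but does take space in memory, because zero-initialized data is only allocated when the binary is loaded.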

Before:

    FILE SIZE        VM SIZE
 --------------  --------------
   0.0%       0  54.2%  5.31Mi    .bss
  74.2%  3.34Mi  34.0%  3.34Mi    .text
   5.9%   270Ki   2.7%   270Ki    .rela.dyn
   5.7%   263Ki   2.6%   263Ki    .data.rel.ro
   5.4%   248Ki   2.5%   248Ki    .eh_frame
   4.4%   201Ki   2.0%   201Ki    .rodata
   1.0%  47.9Ki   0.5%  47.9Ki    .dynstr
   1.0%  46.1Ki   0.5%  46.1Ki    .eh_frame_hdr
   0.8%  36.3Ki   0.4%  36.3Ki    .gcc_except_table
   0.7%  30.2Ki   0.3%  30.2Ki    .dynsym
   0.2%  10.3Ki   0.1%  10.3Ki    .rela.plt
   0.1%  6.88Ki   0.1%  6.88Ki    .plt
   0.1%  5.87Ki   0.1%  5.87Ki    .gnu.hash
   0.1%  5.21Ki   0.1%  5.21Ki    .tdata
   0.1%  3.45Ki   0.0%  3.45Ki    .got.plt
   0.1%  2.52Ki   0.0%  2.52Ki    .gnu.version
   0.0%  2.06Ki   0.0%       0    [ELF Section Headers]
   0.0%  2.00Ki   0.0%  1.55Ki    [17 Others]
   0.0%  1.12Ki   0.0%  1.12Ki    .data
   0.0%     992   0.0%     992    .got
   0.0%     672   0.0%     672    [ELF Program Headers]
 100.0%  4.49Mi 100.0%  9.80Mi    TOTAL

After:

    FILE SIZE        VM SIZE
 --------------  --------------
   0.0%       0  44.2%  4.14Mi    .bss
  63.7%  3.33Mi  35.5%  3.33Mi    .text
  15.2%   812Ki   8.5%   812Ki    .data
   6.1%   327Ki   3.4%   327Ki    .rela.dyn
   4.6%   248Ki   2.6%   248Ki    .eh_frame
   3.7%   200Ki   2.1%   200Ki    .rodata
   2.9%   157Ki   1.6%   157Ki    .data.rel.ro
   0.9%  47.9Ki   0.5%  47.9Ki    .dynstr
   0.9%  46.1Ki   0.5%  46.1Ki    .eh_frame_hdr
   0.7%  36.3Ki   0.4%  36.3Ki    .gcc_except_table
   0.6%  30.2Ki   0.3%  30.2Ki    .dynsym
   0.2%  10.3Ki   0.1%  10.3Ki    .rela.plt
   0.1%  6.88Ki   0.1%  6.88Ki    .plt
   0.1%  5.87Ki   0.1%  5.87Ki    .gnu.hash
   0.1%  5.21Ki   0.1%  5.21Ki    .tdata
   0.1%  3.45Ki   0.0%  3.45Ki    .got.plt
   0.0%  2.52Ki   0.0%  2.52Ki    .gnu.version
   0.0%  2.06Ki   0.0%       0    [ELF Section Headers]
   0.0%  1.98Ki   0.0%  1.54Ki    [17 Others]
   0.0%  1.09Ki   0.0%  1.09Ki    .got
   0.0%     672   0.0%     672    [ELF Program Headers]
 100.0%  5.23Mi 100.0%  9.37Mi    TOTAL

The primary file size change comes from the .data section going from 1.12Ki to 812Ki.
The primary VM size change comes from the .bss section going from 5.31Mi to 4.14Mi.
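As a quick arithmetic check of the two tables above, assuming bloaty's rounded figures are accurate to their last digit, summing the per-section VM deltas in KiB recovers the TOTAL change. The .bss drop (about -1198 KiB) is only partly offset by the .data growth (about +811 KiB). That is consistent with the generated tables moving from zero-initialized storage, which costs memory but no file bytes, to initialized storage, which costs both, and would explain why the file grew while the VM size shrank. The struct and names below are illustrative only.

```cpp
// Per-section VM sizes in KiB, copied from the bloaty output above
// (Mi values converted with * 1024). Only sections that changed by roughly
// 1 KiB or more are listed; bloaty rounds, so the sum is approximate.
struct SectionSize {
  const char *Name;
  double BeforeKiB;
  double AfterKiB;
};

constexpr SectionSize Sections[] = {
  {".bss",         5.31 * 1024, 4.14 * 1024},
  {".text",        3.34 * 1024, 3.33 * 1024},
  {".data",        1.12,        812.0},
  {".rela.dyn",    270.0,       327.0},
  {".data.rel.ro", 263.0,       157.0},
  {".rodata",      201.0,       200.0},
};

// Sum of (after - before) over the listed sections.
constexpr double TotalDeltaKiB() {
  double Sum = 0.0;
  for (const SectionSize &S : Sections) {
    Sum += S.AfterKiB - S.BeforeKiB;
  }
  return Sum;
}

// Roughly -447 KiB, versus (9.37 - 9.80) * 1024, about -440 KiB, for the
// TOTAL row; the gap is consistent with bloaty's rounding of the Mi figures.
static_assert(TotalDeltaKiB() < -430.0 && TotalDeltaKiB() > -455.0);
```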

Sonicadvance1 force-pushed the consteval_x86_tables branch 2 times, most recently from 5d75821 to 86ea8db, December 11, 2023 16:30

     for (size_t j = 0; j < TableSize; ++j) {
       X86TablesInfoStruct<OpcodeType> const &Op = LocalTable[j];
       auto OpNum = Op.first;
       X86InstInfo const &Info = Op.Info;
       for (uint32_t i = 0; i < Op.second; ++i) {
    -    LOGMAN_THROW_AA_FMT(FinalTable[OpNum + i].Type == TYPE_UNKNOWN, "Duplicate Entry {}->{}", FinalTable[OpNum + i].Name, Info.Name);
    +    if (FinalTable[OpNum + i].Type != TYPE_UNKNOWN) {
    +      LOGMAN_MSG_A_FMT("Duplicate Entry {}->{}", FinalTable[OpNum + i].Name, Info.Name);

Member

There's no need to use debug-only primitives here anymore. As written, a duplicate won't trigger a build failure in Release builds, since LOGMAN_MSG_A_FMT is a no-op there.

Sonicadvance1 (Member Author)

> Nice! Can some of the table variables themselves also be made const/constexpr (rather than just the lambda being consteval)?
>
> Looks good either way though

Forgot to respond to this. Currently none of the table variables can be made const, but in theory they can be once the tables are fully generated at compile time. We're not quite there yet.
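For the future state described here, a minimal sketch with a hypothetical `InstInfo` record and `ConstOps` table: once a table is generated entirely at compile time and nothing patches it afterwards, the variable itself can be declared `constexpr`, so its contents can be checked with `static_assert`. A const table like this is also eligible for read-only storage; since it contains pointers, a position-independent build would typically place it in `.data.rel.ro` rather than `.data`.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical simplified info record.
struct InstInfo {
  const char *Name = nullptr;
  uint8_t Flags = 0;
};

// Generated entirely at compile time and never modified afterwards, so the
// variable itself can be constexpr (and therefore const). Unlike a mutable
// global, it can be read inside constant expressions.
constexpr std::array<InstInfo, 256> ConstOps = [] {
  std::array<InstInfo, 256> Table{};
  for (std::size_t i = 0; i < 8; ++i) {
    Table[0x50 + i] = {"PUSH", 1};
  }
  Table[0x90] = {"NOP", 0};
  return Table;
}();

static_assert(ConstOps[0x57].Flags == 1);
static_assert(ConstOps[0x90].Name != nullptr);
```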

Sonicadvance1 merged commit ec89a00 into FEX-Emu:main Dec 12, 2023
10 checks passed
Sonicadvance1 deleted the consteval_x86_tables branch December 12, 2023 07:03
Sonicadvance1 added a commit to Sonicadvance1/FEX that referenced this pull request Sep 7, 2024
Only doing the single table for review purposes. Once reviewed I will
hammer out the remaining tables.

Similar to FEX-Emu#3320, most of the OpcodeDispatcher tables can be consteval
and made into compile-time constants. This just requires shuffling the
code slightly. The idea is to get almost all of the table setup out of
the `InstallOpcodeHandlers` function and only install the handlers that
change based on 32-bit or 64-bit mode there, just as we did for the x86
tables.

This base table removal reduces the `InstallOpcodeHandlers` function
from 981 instructions down to 852. It increases `InitializeBaseTables`
from 65 instructions to 113. A net removal of 81 instructions.

Savings will be more than that, of course, because the removed code
also calls memcpy, but this gives a general idea. These tables are
constexpr and should be evaluated by the compiler just like the
previous x86 tables.
Sonicadvance1 added a commit to Sonicadvance1/FEX that referenced this pull request Sep 7, 2024
Sonicadvance1 added a commit to Sonicadvance1/FEX that referenced this pull request Sep 7, 2024
Sonicadvance1 added a commit to Sonicadvance1/FEX that referenced this pull request Sep 7, 2024
Sonicadvance1 added a commit to Sonicadvance1/FEX that referenced this pull request Sep 7, 2024
Sonicadvance1 added a commit to Sonicadvance1/FEX that referenced this pull request Sep 7, 2024
Sonicadvance1 added a commit to Sonicadvance1/FEX that referenced this pull request Sep 9, 2024
Sonicadvance1 added a commit to Sonicadvance1/FEX that referenced this pull request Sep 12, 2024
Sonicadvance1 added a commit to Sonicadvance1/FEX that referenced this pull request Sep 13, 2024
3 participants