-
Notifications
You must be signed in to change notification settings - Fork 190
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Anti-Aliasing Bug (Initialize pre-generated message, then modify it) #274
Comments
Actually taking again a look at this, this line looks dangerous:
|
OK, I'd like to look into this issue if there is anything I can do. So I don't want to reject this up front. But I am not convinced that this is a FlatCC issue. I cannot easily reproduce your project because I do not have the tool chain, nor necessarily the hardware, you are using. Now, given that I don't have enough information, I can speculate: First, as Wouter AardAppel writes in the link you reference, reading is by definition read only, so aliasing should not result in different data unless you have UB (undefined behaviour) in the implementation. I am not sure where the problem is exactly, other than it sounds like a vtable lookup issue, but if so, it likely happens many other places for similar reasons, so I'll take this is a symptom, and use the line you posted in your second comment. In C it is (relatively recently) undefined behaviour to cast a pointer to a type it is not, except if the pointer is 'char *' which is explicitly valid, and necessary, to do pointer math. Now, this problem is mostly with accessing data the compiler generates, but here we are reading static data from a buffer, so the compiler cannot shuffle things around. Also, we are casting to uint8_t * which for all practical purposes is the same as 'char *' and any compiler that cannot handle that, should be fixed as there is too much such code around. After doing pointer math, the code is cast to the underlying correct type, and if you do things right on your side, it is also correctly aligned in memory. However, if your buffer is not aligned properly, some architectures might run slower, issue access violations, or do other unexpected stuff. For this reason, the code you reference is not dangerous, except if accessing memory that is not properly aligned. And as to GCC, I have already removed support for GCC pedantic warnings because they break portable C code, and if this turns to not be easily managed in FlatCC, it may require further downgrading GCC support, e.g. removing optimizations in recommended builds. |
For reference below is the code that is being discussed. I link to the "reflection" code which is generated code that is checked into the repo, and the file is same for all flatcc projects of the same version, so this is not related to the reflection schema in particular. Early in the file we have the macro
What cannot see from this is arguments being use to call this macro. If there are aliasing issues, this could be relevant. However, the macro is already being used in such inline functions, it seems overkill to add yet another layer which is why it is not. Here is an example from the same file:
|
Here is a weekly build that shows all the GCC versions being tested and pass. But that is for whatever architecture Github Actions use: https://github.com/dvidelabs/flatcc/actions/runs/8047226724 |
Further: the macro I mentioned calls a function This function is defined in the flatcc_accessors file and use union cast which is generally believed to be a valid cast, at least respected by all living compilers. It could be argued that memcpy should be used, and I did work an that, but that got rather invasive and not worthwhile. Historically, memcpy would be slow, but lately it is understood to be a cast operator by most compilers. https://github.com/dvidelabs/flatcc/blob/master/include/flatcc/flatcc_accessors.h You could try to play around with different cast implementations, or replace the read function. The
|
Thanks! You are right, I could've done a better job producing a seamless repro. Here it is RafaelLaya#1 I forked FlatCC and modified the Monster example to show this. You should be able to Thanks! Please let me know if you come to any conclusions |
I had a look at your PR but don't currently have a tool chain to test with. I will see what I can do. Meanwhile, please consider experimenting with replacing functions in the accessors and common files discussed above. This is where a possible FlatCC fix would happen. That said, it really looks like a compiler bug but I will not rule out anything since these things are notoriously tricky. I cannot see any scenerio where the compiler should think it is a good idea to dereference memory it is responsible for initializing before it has initialized that memory area, which is what it does when dereferencing the stack pointer after reserving space. Of course someone could argue that the optimizer need not check for that if the code it compiles is correct, but I find that a hard sell, and the code works on numerous other platforms, also with GCC compiler. In the PR you have a correctly aligned memory buffer that is compiler initialized. Just to explain what happens in the buffer content and what the C source is supposed to do: For reference: https://github.com/dvidelabs/flatcc/blob/master/doc/binary-format.md#example The soffset read function discussed above fetches the offset from one field in the vtable, and the compiler is supposed to optimize that read to relative to the vtable start very efficiently. It seems too efficiently in this case.
|
OK, I downloaded the eabi-none toolchain and tested, not the PR but the original zip project you provided as I had that at hand already, and it turns out to be rather useful for modifying the flatcc library code for testing. However, I modified the buffer type to be aligned, as you also have done in the PR branch. I get the assembly you documented, i.e. I can reproduce. I then modified the function that reads the soffset to find the vtable location to make it use memcpy: static inline flatbuffers_soffset_v2_t __flatbuffers_soffset_read_from_pe_v2(const void* p) {
#if 1 /* modified version */
flatbuffers_soffset_v2_t word;
memcpy(&word, p, sizeof(word));
return __flatbuffers_soffset_cast_from_pe_v2(word);
#else /* original flatcc library code */
return __flatbuffers_soffset_cast_from_pe_v2(*(flatbuffers_soffset_v2_t*)p);
#endif
} And with the compile flags you provided, that is -O2 enabled, I get the following, which superficially seems to be better. But the memcpy is horribly ineffecient. The code does none of the memcpy is a cast optimizations, which is also one reason I did not make that change to accessors early - too many compilers that might do that slow. 00000000 <interface_encoding_populate_general_rsp>:
0: b570 push {r4, r5, r6, lr}
2: 4c15 ldr r4, [pc, #84] @ (58 <interface_encoding_populate_general_rsp+0x58>)
4: b08c sub sp, #48 @ 0x30
6: 4606 mov r6, r0
8: f10d 0c08 add.w ip, sp, #8
c: ad05 add r5, sp, #20
e: cc0f ldmia r4!, {r0, r1, r2, r3}
10: e8ac 000f stmia.w ip!, {r0, r1, r2, r3}
14: cc0f ldmia r4!, {r0, r1, r2, r3}
16: e8ac 000f stmia.w ip!, {r0, r1, r2, r3}
1a: e894 0003 ldmia.w r4, {r0, r1}
1e: 2204 movs r2, #4
20: e88c 0003 stmia.w ip, {r0, r1}
24: 4629 mov r1, r5
26: eb0d 0002 add.w r0, sp, r2
2a: f7ff fffe bl 0 <memcpy>
2e: 9b01 ldr r3, [sp, #4]
30: 1aed subs r5, r5, r3
32: 882b ldrh r3, [r5, #0]
34: 2b07 cmp r3, #7 I would say that this isn't really an acceptable solution. |
If I do not modify the soffset reader, but change the build target: # original flags
#FLAGS="-O2 -Wall -Wextra -ffreestanding -mcpu=cortex-m7 -mthumb -mabi=aapcs -std=c99"
FLAGS="-O2 -Wall -Wextra -ffreestanding -mthumb -mabi=aapcs -std=c99" then I get the following which looks more sane superficially: 00000000 <interface_encoding_populate_general_rsp>:
0: b5f0 push {r4, r5, r6, r7, lr}
2: b08b sub sp, #44 @ 0x2c
4: 4669 mov r1, sp
6: 4c0f ldr r4, [pc, #60] @ (44 <interface_encoding_populate_general_rsp+0x44>)
8: 000b movs r3, r1
a: 0022 movs r2, r4
c: cae0 ldmia r2!, {r5, r6, r7}
e: c3e0 stmia r3!, {r5, r6, r7}
10: cae0 ldmia r2!, {r5, r6, r7}
12: c3e0 stmia r3!, {r5, r6, r7}
14: cae0 ldmia r2!, {r5, r6, r7}
16: c3e0 stmia r3!, {r5, r6, r7}
18: 6812 ldr r2, [r2, #0]
1a: 601a str r2, [r3, #0]
1c: 888b ldrh r3, [r1, #4]
1e: 2b07 cmp r3, #7 |
I also don't get early stack reference with the original build flags, but cortex-m0 or cortext-m1 instead: FLAGS="-O2 -Wall -Wextra -ffreestanding -mcpu=cortex-m0 -mthumb -mabi=aapcs -std=c99"
#FLAGS="-O2 -Wall -Wextra -ffreestanding -mcpu=cortex-m1 -mthumb -mabi=aapcs -std=c99"
${GCC_PATH}/arm-none-eabi-gcc $FLAGS -o out.obj -c tempfiles.c
|
Regardless of who is to blame, I needed to warn about this issue in the cross compilation section. |
I think technically this line violates strict aliasing, but adding attribute((may_alias)) does not seem to help regardless of where I put it. return __flatbuffers_soffset_cast_from_pe_v2(*(flatbuffers_soffset_v2_t*)p); As mentioned, memcpy seems to work but optimizes poorly. If there is no better way to support reading two bytes from memory without conflicting with the C compiler, I'm not sure it is worth supporting. |
Using __builtin_memcpy static inline flatbuffers_soffset_v2_t __flatbuffers_soffset_read_from_pe_v2(const void* p) {
#if 0
flatbuffers_soffset_v2_t word;
memcpy(&word, p, sizeof(word));
return __flatbuffers_soffset_cast_from_pe_v2(word);
#elif 1
flatbuffers_soffset_v2_t word;
__builtin_memcpy(&word, p, sizeof(word));
return __flatbuffers_soffset_cast_from_pe_v2(word);
#elif 0
flatbuffers_soffset_v2_t word;
word = ((int32_t)((const char *)p)[0]) | ((int32_t)((const char *)p)[1]) << 16 |
((int32_t)((const char *)p)[2]) << 16 | ((int32_t)((const char *)p)[3]) << 24;
return word;
#else
return __flatbuffers_soffset_cast_from_pe_v2(*(flatbuffers_soffset_v2_t*)p);
#endif
} 00000000 <interface_encoding_populate_general_rsp>:
0: b510 push {r4, lr}
2: 4c0a ldr r4, [pc, #40] @ (2c <interface_encoding_populate_general_rsp+0x2c>)
4: b08a sub sp, #40 @ 0x28
6: 4686 mov lr, r0
8: 46ec mov ip, sp
a: cc0f ldmia r4!, {r0, r1, r2, r3}
c: e8ac 000f stmia.w ip!, {r0, r1, r2, r3}
10: cc0f ldmia r4!, {r0, r1, r2, r3}
12: e8ac 000f stmia.w ip!, {r0, r1, r2, r3}
16: e894 0003 ldmia.w r4, {r0, r1}
1a: 2228 movs r2, #40 @ 0x28
1c: e88c 0003 stmia.w ip, {r0, r1}
20: 4669 mov r1, sp
22: 4670 mov r0, lr
24: f7ff fffe bl 0 <memcpy>
28: b00a add sp, #40 @ 0x28
2a: bd10 pop {r4, pc}
2c: 00000000 .word 0x00000000 |
I added a new branch fix-accessors with currently two commits It adds pmemaccess.h to provide __builtin_memcpy and other methods to read and write memory. Note that pmemaccess still uses pointer casts on x86/64 since it has worked so far, and memcpy might not work well on all platforms. There is a related issue with punaligned.h that reads unaligned memory, mostly for JSON parsing. It already does something similar with pointer access on x86/64 and not on other platforms. |
Just for the record: @RafaelLaya commented directly on the fix-accessors commits. Also, I'm going to do some further updates to call mem_copy_read_nn and mem_copy_write_nn directly from flatcc_accessors, instead of calling mem_copy_word directly. This will make it possible to use pointer casts like The behaviour can also be controlled by -DPORTABLE_MEM_PTR_ACCESS=0 (or 1), and also by defining mem_copy_word before header inclusion. This might be relevant on some targets, since this area of operation is extremely performance sensitive, and also code size sensitive. |
I have now update the fix-accessors branch. I needed to add mem_read/write_float_32/64 as well. It passes std CI tests. If you are happy with the design, can you please confirm that it solves the issue on actual -mcu=cortext-m7 -O2 compiled targets, if possible? I still think this is a compiler issue, it is not practical to disallow C code with simple pointer memory access, but at least we appear to have a workable solution that is not too ugly. |
Sorry I reviewed last night and forgot to respond here. I like that you added customizability of mem_copy_word and yes I'm happy with the design. I'm sure from the assembly it should be fixed now, but I'll try to check on a target just in case as you suggested
I do agree the compiler is being extremely aggressive here, undesirably so I didn't feel like I had a strong case to file a gcc bug report, since the first thing they have in bold letter is a warning that if Something that I can't seem to ignore is that we are first initializing the buffer and then passing it into the as_Root() function and then as_Data(). So regardless of any aliasing issues inside as_Root() or as_Data(), it should at the very least finish initialization before doing the calls. Wouldn't you agree? Meaning, I think the compiler has the right to provide UB inside these calls since strict-aliasing is not respected. However, I feel like outside the call, the initialization should've finished before doing the calls that require the buffer as a parameter (regardless of inlining) |
Update: Tested on an M7 Target. The issue is gone now and didn't observe any regressions :) -- Thanks a lot! |
Thanks for testing. I will merge this.
It is a core feature of flatcc to isolate platform dependencies in the portable library, and to run well on many targets, so it is a natural extension of other similar code, like endian conversion and aligned allocation.
I agree. If you read earlier comment of mine I write that I cannot see any scenario where the compiler should dereference uninitialized stack content it is responsible for. Having written some compiler backend logic, I can speculate that the compiler injects its own initializaition into the execution graph, then does liveness checking (dependency tracking), then culls any dependency links that violates strict aliasing, then finds two parallel root executation paths that are equally good: read from stack, or initialize stack. However, this has less to do with optimization of the final output and more to do with how the compiler goes around achieving this. |
Merged. I updated notes in CHANGELOG 0.6.2, README cross compilation section. |
For the record, the following two commits on master implemented the fix: pmemaccess.h: flatcc_accessors.h: The old (by Intel) deprecated Intel ICC compiler stopped working after these changes unless -O3 is dropped to -O2. We are not sure why, and might just deprecate it from FlatCC weekly builds, but for the record, the only other change between successful and broken weekly build is an update to pstdbool.h: 81d4ba6 |
ICC build fixed in commit: 7625a37 A related commit just cleans up bocus ICC warnings: 56d2559 |
Added documentation section: https://github.com/dvidelabs/flatcc?tab=readme-ov-file#strict-aliasing |
bug_for_flatcc.zip
Hi everyone! This is my first bug report contribution into the FlatCC project. Thanks for making such an awesome project! Here are the details:
Files attached
Explanation of the source code provided in tempfiles.c
The code in question is in the function interface_encoding_populate_general_rsp(). writes a message determined at compile-time into an array in the stack, and then finishes the message and copies into a buffer given by the caller.
If one steps through the code, essentially the following happens:
From the initialization of msg_array we can see everything should be initialized correctly, then Response_as_root_v2() and Reponse_data_v2() will work exactly as described above.
Actual Issue
The compiler smartly determines that it needs to look at msg_array[4] and check if it is less than 8. However, it reads msg_array[4] into R5 which has garbage in the stack. Then it initializes msg_array in the stack, and then compares R5 with 7 (branches to assert if less-or-equal). The problem is that by the time the comparison is done, the array is partly initialized and R5 has garbage from the stack before the initialization of the array
a: f8bd 5004 ldrh.w r5, [sp, #4]
e: cc0f ldmia r4!, {r0, r1, r2, r3}
10: e8ac 000f stmia.w ip!, {r0, r1, r2, r3}
14: cc0f ldmia r4!, {r0, r1, r2, r3}
16: 2d07 cmp r5, #7
Schema
The relevant portion looks like this: A table that requires an union inside, and the union has one of many tables
table MyTableResponse{
// Some integers
}
union ResponseData {
// A bunch of tables
/// Generic response (success w/ no payload or error).
MyTable: MyTableResponse,
}
table Response {
data: ResponseData (required);
}
Background
I found this by Aardappel which seems to be an user on stackoverflow who is highly involved with the project:
https://stackoverflow.com/questions/24330925/does-flatbuffers-avoid-strict-aliasing-somehow
Workarounds
The text was updated successfully, but these errors were encountered: