Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x86_64 architecture #9522

Merged
merged 18 commits into from
Jan 11, 2019
Merged

x86_64 architecture #9522

merged 18 commits into from
Jan 11, 2019

Conversation

andyross
Copy link
Contributor

x86_64 architecture support

This patch adds a x86_64 architecture and qemu_x86_64 board to Zephyr.
Only the basic architecture support needed to run 64 bit code is
added; no drivers are added, though a low-level console exists and is
wired to printk().

The support is built on top of a "X86 underkernel" layer, which can be
built in isolation as a unit test on a Linux host.

Limitations:

  • Right now the SDK lacks an x86_64 toolchain. The architecture has
    to be built on a Linux host with ZEPHYR_TOOLCHAIN_VARIANT=host.

  • No x87/SSE/AVX usage is allowed. This is a stronger limitation than
    other architectures where the instructions work from one thread even
    if the context switch code doesn't support it. We are passing
    -no-sse to prevent gcc from automatically generating SSE
    instructions for non-floating-point purposes, which has the side
    effect of changing the ABI. Future work to handle the FPU registers
    will need to be combined with an "application" ABI distinct from the
    kernel one (or just to require USERSPACE).

  • Paging is enabled (it has to be in long mode), but is a 1:1 mapping
    of all memory. No MMU/USERSPACE support yet.

  • We are building with -mno-red-zone for stack size reasons, but this
    is a valuable optimization. Enabling it requires automatic stack
    switching, which requires a TSS, which means it has to happen after
    MMU support.

  • The OS runs in 64 bit mode, but for compatibility reasons is
    compiled to the 32 bit "X32" ABI. So while the full 64 bit
    registers and instruction set are available, C pointers are 32 bits
    long and Zephyr is constrained to run in the bottom 4G of memory.

Bugs to be fixed before merge:

  • The "make run" support for qemu broke recently. This relies on an
    existing qemu_machine_hack.py script to change the ELF binary type
    of the file, which worked for a while and fails now. For now it
    requires a manual conversion with objcopy (which arguably should be
    the mechanism of record even on IAMCU):

    objcopy -O binary zephyr.elf ztmp.bin
    echo '.incbin "ztmp.bin"' | as --32 -c - -o ztmp.o
    gcc -m32 -T linker.cmd -Wl,--build-id=none -nostdlib -nodefaultlibs -nostartfiles -o zephyr.elf -fno-pic ztmp.o

  • All CPUs come up and enter Zephyr, but each is hanging on the first
    context swtich. I have this reproduced in the xuk unit test, will
    fix.

  • Each CPU requires a separate timer interrupt vector because of a
    thinko with the way the xuk IRQ API was written (timer interrupts
    are in some sense "per-CPU devices", but still should share the same
    vector)

  • I need to write up docs for the xuk layer spelling out the way the
    layering works.

@@ -0,0 +1,4 @@
gsource "arch/x86_64/soc/*/Kconfig.soc"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could use a plain source here now. I recently improved how globbing is handled, and gsource is only supported for backwards compatibility now: #9415 (the commit message has a motivation)

Copy link
Collaborator

@ulfalizer ulfalizer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kconfig nit fixed

@codecov-io
Copy link

codecov-io commented Sep 4, 2018

Codecov Report

Merging #9522 into master will increase coverage by 0.06%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #9522      +/-   ##
==========================================
+ Coverage   48.27%   48.33%   +0.06%     
==========================================
  Files         295      295              
  Lines       44283    44290       +7     
  Branches    10601    10606       +5     
==========================================
+ Hits        21377    21408      +31     
+ Misses      18637    18612      -25     
- Partials     4269     4270       +1
Impacted Files Coverage Δ
kernel/stack.c 94.44% <ø> (ø) ⬆️
kernel/queue.c 94.01% <ø> (ø) ⬆️
include/device.h 100% <ø> (ø) ⬆️
kernel/pipes.c 96.62% <ø> (ø) ⬆️
kernel/msg_q.c 96.11% <ø> (ø) ⬆️
kernel/include/kswap.h 100% <ø> (ø) ⬆️
include/arch/bits_portable.h 100% <100%> (ø)
drivers/clock_control/nrf5_power_clock.c 50% <0%> (-2.11%) ⬇️
lib/fdtable.c 28.88% <0%> (-0.33%) ⬇️
... and 4 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4426bfd...c1971a3. Read the comment docs.

@andyross andyross force-pushed the x86_64 branch 3 times, most recently from d1db0d9 to bc4436f Compare September 10, 2018 20:01
@aescolar
Copy link
Member

One silly question: Is this not a bit far away from Zephyr's target of resource constrained embedded systems?

@andyross
Copy link
Contributor Author

Yeah, it's very unlikely anyone will be running a Zephyr app on a PC for production use. The original impetus for this port was to have a virtualized SMP platform for testing under Qemu (right now we can only do SMP on esp32 hardware). It also helps to have a simplified architecture vs. the existing x86 layer which has a bunch of historical cruft. It's also a good testbed to start work on making the Zephyr source 64 bit clean (which isn't happening yet -- this port uses the x32 ABI so pointers are still 32 bit even though the registers are 64 bits wide).

But longer term, who knows. A tiny Zephyr layer running on IA hardware opens the door to "Zephyr as bootloader" and maybe "Zephyr as hypervisor", etc...

@nashif
Copy link
Member

nashif commented Dec 3, 2018

welcome back :)

Andy Ross added 18 commits January 11, 2019 10:09
These files were using z_thread_malloc() without including
kernel_internal.h.  On existing architectures that works due to
transitive includes, but x86_64 has a thinner include layer and
doesn't do it for us.  Include the files required for the APIs we use.

Signed-off-by: Andy Ross <[email protected]>
This was apparently intended to allow for per-arch linker includes,
but no such includes ever existed.  All it does is senselessly throw
an error on unrecognized architectures. Yank.

Signed-off-by: Andy Ross <[email protected]>
The HPET default is to deliver events on the same INTIn as the legacy
PIT IRQ, and in fact our code requires that because it uses the
"legacy routing" option.  So this isn't really a configurable and has
to be set correctly.  Do it right in the kconfig default instead of
forcing boards to set it.

(No, I have no idea where "20" came from either.)

Signed-off-by: Andy Ross <[email protected]>
I was half way through typing up my own one of these when I realized
there was one already in the tree.  Move it to a shared header.

(FWIW: I really doubt that most architectures actually benefit from
their own versions of these tools -- GCC's optimizer is really good,
and custom assembly defeats optimization and factorizations of the
expressions in context.)

Signed-off-by: Andy Ross <[email protected]>
These files were relying on _thread_essential_set() from
kernel_internal.h, but not including it directly.  New architectures
won't transitively include things the same way.

Signed-off-by: Andy Ross <[email protected]>
It's worth using custom timing information on a few systems to save
cycles or gain precision.  But make the use of k_cycle_get_32() a
proper default instead of hardcoding all the platforms and failing to
build on new ones.  On Xtensa and RISC-V (and now x86_64) the cycle
informatoin from that call is a very fast wrapper around the native
counters anyway -- all you would save would be the function call
overhead.

Signed-off-by: Andy Ross <[email protected]>
Host toolchains don't tend to provide an x32 libgcc.  But we don't
actually need one for existing code.  This is fragile, but better to
work for all but obscure cases than break outright.

Signed-off-by: Andy Ross <[email protected]>
If we don't have a detected cross compiler for x86_64, use the host
compiler instead.

Signed-off-by: Andy Ross <[email protected]>
This patch adds a x86_64 architecture and qemu_x86_64 board to Zephyr.
Only the basic architecture support needed to run 64 bit code is
added; no drivers are added, though a low-level console exists and is
wired to printk().

The support is built on top of a "X86 underkernel" layer, which can be
built in isolation as a unit test on a Linux host.

Limitations:

+ Right now the SDK lacks an x86_64 toolchain.  The build will fall
  back to a host toolchain if it finds no cross compiler defined,
  which is tested to work on gcc 8.2.1 right now.

+ No x87/SSE/AVX usage is allowed.  This is a stronger limitation than
  other architectures where the instructions work from one thread even
  if the context switch code doesn't support it.  We are passing
  -no-sse to prevent gcc from automatically generating SSE
  instructions for non-floating-point purposes, which has the side
  effect of changing the ABI.  Future work to handle the FPU registers
  will need to be combined with an "application" ABI distinct from the
  kernel one (or just to require USERSPACE).

+ Paging is enabled (it has to be in long mode), but is a 1:1 mapping
  of all memory.  No MMU/USERSPACE support yet.

+ We are building with -mno-red-zone for stack size reasons, but this
  is a valuable optimization.  Enabling it requires automatic stack
  switching, which requires a TSS, which means it has to happen after
  MMU support.

+ The OS runs in 64 bit mode, but for compatibility reasons is
  compiled to the 32 bit "X32" ABI.  So while the full 64 bit
  registers and instruction set are available, C pointers are 32 bits
  long and Zephyr is constrained to run in the bottom 4G of memory.

Signed-off-by: Andy Ross <[email protected]>
The call to _arch_switch is a giant screaming sign inviting optimizer
bugs.  The code that appears before is what happened long ago when we
were switched out, but the version that EXECUTED just now is actually
in a different thread.  So the assignment to _current before the
switch actually assigned OUR thread (the "new_thread" of the old
context!) to _current.

But obviously the optimizer looks at that code and assumes that the
_current which got assigned to the thread we were switching to long
ago is still correct, and used it when retrieving the swap return
value.

Obviously the real bug here is that the _arch_switch() in question
lacked a memory clobber (and it's getting one).

But we can remove two lines, remove code from inside the interrupt
lock and make the implementation more robust by moving the read to
after the irq_unlock() (which generally also has a memory clobber).

Signed-off-by: Andy Ross <[email protected]>
When tickless was disabled, this inverted test would never fire the
first interrupt and the timer would be silent.  Just remove it.
There's no harm in unconditionally enabling a single timer interrupt
at boot.

Signed-off-by: Andy Ross <[email protected]>
No MPU support there yet.  This test should really be predicated on a
kconfig variable, not architecture.

Signed-off-by: Andy Ross <[email protected]>
This is intended to be a value set by the platform to adjust the size
of stacks created by tests.  This test was setting it explicitly, and
failing to honor it when creating its own stacks.

Signed-off-by: Andy Ross <[email protected]>
This architecture doesn't support stack canaries.  In fact the gcc
-fstack-protect features don't seem to be working at all.  I'm
guessing it's an x32 ABI mismatch?

Signed-off-by: Andy Ross <[email protected]>
Stacks created by tests should add this amount so thread-hungry
architectures can tune it.

Signed-off-by: Andy Ross <[email protected]>
There is actually nothing wrong with this test code idiom.  But it's
tickling a qemu emulator bug with the hpet driver and x86_64[1].  The
rapidly spinning calls to k_uptime_get_32() need to disable
interrupts, read timer hardware state and enable them.  Something goes
wrong in qemu with this process and the timer interrupt gets lost.
The counter blows right past the comparator without delivering its
interrupt, and thus the interrupt won't be delivered until the counter
is next reset in idle after exit from the busy loop, which is
obviously too late to interrupt the timeslicing thread.

Just replace the loops with a single call to k_busy_wait().  The
resulting code ends up being much simpler anyway.  An added bonus is
that we can remove the special case handling for native_posix (which
was an entirely unrelated thing, but with a similar symptom).

[1] But oddly not the same emulated hardware running with the same
driver under the same qemu binary when used with a 32 bit kernel.

Signed-off-by: Andy Ross <[email protected]>
This builds with a host compiler, not one from the SDK, and so no
newlib library is available.  There is work to enable newlib detection
at and above the cmake level.  This patch can be reverted when that
lands.

Signed-off-by: Andy Ross <[email protected]>
These two tests are hitting a stack overflow on x86_64 (not entirely
surprisingly), but can't just increase stack size because there is an
assert in the CMSIS compatibility layer that stacks be under 512
bytes.  Just disable for now.

Signed-off-by: Andy Ross <[email protected]>
@andyross
Copy link
Contributor Author

Rebased and pushed. Just does a simple whitelisting for the newlib stuff. One extra change needed as some CMSIS tests started overflowing their stacks since the last version.

Note again (as the earlier note is lost somewhere above) that this will continue to show two checkpatch errors (also three warnings). These are false positives: it's misparsing the ":" of a bitfield as a label (probably because it's on the same line as an inline enum declaration) and demanding it not have any whitespace before it.

@nashif nashif merged commit c2c9265 into zephyrproject-rtos:master Jan 11, 2019
# When building in x32 mode with a host compiler, there is no libgcc
# shipped (because it's an x86_64 compiler, not x32). That's
# actually non-fatal, as no known features we hit in existing code
# actually require the library. But I can't find an exaustive list
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be more specific after a quick chat with @andyross : "no known libgcc features we hit in 64bits".
On the other hand 32bits x86 definitely requires -lgcc right now, see fix for that in PR #12674

Copy link
Collaborator

@marc-hb marc-hb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I just added -lgcc to fix 32bits x86 in PR #12674, I broke this [qemu_]x86_64 support in the ZEPHYR_TOOLCHAIN_VARIANT=host case. Tentative fix in PR #12805, thanks in advance for reviewing it.

As you can see in my other comments below I'm quite confused about which toolchain configuration(s) this was tested with.

# but falling back to using the host toolchain is a very sane behavior
# too.
if(CONFIG_X86_64)
if(CMAKE_C_COMPILER STREQUAL CMAKE_C_COMPILER-NOTFOUND)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@andyross did you actually test this fallback? I can run it and make it work but only after some serious hacking because the same check in compiler/gcc/generic.cmake runs first and aborts cmake first before this target.cmake file is even looked at.

In the cover letter of this PR you also wrote:

The architecture has to be built on a Linux host with ZEPHYR_TOOLCHAIN_VARIANT=host.

ZEPHYR_TOOLCHAIN_VARIANT=host means no CROSS_COMPILE and none of this compiler/gcc/target.cmake code is used at all! (compiler/host-gcc/target.cmake is used instead)
Were you testing both ZEPHYR_TOOLCHAIN_VARIANT=host and cross-compile?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fallback has just been removed: 33ca224ffdda90

# too.
if(CONFIG_X86_64)
if(CMAKE_C_COMPILER STREQUAL CMAKE_C_COMPILER-NOTFOUND)
find_program(CMAKE_C_COMPILER gcc )
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW find_program() hits the cmake var CACHE first. So if CROSS_COMPILE is incomplete/partially broken then some of these find_program() will have no effect and the CMAKE_* end results will be partially from CROSS_COMPILE and partially from say /usr/bin.

marc-hb added a commit to marc-hb/zephyr that referenced this pull request Jan 31, 2019
PR zephyrproject-rtos#9522 series ending with commit c2c9265 ("tests: cmsis: Disable
two cmsis portability tests on x86_64") added -mx32 support for the
x86_64 ARCH and qemu_x86_64. While this was implemented in
"compiler/gcc/target.cmake" as fall back from cross-compilation to the
host compiler, it worked with a direct ZEPHYR_TOOLCHAIN_VARIANT=host
too.

Later, -lgcc was added to "compiler/host-gcc/target.cmake" by PR zephyrproject-rtos#12674
to fix the -m32 x86 build. This broke the x86_64 build when using
ZEPHYR_TOOLCHAIN_VARIANT=host because even "multilib" packages usually
don't feature the -mx32 version of libgcc.

Fix this by excluding -lgcc in compiler/host-gcc/target.cmake just like
compiler/gcc/target.cmake always did for x86_64.

Signed-off-by: Marc Herbert <[email protected]>
nashif pushed a commit that referenced this pull request Jan 31, 2019
PR #9522 series ending with commit c2c9265 ("tests: cmsis: Disable
two cmsis portability tests on x86_64") added -mx32 support for the
x86_64 ARCH and qemu_x86_64. While this was implemented in
"compiler/gcc/target.cmake" as fall back from cross-compilation to the
host compiler, it worked with a direct ZEPHYR_TOOLCHAIN_VARIANT=host
too.

Later, -lgcc was added to "compiler/host-gcc/target.cmake" by PR #12674
to fix the -m32 x86 build. This broke the x86_64 build when using
ZEPHYR_TOOLCHAIN_VARIANT=host because even "multilib" packages usually
don't feature the -mx32 version of libgcc.

Fix this by excluding -lgcc in compiler/host-gcc/target.cmake just like
compiler/gcc/target.cmake always did for x86_64.

Signed-off-by: Marc Herbert <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants