restore demand paging PR + PC fixes #31564

Merged Jan 24, 2021 (42 commits)
e1a6394
arch: add KERNEL_VM_OFFSET
Dec 18, 2020
e2ec646
linker-defs: add syms for kernel image bounds
Dec 10, 2020
ce23cff
x86: linker: define z_mapped_* symbols
Dec 10, 2020
96cb7e7
arm64: linker: define z_mapped_* symbols
Jan 12, 2021
266d584
tests: x86: pagetables: pass if userspace disabled
Dec 17, 2020
7f7f038
kernel: add CONFIG_ARCH_HAS_RESERVED_PAGE_FRAMES
Nov 18, 2020
7fb757e
kernel: add page frame management
Dec 9, 2020
164edf5
kernel: add k_mem_map() interface
Dec 17, 2020
cb1d1ae
mmu: add k_mem_free_get()
Dec 18, 2020
6a24a8f
x86: reserve the first megabyte
Dec 10, 2020
483da8e
x86: tests: pagetables: fix assumptions
Dec 17, 2020
41afc11
arch: remove KERNEL_RAM_SIZE
Dec 17, 2020
5d125d2
newlib: memory-map the heap, cleanups
Dec 18, 2020
8e07bf2
newlib: clamp max heap size on MMU systems
Jan 14, 2021
0fdd948
x86: only map the kernel image
Dec 10, 2020
cd28d36
qemu_x86_tiny: don't use first megabyte at all
Dec 15, 2020
e9c93b8
x86: pre-allocate address space
Dec 18, 2020
137464a
mmu: arch_mem_map() may no longer fail
Dec 18, 2020
671687d
mmu: ensure gperf data is mapped
Dec 18, 2020
e26f6ad
arch: add CONFIG_DEMAND_PAGING
Nov 16, 2020
c2f4bae
kernel: add demand paging arch interfaces
Nov 17, 2020
66e2676
demand_paging: add infra for demand paging modules
Nov 30, 2020
32ab69a
kernel: add demand paging internal interfaces
Dec 10, 2020
5d5a4e8
kernel: add app-facing demand paging APIs
Dec 9, 2020
d62577e
kernel: add demand paging implementation
Dec 10, 2020
f19583e
tests: mem_map: pin test pages
Dec 10, 2020
9423c13
demand_paging: add NRU algorithm
Dec 9, 2020
a3d6438
demand_paging: add RAM-based demo backing store
Dec 9, 2020
4b21090
x86: implement demand paging APIs
Dec 9, 2020
fb96aa9
qemu_x86_tiny: enable demand paging
Dec 9, 2020
6a4e7c5
mmu: pin the whole kernel
Dec 19, 2020
a6022ce
tests: add basic k_mem_map() test
Jan 11, 2021
bdf5c8b
tests: add initial demand paging testcase
Jan 11, 2021
324bab1
CODEOWNERS: add demand paging subdir
Jan 14, 2021
3061d2e
kernel: add z_num_pagefaults_get()
Jan 14, 2021
b2e3e83
tests: demand_paging: add more API tests
Jan 14, 2021
8667075
mmu: backing stores reserve page fault room
Jan 15, 2021
d612aa9
tests: context: disable if DEMAND_PAGING
Jan 15, 2021
da03a84
mmu: promote public APIs
Jan 21, 2021
05b8a42
kernel: add CONFIG_ARCH_MAPS_ALL_RAM
Jan 23, 2021
481bed3
boards: x86: increase VM size on PC-like
Jan 23, 2021
9cc440e
x86: map all RAM if ACPI
Jan 23, 2021
1 change: 1 addition & 0 deletions CODEOWNERS
@@ -542,6 +542,7 @@
/subsys/dfu/ @nvlsianpu
/subsys/tracing/ @nashif @wentongwu
/subsys/debug/asan_hacks.c @vanwinkeljan @aescolar @daor-oti
/subsys/demand_paging/ @andrewboie
/subsys/disk/disk_access_spi_sdhc.c @JunYangNXP
/subsys/disk/disk_access_sdhc.h @JunYangNXP
/subsys/disk/disk_access_usdhc.c @JunYangNXP
116 changes: 89 additions & 27 deletions arch/Kconfig
@@ -61,6 +61,7 @@ config X86
select ARCH_HAS_GDBSTUB if !X86_64
select ARCH_HAS_TIMING_FUNCTIONS
select ARCH_HAS_THREAD_LOCAL_STORAGE
select ARCH_HAS_DEMAND_PAGING
help
x86 architecture

@@ -517,6 +518,40 @@ config CPU_HAS_MMU
This hidden option is selected when the CPU has a Memory Management Unit
(MMU).

config ARCH_HAS_DEMAND_PAGING
bool
help
This hidden configuration should be selected by the architecture if
demand paging is supported.

config ARCH_HAS_RESERVED_PAGE_FRAMES
bool
help
This hidden configuration should be selected by the architecture if
certain RAM page frames need to be marked as reserved and never used for
memory mappings. The architecture will need to implement
arch_reserved_pages_update().

config ARCH_MAPS_ALL_RAM
bool
help
This hidden option is selected by the architecture to inform the kernel
that all RAM is mapped at boot, and not just the bounds of the Zephyr image.
If RAM starts at 0x0, the first page must remain un-mapped to catch NULL
pointer dereferences. With this enabled, the kernel will not assume that
virtual memory addresses past the kernel image are available for mappings;
instead, it takes the entire RAM mapping into account.

This is typically set by architectures which need direct access to all memory.
It is the architecture's responsibility to mark reserved memory regions
as such in arch_reserved_pages_update().

Although the kernel will not disturb this RAM mapping by re-mapping the associated
virtual addresses elsewhere, this applies only to management of the
virtual address space. The kernel's page frame ontology will not consider
this mapping at all; non-kernel pages will be considered free (unless marked
as reserved) and Z_PAGE_FRAME_MAPPED will not be set.

menuconfig MMU
bool "Enable MMU features"
depends on CPU_HAS_MMU
@@ -533,29 +568,18 @@ config MMU_PAGE_SIZE
support multiple page sizes, put the smallest one here.

config KERNEL_VM_BASE
hex "Base virtual address to link the kernel"
hex "Virtual address space base address"
default $(dt_chosen_reg_addr_hex,$(DT_CHOSEN_Z_SRAM))
help
Define the base virtual memory address for the core kernel.

The kernel expects a mappings for all physical RAM regions starting at
this virtual address, with any unused space up to the size denoted by
KERNEL_VM_SIZE available for memory mappings. This base address denotes
the start of the RAM mapping and may not be the base address of the
kernel itself, but the offset of the kernel here will be the same as the
offset from the beginning of physical memory where it was loaded.

If there are multiple physical RAM regions which are discontinuous in
the physical memory map, they should all be mapped in a continuous
virtual region, with bounds defined by KERNEL_RAM_SIZE.
Define the base of the kernel's address space.

By default, this is the same as the DT_CHOSEN_Z_SRAM physical base SRAM
address from DTS, in which case RAM will be identity-mapped. Some
architectures may require RAM to be mapped in this way; they may have
just one RAM region and doing this makes linking much simpler, as
at least when the kernel boots all virtual RAM addresses are the same
as their physical address (demand paging at runtime may later modify
this for some subset of non-pinned pages).
this for non-pinned page frames).

Otherwise, if RAM isn't identity-mapped:
1. It is the architecture's responsibility to transition the
@@ -568,34 +592,72 @@ config KERNEL_VM_BASE
double-linking of paging structures to make the instruction pointer
transition simpler).

config KERNEL_RAM_SIZE
hex "Total size of RAM mappings in bytes"
default $(dt_chosen_reg_size_hex,$(DT_CHOSEN_Z_SRAM))
Zephyr does not implement a split address space and if multiple
page tables are in use, they all have the same virtual-to-physical
mappings (with potentially different permissions).

config KERNEL_VM_OFFSET
hex "Kernel offset within address space"
default 0
help
Indicates to the kernel the total size of RAM that is mapped. The
kernel expects that all physical RAM has a memory mapping in the virtual
address space, and that these RAM mappings are all within the virtual
region [KERNEL_VM_BASE..KERNEL_VM_BASE + KERNEL_RAM_SIZE).
Offset that the kernel image begins within its address space,
if this is not the same offset from the beginning of RAM.

Some care may need to be taken in selecting this value. In certain
build-time cases, or when a physical address cannot be looked up
in page tables, the equation:

    virt = phys + ((KERNEL_VM_BASE + KERNEL_VM_OFFSET) -
                   SRAM_BASE_ADDRESS)

will be used to convert between physical and virtual addresses for
memory that is mapped at boot.

This is uncommon and is only necessary if the beginning of VM and
physical memory have dissimilar alignment.

config KERNEL_VM_SIZE
hex "Size of kernel address space in bytes"
default 0xC0000000
default 0x800000
help
Size of the kernel's address space. Constraining this helps control
how much total memory can be used for page tables.

The difference between KERNEL_RAM_SIZE and KERNEL_VM_SIZE indicates the
The difference between KERNEL_VM_BASE and KERNEL_VM_SIZE indicates the
size of the virtual region for runtime memory mappings. This is needed
for mapping driver MMIO regions, as well as special RAM mapping use-cases
such as VDSO pages, memory mapped thread stacks, and anonymous memory
mappings.
mappings. The kernel itself will be mapped in here as well at boot.

The system currently assumes all RAM can be mapped in the virtual address
space. Systems with very large amounts of memory (such as 512M or more)
Systems with very large amounts of memory (such as 512M or more)
will want to use a 64-bit build of Zephyr; there are no plans to
implement a notion of "high" memory in Zephyr to work around physical
RAM which can't have a boot-time mapping due to a too-small address space.
RAM size larger than the defined bounds of the virtual address space.

config DEMAND_PAGING
bool "Enable demand paging [EXPERIMENTAL]"
depends on ARCH_HAS_DEMAND_PAGING
help
Enable demand paging. Requires architecture support in how the kernel
is linked and the implementation of an eviction algorithm and a
backing store for evicted pages.

if DEMAND_PAGING
config DEMAND_PAGING_ALLOW_IRQ
bool "Allow interrupts during page-ins/outs"
help
Allow interrupts to be serviced while pages are being evicted or
retrieved from the backing store. This is much better for system
latency, but any code running in interrupt context that page faults
will cause a kernel panic. Such code must work with exclusively pinned
code and data pages.

The scheduler is still disabled during this operation.

If this option is disabled, the page fault servicing logic
runs with interrupts disabled for the entire operation. However,
ISRs may also page fault.
endif # DEMAND_PAGING
endif # MMU

menuconfig MPU
36 changes: 13 additions & 23 deletions arch/x86/Kconfig
@@ -95,6 +95,7 @@ endchoice

config ACPI
bool "ACPI (Advanced Configuration and Power Interface) support"
select ARCH_MAPS_ALL_RAM
help
Allow retrieval of platform configuration at runtime.

@@ -189,29 +190,6 @@ config X86_MMU
and creates a set of page tables at boot time that is runtime-
mutable.

config X86_MMU_PAGE_POOL_PAGES
int "Number of pages to reserve for building page tables"
default 0
depends on X86_MMU
help
Define the number of pages in the pool used to allocate page table
data structures at runtime.

Pages might need to be drawn from the pool during memory mapping
operations, unless the address space has been completely pre-allocated.

Pages will need to drawn from the pool to initialize memory domains.
This does not include the default memory domain if KPTI=n.

The specific value used here depends on the size of physical RAM,
how much additional virtual memory will be mapped at runtime, and
how many memory domains need to be initialized.

The current suite of Zephyr test cases may initialize at most two
additional memory domains besides the default domain.

Unused pages in this pool cannot be used for other purposes.

config X86_COMMON_PAGE_TABLE
bool "Use a single page table for all threads"
default n
@@ -224,6 +202,18 @@ config X86_COMMON_PAGE_TABLE
page tables in place. This is much slower, but uses much less RAM
for page tables.

config X86_MAX_ADDITIONAL_MEM_DOMAINS
int "Maximum number of memory domains"
default 3
depends on X86_MMU && USERSPACE && !X86_COMMON_PAGE_TABLE
help
The initial page tables at boot are pre-allocated, and used for the
default memory domain. Instantiating additional memory domains (only
possible when common page tables are not in use) requires a pool of
free pinned memory pages for constructing page tables.

Zephyr test cases assume 3 additional domains can be instantiated.

config X86_NO_MELTDOWN
bool
help
39 changes: 39 additions & 0 deletions arch/x86/core/fatal.c
@@ -10,6 +10,7 @@
#include <exc_handle.h>
#include <logging/log.h>
#include <x86_mmu.h>
#include <mmu.h>
LOG_MODULE_DECLARE(os, CONFIG_KERNEL_LOG_LEVEL);

#if defined(CONFIG_BOARD_QEMU_X86) || defined(CONFIG_BOARD_QEMU_X86_64)
@@ -359,6 +360,44 @@ static const struct z_exc_handle exceptions[] = {

void z_x86_page_fault_handler(z_arch_esf_t *esf)
{
#ifdef CONFIG_DEMAND_PAGING
if ((esf->errorCode & PF_P) == 0) {
/* Page was non-present at time exception happened.
* Get faulting virtual address from CR2 register
*/
void *virt = z_x86_cr2_get();
bool was_valid_access;

#ifdef CONFIG_X86_KPTI
/* Protection ring is lowest 2 bits in interrupted CS */
bool was_user = ((esf->cs & 0x3) != 0U);

/* Need to check if the interrupted context was a user thread
* that hit a non-present page that was flipped due to KPTI in
* the thread's page tables, in which case this is an access
* violation and we should treat this as an error.
*
* We're probably not locked, but if there is a race, we will
* be fine, the kernel page fault code will later detect that
* the page is present in the kernel's page tables and the
* instruction will just be re-tried, producing another fault.
*/
if (was_user &&
!z_x86_kpti_is_access_ok(virt, get_ptables(esf))) {
was_valid_access = false;
} else
#endif /* CONFIG_X86_KPTI */
{
was_valid_access = z_page_fault(virt);
}
if (was_valid_access) {
/* Page fault handled, re-try */
return;
}
}
#endif /* CONFIG_DEMAND_PAGING */

#if !defined(CONFIG_X86_64) && defined(CONFIG_DEBUG_COREDUMP)
z_x86_exception_vector = IV_PAGE_FAULT;
#endif