Set KT2_ROUNDS to 1 and KT2_CHUNKS to 1000 to rule out that the issue is caused by the unmapping/deallocation
The testcase will either trigger an assertion in the PMM or lock up
(optional) Fix the srand seed to one where you repeatedly observe the assertions
Expected behavior
The testcase just passes. Throwing more memory at the issue, by increasing .qemu_config:QEMU_RAM, has seemingly no impact.
Additional context
After a certain number of allocations, mfn_invalid suddenly starts to return false. This is caused by the underlying mbi_get_memory_range call. If you print out the _start and _end variables, you will notice that these ranges change shortly before the assertion gets triggered. However, the elements of multiboot_mmap are never actively modified once initialized.
Patch:
diff --git a/arch/x86/boot/multiboot.c b/arch/x86/boot/multiboot.c
--- a/arch/x86/boot/multiboot.c
+++ b/arch/x86/boot/multiboot.c
@@ -234,12 +234,15 @@ int mbi_get_avail_memory_range(unsigned index, addr_range_t *r) {
int mbi_get_memory_range(paddr_t pa, addr_range_t *r) {
paddr_t _start, _end;
+ printk("%s()\n", __func__);
for (unsigned int i = 0; i < multiboot_mmap_num; i++) {
multiboot2_memory_map_t *entry = &multiboot_mmap[i];
_start = _paddr(entry->addr);
_end = _paddr(_start + entry->len);
+ printk("start: 0x%lx\n", _start);
+ printk("end: 0x%lx\n", _end);
if (pa >= _start && pa < _end)
goto found;
To better localize the issue, we put many BUG_ON(mfn_invalid(mfn)) checks into the pagetable code. The two interesting checks are the following: the first one passes, while the second one triggers the assertion:
diff --git a/arch/x86/pagetables.c b/arch/x86/pagetables.c
--- a/arch/x86/pagetables.c
+++ b/arch/x86/pagetables.c
@@ -250,7 +250,9 @@ static mfn_t get_pgentry_mfn(mfn_t tab_mfn, pt_index_t index, unsigned long flag
mfn = frame->mfn;
set_pgentry(entry, mfn, flags);
tab = tmp_map_mfn(mfn);
+ BUG_ON(mfn_invalid(tab_mfn));
clean_pagetable(tab);
+ BUG_ON(mfn_invalid(tab_mfn));
}
else {
/* Page table already exists but its flags may conflict with our. Maybe fixup */
Based on this and the values that _start and _end take, it seems that clean_pagetable cleans a frame that is used by the memory manager itself. So a frame that is already used seems to be given out by the pmm. We have not found the exact cause of this yet.
Props to Sandro Rüegge (@sparchatus) for helping me debug this.
Analysis (take with a grain of salt, I am not experienced with kernels)
The multiboot implementation uses the physical memory it is passed directly, without any copying. Hence that area of memory would need to be reserved in some form in the physical memory manager. While the multiboot image does map itself, it does not reserve the physical frames themselves.
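The suspected failure mode can be reproduced in miniature. The following is a minimal user-space sketch, not KTF code: `phys_mem`, `mmap_entry_t`, `pa_in_map`, `clean_frame`, and `demo_corruption` are hypothetical names, and the "memory map" is a single entry stored in place inside a simulated frame. It only illustrates how zeroing a frame that still holds the map changes later lookups even though nobody writes to the map on purpose.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE  4096u
#define NUM_FRAMES 8u
#define MAP_FRAME  2u /* assumption: the boot-time map happens to live in frame 2 */

/* Hypothetical stand-in for one bootloader memory map entry. */
typedef struct {
    uint64_t addr;
    uint64_t len;
} mmap_entry_t;

/* Simulated physical memory; frame N starts at offset N * PAGE_SIZE. */
static uint8_t phys_mem[NUM_FRAMES * PAGE_SIZE];

/* Read the map entry in place, as if it was never copied elsewhere. */
static mmap_entry_t read_entry(void) {
    mmap_entry_t e;
    memcpy(&e, &phys_mem[MAP_FRAME * PAGE_SIZE], sizeof(e));
    return e;
}

/* Store the map entry at its in-place location. */
static void write_entry(mmap_entry_t e) {
    memcpy(&phys_mem[MAP_FRAME * PAGE_SIZE], &e, sizeof(e));
}

/* Returns 1 if pa lies inside the range described by the map entry. */
static int pa_in_map(uint64_t pa) {
    mmap_entry_t e = read_entry();
    return pa >= e.addr && pa < e.addr + e.len;
}

/* What a page table cleaner does to a freshly allocated frame. */
static void clean_frame(unsigned frame) {
    memset(&phys_mem[frame * PAGE_SIZE], 0, PAGE_SIZE);
}

/* Returns 1 if the lookup result flips after the map's frame is reused. */
static int demo_corruption(void) {
    mmap_entry_t e = { 0, (uint64_t) NUM_FRAMES * PAGE_SIZE };
    write_entry(e);

    int before = pa_in_map(0x1000); /* inside [0, 0x8000) */
    clean_frame(MAP_FRAME);         /* allocator hands out the unreserved map frame */
    int after = pa_in_map(0x1000);  /* entry is now addr = 0, len = 0 */

    return before == 1 && after == 0;
}
```

In this sketch the lookup for the same address succeeds before the frame is cleaned and fails afterwards, which matches the observation that the printed ranges change without any explicit write to the map.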
Potential Solutions?
1. Make multiboot.c remove the multiboot range from the available memory ranges.
2. Pass the range to init_pmm and prevent the frames from getting added.
3. Actively reserve the range in pmm later (if there are a lot of frames added in init_pmm this might be too late though).
Option 2 sounds the most straightforward to me. I can make a pull request, but I'd appreciate some feedback on whether it is a viable solution.
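As a rough illustration of the second option, the sketch below skips every frame that overlaps a reserved range while frames are being collected from an available range. It is a standalone sketch under stated assumptions: `range_t`, `frame_overlaps`, and `add_frames_excluding` are hypothetical names, and the real init_pmm signature and free-list handling in the kernel are not shown here.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 4096ull

/* Half-open physical address range [start, end). Hypothetical type. */
typedef struct {
    uint64_t start;
    uint64_t end;
} range_t;

/* Returns 1 if the frame starting at frame_pa overlaps range r. */
static int frame_overlaps(uint64_t frame_pa, const range_t *r) {
    return frame_pa < r->end && frame_pa + PAGE_SIZE > r->start;
}

/*
 * Collects the page-aligned frames of avail that do not overlap reserved.
 * Writes at most max_out frame addresses to out and returns how many
 * were written. Partial frames at either end of avail are skipped.
 */
static unsigned add_frames_excluding(const range_t *avail, const range_t *reserved,
                                     uint64_t *out, unsigned max_out) {
    unsigned n = 0;
    /* Round the start up to the next page boundary. */
    uint64_t pa = (avail->start + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1);

    for (; pa + PAGE_SIZE <= avail->end && n < max_out; pa += PAGE_SIZE) {
        if (frame_overlaps(pa, reserved))
            continue; /* frame holds boot data: never hand it out */
        out[n++] = pa;
    }
    return n;
}
```

With an available range of [0x0, 0x8000) and a reserved range of [0x2800, 0x3100), the frames at 0x2000 and 0x3000 are skipped and six frames remain. Filtering at this point means a reserved frame never enters the allocator, so there is no window in which it could be handed out before a later reservation step runs.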
Yes, option 2 is the way to go here. I have not had time to look into the details yet, but I am looking forward to discussing this further over your PR.
Describe the bug
There is a bug in the VMM/PMM where it corrupts its state by invalidating a frame that is still mapped somewhere else.
To Reproduce
test_kernel_task_func2 demonstrates the issue (Unittest That Stress-Tests the VMM and PMM #341)