Can't run anything with 1.01G of memory #1050

Closed
nyh opened this issue Aug 21, 2019 · 3 comments

@nyh
Contributor

nyh commented Aug 21, 2019

I can easily run the "rogue" image (for example) with as little as 40M of memory, so unsurprisingly I have no problems running it with 1G, 2G, 4G or 8G of memory.

But when I try with 1.01G, I get this crash. Note that the crash happens before running the actual application, so it happens on every image - not just "rogue" (I first saw it on tst-huge.so which I was testing for #1049):

$ scripts/run.py -m1.01G
OSv v0.53.0-88-g5377a50b
Assertion failed: ef->rflags & processor::rflags_if (arch/x64/mmu.cc: page_fault: 34)
Halting.

The gdb backtrace (I'm leaving out all the nested problems that happen after the first problem and confuse the situation further):

#16 0x000000004039e3ce in general_protection (ef=0x40968e18)
    at arch/x64/exceptions.cc:320
#17 <signal handler called>
#18 mmu::hw_ptep_impl<0>::write (this=<optimized out>, pte=...)
    at include/osv/mmu-defs.hh:203
#19 mmu::linear_page_mapper::page<0> (this=<optimized out>, 
    this=<optimized out>, ptep=..., offset=18446744073709548160)
    at core/mmu.cc:469
#20 mmu::page<mmu::linear_page_mapper, 0> (ptep=..., 
    offset=18446744073709548160, pops=...) at core/mmu.cc:311
#21 mmu::map_level<mmu::linear_page_mapper, 1>::operator() (
    base_virt=18446603337305616384, parent=..., this=<synthetic pointer>)
    at core/mmu.cc:445
#22 mmu::map_level<mmu::linear_page_mapper, 2>::map_range<1> (
    this=<synthetic pointer>, ptep=..., base_virt=18446603337305423872, 
    slop=<optimized out>, page_mapper=..., size=4096, vcur=<optimized out>)
    at core/mmu.cc:399
#23 mmu::map_level<mmu::linear_page_mapper, 2>::operator() (
    base_virt=18446603337305423872, parent=..., this=<synthetic pointer>)
    at core/mmu.cc:449
#24 mmu::map_level<mmu::linear_page_mapper, 3>::map_range<2> (
    this=<synthetic pointer>, ptep=..., base_virt=<optimized out>, 
    slop=<optimized out>, page_mapper=..., size=<optimized out>, 
    vcur=<optimized out>) at core/mmu.cc:399
#25 mmu::map_level<mmu::linear_page_mapper, 3>::operator() (
    base_virt=<optimized out>, parent=..., this=<synthetic pointer>)
    at core/mmu.cc:449
#26 mmu::map_level<mmu::linear_page_mapper, 4>::map_range<3> (
    this=0xffff8000001dce00, ptep=..., base_virt=<optimized out>, 
    slop=<optimized out>, page_mapper=..., size=<optimized out>, 
    vcur=<optimized out>) at core/mmu.cc:399
#27 mmu::map_level<mmu::linear_page_mapper, 4>::operator() (
    this=this@entry=0xffff8000001dce70, parent=..., base_virt=<optimized out>, 
    base_virt@entry=0) at core/mmu.cc:449
#28 0x00000000403438ca in mmu::map_range<mmu::linear_page_mapper> (
    slop=<optimized out>, page_mapper=..., size=627, 
    vstart=5261997133009241397, vma_start=5261997133009241397)
    at include/osv/mmu-defs.hh:251
#29 mmu::linear_map (_virt=_virt@entry=0xffff800040a2fd80, 
    addr=addr@entry=1084423552, size=size@entry=627, slop=<optimized out>, 
    slop@entry=4096, mem_attr=mem_attr@entry=mmu::mattr::normal)
    at core/mmu.cc:1850
#30 0x00000000403a34a8 in dmi_table (num=12, len=627, base=1084423552)
    at arch/x64/dmi.cc:56
#31 smbios_decode (p=0xffff8000000f5cb0 "_SM_\022\037\002\bg")
    at arch/x64/dmi.cc:125
#32 dmi_probe () at arch/x64/dmi.cc:140
#33 0x00000000403a1125 in osv::firmware_probe () at arch/x64/firmware.cc:16
#34 0x000000004022b8e8 in main_cont (loader_argc=0, 
    loader_argv=0xffffa00000a08078) at loader.cc:554
#35 0x00000000403f4a47 in sched::thread_main_c (t=0x40963df0)
    at arch/x64/arch-switch.hh:271
#36 0x000000004039cbf3 in thread_main () at arch/x64/entry.S:113

It seems we have a bug in linear_map() when the memory is a tiny bit over 1GB?
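The decimal values gdb prints in the backtrace are easier to read in hex. The following is a small sketch that decodes them. The name `PHYS_MAP_BASE` is my own label, and its value is an assumption inferred from `_virt - addr` in frame #29, not something taken from the OSv source.

```python
# Decode the decimal values from frames #19-#30 of the backtrace.
MASK64 = (1 << 64) - 1
GIB = 1 << 30

# Assumption: base of the linear mapping, inferred from frame #29 (_virt - addr).
PHYS_MAP_BASE = 0xffff800000000000


def to_signed64(value):
    """Interpret an unsigned 64-bit value as two's-complement signed."""
    value &= MASK64
    return value - (1 << 64) if value >> 63 else value


addr = 1084423552                 # frames #29/#30: physical base of the DMI table
virt = 0xffff800040a2fd80         # frame #29: _virt
base_virt = 18446603337305616384  # frame #21
offset = 18446744073709548160     # frames #19/#20

print("addr      =", hex(addr))
print("virt-addr =", hex(virt - addr))
print("base_virt =", hex(base_virt))
print("offset    =", to_signed64(offset))
print("addr is", addr - GIB, "bytes above 1 GiB")
```

So the DMI table being mapped sits at physical 0x40a2fd80, roughly 10 MiB above the 1GB mark, and the `offset` seen in frames #19/#20 is a small negative number when read as a signed value.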

I don't know if this is a recent regression or a very old bug - I'm not sure I ever specifically tried to run with 1.01GB of memory.

@wkozaczuk
Collaborator

I have narrowed it down to a specific range of memory sizes: 1025M to 1054.1M. Running an app with 1024M or less, or with 1054.2M or more, works.

I also see the exact same behavior on firecracker (it does not use ACPI, which I suspected might somehow collide with how we map memory).

With 1054.1M of memory, OSv adds the following free ranges, in this order:

---> free_initial_memory_range: addr: ffff80000093e000
---> free_initial_memory_range: size: 000000003f6c2000

---> free_initial_memory_range: addr: ffff800000000000
---> free_initial_memory_range: size: 000000000009f000

---> free_initial_memory_range: addr: ffff800000100000
---> free_initial_memory_range: size: 0000000000100000

---> free_initial_memory_range: addr: ffff800040000000
---> free_initial_memory_range: size: **0000000001dee000**

With 1054.2M:

---> free_initial_memory_range: addr: ffff80000093e000
---> free_initial_memory_range: size: 000000003f6c2000

---> free_initial_memory_range: addr: ffff800000000000
---> free_initial_memory_range: size: 000000000009f000

---> free_initial_memory_range: addr: ffff800000100000
---> free_initial_memory_range: size: 0000000000100000

---> free_initial_memory_range: addr: ffff800040000000
---> free_initial_memory_range: size: **0000000001e0e000**

The only difference is the size of the last range, which is around 31-32 MB.
When one passes 1024M or less, there are only 3 ranges.
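To make the difference between the two runs concrete, here is a minimal sketch that compares the sizes of the last free range from the two logs above. Only the two hex sizes come from the debug output; the variable names are mine.

```python
MIB = 1 << 20

size_failing = 0x0000000001dee000  # last range with 1054.1M (crashes)
size_working = 0x0000000001e0e000  # last range with 1054.2M (works)

for label, size in (("1054.1M", size_failing), ("1054.2M", size_working)):
    # Print both decimal MB and binary MiB, since the two are easy to confuse.
    print(f"{label}: {size / 1e6:.2f} MB = {size / MIB:.2f} MiB")

delta = size_working - size_failing
print("difference:", delta // 1024, "KiB")
```

The two sizes differ by only 128 KiB, so the boundary between crashing and working is narrow, and the last range crosses the 30 MiB mark between the two runs.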

@wkozaczuk
Collaborator

wkozaczuk commented Aug 23, 2019

This looks more and more like some sort of bug that corrupts page tables.

When I put a breakpoint right after the memory is set up, at the end of arch-setup.cc:setup_free_memory(), I can print the value at the 1GB address like so:

(gdb) p *0xffff800040000000
$1 = 0

However, if I continue and let it crash I see this in gdb:

(gdb) p *0xffff800040000000
Cannot access memory at address 0xffff800040000000
(gdb) p *(0xffff800040000000-0x8)
$1 = 9764864

So clearly the address 0xffff800040000000 was mapped but somehow got unmapped? Or in general everything above 1GB is not accessible anymore. Corrupt page tables?
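One way to see why "everything above 1GB" would disappear together: with standard x86-64 4-level paging, the readable and unreadable addresses above fall under different level-3 (PDPT) entries. The sketch below only splits the two addresses into their page-table indices. It does not show which entry was actually corrupted, and the helper name `pt_indices` is hypothetical.

```python
def pt_indices(vaddr):
    """Split an x86-64 virtual address into 4-level paging indices (9 bits each)."""
    return {
        "pml4": (vaddr >> 39) & 0x1ff,
        "pdpt": (vaddr >> 30) & 0x1ff,  # each PDPT entry covers 1 GiB
        "pd": (vaddr >> 21) & 0x1ff,    # each PD entry covers 2 MiB
        "pt": (vaddr >> 12) & 0x1ff,    # each PT entry covers 4 KiB
        "offset": vaddr & 0xfff,
    }


unreadable = 0xffff800040000000  # "Cannot access memory" after the crash
readable = unreadable - 0x8      # still readable after the crash

print("unreadable:", pt_indices(unreadable))
print("readable:  ", pt_indices(readable))
```

Both addresses share the same PML4 entry (index 256), but the readable one is under PDPT index 0 and the unreadable one under PDPT index 1. That is at least consistent with a damaged entry somewhere on the path that covers the second gigabyte.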

Running with 1025M of memory

@wkozaczuk
Collaborator

I have sent a patch that fixes this issue and #1049.

Please see this debug output when setting up memory before the patch:

free_initial_memory_range: addr: ffff800000000000
free_initial_memory_range: size: 000000000009f000

free_initial_memory_range: addr: ffff800000100000
free_initial_memory_range: size: 0000000000100000

arch_setup_free_memory, linear_map for base: ffff800000000000

// This is where the corresponding physical address was 0 and 
// most likely was allocated for a page table
page_range_allocator:alloc() - returned: ffff800000000000 
--------------------------------------------------------------------------------

arch_setup_free_memory, linear_map for base: ffff900000000000
page_range_allocator:alloc() - returned: ffff800000001000

arch_setup_free_memory, linear_map for base: ffffa00000000000
page_range_allocator:alloc() - returned: ffff800000002000

free_initial_memory_range: addr: ffff800040000000
free_initial_memory_range: size: 0000000000100000
page_range_allocator:alloc() - returned: ffff800000003000

After patch:

free_initial_memory_range: addr: ffff800000000000
free_initial_memory_range: size: 000000000009f000

free_initial_memory_range: addr: ffff800000100000
free_initial_memory_range: size: 0000000000100000

arch_setup_free_memory, linear_map for base: ffff800000000000

// This time the corresponding physical address was 0x1000
page_range_allocator:alloc() - returned: ffff800000001000 
--------------------------------------------------------------------------------

arch_setup_free_memory, linear_map for base: ffff900000000000
page_range_allocator:alloc() - returned: ffff800000002000

arch_setup_free_memory, linear_map for base: ffffa00000000000
page_range_allocator:alloc() - returned: ffff800000003000

free_initial_memory_range: addr: ffff800040000000
free_initial_memory_range: size: 0000000000100000
page_range_allocator:alloc() - returned: ffff800000004000
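To compare the two logs side by side, here is a rough sketch that converts the addresses returned by page_range_allocator:alloc() into physical addresses. It assumes the linear mapping places physical address P at 0xffff800000000000 + P, which matches the comments in the debug output. `PHYS_MAP_BASE` and `phys` are names I made up for illustration, and the actual change is in the commit, not shown here.

```python
# Assumption: virtual = PHYS_MAP_BASE + physical for the linear mapping.
PHYS_MAP_BASE = 0xffff800000000000

# Addresses returned by page_range_allocator:alloc(), copied from the logs above.
before_patch = [0xffff800000000000, 0xffff800000001000,
                0xffff800000002000, 0xffff800000003000]
after_patch = [0xffff800000001000, 0xffff800000002000,
               0xffff800000003000, 0xffff800000004000]


def phys(vaddr):
    """Translate a linear-map virtual address back to its physical address."""
    return vaddr - PHYS_MAP_BASE


before_phys = [phys(v) for v in before_patch]
after_phys = [phys(v) for v in after_patch]

print("before:", [hex(p) for p in before_phys])
print("after: ", [hex(p) for p in after_phys])
print("physical page 0 handed out before patch:", 0 in before_phys)
print("physical page 0 handed out after patch: ", 0 in after_phys)
```

Before the patch the first allocation is the page at physical address 0. After the patch every allocation is shifted up by one page, so physical page 0 is no longer handed out.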

nyh closed this as completed in fb7ef9a on Aug 24, 2019