Available Xen free memory not used #4891

Closed
Eric678 opened this issue Mar 17, 2019 · 19 comments
Labels
C: core, P: major, r4.0-dom0-stable, r4.1-dom0-cur-test, T: bug

Comments

@Eric678

Eric678 commented Mar 17, 2019

Qubes OS version:

4.0

Affected component(s) or functionality:

core: qmemman? Whoever decides to issue "Not enough memory to start ..."


Steps to reproduce the behavior:

on my 4GB current-testing system (4.8.5 Xen & 4.19 kernel): clamp dom0 memory (dom0_mem=1024M,max=1024M), disable memory balancing on all qubes, start qubes until "Not enough memory to start..." and chop qube size to find where the threshold is.

Expected or desired behavior:

All Xen free memory will be allocated to qubes except XEN_FREE_MEM_LEFT (50MB).

Actual behavior:

The minimum Xen free memory (in xl info) was 655MB (21% of domU space).
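For reference, the 21% figure can be reproduced with a little arithmetic. This is only a sketch: it assumes a nominal 4096 MB of total memory (the usable total reported by xl info is normally a little lower), and the helper name free_pct_of_domu is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: percentage of domU space left as Xen free memory.
# Arguments are in MB: free memory, total memory, dom0 memory.
free_pct_of_domu() {
    free_mb=$1
    total_mb=$2
    dom0_mb=$3
    domu_mb=$(( total_mb - dom0_mb ))
    # Integer arithmetic, so the result is truncated to a whole percent.
    echo $(( free_mb * 100 / domu_mb ))
}

# 655 MB free, nominal 4096 MB total (assumed), 1024 MB dom0
free_pct_of_domu 655 4096 1024
```

With these inputs the domU space is 3072 MB and the script prints 21.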

General notes:

I was trying to track down why more than 1.3G of free memory always remained in my work 16GB system where memory is always at a premium.

In the 4G example qmemman has no one to talk to except dom0 and its memory is fixed so it should not be doing anything and on the 4G system at min free it is not writing anything to the journal.


I have consulted the following relevant documentation:

https://www.qubes-os.org/doc/qmemman/

This topic in qubes-devel started me off: https://groups.google.com/forum/#!topic/qubes-devel/o3ZoOsGPR7o and its subject expresses many new users' frustration - certainly mine!

I am aware of the following related, non-duplicate issues:

@marmarek
Member

One issue I've found, is that qmemman fails to see that dom0 is limited to 1GB (or even 4GB in default setup). It loads dom0 max mem from /local/domain/0/memory/static-max xenstore key, but apparently it is always set to physical memory size.
Try writing 1GB there (or whatever max mem you've set for dom0), in KB:

xenstore-write /local/domain/0/memory/static-max 1048576

You may need to restart qmemman to reload the value.
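Since the key is expressed in KB, the value for any other dom0 limit is just the MB figure times 1024. A minimal sketch of that conversion follows; the function names mib_to_kib and static_max_cmd are hypothetical, and the script only prints the command rather than running it, so it is safe to try outside dom0.

```shell
#!/bin/sh
# Convert a size in MB (as used in dom0_mem=...) to KB (as used in xenstore).
mib_to_kib() {
    echo $(( $1 * 1024 ))
}

# Print, but do not execute, the xenstore-write command for a given
# dom0 limit in MB. Run the printed command in dom0 as root to apply it.
static_max_cmd() {
    printf 'xenstore-write /local/domain/0/memory/static-max %s\n' \
        "$(mib_to_kib "$1")"
}

static_max_cmd 1024
```

For a 1024 MB dom0 this prints the same command as above, with 1048576 as the value.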

@Eric678
Author

Eric678 commented Mar 17, 2019

I did not have to restart qmemman.
This looks like a total game changer! Managed to push Xen free memory down to 68MB.

Not so fast. I tried my main 16GB -current system, which was at its limit with 1367MB free. It has dom0 max mem set to 1.5G, so I wrote that to static-max (checked with xenstore-read). qmemman sprang into life but only got another 30MB for VMs, and there was no further change after restarting qmemman.

@marmarek
Member

Try collecting logs mentioned in #4890 (comment)

andrewdavidwong added the T: bug and C: core labels Mar 17, 2019
andrewdavidwong added this to the Release 4.0 updates milestone Mar 17, 2019
@Eric678
Author

Eric678 commented Mar 17, 2019

System as above, with static-max still at 1.5G. Three VM start attempts: fixed 650MB failed, fixed 600MB failed, and finally fixed 550MB succeeded. I ran xl list between each attempt and before and after; the output is anonymised by cutting 15 chars off each line. Looks like that 1367 free is a hard limit.
logs.txt

@Eric678
Author

Eric678 commented Mar 17, 2019

The 30MB I thought I got above was just memory being shuffled around. As you will note, most VMs are fixed memory, as having too many enabled for balancing sends qmemman bananas: it writes 16+ lines/sec to the dom0 journal and gets nowhere.

@marmarek
Member

Something about dom0 is still odd. Multiple qmemman lines say it's at ~2.8G, while in fact it is below 1.5G. Are you sure the static-max entry is set correctly?

@marmarek
Member

> I now have ~350MB that qmemman refuses to hand out, is all that really required for safety?

That looks suspiciously similar to dom0-mem-boost parameter in /etc/qubes/qmemman.conf (also available in global settings).

> Others have complained about the over generous allocation to dom0 - perhaps some defaults need to be changed - at least a guide on how to tame it.

Generally, OOM in dom0 is quite unpleasant, see #3079. The 4GB default is definitely on the safe side, but a lower default value could be risky depending on the use case (desktop environment, screen size(s), etc.). The wrong static-max entry is clearly a bug, but it mostly applies to cases where one has modified max dom0 mem. Still, having a guide on how to optimize memory usage (including tweaking max mem of VMs and dom0) may be a good idea.

@Eric678
Author

Eric678 commented Mar 18, 2019

> That looks suspiciously similar to dom0-mem-boost parameter in /etc/qubes/qmemman.conf (also available in global settings).

I was wondering what that parameter was* - that and the min qube size were being wound down slowly by Qubes a while back and it has ended up at 185MB (same in qmemman.conf). So it is not that. Any other ideas? Shouldn't it be XEN_FREE_MEM_LEFT?
[* F1 in the GUI and Duck search onsite for the exact label give no information.]

> Generally, OOM in dom0 is quite unpleasant, see #3079. 4GB default is definitely on the safe side, but lower default value could be risky depending on the use case (desktop environment, screen(s) size etc). Wrong static-max entry is clearly a bug, but it's mostly applicable to the cases when one modified max dom0 mem. But having some guide how to optimize memory usage (including tweaking max mem of VMs and dom0) may be a good idea.

I did not read all of that issue but it sounded at the start like it might have been triggered by downward ballooning in dom0 - my reading on best practice with Xen says not to allow that and always run a fixed size dom0. I have configured my next boot for 1800M fixed. Where is the best place to script up the patch to static-max during boot?

This does bring up another point that I have been pondering while the qmemman issue unfolded: Is it possible that qmemman opens up a side-channel attack vector? Given ITL's awareness of these issues, I have to assume that it must be OK. I may be paranoid, however I feel more comfortable with dom0 and any sensitive VMs being excluded from memory balancing. Perhaps a note for this memory guide. Fixed dom0 should avoid any fast memory surprises so that the very well over provisioned swap will cope.

On the 4GB default I did notice that it did not change on a 4GB machine.

[ed] Deleted my previous post: too much information. Summary: Marek was right, and adjusting static-max on the fly fixed my problem, except that ~350MB is now the minimum Xen free memory.

@Eric678
Author

Eric678 commented Mar 19, 2019

The bug is in /usr/lib/qubes/startup-misc.sh on line 6, replacing

DOM0_MAXMEM=`/usr/sbin/xl info | grep total_memory | awk '{ print $3 }'`

with

DOM0_MAXMEM=`/usr/sbin/xl list 0 | awk '{ getline; print $3 }'`

picks up the size of dom0 running during startup. In my testing on fixed-size dom0s, that is the same number dom0_mem was set to at boot for larger sizes, and a few MB under for smaller settings (under ~1GB). The shortfall seems to have the effect of dom0 being trimmed further by that amount over the next 10+ minutes. I have never looked at Xen before and could not find where the allocated size at startup is expressed internally... over to Marek.
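To make the difference between the two one-liners concrete, here is a sketch that runs the same grep/awk pipelines over canned text instead of a live xl. The sample output is illustrative only (the numbers and column layout are assumptions modelled on a 16GB host with a 1536 MB dom0), and the function names are hypothetical.

```shell
#!/bin/sh
# Illustrative stand-in for `xl info` output (values are made up).
sample_xl_info() {
    cat <<'EOF'
total_memory           : 16273
free_memory            : 1367
EOF
}

# Illustrative stand-in for `xl list 0` output (values are made up).
sample_xl_list() {
    cat <<'EOF'
Name                ID   Mem VCPUs State   Time(s)
Domain-0             0  1536     4 r-----    123.4
EOF
}

# Old pipeline: third field of the total_memory line, i.e. all host RAM.
old_dom0_maxmem() {
    grep total_memory | awk '{ print $3 }'
}

# New pipeline: getline skips the header row, then the third field of the
# Domain-0 row is printed, i.e. the memory dom0 currently has.
new_dom0_maxmem() {
    awk '{ getline; print $3 }'
}

echo "old: $(sample_xl_info | old_dom0_maxmem) MB"
echo "new: $(sample_xl_list | new_dom0_maxmem) MB"
```

On the sample text the old pipeline yields 16273 and the new one 1536, which is the gap qmemman was treating as memory dom0 could still grow into.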

Both my main and test systems are running SO much better for the last couple of days now than with the default configuration. Sold on fixed size dom0. :-)

@MystesofEternity

Wait, what? So the fact that I only have 7-8 GB available out of my 12 GB of memory after I start qubes was a bug after all? I thought that was normal all this time.

@tasket

tasket commented Mar 21, 2019

@Eric678 Thanks! I've had dom0 limited to 1500MB for many months and might never have noticed there was something wrong with qmemman dom0 values. Look forward to trying this.

@Eric678
Author

Eric678 commented Mar 22, 2019

@tasket thank you for your posts in the topic in qubes-devel linked to above that led me down this path.
BTW thank you for your qubes-vpn-support scripts, well done! Firing up my VPNs was one of the many positive surprises when setting up Qubes - a real FMIW* moment! 😆

* f*** me it worked

andrewdavidwong added the P: major label Mar 22, 2019
@Eric678
Author

Eric678 commented Mar 24, 2019

I had a few spare hours so decided to pull down the sources and have a look at qmemman. I have never looked at a Python program before, so bear that in mind. It is relatively small (good, I hate complexity), however I have to say it looks like it was written by trial and error. I was trying to figure out how to stop dom0 from being added to domdict, effectively an include/exclude dom0 from memory balancing checkbox. No luck; I expected to find it in __init__.py. So an "Include dom0 in memory balancing" checkbox in global settings is a separate request, since it looks like qmemman is going to get a workover for 4.1. Even though the algorithm is reasonably simple, the implementation becomes messy and is hard to prove correct. The reason dom0 needs exclusion is that, even with a fixed setting from boot, qmemman manages to chip away at dom0 slowly when memory is full (probably some rounding error), and some of it may be returned when memory is freer. The memory that is taken from dom0 is not made available to other VMs; it just increases the Xen free memory minimum.

Since no one has reassured me on the side-channel issue I have to assume that it is real and that is why I have heard nothing, damage control, and my email address seems to have been taken off the qubes-users whitelist. 😒 I am not a security professional either, bear that in mind. Only picked up Qubes from 4.01 and still finding my feet.

I am getting a bit of a queasy feeling, all of the above and then thinking that it was OK to stop and start qmemman, tried leaving it stopped, thinking I needed a big switch to get it out of the picture, shut down a qube and it disappeared from Xen and stayed running in Qubes. OK fine, so started qmemman and Qubes crapped on itself, the domain widget went into an inf startup loop - throwing an exception for the first domain in its list - <VMname>.icon nonexistent property. Used 50% of 1 core in dom0 and was still doing this after a reboot. The old "fixed the taillight and the front bumper fell off". 😲

Seems like little or no stress testing is done on Qubes - if there is, it is not working.

On the size of dom0, after working out how to easily track memory and swap real instantaneous demands, the peak usage requirements in dom0 are, unsurprisingly, during updates. A 4GB system seems good at 900M dom0 and it looks like 1300M is enough for 16GB with up to 20 VMs, which still has ~360MB unusable Xen free memory, not too bad. BTW I found the dom0-mem-boost mentioned above in the code and it has nothing to do with the free memory left - just boosts the prefmem dom0 would like to have according to the algorithm when balancing.
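The comment does not say how memory and swap demand were tracked. One possible approach, shown here purely as an assumption and not as the method actually used, is to read /proc/meminfo in dom0 and add memory in use to swap in use. The function name mem_and_swap_used_mb is hypothetical.

```shell
#!/bin/sh
# Reads /proc/meminfo-style text on stdin and prints memory in use plus
# swap in use, in MB. Values in /proc/meminfo are given in kB.
mem_and_swap_used_mb() {
    awk '
        $1 == "MemTotal:"     { total  = $2 }
        $1 == "MemAvailable:" { avail  = $2 }
        $1 == "SwapTotal:"    { stotal = $2 }
        $1 == "SwapFree:"     { sfree  = $2 }
        END { printf "%d\n", ((total - avail) + (stotal - sfree)) / 1024 }
    '
}

# Usage on a Linux system such as dom0; skipped where the file is absent.
if [ -r /proc/meminfo ]; then
    mem_and_swap_used_mb < /proc/meminfo
fi
```

Sampling this periodically during an update and keeping the highest value would give a rough peak-demand figure to size dom0 against.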

Need to get my dom0 X killed cheat sheet ready.

@tasket

tasket commented May 5, 2019

@marmarek After such a brief time with a nicely working Qubes system, the awful memory allocation is back with the update from qubes-core-dom0 4.0.41 to 4.0.42.

Max allocation (from xentop sums) is back to 6750MB, about 1GB lower than I was getting before the last update.

Can anyone explain how my startup-misc.sh reverted to using xl info when the qubes-core-admin master branch shows the fixed revision using xl list?!

@rustybird

> Can anyone explain how my startup-misc.sh reverted to using xl info when the qubes-core-admin master branch shows the fixed revision using xl list?!

The fix hasn't been cherry-picked into the release4.0 branch yet.

@tasket

tasket commented May 6, 2019

OK, so this issue was flagged as major and the fix hasn't made it to testing in nearly a month........

marmarek added a commit to QubesOS/qubes-core-admin that referenced this issue May 6, 2019
This value needs to be set to actual static max for qmemman to work
properly. If it's set higher than real static-max, qmemman will try to
assign more memory to dom0, which dom0 could not use - will be wasted.
Since this script is executed before any VM is started, simply
take the current dom0 memory usage, instead of parsing dom0_mem Xen
argument. There doesn't seem to be nice API to get this value from Xen
directly.

Fixes QubesOS/qubes-issues#4891

(cherry picked from commit 56ec271)
@qubesos-bot

Automated announcement from builder-github

The package qubes-core-dom0-4.0.43-1.fc25 has been pushed to the r4.0 testing repository for dom0.
To test this update, please install it with the following command:

sudo qubes-dom0-update --enablerepo=qubes-dom0-current-testing

Changes included in this update

@qubesos-bot

Automated announcement from builder-github

The package qubes-core-dom0-4.0.43-1.fc25 has been pushed to the r4.0 stable repository for dom0.
To install this update, please use the standard update command:

sudo qubes-dom0-update

Or update dom0 via Qubes Manager.

Changes included in this update

@qubesos-bot

Automated announcement from builder-github

The package qubes-core-dom0-4.1.1-1.fc29 has been pushed to the r4.1 testing repository for dom0.
To test this update, please install it with the following command:

sudo qubes-dom0-update --enablerepo=qubes-dom0-current-testing

Changes included in this update

7 participants