disk_usage() percent doesn't take reserved blocks into account (-5%) #829

Closed
ccztux opened this issue May 31, 2016 · 11 comments

Comments

@ccztux

ccztux commented May 31, 2016

Hi!

As described here: NagiosEnterprises/ncpa#102 the disk usage differs between df command and psutil output.

df -a
Filesystem           1K-blocks     Used Available Use% Mounted on
/dev/mapper/vg_mgmt1-lv_root
                      51475068 20684108  28169520  43% /
python
Python 2.6.6 (r266:84292, Jul 23 2015, 15:22:56)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-11)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import psutil
>>> psutil.disk_usage('/')
sdiskusage(total=52710469632, used=21180530688, free=28845584384, percent=40.200000000000003)

Is there an explanation about this?

@giampaolo
Owner

psutil metrics are expressed in bytes, while df here reports 1K blocks. As for the
percentage, there's a comment in the source code explaining why it's
different, but right now I am on the metro and can't check it.

@eschava

eschava commented May 31, 2016

# NB: the percentage is -5% than what shown by df due to
# reserved blocks that we are currently not considering:
# http://goo.gl/sWGbH

@jomann09
Contributor

jomann09 commented Jun 4, 2016

I'm mostly interested in the percentage, since it isn't accurate when the reserved blocks aren't accounted for. Do you know of any good way to account for this? Is it always 5% off?

@giampaolo
Owner

giampaolo commented Jun 4, 2016

So here are my findings. Linux (not sure about other UNIXes) by default reserves 5% of total disk space for the root user, so that if, say, a user's home directory fills the disk, root and the system still have 5% of the space left to write to /var/log and generally avoid crashing. The reserve also helps avoid disk fragmentation:
http://unix.stackexchange.com/a/7964/168884
psutil currently does not take the reserved blocks into account, which is why its usage percentage is about 5% lower:

# NB: the percentage is -5% than what shown by df due to

The tune2fs utility can be used to find the exact number of reserved blocks, which is what psutil would need in order to calculate the correct percentage:

~/svn/psutil {master}$ sudo tune2fs  -l /dev/sda6 | grep "Reserved block" 
Reserved block count:     4607763

Parsing tune2fs output is a no-no though: I just don't want to rely on it. I tried to take a look at the tune2fs source code, which is available here: http://git.kernel.org/cgit/fs/ext2/e2fsprogs.git. It determines this information via an ext2fs_r_blocks_count() function defined in ./lib/ext2fs/blknum.c, but it's not clear to me how the struct it uses is filled. Relying on the tune2fs library is also a no-no, as it would require installing the C headers separately (sudo apt-get install e2fslibs-dev on Ubuntu).

As for the df utility: df is able to print the right percentage, so it must take the reserved blocks value from somewhere (or "fake it", let's hope not). I tried to strace it (strace df) but I didn't notice anything interesting. The strace output is pretty short, suggesting df doesn't do anything particularly complex, but unfortunately it seems the information doesn't come from the /proc filesystem. I will try to take a look at the df source code later.

Another open question is what to do on other UNIX platforms. Are reserved blocks something that exists on Linux only? If not, how can we determine their value on other UNIX platforms (i.e. is the tune2fs utility available there)?

@giampaolo giampaolo changed the title from "Disk usage differs from df command" to "disk_usage() percent doesn't take reserved blocks into account (-5%)" Jun 4, 2016
@jomann09
Contributor

jomann09 commented Jun 4, 2016

Maybe I am doing my math wrong, but isn't this why the reserved blocks aren't being taken into account? I'm not normally a Python developer, but this seems to make sense.

In the psutil code you currently have: used = (st.f_blocks - st.f_bfree) * st.f_frsize

From the Python docs for 2.7.11 here I see that F_BFREE is actually the total number of free blocks, not the number that can be used by non-superusers. So when you compute total blocks - free = used, you're working with the amount available to root, not the amount available to the user; you'd have to use F_BAVAIL to get that. Is this something that could be triggered with a flag?

Now, I am not sure how reliable F_BAVAIL is, but I am assuming it comes directly from the C library's statvfs.

As a side note, I would think you could use the two values to find the actual reserved space too, without having to add anything to psutil.
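
The side note above can be sketched with os.statvfs alone. This is only an illustration, not psutil code: the helper name split_free_space and the FreeSpace type are hypothetical, and the sketch assumes a POSIX system where f_bfree counts all free blocks and f_bavail counts the blocks available to unprivileged users.

```python
import os
from collections import namedtuple

# Hypothetical result type, not part of psutil.
FreeSpace = namedtuple("FreeSpace", "free_for_root free_for_user reserved")


def split_free_space(st):
    """Split free space (in bytes) from an os.statvfs()-like result."""
    free_for_root = st.f_bfree * st.f_frsize   # every free block
    free_for_user = st.f_bavail * st.f_frsize  # free blocks minus the root reserve
    # The difference is the part of the free space only root may use.
    return FreeSpace(free_for_root, free_for_user, free_for_root - free_for_user)


if __name__ == "__main__":
    print(split_free_space(os.statvfs("/")))
```

On a filesystem without a root reserve the two values should be equal and the reserved figure should come out as 0.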

@giampaolo
Owner

Yes, you are right: statvfs() already gives all the necessary bits. Thanks for figuring this out.
But what is wrong is not how psutil calculates the used space: it's the percentage which is wrong.
If you compare df -h and psutil you'll notice that total, used and free/avail match, while the percentage doesn't (df reports a higher percentage), so we're doing something wrong there.
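
A quick check with the figures from the df output in the original report suggests the gap comes from the denominator rather than from the used value. This is a back-of-the-envelope sketch; the variable names are illustrative only.

```python
# Figures (in 1K blocks) copied from the `df -a` output in the original report.
total_kb = 51475068
used_kb = 20684108
avail_kb = 28169520  # space available to unprivileged users

# Percentage relative to the whole filesystem, reserved blocks included.
pct_of_total = round(used_kb * 100.0 / total_kb, 1)

# Percentage relative to the space a non-root user can actually reach.
pct_of_user_total = round(used_kb * 100.0 / (used_kb + avail_kb), 1)

print(pct_of_total)       # 40.2, the value psutil reported
print(pct_of_user_total)  # 42.3, much closer to the 43% shown by df
```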

@giampaolo
Owner

giampaolo commented Jun 5, 2016

df source code: https://searchcode.com/codesearch/view/17947198/
Aliases for these confusing metrics are created at line 551; the percentage should be calculated later on, at line 610.
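
For orientation, the percentage logic in GNU df appears to divide by used + available and round the result up to the next integer. The sketch below mirrors that logic in Python. It is an approximation written from a reading of the C source, not a transcription of it, and the function name df_use_percent is hypothetical.

```python
def df_use_percent(used, available):
    """Integer Use% in the style of df: used / (used + available), rounded up."""
    nonroot_total = used + available
    if nonroot_total == 0:
        # Assumption: no meaningful percentage exists in this case.
        return None
    u100 = used * 100
    # Integer division, plus 1 if there is any remainder (a ceiling).
    return u100 // nonroot_total + (1 if u100 % nonroot_total else 0)


# Used and Available columns from the df output in the original report.
print(df_use_percent(20684108, 28169520))  # 43
```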

@giampaolo
Owner

OK, I fixed this here: 3dea30d
It was kinda complicated to get all the math right so I made sure to properly comment every value for future reference. Also, the returned values match df -B 1 perfectly (tests here: d5e56ec).

@jomann09
Contributor

jomann09 commented Jun 5, 2016

Awesome, this looks great. Thanks for acting on this so quickly! I'm glad it wasn't as complicated as originally thought!

@giampaolo
Owner

Yeah I'm glad too. It's not complicated because you have all the necessary pieces already in place but the math to calculate useful metrics out of those pieces is kinda hard to get right (it took me all day), suggesting a wrapper on top of os.statvfs is good to have.
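
A minimal sketch of what such a wrapper on top of os.statvfs might look like follows. It is not the code from the commit referenced above: the names DiskUsage, usage_from_statvfs and disk_usage are chosen for illustration, and the one-decimal rounding is an assumption based on the sdiskusage output shown in the original report.

```python
import os
from collections import namedtuple

# Hypothetical result type; field names follow the sdiskusage output in the report.
DiskUsage = namedtuple("DiskUsage", "total used free percent")


def usage_from_statvfs(st):
    """Compute usage figures in bytes from an os.statvfs()-like result."""
    total = st.f_blocks * st.f_frsize
    avail_to_root = st.f_bfree * st.f_frsize   # includes the reserved blocks
    avail_to_user = st.f_bavail * st.f_frsize  # excludes the reserved blocks
    used = total - avail_to_root
    # The denominator leaves out the blocks reserved for root.
    total_user = used + avail_to_user
    percent = round(used * 100.0 / total_user, 1) if total_user else 0.0
    return DiskUsage(total, used, avail_to_user, percent)


def disk_usage(path):
    """Convenience wrapper: stat the filesystem containing `path`."""
    return usage_from_statvfs(os.statvfs(path))


if __name__ == "__main__":
    print(disk_usage("/"))
```

Keeping the arithmetic in a function that takes the statvfs result, rather than a path, makes it possible to test the math with fixed numbers.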

@ccztux
Author

ccztux commented Jun 6, 2016

Thank you for fixing this!

nlevitt added a commit to nlevitt/psutil that referenced this issue Jun 22, 2016