
Debian package is slow to install on some devices, may cause timeouts. Related to node_modules size #88138

Closed
bnewbold opened this issue Jan 13, 2021 · 5 comments

@bnewbold

This is a re-opening of bug #17176 ("Investigate slow package installs"). That bug was closed when a PR was merged to use static-fs for the node_modules directory (in March 2020), but that PR was subsequently reverted in #67135 (May 2020).

I would infer from the earlier issue that the problem is that the package is installing tens of thousands of files under /usr/share/kibana/node_modules, which contains hundreds and hundreds of vendored libraries. Perhaps some of these could be pruned? I don't have much experience with nodejs/javascript, but it looks like there are puppeteer and vscode packages being vendored, which seem like developer tools that aren't needed when deployed. Or, speculating, a tool like eatmydata could be used in the debian install process to skip file system syncs until all files have been written.
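As a rough sketch of the eatmydata idea (eatmydata is packaged in Debian/Ubuntu and LD_PRELOADs no-op fsync()/sync() wrappers), something like the following could be tried; the node_modules path is the default deb install location:

# Rough check of the file-count theory: how many files does the package ship?
$ find /usr/share/kibana/node_modules -type f | wc -l

# Skip per-file syncs during install. Unsafe if the machine loses power mid-install.
$ sudo apt install eatmydata
$ sudo eatmydata dpkg -i kibana-7.10.1-amd64.deb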


Kibana version: 7.10.1

Elasticsearch version: 7.10.1

Server OS version: Ubuntu Focal 20.04 (amd64, 6x CPUs, 16 GByte RAM)

Browser version: N/A

Browser OS version: N/A

Original install method (e.g. download page, yum, from source, etc.): deb https://artifacts.elastic.co/packages/7.x/apt stable main

Describe the bug: When installing the debian package, either via ansible or locally, the dpkg command takes a very long time (>15 minutes), which causes timeouts and leaves apt/dpkg in a half-installed state.

Steps to reproduce:

  1. sudo apt install kibana=7.10.*
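
If the timeout leaves apt/dpkg in the half-installed state described above, the standard recovery commands are roughly:

# Finish configuring packages that were unpacked but never configured.
$ sudo dpkg --configure -a
# Then let apt repair any remaining broken/partial state.
$ sudo apt --fix-broken install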
@elasticmachine
Contributor

Pinging @elastic/kibana-operations (Team:Operations)

@tylersmalley
Contributor

@bnewbold, you're correct that the static-fs PR was reverted, but we have made huge reductions in file count through other means.

15 minutes seems VERY slow. I brought up an e2-medium (2 vCPUs, 4 GB memory) instance on GCP with standard persistent disk running Ubuntu Focal 20.04 and am seeing under 30 seconds.

$ time sudo dpkg -i kibana-7.10.1-amd64.deb 
Selecting previously unselected package kibana.
(Reading database ... 63070 files and directories currently installed.)
Preparing to unpack kibana-7.10.1-amd64.deb ...
Unpacking kibana (7.10.1) ...
Setting up kibana (7.10.1) ...
Processing triggers for systemd (245.4-4ubuntu3.2) ...

real	0m28.532s
user	0m8.195s
sys	0m8.103s
$ dd if=/dev/zero of=/tmp/test1.img bs=1G count=1 oflag=dsync
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 3.40415 s, 315 MB/s
$ sudo hdparm -t /dev/sda1

/dev/sda1:
SG_IO: bad/missing sense data, sb[]:  70 00 05 00 00 00 00 0a 00 00 00 00 20 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 Timing buffered disk reads: 364 MB in  3.01 seconds = 120.78 MB/sec

What disks are you using? Is there any other activity on this machine, or the underlying drives?

@bnewbold
Author

This comment contains more information I have gathered. The summary is that package installation took about 7.5 minutes with default settings, on a modern spinning hard disk. Running dpkg with the special --force-unsafe-io flag sped up installation dramatically, to 30 seconds total.

I don't have a detailed log of my earlier >15 minute experience, and didn't reproduce such a long install time or the timeouts/failure here. That run used a different command and context (ansible), and the VM was in a different state (running elasticsearch and metricbeat, etc., though with minimal traffic or indexing happening). It might have been human error or misunderstanding on my part.

This Debian wiki page has some background on dpkg slowness in general: Why is dpkg so slow when using new filesystems such as btrfs or ext4?


There is a single 12 TByte hard disk (not SSD), which does not use "shingling" technology. I don't have the exact part number available (due to virtualization), but I can hunt it down if it is important. There is a single filesystem partition and a small swap partition; I have not seen any evidence that swap is getting hit at all. There are 6 vCPU cores and 16 GBytes of RAM. This is on a shared server, but CPU, RAM, and disk are all allocated exclusively to this VM; this is self-hosted hardware, not a cloud instance.

Here are the filesystem and mount settings:

$ mount | grep vda1
/dev/vda1 on / type ext4 (rw,relatime,errors=remount-ro)

I ran the other two commands from above:

$ dd if=/dev/zero of=/tmp/test1.img bs=1G count=1 oflag=dsync
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 9.49515 s, 113 MB/s

$ sudo hdparm -t /dev/vda1 
/dev/vda1:
 HDIO_DRIVE_CMD(identify) failed: Inappropriate ioctl for device
 Timing buffered disk reads: 668 MB in  3.00 seconds = 222.44 MB/sec

I wiped and reinstalled the operating system to have a clean slate, then installed kibana using dpkg (instead of apt or apt-get as previously). Elasticsearch has not been installed at this point. In all the below I have lightly edited the time output for readability (it gets compressed in a weird way).

$ sudo time dpkg -i kibana-7.10.1-amd64.deb

Selecting previously unselected package kibana.
(Reading database ... 63540 files and directories currently installed.)
Preparing to unpack kibana-7.10.1-amd64.deb ...
Unpacking kibana (7.10.1) ...
Setting up kibana (7.10.1) ...
Processing triggers for systemd (245.4-4ubuntu3.3) ...

8.87 user
11.77 system
7:31.24 elapsed
4% CPU (0 avg text + 0 avg data 36636 max resident) k
240 inputs + 1838128 outputs (0 major + 29017 minor) pagefaults 0 swaps

In the first 30 seconds or so while the command was running I checked dstat and iotop:

# dstat
--total-cpu-usage-- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai stl| read  writ| recv  send|  in   out | int   csw
  5   7  80   8   0|8192B   31M|  70k  428B|   0     0 |  13k 5555
  5   4  79  11   0|8192B   24M|  77k  758B|   0     0 |8414  4884
  6   4  75  15   0|   0    48M|  64k  318B|   0     0 |5820  5442
  7   8  79   6   0|4096B   44M|  70k  404B|   0     0 |  18k 9409
  6   4  78  13   0|4096B   35M|  71k  318B|   0     0 |7120  4439
  5   3  80  12   0|8192B   22M|  76k  318B|   0     0 |6478  4279
  6   5  79  10   0|8192B   34M|  75k  318B|   0     0 |9458  4798
  6   4  79  10   0|8192B   27M|  79k  318B|   0     0 |8785  5425

# iotop
146886 be/4 root        0.00 B/s  195.42 K/s  0.00 % 98.46 % dpkg -i kibana-7.10.1-amd64.deb

and then again after a couple minutes (command still running):

# dstat
--total-cpu-usage-- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai stl| read  writ| recv  send|  in   out | int   csw
  1   0  83  16   0|   0  1576k|  57k  428B|   0     0 |1465   871
  1   0  82  16   0|   0  1436k|  57k  758B|   0     0 |1726   956
  1   0  83  16   0|   0   412k|  58k  318B|   0     0 |1752   943
  1   0  82  16   0|   0  1376k|  60k  318B|   0     0 |1699   879
  1   0  82  16   0|   0  1432k|  51k  318B|   0     0 |1651   933
  1   0  82  16   0|   0   444k|  50k  318B|   0     0 |1684   997
  1   1  81  16   0|   0  1456k|  55k  318B|   0     0 |1789  1106
  1   0  82  16   0|   0  1572k|  54k  318B|   0     0 |1765   983
  1   1  82  16   0|   0  1588k|  73k  318B|   0     0 |1889   820

# iotop
 146886 be/4 root        0.00 B/s   26.82 K/s  0.00 % 98.20 % dpkg -i kibana-7.10.1-amd64.deb

I also ran lsof on the dpkg process:

$ sudo lsof -p 146886
COMMAND    PID USER   FD   TYPE DEVICE SIZE/OFF      NODE NAME
dpkg    146886 root  cwd    DIR  252,1     4096  56623172 /home/bnewbold
dpkg    146886 root  rtd    DIR  252,1     4096         2 /
dpkg    146886 root  txt    REG  252,1   309944 318767391 /usr/bin/dpkg
[snip libraries and console]
dpkg    146886 root    3uW  REG  252,1        0 134217789 /var/lib/dpkg/lock-frontend
dpkg    146886 root    4uW  REG  252,1        0 134217787 /var/lib/dpkg/lock
dpkg    146886 root    5w   REG  252,1     4608 134220125 /var/lib/dpkg/updates/tmp.i
dpkg    146886 root    6u   REG  252,1        0 134219001 /var/lib/dpkg/triggers/Lock
dpkg    146886 root    7r   REG  252,1        0 134219011 /var/lib/dpkg/triggers/Unincorp
dpkg    146886 root    8w   REG  252,1   463049 134218821 /var/log/dpkg.log
dpkg    146886 root    9r   REG  252,1      410 134219045 /var/lib/dpkg/diversions
dpkg    146886 root   10r   REG  252,1      100 134219634 /var/lib/dpkg/statoverride
dpkg    146886 root   11w   REG  252,1     2031 356516392 /usr/share/kibana/node_modules/@icons/material/SpotlightIcon.js.dpkg-new

As an experiment, I un-installed kibana and tried dpkg with a special unsafe flag to skip filesystem synchronization:

$ sudo time dpkg --force-unsafe-io -i kibana-7.10.1-amd64.deb
Selecting previously unselected package kibana.
(Reading database ... 63546 files and directories currently installed.)
Preparing to unpack kibana-7.10.1-amd64.deb ...
Unpacking kibana (7.10.1) ...
Setting up kibana (7.10.1) ...
Processing triggers for systemd (245.4-4ubuntu3.3) ...

8.37 user
16.20 system
0:30.44 elapsed
80%CPU (0avgtext+0avgdata 36828maxresident)k
0inputs+1823672outputs (0major+24511minor)pagefaults 0swaps

Then I un-installed and re-installed again with the original command to verify that there wasn't caching or something fishy going on:

$ sudo time dpkg -i kibana-7.10.1-amd64.deb
Selecting previously unselected package kibana.
(Reading database ... 63546 files and directories currently installed.)
Preparing to unpack kibana-7.10.1-amd64.deb ...
Unpacking kibana (7.10.1) ...
Setting up kibana (7.10.1) ...
Processing triggers for systemd (245.4-4ubuntu3.3) ...

9.16 user
28.06 system
7:36.66 elapsed
8%CPU (0avgtext+0avgdata 36732maxresident)k
0inputs+1908176outputs (0major+24608minor)pagefaults 0swaps

@bnewbold
Author

Per my above findings, the default install process on spinning disk is still pretty slow (7+ minutes), though not as long as initially thought. The root of the problem seems to be dpkg doing an fsync() call after every file creation, which is slow with ext4 on spinning disk, even when throughput is fast.
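
One way to confirm the per-file fsync behavior would be a syscall summary during an install, roughly as follows (strace's -c flag prints per-syscall counts; -f follows child processes):

$ sudo strace -f -c -e trace=fsync,fdatasync,sync_file_range \
    dpkg -i kibana-7.10.1-amd64.deb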

I generally find apt/deb/dpkg installation quick and reliable to work with; this is the first time I have experienced a problem like this. Having way fewer files installed, or being able to "bundle" or load files from an indexed archive (like zip files), would be nice. But perhaps that is at odds with current nodejs/npm workflows.
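
As a rough illustration of why file count (not just total bytes) dominates: a hypothetical micro-benchmark, assuming coreutils' sync accepts file arguments (it does on Focal):

# Many tiny files, each fsync'd individually (roughly what dpkg does)...
$ mkdir /tmp/many
$ time bash -c 'for i in $(seq 2000); do echo x > /tmp/many/f$i; sync /tmp/many/f$i; done'
# ...versus the same data as one archive with a single sync at the end.
$ time bash -c 'tar -cf /tmp/bundle.tar -C /tmp many && sync /tmp/bundle.tar'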

This may not be a high-priority problem if most users are installing on SSD disks these days. Though I would note that there are probably many folks out there with alternative devices (Raspberry Pi? or other embedded flash storage) or low-cost hosting setups who might be impacted. For people installing manually, the dpkg --force-unsafe-io option might be worth trying, though it is, of course, not safe.
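
For manual installs the flag can also be passed through apt, or made persistent via dpkg's standard configuration directory (same safety caveats apply):

# One-off, passing the option through apt to dpkg:
$ sudo apt -o DPkg::Options::="--force-unsafe-io" install kibana=7.10.*
# Persistent, via a dpkg config snippet (option is written without dashes):
$ echo force-unsafe-io | sudo tee /etc/dpkg/dpkg.cfg.d/unsafe-io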

@bnewbold changed the title from "debian package is so slow to install that install fails, apparently due to node_modules size" to "Debian package is slow to install on some devices, may cause timeouts. Related to node_modules size" on Jan 25, 2021
@jbudz
Member

jbudz commented May 10, 2022

We've dropped the file count by about 20% since 7.13 (8.2 at the time of writing). There's more work to do, but it's the slower sort of work: narrowing our list of dependencies. I'm going to close this out as expected behavior for now.

@jbudz closed this as completed May 10, 2022