vip-manager RAM usage steadily increases till OOM killed by system #285

Closed
thelinuxracoon opened this issue Dec 9, 2024 · 6 comments · Fixed by #286

@thelinuxracoon

thelinuxracoon commented Dec 9, 2024

I recently upgraded vip-manager from 2.3.0 (from the Ubuntu repos) to 2.8.0, the newest version from this GitHub repo. The issue began after installing the new version.

I wanted the newest version so that I can use Patroni as the vip-manager endpoint. This ensures that vip-manager keeps working when Patroni is in failsafe_mode and/or etcd is unreachable.

OS: Ubuntu 24.04 LTS
vip-manager: version: 2.8.0, commit: 8aef662
patroni: 3.2.2

Screenshot of the system RAM usage:
[Image: vip-manager_ram_usage]

Logs when vip-manager gets OOM killed:

Dec 07 21:33:13 dbserver vip-manager[325325]: 2024/12/07 21:33:13 IP address 192.168.110.30/24 state is false, desired false
Dec 07 21:33:24 dbserver vip-manager[325325]: 2024/12/07 21:33:24 IP address 192.168.110.30/24 state is false, desired false
Dec 07 21:33:31 dbserver systemd[1]: vip-manager.service: A process of this unit has been killed by the OOM killer.
Dec 07 21:33:33 dbserver systemd[1]: vip-manager.service: Main process exited, code=killed, status=9/KILL
Dec 07 21:33:33 dbserver systemd[1]: vip-manager.service: Failed with result 'oom-kill'.
Dec 07 21:33:33 dbserver systemd[1]: vip-manager.service: Consumed 1h 34min 30.819s CPU time, 9.8G memory peak, 0B memory swap peak.
Dec 07 21:33:33 dbserver systemd[1]: vip-manager.service: Scheduled restart job, restart counter is at 1.
Dec 07 21:33:33 dbserver systemd[1]: Started vip-manager.service - Manages Virtual IP for Patroni.
Dec 07 21:33:33 dbserver vip-manager[3723453]: 2024/12/07 21:33:33 Using config from file: /etc/vip-manager/vip-manager.yml
Dec 07 21:33:33 dbserver vip-manager[3723453]: 2024/12/07 21:33:33 No dcs-endpoints specified, trying to use localhost with standard ports!
Dec 07 21:33:33 dbserver vip-manager[3723453]: 2024/12/07 21:33:33 This is the config that will be used:
Dec 07 21:33:33 dbserver vip-manager[3723453]:         config : /etc/vip-manager/vip-manager.yml
Dec 07 21:33:33 dbserver vip-manager[3723453]:         dcs-endpoints : [http://127.0.0.1:8008/]
Dec 07 21:33:33 dbserver vip-manager[3723453]:         dcs-type : patroni
Dec 07 21:33:33 dbserver vip-manager[3723453]:         hostingtype : basic
Dec 07 21:33:33 dbserver vip-manager[3723453]:         interface : ens192
Dec 07 21:33:33 dbserver vip-manager[3723453]:         interval : 1000
Dec 07 21:33:33 dbserver vip-manager[3723453]:         ip : 192.168.110.30
Dec 07 21:33:33 dbserver vip-manager[3723453]:         manager-type : basic
Dec 07 21:33:33 dbserver vip-manager[3723453]:         netmask : 24
Dec 07 21:33:33 dbserver vip-manager[3723453]:         retry-after : 250
Dec 07 21:33:33 dbserver vip-manager[3723453]:         retry-num : 3
Dec 07 21:33:33 dbserver vip-manager[3723453]:         trigger-key : /leader
Dec 07 21:33:33 dbserver vip-manager[3723453]:         trigger-value : 200
Dec 07 21:33:33 dbserver vip-manager[3723453]:         verbose : false
Dec 07 21:33:33 dbserver vip-manager[3723453]:         version : false
Dec 07 21:33:33 dbserver vip-manager[3723453]: 2024/12/07 21:33:33 IP address 192.168.110.30/24 state is false, desired false
Dec 07 21:33:43 dbserver vip-manager[3723453]: 2024/12/07 21:33:43 IP address 192.168.110.30/24 state is false, desired false

These logs are from the standby server, so the "state is false, desired false" message from vip-manager is correct. The issue also occurs on the leader server.

Enabling the debug and verbose options does not produce any additional log output.

My vip-manager Config:

ip: 192.168.110.30
netmask: 24
interface: ens192
trigger-key: "/leader"
trigger-value: "200"
dcs-type: patroni
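
For readers unfamiliar with the Patroni mode, the check implied by this config can be sketched roughly as follows. This is an illustration in Python, not vip-manager's actual code (vip-manager is written in Go). It assumes the checker requests the dcs-endpoint plus the trigger-key and treats the node as the desired VIP holder when the HTTP status code equals trigger-value. The names `fetch_status` and `wants_vip` are hypothetical.

```python
import urllib.error
import urllib.request


def fetch_status(url, timeout=1.0):
    """Return the HTTP status code for a GET request to url."""
    try:
        # The context manager closes the response and releases the connection.
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # Non-2xx answers (for example 503 from a standby) arrive as exceptions.
        return err.code


def wants_vip(endpoint, trigger_key, trigger_value, fetch=fetch_status):
    """True if the endpoint answers trigger_key with the trigger_value code."""
    url = endpoint.rstrip("/") + "/" + trigger_key.lstrip("/")
    return str(fetch(url)) == str(trigger_value)
```

With the config above, `wants_vip("http://127.0.0.1:8008/", "/leader", "200")` would be evaluated once per interval. Connection errors are deliberately left to propagate in this sketch.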

My Patroni API Config:

restapi:
  listen: 0.0.0.0:8008
  connect_address: 192.168.110.32:8008

I am starting vip-manager via systemd. The unit file looks like this:

# /usr/lib/systemd/system/vip-manager.service
# This is an example of a systemD config file for vip-manager.
# You can copy it to "/etc/systemd/system/vip-manager.service", adjust as necessary and then call
# systemctl daemon-reload && systemctl start vip-manager && systemctl enable vip-manager
# to start and also enable auto-start after reboot.

[Unit]
Description=Manages Virtual IP for Patroni
After=network-online.target
Before=patroni.service

[Service]
Type=simple

ExecStart=/usr/bin/vip-manager --config=/etc/default/vip-manager.yml

Restart=on-failure

[Install]
WantedBy=multi-user.target

# /etc/systemd/system/vip-manager.service.d/override.conf
[Service]
EnvironmentFile=
ExecStart=
ExecStart=/usr/bin/vip-manager --config="/etc/vip-manager/vip-manager.yml"

I have overridden ExecStart so that I can use a custom config file. This way, if the package from the Ubuntu repos is ever installed again, it won't overwrite the custom parameters in the unit file.

The overall behavior of vip-manager is correct, and it works without issues in failover situations. The only problem is that vip-manager fills up the system RAM, gets killed, and then repeats the cycle. This causes downtime for the whole server roughly every 2 days (depending on how much RAM you have).

@klaci71

klaci71 commented Dec 9, 2024

Hello,

Excuse me, I don't know whether I should post this in this thread or open a new one.

We have the same problem on all servers where we use vip-manager with the Patroni datasource.

Operating system: SUSE Linux Enterprise Server 15 SP4
RAM: 16 GB
Processor: 4 cores

vip-manager version: 2.8.0

Config file: vip-manager-dbasync.yml
interval: 1000
trigger-key: "/asynchronous"
trigger-value: "200"
ip: xxx.xxx.xxx.xxx # the virtual ip address to manage
netmask: 24 # netmask for the virtual ip
interface: eth0 #interface to which the virtual ip will be added
hosting-type: basic # possible values: basic, or hetzner.
dcs-type: patroni # etcd, consul or patroni
dcs-endpoints:
- http://yyy.yyy.yyy.yyy:8008/
retry-num: 3
retry-after: 250 #in milliseconds
#verbose: false

Running vip-manager processes:

ps -ef | grep vip
root      1541     1  0 dec01 ?        00:00:37 /usr/bin/vip-manager --config=/xxxxxx/vip-manager/vip-manager.yml
root      2037     1  0 dec06 ?        00:07:36 /usr/bin/vip-manager --config=/xxxxxx/vip-manager/vip-manager-dbasync.yml

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
2037 root      20   0 3140996 1,885g  12580 S 34,55 12,08   7:39.93 vip-manager
1541 root      20   0 1241960  11408   2740 S 0,000 0,070   0:37.26 vip-manager

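The vip-manager.yml instance (PID 1541) uses the etcd datasource. Only the Patroni-backed instance (PID 2037) shows the high memory usage.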

Contents of /proc/2037/status for the affected process:
Name: vip-manager
Umask: 0022
State: S (sleeping)
Tgid: 2037
Ngid: 0
Pid: 2037
PPid: 1
TracerPid: 0
Uid: 0 0 0 0
Gid: 0 0 0 0
FDSize: 131072
Groups:
NStgid: 2037
NSpid: 2037
NSpgid: 2037
NSsid: 2037
VmPeak: 3207112 kB
VmSize: 3207112 kB
VmLck: 0 kB
VmPin: 0 kB
VmHWM: 1994048 kB
VmRSS: 1994048 kB
RssAnon: 1981468 kB
RssFile: 12580 kB
RssShmem: 0 kB
VmData: 2018472 kB
VmStk: 132 kB
VmExe: 7628 kB
VmLib: 8 kB
VmPTE: 3984 kB
VmSwap: 0 kB
HugetlbPages: 0 kB
CoreDumping: 0
THP_enabled: 1
Threads: 10
SigQ: 4/63804
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 0000000000000000
SigCgt: fffffffd7fc1feff
CapInh: 0000000000000000
CapPrm: 000001ffffffffff
CapEff: 000001ffffffffff
CapBnd: 000001ffffffffff
CapAmb: 0000000000000000
NoNewPrivs: 0
Seccomp: 0
Seccomp_filters: 0
Speculation_Store_Bypass: thread vulnerable
SpeculationIndirectBranch: conditional enabled
Cpus_allowed: f
Cpus_allowed_list: 0-3
Mems_allowed: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000001
Mems_allowed_list: 0
voluntary_ctxt_switches: 249224
nonvoluntary_ctxt_switches: 1056

sudo pmap -X 2037
2037: /usr/bin/vip-manager --config=/var/lib/pgsql/vip-manager/vip-manager-dbasync.yml
Address Perm Offset Device Inode Size Rss Pss Referenced Anonymous LazyFree ShmemPmdMapped FilePmdMapped Shared_Hugetlb Private_Hugetlb Swap SwapPss Locked THPeligible Mapping
00400000 r-xp 00000000 fe:01 730523 7628 6456 5882 6456 0 0 0 0 0 0 0 0 0 1 /usr/bin/vip-manager
00b73000 r--p 00773000 fe:01 730523 8396 5996 4792 5996 0 0 0 0 0 0 0 0 0 0 /usr/bin/vip-manager
013a6000 rw-p 00fa6000 fe:01 730523 420 420 386 420 108 0 0 0 0 0 0 0 0 0 /usr/bin/vip-manager
0140f000 rw-p 00000000 00:00 0 204 100 100 100 100 0 0 0 0 0 0 0 0 0
c000000000 rw-p 00000000 00:00 0 1945600 1943552 1943552 1943552 1943552 0 0 0 0 0 0 0 0 1
c076c00000 ---p 00000000 00:00 0 20480 0 0 0 0 0 0 0 0 0 0 0 0 1
7f88de80b000 rw-p 00000000 00:00 0 28764 28460 28460 28452 28460 0 0 0 0 0 0 0 0 1
7f88e0423000 rw-p 00000000 00:00 0 12056 12056 12056 12048 12056 0 0 0 0 0 0 0 0 1
7f88e0ff0000 rw-p 00000000 00:00 0 8332 8332 8332 8332 8332 0 0 0 0 0 0 0 0 1
7f88e181c000 rw-p 00000000 00:00 0 5708 5708 5708 5704 5708 0 0 0 0 0 0 0 0 1
7f88e1db0000 rw-p 00000000 00:00 0 3524 3512 3512 3512 3512 0 0 0 0 0 0 0 0 1
7f88e2126000 rw-p 00000000 00:00 0 2376 2376 2376 2376 2376 0 0 0 0 0 0 0 0 0
7f88e237c000 rw-p 00000000 00:00 0 1220 1220 1220 1220 1220 0 0 0 0 0 0 0 0 0
7f88e24b5000 rw-p 00000000 00:00 0 5444 5444 5444 5444 5444 0 0 0 0 0 0 0 0 1
7f88e2a06000 rw-p 00000000 00:00 0 1024 60 60 60 60 0 0 0 0 0 0 0 0 0
7f88e2b06000 rw-p 00000000 00:00 0 68 68 68 68 68 0 0 0 0 0 0 0 0 0
7f88e2b17000 rw-p 00000000 00:00 0 932 0 0 0 0 0 0 0 0 0 0 0 0 0
7f88e2c00000 rw-p 00000000 00:00 0 30720 2048 2048 2048 2048 0 0 0 0 0 0 0 0 1
7f88e4a00000 rw-p 00000000 00:00 0 1116 0 0 0 0 0 0 0 0 0 0 0 0 0
7f88e4b17000 ---p 00000000 00:00 0 263680 0 0 0 0 0 0 0 0 0 0 0 0 1
7f88f4c97000 rw-p 00000000 00:00 0 4 4 4 4 4 0 0 0 0 0 0 0 0 0
7f88f4c98000 ---p 00000000 00:00 0 524284 0 0 0 0 0 0 0 0 0 0 0 0 1
7f8914c97000 rw-p 00000000 00:00 0 4 4 4 4 4 0 0 0 0 0 0 0 0 0
7f8914c98000 ---p 00000000 00:00 0 293564 0 0 0 0 0 0 0 0 0 0 0 0 1
7f8926b47000 rw-p 00000000 00:00 0 4 4 4 4 4 0 0 0 0 0 0 0 0 0
7f8926b48000 ---p 00000000 00:00 0 36692 0 0 0 0 0 0 0 0 0 0 0 0 1
7f8928f1d000 rw-p 00000000 00:00 0 4 4 4 4 4 0 0 0 0 0 0 0 0 0
7f8928f1e000 ---p 00000000 00:00 0 4580 0 0 0 0 0 0 0 0 0 0 0 0 1
7f8929397000 rw-p 00000000 00:00 0 4 4 4 4 4 0 0 0 0 0 0 0 0 0
7f8929398000 ---p 00000000 00:00 0 508 0 0 0 0 0 0 0 0 0 0 0 0 0
7f8929417000 rw-p 00000000 00:00 0 384 48 48 48 48 0 0 0 0 0 0 0 0 0
7ffd655b8000 rw-p 00000000 00:00 0 132 12 12 12 12 0 0 0 0 0 0 0 0 0 [stack]
7ffd655d9000 r--p 00000000 00:00 0 16 0 0 0 0 0 0 0 0 0 0 0 0 0 [vvar]
7ffd655dd000 r-xp 00000000 00:00 0 8 4 0 4 0 0 0 0 0 0 0 0 0 0 [vdso]
ffffffffff600000 --xp 00000000 00:00 0 4 0 0 0 0 0 0 0 0 0 0 0 0 0 [vsyscall]
======= ======= ======= ========== ========= ======== ============== ============= ============== =============== ==== ======= ====== ===========
3207884 2025892 2024076 2025872 2013124 0 0 0 0 0 0 0 0 15 KB

After we restarted the process, the initial memory usage was:
VmPeak: 1241896 kB
VmSize: 1241896 kB
VmLck: 0 kB
VmPin: 0 kB
VmHWM: 17612 kB
VmRSS: 17612 kB
RssAnon: 6968 kB
RssFile: 10644 kB
RssShmem: 0 kB
VmData: 45064 kB
VmStk: 132 kB
VmExe: 7628 kB
VmLib: 8 kB
VmPTE: 124 kB
VmSwap: 0 kB

Thank you for the help!

@pashagolub
Collaborator

Would you please check whether #286 fixes the issue?

Thanks in advance!

@thelinuxracoon
Author

@pashagolub thank you very much for the quick update!

We installed the fix on the server and are monitoring the RAM to see if the usage remains stable.

I'll post the results tomorrow.
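
One way to judge whether usage "remains stable" is to sample the resident set size periodically and look at the trend. A rough sketch using an ordinary least-squares slope; the function name and the sample readings are made up for illustration and are not measurements from this issue.

```python
def rss_slope_kb_per_hour(samples):
    """Least-squares slope of (hours, rss_kb) samples, in kB per hour."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_r = sum(r for _, r in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("samples must span more than one point in time")
    cov = sum((t - mean_t) * (r - mean_r) for t, r in samples)
    return cov / var_t


# Hypothetical readings: (hours since start, VmRSS in kB).
leaking = [(0, 17000), (1, 45000), (2, 73000), (3, 101000)]
stable = [(0, 17000), (1, 17400), (2, 16900), (3, 17200)]
print(rss_slope_kb_per_hour(leaking))
print(rss_slope_kb_per_hour(stable))
```

A slope that stays near zero over a day or more suggests the usage is stable, while a consistently positive slope points to continued growth.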

@thelinuxracoon
Author

@pashagolub it seems to be fine now.

Memory usage has remained stable since the update.

Will this fix be added to the 2.8 version, or will it be released in 2.9?
Either way, when can we expect the release to be published?

Thanks again for the great help!

@pashagolub
Collaborator

Thanks a lot!

This will be a new release. I'm working on it right now!

@klaci71

klaci71 commented Dec 12, 2024

We also tested this update, and it works well.
Thank you!
