High Postgresql CPU usage #8978

ppanon2022 · 2024-06-26T17:07:38Z

ppanon2022
Jun 26, 2024

Hi,

I didn't notice this at first after upgrading to 2024.05, but we now have very high CPU usage on the PostgreSQL service: 2 of the CPUs are pinned. Usage was initially much higher. Someone else was running an LCM project promotion that was taking over a day. I restarted Taskomatic and the CPU usage maxed out as (I'm assuming) transactions were being rolled back. After some hours the PG CPU usage dropped again, but it never fell below about 3 fully used CPUs. After a full reboot, PG is still maxing out 2 cores, even though there appeared to be no query activity on it.

SELECT * FROM pg_stat_activity;

was just returning a large number of idle worker processes, each showing one of two queries:
16384 | uyuni | 25316 | | 16385 | uyuni_db | PostgreSQL JDBC Driver | 127.0.0.1 | | 37134 | 2024-06-26 12:58:20.914405-04 | | 2024-06-26 12:59:35.914384-04 | 2024-06-26 12:59:35.914485-04 | Client | ClientRead | idle | | | | select 'c3p0 ping' from dual | client backend
16384 | uyuni | 25875 | | 16385 | uyuni_db | PostgreSQL JDBC Driver | 127.0.0.1 | | 40924 | 2024-06-26 12:59:35.915078-04 | | 2024-06-26 13:00:49.200844-04 | 2024-06-26 13:00:49.20085-04 | Client | ClientRead | idle | | | | COMMIT | client backend
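
To summarize output like the two rows above without reading every line, the backends can be tallied by state and query. This is a minimal sketch, assuming the 22-column, pipe-separated row layout shown above (state as the 17th field, query as the 21st); the function name is illustrative, and rows whose query text contains a literal `|` are skipped rather than parsed.

```python
from collections import Counter

# 0-based column positions, taken from the sample rows in this thread.
STATE_COL = 16
QUERY_COL = 20
EXPECTED_COLS = 22


def tally_backends(rows):
    """Count (state, query) pairs in pipe-separated pg_stat_activity rows."""
    counts = Counter()
    for line in rows:
        fields = [f.strip() for f in line.split("|")]
        if len(fields) != EXPECTED_COLS:
            continue  # header, separator, or a query containing '|'
        counts[(fields[STATE_COL], fields[QUERY_COL])] += 1
    return counts


# The two rows quoted in the post above.
sample = [
    "16384 | uyuni | 25316 | | 16385 | uyuni_db | PostgreSQL JDBC Driver | 127.0.0.1 | | 37134 | 2024-06-26 12:58:20.914405-04 | | 2024-06-26 12:59:35.914384-04 | 2024-06-26 12:59:35.914485-04 | Client | ClientRead | idle | | | | select 'c3p0 ping' from dual | client backend",
    "16384 | uyuni | 25875 | | 16385 | uyuni_db | PostgreSQL JDBC Driver | 127.0.0.1 | | 40924 | 2024-06-26 12:59:35.915078-04 | | 2024-06-26 13:00:49.200844-04 | 2024-06-26 13:00:49.20085-04 | Client | ClientRead | idle | | | | COMMIT | client backend",
]

for (state, query), n in tally_backends(sample).most_common():
    print(n, state, query)
```

Any backend whose state is not idle would stand out in such a tally, which is the kind of entry that later turned out to matter here.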

Now there are more processes running queries, but PG CPU usage remains steady at about 2 cores.

Even after all of the above and the reboot, the GUI still shows the LCM project status as being in the process of cloning channels.

Any idea what's going on, or suggestions for further investigation?

ppanon2022 · 2024-08-08T00:20:00Z

ppanon2022
Aug 8, 2024
Author

After some more investigation, I found that the PostgreSQL processes with high CPU usage have PIDs that correspond to pg_stat_activity entries running
select * from rhn_server.update_needed_cache($1) as result

I've tried running select * from rhn_server.update_needed_cache(1) as result interactively with various values (0, 1, 20), and they all return quickly with no results. So presumably the originating process is retrying that query over and over again. I've tried killing the PG worker with
SELECT pg_cancel_backend(14687);
(when 14687 was one of the PG workers), and it just restarted/reconnected with a new PG worker PID taking up a full core. Using the source port to trace back to the process initiating the queries, I found it's a Taskomatic Java process (which I expected, since the query restarted whenever the connection dropped or the system was rebooted), but I don't know what to do next. How do I find which Taskomatic task is causing this high DB usage and clean it up?

0 replies

ppanon2022 · 2024-08-08T08:19:51Z

ppanon2022
Aug 8, 2024
Author

I had the idea of watching the Taskomatic log while running pg_cancel_backend on the PG worker at 100% CPU, and I saw the entries below. The canceling statement exception matches the pg_cancel_backend call, of course. But the question is why com.redhat.rhn.manager.errata.cache.UpdateErrataCacheCommand keeps repeating that statement, and how I can fix it. Restarting Taskomatic doesn't stop it; it just picks up again after the restart. I think I remember a sync log error for one of the Red Hat channels saying that an errata field was too long, so currently that's my best guess as to the source of the problem. However, how would I go about clearing out whatever pending errata update record is causing this error?

2024-08-08 04:11:26,388 [Thread-66] ERROR com.redhat.rhn.manager.errata.cache.UpdateErrataCacheCommand - Problem updating cache for server
com.redhat.rhn.common.db.WrappedSQLException: ERROR: canceling statement due to user request
Where: SQL statement "insert into rhnServerNeededCache
(server_id, errata_id, package_id, channel_id)
(select distinct sp.server_id, x.errata_id, p.id, x.channel_id
FROM (SELECT sp_sp.server_id, sp_sp.name_id,
sp_sp.package_arch_id, max(sp_pe.evr) AS max_evr
FROM rhnServerPackage sp_sp
join rhnPackageEvr sp_pe ON sp_pe.id = sp_sp.evr_id
GROUP BY sp_sp.server_id, sp_sp.name_id, sp_sp.package_arch_id) sp
join susePackageExcludingPartOfPtf p ON p.name_id = sp.name_id
join rhnPackageEvr pe ON pe.id = p.evr_id AND (sp.max_evr).type = (pe.evr).type AND sp.max_evr < pe.evr
join rhnPackageUpgradeArchCompat puac
ON puac.package_arch_id = sp.package_arch_id
AND puac.package_upgrade_arch_id = p.package_arch_id
join rhnServerChannel sc ON sc.server_id = sp.server_id
join rhnChannelPackage cp ON cp.package_id = p.id
AND cp.channel_id = sc.channel_id
left join (SELECT ep.errata_id, ce.channel_id, ep.package_id
FROM rhnChannelErrata ce
join rhnErrataPackage ep
ON ep.errata_id = ce.errata_id
join rhnServerChannel sc_sc
ON sc_sc.channel_id = ce.channel_id
WHERE sc_sc.server_id = server_id_in) x
ON x.channel_id = sc.channel_id AND x.package_id = cp.package_id
left join rhnErrata e on x.errata_id = e.id
where sp.server_id = server_id_in
and (x.errata_id IS NULL or e.advisory_status != 'retracted') -- packages which are part of a retracted errata should not be installed
and NOT EXISTS (SELECT 1 FROM suseServerAppStreamHiddenPackagesView WHERE sid = server_id_in AND pid = p.id))"
PL/pgSQL function rhn_server.update_needed_cache(numeric) line 5 at SQL statement
at com.redhat.rhn.common.translation.SqlExceptionTranslator.sqlException(SqlExceptionTranslator.java:39) ~[rhn.jar:?]
at com.redhat.rhn.common.db.datasource.CachedStatement.lambda$executeCallable$5(CachedStatement.java:611) ~[rhn.jar:?]
at org.hibernate.jdbc.WorkExecutor.executeReturningWork(WorkExecutor.java:55) ~[hibernate-core.jar:5.3.25.Final]
at org.hibernate.internal.SessionImpl$2.accept(SessionImpl.java:2421) ~[hibernate-core.jar:5.3.25.Final]
at org.hibernate.engine.jdbc.internal.JdbcCoordinatorImpl.coordinateWork(JdbcCoordinatorImpl.java:306) ~[hibernate-core.jar:5.3.25.Final]
at org.hibernate.internal.SessionImpl.doWork(SessionImpl.java:2428) ~[hibernate-core.jar:5.3.25.Final]
at org.hibernate.internal.SessionImpl.doReturningWork(SessionImpl.java:2424) ~[hibernate-core.jar:5.3.25.Final]
at com.redhat.rhn.common.db.datasource.CachedStatement.doWithStolenConnection(CachedStatement.java:942) ~[rhn.jar:?]
at com.redhat.rhn.common.db.datasource.CachedStatement.executeCallable(CachedStatement.java:593) ~[rhn.jar:?]
at com.redhat.rhn.common.db.datasource.CallableMode.execute(CallableMode.java:44) ~[rhn.jar:?]
at com.redhat.rhn.domain.server.ServerFactory.updateServerNeededCache(ServerFactory.java:277) ~[rhn.jar:?]
at com.redhat.rhn.manager.errata.cache.UpdateErrataCacheCommand.processServer(UpdateErrataCacheCommand.java:195) ~[rhn.jar:?]
at com.redhat.rhn.manager.errata.cache.UpdateErrataCacheCommand.updateErrataCacheForServer(UpdateErrataCacheCommand.java:105) ~[rhn.jar:?]
at com.redhat.rhn.taskomatic.task.errata.ErrataCacheWorker.run(ErrataCacheWorker.java:63) ~[rhn.jar:?]
at EDU.oswego.cs.dl.util.concurrent.PooledExecutor$Worker.run(PooledExecutor.java:732) ~[concurrent-1.3.4.jar:?]
at java.lang.Thread.run(Thread.java:829) ~[?:?]
Caused by: org.postgresql.util.PSQLException: ERROR: canceling statement due to user request
Where: SQL statement "insert into rhnServerNeededCache
(server_id, errata_id, package_id, channel_id)
(select distinct sp.server_id, x.errata_id, p.id, x.channel_id
FROM (SELECT sp_sp.server_id, sp_sp.name_id,
sp_sp.package_arch_id, max(sp_pe.evr) AS max_evr
FROM rhnServerPackage sp_sp
join rhnPackageEvr sp_pe ON sp_pe.id = sp_sp.evr_id
GROUP BY sp_sp.server_id, sp_sp.name_id, sp_sp.package_arch_id) sp
join susePackageExcludingPartOfPtf p ON p.name_id = sp.name_id
join rhnPackageEvr pe ON pe.id = p.evr_id AND (sp.max_evr).type = (pe.evr).type AND sp.max_evr < pe.evr
join rhnPackageUpgradeArchCompat puac
ON puac.package_arch_id = sp.package_arch_id
AND puac.package_upgrade_arch_id = p.package_arch_id
join rhnServerChannel sc ON sc.server_id = sp.server_id
join rhnChannelPackage cp ON cp.package_id = p.id
AND cp.channel_id = sc.channel_id
left join (SELECT ep.errata_id, ce.channel_id, ep.package_id
FROM rhnChannelErrata ce
join rhnErrataPackage ep
ON ep.errata_id = ce.errata_id
join rhnServerChannel sc_sc
ON sc_sc.channel_id = ce.channel_id
WHERE sc_sc.server_id = server_id_in) x
ON x.channel_id = sc.channel_id AND x.package_id = cp.package_id
left join rhnErrata e on x.errata_id = e.id
where sp.server_id = server_id_in
and (x.errata_id IS NULL or e.advisory_status != 'retracted') -- packages which are part of a retracted errata should not be installed
and NOT EXISTS (SELECT 1 FROM suseServerAppStreamHiddenPackagesView WHERE sid = server_id_in AND pid = p.id))"
PL/pgSQL function rhn_server.update_needed_cache(numeric) line 5 at SQL statement
at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2565) ~[postgresql.jar:42.2.25]
at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2297) ~[postgresql.jar:42.2.25]
at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:322) ~[postgresql.jar:42.2.25]
at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:481) ~[postgresql.jar:42.2.25]
at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:401) ~[postgresql.jar:42.2.25]
at org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:164) ~[postgresql.jar:42.2.25]
at org.postgresql.jdbc.PgCallableStatement.executeWithFlags(PgCallableStatement.java:83) ~[postgresql.jar:42.2.25]
at org.postgresql.jdbc.PgPreparedStatement.execute(PgPreparedStatement.java:153) ~[postgresql.jar:42.2.25]
at com.mchange.v2.c3p0.impl.NewProxyCallableStatement.execute(NewProxyCallableStatement.java:4519) ~[c3p0-0.9.5.5.jar:0.9.5.5]
at com.redhat.rhn.common.db.NamedPreparedStatement.execute(NamedPreparedStatement.java:121) ~[rhn.jar:?]
at com.redhat.rhn.common.db.datasource.CachedStatement.lambda$executeCallable$5(CachedStatement.java:599) ~[rhn.jar:?]
... 14 more

0 replies

ppanon2022 · 2024-08-08T20:13:14Z

ppanon2022
Aug 8, 2024
Author

It's definitely a problem with the errata cache task. The history for the bunch errata-cache-bunch shows most tasks getting skipped; the status is INTERRUPTED for the ones that were running when the server needed to be rebooted for updates. The currently running errata-cache task has been running for 41992 seconds (roughly 11 hours 40 minutes).

I suppose I can disable the schedule for that task but then we won't get errata/patch information.

0 replies

ppanon2022 · 2024-08-12T23:34:30Z

ppanon2022
Aug 12, 2024
Author

I tried switching to debug logging for the errata tasks by adding the following to /usr/share/rhn/classes/log4j2.xml:

        <Logger name="com.redhat.rhn.taskomatic.task.ErrataCacheTask" level="debug" />
        <Logger name="com.redhat.rhn.taskomatic.task.ErrataQueue" level="debug" />

I adapted these from the obsolete troubleshooting instructions on the Taskomatic wiki page to the log4j2 config format, but I didn't see any detailed debug info in the logs.

So I switched to debug logging for the whole task module, and now I see many workers starting:

2024-08-12 19:40:22,861 [DefaultQuartzScheduler_Worker-13] DEBUG com.redhat.rhn.taskomatic.task.ErrataCacheTask - Starting run 4670780
2024-08-12 19:40:22,861 [DefaultQuartzScheduler_Worker-13] DEBUG com.redhat.rhn.taskomatic.task.ErrataCacheTask - Queue size (before run): 0
2024-08-12 19:40:22,913 [DefaultQuartzScheduler_Worker-13] INFO  com.redhat.rhn.taskomatic.task.ErrataCacheTask - In the queue: 309
2024-08-12 19:40:22,915 [DefaultQuartzScheduler_Worker-13] DEBUG com.redhat.rhn.taskomatic.task.ErrataCacheTask - Putting worker
2024-08-12 19:40:22,915 [DefaultQuartzScheduler_Worker-13] DEBUG com.redhat.rhn.taskomatic.task.ErrataCacheTask - Put worker
2024-08-12 19:40:22,915 [DefaultQuartzScheduler_Worker-13] DEBUG com.redhat.rhn.taskomatic.task.ErrataCacheTask - Putting worker
2024-08-12 19:40:22,916 [DefaultQuartzScheduler_Worker-13] DEBUG com.redhat.rhn.taskomatic.task.ErrataCacheTask - Put worker

[... a long list of Putting worker / Put worker lines ...]

2024-08-12 19:40:23,017 [DefaultQuartzScheduler_Worker-13] DEBUG com.redhat.rhn.taskomatic.task.ErrataCacheTask - Putting worker
2024-08-12 19:40:23,017 [DefaultQuartzScheduler_Worker-13] DEBUG com.redhat.rhn.taskomatic.task.ErrataCacheTask - Put worker
2024-08-12 19:40:23,046 [Thread-80] DEBUG com.redhat.rhn.taskomatic.task.ErrataCacheTask - Updating errata cache for sid [1000011147]
2024-08-12 19:40:23,055 [Thread-79] DEBUG com.redhat.rhn.taskomatic.task.ErrataCacheTask - Updating errata cache for sid [1000011140]
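
To see which server IDs the workers pick up, and whether the same sid keeps reappearing across runs, the debug lines above can be counted per sid. This is a minimal sketch, assuming the "Updating errata cache for sid [N]" message format shown above; the function name is illustrative, and the input can be any iterable of log lines (for example an open log file).

```python
import re
from collections import Counter

# Matches the DEBUG message format shown in the log excerpt above.
SID_RE = re.compile(r"Updating errata cache for sid \[(\d+)\]")


def count_sids(lines):
    """Return how many times each server ID appears in 'Updating errata cache' lines."""
    counts = Counter()
    for line in lines:
        match = SID_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


# Lines copied from the excerpt above; the first one should be ignored.
sample = [
    "2024-08-12 19:40:23,017 [DefaultQuartzScheduler_Worker-13] DEBUG com.redhat.rhn.taskomatic.task.ErrataCacheTask - Put worker",
    "2024-08-12 19:40:23,046 [Thread-80] DEBUG com.redhat.rhn.taskomatic.task.ErrataCacheTask - Updating errata cache for sid [1000011147]",
    "2024-08-12 19:40:23,055 [Thread-79] DEBUG com.redhat.rhn.taskomatic.task.ErrataCacheTask - Updating errata cache for sid [1000011140]",
]

for sid, n in count_sids(sample).most_common():
    print(sid, n)
```

A sid that shows up far more often than the others over a long log would be a candidate for the server whose cache update never completes.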

0 replies

aaannz · 2024-08-13T07:37:34Z

aaannz
Aug 13, 2024
Collaborator

Thanks for this report and the continued investigation. Indeed, update_needed_cache and similar queries seem to have been performing very badly recently. I am looking into it.

For the time being, can you try running ANALYZE VERBOSE rhnServer, rhnServerPackage, rhnPackageEvr, rhnPackageUpgradeArchCompat, rhnServerChannel, rhnChannelPackage, rhnChannelErrata, rhnErrataPackage, rhnErrata, rhnServerNeededCache; and see whether it improves things a bit?

6 replies

ppanon2022 Aug 14, 2024
Author

Well, one possible concern is that during the ANALYZE VERBOSE I get

INFO: analyzing "public.rhnserver"
INFO: "rhnserver": scanned 649 of 649 pages, containing 941 live rows and 51 dead rows; 941 rows in sample, 941 estimated total rows
INFO: analyzing "public.rhnserverpackage"
INFO: "rhnserverpackage": scanned 20745 of 20745 pages, containing 1912211 live rows and 19432 dead rows; 30000 rows in sample, 1912211 estimated total rows
INFO: analyzing "public.rhnpackageevr"
INFO: "rhnpackageevr": scanned 1403 of 1403 pages, containing 104016 live rows and 27 dead rows; 30000 rows in sample, 104016 estimated total rows
NOTICE: comparing incompatible evr types. Using rpm
NOTICE: comparing incompatible evr types. Using rpm
NOTICE: comparing incompatible evr types. Using deb
NOTICE: comparing incompatible evr types. Using deb
...
Skipping thousands of those NOTICE: comparing incompatible evr types lines
...
INFO: analyzing "public.rhnpackageupgradearchcompat"
INFO: "rhnpackageupgradearchcompat": scanned 1 of 1 pages, containing 127 live rows and 0 dead rows; 127 rows in sample, 127 estimated total rows
INFO: analyzing "public.rhnserverchannel"
INFO: "rhnserverchannel": scanned 51 of 51 pages, containing 5065 live rows and 171 dead rows; 5065 rows in sample, 5065 estimated total rows

Perhaps those incompatible evr types are breaking an index and causing a table scan?
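
Separately from the NOTICE lines, the INFO lines in the ANALYZE VERBOSE output above report live and dead row counts per table, which can be turned into a dead-row percentage to check whether any table looks bloated. This is a minimal sketch, assuming the INFO line format shown above; the function name is illustrative and the percentage is only a rough indicator.

```python
import re

# Matches the per-table summary lines printed by ANALYZE VERBOSE, as quoted above.
INFO_RE = re.compile(
    r'INFO: "(?P<table>\w+)": scanned \d+ of \d+ pages, '
    r"containing (?P<live>\d+) live rows and (?P<dead>\d+) dead rows"
)


def dead_row_stats(lines):
    """Map table name -> (live rows, dead rows, dead rows as % of all rows)."""
    stats = {}
    for line in lines:
        match = INFO_RE.search(line)
        if match:
            live, dead = int(match["live"]), int(match["dead"])
            total = live + dead
            pct = 100.0 * dead / total if total else 0.0
            stats[match["table"]] = (live, dead, pct)
    return stats


# Lines copied from the output above.
sample = [
    'INFO: analyzing "public.rhnserver"',
    'INFO: "rhnserver": scanned 649 of 649 pages, containing 941 live rows and 51 dead rows; 941 rows in sample, 941 estimated total rows',
    'INFO: "rhnserverpackage": scanned 20745 of 20745 pages, containing 1912211 live rows and 19432 dead rows; 30000 rows in sample, 1912211 estimated total rows',
    'INFO: "rhnpackageevr": scanned 1403 of 1403 pages, containing 104016 live rows and 27 dead rows; 30000 rows in sample, 104016 estimated total rows',
]

for table, (live, dead, pct) in dead_row_stats(sample).items():
    print(f"{table}: {live} live, {dead} dead ({pct:.2f}% dead)")
```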

ppanon2022 Aug 14, 2024
Author

Note that I've opened a bug report for this: #9150.
Further discussion should probably be done there.

ppanon2022 Aug 14, 2024
Author

P.S. Adding the index made no difference.

ppanon2022 Aug 14, 2024
Author

Well, I've upgraded the system to 2024.07 and I don't seem to be having the high CPU usage anymore. I don't know whether that was due to 2024.07 alone or to a combination of that and the added index, but performance seems reasonable again. The errata-cache job is running and has been running for a while, but the CPU usage is low. I still get the NOTICEs when running VACUUM ANALYZE rhnPackageEvr; so whatever the cause was, it was not related to those messages.

aaannz Aug 15, 2024
Collaborator

There was a PR merged that also influences the errata query: #9065

I am not sure if it was already included in 2024.07, but if so then this might be what fixed that.
I am going to take a look into that EVR comparison.
