
Writes to MongoDB in actionrunner are slow and CPU-intensive with large documents #4798

Closed
jdmeyer3 opened this issue Oct 2, 2019 · 5 comments

@jdmeyer3 (Contributor) commented Oct 2, 2019

SUMMARY

When an action returns ~4 MB of data, the actionrunner's writes to MongoDB take ~20 seconds per document. During the writes, CPU utilization spikes to 100%.

STACKSTORM VERSION

st2 v3.1.0

OS: CentOS 7.7.1908, kernel 3.10.0-957.el7.x86_64
Kubernetes v1.14.1
Docker 19.03.2
Base Docker Image: CentOS 7.6.1810 (custom image)

Steps to reproduce the problem

In a Python 3 pack, create a Python action with the following code:

#!/usr/bin/env python3

import json

import requests

from st2common.runners.base_action import Action


class RequestLargeDataset(Action):
    """
    Retrieves ~4 MB of JSON data and returns it as the action result.
    """

    def run(self, **kwargs):
        # Download a large public dataset (~4 MB of JSON); returning it
        # forces the actionrunner to persist a large document to MongoDB.
        dataset = requests.get('https://data.ct.gov/api/views/rybz-nyjw/rows.json?accessType=DOWNLOAD')
        return json.loads(dataset.text)
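
Register the pack and run the action (the pack and action names are whatever you register it under, e.g. st2 run mypack.request_large_dataset), then watch the actionrunner process while the result is persisted to MongoDB.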

Expected Results

Low CPU utilization and the result returning within a relatively short time (<5 seconds)

Actual Results

CPU spiked to 100% for 20-30 seconds

(screenshot: CPU utilization graph spiking to 100% for 20-30 seconds)

Adding some logging around st2actions/container/base.py:296:

    def _update_status(self, liveaction_id, status, result, context):
        try:
            LOG.error("updating liveaction")
            LOG.debug('Setting status: %s for liveaction: %s', status, liveaction_id)
            liveaction_db, state_changed = self._update_live_action_db(
                liveaction_id, status, result, context)
            LOG.error("done updating liveaction")
        except Exception as e:
            LOG.exception(
                'Cannot update liveaction '
                '(id: %s, status: %s, result: %s).' % (
                    liveaction_id, status, result)
            )
            raise e

        try:
            LOG.error('updating execution')
            executions.update_execution(liveaction_db, publish=state_changed)
            LOG.error('done updating execution')
            extra = {'liveaction_db': liveaction_db}
            LOG.debug('Updated liveaction after run', extra=extra)
        except Exception as e:
            LOG.exception(
                'Cannot update action execution for liveaction '
                '(id: %s, status: %s, result: %s).' % (
                    liveaction_id, status, result)
            )
            raise e

        return liveaction_db

I'm getting logs like

2019-10-02 18:17:23,854 ERROR [-] updating liveaction
2019-10-02 18:17:28,963 ERROR [-] done updating liveaction
2019-10-02 18:17:28,964 ERROR [-] updating execution
2019-10-02 18:17:33,268 ERROR [-] done updating execution

Here it takes ~5 seconds to update the liveaction and another ~4 seconds to update the execution.

Additionally, I ran cProfile against the actionrunner and saw the following:
(screenshot: cProfile output)

It looks like the CPU spends most of its time building the mongoengine document object. This may be related to MongoEngine/mongoengine#1230.
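
For anyone who wants to reproduce the profiling result without a full st2 install, here is a minimal sketch that isolates the mongoengine conversion cost; the document class and data shape below are made up for illustration and are not st2's actual models:

# Minimal sketch: time mongoengine's per-field document conversion on a
# large nested dict. FakeLiveAction is a stand-in, not an st2 model.
import time

import mongoengine


class FakeLiveAction(mongoengine.Document):
    result = mongoengine.DictField()


# Roughly 4 MB of JSON-like data, similar in shape to the action's result.
big_result = {'rows': [{'id': i, 'payload': 'x' * 100} for i in range(40000)]}

start = time.time()
doc = FakeLiveAction(result=big_result)
doc.to_mongo()  # converts every nested value to BSON-compatible types
print('mongoengine conversion took %.2fs' % (time.time() - start))

No network or database connection is involved here, so any time measured is pure Python-side document construction, which matches where the cProfile output points.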

@Kami (Member) commented Oct 2, 2019

Thanks for reporting this.

There have indeed been multiple mongoengine issues related to writing large documents in the past.

Performance has improved in mongoengine itself over the releases, and we've also made some optimizations on our side, but there may be more issues hiding (we would need to dig in).

In the past, we also discussed the option of skipping the mongoengine serialization / conversion layer altogether and using pymongo directly in critical code paths. But this would require quite a few code changes.
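
Conceptually, that idea looks something like the sketch below; this is not st2 code, and the connection URI, database, collection, and document shape are all illustrative assumptions:

# Sketch: write a large result with pymongo directly, handing the dict to
# the BSON encoder in one step instead of converting each nested value
# through mongoengine field objects. All names below are illustrative.
from pymongo import MongoClient

client = MongoClient('mongodb://localhost:27017')
collection = client['st2']['executions']  # illustrative collection name

big_result = {'rows': [{'id': i, 'payload': 'x' * 100} for i in range(40000)]}

collection.update_one(
    {'_id': 'example-liveaction-id'},
    {'$set': {'result': big_result}},
    upsert=True,
)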

@Kami (Member) commented Feb 21, 2021

I know this issue was opened quite a long time ago, but the problem hasn't gone away, and I finally had a chance to work on #4846 and have made good progress on it.

Initial micro-benchmarks and end-to-end load tests are very promising - we see database write speed improvements of up to 10x, and up to 5x for reads, when writing / reading execution results. All of that of course also translates to much faster action execution and workflow completion times for executions / workflows which operate on larger datasets.

Some more data and numbers are available here - #4846 (comment), #4846 (comment), #4846 (comment).

I hope there will be no last-minute surprises and that we will be able to include those changes in the v3.5.0 release.

@Kami (Member) commented Feb 21, 2021

Oh, and when I tested the fix end to end with a Python runner action which returns 4 MB of data (very similar to your example scenario), the whole action execution with the new code now takes ~1 second on my computer vs ~12 seconds with current master.

This includes the whole action execution flow end to end - from the execution being scheduled, to the action being run by the Python runner, to all the state being persisted to the database (action execution, live action objects, etc.). The actual database writes for those two models are now in the ~200 ms range.

@amanda11 (Contributor) commented

As per the comments above, this should be much improved in 3.5.0.

@arm4b (Member) commented Aug 22, 2021

I really think we can close this now with the #4846 performance improvements by @Kami.

@jdmeyer3 Feel free to re-open it if you're still seeing unacceptable performance with st2 v3.5.0.

arm4b closed this as completed on Aug 22, 2021