Performance of Purge Operation #3288
-
For our deployment, we have a db containing lots of highly transient docs that are created rapidly. We only need them for a couple of days, so we try to purge anything older than that and keep only a couple of days' worth of data. Our purge job only achieves about 2000 document purges per minute. Two workers run in parallel, each making batch purge requests (100 docs per request), and each worker gets through about 10 requests per minute (2 workers × 10 requests × 100 docs = 2000 docs per minute). If I add a third worker, some requests start returning internal server errors (within a batch request, some of the purges succeed and some fail). So it seems like I'm hitting a performance bottleneck.

My question is: is it expected that purge is so expensive that you can only get through this many documents per minute? If not, I should start looking at the hardware and configuration we're using. To keep using the db, we will need to scale well past this purge rate over the next year.

I know the approach recommended by the developers is to delete docs, periodically replicate the db without the deleted docs, and then make the replicated db the new live db. But as far as I understand, there's no way to do that without some downtime, even if brief? Or is there some way to achieve this? As of now, it seems like our only option is purging.

Edit: I should add that we use pretty much all default configs.
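A minimal Python sketch of the batching job described above, assuming CouchDB's `POST /{db}/_purge` endpoint (CouchDB 2.3 and later), whose body maps each doc id to the list of revisions to purge and whose response lists them under `"purged"`. The base URL, database name, and helper names (`chunk`, `purge_body`, `post_purge`, `ids_to_retry`, `expected_docs_per_minute`) are hypothetical; authentication and retries are left out, and `ids_to_retry` assumes a doc id with no revisions reported under `"purged"` was not purged.

```python
import json
import urllib.request
from typing import Dict, Iterator, List, Sequence, Tuple

# Hypothetical job setting mirroring the post: 100 docs per _purge request.
BATCH_SIZE = 100

DocRevs = Tuple[str, List[str]]  # (doc_id, [rev, ...]) pair to purge


def chunk(items: Sequence[DocRevs], size: int = BATCH_SIZE) -> Iterator[Sequence[DocRevs]]:
    """Yield consecutive slices of at most `size` (doc_id, revs) pairs."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def purge_body(batch: Sequence[DocRevs]) -> Dict[str, List[str]]:
    """Build a _purge request body of the form {doc_id: [rev, ...], ...}."""
    return {doc_id: list(revs) for doc_id, revs in batch}


def post_purge(base_url: str, db: str, body: Dict[str, List[str]]) -> dict:
    """POST one batch to {base_url}/{db}/_purge and return the decoded JSON.

    Authentication and error handling are deliberately omitted in this sketch.
    """
    req = urllib.request.Request(
        f"{base_url}/{db}/_purge",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def ids_to_retry(body: Dict[str, List[str]], response: dict) -> List[str]:
    """Doc ids with no revisions reported under "purged" (assumed not purged)."""
    purged = response.get("purged") or {}
    return [doc_id for doc_id in body if not purged.get(doc_id)]


def expected_docs_per_minute(workers: int, requests_per_minute: int, batch_size: int) -> int:
    """Aggregate throughput if every worker sustains the given request rate."""
    return workers * requests_per_minute * batch_size


if __name__ == "__main__":
    # 2 workers x 10 requests/minute x 100 docs/request, as in the post.
    print(expected_docs_per_minute(2, 10, BATCH_SIZE))  # prints 2000
```

The arithmetic only scales linearly while the server keeps up; the errors seen after adding a third worker suggest that it does not in this deployment.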
-
Hey @bdoyle0182, thanks for your interest in using the purge operation on documents in a database. Compared to recreating a new database, I think the purge operation doesn't cause downtime. As for the performance of the purge operation, 2000 document purges per minute does not seem expected. From a technical point of view, the purge operation returns once the document has been purged, while secondary indexes or other indexes may be purged later using the purge tree. You may want to compare CRUD operations and the purge operation under the same configuration and see whether the results are similar, or whether the problem is specific to purge.
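To make the suggested comparison concrete, here is a hedged sketch of a timing harness: it runs any batch operation over the same batches and reports documents per minute, so a batch of `_bulk_docs` deletions (docs sent with `"_deleted": true`) could be measured against a batch of `_purge` requests under the same configuration. `measure_docs_per_minute` and its injectable `clock` parameter are hypothetical names; the `send_batch` callable would wrap whichever real HTTP request is being measured.

```python
import time
from typing import Callable, Iterable, Sequence


def measure_docs_per_minute(
    send_batch: Callable[[Sequence[str]], object],
    batches: Iterable[Sequence[str]],
    clock: Callable[[], float] = time.perf_counter,
) -> float:
    """Run send_batch over every batch sequentially and return docs per minute."""
    total_docs = 0
    start = clock()
    for batch in batches:
        send_batch(batch)  # e.g. one _purge or _bulk_docs request for these ids
        total_docs += len(batch)
    elapsed = clock() - start
    if elapsed <= 0:
        raise ValueError("elapsed time must be positive")
    return total_docs * 60.0 / elapsed


if __name__ == "__main__":
    # Stand-in operation plus a simulated clock (0 s, then 30 s) to show the math;
    # swap in a real HTTP call and the default clock for an actual measurement.
    ids = [f"doc{i}" for i in range(1000)]
    batches = [ids[i:i + 100] for i in range(0, len(ids), 100)]
    ticks = iter([0.0, 30.0])
    rate = measure_docs_per_minute(lambda batch: None, batches, clock=lambda: next(ticks))
    print(f"{rate:.0f} docs/minute")  # prints "2000 docs/minute"
```

Running it once per operation, against the same database and batch size, helps separate a general I/O limit (both rates low) from something specific to purge (only the purge rate low).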
-
So you're suggesting that, on the backend, the purge operation should technically be no different from CRUD operations (ignoring index cleanup, since that sounds like a background operation after the request)? Our CRUD operations are definitely more performant; we get thousands of reads per minute. It's possible we're hitting an I/O bottleneck on our hardware, though.
-
This is what I'm seeing on the server: then a bunch of doc ids listed
-
Performance improved significantly after recreating the db with a much smaller view tied to the db being purged, so it seems the size of the view was causing the problems.
-
Glad to hear it. Actually, purging a database's primary data and purging its secondary indexes happen asynchronously.