Performance of Purge Operation #3288

bdoyle0182 · 2020-12-03T01:12:58Z

bdoyle0182
Dec 3, 2020

For our deployment, we have a db that contains lots of docs that are created rapidly and are highly transient. We only need them for a couple of days, so we attempt to purge anything older than that and keep only a couple of days' worth of data.

Our purge job can only reach a throughput of about 2000 document purges per minute. The job runs two workers in parallel, each making batch purge requests (100 docs per request), and each worker gets through about 10 requests per minute, hence the 2000 figure. If I add a third parallel worker, some requests start returning internal server errors (within a single batch request, some of the purges succeed and some fail). So it seems like I'm hitting a performance bottleneck here.
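For readers who want to reproduce the setup, here is a minimal sketch of how such a job might split work into 100-document batches. It assumes the documented `POST /{db}/_purge` body shape (a JSON object mapping each document id to a list of revisions). The function names `purge_batches` and `docs_per_minute` are hypothetical and are not taken from the poster's job. No HTTP calls are made.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List, Tuple


def purge_batches(
    docs: Iterable[Tuple[str, str]], batch_size: int = 100
) -> Iterator[Dict[str, List[str]]]:
    """Yield request bodies covering at most batch_size (id, rev) pairs each.

    Each body maps a document id to the list of revisions to purge,
    the shape that POST /{db}/_purge accepts.
    """
    it = iter(docs)
    while True:
        chunk = list(islice(it, batch_size))
        if not chunk:
            return
        body: Dict[str, List[str]] = {}
        for doc_id, rev in chunk:
            # Several revisions of the same doc share one key.
            body.setdefault(doc_id, []).append(rev)
        yield body


def docs_per_minute(workers: int, requests_per_worker: int, batch_size: int) -> int:
    """Throughput estimate: workers x requests per worker per minute x batch size."""
    return workers * requests_per_worker * batch_size


# Hypothetical (id, rev) pairs, e.g. collected from a view query.
pairs = [(f"doc-{i}", "1-abc") for i in range(250)]
batches = list(purge_batches(pairs))
print([len(b) for b in batches])
print(docs_per_minute(workers=2, requests_per_worker=10, batch_size=100))
```

With the numbers quoted above (2 workers, 10 requests per worker per minute, 100 docs per request), the estimate works out to 2000 purges per minute.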

I guess my question is: is it expected that purge is such an expensive operation that you can only get through this many documents per minute? If it is not expected, then I should start looking at the hardware we're using and our configuration. To continue using the db, we're going to need to scale well past this purge rate over the next year.

I know that the approach recommended by the developers is to delete docs, then periodically replicate the db without the deleted docs and make the replicated db the new live db. But as far as I understand it, there's no way to do that without downtime, even if brief. Or is there some way to achieve this? As of now, it seems like our only option is to do purges.

edit: I will add that we use pretty much all default configs.

Answered by bdoyle0182

Dec 20, 2020

The performance improved significantly after recreating the db with a much smaller view tied to the db being purged. So it seems the size of the view was causing the problems.


jiangphcn · 2020-12-03T03:05:00Z

jiangphcn
Dec 3, 2020
Collaborator

Hey @bdoyle0182, thanks for your interest in using the purge operation on documents in a database. Compared to recreating the database, I think the purge operation does not incur downtime. As for performance, 2000 document purges per minute does not seem expected. From a technical point of view, a purge request returns once the document has been purged, while secondary indexes or other indexes might be purged later using the purge tree. You may want to compare CRUD operations and purge operations under the same configuration and see whether the results are similar, or whether the slowdown is specific to purge.

0 replies

bdoyle0182 · 2020-12-03T16:05:49Z

bdoyle0182
Dec 3, 2020
Author

So you're suggesting that the purge operation technically should be no different from CRUD operations on the backend (ignoring index cleanup, since that sounds like a background operation after the request)? Our CRUD operations are definitely more performant; we get thousands of reads per minute. It's possible we're hitting an I/O bottleneck on our hardware, though.

4 replies

bdoyle0182 Dec 3, 2020
Author

Here's an example response when some purges in the batch fail:

{ "error" : "case_clause", "reason" : "{error,[{error,internal_server_error},\n {ok,[{1,\n <<227,29,37,226,30,183,94,228,21,242,108,119,103,12,112,59>>}]},\n {accepted,[{1,\n <<15,76,183,11,29,202,96,46,27,105,130,252,140,243,75,\n 153>>}]},\n {accepted,[{1,\n <<30,60,154,138,255,243,156,193,114,103,145,200,27,145,\n 191,66>>}]},\n {accepted,[{1,\n <<239,96,236,36,65,127,136,106,9,174,78,246,7,5,136,111>>}]},\n {accepted,[{1,\n <<100,206,17,154,127,44,251,27,254,249,36,124,92,213,24,\n 207>>}]},\n {accepted,[{1,\n <<196,179,134,165,226,197,100,212,25,58,16,33,120,70,73,\n 132>>}]},\n {ok,[{1,<<20,1,139,51,32,68,76,113,184,81,200,222,149,126,34,203>>}]},\n {error,internal_server_error},\n {accepted,[{1,\n <<25,105,12,123,68,56,188,48,45,85,114,249,55,241,53,102>>}]},\n {accepted,[{1,\n <<181,107,83,217,65,96,146,204,168,225,98,221,160,162,\n 151,94>>}]},\n {ok,[{1,\n <<237,120,205,12,206,249,110,249,91,17,137,244,205,129,102,1>>}]},\n {accepted,[{1,<<\"�D�3�� h8��ȡ">>}]},\n {accepted,[{1,<<8,18,45,194,109,160,83,81,78,92,209,8,78,51,2,118>>}]},\n {accepted,[{1,\n <<222,250,73,244,16,35,126,27,229,249,73,98,95,34,38,162>>}]},\n {ok,[{1,<<206,228,60,176,84,142,203,7,223,14,78,141,38,232,54,171>>}]},\n {ok,[{1,\n <<120,148,104,184,161,110,177,75,152,93,225,213,116,135,111,\n 188>>}]},\n {ok,[{1,\n <<65,158,190,127,118,185,183,111,189,211,156,213,143,134,95,35>>}]},\n {error,internal_server_error},\n {ok,[{1,\n <<181,199,91,241,25,13,144,44,221,228,101,219,16,154,235,251>>}]},\n {ok,[{1,\n <<87,169,81,32,74,124,83,222,119,227,164,61,36,179,252,151>>}]},\n {ok,[{1,\n <<147,195,38,193,238,111,29,243,162,74,75,179,167,184,6,187>>}]},\n {ok,[{1,<<99,14,189,151,192,82,88,226,179,86,91,239,94,39,237,237>>}]},\n {ok,[{1,\n <<137,146,229,123,182,162,58,140,137,178,80,150,48,235,76,65>>}]},\n {error,internal_server_error},\n {ok,[{1,\n <<139,239,39,225,167,189,214,64,127,207,126,6,138,84,148,36>>}]},\n {ok,[{1,\n 
<<119,239,222,120,129,255,41,235,220,215,200,76,255,226,141,16>>}]},\n {ok,[{1,<<106,129,23,191,31,14,235,0,52,11,80,156,150,240,28,32>>}]},\n {ok,[{1,\n <<100,109,44,200,17,47,17,102,206,217,0,15,211,102,222,165>>}]},\n {error,internal_server_error},\n {ok,[{1,\n <<124,251,59,152,132,98,26,194,31,129,179,111,114,152,172,14>>}]},\n {ok,[{1,\n <<215,175,50,71,253,188,162,138,211,65,95,28,194,219,23,79>>}]},\n {ok,[{1,\n <<241,155,144,31,225,169,225,27,17,9,223,133,102,67,171,50>>}]},\n {error,internal_server_error},\n {ok,[{1,<<220,244,168,171,113,180,188,53,33,156,73,76,62,66,18,63>>}]},\n {error,internal_server_error},\n {ok,[{1,\n <<77,160,39,88,166,98,31,157,178,100,245,231,97,206,229,13>>}]},\n {ok,[{1,<<134,74,249,11,206,91,52,33,4,84,130,195,94,3,83,194>>}]},\n {ok,[{1,<<211,80,83,61,209,39,46,120,33,222,4,204,169,32,149,11>>}]},\n {accepted,[{1,<<88,13,17,15,79,0,178,39,3,50,0,41,66,109,152,104>>}]},\n {ok,[{1,<<47,137,226,161,3,117,67,59,176,240,32,113,92,150,81,237>>}]},\n {error,internal_server_error},\n {ok,[{1,\n <<122,251,226,140,131,184,212,111,240,234,38,28,111,57,133,81>>}]},\n {ok,[{1,<<"\v�m��iG�uk�.�/u8">>}]},\n {accepted,[{1,\n <<186,54,35,170,216,129,200,245,222,21,135,99,210,52,14,\n 133>>}]},\n {ok,[{1,<<"<�+�g\e�\\�\rP��T�w">>}]},\n {ok,[{1,<<219,9,252,9,46,70,129,240,52,197,52,127,19,200,32,213>>}]},\n {ok,[{1,\n <<139,112,17,248,219,110,107,171,147,119,241,123,226,129,110,\n 71>>}]},\n {ok,[{1,\n <<164,235,103,47,184,34,43,184,31,161,142,53,11,192,173,47>>}]},\n {ok,[{1,\n <<76,246,122,250,133,126,26,94,127,162,211,2,23,204,36,115>>}]},\n {ok,[{1,\n <<251,226,74,195,212,19,127,73,255,189,85,252,46,237,200,6>>}]},\n {accepted,[{1,\n <<112,57,106,158,149,184,91,103,116,185,111,132,232,86,\n 79,242>>}]},\n {ok,[{1,\n <<161,188,68,163,63,66,189,123,240,111,141,159,162,39,35,64>>}]},\n {ok,[{1,<<26,181,170,168,174,184,91,154,0,85,109,6,160,30,182,74>>}]},\n {ok,[{1,<<6,7,37,241,123,89,204,103,198,169,62,119,89,85,190,133>>}]},\n 
{accepted,[{1,\n <<31,165,149,102,205,201,90,34,159,151,187,149,102,139,\n 152,24>>}]},\n {ok,[{1,\n <<144,170,11,248,25,155,176,114,108,201,56,67,248,119,102,48>>}]},\n {ok,[{1,\n <<218,102,22,245,93,201,238,110,240,84,88,114,12,220,19,219>>}]},\n {error,internal_server_error},\n {accepted,[{1,\n <<98,133,20,89,211,106,105,132,215,64,238,249,189,22,\n 120,21>>}]},\n {ok,[{1,\n <<163,170,36,17,235,166,66,237,71,194,134,11,123,196,39,178>>}]},\n {ok,[{1,\n <<237,243,211,65,66,19,10,33,198,216,126,88,60,175,138,230>>}]},\n {accepted,[{1,\n <<122,166,209,233,109,176,240,120,114,55,227,9,115,2,\n 166,37>>}]},\n {ok,[{1,<<5,64,7,46,23,93,4,148,221,66,170,56,147,22,171,234>>}]},\n {ok,[{1,\n <<254,89,224,151,211,236,76,222,99,191,56,101,248,239,124,75>>}]},\n {accepted,[{1,\n <<114,137,135,251,171,233,250,196,42,194,92,17,214,151,\n 191,108>>}]},\n {ok,[{1,<<76,29,71,210,74,220,145,212,98,46,241,43,7,95,240,22>>}]},\n {ok,[{1,\n <<139,116,56,100,182,209,137,85,255,0,178,123,17,69,60,244>>}]},\n {accepted,[{1,\n <<150,175,94,149,183,118,41,49,75,113,161,204,86,153,\n 140,23>>}]},\n {ok,[{1,<<17,76,243,223,86,79,168,12,70,223,98,20,167,78,124,125>>}]},\n {ok,[{1,\n <<166,158,54,220,1,122,78,190,229,209,31,238,79,146,121,208>>}]},\n {ok,[{1,<<"��z^\r>��zCz\"�Q">>}]},\n {ok,[{1,\n <<126,65,132,173,62,197,236,226,48,140,171,72,206,61,229,6>>}]},\n {ok,[{1,<<150,23,198,89,45,63,68,102,6,152,162,61,246,112,87,203>>}]},\n {ok,[{1,\n <<123,126,129,150,133,188,17,210,217,196,103,164,1,204,56,249>>}]},\n {ok,[{1,<<210,121,210,98,194,46,47,4,52,78,187,151,131,72,182,100>>}]},\n {ok,[{1,\n <<240,24,175,219,180,197,186,57,207,135,175,23,104,234,182,133>>}]},\n {accepted,[{1,\n <<91,202,17,208,130,116,172,201,123,113,206,5,162,106,\n 224,31>>}]},\n {accepted,[{1,\n <<47,161,210,1,241,48,123,12,7,65,97,71,10,125,113,209>>}]},\n {ok,[{1,\n <<160,233,137,121,84,87,120,114,88,79,57,197,62,90,165,129>>}]},\n {ok,[{1,<<130,240,44,155,130,161,207,87,74,130,63,39,75,60,52,169>>}]},\n 
{accepted,[{1,\n <<144,125,21,119,206,247,15,78,14,220,88,101,51,143,232,\n 142>>}]},\n {error,internal_server_error},\n {ok,[{1,\n <<116,185,187,139,45,43,241,197,247,185,232,175,100,172,213,75>>}]},\n {error,internal_server_error},\n {ok,[{1,\n <<188,223,31,204,31,135,255,122,61,158,190,145,143,207,99,161>>}]},\n {ok,[{1,\n <<215,34,225,245,194,140,163,180,76,50,23,214,86,107,184,212>>}]},\n {ok,[{1,\n <<219,59,165,14,96,160,78,155,81,194,80,158,124,241,63,226>>}]},\n {ok,[{1,\n <<111,121,164,130,20,192,68,252,245,231,232,224,31,125,204,228>>}]},\n {ok,[{1,\n <<195,222,161,145,54,245,130,248,230,101,88,137,178,110,65,180>>}]},\n {accepted,[{1,\n <<12,174,101,224,20,214,214,208,105,48,17,55,61,167,112,\n 24>>}]},\n {accepted,[{1,\n <<111,137,173,139,116,181,8,194,254,83,149,136,69,79,\n 189,188>>}]},\n {error,internal_server_error},\n {error,internal_server_error},\n {ok,[{1,<<105,58,35,17,175,209,174,2,47,218,34,200,220,92,53,42>>}]},\n {ok,[{1,\n <<190,60,236,14,101,253,175,64,144,249,149,154,216,76,204,20>>}]},\n {ok,[{1,<<86,20,233,108,174,52,38,13,74,70,9,242,134,47,3,45>>}]},\n {error,internal_server_error},\n {ok,[{1,\n <<102,122,173,110,157,137,241,55,248,246,233,34,194,97,176,201>>}]},\n {accepted,[{1,\n <<118,183,92,43,212,2,145,37,163,228,201,180,147,81,\n 141,192>>}]}]}",
"ref" : 69914851
}
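A response like this is hard to read by eye. The sketch below tallies the per-document outcomes found in the `reason` string. It assumes only the textual shape visible in the response above (`{ok,[...]}`, `{accepted,[...]}`, and `{error,internal_server_error}` entries); it is not a general Erlang term parser, and `count_outcomes` is a hypothetical helper name.

```python
import re
from collections import Counter

# Matches the opening of each per-document result tuple. The outer
# "{error,[" wrapper is deliberately not matched by either alternative.
_OUTCOME = re.compile(r"\{(ok|accepted),\[|\{(error),internal_server_error\}")


def count_outcomes(reason: str) -> Counter:
    """Count ok / accepted / error entries in a case_clause reason string."""
    counts: Counter = Counter()
    for match in _OUTCOME.finditer(reason):
        counts[match.group(1) or match.group(2)] += 1
    return counts


# Shortened, made-up sample in the same shape as the response above.
sample = (
    "{error,[{error,internal_server_error},\n"
    " {ok,[{1,\n <<227,29,37>>}]},\n"
    " {accepted,[{1,\n <<15,76,183>>}]},\n"
    " {error,internal_server_error}]}"
)
print(count_outcomes(sample))
```

In practice the `reason` field would first be extracted from the decoded JSON body. Binaries printed as strings could in principle contain text that looks like a tuple, so treat the counts as approximate.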

bdoyle0182 Dec 3, 2020
Author

Also, we retrieve the docs to purge through a view query, so I wonder whether we re-query some of the same docs that we purged in a previous request because they haven't been removed from the view yet. Is that possible?

jiangphcn Dec 5, 2020
Collaborator

> Our CRUD operations are definitely more performant we get thousands of reads per minute. It's possible we're hitting an i/o bottleneck though on our hardware.

You mentioned 2000 document purges per minute. Compared with thousands of reads per minute, it is plausible that reads are faster than purges, because reads are faster than writes. I'm just wondering how many orders of magnitude separate them.

> Also the way we retrieve the docs to purge is through a view query so I wonder if we re-query some of the same docs that we purged on a previous request because they haven't been purged from the view yet. Is that possible?

Because purge is not always a synchronous operation, re-issuing a purge request against the same document id might cause issues. Is it possible to leave a time window so that the purge of a document completes fully, to avoid conflicts?

bdoyle0182 Dec 7, 2020
Author

Yeah, reads were probably a bad example, since they're expected to be faster than writes. But our writes are still much more performant than purging: we're writing far more docs per minute than we're purging, which is why I'm trying to figure this out, since we're building up a backlog.

The re-issued purge request is interesting; I can explore that. We query the view, attempt to purge the results over about a five-minute window, and then query the view again. If the view hasn't been purged of those docs yet, the query will return some of the same docs and we will attempt to purge them again.
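One way to explore the time-window idea is to remember which ids were recently submitted for purging and skip them if the view returns them again. The sketch below is a hypothetical client-side helper, not something CouchDB provides; the class name `RecentlyPurged` and the 300-second window are assumptions chosen to match the five-minute cycle described above.

```python
import time
from typing import Callable, Dict, Iterable, List


class RecentlyPurged:
    """Remember doc ids submitted for purge and suppress repeats for a while."""

    def __init__(
        self, window_seconds: float, clock: Callable[[], float] = time.monotonic
    ) -> None:
        self._window = window_seconds
        self._clock = clock
        self._seen: Dict[str, float] = {}

    def filter_new(self, doc_ids: Iterable[str]) -> List[str]:
        """Return ids not submitted within the window and mark them as submitted."""
        now = self._clock()
        # Forget ids whose window has expired so they can be retried.
        self._seen = {
            doc_id: ts for doc_id, ts in self._seen.items() if now - ts < self._window
        }
        fresh: List[str] = []
        for doc_id in doc_ids:
            if doc_id not in self._seen:
                self._seen[doc_id] = now
                fresh.append(doc_id)
        return fresh

    def forget(self, doc_id: str) -> None:
        """Allow an immediate retry, e.g. after a failed purge."""
        self._seen.pop(doc_id, None)


tracker = RecentlyPurged(window_seconds=300)
print(tracker.filter_new(["a", "b"]))
print(tracker.filter_new(["b", "c"]))
```

Ids whose purge came back as `internal_server_error` would need `forget` called on them, otherwise they are skipped until the window expires.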

bdoyle0182 · 2020-12-04T16:50:58Z

bdoyle0182
Dec 4, 2020
Author

This is what I'm seeing on the server:
error <0.21058.459> 7a8b39c7e8 rexi_server: from: *host*(<0.18861.456>) mfa: fabric_rpc:purge_docs/3 exit:{timeout,{gen_server,call,[<0.436.0>,{purge_docs,

then a bunch of doc ids listed

1 reply

nickva Aug 10, 2022
Collaborator

This particular timeout was fixed in #4143

bdoyle0182 · 2020-12-20T17:56:56Z

bdoyle0182
Dec 20, 2020
Author

The performance improved significantly after recreating the db with a much smaller view tied to the db being purged. So it seems the size of the view was causing the problems.

0 replies

jiangphcn · 2020-12-21T02:13:07Z

jiangphcn
Dec 21, 2020
Collaborator

Glad to know. In fact, the primary data of the database and the secondary indexes are purged asynchronously.

0 replies
