This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

av-store: granular pruning of data #7237

Closed

sandreim opened this issue May 16, 2023 · 6 comments
Assignees
alexggh
Labels
T4-parachains_engineering This PR/Issue is related to Parachains performance, stability, maintenance.

Comments

@sandreim
Contributor

Currently we block the subsystem for whatever time it takes to prune the data, both on the timer and on finality. We should make this process more granular, so that messages can be processed in between; otherwise we can end up in a situation where pruning takes > 10s and the node crashes due to a SubsystemStalled error.

prune_all(&subsystem.db, &subsystem.config, &*subsystem.clock)?;

In the past I have seen very high times, up to 10s, on a few nodes, but not recently in tests with small PoVs.

[Screenshot: 2023-05-16 17:17:35]
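
One way to make this granular is to prune in small batches and drain pending subsystem messages between batches. Below is a minimal, self-contained sketch of that pattern; the batch size, the message type, and the stand-in for the actual db deletion are all hypothetical, not the av-store API:

```rust
use std::sync::mpsc::channel;

const BATCH_SIZE: usize = 64; // hypothetical batch size

fn main() {
    // Simulated inbound subsystem messages.
    let (tx, rx) = channel::<String>();
    tx.send("message 1".into()).unwrap();
    tx.send("message 2".into()).unwrap();

    // Hypothetical set of keys due for pruning.
    let mut to_prune: Vec<u64> = (0..1_000).collect();

    // Prune in small batches, draining pending messages between
    // batches so the loop never blocks for the whole pruning run.
    while !to_prune.is_empty() {
        let batch: Vec<u64> =
            to_prune.drain(..BATCH_SIZE.min(to_prune.len())).collect();
        // Stand-in for deleting `batch` from the database column.
        println!("pruned {} keys", batch.len());

        // Handle whatever messages arrived during this batch.
        while let Ok(msg) = rx.try_recv() {
            println!("handled: {msg}");
        }
    }
}
```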
@sandreim sandreim added the T4-parachains_engineering This PR/Issue is related to Parachains performance, stability, maintenance. label May 16, 2023
@bkchr
Member

bkchr commented May 16, 2023

Do you know what takes the time? The actual IO when applying the TX, or the processing of what to delete? If it is the latter, we should just move this to some background process.

@alexggh alexggh self-assigned this May 17, 2023
@sandreim
Contributor Author

> Do you know what takes the time? The actual IO when applying the TX, or the processing of what to delete? If it is the latter, we should just move this to some background process.

I am not sure which part exactly is heavy. Before moving this to a background task, we'd have to be certain that we don't corrupt the db with multiple writers/readers on the same column.

@bkchr
Member

bkchr commented May 17, 2023

If the db is Sync & Send, I would assume that you can read and write from multiple threads :P

@sandreim
Contributor Author

Yes it is, but usually we do this from a single subsystem thread. Maybe in this case it shouldn't be an issue, as the keys should no longer be accessed by anything, since pruning happens only after 25 hours.
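
For illustration, a minimal sketch of the point being made here: a handle that is `Sync + Send` can be shared across threads behind an `Arc`, so a reader and a pruner can run concurrently. The `RwLock<HashMap>` is a stand-in for the actual av-store database, which has its own column/transaction API:

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};
use std::thread;

// Stand-in for a `Sync + Send` database handle; the real av-store
// db exposes its own column/transaction API.
type Db = RwLock<HashMap<u64, Vec<u8>>>;

fn main() {
    let db: Arc<Db> = Arc::new(RwLock::new(HashMap::new()));
    db.write().unwrap().insert(1, vec![0xde, 0xad]);

    let reader = {
        let db = Arc::clone(&db);
        // Concurrent read from a second thread.
        thread::spawn(move || println!("entries: {}", db.read().unwrap().len()))
    };
    let pruner = {
        let db = Arc::clone(&db);
        // Concurrent delete (the "pruning") from a third thread.
        thread::spawn(move || { db.write().unwrap().remove(&1); })
    };
    reader.join().unwrap();
    pruner.join().unwrap();
}
```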

alexggh added a commit that referenced this issue May 19, 2023
There are situations where pruning of the data could take more than a few
seconds and that might make the whole subsystem unresponsive. To avoid this just
move the prune process on a separate thread.

See: #7237, for more details.

Signed-off-by: Alexandru Gheorghe <[email protected]>
alexggh added commits to alexggh/polkadot that referenced this issue on May 22 and May 23, 2023 (same commit message as above).
@alexggh
Contributor

alexggh commented May 23, 2023

Moving prune_all to a separate blocking task: #7263
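
For context, a minimal sketch of the pattern used by the fix: run the blocking prune on a dedicated blocking task so the subsystem's main loop keeps serving messages. This sketch assumes tokio and a hypothetical `prune_all`; the actual PR uses the subsystem's own task spawner rather than `tokio::task::spawn_blocking` directly:

```rust
use std::sync::Arc;
use std::time::Duration;

// Hypothetical stand-in for the real `prune_all` in av-store.
fn prune_all(db: &Arc<Vec<u8>>) {
    // Simulate a slow, blocking pruning pass over `db`.
    std::thread::sleep(Duration::from_millis(100));
    println!("pruned store of {} bytes", db.len());
}

#[tokio::main]
async fn main() {
    let db = Arc::new(vec![0u8; 1024]);
    let db_clone = Arc::clone(&db);
    // Run the blocking prune off the main loop so that message
    // processing is never stalled by slow IO.
    let handle = tokio::task::spawn_blocking(move || prune_all(&db_clone));
    // ... the main loop would keep serving messages here ...
    handle.await.expect("prune task panicked");
}
```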

alexggh added commits to alexggh/polkadot that referenced this issue on Jun 5 and Jun 6, 2023 (same commit message as above).
paritytech-processbot bot pushed a commit that referenced this issue Jun 8, 2023
* av-store: Move prune on a separate thread

There are situations where pruning of the data could take more than a few
seconds and that might make the whole subsystem unresponsive. To avoid this just
move the prune process on a separate thread.

See: #7237, for more details.

Signed-off-by: Alexandru Gheorghe <[email protected]>

* av-store: Add log that pruning started

Signed-off-by: Alexandru Gheorghe <[email protected]>

* av-store: modify log severity

Signed-off-by: Alexandru Gheorghe <[email protected]>

---------

Signed-off-by: Alexandru Gheorghe <[email protected]>
@alexggh
Contributor

alexggh commented Jun 9, 2023

Fixed with: #7263

@alexggh alexggh closed this as completed Jun 9, 2023