Add CarWriter sink #3461

Merged
merged 54 commits into from
Sep 18, 2023

Conversation

elmattic
Contributor

@elmattic elmattic commented Sep 5, 2023

Summary of changes

Changes introduced in this pull request:

Reference issue to close (if applicable)

Closes #3192

Other information and links

Change checklist

  • I have performed a self-review of my own code,
  • I have made corresponding changes to the documentation,
  • I have added tests that prove my fix is effective or that my feature works (if possible),
  • I have made sure the CHANGELOG is up-to-date. All user-facing changes should be reflected in this document.

src/utils/db/car_stream.rs (outdated review thread, resolved)
@elmattic elmattic requested a review from aatifsyed September 6, 2023 15:18
Contributor

@lemmih lemmih left a comment

Looks good. We need to delete all references to fvm_ipld_car but that can be done in a follow-up PR.

@elmattic
Contributor Author

elmattic commented Sep 12, 2023

I found the (one?) source of the non-determinism. If I remove the call to poll_ready in the poll_flush method, I consistently get a file size of 2432050 bytes (I ran the command around 20 times).

This still puzzles me, because the previous file, created using the write_stream_async method, is 2433114 bytes.

However, I find that this previous file doesn't decompress cleanly:

$ zstd -d actor_bundles.car.zst
ctor_bundles.car.zst : 0 B...     ctor_bundles.car.zst : Read error (39) : premature end 
$ zstd --test actor_bundles.car.zst 
ctor_bundles.car.zst : 0 B...     ctor_bundles.car.zst : Read error (39) : premature end 

This is not the case for the new file: once decompressed, it has the same size and checksum as the output of the old method written without compression:

$ zstd -d actor_bundles.car.zst
actor_bundles.car.zst: 63329900 bytes                                          
$ sha256sum actor_bundles.car
2c5b55a53ab84bbb602cfe4005ad0a4ab38c84ca6298f93ec0a35794f7d4ffa8  actor_bundles.car

@lemmih
Contributor

lemmih commented Sep 12, 2023

Could you write a QuickCheck test that passes data through CarWriter and CarStream, making sure the data isn't changed?
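A minimal sketch of the kind of round-trip property being requested, under stated assumptions: `encode_frames` and `decode_frames` are hypothetical toy stand-ins, not Forest's CarWriter and CarStream, and a small hand-rolled generator replaces the quickcheck crate so the example needs no dependencies. It only borrows the idea that CAR frames are prefixed with an unsigned LEB128 varint length.

```rust
// Toy stand-ins for the writer and reader sides (NOT the real CarWriter/CarStream).

/// Append `value` as an unsigned LEB128 varint.
fn write_varint(mut value: u64, out: &mut Vec<u8>) {
    loop {
        let byte = (value & 0x7f) as u8;
        value >>= 7;
        if value == 0 {
            out.push(byte);
            return;
        }
        out.push(byte | 0x80); // continuation bit: more bytes follow
    }
}

/// Read an unsigned LEB128 varint; returns the value and the bytes consumed.
fn read_varint(input: &[u8]) -> Option<(u64, usize)> {
    let mut value = 0u64;
    for (i, &byte) in input.iter().enumerate().take(10) {
        value |= u64::from(byte & 0x7f) << (7 * i);
        if byte & 0x80 == 0 {
            return Some((value, i + 1));
        }
    }
    None // truncated or over-long varint
}

/// Hypothetical writer side: each block becomes `varint(len) ++ bytes`.
fn encode_frames(blocks: &[Vec<u8>]) -> Vec<u8> {
    let mut out = Vec::new();
    for block in blocks {
        write_varint(block.len() as u64, &mut out);
        out.extend_from_slice(block);
    }
    out
}

/// Hypothetical reader side: split the byte stream back into blocks.
fn decode_frames(mut input: &[u8]) -> Option<Vec<Vec<u8>>> {
    let mut blocks = Vec::new();
    while !input.is_empty() {
        let (len, used) = read_varint(input)?;
        let rest = &input[used..];
        if len > rest.len() as u64 {
            return None; // frame claims more bytes than remain
        }
        let len = len as usize;
        blocks.push(rest[..len].to_vec());
        input = &rest[len..];
    }
    Some(blocks)
}

/// Tiny xorshift generator, standing in for QuickCheck's input generation.
fn next_random(state: &mut u64) -> u64 {
    *state ^= *state << 13;
    *state ^= *state >> 7;
    *state ^= *state << 17;
    *state
}

/// Property: decoding the encoded blocks yields the original blocks.
fn roundtrip_holds(seed: u64, cases: usize) -> bool {
    let mut state = seed | 1; // xorshift must not start at zero
    for _ in 0..cases {
        let block_count = (next_random(&mut state) % 8) as usize;
        let mut blocks = Vec::new();
        for _ in 0..block_count {
            // Lengths up to 299 exercise both one-byte and two-byte varints.
            let len = (next_random(&mut state) % 300) as usize;
            let mut block = Vec::with_capacity(len);
            for _ in 0..len {
                block.push(next_random(&mut state) as u8);
            }
            blocks.push(block);
        }
        if decode_frames(&encode_frames(&blocks)) != Some(blocks) {
            return false;
        }
    }
    true
}
```

Calling `roundtrip_holds(42, 200)` runs 200 generated cases. A real test would presumably feed generated blocks through the actual sink and stream instead of these helpers.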

@elmattic
Contributor Author

I did some experiments and managed to match the 2433114-byte file size consistently (the size we get using the write_stream_async method) by forcing a flush after poll_write, that is, returning self.project().inner.poll_flush(cx) instead of Poll::Ready(Ok(())). I had to remove poll_shutdown from the close method, though; otherwise it panics (flush after shutdown error).

But then, when I try to decompress or --test the compressed file, I get the same error again: Read error (39) : premature end.
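As an analogy only, the sketch below uses a toy synchronous encoder (the names `ToyEncoder`, `END_MARKER`, `encode`, and `decode` are all hypothetical, and this is not zstd or async-compression) to show two effects that resemble the ones described above: flushing after every write changes the output size without changing the decoded data, and skipping the final shutdown step leaves a stream that a decoder rejects as ending prematurely. Whether the real encoder behaves this way for the same reasons is an assumption.

```rust
use std::io::{self, Write};

/// Byte that terminates the toy stream; chunk lengths always stay below it.
const END_MARKER: u8 = 0xff;

/// Toy encoder: buffers input and emits `[len][bytes]` chunks on flush.
struct ToyEncoder<W: Write> {
    inner: W,
    pending: Vec<u8>,
}

impl<W: Write> ToyEncoder<W> {
    fn new(inner: W) -> Self {
        ToyEncoder { inner, pending: Vec::new() }
    }

    /// Counterpart of a shutdown: flush what is left, then write the end marker.
    fn finish(mut self) -> io::Result<W> {
        self.flush()?;
        self.inner.write_all(&[END_MARKER])?;
        Ok(self.inner)
    }
}

impl<W: Write> Write for ToyEncoder<W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.pending.extend_from_slice(buf);
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        // Every flush closes the current chunk, costing one header byte per chunk.
        for chunk in self.pending.chunks(254) {
            self.inner.write_all(&[chunk.len() as u8])?;
            self.inner.write_all(chunk)?;
        }
        self.pending.clear();
        self.inner.flush()
    }
}

/// Toy decoder: fails with "premature end" if the end marker never arrives.
fn decode(mut input: &[u8]) -> Result<Vec<u8>, &'static str> {
    let mut out = Vec::new();
    loop {
        let (&header, rest) = input.split_first().ok_or("premature end")?;
        if header == END_MARKER {
            return Ok(out);
        }
        let len = header as usize;
        if rest.len() < len {
            return Err("premature end");
        }
        out.extend_from_slice(&rest[..len]);
        input = &rest[len..];
    }
}

/// Encode `parts`, optionally flushing after each write and optionally finishing.
fn encode(parts: &[&[u8]], flush_each_write: bool, call_finish: bool) -> io::Result<Vec<u8>> {
    let mut encoder = ToyEncoder::new(Vec::new());
    for part in parts {
        encoder.write_all(part)?;
        if flush_each_write {
            encoder.flush()?;
        }
    }
    if call_finish {
        encoder.finish()
    } else {
        encoder.flush()?; // data is written, but the end marker is not
        Ok(encoder.inner)
    }
}
```

With the two parts `b"hello "` and `b"world"`, flushing after each write produces a 14-byte stream instead of 13, and both decode to the same bytes. Leaving out `finish` yields a stream that `decode` rejects with "premature end".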

@aatifsyed
Contributor

I'm not sure about this implementation approach - I'll get back to you in a moment

Contributor

@aatifsyed aatifsyed left a comment

4f8ce1e

Can you have a look at this, and then we'll chat

@lemmih
Contributor

lemmih commented Sep 14, 2023

> I'm not sure about this implementation approach - I'll get back to you in a moment

I think the approach is fine. Offloading frame encoding to unsigned_varint/tokio-util is obviously preferable, but it's not a different approach.

Contributor

@aatifsyed aatifsyed left a comment

Let's refactor this once Nullus157/async-compression#246 is fixed

@elmattic elmattic added this pull request to the merge queue Sep 18, 2023
Merged via the queue into main with commit 6ef0ea9 Sep 18, 2023
22 checks passed
@elmattic elmattic deleted the elmattic/car-writer-sink branch September 18, 2023 18:41
Development

Successfully merging this pull request may close these issues.

Consider struct CarWriter: futures::Sink
3 participants