Add basic write support #45
base: main
Regarding point 1: do you mean writing directly to S3? Does S3 allow random access to an already open file? I would think we need to go back to the directory to update file positions.
AFAIK, S3 doesn't support seek operations or partial updates, but with a multipart upload you can defer uploading the header and root directory. So a direct upload to S3 would be possible. But this is not a high priority for me.
TIL, thx. And let's follow YAGNI: if someone wants it, they will add it :)
This is now ready for review. I realized an important property, which I want to add as a follow-up: deduplicating non-consecutive tiles. This has performance drawbacks when reading, but probably reduces the file size substantially.
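For the deduplication follow-up, a minimal sketch, assuming tiles arrive in ascending tile-ID order and using a hypothetical `Dedup` struct (not the PR's code): identical consecutive tiles extend the run length of the previous entry, while identical non-consecutive tiles reuse the offset of the first copy. Keying the map on the full tile bytes is the simplest option; a hash of the bytes would use less memory.

```rust
use std::collections::HashMap;

/// A directory entry shaped like a PMTiles v3 entry: `run_length`
/// consecutive tile IDs that all point at the same bytes.
#[derive(Debug, Clone, PartialEq)]
struct Entry {
    tile_id: u64,
    offset: u64,
    length: u32,
    run_length: u32,
}

/// Hypothetical writer state: tile bodies are appended to `data`, and `seen`
/// remembers where each distinct body was first written.
#[derive(Default)]
struct Dedup {
    data: Vec<u8>,
    entries: Vec<Entry>,
    seen: HashMap<Vec<u8>, (u64, u32)>,
}

impl Dedup {
    /// Assumes tiles are added in ascending tile-ID order.
    fn add_tile(&mut self, tile_id: u64, bytes: &[u8]) {
        // Non-consecutive duplicate: point at the first copy instead of
        // appending the same bytes again.
        let (offset, length) = match self.seen.get(bytes).copied() {
            Some(loc) => loc,
            None => {
                let loc = (self.data.len() as u64, bytes.len() as u32);
                self.data.extend_from_slice(bytes);
                self.seen.insert(bytes.to_vec(), loc);
                loc
            }
        };
        // Consecutive duplicate: extend the previous run instead of adding
        // a new directory entry.
        if let Some(last) = self.entries.last_mut() {
            if last.offset == offset
                && last.length == length
                && last.tile_id + u64::from(last.run_length) == tile_id
            {
                last.run_length += 1;
                return;
            }
        }
        self.entries.push(Entry { tile_id, offset, length, run_length: 1 });
    }
}
```

The reused offsets are where the read-side cost comes from: entries no longer point at monotonically increasing offsets, so neighboring tiles may be scattered across the file.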
looks pretty good! I left a few thoughts
Regarding the failed semver check:
Shall I add
Might as well bump the version.
Why do you want to make it non-exhaustive? I am OK with breaking compat, not a biggie tbh.
But come to think of it, errors are a pretty good candidate for non-exhaustive, so yeah, let's keep it going forward... but enabling it would still be a breaking change, right?
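On the breaking-change question: adding `#[non_exhaustive]` to an existing public enum is itself breaking, because downstream exhaustive `match` expressions stop compiling; once it is in place, new variants can be added without another break. A sketch with a hypothetical `WriteError` enum (not the crate's actual error type):

```rust
use std::fmt;

/// Hypothetical error type for illustration; the variants are not the
/// crate's actual error enum.
#[derive(Debug)]
#[non_exhaustive]
pub enum WriteError {
    Io(std::io::Error),
    UnsupportedCompression,
}

impl fmt::Display for WriteError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            WriteError::Io(e) => write!(f, "I/O error: {e}"),
            WriteError::UnsupportedCompression => write!(f, "unsupported compression"),
        }
    }
}

impl std::error::Error for WriteError {}

/// In a downstream crate this match only compiles with the wildcard arm,
/// because `#[non_exhaustive]` reserves room for future variants. Inside the
/// defining crate the attribute has no effect, hence the `allow`.
pub fn describe(err: &WriteError) -> &'static str {
    match err {
        WriteError::Io(_) => "io",
        WriteError::UnsupportedCompression => "compression",
        #[allow(unreachable_patterns)]
        _ => "other",
    }
}
```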
One more thing: I think
Ohhh! This is great. Just seeing the PR now for some reason. I'll give it a review shortly. Thank you @pka! |
About 3 times faster, with a very similar compression ratio for MVTs.
mod tile;
mod writer;
Could you clarify why cfg(__async) was removed here? We really ought to put this PR behind a feature flag.
I was glad to reduce the cfg(__) complexity a bit with 72f5762; it reminded me of the bad old C/C++ #ifndef days. IMO, a PMTiles library should come with its core functionality (like calculating a tile ID) and dependencies (like flate2) by default. I understand feature flags for the different backend implementations with heavy or specialized dependencies, but not for everything. That said, if the maintainers prefer a feature flag for writing PMTiles files, I can give it a try.
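Tile ID calculation is a good example of that core functionality. As a standalone sketch (not the crate's API): PMTiles v3 numbers tiles zoom level by zoom level, and within a level along a Hilbert curve, so the ID is the number of tiles at all lower zooms plus the Hilbert index of (x, y).

```rust
/// Sketch of a PMTiles v3 tile ID: tiles are numbered zoom level by zoom
/// level, and within a level along a Hilbert curve. Assumes z <= 31 and
/// x, y < 2^z.
pub fn tile_id(z: u8, x: u64, y: u64) -> u64 {
    debug_assert!(z <= 31);
    let n: u64 = 1 << z;
    debug_assert!(x < n && y < n);
    // Tiles in all lower zoom levels: 1 + 4 + ... + 4^(z-1) = (4^z - 1) / 3.
    let base = (n * n - 1) / 3;
    let (mut x, mut y) = (x, y);
    let mut d: u64 = 0;
    let mut s = n / 2;
    while s > 0 {
        let rx = u64::from((x & s) != 0);
        let ry = u64::from((y & s) != 0);
        d += s * s * ((3 * rx) ^ ry);
        // Rotate/flip the quadrant so the curve stays continuous.
        if ry == 0 {
            if rx == 1 {
                x = n - 1 - x;
                y = n - 1 - y;
            }
            std::mem::swap(&mut x, &mut y);
        }
        s /= 2;
    }
    base + d
}
```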
I agree that we shouldn't go overboard on features. My only concern is that if write support brings additional dependencies, it is better to make writing optional, simply because writing is relatively rare compared to reading. My understanding is that you need compression, which might be a significant extra compilation burden, and IMO that should not be levied on users unless needed.
I agree a feature would be nice.
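A minimal sketch of such a gate, assuming a feature named `write` (hypothetical). In Cargo.toml the feature would be declared under `[features]` and could switch on optional dependencies with the `dep:` syntax, e.g. `write = ["dep:flate2"]`, though whether flate2 belongs behind the flag is exactly what is being weighed here.

```rust
// Hypothetical crate layout: everything write-related, including any
// compression dependencies it pulls in, compiles only with `--features write`.
// The feature name "write" is an assumption for illustration.
#[cfg(feature = "write")]
pub mod writer {
    /// Placeholder for the real writer types.
    pub struct PmTilesWriter;
}

/// Reports whether write support was compiled into this build.
pub fn write_support_enabled() -> bool {
    cfg!(feature = "write")
}
```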
First of all, thanks for setting this all up! I'm glad we're getting this functionality added.
I'd like to understand how much impact having the leaf directories at the end of the archive would have on performance. Hopefully it's not too much, because this elegantly solves the need for a temporary file.
Another option is to reserve N bytes at the beginning of every archive for the header + directories, but that's suboptimal as well.
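One way to picture the trade-off: a sketch of one possible layout, assuming a `Write + Seek` output (so it applies to local files, not a direct S3 upload) and a hypothetical `header_and_root` callback in place of real serialization. It reserves the first 16,384 bytes, which the PMTiles v3 spec requires to hold the 127-byte header plus the root directory, streams the tile data, appends leaf directories after it, and then seeks back to fill in the prefix.

```rust
use std::io::{self, Seek, SeekFrom, Write};

/// PMTiles v3 requires the 127-byte header plus the root directory to fit in
/// the first 16,384 bytes of the archive.
const RESERVED: u64 = 16_384;

/// Hypothetical sketch: reserve the start of the file, stream tile data,
/// append leaf directories after it, then seek back and fill in the prefix.
/// `header_and_root` stands in for real header/root-directory serialization
/// and receives the offset and length of the leaf-directory section.
fn write_archive<W: Write + Seek>(
    out: &mut W,
    tiles: &[&[u8]],
    leaf_dirs: &[u8],
    header_and_root: impl FnOnce(u64, u64) -> Vec<u8>,
) -> io::Result<()> {
    // 1. Zero-filled placeholder for header + root directory.
    out.write_all(&[0u8; RESERVED as usize])?;
    // 2. Tile data in a single streaming pass.
    for tile in tiles {
        out.write_all(tile)?;
    }
    // 3. Leaf directories after the tile data, so no temporary file is needed.
    let leaf_offset = out.stream_position()?;
    out.write_all(leaf_dirs)?;
    // 4. Seek back and overwrite the placeholder.
    let prefix = header_and_root(leaf_offset, leaf_dirs.len() as u64);
    if prefix.len() as u64 > RESERVED {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "header + root directory exceed 16 KiB",
        ));
    }
    out.seek(SeekFrom::Start(0))?;
    out.write_all(&prefix)?;
    out.seek(SeekFrom::End(0))?;
    Ok(())
}
```

The cost of reserving is the padding: whatever part of the 16 KiB the root directory does not use stays as zeros in the file.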
@@ -206,6 +209,9 @@ impl<B: AsyncBackend + Sync + Send, C: DirectoryCache + Sync + Send> AsyncPmTile
.read_to_end(&mut decompressed_bytes)
The decompressed_bytes declaration should probably move inside this branch now, if we're supporting compression modes where it's not needed.
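A sketch of what that could look like, using a hypothetical two-variant `Compression` enum and a `decoder` closure standing in for a real gzip reader (e.g. `flate2::read::GzDecoder::new`): the buffer is only allocated in the branch that decompresses, and the uncompressed path borrows the input through `Cow`. This is a synchronous simplification; the reader in the diff is async.

```rust
use std::borrow::Cow;
use std::io::{self, Read};

/// Illustrative subset of compression modes (not the crate's enum).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Compression {
    None,
    Gzip,
}

/// Returns directory bytes, decompressing only when needed. The buffer is
/// declared inside the branch that uses it, so the uncompressed path
/// borrows `raw` without allocating.
fn directory_bytes<'a, R: Read>(
    compression: Compression,
    raw: &'a [u8],
    decoder: impl FnOnce(&'a [u8]) -> R,
) -> io::Result<Cow<'a, [u8]>> {
    match compression {
        Compression::None => Ok(Cow::Borrowed(raw)),
        Compression::Gzip => {
            let mut decompressed_bytes = Vec::new();
            decoder(raw).read_to_end(&mut decompressed_bytes)?;
            Ok(Cow::Owned(decompressed_bytes))
        }
    }
}
```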
/// Set the compression for metadata and directories.
#[must_use]
pub fn internal_compression(mut self, compression: Compression) -> Self {
It looks like this function (and a few others) uses a builder pattern without a Builder struct. Can we split the building from the Writer struct? Then the writer struct would be quite simple (it really only needs add_tile and finalize methods).
PmTilesWriter is a builder struct that generates a PmTilesStreamWriter struct via create. Builder is not in the name because the resulting code looks better without it, IMO.
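A simplified sketch of that split, reusing the names from this thread but with placeholder types and bodies (the PR's real signatures may differ, and the Gzip default is an assumption): `PmTilesWriter` only holds configuration, and `create` hands back a `PmTilesStreamWriter` whose only jobs are `add_tile` and `finalize`.

```rust
use std::io::{self, Write};

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum TileType { Mvt, Png }

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Compression { None, Gzip }

/// Builder: holds configuration only.
#[derive(Debug)]
pub struct PmTilesWriter {
    tile_type: TileType,
    internal_compression: Compression,
}

/// Returned by `create`: only adds tiles and finishes the archive.
pub struct PmTilesStreamWriter<W: Write> {
    out: W,
    tile_type: TileType,
    internal_compression: Compression,
    tiles_written: u64,
}

impl PmTilesWriter {
    pub fn new(tile_type: TileType) -> Self {
        // Gzip as the default is an assumption for this sketch.
        Self { tile_type, internal_compression: Compression::Gzip }
    }

    /// Set the compression for metadata and directories.
    #[must_use]
    pub fn internal_compression(mut self, compression: Compression) -> Self {
        self.internal_compression = compression;
        self
    }

    pub fn create<W: Write>(self, out: W) -> io::Result<PmTilesStreamWriter<W>> {
        // A real implementation would write a placeholder header here.
        Ok(PmTilesStreamWriter {
            out,
            tile_type: self.tile_type,
            internal_compression: self.internal_compression,
            tiles_written: 0,
        })
    }
}

impl<W: Write> PmTilesStreamWriter<W> {
    pub fn add_tile(&mut self, _tile_id: u64, data: &[u8]) -> io::Result<()> {
        // A real writer would also record a directory entry for `_tile_id`.
        self.out.write_all(data)?;
        self.tiles_written += 1;
        Ok(())
    }

    /// Consumes the writer; a real implementation would write the
    /// directories (compressed with `internal_compression`) and the header.
    pub fn finalize(mut self) -> io::Result<W> {
        self.out.flush()?;
        Ok(self.out)
    }
}
```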
Oh! Hmm… I see. I think the nomenclature threw me off: I wouldn't expect a PmTilesWriter to produce a PmTilesStreamWriter, with one being a builder for the other.
Any ideas for a better name? I think the namespacing is probably contributing to it. We probably should have pmtiles::Writer and pmtiles::Reader instead of PmTilesWriter and …PmTilesReader.
If we switched to pmtiles::writer::StreamWriter, then we could have pmtiles::writer::Builder without confusion.
I preferred the somewhat confusing names for prettier resulting user code:
let mut writer = Builder::new(TileType::Mvt).create(file)?;
vs.
let mut writer = PmTilesWriter::new(TileType::Mvt).create(file)?;
I think we can accomplish the same thing with name spacing: pmtiles::writer::StreamBuilder and pmtiles::writer::StreamWriter. It's a pretty common pattern in Rust, and I think it keeps things concise and manageable.
Any thoughts on this @nyurik?
It took me a few seconds to understand "name spacing" meant "namespacing" :D Naming is hard...
I am not a big fan of mega-generic names like Builder by itself, nor of making the namespace part of the required pattern... I do like PmTilesWriter::new(...).create(...), as it is cleaner for the user. We could name the resulting struct PmTilesWriterStream or PmTilesStream or whatever.
Okay, this is a pretty fundamental decision, so I'd say this blocks the PR until we decide.
I did some research, and here's my suggestion.
The "thing" we want to build is called PmTilesWriter (I think "stream" is an implementation detail; 99% of use cases prefer streaming anyway). Then we can have a Builder in the same module, but hide the complexity behind a PmTilesWriter::builder() call.
This results in the following pseudocode:
let writer = PmTilesWriter::builder(tile_type).create("file.pmtiles");
The user never needs to know about the underlying builder structure, and it keeps the naming manageable. I'm happy to do the refactor, but just want to make sure there's no strong objections.
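A sketch of that proposal with placeholder internals; the `Builder` type, the boxed output, and the byte-count return from `finalize` are assumptions for illustration.

```rust
use std::io::{self, Write};

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum TileType { Mvt, Png }

/// The type users actually hold while writing.
pub struct PmTilesWriter {
    out: Box<dyn Write>,
    tile_type: TileType,
    bytes_written: u64,
}

/// Same module, but users normally reach it only via `PmTilesWriter::builder`.
pub struct Builder {
    tile_type: TileType,
}

impl PmTilesWriter {
    pub fn builder(tile_type: TileType) -> Builder {
        Builder { tile_type }
    }

    pub fn tile_type(&self) -> TileType {
        self.tile_type
    }

    pub fn add_tile(&mut self, data: &[u8]) -> io::Result<()> {
        self.out.write_all(data)?;
        self.bytes_written += data.len() as u64;
        Ok(())
    }

    /// Placeholder finalize: flushes and reports how many tile bytes were written.
    pub fn finalize(mut self) -> io::Result<u64> {
        self.out.flush()?;
        Ok(self.bytes_written)
    }
}

impl Builder {
    pub fn create(self, out: impl Write + 'static) -> io::Result<PmTilesWriter> {
        Ok(PmTilesWriter { out: Box::new(out), tile_type: self.tile_type, bytes_written: 0 })
    }
}
```

Boxing the output keeps PmTilesWriter non-generic. If it were generic over an output type W, a bare PmTilesWriter::builder(...) call could not infer W, so the builder() entry point would have to live on a non-generic type or take the output up front.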
I would not expect a performance impact, even via HTTP. An optimization for initial loading would be adding the entries for z0-z2 to the root directory, which is a TODO copied from the pmtiles-go implementation. It shouldn't be difficult to add to the current implementation.
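For the z0-z2 optimization, the entries in question are easy to pick out: each zoom level occupies a contiguous tile-ID range starting at (4^z - 1) / 3, so z0-z2 are exactly IDs 0 through 20. A small sketch with hypothetical helper names:

```rust
use std::ops::Range;

/// Tile IDs of zoom level `z` form a contiguous range: lower zooms use
/// 1 + 4 + ... + 4^(z-1) = (4^z - 1) / 3 IDs, and zoom z adds 4^z more.
/// Assumes z <= 31 so the arithmetic fits in u64.
fn zoom_tile_ids(z: u8) -> Range<u64> {
    let count = 1u64 << (2 * u32::from(z));
    let start = (count - 1) / 3;
    start..start + count
}

/// Hypothetical helper for the TODO: should an entry go straight into the
/// root directory instead of a leaf directory?
fn belongs_in_root(tile_id: u64, max_root_zoom: u8) -> bool {
    tile_id < zoom_tile_ids(max_root_zoom).end
}
```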
I think this would be theoretically possible if we use a multipart upload and refrain from uploading the first part until the end, but I'd have to check the docs to confirm this is possible.
When I have the chance, I was planning to grab a copy of this PR and do some additional cleanup that makes sense now. I'll base it on your code, @pka, so we don't have merge issues. I'm struggling with how much to ask for in this PR vs. what we should clean up afterwards. Let's not release immediately after merging, to give us some time to clean up the APIs in a more general way.
Okay! We're close. We just have to resolve the naming detail, and I'll be happy to do a cleanup pass after merging.
This is a minimal writer implementation. Before implementing the missing parts, I would like to discuss the basic design.
Main questions:
Missing functionality: