Parquet Writer: Make column descriptor public on the writer
This makes it possible to gather information about the column
we're about to write to. That information was already present in the
column but buried inside the internals of the Writer.

Because the ColumnDescPtr is an Arc, cloning it is cheap, so the
column writer can hold a clone and expose it publicly.
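As a rough illustration of why the clone is cheap, here is a minimal std-only sketch. The `ColumnDescriptor` struct and the `share` helper are hypothetical stand-ins written for this example, not the crate's real types; the only point is that `Arc::clone` bumps a reference count and returns a handle to the same allocation.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for the crate's column descriptor; only a name is modelled.
#[derive(Debug)]
pub struct ColumnDescriptor {
    name: String,
}

impl ColumnDescriptor {
    pub fn new(name: &str) -> Self {
        Self { name: name.to_string() }
    }

    pub fn name(&self) -> &str {
        &self.name
    }
}

/// A shared pointer to a descriptor, as the commit message describes.
pub type ColumnDescPtr = Arc<ColumnDescriptor>;

/// Hands out another handle to the same descriptor (hypothetical helper).
/// `Arc::clone` only increments the reference count; the descriptor is not copied.
pub fn share(descr: &ColumnDescPtr) -> ColumnDescPtr {
    Arc::clone(descr)
}
```

Both handles point at the same descriptor, so holding one in the writer and returning another from an accessor costs one atomic increment per call.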

```rust
while let Ok(Some(mut col)) = writer.next_column() {
    let descriptor = col.descriptor();
    let name = descriptor.name();
}
```

Without this patch, the caller has to do its own bookkeeping to make
sure the data it is about to write matches the current column.

With the patch, the caller no longer needs to guess which column the
code refers to.
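One way a caller might use the accessor is to look up each column's data by name rather than tracking a position. The sketch below is std-only and built on assumptions: `MockColumnWriter`, its `written` buffer, and `write_by_name` are hypothetical stand-ins for illustration and do not touch the real Parquet writer.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical stand-in for a column descriptor; only a name is modelled.
pub struct ColumnDescriptor {
    name: String,
}

impl ColumnDescriptor {
    pub fn name(&self) -> &str {
        &self.name
    }
}

/// Hypothetical stand-in for the column writer: it keeps its descriptor
/// and records the values "written" to it.
pub struct MockColumnWriter {
    descriptor: Arc<ColumnDescriptor>,
    pub written: Vec<i64>,
}

impl MockColumnWriter {
    pub fn new(name: &str) -> Self {
        Self {
            descriptor: Arc::new(ColumnDescriptor { name: name.to_string() }),
            written: Vec::new(),
        }
    }

    /// Same shape as the accessor added by the patch: returns a clone of the Arc.
    pub fn descriptor(&self) -> Arc<ColumnDescriptor> {
        Arc::clone(&self.descriptor)
    }
}

/// Fills every column from `data`, keyed by column name.
/// Returns an error naming the column if no data was supplied for it.
pub fn write_by_name(
    columns: &mut [MockColumnWriter],
    data: &HashMap<String, Vec<i64>>,
) -> Result<(), String> {
    for col in columns.iter_mut() {
        let descriptor = col.descriptor();
        let values = data
            .get(descriptor.name())
            .ok_or_else(|| format!("no data for column `{}`", descriptor.name()))?;
        col.written.extend_from_slice(values);
    }
    Ok(())
}
```

Because the lookup is keyed by the descriptor's name, the order in which the caller prepared its data no longer has to match the order in which columns are handed out.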
pier-oliviert committed Nov 2, 2022
1 parent 62e878e commit 78b5e9e
Showing 1 changed file with 19 additions and 3 deletions.
22 changes: 19 additions & 3 deletions parquet/src/file/writer.rs
@@ -419,8 +419,13 @@ impl<'a, W: Write> SerializedRowGroupWriter<'a, W> {
     /// closed returns `Err`.
     pub fn next_column(&mut self) -> Result<Option<SerializedColumnWriter<'_>>> {
         self.next_column_with_factory(|descr, props, page_writer, on_close| {
-            let column_writer = get_column_writer(descr, props.clone(), page_writer);
-            Ok(SerializedColumnWriter::new(column_writer, Some(on_close)))
+            let column_writer =
+                get_column_writer(descr.clone(), props.clone(), page_writer);
+            Ok(SerializedColumnWriter::new(
+                column_writer,
+                descr,
+                Some(on_close),
+            ))
         })
     }

@@ -465,6 +470,7 @@ impl<'a, W: Write> SerializedRowGroupWriter<'a, W> {
 /// A wrapper around a [`ColumnWriter`] that invokes a callback on [`Self::close`]
 pub struct SerializedColumnWriter<'a> {
     inner: ColumnWriter<'a>,
+    descriptor: ColumnDescPtr,
     on_close: Option<OnCloseColumnChunk<'a>>,
 }

@@ -473,9 +479,14 @@ impl<'a> SerializedColumnWriter<'a> {
     /// optional callback to be invoked on [`Self::close`]
     pub fn new(
         inner: ColumnWriter<'a>,
+        descriptor: ColumnDescPtr,
         on_close: Option<OnCloseColumnChunk<'a>>,
     ) -> Self {
-        Self { inner, on_close }
+        Self {
+            inner,
+            descriptor,
+            on_close,
+        }
     }

     /// Returns a reference to an untyped [`ColumnWriter`]
@@ -488,6 +499,11 @@ impl<'a> SerializedColumnWriter<'a> {
         get_typed_column_writer_mut(&mut self.inner)
     }

+    /// Returns a clone to a [`ColumnDescPtr`]
+    pub fn descriptor(&self) -> ColumnDescPtr {
+        self.descriptor.clone()
+    }
+
     /// Close this [`SerializedColumnWriter]
     pub fn close(mut self) -> Result<()> {
         let r = match self.inner {