Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add examples for arrow crate to readme and library documentation #131

Merged
merged 4 commits into from
Feb 6, 2024
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
74 changes: 71 additions & 3 deletions Readme.md
Original file line number Diff line number Diff line change
Expand Up @@ -32,7 +32,39 @@ arrays, and deserialization from arrays to Rust structs.
[datafusion]: https://github.com/apache/arrow-datafusion/

## Example
Given this Rust structure
```rust
#[derive(Serialize, Deserialize)]
struct Record {
a: f32,
b: i32,
}

let records = vec![
Record { a: 1.0, b: 1 },
Record { a: 2.0, b: 2 },
Record { a: 3.0, b: 3 },
];
```

### Serialize to `arrow` `RecordBatch`
chmp marked this conversation as resolved.
Show resolved Hide resolved
```rust
use serde_arrow::schema::{TracingOptions, SerdeArrowSchema};

// Determine Arrow schema
let fields =
SerdeArrowSchema::from_type::<Record>(TracingOptions::default())?
.to_arrow_fields()

// Convert to Arrow arrays
let arrays = serde_arrow::to_arrow(&fields, &records)?;

// Form a RecordBatch
let schema = Schema::new(&fields);
let batch = RecordBatch::try_new(schema.into(), arrays)?;
```

### Serialize to `arrow2` arrays
```rust
use serde_arrow::schema::{TracingOptions, SerdeArrowSchema};

Expand All @@ -55,7 +87,27 @@ let fields =
let arrays = serde_arrow::to_arrow2(&fields, &records)?;
```

These arrays can now be written to disk using the helper method defined in the
These arrays can now be written to disk in formats such as Parquet using the
appropriate Arrow or Arrow2 APIs.

### Write `arrow` `RecordBatch` to Parquet

You can write the `RecordBatch` to a Parquet file using [ArrowWriter] from the
[parquet] crate:

[ArrowWriter]: https://docs.rs/parquet/latest/parquet/arrow/arrow_writer/struct.ArrowWriter.html
[parquet]: https://docs.rs/parquet/latest/parquet/


```rust
let file = File::create("example.pq");
let mut writer = ArrowWriter::try_new(file, batch.schema(), None)?;
writer.write(&batch)?;
writer.close()?;
```

### Write `arrow2` arrays to Parquet
using the helper method defined in the
[arrow2 guide][arrow2-guide]. For parquet:

```rust,ignore
Expand All @@ -71,14 +123,30 @@ write_chunk(

The written file can now be read in Python via

### Polars
chmp marked this conversation as resolved.
Show resolved Hide resolved
```python
# using polars
import polars as pl
pl.read_parquet("example.pq")
shape: (3, 2)
chmp marked this conversation as resolved.
Show resolved Hide resolved
┌─────┬─────┐
│ a ┆ b │
│ --- ┆ --- │
│ f32 ┆ i32 │
╞═════╪═════╡
│ 1.0 ┆ 1 │
│ 2.0 ┆ 2 │
│ 3.0 ┆ 3 │
└─────┴─────┘
```

# using pandas
### Pandas
```python
import pandas as pd
pd.read_parquet("example.pq")
a b
0 1.0 1
1 2.0 2
2 3.0 3
```

[arrow2-guide]: https://jorgecarleitao.github.io/arrow2
Expand Down
44 changes: 43 additions & 1 deletion serde_arrow/src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -45,7 +45,49 @@
//! - the [status summary][_impl::docs::status] for an overview over the
//! supported Arrow and Rust constructs
//!
//! ## Example
//! ## `arrow` Example
//! ```rust
//! # use serde::{Deserialize, Serialize};
//! # #[cfg(feature = "has_arrow")]
//! # fn main() -> serde_arrow::Result<()> {
//! use arrow::datatypes::Schema;
//! use arrow::record_batch::RecordBatch;
//! use serde_arrow::schema::{TracingOptions, SerdeArrowSchema};
//!
//! ##[derive(Serialize, Deserialize)]
//! struct Record {
//! a: f32,
//! b: i32,
//! }
//!
//! let records = vec![
//! Record { a: 1.0, b: 1 },
//! Record { a: 2.0, b: 2 },
//! Record { a: 3.0, b: 3 },
//! ];
//!
//! // Determine Arrow schema
//! let fields = Vec::<Field>::from_type::<Record>(TracingOptions::default())?;
//!
//! // Convert Rust records to Arrow arrays
//! let arrays = serde_arrow::to_arrow(&fields, &records)?;
//!
//! // Create RecordBatch
//! let schema = Schema::new(fields);
//! let batch = RecordBatch::try_new(schema, arrays)?;
//! # Ok(())
//! # }
//! # #[cfg(not(feature = "has_arrow"))]
//! # fn main() { }
//! ```
//!
//! The `RecordBatch` can then be written to disk, e.g., as parquet using
//! the [`ArrowWriter`] from the [`parquet`] crate:
chmp marked this conversation as resolved.
Show resolved Hide resolved
//!
//! [`ArrowWriter`]: https://docs.rs/parquet/latest/parquet/arrow/arrow_writer/struct.ArrowWriter.html
//! [`parquet`]: https://docs.rs/parquet/latest/parquet/
//!
//! ## `arrow2` Example
//!
//! Requires one of `arrow2` feature (see below).
//!
Expand Down
Loading