Introduce RowLayout to represent rows for different purposes #2261

Merged: 7 commits, Apr 20, 2022

Member

@yjshen yjshen commented Apr 18, 2022

Which issue does this PR close?

The second part of #2188.

Rationale for this change

To support an 8-byte aligned row layout for grouping states of hash aggregation.

What changes are included in this PR?

Enable reading and writing raw-byte rows in two possible layouts.

Are there any user-facing changes?

There is an API change, but it may have no effect on users since it is limited to the optional `row` feature.
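
To make the two layouts concrete, here is a minimal sketch of how they might differ in row width. It is an illustration under assumptions, not the code in this PR: `WordAligned` is the name used in the review below, while `Compact`, `FieldType`, `natural_width`, and `values_width` are hypothetical names, and null bits and variable-length data are left out.

```rust
/// The two layout kinds. `WordAligned` is named in this PR;
/// `Compact` is an assumed name for the space-efficient layout.
#[derive(Copy, Clone, Debug, PartialEq)]
pub enum RowType {
    Compact,
    WordAligned,
}

/// Hypothetical fixed-width column types (the real code uses Arrow's `DataType`).
#[derive(Copy, Clone, Debug)]
pub enum FieldType {
    Boolean,
    Int32,
    Int64,
    Float64,
}

impl FieldType {
    /// Smallest number of bytes that can hold a value of this type.
    fn natural_width(self) -> usize {
        match self {
            FieldType::Boolean => 1,
            FieldType::Int32 => 4,
            FieldType::Int64 | FieldType::Float64 => 8,
        }
    }
}

/// Total bytes used by the field values of one row, excluding null bits.
pub fn values_width(row_type: RowType, fields: &[FieldType]) -> usize {
    fields
        .iter()
        .map(|f| match row_type {
            // Compact: each field takes only the bytes it needs.
            RowType::Compact => f.natural_width(),
            // Word-aligned: every field takes one full 8-byte word.
            RowType::WordAligned => 8,
        })
        .sum()
}
```

Under these assumptions, a (Boolean, Int32, Int64) row uses 13 bytes of values in the compact form and 24 bytes in the word-aligned form. The extra space buys fields that each start on an 8-byte boundary within the value section, which suits state that is updated in place, such as grouping state.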

@github-actions github-actions bot added the datafusion Changes in the datafusion crate label Apr 18, 2022
use std::sync::Arc;

const UTF8_DEFAULT_SIZE: usize = 20;
const BINARY_DEFAULT_SIZE: usize = 100;

#[derive(Copy, Clone, Debug)]
Member Author

The main changes are below. The changes to other files are almost entirely mechanical.

@yjshen yjshen requested a review from alamb April 18, 2022 10:27
Member

@andygrove andygrove left a comment

I'm not totally familiar with this code, but the changes look reasonable to me.

Contributor

alamb commented Apr 18, 2022

I plan to review this PR first thing tomorrow morning US eastern time (~ 6AM or so)

@yjshen yjshen mentioned this pull request Apr 19, 2022
3 tasks
Contributor

@alamb alamb left a comment

Code looks good to me -- I didn't see any tests for the new WordAligned format -- I think we should add some and I suggested one possible way

Thanks @yjshen

use std::sync::Arc;

const UTF8_DEFAULT_SIZE: usize = 20;
const BINARY_DEFAULT_SIZE: usize = 100;

#[derive(Copy, Clone, Debug)]
/// Type of a RowLayout
pub enum RowType {
Contributor

👍

datafusion/core/src/row/layout.rs (4 outdated, resolved review threads)
@@ -85,52 +85,36 @@ macro_rules! fn_get_idx_opt {

/// Read the tuple `data[base_offset..]` we are currently pointing to
pub struct RowReader<'a> {
/// Layout on how to read each field
Contributor

👍 this is nice
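
For readers following along, a rough sketch of what a layout-driven reader could look like. This is a simplified illustration and not the PR's `RowReader`: the field names, `new`, `point_to`, and `get_i64` are assumptions, the layout is reduced to a list of per-field byte offsets, and little-endian encoding is assumed.

```rust
/// Minimal reader over a byte buffer holding one or more rows.
/// Field and method names here are illustrative assumptions.
pub struct RowReader<'a> {
    /// Raw bytes containing the rows
    data: &'a [u8],
    /// Start of the row currently being read
    base_offset: usize,
    /// Byte offset of each field, relative to the start of a row
    field_offsets: Vec<usize>,
}

impl<'a> RowReader<'a> {
    pub fn new(data: &'a [u8], field_offsets: Vec<usize>) -> Self {
        Self {
            data,
            base_offset: 0,
            field_offsets,
        }
    }

    /// Point the reader at the row that starts at `offset` in the buffer.
    pub fn point_to(&mut self, offset: usize) {
        self.base_offset = offset;
    }

    /// Read field `idx` of the current row as a little-endian i64.
    pub fn get_i64(&self, idx: usize) -> i64 {
        let start = self.base_offset + self.field_offsets[idx];
        let mut bytes = [0u8; 8];
        bytes.copy_from_slice(&self.data[start..start + 8]);
        i64::from_le_bytes(bytes)
    }
}
```

Because the reader only consults the offsets it was given, the same struct can serve either layout; only the offset table changes.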

@@ -96,51 +96,34 @@ macro_rules! fn_set_idx {

/// Reusable row writer backed by Vec<u8>
pub struct RowWriter {
/// Layout on how to write each field
Contributor

❤️ for reduced repetition

datafusion/core/src/row/mod.rs (resolved review thread)
Contributor

@alamb alamb left a comment

I reviewed the changes commit by commit - LGTM -- thanks @yjshen

BooleanArray,
Boolean,
vec![Some(true), Some(false), None, Some(true), None],
WordAligned
Contributor

👌 very nice

offsets.push(offset);
offset += 8; // a 8-bytes word for each field
assert!(!matches!(f.data_type(), DataType::Decimal(_, _)));
Contributor

👍
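
The snippet above can be read as the core of a word-aligned offset computation. A self-contained sketch, assuming a stand-in `FieldType` enum instead of Arrow's `DataType` and a null bitmap padded to a multiple of 8 bytes (both are assumptions for illustration; the helper names are hypothetical too):

```rust
/// Hypothetical stand-in for a column type.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum FieldType {
    Boolean,
    Int64,
    Float64,
    Decimal,
}

/// Bytes reserved for null bits: one bit per field, rounded up to whole
/// bytes and then to a multiple of 8 so the first field stays word-aligned.
/// (The padding rule is an assumption of this sketch.)
pub fn padded_null_width(num_fields: usize) -> usize {
    let bytes = (num_fields + 7) / 8;
    (bytes + 7) / 8 * 8
}

/// Byte offset of each field when every field occupies one 8-byte word.
/// Returns the per-field offsets and the total width of the value section.
pub fn word_aligned_offsets(null_width: usize, fields: &[FieldType]) -> (Vec<usize>, usize) {
    let mut offsets = Vec::with_capacity(fields.len());
    let mut offset = null_width;
    for f in fields {
        // Mirrors the assertion in the diff: a decimal does not fit in one word here.
        assert!(!matches!(f, FieldType::Decimal));
        offsets.push(offset);
        offset += 8; // one 8-byte word per field
    }
    (offsets, offset - null_width)
}
```

For three fields the null section takes 8 bytes under this padding rule, so the fields land at offsets 8, 16, and 24, and the value section is 24 bytes wide.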

@alamb alamb merged commit ec3543b into apache:master Apr 20, 2022
"plan: {}",
DisplayableExecutionPlan::with_metrics(plan).one_line()
);
assert!(
Contributor

@tustvold tustvold Apr 20, 2022

@yjshen yjshen deleted the agg_row branch April 22, 2022 08:30