Add record_batch! macro for easy record batch creation #6553

Closed
timsaucer opened this issue Oct 12, 2024 · 7 comments · Fixed by #6588
Labels: arrow (Changes to the arrow crate), enhancement (Any new improvement worthy of an entry in the changelog), good first issue (Good for newcomers)

Comments

@timsaucer

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

In apache/datafusion#12846 we added a macro to easily create a record batch for use in building up unit tests. It would be nice to move this upstream to this repo.

Describe the solution you'd like

Port code and optionally add remaining data types.

Describe alternatives you've considered

None

Additional context

apache/datafusion#12846

@timsaucer timsaucer added the enhancement Any new improvement worthy of a entry in the changelog label Oct 12, 2024
@Xuanwo
Member

Xuanwo commented Oct 12, 2024

Hi, I like this idea. How about renaming this macro to record_batch!() like vec!()?

@timsaucer
Author

I incorporated your recommendation in the last push of the PR in datafusion.

@alamb alamb changed the title Import downstream macro for easy record batch creation Add record_batch! macro for easy record batch creation Oct 13, 2024
@alamb
Contributor

alamb commented Oct 13, 2024

> I incorporated your recommendation in the last push of the PR in datafusion.

I think it looks really nice -- thank you @timsaucer and @Xuanwo

@alamb alamb added arrow Changes to the arrow crate good first issue Good for newcomers labels Oct 13, 2024
@alamb
Contributor

alamb commented Oct 13, 2024

I think this is a good first issue for someone who wants to try their hand at contributing

@ByteBaker
Contributor

take

@ByteBaker
Contributor

@alamb this is the basic idea I have. Lemme know if I should change this.

let schema: Arc<Schema> = ...;
let options: RecordBatchOptions = ...;
let array: Arc<Int32Array> = ...; // Can have more arrays

record_batch!(schema); // Creates empty RecordBatch
record_batch!(schema, [array]); // Returns a result
record_batch!(schema, [array], options); // Creates w/ options, returns a result

Or do we want something that allows us to write a template-like, visually intuitive structure for creating an RB? For instance:

record_batch!{
    // Always required
    [
     "id", Int32, false // More fields can go in next lines
    ],
    // The following part is optional.
    // Not giving data would create an empty RecordBatch
    [
     Int32Array[1, 2, 3, 4, 5] // More arrays can go in next lines
    ],
    // `RecordBatchOptions` with `match_field_names` = false & `row_count` = Some(10)
    // This bit is optional, and the absence will just use the default options
    {true, 10}

}

Also, how do we reconcile the return types in this case?

  • Using just Schema would create a RecordBatch directly
  • Any other combination would return Result<RecordBatch>
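One way to reconcile the return types, offered only as a sketch: have every arm of the macro produce a `Result`, so the schema-only form simply returns `Ok(...)`. The example below uses toy stand-in types (`ToySchema`, `ToyBatch`, `try_new_batch`, `toy_record_batch!`), all of which are hypothetical names invented for illustration and are not arrow-rs APIs.

```rust
// Toy stand-ins for illustration only; these are NOT arrow-rs types.
#[derive(Debug, PartialEq)]
struct ToySchema {
    fields: Vec<&'static str>,
}

#[derive(Debug, PartialEq)]
struct ToyBatch {
    fields: Vec<&'static str>,
    columns: Vec<Vec<i32>>,
}

// Hypothetical fallible constructor: checks the column count and column lengths.
fn try_new_batch(schema: &ToySchema, columns: Vec<Vec<i32>>) -> Result<ToyBatch, String> {
    if columns.len() != schema.fields.len() {
        return Err(format!(
            "expected {} columns, got {}",
            schema.fields.len(),
            columns.len()
        ));
    }
    if let Some(first) = columns.first() {
        if columns.iter().any(|c| c.len() != first.len()) {
            return Err("columns differ in length".to_string());
        }
    }
    Ok(ToyBatch {
        fields: schema.fields.clone(),
        columns,
    })
}

macro_rules! toy_record_batch {
    // Schema only: cannot fail, but is wrapped in Ok so the type matches the other arm.
    ($schema:expr) => {
        Ok::<ToyBatch, String>(ToyBatch {
            fields: $schema.fields.clone(),
            columns: $schema.fields.iter().map(|_| Vec::new()).collect(),
        })
    };
    // Schema plus columns: delegates to the fallible constructor.
    ($schema:expr, [$($col:expr),* $(,)?]) => {
        try_new_batch(&$schema, vec![$($col),*])
    };
}
```

With this shape, `toy_record_batch!(schema)` and `toy_record_batch!(schema, [vec![1, 2, 3]])` both evaluate to `Result<ToyBatch, String>`, so callers handle one type regardless of which form they use. The cost is an `unwrap()` or `?` even in the form that cannot fail.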

@alamb
Contributor

alamb commented Oct 15, 2024

> Or do we want something that allows us to write a template-like, visually intuitive structure for creating an RB? For instance:

Thanks @ByteBaker -- maybe you can check out the implementation that @timsaucer put into DataFusion in apache/datafusion#12846
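For readers without the DataFusion PR at hand, here is a rough, self-contained sketch of a "one tuple per column" macro in the spirit of `vec!`. It assumes the macro takes `(name, type, [values])` tuples and returns a `Result`; the actual signature in apache/datafusion#12846 or in arrow-rs may differ. The names `MiniColumn`, `MiniBatch`, `mini_try_new`, `mini_column!` and `mini_record_batch!` are hypothetical and use only the standard library.

```rust
// Toy column and batch types for illustration; NOT the arrow-rs array types.
#[derive(Debug, PartialEq)]
enum MiniColumn {
    Int32(Vec<i32>),
    Float64(Vec<f64>),
    Utf8(Vec<String>),
}

impl MiniColumn {
    fn len(&self) -> usize {
        match self {
            MiniColumn::Int32(v) => v.len(),
            MiniColumn::Float64(v) => v.len(),
            MiniColumn::Utf8(v) => v.len(),
        }
    }
}

#[derive(Debug, PartialEq)]
struct MiniBatch {
    names: Vec<String>,
    columns: Vec<MiniColumn>,
}

// Hypothetical fallible constructor: all columns must have the same row count.
fn mini_try_new(names: Vec<String>, columns: Vec<MiniColumn>) -> Result<MiniBatch, String> {
    let rows = columns.first().map(|c| c.len()).unwrap_or(0);
    if columns.iter().any(|c| c.len() != rows) {
        return Err("columns differ in length".to_string());
    }
    Ok(MiniBatch { names, columns })
}

// Helper macro: picks the column variant from the type identifier.
macro_rules! mini_column {
    (Int32, [$($v:expr),* $(,)?]) => { MiniColumn::Int32(vec![$($v),*]) };
    (Float64, [$($v:expr),* $(,)?]) => { MiniColumn::Float64(vec![$($v),*]) };
    (Utf8, [$($v:expr),* $(,)?]) => { MiniColumn::Utf8(vec![$($v.to_string()),*]) };
}

// One tuple per column: ("name", Type, [values]).
macro_rules! mini_record_batch {
    ($(($name:expr, $ty:ident, [$($v:expr),* $(,)?])),* $(,)?) => {
        mini_try_new(
            vec![$($name.to_string()),*],
            vec![$(mini_column!($ty, [$($v),*])),*],
        )
    };
}
```

A call such as `mini_record_batch!(("id", Int32, [1, 2, 3]), ("name", Utf8, ["a", "b", "c"]))` keeps each column's name, type, and values on one line, which is the main readability gain for unit tests.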

ByteBaker pushed a commit to ByteBaker/arrow-rs that referenced this issue Oct 18, 2024
ByteBaker added a commit to ByteBaker/arrow-rs that referenced this issue Oct 18, 2024
ByteBaker added a commit to ByteBaker/arrow-rs that referenced this issue Nov 16, 2024
@alamb alamb closed this as completed in 1f19412 Nov 16, 2024