feat: support write multi fragments or empty fragment in one spark task #3183

Merged
merged 6 commits into from
Dec 2, 2024


@SaintBacchus SaintBacchus commented Nov 28, 2024

Currently FileFragment::create can only create a single file fragment, which causes two issues in the Spark connector:

  1. If the Spark task is empty, the API throws an exception because there is no data from which to create the fragment.
  2. If the task's data stream is very large, it is written as one huge Lance file, which limits Spark parallelism.

This PR removes the pre-assigned fragment id and adds a new method, FileFragment::create_fragments, which can produce zero, one, or multiple fragments from a single task.
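
The splitting behavior described above can be illustrated with a small, self-contained sketch. This is not the Lance implementation: `FragmentSketch`, `create_fragments_sketch`, and the `max_rows_per_file` parameter are hypothetical names used only for illustration, and rows are modeled as plain integers. It only shows the intended contract, namely that an empty input yields no fragments instead of an error, and a large input is cut into several bounded fragments.

```rust
// Hypothetical stand-in for fragment metadata; NOT Lance's actual type.
// It carries no fragment id, on the assumption that ids are assigned later.
#[derive(Debug, PartialEq)]
struct FragmentSketch {
    rows: Vec<u64>,
}

// Splits a row stream into fragments of at most `max_rows_per_file` rows.
// An empty stream yields an empty Vec rather than an error.
fn create_fragments_sketch<I>(rows: I, max_rows_per_file: usize) -> Vec<FragmentSketch>
where
    I: IntoIterator<Item = u64>,
{
    assert!(max_rows_per_file > 0, "max_rows_per_file must be positive");
    let mut fragments = Vec::new();
    let mut current = Vec::with_capacity(max_rows_per_file);
    for row in rows {
        current.push(row);
        // Close the current fragment once it reaches the row limit.
        if current.len() == max_rows_per_file {
            fragments.push(FragmentSketch {
                rows: std::mem::take(&mut current),
            });
        }
    }
    // Flush the final, partially filled fragment if there is one.
    if !current.is_empty() {
        fragments.push(FragmentSketch { rows: current });
    }
    fragments
}

fn main() {
    let empty = create_fragments_sketch(std::iter::empty(), 4);
    println!("empty task -> {} fragments", empty.len());

    let many = create_fragments_sketch(0..10, 4);
    println!("10 rows, max 4 per file -> {} fragments", many.len());
}
```

Returning a list rather than a single value is what lets one call cover both cases from the description: the caller commits whatever fragments come back, including none.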


@github-actions github-actions bot added the enhancement and java labels Nov 28, 2024

ACTION NEEDED
Lance follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

For details on the error please inspect the "PR Title Check" action.

@SaintBacchus SaintBacchus changed the title from "feat: Support write multi fragments or empty fragment in one spark task" to "feat: support write multi fragments or empty fragment in one spark task" Nov 28, 2024
@codecov-commenter

codecov-commenter commented Nov 28, 2024

Codecov Report

Attention: Patch coverage is 85.03937% with 19 lines in your changes missing coverage. Please review.

Project coverage is 78.72%. Comparing base (dc9afbb) to head (0489103).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| rust/lance/src/dataset/fragment.rs | 0.00% | 11 Missing ⚠️ |
| rust/lance/src/dataset/fragment/write.rs | 94.73% | 0 Missing and 6 partials ⚠️ |
| java/core/lance-jni/src/fragment.rs | 0.00% | 1 Missing ⚠️ |
| java/core/lance-jni/src/utils.rs | 0.00% | 1 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3183      +/-   ##
==========================================
+ Coverage   78.06%   78.72%   +0.66%     
==========================================
  Files         243      243              
  Lines       82661    82793     +132     
  Branches    82661    82793     +132     
==========================================
+ Hits        64529    65180     +651     
+ Misses      14932    14833      -99     
+ Partials     3200     2780     -420     
Flag Coverage Δ
unittests 78.72% <85.03%> (+0.66%) ⬆️


@LuQQiu LuQQiu merged commit 39222ec into lancedb:main Dec 2, 2024
25 of 26 checks passed