seg_id0 is duplicated for the root segment for big files when multiple files are loaded #710

Closed
yruslan opened this issue Sep 26, 2024 · 0 comments · Fixed by #711
Labels
bug Something isn't working

yruslan (Collaborator) commented Sep 26, 2024

Describe the bug

seg_id0 should never be duplicated for the root segment.

However, when multiple big files are loaded, duplicate seg_id0 values appear for the root segment.

Code snippet that caused the issue

This may happen only when a record length field is used:

  .option("record_format", "F")
  .option("record_length_field", "REC_LENGTH + 17")
  .option("segment_field", "SEGMENT-ID")

Expected behavior

seg_id0 should never be duplicated for the root segment.
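
One way to check this expectation on a collected sample of rows is sketched below. It is only an illustration, not part of Cobrix: the helper `duplicated_root_ids`, the field names `SEGMENT_ID` and `seg_id0`, the root segment value `"R"`, and the sample ids are all assumptions made for the example. Child records are expected to share the id of their parent, so only root records are counted.

```python
from collections import Counter


def duplicated_root_ids(rows, id_field="seg_id0",
                        segment_field="SEGMENT_ID", root_value="R"):
    """Return the root-segment ids that occur on more than one root record.

    All parameter defaults are hypothetical; adjust them to the actual schema.
    """
    counts = Counter(
        row[id_field] for row in rows if row[segment_field] == root_value
    )
    return sorted(seg_id for seg_id, n in counts.items() if n > 1)


# Hypothetical sample: the last root record reuses the id of the first one.
rows = [
    {"SEGMENT_ID": "R", "seg_id0": "ID_0_0"},
    {"SEGMENT_ID": "C", "seg_id0": "ID_0_0"},  # child shares its parent's id
    {"SEGMENT_ID": "R", "seg_id0": "ID_0_1"},
    {"SEGMENT_ID": "R", "seg_id0": "ID_0_0"},  # duplicate root id
]

duplicates = duplicated_root_ids(rows)
print(duplicates)
```

An empty result means every root record in the sample has a unique id; any returned value identifies a root id that was generated more than once.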

Context

  • Cobrix version: 2.7.5
  • Spark version: 3.3.4
  • Scala version: 2.12
  • Operating system: --

Copybook (if possible)

--

Attach a small data file that can help reproduce the issue, if possible.
