Empty line at the end causes Cobrix to create one extra record #397

Closed
MaksymFedorchuk opened this issue Jul 2, 2021 · 4 comments
Labels
accepted (Accepted for implementation), bug (Something isn't working)

Comments

@MaksymFedorchuk

For example, we have a file like this, with an empty line at the end:
fdhfhdsfsdff
dfdhjfwdsdd
dsfddkkfkgk

If I read it with:
val df = spark.read
  .format("cobol")
  .option("is_record_sequence", "true")
  .option("is_text", "true")
  .option("encoding", "ascii")
  .option("copybook", path_to_copybook)
  .load(path_to_file)

df.count()  // returns 4 instead of the expected 3

I get 4 records instead of 3. Is this a bug, or can it be fixed with some option?

MaksymFedorchuk added the bug (Something isn't working) label on Jul 2, 2021
@yruslan
Collaborator

yruslan commented Jul 8, 2021

Could you attach the test file?
I want to check whether the empty line contains no characters or at least one character.
When reading text files, Cobrix filters out empty lines. However, since Windows uses CR LF line endings while Linux/macOS uses just LF, it is possible that one character (the CR) ends up in the last record.
I'll check the file and determine whether it is a bug or a feature. It's more likely to be a bug, though.
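The line-ending effect described in this comment can be illustrated with a small, self-contained Scala sketch. It is a hypothetical illustration, not Cobrix's actual record reader: the object `LineEndingDemo` and its two helpers are invented for this example, and the input assumes the sample file was saved with Windows (CR LF) line endings. Splitting only on LF leaves a trailing CR on every line, so the final blank line becomes a one-character "\r" record that a plain empty-line filter does not catch.

```scala
// Hypothetical illustration only; this is not Cobrix's record reader.
object LineEndingDemo {

  // Splits on LF only, then drops empty lines.
  // With CR LF input every line keeps a trailing '\r', so a blank line
  // becomes "\r" and survives the empty-line filter.
  def splitLfOnly(text: String): Seq[String] =
    text.split("\n").toSeq.filter(_.nonEmpty)

  // Splits on LF, strips an optional trailing CR from each line,
  // then drops empty lines, so LF and CR LF files give the same records.
  def splitCrLfAware(text: String): Seq[String] =
    text.split("\n").toSeq.map(_.stripSuffix("\r")).filter(_.nonEmpty)

  def main(args: Array[String]): Unit = {
    // The three sample lines followed by an empty line, with CR LF endings (assumed).
    val crlfFile = "fdhfhdsfsdff\r\ndfdhjfwdsdd\r\ndsfddkkfkgk\r\n\r\n"

    println(splitLfOnly(crlfFile).size)    // the blank line survives as "\r"
    println(splitCrLfAware(crlfFile).size) // the blank line is dropped
  }
}
```

Running `main` prints 4 and then 3. The maintainer later traced the actual bug to the is_text + is_record_sequence combination, so this sketch only shows how a stray CR can produce an extra record in general.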

@MaksymFedorchuk
Author

testfile.txt

@yruslan
Collaborator

yruslan commented Jul 8, 2021

I've noticed something interesting. Try removing .option("is_record_sequence", "true") and let me know whether it works as expected.

yruslan added the accepted (Accepted for implementation) label on Jul 8, 2021
@yruslan
Collaborator

yruslan commented Jul 8, 2021

Bug confirmed. It happens when both is_text = true and is_record_sequence = true.

2 participants