mrc file with utf-8 BOM fails #161

patrickzurek · 2016-09-09T18:25:41Z

JIRA issue created by: rcook
Originally opened: 2012-06-26 09:32 AM

Issue body:
Migrated from Google Code issue http://code.google.com/p/xcoaitoolkit/issues/detail?id=86 (the original issue has attachments).

Reported by project member [email protected], Jul 21, 2011

The attached file starts with 3 bytes (EF BB BF)
http://en.wikipedia.org/wiki/Byte_order_mark#UTF-8

When I run convertload.sh on it, I get:

ERROR - [LIB] MarcException unable to parse record length. NumberFormatException For input string: "03".
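
For context: a MARC (ISO 2709) record begins with a 24-character leader whose first five characters hold the record length as ASCII digits. A 3-byte BOM in front of the record shifts the first two digits into positions 4 and 5, which would be consistent with the "03" in the message above. The Python sketch below illustrates the effect with a made-up leader. It is not marc4j's code; `parse_record_length` and the sample bytes are hypothetical names and data chosen for illustration.

```python
UTF8_BOM = b"\xef\xbb\xbf"


def parse_record_length(data: bytes) -> int:
    """Read the 5-digit record length from the start of a MARC record."""
    field = data[:5]
    # bytes.isdigit() is True only when every byte is an ASCII digit.
    if not field.isdigit():
        raise ValueError(f"unable to parse record length: {field!r}")
    return int(field)


# Made-up 24-character leader, for illustration only.
sample_leader = b"03127nam a2200349 a 4500"

# Clean input: the first five bytes are all digits and parse normally.
print(parse_record_length(sample_leader))

# BOM-prefixed input: the length field is now the three BOM bytes plus "03".
try:
    parse_record_length(UTF8_BOM + sample_leader)
except ValueError as err:
    print(err)
```
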

I'm not sure if we should support this or not, but let's decide.

Attachment: randy_urresearch.mrk (28.0 KB)

Comment 1 by project member [email protected], Jul 21, 2011

Nate reported a possibly similar failure with URResearch/IR+. The IR+ failure was caused by a byte order mark embedded in the file. I’m guessing the same error would occur with XC, since it also uses marc4j. If this is the same issue, Nate has offered to advise on how to fix it. Please let me know whether it is similar so we can get the right discussions going.

Comment 2 by project member [email protected], Jul 21, 2011

Yes, this is the same issue that Nate encountered. I just spoke with him. He modified the file using MarcEdit to make it work. Steps involved:

1. Open MarcEdit (I used version 5.5.4218.36332)
2. File -> MARC Tools -> MarcBreaker
   a) Input File: point to the attached file
   b) Output File: give it a new name
   c) check "Translate to UTF-8"
   d) click "Execute"
3. Open the new file in MARCEditor
   a) File -> Compile file into MARC
   b) save the file somewhere (this file will then work in the oai-toolkit)
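
As a possible scripted alternative to the MarcEdit steps above, the leading three bytes could be removed directly. The Python sketch below does only that: it assumes the BOM is the sole problem, it does not perform the "Translate to UTF-8" conversion that MarcBreaker does, and it has not been tested against the oai-toolkit. The function names and the file names in the usage note are hypothetical.

```python
import codecs
from pathlib import Path


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (EF BB BF), if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def strip_bom_from_file(src: str, dst: str) -> bool:
    """Copy src to dst without a leading BOM; return True if one was removed."""
    original = Path(src).read_bytes()  # reads the whole file into memory
    cleaned = strip_utf8_bom(original)
    Path(dst).write_bytes(cleaned)
    return cleaned != original
```

For example, `strip_bom_from_file("records.mrc", "records-nobom.mrc")` would write a copy that starts at the record leader. Writing to a new file rather than editing in place keeps the original available for comparison.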
