Wont Read Onix Feed #5

Open
acolchagoff opened this issue Mar 20, 2014 · 4 comments

@acolchagoff

I've got an ONIX feed that is sent to me as a zip file in an email. The zip file contains a 100+ MB XML file and a DTD file. The top of the XML file looks like this:

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE ONIXMessage SYSTEM
"ONIX_BookProduct_3.0_short.dtd">
<ONIXmessage release="3.0">
<header>
<sender>
<x298>Publisher</x298>
<x299>Vendor</x299>
<j272>[email protected]</j272>
</sender>
<x307>20140311</x307>
<m183>An Onix message file from Publisher</m183>
</header>

Although this file has well over 10,000 products in it, the gem won't read any of them.

reader.each do |product|
    puts product.inspect
end

The each loop does nothing; its block never fires. It's as if the XML file had zero products in it.

I've spent several days on this. Here's the entire algorithm for reference:

def self.parse_onix(publisher_id, onix_file)
    Zip::ZipFile.open(onix_file.tempfile.path) do |zip|
        xml_file = ""
        dir = "#{Rails.root.to_s}/tmp/onix/"

        zip.each do |entry|
            next if entry.name =~ /__MACOSX/ or \
             entry.name =~ /\.DS_Store/ or !entry.file?
            logger.debug "#{entry.name}"
            puts entry.name
            FileUtils::mkdir_p(dir)
            #this_file = FileUtils.touch(dir + entry.name)
            entry.extract(dir + entry.name)

            p '--->Thing:'+entry.name.last(3)
            if entry.name.last(3) == 'xml'
                xml_file = dir + entry.name
            end
        end

        Work.fix_dtd_path(dir, xml_file)

        reader = ONIX::Reader.new(xml_file)

        puts reader.inspect

        reader.each do |product|
            puts product.inspect
        end
    end
end


def self.fix_dtd_path(dir, xml_file)
    xml = File.read(xml_file)

    # fix the path in the DOCTYPE
    dtd_file = 'ONIX_BookProduct_3.0_short.dtd'
    xml = xml.gsub(dtd_file, dir + dtd_file)
    File.delete(xml_file)
    File.open(xml_file, 'w') do |file|
        file.write(xml)
    end
end
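
Since the feed is 100+ MB, `File.read` plus `gsub` loads the whole file into memory and replaces every occurrence of the DTD name, not just the one in the DOCTYPE. A minimal sketch of a streaming alternative follows, assuming the DOCTYPE sits within the first few KB of the file. The name `rewrite_dtd_path` and the `head_bytes` parameter are hypothetical, not part of the gem.

```ruby
# Rewrite the DTD reference in the DOCTYPE without loading the whole file.
# Assumption: the DOCTYPE appears within the first `head_bytes` bytes.
def rewrite_dtd_path(xml_file, dtd_name, dtd_dir, head_bytes = 4096)
  tmp_path = "#{xml_file}.tmp"
  File.open(xml_file, "rb") do |src|
    File.open(tmp_path, "wb") do |dst|
      head = src.read(head_bytes) || "".b
      # sub (not gsub): only the first occurrence, the DOCTYPE, is changed
      dst.write(head.sub(dtd_name.b) { File.join(dtd_dir, dtd_name).b })
      # copy the remainder in 1 MB chunks
      while (chunk = src.read(1 << 20))
        dst.write(chunk)
      end
    end
  end
  File.rename(tmp_path, xml_file)
end
```

It would be called in place of `Work.fix_dtd_path`, for example `rewrite_dtd_path(xml_file, 'ONIX_BookProduct_3.0_short.dtd', dir)`.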
@varunarang

I am not sure what the problem might be, but can you try converting the ONIX file to reference tags:

ONIX::Normaliser.process("oldfile.xml", "newfile.xml")

If this converts it to reference tags, you should be able to iterate over the products in the file.

@acolchagoff
Author

Unfortunately, normalizing doesn't seem to help, but I think I've figured out the issue.
It doesn't appear that this gem supports ONIX 3.0 short, which is what my XML feed is. Because the feed is in short format, all of my tag names are different (for example, 'Header' becomes 'header', 'PublisherIDType' becomes 'x447', etc.). The gem is looking for standard tags and ignoring short tags.

Would this explain the issues I'm having?
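
One way to check this hypothesis without the gem is to stream through the file and count `product` versus `Product` elements, along with the root tag and its `release` attribute. This is only a rough sketch using REXML's stream parser (a bundled gem in recent Rubies, so it may need a Gemfile entry). `TagCounter` and `onix_tag_summary` are made-up names.

```ruby
require "rexml/document"
require "rexml/streamlistener"

# Records the root element, its release attribute, and how many
# product elements appear under each spelling (short vs. reference).
class TagCounter
  include REXML::StreamListener
  attr_reader :root, :release, :counts

  def initialize
    @root = nil
    @release = nil
    @counts = Hash.new(0)
  end

  def tag_start(name, attrs)
    if @root.nil?
      @root = name
      @release = attrs["release"]
    end
    @counts[name] += 1 if name.casecmp?("product")
  end
end

# `source` may be a String or an open IO; nothing is kept in memory
# beyond the counters.
def onix_tag_summary(source)
  listener = TagCounter.new
  REXML::Document.parse_stream(source, listener)
  { root: listener.root, release: listener.release, counts: listener.counts }
end
```

For a real file, something like `File.open(xml_file) { |f| p onix_tag_summary(f) }` should do. If only the lowercase `product` key shows up, the feed uses short tags.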

@acolchagoff
Author

Making progress.
I'm getting this error when calling normalize:

/var/folders/nb/nc2b5f2s7rdch1nxfxyd2d200000gq/T/onix20140331-4641-16q41ea:3: warning: failed to load external entity "/var/folders/nb/nc2b5f2s7rdch1nxfxyd2d200000gq/T/ONIX_BookProduct_3.0_short.dtd"
"ONIX_BookProduct_3.0_short.dtd">

The normalizer appears to be looking in the temp directory for my DTD file, even though it never copied the file there. The DTD file is still back in the zip folder.
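
A quick way to confirm this kind of problem is to pull the SYSTEM identifier out of the DOCTYPE and see whether it resolves relative to the directory the parser is working in. A small sketch follows, assuming the DOCTYPE is near the top of the file. `doctype_system_id` and `missing_dtd` are hypothetical helper names.

```ruby
# Return the SYSTEM identifier from the DOCTYPE, or nil if none is found.
# Assumption: the DOCTYPE appears within the first `head_bytes` bytes.
def doctype_system_id(xml_path, head_bytes = 4096)
  head = File.open(xml_path, "rb") { |f| f.read(head_bytes) } || ""
  head[/<!DOCTYPE\s+\S+\s+SYSTEM\s+["']([^"']+)["']/, 1]
end

# Return the path where the DTD is expected but absent, or nil if the
# file has no SYSTEM identifier or the DTD is present.
def missing_dtd(xml_path, base_dir = File.dirname(xml_path))
  id = doctype_system_id(xml_path)
  return nil if id.nil?
  path = File.expand_path(id, base_dir)
  File.exist?(path) ? nil : path
end
```

Passing the temp directory from the warning as `base_dir` should return the same path that libxml complained about.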

@acolchagoff
Author

Okay, after manually copying 3 DTD files (interdependent DTDs?) I've fixed that error, but the XSLT conversion still seems to be failing. I think it's because the XSLT script distributed with the gem is for ONIX 2.1.
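
For anyone hitting the same DTD warning, the manual copy step could be scripted roughly like this. It is a sketch only: `stage_dtds` is a made-up helper, and the default patterns are a guess at which extensions the DTD and its module files use.

```ruby
require "fileutils"
require "tmpdir"

# Copy the DTD and any module files from the extracted zip directory
# into the directory where the parser will look for them.
# Assumption: the file patterns below cover the interdependent files.
def stage_dtds(src_dir, dest_dir = Dir.tmpdir, patterns = %w[*.dtd *.elt *.ent])
  files = patterns.flat_map { |pattern| Dir.glob(File.join(src_dir, pattern)) }
  FileUtils.mkdir_p(dest_dir)
  FileUtils.cp(files, dest_dir) unless files.empty?
  # return the destination paths of everything that was copied
  files.map { |f| File.join(dest_dir, File.basename(f)) }
end
```

Calling `stage_dtds(dir)` before `ONIX::Normaliser.process` would put the files in the system temp directory, which is where the warning above says they were being looked for.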
