Nokogiri 1.5.0.beta2: XSD validation passes in MRI & crashes in Java version #373

Closed
scottlowe opened this issue Nov 14, 2010 · 10 comments

@scottlowe

Hi there,

I've got an XML document that successfully validates against an XSD under MRI 1.9.2. However, when the same code runs under the Java version of Nokogiri, it crashes with a document parse error.

To illustrate the issue, I've posted a small script here: https://github.com/scottlowe/nokogiri-java-bug-report

I daresay there could be a problem with the XSD itself (I didn't create it!), but in any case the behaviour of the MRI and Java versions differs.

I've also run this code against a fresh checkout from the Nokogiri repo, and the result is the same.

-- Scott

@yokolet
Member

yokolet commented Nov 14, 2010

I confirmed the problem. I'll figure out what causes the error. Thanks for the good, helpful example.

@scottlowe
Author

A bit more clarification: I now know that the error is being thrown by the schema validator against the schema itself. So the XML doc being validated is fine, and it's the XSD that's the problem. It looks like Xerces is much stricter than the C validator.

If I run the JDK xjc tool, I get similar results:

scotts-macbook-pro:nokogiri-java-bug-report scott$ xjc sample_schema.xsd 
parsing a schema...
[ERROR] cos-element-consistent: Error for type '#AnonType_customer_addresses'. Multiple elements with name 'address_type', with different types, appear in the model group.
line 1114 of file:/Users/scott/Working/nokogiri-java-bug-report/sample_schema.xsd

[ERROR] cos-element-consistent: Error for type '#AnonType_customer'. Multiple elements with name 'co_type', with different types, appear in the model group.
line 1163 of file:/Users/scott/Working/nokogiri-java-bug-report/sample_schema.xsd

[ERROR] cos-nonambig: co_industry_sector and co_industry_sector (or elements from their substitution group) violate "Unique Particle Attribution". During validation against this schema, ambiguity would be created for those two particles.
line 1163 of file:/Users/scott/Working/nokogiri-java-bug-report/sample_schema.xsd
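Scott's point is that Xerces rejects the schema while compiling it, before any instance document is validated. Below is a minimal sketch of that compile step. It assumes that the JDK's built-in JAXP `SchemaFactory`, which is backed by an internal copy of Xerces, applies the same checks xjc reported. The class name, the method name, and the tiny `customer`/`code` schema are hypothetical. The schema only mirrors the shape of the first xjc error. Errors are collected instead of thrown so that every violation is listed, similar to the xjc output above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class SchemaCompileCheck {

    // Hypothetical schema: the anonymous type of <customer> declares two
    // <code> elements with different types in the same sequence, which
    // violates cos-element-consistent (the same kind of error xjc reported).
    static final String INCONSISTENT_XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='customer'><xs:complexType><xs:sequence>"
        + "<xs:element name='code' type='xs:string'/>"
        + "<xs:element name='code' type='xs:int'/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    // The same structure with distinct element names compiles cleanly.
    static final String CONSISTENT_XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='customer'><xs:complexType><xs:sequence>"
        + "<xs:element name='code' type='xs:string'/>"
        + "<xs:element name='code_number' type='xs:int'/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    /**
     * Compiles an XSD with the JDK's SchemaFactory and returns the messages
     * reported while compiling the schema itself. No instance document is
     * involved at this stage.
     */
    static List<String> compileErrors(String xsd) {
        List<String> errors = new ArrayList<>();
        SchemaFactory factory =
            SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        // Record recoverable errors instead of stopping at the first one.
        factory.setErrorHandler(new ErrorHandler() {
            @Override public void warning(SAXParseException e) { }
            @Override public void error(SAXParseException e) { errors.add(e.getMessage()); }
            @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
        });
        try {
            factory.newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            // Fatal errors (e.g. malformed XML) abort compilation; avoid
            // listing a message twice if it was already recorded above.
            if (!errors.contains(e.getMessage())) {
                errors.add(e.getMessage());
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        System.out.println("inconsistent: " + compileErrors(INCONSISTENT_XSD));
        System.out.println("consistent:   " + compileErrors(CONSISTENT_XSD));
    }
}
```

With the JDK validator, the first schema should produce a `cos-element-consistent` message and the second an empty list. libxml2, which the MRI version of Nokogiri uses, is less strict according to this thread, so it may accept schemas that this check rejects.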

So the question is: what can we do when the Java and C schema validators differ in strictness (and rules)?

@yokolet
Member

yokolet commented Nov 20, 2010

Hi, I just want to check: scottlowe, did you close this issue? It is marked as closed.

Anyway, there are some incompatibilities between the pure-Java and libxml versions. They remain (they cannot be fixed) when the Java APIs offer no workaround. As you said, Xerces checks more strictly than libxml, and that causes some of the incompatibilities in pure-Java Nokogiri.

I'll try some schema feature settings. But, if none of them works, the answer will be "please fix the schema itself."

@scottlowe
Copy link
Author

Hi Yoko,

No, I didn't close the issue, and I didn't realise that it had been closed!

To be fair, I could understand why another developer might have closed it, given that I now understand that the issue relates to the Xerces validator being much more aggressive.

I have also been thinking that the answer might be "fix the schema itself", and if it was my schema, I would fix it... but in this case the schema is supplied by a third party, and I must conform to their schema, which they are unlikely to change :-(

Whatever you find, or eventually decide, thank you for checking up on this issue for me, anyway.

-- Scott

@yokolet
Member

yokolet commented Nov 22, 2010

I might have closed this issue by mistake along with another one. I'll reopen it.

I tried all the Xerces feature settings, but no good news. :(
So, for now, this will be marked as an incompatibility. Sorry. I'll keep looking for a compatible approach, although any fix will come in a future version.

@scottlowe
Author

Thank you. Could you label the issue with 'pure-java' and/or 'JRuby' to help others find it in the future? I don't have permission to label.

It's funny... even though this is an issue for me, I kinda respect Xerces for being so merciless on schemas :-)

@yokolet
Member

yokolet commented Nov 23, 2010

Labeled. I'll keep this issue open, as you (scottlowe) said, for other people who might hit this kind of trouble.

Yes, I know Xerces is really strict. So I think I should test other XML APIs.

@scottlowe
Author

Um... this is embarrassing. I've only just realised how close the "Comment and Close" button is to the "Comment" button. It probably was me that closed this issue by mistake, after all!

Apologies to everybody :-s

@quoideneuf
Contributor

Hi,

I am wondering whether this issue still exists and whether there is any solution. I am seeing similarly inconsistent behavior when I try to validate XML files against the LOC's EAD schema (loc.gov/ead/ead.xsd).

Brians-MacBook-Air:test hoffman$ ruby ./schema_test.rb 
RuntimeError: Could not parse document: src-resolve: Cannot resolve the name 'xlink:extendedLink' to a(n) 'attribute group' component.
  from_document at nokogiri/XmlSchema.java:143
            new at /Users/hoffman/.rvm/gems/jruby-1.6.7.2/gems/nokogiri-1.5.5-java/lib/nokogiri/xml/schema.rb:37
         Schema at /Users/hoffman/.rvm/gems/jruby-1.6.7.2/gems/nokogiri-1.5.5-java/lib/nokogiri/xml/schema.rb:8
         (root) at ./schema_test.rb:5

@quoideneuf
Contributor

I did something similar to scottlowe and created a repo to illustrate the problem I am having:

https://github.com/lcdhoffman/nokogiri-jruby-schema-test
