Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CRuby/JRuby inconsistent behavior aliasing the default namespace #940

Closed
rdingwell opened this issue Jul 17, 2013 · 4 comments
Closed

CRuby/JRuby inconsistent behavior aliasing the default namespace #940

rdingwell opened this issue Jul 17, 2013 · 4 comments

Comments

@rdingwell
Copy link

There is inconsistent behavior between Cruby and Jruby when aliasing the default namespace for node. I created a test to highlight the difference.

    module Nokogiri
      module XML
        class TestAliasedDefaultNamespaces < Nokogiri::TestCase
          def setup
            super
          end

          def test_alised_defautl_namespace_on_parse
                        doc = Nokogiri::XML('<apple xmlns="ns:fruit" xmlns:fruit="ns:fruit" />')
                        ns = doc.root.namespaces
                        assert_equal  "ns:fruit", ns["xmlns:fruit"], "Should have parsed aliased default namespace"
          end

          def test_add_aliased_default_namespace
                        doc = Nokogiri::XML('<apple xmlns="ns:fruit" />')
                        doc.root.add_namespace_definition("fruit", "ns:fruit")
                        ns = doc.root.namespaces
                        assert_equal  "ns:fruit", ns["xmlns:fruit"],"Should have added aliased default namespace"
          end


        end
      end
    end

Executing this under Cruby the test will pass. The output under JRuby however will be the following.

          1) Failure:
    test_alised_defautl_namespace_on_parse(Nokogiri::XML::TestAliasedDefaultNamespaces) [/Users/bobd/projects/nokogiri/test/namespaces/test_aliased_default_namespaces.rb:11]:
    Should have parsed aliased default namespace.
    Expected: "ns:fruit"
      Actual: nil

      2) Failure:
    test_add_aliased_default_namespace(Nokogiri::XML::TestAliasedDefaultNamespaces) [/Users/bobd/projects/nokogiri/test/namespaces/test_aliased_default_namespaces.rb:18]:
    Should have added aliased default namespace.
    Expected: "ns:fruit"
      Actual: nil

This can cause issues when dealing with xpath statments: Take the following example

<apple xmlns="ns:fruit"><seeds/></apple>

This creates and apple element in the ns:fruit namespace. Now without being able to alias the default namespace we have no way to execute xpath statements on it based on the structure of the document. This behavior is the same in both C Ruby and JRuby

    irb> doc = Nokogiri::XML('<apple xmlns="ns:fruit" ><seeds/></apple>')
    #<Nokogiri::XML::Document:0x3fc8d4ff7b70 name="document" children=[#<Nokogiri::XML::Element:0x3fc8d4ff6fcc name="apple" namespace=#<Nokogiri::XML::Namespace:0x3fc8d4ff6f7c href="ns:fruit"> children=[#<Nokogiri::XML::Element:0x3fc8d4ff6aa4 name="seeds" namespace=#<Nokogiri::XML::Namespace:0x3fc8d4ff6f7c href="ns:fruit">>]>]>
    irb> doc.root.at_xpath("/apple/seeds")
    nil
    irb> doc.root.at_xpath("//seeds")
    nil
    irb> doc.root.at_xpath("/apple")
    nil

Now if we add a namespace alias for the default namespace we can do this in C Ruby.

    irb> doc.root.add_namespace_definition("fr","ns:fruit")
    #<Nokogiri::XML::Namespace:0x3fc8d4f57a1c prefix="fr" href="ns:fruit">
    irb> doc.root.at_xpath("/fr:apple")
    #<Nokogiri::XML::Element:0x3fc8d4ff6fcc name="apple" namespace=#<Nokogiri::XML::Namespace:0x3fc8d4ff6f7c href="ns:fruit"> children=[#<Nokogiri::XML::Element:0x3fc8d4ff6aa4 name="seeds" namespace=#<Nokogiri::XML::Namespace:0x3fc8d4ff6f7c href="ns:fruit">>]>
    irb> doc.root.at_xpath("/fr:apple/fr:seeds")
    #<Nokogiri::XML::Element:0x3fc8d4ff6aa4 name="seeds" namespace=#<Nokogiri::XML::Namespace:0x3fc8d4ff6f7c href="ns:fruit">>
    irb>

Trying to do this in JRuby however does not work

irb> doc = Nokogiri::XML('<apple xmlns="ns:fruit" ><seeds/></apple>')
#<Nokogiri::XML::Document:0x21b6 name="document" children=[#<Nokogiri::XML::Element:0x21bc name="apple" namespace=#<Nokogiri::XML::Namespace:0x21b8 href="ns:fruit"> children=[#<Nokogiri::XML::Element:0x21ba name="seeds" namespace=#<Nokogiri::XML::Namespace:0x21b8 href="ns:fruit">>]>]>
irb> doc.root.add_namespace_definition("fr","ns:fruit")
#<Nokogiri::XML::Namespace:0x21b8 href="ns:fruit">
irb> doc.root.at_xpath("/fr:apple")
Nokogiri::XML::XPath::SyntaxError: /fr:apple
    from nokogiri/XmlXpathContext.java:123:in `evaluate'
    from /Users/bobd/.rvm/gems/jruby-1.7.4/gems/nokogiri-1.5.10-java/lib/nokogiri/xml/node.rb:159:in `xpath'
    from org/jruby/RubyArray.java:2417:in `map'
    from /Users/bobd/.rvm/gems/jruby-1.7.4/gems/nokogiri-1.5.10-java/lib/nokogiri/xml/node.rb:150:in `xpath'
    from /Users/bobd/.rvm/gems/jruby-1.7.4/gems/nokogiri-1.5.10-java/lib/nokogiri/xml/node.rb:239:in `at_xpath'
    from (irb):32:in `evaluate'
    from org/jruby/RubyKernel.java:1093:in `eval'
    from org/jruby/RubyKernel.java:1489:in `loop'
    from org/jruby/RubyKernel.java:1254:in `catch'
    from org/jruby/RubyKernel.java:1254:in `catch'
    from /Users/bobd/.rvm/rubies/jruby-1.7.4/bin/irb:13:in `(root)'

Furthermore , if we look at the namespaces for the document the alias that we just added is not there in JRuby.

irb> doc.root.namespaces
{
  "xmlns" => "ns:fruit"
}
irb>

But oddly if we look at the document itself it is there.

    irb> doc.to_s
    "<?xml version=\"1.0\"?>\n<apple xmlns=\"ns:fruit\" xmlns:fr=\"ns:fruit\"><seeds/>  </apple>"
    irb>
@rdingwell
Copy link
Author

I think I found the problem, The hashCode method in NokogiriNamespaceCache does not take negative values into account when generating a hash code for a prefix and href.

This is the method in question;

private Long hashCode(String prefix, String href) {
        long prefix_hash = prefix.hashCode();
        long href_hash = href.hashCode();
        return prefix_hash << 31 | href_hash;
    }

the hashCode() method from Object returns a integer. This int may be negative and will be up-converted to a long as a negative number , filling all of the leading bits in as 1. We can look at an example and see what this does to actually creating the hashCode to return.

String value hashCode int binary
"" 0 00000000000000000000000000000000
"fruit" 97711124 00001011101001011110100000101000
"ns:fruit" -822481377 11001110111110011110111000011111

When these values get converted to a long the binary for "ns:fruit" ends up looking like this.

1111111111111111111111111111111111001110111110011110111000011111

Performing any kind of a bitwise or on this value is going to cause the entire leading set of bits to filled with 1s, which is not what is expected to happen.

A simple solution to this is to simply mask off the leading bytes to ensure that the leading bits will be filled with zeros instead.

private static long HASH_MASK = 3472485919L;
private Long hashCode(String prefix, String href) {     
        long prefix_hash =(HASH_MASK & prefix.hashCode());
        long href_hash = (HASH_MASK & href.hashCode()); 
        return (prefix_hash << 31 | href_hash);
    }

rdingwell added a commit to rdingwell/nokogiri that referenced this issue Aug 23, 2013
…parklemotion#940.  This allows for namespace prfixes to the redeclared at an element level, allows for default namespaces to be redeclared at an element level and fixes issues related to adding namepsaces delcarations on the document for use in xpath calls
rdingwell added a commit to rdingwell/nokogiri that referenced this issue Sep 27, 2013
…parklemotion#940.  This allows for namespace prfixes to the redeclared at an element level, allows for default namespaces to be redeclared at an element level and fixes issues related to adding namepsaces delcarations on the document for use in xpath calls
rdingwell added a commit to rdingwell/nokogiri that referenced this issue Oct 23, 2013
…parklemotion#940.  This allows for namespace prfixes to the redeclared at an element level, allows for default namespaces to be redeclared at an element level and fixes issues related to adding namepsaces delcarations on the document for use in xpath calls
@flavorjones
Copy link
Member

This seems like a simple fix -- @jvshahid or @yokolet, want to take a whack at a fix?

@yokolet
Copy link
Member

yokolet commented Apr 7, 2014

Yes. Also, I need to look at related pull request. I think I have time to do something this week or so.

@yokolet
Copy link
Member

yokolet commented Apr 15, 2014

This bug has been fixed by the commit 742751a
Also, the test in this issue has been added to master.

Thanks for reporting the bug.

@yokolet yokolet closed this as completed Apr 15, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants