Need help parsing a standard nginx directory listing. Different results with ruby and jruby. #1888
Comments
Hi, thanks for opening this issue, but I'm unable to reproduce what you're seeing. Here's the code I used to reproduce this without the open-uri network call:
#! /usr/bin/env ruby
require "nokogiri"
require "yaml"
# copypasta from `curl https://archive.anarchiehandy.de/i9305`
puts Nokogiri::VERSION_INFO.to_yaml
puts "---"
nginx_response = <<EOHTML
<html>
<head><title>Index of /i9305/</title></head>
<body bgcolor="white">
<h1>Index of /i9305/</h1><hr><pre><a href="../">../</a>
<a href="ResurrectionRemix/">ResurrectionRemix/</a> 03-Mar-2019 10:12 -
<a href="TWRP/">TWRP/</a> 12-Mar-2019 19:27 -
<a href="override_TWRP/">override_TWRP/</a> 12-Mar-2019 19:27 -
</pre><hr></body>
</html>
EOHTML
doc = Nokogiri::HTML(nginx_response)
puts doc.to_html
For CRuby, the output is:
For JRuby, the output is:
The parsed documents are identical in structure. Is it possible that you're getting different results back from your open-uri network call?
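One way to rule the network call in or out is to summarise the raw body before it reaches the parser, then compare the summary between CRuby and JRuby. The following is only a minimal sketch using the standard library: `describe_body` is a hypothetical helper name, and the `StringIO` stands in for whatever IO-like object the open-uri call returns.

```ruby
require "stringio"
require "digest"

# Hypothetical helper (not part of Nokogiri or open-uri): read an IO-like
# object to the end and report basic facts about the bytes that were read.
def describe_body(io)
  body = io.read
  {
    encoding: body.encoding.name,          # encoding label Ruby attached to the String
    bytesize: body.bytesize,               # number of raw bytes read
    valid: body.valid_encoding?,           # whether the bytes are valid for that label
    sha256: Digest::SHA256.hexdigest(body) # fingerprint for comparing across runtimes
  }
end

# Assumption: a StringIO stands in for the real network response here.
sample = StringIO.new("<html><body><pre><a href=\"TWRP/\">TWRP/</a></pre></body></html>")
summary = describe_body(sample)
puts summary
```

If the byte count and fingerprint match on both runtimes but the parsed output still differs, the difference is more likely in how the input is handed to the parser than in the response itself.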
Ah, interesting -- this appears to be a difference in how Nokogiri and open-uri behave between CRuby and JRuby. Digging into it now.
Related narrative here: #1821
OK, narrowing this down: in JRuby, In the meantime, a workaround is to add
OK, got it: the fix for #1124 was incompletely applied only to
related to incomplete application of fix from #1124
I've pushed a branch, @jvshahid - I have to ask for your help here. The issue appears to be with
// if setEncoding returned true, then the stream is set
// to the EncodingReaderInputStream
if (setEncoding(context, data))
    return;
Wow, thank you a lot for investigating this.
@flavorjones I think I understand the issue. I would like to take some time to fix the following issues as well, unless anyone objects:
@jvshahid Thanks for looking into it! I think it makes sense to take time and do the right thing; your suggestions sound right to me.
We don't have to figure out the encoding again. This was already figured out in the Ruby code. fixes #1888
FYI, I pushed the fix in #1897
John's PR was merged, this will be fixed in v1.11.0 when it drops. Watch the milestone for progress: https://github.com/sparklemotion/nokogiri/milestone/18
Thank you a lot for taking care of this!
Dear Nokogiri community,
I want to parse the content of my nginx directory listing. It all works just fine with normal Ruby, but with JRuby I can't get Nokogiri to parse the listing the same way. Does anyone have an idea to help me out?
To Reproduce
Result with ruby
Result with jruby:
Environment
ruby:
jruby:
Any advice is greatly appreciated.