memory corruption from meta tags claiming a charset of ISO-8859-1 #55
Comments
jmhodges - what version of Nokogiri are you using? With 1.2.3 I cannot reproduce. On 1.2.4 I get an encoding error:
Yeah, it happens with both 1.2.3 and 1.2.4 for me. Here's a gist of an irb session that didn't segfault immediately with Nokogiri 1.2.3 and here's one that did with Nokogiri 1.2.4. As you can see, I only get the error you did when the code does not segfault in 1.2.3. What is the output of meta_tag.to_s (l.to_s in those posts) on your machine? I did see similar errors in a larger file that had some entities seemingly translated to UTF-8 while the file was being parsed (I believe, but I'm not sure) as ISO-8859-1.
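To illustrate the kind of mismatch suspected above (UTF-8 bytes being read as ISO-8859-1), here is a minimal sketch using only Ruby's built-in String encoding methods. It is a hypothetical illustration and is not taken from the gists in this thread. It does not use Nokogiri or libxml2 and does not reproduce the segfault; it only shows why text can look garbled when the declared charset and the actual bytes disagree.

```ruby
# Hypothetical illustration only; variable names are made up for this sketch.
# U+00E9 (e with acute accent) encoded as UTF-8 is the two bytes 0xC3 0xA9.
utf8_bytes = "\u00E9".encode("UTF-8").b

# Reading those same two bytes as ISO-8859-1 yields two characters
# (U+00C3 and U+00A9) instead of one.
misread = utf8_bytes.dup.force_encoding("ISO-8859-1").encode("UTF-8")

# Decoding the bytes with the encoding they were written in round-trips.
correct = utf8_bytes.dup.force_encoding("UTF-8")

puts misread.length  # 2
puts correct.length  # 1
```

Whether this is what actually happened to the entities in the larger file is still a guess.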
Sorry, I added a simpler bit of code that reproduces it, using #at instead of #traverse.
Whoop, I've managed to get valgrind to complain. Consider it reproduced, and I'm on the case.
Cool. I wasn't able to get the OS X beta of valgrind to tell me much about it, but I haven't used it before. If you want, I can run it again and post the output (assuming you're not already on OS X).
This is a libxml2 bug. I just wrote a C program that reproduces it, and will be submitting it to libxml2's tracker tonight.
C program to reproduce is at http://gist.github.com/112897
Bug has been filed at http://bugzilla.gnome.org/show_bug.cgi?id=582913
mdalessio, I love you a little bit. Never change. <3
Also, sorry for not hunting this down myself. I got lost in a thicket of build problems last night.
awwww, now you made me blush.
Closing this ticket, since it's now in the good hands of the libxml2 team.
F yer I: Daniel V (maintainer of libxml2) has updated the libxml2 bugzilla ticket. Here's his comment
You might want to check out that version and verify that it addresses your particular pain. Have a nice day.
Neither libxml2 nor Nokogiri contains an API for setting the line numbers for a node. When the libxml2 headers are available, the line numbers can be set directly in the node structure. Closes: #53
Here's an example. google-try contains nothing but the meta tag as seen in the read.
Sometimes this segfaults. Sometimes it works fine. Hunting.
Oh, and if you remove the 'charset=ISO-8859-1', it does not happen ever.
And this is OS X, libxml2 2.7.3 (2.7.3_0 in MacPorts).