Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use the :encoding arg on Document.parse even when contents are blank #1043

Closed

Conversation

batter
Copy link
Contributor

@batter batter commented Feb 5, 2014

Currently, Nokogiri::XML::Document.parse and Nokogiri::HTML::Document.parse discards the encoding argument if it receives nil or blank contents for the string_or_io argument. This pull tweaks the code to set the encoding value (if it is passed in) even when the string_or_io argument is blank.

# before this pull request
doc = Nokogiri.HTML(nil, nil, 'UTF-8')
doc.encoding # => nil
# after this pull request
doc = Nokogiri.HTML(nil, nil, 'UTF-8')
doc.encoding # => 'UTF-8'

This is really just a convenience thing, so that if you want to compose a blank document with an particular encoding, you can do something like this:

doc = Nokogiri.HTML(nil, nil, 'UTF-8')
Nokogiri::HTML::DocumentFragment.new(doc, string)

instead of having to do something like this:

doc = Nokogiri::HTML::Document.new
doc.encoding = 'UTF-8'
Nokogiri::HTML::DocumentFragment.new(doc, string)

This is particularly helpful for Ruby18 users where the String class does not have the concept of encoding on it's own. True, you can technically condense the above example down into this:

doc = Nokogiri::HTML::Document.new.tap { |d| d.encoding = 'UTF-8' }
Nokogiri::HTML::DocumentFragment.new(doc, string)

But it's cleaner to just have the parse method enforce the encoding regardless of contents it receives.

@knu
Copy link
Member

knu commented Apr 8, 2014

Looks good to me, but we are already in RC period.
What do you think, @flavorjones?

@flavorjones
Copy link
Member

I'm very sorry to not have responded to this long ago.

This seems like a reasonable change. Will make one comment inline.

if RUBY_VERSION =~ /^1\.9/
class TestDocumentEncoding < Nokogiri::TestCase
class TestDocumentEncoding < Nokogiri::TestCase
if RUBY_VERSION =~ /^1\.9/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I imagine, given the time frame of the submission, that this was intended to exclude 1.8 and not to exclude 2.0+. Correct?

@flavorjones
Copy link
Member

Merged manually. Thank you for this contribution!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants