feat: add HTML & XML Inspectors API using Nokogiri #546

Merged
merged 6 commits into from
May 16, 2022
137 changes: 137 additions & 0 deletions bridgetown-builder/lib/bridgetown-builder/dsl/inspectors.rb
@@ -0,0 +1,137 @@
# frozen_string_literal: true

module Bridgetown
module Builders
module DSL
module Inspectors
# Add a couple of familiar DOM API features
module QuerySelection
def query_selector(selector)
css(selector).first
end

def query_selector_all(selector)
css(selector)
end
end

# HTML inspector type
module HTML
# Are there inspectors available? Is it an .htm* file?
def self.can_run?(resource, inspectors)
inspectors &&
resource.destination&.output_ext&.starts_with?(".htm") &&
!resource.data.bypass_inspectors
end

# Process the resource with the available inspectors and return the output HTML
#
# @return [String] transformed HTML
def self.call(resource, inspectors)
doc = Nokogiri.HTML5(resource.output)

inspectors.each do |block|
block.call(doc, resource)
end

doc.to_html
end
end

# XML inspector type
module XML
# Strip the resource's initial extension dot. `.rss` => `rss`
def self.resource_ext(resource)
resource.destination&.output_ext&.delete_prefix(".")
end

# Are there any inspectors available which match the resource extension?
def self.can_run?(resource, inspectors)
inspectors &&
inspectors[resource_ext(resource)] &&
!resource.data.bypass_inspectors
end

# Process the resource with the available inspectors and return the output XML
#
# @return [String] transformed XML
def self.call(resource, inspectors)
doc = Nokogiri::XML(resource.output)

inspectors[resource_ext(resource)].each do |block|
block.call(doc, resource)
end

doc.to_xml
end
end

class << self
# Require the Nokogiri gem if necessary and add the `QuerySelection` mixin
def setup_nokogiri
unless defined?(Nokogiri)
Bridgetown::Utils::RequireGems.require_with_graceful_fail "nokogiri"
end

return if Nokogiri::XML::Node <= QuerySelection

Nokogiri::XML::Node.include QuerySelection
end

# Shorthand for `HTML.call`
def process_html(...)
HTML.call(...)
end

# Shorthand for `XML.call`
def process_xml(...)
XML.call(...)
end
end

# Set up an inspector to review or manipulate HTML resources
# @yield the block to be called after the resource has been rendered
# @yieldparam [Nokogiri::HTML5::Document] the Nokogiri document
def inspect_html(&block)
unless @_html_inspectors
@_html_inspectors = []

Inspectors.setup_nokogiri

hook :resources, :post_render do |resource|
next unless HTML.can_run?(resource, @_html_inspectors)

resource.output = Inspectors.process_html(resource, @_html_inspectors)
end
end

@_html_inspectors << block
end

# Set up an inspector to review or manipulate XML resources
# @param extension [String] defaults to `xml`
# @yield the block to be called after the resource has been rendered
# @yieldparam [Nokogiri::XML::Document] the Nokogiri document
def inspect_xml(extension = "xml", &block)
unless @_xml_inspectors
@_xml_inspectors = {}

Inspectors.setup_nokogiri

hook :resources, :post_render do |resource|
next unless Inspectors::XML.can_run?(resource, @_xml_inspectors)

resource.output = Inspectors.process_xml(resource, @_xml_inspectors)
end
end

(@_xml_inspectors[extension.to_s] ||= []).tap do |arr|
arr << block
end

@_xml_inspectors
end
end
end
end
end
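
Note on the `setup_nokogiri` guard above: a minimal sketch that runs without Nokogiri. `FakeNode`, its toy `css` method, and `setup_fake_node` are hypothetical stand-ins introduced only for illustration (they are not part of this PR). The sketch shows why `return if Nokogiri::XML::Node <= QuerySelection` makes repeated setup calls harmless: `Module#<=` is `nil` while the class and the mixin are unrelated and `true` once the mixin has been included.

```ruby
# Mixin mirroring the PR's QuerySelection. It only assumes the host class
# responds to `css(selector)` and returns something array-like.
module QuerySelection
  def query_selector(selector)
    css(selector).first
  end

  def query_selector_all(selector)
    css(selector)
  end
end

# Hypothetical stand-in for Nokogiri::XML::Node (assumption, not the real class).
class FakeNode
  def initialize(tags)
    @tags = tags
  end

  # Toy "selector engine": returns the stored tag names equal to the selector.
  def css(selector)
    @tags.select { |tag| tag == selector }
  end
end

# Hypothetical counterpart of `setup_nokogiri`, minus the gem loading.
def setup_fake_node
  # Module#<= is nil before the include and true afterwards,
  # so the mixin is included at most once.
  return :already_included if FakeNode <= QuerySelection

  FakeNode.include QuerySelection
  :included
end

setup_fake_node # first call includes the mixin
setup_fake_node # second call returns early
```

After setup, `FakeNode.new(%w[h1 p h1]).query_selector_all("h1")` goes through the toy `css` method, in the same way the real mixin delegates to Nokogiri's `css`.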
3 changes: 3 additions & 0 deletions bridgetown-builder/lib/bridgetown-builder/plugin.rb
@@ -3,15 +3,18 @@
require "bridgetown-builder/dsl/generators"
require "bridgetown-builder/dsl/helpers"
require "bridgetown-builder/dsl/hooks"
require "bridgetown-builder/dsl/inspectors"
require "bridgetown-builder/dsl/http"
require "bridgetown-builder/dsl/liquid"
require "bridgetown-builder/dsl/resources"

module Bridgetown
module Builders
class PluginBuilder
include DSL::Generators
include DSL::Helpers
include DSL::Hooks
include DSL::Inspectors
include DSL::HTTP
include DSL::Liquid
include DSL::Resources
123 changes: 123 additions & 0 deletions bridgetown-builder/test/test_inspectors.rb
@@ -0,0 +1,123 @@
# frozen_string_literal: true

require "helper"

Bridgetown::Builder # trigger autoload

class TestInspectors < BridgetownUnitTest
include Bridgetown::Builders::DSL::Hooks
include Bridgetown::Builders::DSL::Inspectors
include Bridgetown::Builders::DSL::Resources

def functions # stub to get hooks working
@_test_functions
end

context "a resource after being transformed" do
setup do
Bridgetown.sites.clear
@site = Site.new(site_configuration)
@_test_functions = []

inspect_html do |document|
document.query_selector_all("h1").each do |heading|
heading.content = heading.content.sub("World", "Universe")
heading.add_class "universal"
end
end

inspect_xml "atom" do |document, resource|
title = document.query_selector("entry > title")
title.content = title.content.upcase

assert_equal ".atom", resource.extname
end
end

teardown do
@_html_inspectors = nil
@_xml_inspectors = nil
end

should "allow manipulation via Nokogiri" do
add_resource :posts, "html-inspectors.md" do
title "I'm a Markdown post!"
content <<~MARKDOWN
# Hello World!
MARKDOWN
end

resource = @site.collections.posts.resources.first
assert_equal 1, @site.collections.posts.resources.length
assert_equal "# Hello World!", resource.content.strip
resource.transform!
assert_equal %(<html><head></head><body><h1 id="hello-world" class="universal">Hello Universe!</h1>\n</body></html>),
resource.output.strip
end

should "bypass inspectors with special front matter variable" do
add_resource :posts, "html-inspectors-bypass.md" do
title "I'm a Markdown post!"
bypass_inspectors true
content <<~MARKDOWN
# Hello World!
MARKDOWN
end

resource = @site.collections.posts.resources.first
assert_equal 1, @site.collections.posts.resources.length
assert_equal "# Hello World!", resource.content.strip
resource.transform!
refute_equal %(<html><head></head><body><h1 id="hello-world" class="universal">Hello Universe!</h1>\n</body></html>),
resource.output.strip
end

should "not mess up non-HTML resources" do
add_resource :posts, "no-html-inspectors.json" do
content <<~JSON
{ a: 1, b: "2" }
JSON
end

resource = @site.collections.posts.resources.first
assert_equal 1, @site.collections.posts.resources.length
assert_equal %({ a: 1, b: "2" }), resource.content.strip
resource.transform!
assert_equal %({ a: 1, b: "2" }),
resource.output.strip
end

should "work with XML resources too" do
add_resource :pages, "sample-feed.atom" do
content <<~XML
<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">

<title>Example Feed</title>
<link href="http://example.org/"/>
<updated>2003-12-13T18:30:02Z</updated>
<author>
<name>John Doe</name>
</author>
<id>urn:uuid:60a76c80-d399-11d9-b93C-0003939e0af6</id>

<entry>
<title>Atom-Powered Robots Run Amok</title>
<link href="http://example.org/2003/12/13/atom03"/>
<id>urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a</id>
<updated>2003-12-13T18:30:02Z</updated>
<summary>Some text.</summary>
</entry>

</feed>
XML
end

resource = @site.collections.pages.resources.first
assert_equal 1, @site.collections.pages.resources.length
assert_includes resource.content, "<title>Atom-Powered Robots Run Amok</title>"
resource.transform!
assert_includes resource.output, "<title>ATOM-POWERED ROBOTS RUN AMOK</title>"
end
end
end
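
Note on what these tests exercise: a minimal sketch of the per-extension dispatch and the `bypass_inspectors` check, written without Bridgetown or Nokogiri. `Resource` and `InspectorRegistry` are hypothetical names invented for this illustration, and a plain string stands in for the parsed document (the PR parses `resource.output` with Nokogiri at that point).

```ruby
# Hypothetical stand-in for a Bridgetown resource (assumption, not the real class).
Resource = Struct.new(:output_ext, :output, :bypass_inspectors, keyword_init: true)

# Hypothetical registry keyed by extension without the dot, as in `inspect_xml`.
class InspectorRegistry
  def initialize
    @inspectors = {}
  end

  # Stores the block under "xml" by default, or under the given extension.
  def register(extension = "xml", &block)
    (@inspectors[extension.to_s] ||= []) << block
    self
  end

  # ".atom" => "atom"; nil-safe, like the PR's `resource_ext`.
  def ext_for(resource)
    resource.output_ext&.delete_prefix(".")
  end

  # True only when blocks exist for the extension and the resource has not opted out.
  def can_run?(resource)
    !!@inspectors[ext_for(resource)] && !resource.bypass_inspectors
  end

  # Runs every matching block in registration order; blocks mutate `doc` in place.
  def process(resource)
    return resource.output unless can_run?(resource)

    doc = resource.output.dup # placeholder for the Nokogiri parse step
    @inspectors[ext_for(resource)].each { |block| block.call(doc, resource) }
    doc
  end
end

registry = InspectorRegistry.new
registry.register("atom") { |doc, _resource| doc.upcase! }

feed = Resource.new(output_ext: ".atom", output: "<title>robots</title>")
puts registry.process(feed)
```

Resources with another extension, or with `bypass_inspectors` set, come back unchanged, which is the behavior the "not mess up non-HTML resources" and "bypass inspectors" tests check for.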
9 changes: 3 additions & 6 deletions bridgetown-core/lib/bridgetown-core/utils/require_gems.rb
@@ -41,15 +41,12 @@ def require_with_graceful_fail(names)
require name
rescue LoadError => e
Bridgetown.logger.error "Dependency Error:", <<~MSG
- Yikes! It looks like you don't have #{name} or one of its dependencies installed.
- In order to use Bridgetown as currently configured, you'll need to install this gem.
+ Oops! It looks like you don't have #{name} or one of its dependencies installed.
+ Please double-check you've added #{name} to your Gemfile.

- If you've run Bridgetown with `bundle exec`, ensure that you have included the #{name}
- gem in your Gemfile as well.
+ If you're stuck, you can find help at https://www.bridgetownrb.com/community

The full error message from Ruby is: '#{e.message}'
-
- If you run into trouble, you can find helpful resources at https://www.bridgetownrb.com/community
MSG
raise Bridgetown::Errors::MissingDependencyException, name
end
6 changes: 0 additions & 6 deletions bridgetown-website/frontend/javascript/index.js.rb
@@ -23,7 +23,6 @@

import "turbo_transitions"
import "wiggle_note"
- import [ add_heading_anchors ], from: "./lib/functions"

set_timeout 1000 do
document.documentElement.remove_attribute :fresh
@@ -37,9 +36,6 @@
end
end

- #import smoothscroll from 'smoothscroll-polyfill'
- #smoothscroll.polyfill()
-
import "index.css"

import components from "bridgetownComponents/**/*.{js,jsx,js.rb,css}"
@@ -64,6 +60,4 @@
end
end
end
-
- add_heading_anchors()
end
9 changes: 0 additions & 9 deletions bridgetown-website/frontend/javascript/lib/functions.js.rb

This file was deleted.

14 changes: 14 additions & 0 deletions bridgetown-website/plugins/builders/inspectors.rb
@@ -0,0 +1,14 @@
class Builders::Inspectors < SiteBuilder
def build
inspect_html do |document|
document.query_selector_all("article h2[id], article h3[id]").each do |heading|
heading << document.create_text_node(" ")
heading << document.create_element(
"a", "#",
href: "##{heading[:id]}",
class: "heading-anchor"
)
end
end
Review comment (Contributor) on lines +3 to +12: 😍 This is FANTASTIC

end
end
4 changes: 3 additions & 1 deletion bridgetown-website/src/_docs/configuration.md
@@ -11,4 +11,6 @@ Bridgetown gives you a lot of flexibility to customize how it builds your site.
* [Environments](/docs/configuration/environments)
* [Markdown Options](/docs/configuration/markdown)
* [Liquid Options](/docs/configuration/liquid)
- * Puma Configuration (_docs coming soon_)
+ * Puma Configuration (_docs coming soon_)
+
+ However, the main way you'll enhance and extend your site is by writing [plugins](/docs/plugins). Continue reading for information on how to get started writing your first plugin or installing third-party plugins.
4 changes: 4 additions & 0 deletions bridgetown-website/src/_docs/plugins.md
@@ -171,6 +171,10 @@ Easily pull data in from external APIs, and use a special DSL (Domain-Specific L

Hooks provide fine-grained control to trigger custom functionality at various points in the build process.

### [HTML & XML Inspectors](/docs/plugins/inspectors)

Post-process the HTML or XML output of resources using the Nokogiri Ruby gem and its DOM-like API.

### [Generators](/docs/plugins/generators)

Generators allow you to automate the creating or updating of content in your site using Bridgetown's internal Ruby API.