Prevent accessible links from being judged as broken links #4046
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -3,90 +3,55 @@ | |
require 'net/http' | ||
|
||
module LinkChecker | ||
class Checker | ||
DENY_LIST = %w[ | ||
codepen.io | ||
www.amazon.co.jp | ||
module Checker | ||
DENY_HOST = [ | ||
'codepen.io', | ||
'www.amazon.co.jp' # Repeated access makes this host start returning responses that are judged as broken links | |
].freeze | ||
attr_reader :errors | ||
|
||
def initialize | ||
@errors = [] | ||
@error_links = [] | ||
end | ||
module_function | ||
|
||
def notify_missing_links | ||
check | ||
return if @error_links.empty? | ||
def check_broken_links(links) | ||
links_with_valid_url = links.select { |link| valid_url?(link.url) && !denied_host?(link.url) } | ||
links_with_response = check_response(links_with_valid_url) | ||
broken_links = links_with_response.select { |link| !link.response || link.response > 403 } | ||
|
||
texts = ['リンク切れがありました。'] | ||
@error_links.map do |link| | ||
texts << "- <#{link.url}|#{link.title}> in: <#{link.source_url}|#{link.source_title}>" | ||
end | ||
summary(broken_links) | ||
end | ||
|
||
ChatNotifier.message(texts.join("\n"), username: 'リンクチェッカー') | ||
def valid_url?(url) | ||
uri = Addressable::URI.parse(url) | ||
uri.scheme && uri.host | ||
rescue Addressable::URI::InvalidURIError | ||
false | ||
end | ||
|
||
def check | ||
def denied_host?(url) | ||
uri = Addressable::URI.parse(url) | ||
DENY_HOST.include?(uri.host) | ||
end | ||
|
||
def check_response(links) | ||
locks = Queue.new | ||
5.times { locks.push :lock } | ||
all_links.reject! do |link| | ||
url = URI.encode_www_form_component(link.url) | ||
uri = URI.parse(url) | ||
|
||
!uri || DENY_LIST.include?(uri.host) | ||
end | ||
all_links.map do |link| | ||
links.each do |link| | ||
Thread.new do | ||
lock = locks.pop | ||
response = Client.request(link.url) | ||
link.response = response | ||
@error_links << link if !response || response > 403 | ||
link.response = Client.request(link.url) | ||
locks.push lock | ||
end | ||
end.each(&:join) | ||
|
||
@error_links.sort { |a, b| b.source_url <=> a.source_url } | ||
end | ||
|
||
def all_links | ||
page_links + practice_links | ||
end | ||
|
||
private | ||
|
||
def page_links | ||
links = [] | ||
Page.order(:created_at).each do |page| | ||
extractor = Extractor.new( | ||
page.body, | ||
page.title, | ||
"https://bootcamp.fjord.jp#{Rails.application.routes.url_helpers.polymorphic_path(page)}" | ||
) | ||
links += extractor.extract | ||
end.join | ||
end | ||
|
||
links | ||
end | ||
|
||
def practice_links | ||
links = [] | ||
Practice.order(:created_at).each do |practice| | ||
practice_url = Rails.application.routes.url_helpers.polymorphic_path(practice) | ||
extractor = Extractor.new( | ||
practice.description, | ||
practice.title, | ||
"https://bootcamp.fjord.jp#{practice_url}" | ||
) | ||
links += extractor.extract | ||
def summary(broken_links) | ||
return if broken_links.empty? | ||
|
||
extractor = Extractor.new( | ||
practice.goal, | ||
practice.title, | ||
"https://bootcamp.fjord.jp#{practice_url}" | ||
) | ||
links += extractor.extract | ||
end | ||
links | ||
texts = ['リンク切れがありました。'] | ||
texts << broken_links.sort.map(&:to_s) | ||
Review comment: The Link class's sort handling (its ordering logic) and its string representation ( |
||
texts.join("\n") | ||
end | ||
end | ||
end |
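The `check_response` method in the diff above appears to cap the number of simultaneous requests by having each thread pop a token from a `Queue` before it runs. A minimal sketch of that pattern follows. `CheckedLink` and `fake_request` are hypothetical stand-ins for the PR's `Link` struct and `Client.request`, and the `ensure` block is an addition that the diff does not show.

```ruby
# Hypothetical stand-in for the PR's Link struct.
CheckedLink = Struct.new(:url, :response)

# Hypothetical stand-in for Client.request: returns a status code.
def fake_request(url)
  sleep 0.01
  url.include?('missing') ? 404 : 200
end

def check_response(links, max_threads: 5)
  # Each token in the queue is a permit; a thread must hold one to send a request.
  permits = Queue.new
  max_threads.times { permits.push(:permit) }

  links.map do |link|
    Thread.new do
      permit = permits.pop # blocks while all permits are taken
      begin
        link.response = fake_request(link.url)
      ensure
        permits.push(permit) # return the permit even if the request raises
      end
    end
  end.each(&:join)

  links
end

links = %w[https://example.com/a https://example.com/missing].map { |url| CheckedLink.new(url) }
check_response(links).each { |link| puts "#{link.url} -> #{link.response}" }
```

Note that the threads are collected with `map` so that `join` is called on `Thread` objects. One thread is still created per link; only the requests themselves are limited to `max_threads` at a time.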
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,28 +1,23 @@ | ||
# frozen_string_literal: true | ||
|
||
module LinkChecker | ||
Link = Struct.new(:title, :url, :source_title, :source_url, :response) | ||
module Extractor | ||
MARKDOWN_LINK_REGEXP = %r{\[(.*?)\]\((#{URI::DEFAULT_PARSER.make_regexp}|/.*?)\)}.freeze | ||
|
||
class Extractor | ||
def initialize(markdown_text, source_title, source_url) | ||
@markdown_text = markdown_text | ||
@source_title = source_title | ||
@source_url = source_url | ||
module_function | ||
|
||
def extract_links_from_multi(documents) | ||
documents.flat_map { |document| extract_links_from_a(document) } | ||
end | ||
|
||
def extract | ||
links = @markdown_text.scan(/\[(.*?)\]\((.+?)\)/)&.map do |match| | ||
title = match[0].strip | ||
url = match[1].strip | ||
if url.match?(%r{^/}) | ||
uri = URI(@source_url) | ||
uri.path = '' | ||
url = uri.to_s + url | ||
end | ||
Link.new(title, url, @source_title, @source_url) | ||
end | ||
def extract_links_from_a(document) | ||
document.body.scan(MARKDOWN_LINK_REGEXP).map do |title, url_or_path| | ||
title = title.strip | ||
url_or_path = url_or_path.strip | ||
url_or_path = "https://bootcamp.fjord.jp#{url_or_path}" if url_or_path.match?(%r{^/}) | ||
|
||
links.select { |link| URI::DEFAULT_PARSER.make_regexp.match(link.url) } | ||
Link.new(title, url_or_path, document.title, "https://bootcamp.fjord.jp#{document.path}") | ||
end | ||
end | ||
end | ||
end |
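To illustrate how `extract_links_from_a` turns Markdown into title and URL pairs, here is a small sketch. It assumes a simplified pattern (`SIMPLE_LINK_PATTERN`) and a placeholder host (`BASE_URL`); the PR itself builds its regexp from `URI::DEFAULT_PARSER.make_regexp`, so this is not the actual implementation.

```ruby
# Simplified, hypothetical pattern; the PR uses URI::DEFAULT_PARSER.make_regexp instead.
SIMPLE_LINK_PATTERN = %r{\[(.*?)\]\((https?://[^\s)]+|/[^\s)]*)\)}
BASE_URL = 'https://example.com' # placeholder host for this sketch

def extract_links(body)
  body.scan(SIMPLE_LINK_PATTERN).map do |title, url_or_path|
    # Site-relative paths are expanded to absolute URLs.
    url = url_or_path.start_with?('/') ? "#{BASE_URL}#{url_or_path}" : url_or_path
    [title.strip, url]
  end
end

body = <<~MARKDOWN
  [TEST](/test)(/test2)
  [missing](test)
  - [APT - Wikipedia](http://ja.wikipedia.org/wiki/APT)
MARKDOWN

extract_links(body).each { |title, url| puts "#{title}: #{url}" }
```

With this simplified pattern, `[missing](test)` is skipped because its target is neither an absolute URL nor a path starting with `/`. Unlike the cases covered by the page13 fixture, the sketch does not handle URLs that contain a closing parenthesis.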
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,13 @@ | ||
# frozen_string_literal: true | ||
|
||
module LinkChecker | ||
Link = Struct.new(:title, :url, :source_title, :source_url, :response) do | ||
def to_s | ||
"- <#{url} | #{title}> in: <#{source_url} | #{source_title}>" | ||
end | ||
|
||
def <=>(other) | ||
(source_url <=> other.source_url).nonzero? || url <=> other.url | ||
Review comment: To make the Check.summary test always pass, links are now compared by url as well as source_url when their order is determined. |
||
end | ||
end | ||
end |
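The tie-breaking comparison in `Link#<=>` can be tried in isolation. The sketch below uses a hypothetical `SortableLink` struct with made-up URLs; only the body of `<=>` is taken from the diff.

```ruby
SortableLink = Struct.new(:title, :url, :source_title, :source_url) do
  def <=>(other)
    # nonzero? returns nil when the comparison is 0, so || falls through to url.
    (source_url <=> other.source_url).nonzero? || url <=> other.url
  end
end

links = [
  SortableLink.new('b', 'https://example.com/b', 'Page 1', 'https://example.com/pages/1'),
  SortableLink.new('a', 'https://example.com/a', 'Page 1', 'https://example.com/pages/1'),
  SortableLink.new('c', 'https://example.com/c', 'Page 0', 'https://example.com/pages/0')
]

puts links.sort.map(&:title).inspect # => ["c", "a", "b"]
```

Because links that share a `source_url` are now ordered by `url`, the sorted output no longer depends on the order in which the links were collected.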
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -35,7 +35,6 @@ page5: | |
title: WIPのテスト | ||
body: WIP | ||
user: komagata | ||
published_at: "2021-04-01 00:00:00" | ||
|
||
page6: | ||
title: ヘルプのページ | ||
|
@@ -78,3 +77,29 @@ page11: | |
user: komagata | ||
practice: practice1 | ||
published_at: "2021-10-01 00:00:00" | ||
|
||
page12: | ||
title: apt | ||
body: |- | ||
aptとはdebianでソフトウェアをネットワークからインストールするコマンドです。 | ||
[TEST](/test)(/test2) | ||
[missing](test) | ||
- 参考 | ||
- [APT - Wikipedia](http://ja.wikipedia.org/wiki/APT) | ||
## Q&A | ||
- Q. `$ apt-cache search vim` の検索結果が多すぎる | ||
- A. [正規表現](https://ja.wikipedia.org/wiki/%E6%AD%A3%E8%A6%8F%E8%A1%A8%E7%8F%BE) を使う。 | ||
- 完全一致: `$ apt-cache search ^vim$` | ||
- 前方一致: `$ apt-cache search ^vim` | ||
user: komagata | ||
published_at: "2022-01-01 00:00:00" | ||
|
||
Review comment: Without a published_at entry, there may be cases where the page is not displayed correctly. |
||
page13: | ||
title: リンク切れチェッカーのテスト用リンクを載せたページ | ||
body: |- | ||
[リンク切れ判定対象外の URL へのリンク](https://www.amazon.co.jp) | ||
[末尾が閉じ括弧の URL へのリンク](https://ja.wikipedia.org/wiki/マジックナンバー_(プログラム)) | ||
[SSLサーバー証明書の検証に失敗する host へのリンク](https://www.tablesgenerator.com/markdown_tables) | ||
[日本語を含む URL へのリンク](https://ja.wikipedia.org/wiki/あ) | ||
user: komagata | ||
Review comment: Without a published_at entry, there may be cases where the page is not displayed correctly. |
||
published_at: "2022-01-01 00:00:00" |
Review comment: Checker#check performed the following three operations, which I felt gave the method too much responsibility, so I split them into separate steps.
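As a rough illustration of this kind of split, the sketch below separates filtering, requesting, and broken-link selection, loosely following `check_broken_links` in the diff. It makes several assumptions: it uses the standard library's `URI` rather than `Addressable::URI`, `PipelineLink`, `DENIED_HOSTS`, and the `requester` lambda are hypothetical, and no real HTTP request is made.

```ruby
require 'uri'

PipelineLink = Struct.new(:url, :response) # hypothetical stand-in for Link
DENIED_HOSTS = ['denied.example.com'].freeze # placeholder deny list

def valid_url?(url)
  uri = URI.parse(url)
  !!(uri.scheme && uri.host)
rescue URI::InvalidURIError
  false
end

def denied_host?(url)
  DENIED_HOSTS.include?(URI.parse(url).host)
end

def check_broken_links(links, requester)
  # Step 1: keep only links that are worth requesting.
  candidates = links.select { |link| valid_url?(link.url) && !denied_host?(link.url) }
  # Step 2: record a response (status code or nil) for each candidate.
  candidates.each { |link| link.response = requester.call(link.url) }
  # Step 3: a missing response or a status above 403 counts as broken.
  candidates.select { |link| link.response.nil? || link.response > 403 }
end

statuses = { 'https://example.com/ok' => 200, 'https://example.com/gone' => 404 }
urls = ['https://example.com/ok', 'https://example.com/gone', 'https://denied.example.com/x', 'test']
links = urls.map { |url| PipelineLink.new(url) }

broken = check_broken_links(links, ->(url) { statuses[url] })
puts broken.map(&:url).inspect # => ["https://example.com/gone"]
```

Passing the requester in as an argument keeps each step testable without network access.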