Shinso is a tool that "vampires" content from web pages by crawling them, parsing them and making smart decisions about the page content.
You should be able to use Shinso in a Rails 2.3 environment with Ruby 1.8 without much hassle.
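This README does not describe how the decision step works. As an illustration only, here is a hypothetical sketch of one common heuristic for this kind of extraction: score each block of text by its length and penalise blocks that consist mostly of link text (navigation bars, footers). The names `score_block` and `pick_main_block` and the block hashes are invented for this example; Shinso's real logic (it depends on the readability gem) may well differ.

```ruby
# Hypothetical sketch of a "decision" step, NOT Shinso's actual algorithm.
# Each candidate block is a hash holding its visible text and the part of
# that text that sits inside links.

# Longer text scores higher; a high share of link text pulls the score down.
def score_block(text, link_text)
  length = text.strip.length
  return 0.0 if length.zero?
  link_density = link_text.length.to_f / length
  length * (1.0 - link_density)
end

# Returns the block that most looks like the main article text.
def pick_main_block(blocks)
  blocks.max_by { |b| score_block(b[:text], b[:link_text]) }
end

blocks = [
  { :id => "nav",     :text => "Home About Contact", :link_text => "Home About Contact" },
  { :id => "article", :text => "Shinso keeps the part of a page that reads like an article. " * 3, :link_text => "" },
  { :id => "footer",  :text => "Copyright 2010 Example", :link_text => "Example" }
]

puts pick_main_block(blocks)[:id]  # prints "article"
```

Link density is a cheap signal that works without any language-specific rules, which matters for a tool meant to handle multi-language articles.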
-
Put the shinso.rb file in the app/models folder of your Ruby on Rails project.
-
Requirements
- Ruby 1.8+
- Rails 2.3+
- zlib
- nokogiri
- sanitize
- htmlentities
- readability
- charguess
- rest-client
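The list does not say why zlib is needed. One plausible use (an assumption, not confirmed by the source) is inflating gzip-compressed HTTP responses before they are parsed. The helper `inflate_body` below is hypothetical and relies only on Ruby's standard `zlib` and `stringio` libraries.

```ruby
require 'zlib'
require 'stringio'

# Hypothetical helper: returns the body unchanged unless the server said it
# was gzip-encoded, in which case it is decompressed first.
def inflate_body(body, content_encoding)
  return body unless content_encoding.to_s.downcase == 'gzip'
  Zlib::GzipReader.new(StringIO.new(body)).read
end

# Build a gzip-compressed body locally to exercise the helper.
io = StringIO.new
gz = Zlib::GzipWriter.new(io)
gz.write("<html><body>Hello</body></html>")
gz.finish            # flush the gzip footer without closing the StringIO
compressed = io.string

puts inflate_body(compressed, 'gzip')  # prints "<html><body>Hello</body></html>"
puts inflate_body("plain text", nil)   # prints "plain text"
```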
-
In your controller, simply do:
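    require 'timeout'

    @shinso = Shinso.new
    # Abort if crawling and processing take longer than 25 seconds
    # (raises Timeout::Error).
    Timeout.timeout(25) do
      if @shinso.crawl(params[:url])  # fetch the page; the rest runs only if this succeeds
        @shinso.parse                 # parse the fetched HTML
        @shinso.decision              # decide what to keep from the page content
        # @shinso.semantics           # optional: call an external service
      end
    end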
Here `semantics` (commented out above) is an optional hook for calling a service that returns custom data to you.
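Because the controller snippet wraps the pipeline in a 25-second timeout, a slow site makes the block raise `Timeout::Error`, which bubbles out of the action unless it is rescued. The sketch below shows one way to handle that. `FakeShinso` and `run_pipeline` are hypothetical stand-ins written so the flow runs without Shinso itself; only the method names `crawl`, `parse` and `decision` come from this README, and their return values are assumed.

```ruby
require 'timeout'

# Hypothetical stand-in for Shinso. It records which steps ran and can
# simulate a slow crawl via the delay argument (in seconds).
class FakeShinso
  attr_reader :steps

  def initialize(delay = 0)
    @delay = delay
    @steps = []
  end

  def crawl(url)
    sleep @delay
    @steps << :crawl
    !url.to_s.empty?  # assumed: truthy when the page was fetched
  end

  def parse;    @steps << :parse;    end
  def decision; @steps << :decision; end
end

# Runs crawl -> parse -> decision under a time limit.
# Returns true if the block finished in time, false if it timed out.
def run_pipeline(shinso, url, seconds)
  Timeout.timeout(seconds) do
    if shinso.crawl(url)
      shinso.parse
      shinso.decision
    end
  end
  true
rescue Timeout::Error
  false
end

fast = FakeShinso.new
puts run_pipeline(fast, "http://example.com/", 25)  # prints "true"
p fast.steps                                         # prints [:crawl, :parse, :decision]

slow = FakeShinso.new(1)
puts run_pipeline(slow, "http://example.com/", 0.2) # prints "false"
```

In a real controller you might assign `@shinso = Shinso.new`, pass it to a helper like this, and render an error message when it returns `false`.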
The code is quite old now and should be treated as legacy. Contributions are welcome as long as they are tested against a varied set of articles in several languages.
Shinso.rb is released under the MIT License.