
Does this work? #1

Open
Anonyfox opened this issue Feb 25, 2017 · 1 comment

Comments

@Anonyfox

Hey, I just stumbled upon this repo, and it looks like you have ported the famous readability algorithm to Rust, using kuchiki and therefore html5ever. First of all: truly great!

But the algorithm seems to crash when used on real-world HTML websites. I get panics like this:

   1:        0x10a9bc24c - std::sys::imp::backtrace::tracing::imp::write::hf587afb8e94ad165
   2:        0x10a9be23e - std::panicking::default_hook::{{closure}}::haf3443cb412055ce
   3:        0x10a9bdde3 - std::panicking::default_hook::h742f925bfab3bbfa
   4:        0x10a9be6f7 - std::panicking::rust_panic_with_hook::h6f06ff8d28a94df6
   5:        0x10a9be5a4 - std::panicking::begin_panic::h7b9167ba3324cfae
   6:        0x10a9be4c2 - std::panicking::begin_panic_fmt::hb5f8f1fe0fe23e28
   7:        0x10a9be427 - rust_begin_unwind
   8:        0x10a9e5e60 - core::panicking::panic_fmt::he6eb92dab4407c61
   9:        0x10a9e5eed - core::option::expect_failed::hf8bba00a6e833438
  10:        0x10a70f373 - <core::option::Option<T>>::expect::hba43ec4f65591df2
  11:        0x10a6cf697 - <std::collections::hash::map::HashMap<K, V, S> as core::ops::Index<&'a Q>>::index::he1febf3b2b851612
  12:        0x10a782795 - readability::Readability::add_info::h3257b725054a9642
  13:        0x10a782026 - readability::Readability::readify::h110ae48756961de8
  14:        0x10a781a7a - readability::Readability::parse::h69c7871f90548046
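Reading the trace from the bottom up, frames 9 to 12 (`Readability::add_info` → `HashMap::index` → `Option::expect` → `expect_failed`) suggest that `add_info` indexes a `HashMap` with `map[&key]` for a key that is not present; the standard library's `Index` impl for `HashMap` panics in that case. A minimal sketch of the non-panicking alternative follows. It assumes, hypothetically, that the map holds per-node scores keyed by a node id (`NodeId` and `score_of` are illustrative names, not the crate's API), and defaulting a missing entry to `0.0` is a guess; skipping the node might be the right behaviour for the algorithm instead.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for whatever key the crate uses per DOM node.
type NodeId = usize;

/// Looks up a node's score without panicking when the node was never scored.
/// Writing `scores[&id]` instead would panic ("no entry found for key"),
/// which is the `HashMap::index` -> `Option::expect` path seen in the trace.
fn score_of(scores: &HashMap<NodeId, f32>, id: NodeId) -> f32 {
    // `get` returns `Option<&f32>`; fall back to 0.0 (an assumed default).
    scores.get(&id).copied().unwrap_or(0.0)
}

fn main() {
    let mut scores: HashMap<NodeId, f32> = HashMap::new();
    scores.insert(1, 12.5);

    println!("{}", score_of(&scores, 1)); // prints "12.5"
    println!("{}", score_of(&scores, 2)); // prints "0": missing key, no panic
}
```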

Maybe this repo also needs some polish, like publishing it on crates.io and a README with a short "how to use" section. I just figured out that

readability::new().parse(&html_string).text_contents()

works more or less to get started, but I had tinkered with kuchiki before. Do you want some help? I might not be of much use on the algorithmic side in Rust yet, but once you have a working state of this crate I'd like to write some docs for you in exchange. What do you think?

@loyd
Owner

loyd commented Feb 28, 2017

Hi, thank you for your interest! I plan to return to the project next month (I need it for my degree work). I will need to port Mozilla's tests and some heuristics to improve precision. It would also be good to abstract the library over any DOM, not only kuchiki.
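One way to abstract the library over any DOM is to write the algorithm against a small trait and let each backend (kuchiki or another) implement it. The sketch below is purely hypothetical: the `Dom` trait, its two methods, the `text_len` helper and the `ToyDom` arena are illustrative names, not part of this crate or of kuchiki. A real readability pass would need more operations (tag names, attributes, removal), but the shape would be similar.

```rust
/// Hypothetical trait: the minimal DOM operations this sketch needs.
trait Dom {
    type Node: Copy;
    fn children(&self, node: Self::Node) -> Vec<Self::Node>;
    fn text(&self, node: Self::Node) -> Option<&str>;
}

/// Algorithm code written only against the trait:
/// total number of text characters in the subtree rooted at `node`.
fn text_len<D: Dom>(dom: &D, node: D::Node) -> usize {
    let own = dom.text(node).map_or(0, |t| t.chars().count());
    let nested: usize = dom.children(node).into_iter().map(|c| text_len(dom, c)).sum();
    own + nested
}

/// Toy arena-backed DOM standing in for a real backend such as kuchiki.
struct ToyDom {
    texts: Vec<Option<String>>, // text content per node (None for elements)
    kids: Vec<Vec<usize>>,      // child indices per node
}

impl Dom for ToyDom {
    type Node = usize;
    fn children(&self, node: usize) -> Vec<usize> {
        self.kids[node].clone()
    }
    fn text(&self, node: usize) -> Option<&str> {
        self.texts[node].as_deref()
    }
}

fn main() {
    // Node 0 is an element with two text children.
    let dom = ToyDom {
        texts: vec![None, Some("Hello ".into()), Some("world".into())],
        kids: vec![vec![1, 2], vec![], vec![]],
    };
    println!("{}", text_len(&dom, 0)); // prints "11"
}
```

Because `text_len` only sees the trait, swapping the backend means writing one more `impl Dom`, without touching the scoring code.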

... actual HTML websites, I get panics like

Can you share the webpage you were parsing when you got this error?
