perform exception handling outside Parser.parse method #107
-> allows more consistent use of config.isHaltOnError() (by default every error is logged, but overriding that behavior is now easier)
-> treats all checked and unchecked exceptions as parsing exceptions (this was already the de facto behavior)
Removed the (new) method:
Went all the way with the newer method:
I'll fix the build errors later today, but you can already have a look and give feedback if you want. This started after I noticed inconsistent exception handling inside the Parser: config.isHaltOnError() is checked for binary content, but nowhere else. There were try/catch blocks everywhere, which made the code less readable. I also broke up the logic in a couple of methods for readability.
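To illustrate the idea of handling errors in one place outside the parser, here is a minimal sketch. It assumes a parser whose parse method may throw any Exception, and a caller that decides, based on a halt-on-error flag, whether to rethrow or to hand the error to an overridable onParseError hook. The names ParseErrorHandlingSketch, Parser, Crawler, process and onParseError are hypothetical and do not reproduce crawler4j's actual classes or signatures.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: these types and methods are illustrative, not crawler4j's actual API.
public class ParseErrorHandlingSketch {

    /** Hypothetical parser abstraction: parse may throw any checked or unchecked exception. */
    public interface Parser {
        String parse(String content) throws Exception;
    }

    /** Hypothetical crawler that owns the single place where parse errors are handled. */
    public static class Crawler {
        private final boolean haltOnError;      // plays the role of config.isHaltOnError()
        public final List<String> loggedErrors = new ArrayList<>();

        public Crawler(boolean haltOnError) {
            this.haltOnError = haltOnError;
        }

        /** Overridable hook; the default just records the error and lets crawling continue. */
        protected void onParseError(String url, Exception e) {
            loggedErrors.add(url + ": " + e.getMessage());
        }

        /** Returns the parse result, or null if parsing failed and haltOnError is false. */
        public String process(String url, String content, Parser parser) {
            try {
                return parser.parse(content);
            } catch (Exception e) {             // checked and unchecked exceptions are treated alike
                if (haltOnError) {
                    throw new IllegalStateException("Parse error for " + url, e);
                }
                onParseError(url, e);
                return null;
            }
        }
    }
}
```

Because the try/catch lives in a single caller, a subclass can change the policy (for example, counting failures instead of recording them) by overriding onParseError, without touching any parser.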
It should be OK now. The most significant breaking changes are the "onError" methods mentioned above and the parse method, which now throws Exception instead of ParseException. My reason for reworking the exception handling is that I found a bug in the CSS parser when parsing CSS that uses the latest constructs, and I would like the CSS file to still be processed even if extracting its outgoing URLs fails. We're not there yet, since I kept the behavior as it was (even if the code looks a little different). Changing how we handle parse errors in CSS files, together with fixing the underlying issue in ph-css, should move things forward.
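As a rough sketch of the target behavior described above (which, per the comment, this PR does not implement yet), a failure while extracting outgoing URLs could degrade to "no links" instead of dropping the CSS file. The extractor is passed in as a plain Function so the sketch does not depend on ph-css; CssFallbackSketch, CssPage, parseCss and linksExtracted are hypothetical names, not crawler4j or ph-css API.

```java
import java.util.Collections;
import java.util.Set;
import java.util.function.Function;

// Hypothetical sketch: CssFallbackSketch, CssPage and the extractor function are illustrative,
// not crawler4j or ph-css API.
public class CssFallbackSketch {

    /** Hypothetical result: the CSS is kept even when no outgoing URLs could be extracted. */
    public static final class CssPage {
        public final String url;
        public final String css;
        public final Set<String> outgoingUrls;
        public final boolean linksExtracted;

        CssPage(String url, String css, Set<String> outgoingUrls, boolean linksExtracted) {
            this.url = url;
            this.css = css;
            this.outgoingUrls = outgoingUrls;
            this.linksExtracted = linksExtracted;
        }
    }

    /** Processes the CSS file; a failure in link extraction degrades to "no outgoing URLs". */
    public static CssPage parseCss(String url, String css, Function<String, Set<String>> extractLinks) {
        Set<String> links;
        boolean extracted;
        try {
            links = extractLinks.apply(css);
            extracted = true;
        } catch (RuntimeException e) {
            // e.g. the underlying CSS parser rejects newer syntax: keep the file, drop its links
            links = Collections.emptySet();
            extracted = false;
        }
        return new CssPage(url, css, links, extracted);
    }
}
```

The linksExtracted flag lets callers distinguish "this stylesheet has no links" from "we could not read its links", which keeps the failure visible without discarding the page.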
Thanks for the PR. See my comments :)
Review threads (all resolved):
crawler4j-core/src/main/java/edu/uci/ics/crawler4j/fetcher/PageFetchResult.java (2 threads)
crawler4j-core/src/main/java/edu/uci/ics/crawler4j/crawler/WebCrawler.java (5 threads, 1 on an outdated diff)
Thanks for the explanations, @brbog!
…sions, add one where the Exception is passed in
The variable processed is now final.