-
-
Notifications
You must be signed in to change notification settings - Fork 2.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature/enable paginated fetchers #7082
Feature/enable paginated fetchers #7082
Conversation
Signed-off-by: Dominik Voigt <[email protected]>
Signed-off-by: Dominik Voigt <[email protected]>
// if there are too much requests from the same IP address google is answering with a 503 and redirecting to a captcha challenge | ||
// The caught IOException looks for example like this: | ||
// java.io.IOException: Server returned HTTP response code: 503 for URL: https://ipv4.google.com/sorry/index?continue=https://scholar.google.com/scholar%3Fhl%3Den%26btnG%3DSearch%26q%3Dbpmn&hl=en&q=CGMSBI0NBDkYuqy9wAUiGQDxp4NLQCWbIEY1HjpH5zFJhv4ANPGdWj0 | ||
if (e.getMessage().contains("Server returned HTTP response code: 503 for URL")) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This struck me recently, isn't there a way to check if the exception is a more specific exception so that one could check for the status code number?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not quite understand what the benefits of that would be.
Could you provide me with an example to grasp your idea, please? :)
Signed-off-by: Dominik Voigt <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You have currently quite a lot of code duplications. The main sources are:
- Complex and string-based queries have to be implemented separately, leading to mostly the same code. Suggestion: remove string-based queries completely
- Unpaged search methods always fall back to paged search method with page number 0. Suggestion: put these fallbacks in the general
PagedSearchFetcher
interface.
@@ -234,8 +228,31 @@ public String getName() { | |||
|
|||
@Override | |||
public URL getComplexQueryURL(ComplexSearchQuery complexSearchQuery) throws URISyntaxException, MalformedURLException { | |||
return getComplexQueryURL(complexSearchQuery, 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be the default implementation, i.e. push to PagedSearchBasedParserFetcher
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You have currently quite a lot of code duplications. The main sources are:
- Complex and string-based queries have to be implemented separately, leading to mostly the same code. Suggestion: remove string-based queries completely
- Unpaged search methods always fall back to paged search method with page number 0. Suggestion: put these fallbacks in the general
PagedSearchFetcher
interface.
Thanks for your feedback! :)
Regarding 1:
I think that this is a good idea but will require more work than just replacing the method.
This is because for the WebSearchPane for instance the normal string-based search is used as a fallback case when the query could not be parsed.
Therefore I do not feel comfortable just tossing the string-based version.
However, I will address this in the upcoming weeks :)!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I implemented all other suggestions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good!
Concerning the first point, doesn't it work to parse the queries as follows:
author=me and title=something
-> ComplexQuery[author: "me", title: "something", rest: ""]
author=me something
-> ComplexQuery[author: "me", title: "", rest: "something"]
something
-> ComplexQuery[author: "", title: "", rest: ""something]
Then you don't need any fall-back to a purely string-based search.
In my opinion, this question should be resolved before changing the fetcher in other ways. Otherwise you have a lot of overhead/code duplication now, that will be removed later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I now removed the normal performSearch all together :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot! If you now also remove the "complex" in the names, I'm super happy ;-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I now removed the complex from perform search :)
Regarding the query object, I believe that it is sensible to keep that name as it will be extended in the future such as structured information.
src/test/java/org/jabref/logic/importer/fetcher/SpringerFetcherTest.java
Outdated
Show resolved
Hide resolved
Move common methods into default implementation Signed-off-by: Dominik Voigt <[email protected]>
Signed-off-by: Dominik Voigt <[email protected]>
Signed-off-by: Dominik Voigt <[email protected]>
Adapt query parser and add new tests Signed-off-by: Dominik Voigt <[email protected]>
This PR adds complex search query support for paginated fetchers.
It additionally implements the corresponding interfaces for a couple of fetchers (arXiv, Scholar, IEEE Xplore, Springer Link).
Refs #6236, #5507, koppor#369, koppor#347