Feature/enable paginated fetchers #7082

DominikVoigt · 2020-11-07T13:02:51Z

This PR adds complex search query support for paginated fetchers.
It also implements the corresponding interfaces for several fetchers (arXiv, Google Scholar, IEEE Xplore, Springer Link).
Refs #6236, #5507, koppor#369, koppor#347

Change in CHANGELOG.md described (if applicable)
Tests created for changes (if applicable)
Manually tested changed features in running JabRef (always required)
Screenshots added in PR description (for UI changes)
Checked documentation: Is the information available and up to date? If not created an issue at https://github.com/JabRef/user-documentation/issues or, even better, submitted a pull request to the documentation repository.


Siedlerchr · 2020-11-07T13:38:46Z

src/main/java/org/jabref/logic/importer/fetcher/GoogleScholar.java

+            // if there are too much requests from the same IP address google is answering with a 503 and redirecting to a captcha challenge
+            // The caught IOException looks for example like this:
+            // java.io.IOException: Server returned HTTP response code: 503 for URL: https://ipv4.google.com/sorry/index?continue=https://scholar.google.com/scholar%3Fhl%3Den%26btnG%3DSearch%26q%3Dbpmn&hl=en&q=CGMSBI0NBDkYuqy9wAUiGQDxp4NLQCWbIEY1HjpH5zFJhv4ANPGdWj0
+            if (e.getMessage().contains("Server returned HTTP response code: 503 for URL")) {


This struck me recently: isn't there a way to check whether the exception is of a more specific type, so that one could check the status code number directly instead of matching on the message text?

I do not quite understand what the benefits of that would be.
Could you provide me with an example to grasp your idea, please? :)
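To illustrate the idea raised in the review, here is a minimal sketch of a typed status-code check. `HttpStatusException` and `isCaptchaChallenge` are hypothetical names invented for this example and are not part of JabRef or the JDK; only `IOException` and the constant `HttpURLConnection.HTTP_UNAVAILABLE` (503) come from the standard library. The sketch assumes the code that opens the connection could throw such an exception once it has read the response code.

```java
import java.io.IOException;
import java.net.HttpURLConnection;

public class StatusCodeSketch {

    /** Hypothetical exception type (an assumption, not a JabRef class) that carries the HTTP status code. */
    static class HttpStatusException extends IOException {
        private final int statusCode;

        HttpStatusException(int statusCode, String url) {
            super("Server returned HTTP response code: " + statusCode + " for URL: " + url);
            this.statusCode = statusCode;
        }

        int getStatusCode() {
            return statusCode;
        }
    }

    /** Returns true if the failure carries status 503, the code described in the diff comment. */
    static boolean isCaptchaChallenge(IOException e) {
        // Typed check: the result does not depend on the wording of the exception message.
        return e instanceof HttpStatusException
                && ((HttpStatusException) e).getStatusCode() == HttpURLConnection.HTTP_UNAVAILABLE;
    }

    public static void main(String[] args) {
        IOException blocked = new HttpStatusException(503, "https://scholar.google.com/scholar");
        IOException missing = new HttpStatusException(404, "https://scholar.google.com/scholar");
        System.out.println(isCaptchaChallenge(blocked));
        System.out.println(isCaptchaChallenge(missing));
    }
}
```

The possible benefit is robustness: a check on a numeric code keeps working if the message text changes, whereas `getMessage().contains(...)` silently stops matching.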


tobiasdiez

You currently have quite a lot of code duplication. The main sources are:

1. Complex and string-based queries have to be implemented separately, leading to mostly the same code. Suggestion: remove string-based queries completely.
2. Unpaged search methods always fall back to the paged search method with page number 0. Suggestion: put these fallbacks in the general PagedSearchFetcher interface.

tobiasdiez · 2020-11-10T11:45:23Z

src/main/java/org/jabref/logic/importer/fetcher/IEEE.java

@@ -234,8 +228,31 @@ public String getName() {

    @Override
    public URL getComplexQueryURL(ComplexSearchQuery complexSearchQuery) throws URISyntaxException, MalformedURLException {
+        return getComplexQueryURL(complexSearchQuery, 0);


This should be the default implementation, i.e. push it up to PagedSearchBasedParserFetcher.
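A rough sketch of what such a default implementation could look like follows. The interface `PagedFetcherSketch`, the class `ExampleFetcher`, the simplified `ComplexSearchQuery` stand-in, and the `example.org` URL are all assumptions made for illustration; the real JabRef interfaces have more members and different signatures.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

public class PagedDefaultSketch {

    /** Simplified stand-in for the real query object (assumption for this example). */
    static class ComplexSearchQuery {
        final String terms;

        ComplexSearchQuery(String terms) {
            this.terms = terms;
        }
    }

    /** Hypothetical paged fetcher interface; only the paged method must be implemented. */
    interface PagedFetcherSketch {
        URL getComplexQueryURL(ComplexSearchQuery query, int pageNumber) throws MalformedURLException;

        // The unpaged variant is written once, here, and delegates to page 0.
        default URL getComplexQueryURL(ComplexSearchQuery query) throws MalformedURLException {
            return getComplexQueryURL(query, 0);
        }
    }

    /** Example fetcher that no longer repeats the page-0 fallback itself. */
    static class ExampleFetcher implements PagedFetcherSketch {
        @Override
        public URL getComplexQueryURL(ComplexSearchQuery query, int pageNumber) throws MalformedURLException {
            // Placeholder URL scheme; a real fetcher would build and encode its own query string.
            return URI.create("https://example.org/search?q=" + query.terms + "&page=" + pageNumber).toURL();
        }
    }

    public static void main(String[] args) throws MalformedURLException {
        PagedFetcherSketch fetcher = new ExampleFetcher();
        System.out.println(fetcher.getComplexQueryURL(new ComplexSearchQuery("bpmn")));
    }
}
```

With the fallback in the interface, each concrete fetcher only overrides the paged method, which removes one copy of the delegation per fetcher.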

Thanks for your feedback! :)
Regarding 1:
I think that this is a good idea, but it will require more work than just replacing the method.
This is because the WebSearchPane, for instance, uses the normal string-based search as a fallback when the query cannot be parsed.
Therefore, I do not feel comfortable just tossing the string-based version.
However, I will address this in the upcoming weeks :)!

I implemented all other suggestions

Good!

Concerning the first point, doesn't it work to parse the queries as follows:
author=me and title=something -> ComplexQuery[author: "me", title: "something", rest: ""]
author=me something -> ComplexQuery[author: "me", title: "", rest: "something"]
something -> ComplexQuery[author: "", title: "", rest: "something"]

Then you don't need any fall-back to a purely string-based search.
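The mapping suggested above can be sketched as a small tokenizer. This is only an illustration under simplifying assumptions, not JabRef's query parser: `ParsedQuery` and `parse` are hypothetical names, only the `author=` and `title=` prefixes are recognised, field values are single whitespace-free tokens, and a bare `and` is treated as a connective and dropped.

```java
import java.util.ArrayList;
import java.util.List;

public class QueryParseSketch {

    /** Hypothetical result type mirroring the ComplexQuery[author, title, rest] notation. */
    static class ParsedQuery {
        final String author;
        final String title;
        final String rest;

        ParsedQuery(String author, String title, String rest) {
            this.author = author;
            this.title = title;
            this.rest = rest;
        }

        @Override
        public String toString() {
            return "ComplexQuery[author: \"" + author + "\", title: \"" + title + "\", rest: \"" + rest + "\"]";
        }
    }

    /** Splits the input on whitespace; unrecognised tokens are collected in "rest". */
    static ParsedQuery parse(String input) {
        String author = "";
        String title = "";
        List<String> rest = new ArrayList<>();
        for (String token : input.trim().split("\\s+")) {
            if (token.isEmpty() || token.equalsIgnoreCase("and")) {
                continue; // skip empty input and the connective (a simplification)
            }
            if (token.startsWith("author=")) {
                author = token.substring("author=".length());
            } else if (token.startsWith("title=")) {
                title = token.substring("title=".length());
            } else {
                rest.add(token); // free text that matches no field prefix
            }
        }
        return new ParsedQuery(author, title, String.join(" ", rest));
    }

    public static void main(String[] args) {
        System.out.println(parse("author=me and title=something"));
        System.out.println(parse("author=me something"));
        System.out.println(parse("something"));
    }
}
```

Because every input yields a query object, with unmatched text landing in `rest`, there is no parse failure that would require a separate string-based code path.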

In my opinion, this question should be resolved before changing the fetchers in other ways. Otherwise you have a lot of overhead and code duplication now that will be removed later.

I have now removed the normal performSearch altogether :)

Thanks a lot! If you now also remove the "complex" in the names, I'm super happy ;-)

I have now removed the "complex" from perform search :)
Regarding the query object, I believe that it is sensible to keep that name, as it will be extended in the future, for example with structured information.

src/test/java/org/jabref/logic/importer/fetcher/SpringerFetcherTest.java


DominikVoigt added 4 commits November 6, 2020 21:06

Add paged fetchers and complex search paged fetchers

e5fca67

Merge remote-tracking branch 'upstream/master' into feature/enable-pa…

f2ca2cf

…ginated-fetchers

Fix Checkstyle

1fdbc6d

Signed-off-by: Dominik Voigt <[email protected]>

Fix Checkstyle

215c177

Signed-off-by: Dominik Voigt <[email protected]>

Siedlerchr reviewed Nov 7, 2020


Make method calls consistent

4d86265

Signed-off-by: Dominik Voigt <[email protected]>

tobiasdiez reviewed Nov 10, 2020


DominikVoigt added 4 commits November 11, 2020 16:15

Add test interface for paged fetchers

9a87a5e

Move common methods into default implementation Signed-off-by: Dominik Voigt <[email protected]>

Remove performSearch and adapt all tests accordingly

c76bf45

Signed-off-by: Dominik Voigt <[email protected]>

Remove normal paged search

1896325

Signed-off-by: Dominik Voigt <[email protected]>

Remove the complex part from perform search

4e4e419

Adapt query parser and add new tests Signed-off-by: Dominik Voigt <[email protected]>

koppor merged commit 00e3409 into JabRef:master Nov 15, 2020


DominikVoigt commented Nov 7, 2020

Siedlerchr Nov 7, 2020

DominikVoigt Nov 7, 2020

tobiasdiez left a comment

tobiasdiez Nov 10, 2020

DominikVoigt Nov 11, 2020

DominikVoigt Nov 11, 2020

tobiasdiez Nov 11, 2020 • edited

DominikVoigt Nov 12, 2020

tobiasdiez Nov 12, 2020

DominikVoigt Nov 13, 2020
