Support for multi-paper PDFs (AKA proceedings) #8128

Open
4 tasks
koppor opened this issue Oct 7, 2021 · 5 comments

Comments

@koppor
Member

koppor commented Oct 7, 2021

Conference proceedings are one type of publication: multiple papers are collected in a single volume. There are also books with multiple chapters. Example: "Cyber-Physical Systems of Systems", https://link.springer.com/book/10.1007/978-3-319-47590-5


As a researcher, I am interested in a) PDFs for my BibTeX entries and b) having the first page of the paper opened when I open the PDF of the paper. Moreover, c) I have existing PDF files on my hard disk that I would like to import (refs #7929).

Additionally, when I import a PDF file to an entry, JabRef's fulltext fetcher sometimes fetches the complete proceedings PDF and not the PDF of the entry itself. This is "OK" for me, because I sometimes have multiple papers from one proceedings volume, so keeping one proceedings PDF is interesting for me, too.

  • When importing a PDF file (File -> import, "Find unlinked files"), the following should be done:
    • Determine the List<BibEntry> contained in the PDF
    • Create one proceedings/collection/book entry for the whole PDF (collectionEntry)
      • type according to the determined book type
    • For each BibEntry: Create BibEntry in library
      • crossref the collectionEntry (pay attention to the differences between BibTeX and BibLaTeX mode)
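
The import steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: `SketchEntry` and `MultiPaperImportSketch` are hypothetical stand-ins and not JabRef's `BibEntry` API, and detecting the papers inside the PDF is assumed to have happened already. The `biblatexMode` flag reflects one known difference: classic BibTeX copies missing fields from the cross-referenced entry under the same field name, whereas biblatex's default inheritance maps the parent's `title` to the child's `booktitle`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for a bibliography entry (NOT JabRef's BibEntry).
final class SketchEntry {
    final String type;
    final String citationKey;
    final Map<String, String> fields = new LinkedHashMap<>();

    SketchEntry(String type, String citationKey) {
        this.type = type;
        this.citationKey = citationKey;
    }
}

final class MultiPaperImportSketch {

    /**
     * Returns the collection entry first, followed by one child entry per
     * detected paper; every child cross-references the collection entry.
     */
    static List<SketchEntry> buildEntries(String collectionType,
                                          String collectionKey,
                                          String collectionTitle,
                                          List<SketchEntry> papers,
                                          boolean biblatexMode) {
        SketchEntry collection = new SketchEntry(collectionType, collectionKey);
        collection.fields.put("title", collectionTitle);
        if (!biblatexMode) {
            // Assumption: in BibTeX mode the parent also carries "booktitle",
            // because children only inherit fields under the same name.
            collection.fields.put("booktitle", collectionTitle);
        }

        List<SketchEntry> result = new ArrayList<>();
        result.add(collection);
        for (SketchEntry paper : papers) {
            paper.fields.put("crossref", collectionKey);
            result.add(paper);
        }
        return result;
    }
}
```

The collection entry is placed first only for readability of the returned list; where it must appear in a saved `.bib` file is a separate concern not covered here.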

Regarding the PDF handling of multi-entry PDF files:

  • JabRef should offer to jump to the first page specified in the pages field when opening the attached PDF
  • When no attached PDF is present, but the entry has a cross-referenced entry with a PDF attached, JabRef should offer the functionality to a) open the PDF of the cross-referenced entry and b) jump to a specific page. Thereby, the target page in the pages field should be respected.
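
A minimal sketch of how the target page could be derived from the pages field. The class name, method names, and the `pageOffset` parameter are hypothetical: the offset models front matter (title page, table of contents) that makes the printed page number differ from the position in the PDF, and how JabRef would obtain it is left open here.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FirstPageSketch {

    // Leading arabic number of a pages value such as "123--145" or "123".
    private static final Pattern LEADING_NUMBER = Pattern.compile("^\\s*(\\d{1,6})");

    /** Returns the first printed page, or empty if the field has no leading number. */
    static OptionalInt firstPage(String pagesField) {
        if (pagesField == null) {
            return OptionalInt.empty();
        }
        Matcher matcher = LEADING_NUMBER.matcher(pagesField);
        if (!matcher.find()) {
            return OptionalInt.empty();
        }
        return OptionalInt.of(Integer.parseInt(matcher.group(1)));
    }

    /**
     * Returns the 1-based PDF page to open. pageOffset is the assumed number
     * of PDF pages that precede printed page 1.
     */
    static OptionalInt pdfPageToOpen(String pagesField, int pageOffset) {
        OptionalInt printed = firstPage(pagesField);
        if (printed.isEmpty()) {
            return OptionalInt.empty();
        }
        return OptionalInt.of(printed.getAsInt() + pageOffset);
    }
}
```

Values without a leading arabic number (for example roman numerals) yield an empty result, in which case the PDF would simply be opened at its first page.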

Optionally: When attaching a PDF to an existing entry, the following should be done:

  • Split PDF: In case a PDF is a multi-paper PDF, JabRef should split the PDF

    • Keep the original PDF file
    • Determine the pages of the paper inside the PDF
    • Copy these pages into a new PDF file
    • Attach this PDF file to the current entry
  • The split functionality could also be done "on demand". A user selects the PDF attached to an entry and selects "split PDF". Then, JabRef splits the PDF and creates BibEntries for each contained paper.
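
The "determine the pages of the paper inside the PDF" step could look roughly like the sketch below. It only computes which PDF pages to copy; the copying itself would be delegated to a PDF library and is not shown. All names are hypothetical, and `pageOffset` (number of PDF pages before printed page 1) is an assumption about how printed page numbers map to PDF pages.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SplitRangeSketch {

    /** Inclusive, 1-based range of PDF pages. */
    record PageRange(int first, int last) { }

    // Accepts "21-40", "21--40", or "21–40" (en dash, written as a regex escape).
    private static final Pattern RANGE =
            Pattern.compile("^\\s*(\\d{1,6})\\s*(?:-{1,2}|\\x{2013})\\s*(\\d{1,6})\\s*$");
    private static final Pattern SINGLE = Pattern.compile("^\\s*(\\d{1,6})\\s*$");

    /**
     * Maps a pages field to the PDF pages to copy, or empty if the field
     * cannot be parsed or the range does not fit into the PDF.
     */
    static Optional<PageRange> pdfPagesToCopy(String pagesField, int pageOffset, int pdfPageCount) {
        if (pagesField == null) {
            return Optional.empty();
        }
        int first;
        int last;
        Matcher range = RANGE.matcher(pagesField);
        Matcher single = SINGLE.matcher(pagesField);
        if (range.matches()) {
            first = Integer.parseInt(range.group(1));
            last = Integer.parseInt(range.group(2));
        } else if (single.matches()) {
            first = Integer.parseInt(single.group(1));
            last = first;
        } else {
            return Optional.empty();
        }
        first += pageOffset;
        last += pageOffset;
        if (first < 1 || last < first || last > pdfPageCount) {
            return Optional.empty();
        }
        return Optional.of(new PageRange(first, last));
    }
}
```

Returning an empty result instead of clamping keeps the original PDF untouched whenever the pages field and the file disagree, which matches the "keep the original PDF file" step.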

@btut
Contributor

btut commented Oct 13, 2021

This also affects fulltext search.
When linking a proceedings PDF, the whole PDF is indexed. When searching, that PDF will end up in the search results even though the hit might be in a paper that was not added to the database.
Ideally, this would be detected, and only the pages of papers that are in the database would be indexed and linked to the correct BibEntry.
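
A rough sketch of that idea, assuming the page-wise text of the PDF and the PDF page range of each paper are already known. `PageIndexSketch` and `PaperPages` are hypothetical names and do not refer to JabRef's actual indexing code.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class PageIndexSketch {

    /** A paper that is in the library, with its inclusive 1-based PDF page range. */
    record PaperPages(String citationKey, int firstPdfPage, int lastPdfPage) { }

    /**
     * Collects the text to index per citation key. pageTexts.get(i) holds the
     * text of PDF page i + 1. Pages not covered by any paper are skipped.
     */
    static Map<String, String> textPerEntry(List<String> pageTexts, List<PaperPages> papers) {
        Map<String, String> result = new LinkedHashMap<>();
        for (PaperPages paper : papers) {
            // Clamp to the pages that actually exist in the PDF.
            int first = Math.max(paper.firstPdfPage(), 1);
            int last = Math.min(paper.lastPdfPage(), pageTexts.size());
            StringBuilder text = new StringBuilder();
            for (int page = first; page <= last; page++) {
                text.append(pageTexts.get(page - 1)).append('\n');
            }
            result.put(paper.citationKey(), text.toString());
        }
        return result;
    }
}
```

With this shape, a search hit can be attributed to a single entry, and text from papers that were never added to the library is not indexed at all.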

@zhaoqingying123
Contributor

Hi, we are a group of students studying for a Master of Computer Science at the University of Adelaide, and we wanted to check whether this issue is available for us to work on for our assignment. If it is available, we would like to get some input based on the previous contributions to this issue.

@koppor
Member Author

koppor commented Apr 3, 2022

@zhaoqingying123 The issue is still available. Please first start with test cases. For that, please fetch example PDFs (or create example PDFs to avoid licensing issues).

A previous attempt was made at the University of Basel.

Here is the documentation: https://github.com/thepauljs/jabref/tree/main/docs/sweng

Here is the code: https://github.com/thepauljs/jabref/blob/main/src/main/java/org/jabref/logic/importer/fileformat/MultiPaperHandler.java - with test cases https://github.com/thepauljs/jabref/blob/main/src/test/java/org/jabref/logic/importer/fileformat/MultiPaperHandlerTest.java

The documentation and test cases are a good start. It still needs much work to get it finished. So, a good chance for you to improve JabRef!

@ThiloteE
Member

ThiloteE commented Apr 3, 2022

Welcome and thank you! Adding to koppor's comment, also check out the guidelines for contributing to JabRef. They can be found here: https://github.com/JabRef/jabref/blob/main/CONTRIBUTING.md. See here for a rough outline of this process. In general, it is advised to open a (draft) pull request early on so that reviewers have time to comment and the general direction of the request becomes clear. This will allow you to receive valuable feedback!

If you have any questions, feel free to ask! Either here at GitHub, or you also can join our gitter chat.

@koppor
Member Author

koppor commented Jul 16, 2024

There is also the other way round: multiple PDFs for a single entry. See https://github.com/beckus/ieeetranplus for an example.

Projects
Status: Free to take