Add Directory of Open Access Books (DOAB) fetcher to web search #8576

Closed
ThiloteE opened this issue Mar 15, 2022 · 16 comments
Labels
fetcher, good first issue, type: enhancement

Comments

@ThiloteE
Member

https://doabooks.org/

DOAB is a community-driven discovery service that indexes and provides access to scholarly, peer-reviewed open access books (currently ~ 50 000) and helps users to find trusted open access book publishers. All DOAB services are free of charge and all data is freely available.

Where to start, if trying to implement this: https://www.doabooks.org/en/resources/metadata-harvesting-and-content-dissemination

@ThiloteE
Member Author

@etienne428 @Azhen917 If you feel like adding yet another one after you are done with the ones you are working on right now 🙃

@ThiloteE added the fetcher and good first issue labels on Mar 15, 2022
@Mohamadi98
Contributor

I can take a look at this, if you don't mind

@Mohamadi98
Contributor

Okay, thanks.
My initial thought is to add a class to org.jabref.logic.importer.fetcher that implements the SearchBasedParserFetcher interface, since the data received from the API is not in BibTeX form. I would then create methods that add parameters to the URL depending on the user's search for this library (I assume that is the text in the image below), and finally register the class in the org.jabref.logic.importer.WebFetchers#getSearchBasedFetchers method.
Please let me know if I should add anything, or if I have got this all wrong.
I have two questions:
1. How does the class that I will create receive the user's search input?
2. I assume there is a tool that converts XML or JSON to BibTeX. Can you tell me how to use it?

jabref image

@RonaldSnijder

Hi,

I am responsible for the technical development of the OAPEN Library (https://oapen.org/resources/15635975-metadata) and DOAB (https://www.doabooks.org/en/doab/metadata-harvesting-and-content-dissemination). Both use the same platform, based on DSpace6. While I am no developer, please let me know if I can assist in any way.

Thanks,
Ronald

@Mohamadi98
Contributor

@etienne428 Hi,
I compared the results from DefaultQueryTransformer:
new DefaultQueryTransformer().transformLuceneQuery(luceneQuery).orElse("")
It can work; I will just have to manipulate the string returned by this call a little so that it matches the format expected by the API (https://www.doabooks.org/en/doab/metadata-harvesting-and-content-dissemination). For example:
user input: "Geography" -> API URL: https://directory.doabooks.org/rest/search?query=%22Geography%22
user input: title:The deliverance of open access books -> API URL: https://directory.doabooks.org/rest/search?query=dc.title:%22the+deliverance+of+open+access+books%22
user input: classification:Politics & Government -> API URL: https://directory.doabooks.org/rest/search?query=dc.subject.classification:%22Politics+%26+Government%22
For this solution, however, the label used by the user (title, classification) needs to be known. Is there a method that returns the user's label?
Alternatively, I can create a transformer that handles all of that on its own.
One last thing, I am new to open source so is there a method where you can review what I have implemented so far, I know I can make a pull request but it is not ready yet.
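
The mapping from user input to API URL described in this comment can be sketched as follows. This is a minimal, self-contained illustration and not the code that ended up in JabRef: the class name `DoabQueryUrl`, the method `build`, and the label-to-key map are hypothetical, and only the two metadata keys that appear in the examples (dc.title and dc.subject.classification) are assumed. The sketch does not lowercase the search term, unlike the title example.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.Map;

final class DoabQueryUrl {
    // Endpoint copied from the example URLs in this thread.
    private static final String SEARCH_ENDPOINT =
            "https://directory.doabooks.org/rest/search?query=";

    // Hypothetical mapping from user-facing labels to DOAB metadata keys;
    // only the keys shown in the examples are listed.
    private static final Map<String, String> FIELD_KEYS = Map.of(
            "title", "dc.title",
            "classification", "dc.subject.classification");

    /** Builds a search URL for a term; fieldLabel may be null for an unfielded search. */
    static String build(String fieldLabel, String term) {
        // Quote the term so it is sent as a phrase, then percent-encode it.
        String encodedTerm = URLEncoder.encode("\"" + term + "\"", StandardCharsets.UTF_8);
        String key = fieldLabel == null
                ? null
                : FIELD_KEYS.get(fieldLabel.toLowerCase(Locale.ROOT));
        // Unknown or missing labels fall back to an unfielded search.
        return SEARCH_ENDPOINT + (key == null ? "" : key + ":") + encodedTerm;
    }

    public static void main(String[] args) {
        System.out.println(build(null, "Geography"));
        System.out.println(build("classification", "Politics & Government"));
    }
}
```

Only the quoted term is encoded, so the colon between the metadata key and the term stays literal, as in the example URLs.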

@k3KAW8Pnf7mkmdSMPHz27
Member

> One last thing, I am new to open source so is there a method where you can review what I have implemented so far, I know I can make a pull request but it is not ready yet.

When you open a pull request you can open it as "draft" (or convert it to draft after you've opened a regular PR)

@Mohamadi98
Contributor

Thanks

@k3KAW8Pnf7mkmdSMPHz27
Member

@RonaldSnijder do you know if there is any specific format for the authors' names, or whom we should ask about that?

@RonaldSnijder

Author and editor names are stored as text, formatted as last name, comma, first name. For instance, my name is stored as "Snijder, Ronald" (https://library.oapen.org/handle/20.500.12657/25287?show=full). Does this answer your question?

@RonaldSnijder

Sorry, I linked to the OAPEN Library. The DOAB link is this: https://directory.doabooks.org/handle/20.500.12854/26303?show=full

@Siedlerchr
Member

@RonaldSnijder For most entries this seems to be correct. However, in the example query https://directory.doabooks.org/rest/search?query=water+AND+fire&expand=metadata, @k3KAW8Pnf7mkmdSMPHz27 and I discovered that the authors include "(Ed.)" and are in the field dc.contributor.author and not dc.contributor.editor:
https://directory.doabooks.org/handle/20.500.12854/61440

And then there is this guy with (Liton) in parentheses. Although this seems to be his name, he is also printed with parentheses on the cover...
https://directory.doabooks.org/handle/20.500.12854/56322
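
One possible way to cope with both cases is to strip only a trailing editor marker and leave any other parenthesised text alone. The sketch below is an assumption about how that could look, not the approach taken in the DOAB fetcher: the class `ContributorName`, its fields, and the regular expression are hypothetical, and the sample names in `main` are made up.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ContributorName {
    // Trailing editor marker such as "(Ed.)", "(ed)" or "(Eds.)". Other
    // parenthesised text, for example a nickname, is deliberately kept.
    private static final Pattern EDITOR_MARKER =
            Pattern.compile("\\s*\\((?i:eds?)\\.?\\)\\s*$");

    final String name;
    final boolean markedAsEditor;

    private ContributorName(String name, boolean markedAsEditor) {
        this.name = name;
        this.markedAsEditor = markedAsEditor;
    }

    /** Splits a raw dc.contributor.author value into a name and an editor flag. */
    static ContributorName parse(String raw) {
        String trimmed = raw.trim();
        Matcher marker = EDITOR_MARKER.matcher(trimmed);
        if (marker.find()) {
            // Drop the marker and remember that the entry was flagged as editor.
            return new ContributorName(trimmed.substring(0, marker.start()).trim(), true);
        }
        return new ContributorName(trimmed, false);
    }

    /** True when the name follows the "last name, first name" pattern. */
    boolean isLastNameFirst() {
        return name.contains(",");
    }

    public static void main(String[] args) {
        // Made-up sample values, not taken from the DOAB records linked above.
        ContributorName editor = parse("Doe, Jane (Ed.)");
        System.out.println(editor.name + " editor=" + editor.markedAsEditor);
        ContributorName nickname = parse("Doe, John (Liton)");
        System.out.println(nickname.name + " editor=" + nickname.markedAsEditor);
    }
}
```

The editor flag would let a caller move such an entry to the editor field, while names that lack a comma can be detected with `isLastNameFirst()` and handled separately.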

@k3KAW8Pnf7mkmdSMPHz27
Member

> Does this answer your question?

Mostly, as @Siedlerchr says. Perhaps I misphrased my question and I should have asked about how the names get into the system in the first place.

<metadata>
	  <key>dc.contributor.author</key>
	  <language>*</language>
	  <value>Fátima Velez de Castro</value>
</metadata>

Does this mean that the first/last name couldn't be determined?
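
For reference, a `<metadata>` element like the one quoted above can be read with the XML parser that ships with the JDK. The following is a minimal sketch under stated assumptions: the class `MetadataReader` and its methods are hypothetical, and the `<item>` wrapper element in `main` is invented for the example, since the full response structure is not shown in this thread.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

final class MetadataReader {
    /** Returns the <value> of every <metadata> element whose <key> equals the given key. */
    static List<String> valuesFor(String xml, String key) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse metadata XML", e);
        }
        List<String> values = new ArrayList<>();
        NodeList entries = doc.getElementsByTagName("metadata");
        for (int i = 0; i < entries.getLength(); i++) {
            Element entry = (Element) entries.item(i);
            if (key.equals(childText(entry, "key"))) {
                values.add(childText(entry, "value"));
            }
        }
        return values;
    }

    // Text of the first descendant with the given tag, or "" if there is none.
    private static String childText(Element parent, String tag) {
        NodeList children = parent.getElementsByTagName(tag);
        return children.getLength() == 0 ? "" : children.item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        // The <item> wrapper is an assumption made for this example.
        String xml = "<item><metadata><key>dc.contributor.author</key>"
                + "<language>*</language><value>Fátima Velez de Castro</value>"
                + "</metadata></item>";
        System.out.println(valuesFor(xml, "dc.contributor.author"));
    }
}
```

The returned value is the raw string, so a name without a comma, like the one above, reaches the caller unchanged and still has to be split into first and last name elsewhere.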

@RonaldSnijder

Hi,

The data in DOAB have been added over a decade by multiple publishers, with no complex validation. Furthermore, the contents have been migrated to a new platform. This means there will be names that do not exactly match the pattern, sorry about that. And indeed, there are now three fields describing contributor roles:

Regards,
Ronald

@ThiloteE
Member Author

#8598 introduced the DOAB fetcher to web search. Closing this. There may be follow-up pull requests, but the main reason I opened this thread is addressed :-)

Thank you all for your contribution.

6 participants
@Siedlerchr @k3KAW8Pnf7mkmdSMPHz27 @Mohamadi98 @ThiloteE @RonaldSnijder and others