Remove redundant columns after doi_retrieval #35
When running the quality check script, you will now be prompted to manually check whether the title, abstract, authors, year, and journal are identical for duplicates identified only by title.
You can answer with Y or N. Depending on that answer, the records will either be deduplicated or kept as they are.
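The Y/N check described above can be sketched roughly as follows. This is a minimal Python illustration, not the code from this PR: the function names, the record layout (plain dictionaries), and the prompt wording are all assumptions made for the example.

```python
# Hypothetical sketch of a manual Y/N check for records matched only on title.
# Field names follow the comment above; everything else is an assumption.
FIELDS = ("title", "abstract", "authors", "year", "journal")


def differing_fields(a, b, fields=FIELDS):
    """Return the names of the fields whose values differ between two records."""
    return [f for f in fields if a.get(f) != b.get(f)]


def resolve_title_duplicates(a, b, ask=input):
    """Ask the user whether two title-matched records are the same paper.

    Returns [a] if the answer is Y (the pair is deduplicated) and
    [a, b] if the answer is N (both records are kept as they are).
    `ask` defaults to input() and can be swapped out for testing.
    """
    diff = differing_fields(a, b)
    prompt = f"Fields that differ: {diff or 'none'}. Same record? [Y/N] "
    while True:
        answer = ask(prompt).strip().upper()
        if answer == "Y":
            return [a]
        if answer == "N":
            return [a, b]
        # Any other answer repeats the question.
```

Passing `ask` as a parameter keeps the decision logic separate from the console, so the Y and N branches can be exercised without a person at the keyboard.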
With the last commits, the following was achieved:
Updated the merging function in the conservative strategy, which increases the number of duplicates found!
…/asreview/paper-megameta-postprocessing-screeningresults into conservative-deduplication-patch
…' into conservative-deduplication-patch
This reverts commit 0aed29b.
…/asreview/paper-megameta-postprocessing-screeningresults into conservative-deduplication-patch
This PR patches the deduplication script, specifically the conservative deduplication part.
When importing the results from the doi_retrieval part, it turned out that R had created some extra columns, which broke the conservative_deduplication script.
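One way to guard against such extra columns is to keep only an explicit list of expected columns right after import. The sketch below is a Python illustration under stated assumptions, not the patch itself: the column names, the stray `X` column, and the helper name are hypothetical.

```python
import csv
import io

# Hypothetical list of columns the deduplication step expects.
EXPECTED_COLUMNS = ["title", "abstract", "authors", "year", "journal", "doi"]


def drop_redundant_columns(rows, expected=EXPECTED_COLUMNS):
    """Keep only the expected columns in each row.

    Columns not listed in `expected` are dropped; expected columns that
    are missing from a row are filled with an empty string.
    """
    return [{name: row.get(name, "") for name in expected} for row in rows]


# Example: an imported file with a stray first column (here called "X").
raw = "X,title,doi\n1,Paper A,10.1/a\n2,Paper B,10.1/b\n"
rows = list(csv.DictReader(io.StringIO(raw)))
cleaned = drop_redundant_columns(rows, expected=["title", "doi"])
```

Selecting columns by an allow-list, rather than deleting known offenders by name, means that any new unexpected column is also dropped before it reaches the deduplication step.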
Specifically, this PR patches the following:
df_doi_deduplicated
(this is a temporarily saved object; it saves you from having to repeat the DOI deduplication of more than 20,300 sets of duplicates)
This PR fixes issue #36 and issue #37.