Remove redundant columns after doi_retrieval #35

Merged: Rensvandeschoot merged 19 commits into main from conservative-deduplication-patch on Dec 16, 2021

@SagevdBrand (Member) commented on Dec 15, 2021

This PR patches the deduplication script, specifically its conservative deduplication part.
When the results of the doi_retrieval step were imported, R turned out to have created some extra columns, which broke the conservative_deduplication script.

Specifically, this PR:

  • Removes the redundant columns described above.
  • Stores the output of doi_deduplication in the R environment as a variable called df_doi_deduplicated. This is a temporarily saved object that saves you from having to repeat the DOI deduplication of more than 20,300 duplicate sets.
  • Fixes the text printed to the console after deduplicating with the conservative strategy.
  • Adds descriptives to the console output for the quality check.
  • Adds a manual check, as part of the quality check, of duplicates identified on title only.
  • Adds records to the TEST file that trigger the conservative deduplication strategy.
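
The sketch below shows one plausible way that leftover columns can break an exact-match deduplication step. It is written in Python and is not the project's R code. It assumes the following:

  • Records are plain dicts.
  • The redundant columns are row-index columns named `X` or `...1`. These are names that R's `read.csv` and `readr::read_csv` commonly give an unnamed first column.
  • "Conservative" deduplication is treated as exact matching on every remaining column.

The helper names are hypothetical.

```python
from typing import Dict, Iterable, List

# Hypothetical names of index columns R may add when re-reading a CSV whose
# first column is unnamed (an assumption; the PR does not name the columns).
REDUNDANT_COLUMNS = {"X", "...1"}


def drop_redundant_columns(records: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return copies of the records without the redundant index columns."""
    return [{k: v for k, v in rec.items() if k not in REDUNDANT_COLUMNS} for rec in records]


def conservative_deduplicate(records: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep the first record of every group whose columns are all identical."""
    seen = set()
    kept = []
    for rec in records:
        key = tuple(sorted(rec.items()))  # every column takes part in the match
        if key not in seen:
            seen.add(key)
            kept.append(rec)
    return kept


if __name__ == "__main__":
    raw = [
        {"X": "1", "title": "Outcome of panic disorder", "year": "1993"},
        {"X": "2", "title": "Outcome of panic disorder", "year": "1993"},
        {"X": "3", "title": "Another study", "year": "1993"},
    ]
    # The differing index values make every row look unique.
    print(len(conservative_deduplicate(raw)))  # 3
    # After dropping the index column, the duplicate is found.
    print(len(conservative_deduplicate(drop_redundant_columns(raw))))  # 2
```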

This PR fixes issue #36 and issue #37.

@SagevdBrand (Member, Author) commented:

When you run the quality check script, you are now prompted to check manually whether the title, abstract, authors, year, and journal are identical for records identified as duplicates on title only:

Possibly duplicated titles:
 Outcome of panic disorder with or without concomitant depression: A 2-year prospective follow-up study
Outcome of panic disorder with or without concomitant depression: A 2-year prospective follow-up study 
 
 Abstract:
 In a prospective 2-year follow-up study, 32 patients with panic disorder alone and 20 with panic disorder and concomitant depression were investigated. After controlled treatment with either imipramine or doxepin, patients received naturalistic treatment with antidepressants, benzodiazepines, and supportive psychotherapy. They were evaluated for anxiety, depression, and social disability at least every 3 months during the follow-up period. The data showed fluctuation of symptoms in both groups and a less favorable outcome for the patients with comorbid conditions. However, the overall outcome was better than that reported in other studies and indicates that panic disorder is quite responsive to appropriate treatment.
 
In a prospective 2-year follow-up study, 32 patients with panic disorder alone and 20 with panic disorder and concomitant depression were investigated. After controlled treatment with either imipramine or doxepin, patients received naturalistic treatment with antidepressants, benzodiazepines, and supportive psychotherapy. They were evaluated for anxiety, depression, and social disability at least every 3 months during the follow-up period. The data showed fluctuation of symptoms in both groups and a less favorable outcome for the patients with comorbid conditions. However, the overall outcome was better than that reported in other studies and indicates that panic disorder is quite responsive to appropriate treatment. 
 
 Authors:
 Albus, M., Scheibe, G.
 
Albus, M., Scheibe, G. 
 
 Year:
 1993
1993 
 
 Journal:
 The American Journal of Psychiatry
The American Journal of Psychiatry
Is this an actual duplicate? Y or N?

You can answer Y or N. Depending on your answer, the records are either deduplicated or kept as they are.
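
The prompt described above can be sketched in Python as follows. This is a hypothetical illustration, not the project's R implementation. It makes three assumptions:

  • Records are dicts.
  • Titles are grouped after trimming whitespace and lower-casing.
  • The `ask` parameter stands in for the console prompt, so the function can be exercised without typing.

A "Y" answer keeps only the first record of a group. Any other answer keeps the whole group unchanged.

```python
from collections import defaultdict
from typing import Callable, Dict, List

FIELDS = ("title", "abstract", "authors", "year", "journal")


def check_title_duplicates(
    records: List[Dict[str, str]],
    ask: Callable[[str], str] = input,
) -> List[Dict[str, str]]:
    """Show records sharing a title side by side and ask whether they are duplicates."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for i, rec in enumerate(records):
        # Assumed normalization: ignore surrounding whitespace and case.
        groups[rec["title"].strip().lower()].append(i)

    drop = set()
    for indices in groups.values():
        if len(indices) < 2:
            continue  # unique title, nothing to check
        for field in FIELDS:
            print(f"{field.capitalize()}:")
            for i in indices:
                print(" ", records[i].get(field, ""))
        answer = ask("Is this an actual duplicate? Y or N? ")
        if answer.strip().upper() == "Y":
            drop.update(indices[1:])  # deduplicate: keep the first record only
    return [rec for i, rec in enumerate(records) if i not in drop]
```

For a non-interactive run, pass something like `ask=lambda prompt: "N"` to keep every record.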

@SagevdBrand (Member, Author) commented:

The latest commits achieve the following:

  • The TEST files are slightly adapted to trigger the conservative_deduplication strategy and the extra title-based deduplication within the quality check.
  • The merging function in the conservative strategy is updated, which increases the number of duplicates found.

@Rensvandeschoot merged commit de9d7a8 into main on Dec 16, 2021
@Rensvandeschoot deleted the conservative-deduplication-patch branch on Dec 16, 2021, 13:17
Issues this pull request may close:

  • Deduplication during quality check is not conservative
  • Conservative deduplication does not run
2 participants