Releases: open-sci/2020-2021-grasshoppers-code
New evaluation system
We have revised the evaluation criteria. Instead of 100 random citations overall, we now select 10 random citations for each regular expression. Examining a fixed number of citations per regular expression, rather than a number proportional to its matches, removes the bias introduced by the differing sizes of the match populations.
The function that generates the evaluation data is get_random_results in evaluation.py.
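The fixed-size sampling described above can be sketched as follows. This is a minimal illustration, not the code of get_random_results: the function name sample_per_pattern, the "pattern" key, and the row layout are assumptions made for the example.

```python
import random
from collections import defaultdict


def sample_per_pattern(rows, per_pattern=10, seed=None):
    """Draw up to `per_pattern` random rows for each regular expression.

    `rows` is a list of dicts; the "pattern" key (hypothetical) names the
    regular expression that matched the citation.
    """
    rng = random.Random(seed)

    # Group the rows by the regular expression that produced them.
    by_pattern = defaultdict(list)
    for row in rows:
        by_pattern[row["pattern"]].append(row)

    # Take the same number of rows from every group, or the whole group
    # when it holds fewer rows than requested.
    sample = {}
    for pattern, group in by_pattern.items():
        k = min(per_pattern, len(group))
        sample[pattern] = rng.sample(group, k)
    return sample
```

Because every group contributes the same number of rows, a regular expression with many matches does not dominate the evaluation set.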
Classes of errors in DOI names
This version contains bug fixes, performance improvements, and some new features:
- A new support method, read_cache, was implemented. If a cache file exists, it reads the data processed so far and resumes from the last CSV line read instead of starting from the beginning. It has been integrated into check_dois_validity and into the procedures that create or update the cache file as DOIs are processed. The user can customize the number of DOIs after which the cache is updated, as well as the location of the cache file.
- The file xu_2019_procedure.py makes it possible to reproduce the Xu et al. (2019) procedure, in order to measure the improvements due to the newly introduced regular expressions.
Finally, the output folder contains the complete results for 1,223,298 DOI names, along with the manual verification of 100 DOIs randomly selected from those results. The 100 DOIs were drawn with the new get_random_results method of the Support class.
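The resume-from-cache behaviour described for read_cache can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: the names load_cache, flush and process_with_cache, the FIELDS column layout, and the cache_every parameter are all hypothetical.

```python
import csv
import os

FIELDS = ["doi", "status"]  # hypothetical cache layout


def load_cache(cache_path):
    """Return the rows already stored in the cache, or [] if there is none."""
    if not os.path.exists(cache_path):
        return []
    with open(cache_path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def flush(cache_path, rows):
    """Append rows to the cache, writing the header for a new or empty file."""
    is_new = not os.path.exists(cache_path) or os.path.getsize(cache_path) == 0
    with open(cache_path, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerows(rows)


def process_with_cache(dois, check, cache_path, cache_every=100):
    """Process `dois` in order, skipping those already present in the cache.

    `check` is any callable that maps a DOI to a status string.
    """
    done = load_cache(cache_path)
    pending = []
    # Resume right after the last cached row instead of from the beginning.
    for doi in dois[len(done):]:
        pending.append({"doi": doi, "status": check(doi)})
        if len(pending) >= cache_every:
            flush(cache_path, pending)
            done.extend(pending)
            pending = []
    if pending:
        flush(cache_path, pending)
        done.extend(pending)
    return done
```

The sketch assumes the input order is stable between runs, so the number of cached rows is enough to locate the restart point. A smaller cache_every loses less work after an interruption at the cost of more frequent writes.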
Classes of errors in DOI names
First stable release of the software for cleaning and classifying invalid DOI names collected by the OpenCitations COCI Project.
open-sci/2020-2021-grasshoppers-code: Release 1.0.0-alpha
First release of our software for cleaning invalid DOIs collected by the OpenCitations COCI Project.