Releases · open-sci/2020-2021-grasshoppers-code

21 Feb 15:17

v1.2.0

4c87ea5

Latest

We have revised the evaluation criteria. Instead of drawing 100 random citations overall, we now select 10 random citations for each regular expression. Examining a fixed number of citations per regular expression, rather than a number proportional to its matches, removes the bias introduced by the size of the population under consideration.

The function to generate the evaluation data is get_random_results in evaluation.py.
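
The fixed-size sampling described above can be sketched as follows. This is only an illustration under stated assumptions, not the code of get_random_results: the function name sample_per_pattern, its parameters, and the (pattern, citation) input shape are all hypothetical.

```python
import random
from collections import defaultdict


def sample_per_pattern(rows, per_pattern=10, seed=None):
    """Draw up to `per_pattern` random citations for each regular expression.

    `rows` is an iterable of (pattern_name, citation) pairs. This input
    shape is an assumption made for the sketch, not the repository's format.
    """
    rng = random.Random(seed)

    # Group the matched citations by the regular expression that matched them.
    groups = defaultdict(list)
    for pattern, citation in rows:
        groups[pattern].append(citation)

    # Sample the same number from every group, so that a pattern with many
    # matches does not dominate the evaluation set.
    sample = {}
    for pattern, citations in groups.items():
        k = min(per_pattern, len(citations))
        sample[pattern] = rng.sample(citations, k)
    return sample
```

A pattern with fewer than 10 matches simply contributes all of its citations, which is one possible way to handle small groups.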


10 Jun 08:26

arcangelo7

v1.1.0

d83544b

Classes of errors in DOI names

This version contains bug fixes, performance improvements, and some new features:

A new support method, read_cache, was implemented. If a cache file exists, this method reads the data processed so far and resumes the process from the last CSV line read instead of starting from the beginning. It has been integrated into check_dois_validity and into the procedures that create or update the cache file as DOIs are processed. The user can customize the number of DOIs after which the cache is updated, as well as the location of the cache file.
The file xu_2019_procedure.py reproduces the procedure of Xu et al. (2019), so that the improvements due to the newly introduced regular expressions can be measured.
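
The resume-from-cache behaviour described above can be sketched as follows. This is a simplified illustration, not the repository's implementation: the JSON cache layout, the write_cache and process_csv helpers, and the handle_row and cache_every parameters are assumptions. The sketch stores only a row counter, whereas the actual read_cache also reads back the data processed so far.

```python
import csv
import json
import os


def read_cache(cache_path):
    """Return how many CSV rows were already processed (0 if no cache exists)."""
    if not os.path.exists(cache_path):
        return 0
    with open(cache_path, encoding="utf-8") as f:
        return json.load(f).get("rows_done", 0)


def write_cache(cache_path, rows_done):
    """Persist the number of rows processed so far (hypothetical helper)."""
    with open(cache_path, "w", encoding="utf-8") as f:
        json.dump({"rows_done": rows_done}, f)


def process_csv(csv_path, cache_path, handle_row, cache_every=100):
    """Process a header-less CSV, resuming after the last cached row."""
    start = read_cache(cache_path)
    rows_done = start
    with open(csv_path, newline="", encoding="utf-8") as f:
        for index, row in enumerate(csv.reader(f)):
            if index < start:
                continue  # already handled in a previous run
            handle_row(row)
            rows_done = index + 1
            # Update the cache after every `cache_every` rows.
            if rows_done % cache_every == 0:
                write_cache(cache_path, rows_done)
    write_cache(cache_path, rows_done)
    return rows_done
```

With this design, a run interrupted between two cache updates repeats at most cache_every - 1 rows on restart, so a smaller interval trades extra disk writes for less repeated work.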

Finally, the output folder contains the complete results for 1,223,298 DOI names, together with the results of the manual verification of 100 DOIs randomly selected from them. The 100 random DOIs were obtained with the new get_random_results method of the Support class.
