Code repository for: 'The effect of dataset size on neural network performance within systematic reviewing'
A repository of code accompanying a study into the effect of dataset size on neural network performance within systematic reviewing. The code in this repository can be used to reproduce the simulation study reported in the paper. In the simulation study, the systematic review process implemented in ASReview is applied to dataset samples of different sizes, using a neural network classifier. All results were generated using ASReview v0.17.
Running this simulation study requires Python 3.6+. After installing Python, ASReview can be installed using

```
pip install asreview
```
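Since the results were generated with ASReview v0.17, pinning that version is the most reliable way to reproduce them exactly:

```
pip install asreview==0.17
```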
Gensim is also required to run the simulation; it can be installed with

```
pip install --upgrade gensim
```
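Note that the neural network classifier in ASReview is built on TensorFlow. If TensorFlow is not pulled in automatically by your ASReview installation, it can be added with

```
pip install tensorflow
```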
Three different systematic review datasets were used to perform the simulation study:
- Nudging - Systematic review study performed by Nagtegaal et al. on nudging healthcare professionals towards evidence-based medicine: Dataset - Paper
- Software - Systematic review study performed by Hall et al. on software fault detection: Dataset - Paper
- Depression - Systematic review study performed by Brouwer et al. on depressive relapse: Dataset - Paper
Smaller datasets were sampled from the original datasets; the samples used in the simulation study are included in this repository. The full datasets are not included here, but can be obtained from the links above.
This section can be skipped if the dataset samples included in this repository are used. The original datasets should be placed in the data folder and named 'Brouwer_2019.csv', 'Hall_2012.csv', and 'Nagtegaal_2019.csv'. Then run the data_generation notebook in the scripts folder to generate the samples from the original datasets; a sketch of the sampling step is given below.
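For orientation only, the sampling step boils down to something like the following sketch. The sample size, the 'label_included' column name, and the stratified sampling are illustrative assumptions; the data_generation notebook is the authoritative implementation.

```python
# Hypothetical sketch of drawing a smaller sample from a full dataset;
# see the data_generation notebook for the actual procedure.
import pandas as pd

def sample_dataset(path, n, seed=42):
    """Draw a random sample of roughly n records, keeping the inclusion
    rate close to that of the full dataset (stratified by label)."""
    df = pd.read_csv(path)
    # 'label_included' is an assumed name for the inclusion label column.
    return (
        df.groupby("label_included", group_keys=False)
          .apply(lambda g: g.sample(frac=n / len(df), random_state=seed))
    )

# Example: draw a 500-record sample from the depression dataset.
sample = sample_dataset("data/Brouwer_2019.csv", n=500)
sample.to_csv("data/Brouwer_2019_500.csv", index=False)
```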
The commands needed to run the simulation are all included in the jobs.sh file; running this file performs the full simulation. Warning: running the full simulation can take multiple days. The simulation process can be safely interrupted with a keyboard interrupt (Ctrl+C) and can be resumed by running jobs.sh again.
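For illustration, each simulation run in jobs.sh corresponds to an invocation of ASReview's simulation mode on one dataset sample, roughly along these lines (the file names, state file path, and any additional options are assumptions; see jobs.sh for the actual commands):

```
asreview simulate data/Nagtegaal_2019_500.csv \
    -m nn-2-layer -e doc2vec \
    --state_file output/simulation/Nagtegaal_2019_500.h5
```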
The metrics used to evaluate the simulation outcome are written by jobs.sh to the tables folder (contained in the output folder). Plots for visual analysis can be generated by running the results notebook in the scripts folder.
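Assuming the notebook file is named results.ipynb (the exact name may differ), it can be opened with

```
jupyter notebook scripts/results.ipynb
```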
The scripts in this repository are MIT licensed.