This repository contains the Semantic Clustering plugin for ASReview. It applies multiple techniques (SciBERT, PCA, t-SNE, KMeans, and a custom Cluster Optimizer) to an ASReview data object in order to cluster records based on semantic differences. The end result is an interactive dashboard.
The package is called semantic_clustering and can be installed from the download folder with:
pip install .
or directly from GitHub with:
python -m pip install git+https://github.com/asreview/semantic-clusters.git
For help use:
asreview semantic_clustering -h
asreview semantic_clustering --help
Other options are:
asreview semantic_clustering -f <input> -o <output.csv>
asreview semantic_clustering --filepath <input> --output <output.csv>
asreview semantic_clustering -a <output.csv>
asreview semantic_clustering --app <output.csv>
asreview semantic_clustering -v
asreview semantic_clustering --version
asreview semantic_clustering --transformer <model>
The semantic clustering extension is implemented as an ASReview subcommand extension. The following commands can be run:
In the processing phase, a dataset is processed and clustered for use in the interactive interface. The following options are available:
asreview semantic_clustering -f <input.csv or url> -o <output_file.csv>
Using -f will process a file and store the results in the file specified with -o.
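Once processing has finished, the output file can be inspected from Python. The columns written by the plugin are not listed in this README, so the minimal sketch below assumes nothing about them: it only reports the header row and the number of records. The helper name summarize_csv is hypothetical and not part of the plugin.

```python
import csv
from pathlib import Path


def summarize_csv(path):
    """Return (column_names, row_count) for a CSV file.

    No column names are assumed; they are read from the header row.
    """
    with Path(path).open(newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])  # empty list if the file has no rows
        row_count = sum(1 for _ in reader)
    return header, row_count


if __name__ == "__main__":
    # Assumption: the processing command wrote its results to output.csv.
    target = Path("output.csv")
    if target.exists():
        columns, rows = summarize_csv(target)
        print(f"{rows} rows, columns: {columns}")
    else:
        print(f"{target} not found; run the processing command first.")
```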
Semantic Clustering uses an ASReviewData object and can handle files, URLs, and benchmark sets:
asreview semantic_clustering -f benchmark:van_de_schoot_2017 -o output.csv
asreview semantic_clustering -f van_de_Schoot_2017.csv -o output.csv
If no output file is specified, output.csv is used as the output file name.
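To give a rough idea of what the processing phase involves, the sketch below chains PCA, t-SNE, and KMeans with scikit-learn. It is not the plugin's implementation: the function name cluster_embeddings, all parameter values, and the choice to run KMeans on the 2D t-SNE coordinates are assumptions, the custom Cluster Optimizer is left out (a fixed number of clusters is used instead), and random vectors stand in for the SciBERT embeddings so that no model download is needed.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE


def cluster_embeddings(embeddings, n_pca=10, n_clusters=3, seed=0):
    """Reduce embeddings with PCA and t-SNE, then cluster with KMeans.

    All defaults are illustrative assumptions, not the plugin's settings.
    """
    # PCA first: a cheap linear reduction before the slower t-SNE step.
    reduced = PCA(n_components=n_pca, random_state=seed).fit_transform(embeddings)

    # t-SNE maps the reduced vectors to 2D coordinates for plotting.
    # Its perplexity must be smaller than the number of samples.
    perplexity = float(min(30, len(embeddings) - 1))
    coords = TSNE(
        n_components=2, perplexity=perplexity, random_state=seed
    ).fit_transform(reduced)

    # KMeans assigns each 2D point to one of n_clusters groups.
    labels = KMeans(
        n_clusters=n_clusters, n_init=10, random_state=seed
    ).fit_predict(coords)
    return coords, labels


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-in for transformer output: 60 random vectors of size 768,
    # the hidden size of BERT-base models such as SciBERT.
    fake_embeddings = rng.normal(size=(60, 768))
    coords, labels = cluster_embeddings(fake_embeddings)
    print(coords.shape, np.bincount(labels))
```

Running PCA before t-SNE is a common way to keep t-SNE fast on high-dimensional input; whether the plugin uses the same ordering and dimensions is not stated here.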
Semantic Clustering uses the allenai/scibert_scivocab_uncased transformer model by default. With the --transformer <model> option, another model can be selected instead:
asreview semantic_clustering -f benchmark:van_de_schoot_2017 -o <output_file.csv> --transformer bert-base-uncased
Any pretrained model will work. Here is a list of example models, but more exist.
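When several datasets or models need to be processed, the command can be assembled from a script. The sketch below is one possible way to do that; the helper names build_command and run_processing are hypothetical, and only the -f, -o, and --transformer flags documented above are used.

```python
import shlex
import subprocess


def build_command(filepath, output="output.csv", transformer=None):
    """Build the argument list for one processing run.

    Only flags documented in this README are used. When transformer is
    None the flag is omitted, so the default model is used.
    """
    command = ["asreview", "semantic_clustering", "-f", filepath, "-o", output]
    if transformer is not None:
        command += ["--transformer", transformer]
    return command


def run_processing(command):
    """Run the command and raise CalledProcessError if it fails."""
    return subprocess.run(command, check=True)


if __name__ == "__main__":
    cmd = build_command(
        "benchmark:van_de_schoot_2017", transformer="bert-base-uncased"
    )
    # Print the command in shell form; call run_processing(cmd) to execute
    # it (this requires ASReview and the plugin to be installed).
    print(shlex.join(cmd))
```

Passing the arguments as a list rather than a single string avoids shell quoting problems with file names that contain spaces.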
Running the dashboard server is also done from the command line. This command will start a Dash server in the console and visualize the processed file.
asreview semantic_clustering -a output.csv
asreview semantic_clustering --app output.csv
When the server has been started with the command above, it can be found at http://127.0.0.1:8050/ in your browser.
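If the page does not load, it can help to check from a second terminal whether anything is answering at that address. The following is a minimal sketch using only the Python standard library; the function name dashboard_is_up is hypothetical, and the check only confirms that some HTTP server responds with status 200, not that it is this dashboard.

```python
import urllib.request

DASHBOARD_URL = "http://127.0.0.1:8050/"


def dashboard_is_up(url=DASHBOARD_URL, timeout=2.0):
    """Return True if an HTTP server answers at url with status 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers refused connections, timeouts, and urllib's
        # URLError/HTTPError, which are OSError subclasses.
        return False


if __name__ == "__main__":
    if dashboard_is_up():
        print(f"Dashboard is reachable at {DASHBOARD_URL}")
    else:
        print("No response; is the server started with -a / --app?")
```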
MIT license
Got ideas for improvement? For any questions or remarks, please send an email to [email protected].