The NIH Common Fund’s Stimulating Peripheral Activity to Relieve Conditions (SPARC) program aims to transform our understanding of nerve-organ interactions with the intent of advancing bioelectronic medicine towards treatments that change lives. Learn more about SPARC
By employing a FAIR (Findable, Accessible, Interoperable, and Reusable) first approach, the datasets, protocols, and publications generated via the SPARC program are intended to be usable by researchers globally, with reproducible results. At present, however, there is no tangible way to show or visualize the usage of SPARC data in outside projects and publications.
The SPARClink project was born as an idea at the 2021 NIH SPARC Codeathon (more details here). It was conceived as a method of visualizing citation data on datasets, protocols, and publications to determine the degree of use of SPARC material outside of the official channels.
The word 'Impact' can have many different meanings depending on the context in which it is viewed. Within the SPARClink project, we consider impact to be the frequency of citations of SPARC-funded resources. The SPARC program intends to advance medical understanding by providing datasets, maps, and computational studies that follow FAIR principles and are used by researchers all around the world. The usage of SPARC resources by platforms and programs outside SPARC is what we view as the meaning of the term 'Impact'.
The goal of SPARClink is to provide a system that queries all external publications using open-source tools and platforms and creates an interactive visualization, helpful to any person (researcher or otherwise), that showcases the impact SPARC has on the overall scientific research community. These impact measurements are meant to showcase the concept of FAIR data and how good data generation practices and methods are useful in advancing the field of bioelectronic medicine.
However, datasets and protocols are not referenced in manuscripts the way prior research is. Dataset and protocol identifiers or URLs are only mentioned in the text or under supplementary materials, making this a difficult task to accomplish.
Metadata on datasets and protocols is extracted from Pennsieve, the SPARC Airtable database, and Protocols.io. This information is queried against NIH RePORTER, NCBI, and Google Scholar to extract citations and create a well-connected graph using d3.js.
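To illustrate one step of this pipeline, here is a minimal sketch of building a search payload for the NIH RePORTER v2 API. The field names follow RePORTER's public API documentation, and the award number is a placeholder, not a real SPARC grant; this is not the project's actual backend code.

```python
import json

def build_reporter_query(award_numbers, limit=50):
    """Return a JSON payload that searches NIH RePORTER by project number.

    Field names ("criteria", "project_nums", "include_fields") follow the
    public RePORTER v2 API; check the API docs before relying on them.
    """
    return {
        "criteria": {"project_nums": list(award_numbers)},
        "include_fields": ["ProjectNum", "ProjectTitle", "ContactPiName"],
        "offset": 0,
        "limit": limit,
    }

# Placeholder award number for illustration only.
payload = build_reporter_query(["OT2OD000000"])
print(json.dumps(payload, indent=2))
```

The resulting payload would be POSTed to the RePORTER search endpoint; the response's publication links are what feed the citation graph.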
Clone or download the repository.
git clone https://github.com/SPARC-FAIR-Codeathon/SPARClink.git
The development environment uses Anaconda to keep track of the Python dependencies. Download Anaconda here: Anaconda Individual Edition. The following commands create a new conda environment with the dependencies required to run the project.
cd SPARClink
conda env create -f environment.yml --prefix ./env
conda activate ./env
The application uses python-dotenv to load configuration information from a .env file. Create a .env file with the following information.
PROTOCOLS_IO_KEY="<protocols.io api key>"
SERPAPI_KEY="<serpapi api key>"
A public API key for protocols.io can be obtained by signing up as shown here. A SerpApi key is not required at the moment; to integrate Google Scholar results, an API key can be obtained as shown here.
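For illustration, the sketch below mimics what python-dotenv's load_dotenv() does with this file and then builds an authenticated request header. The tiny parser and the Bearer-token header format are assumptions for this example (protocols.io's public API uses bearer tokens); the real project simply calls load_dotenv().

```python
import os

def load_env(path=".env"):
    """Toy stand-in for python-dotenv's load_dotenv(), for illustration.

    Reads KEY=VALUE lines into os.environ, skipping comments and blanks.
    """
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                os.environ[key.strip()] = value.strip().strip('"')

def protocols_io_headers():
    """Build an Authorization header from the loaded key (assumed
    Bearer-token scheme; verify against the protocols.io API docs)."""
    return {"Authorization": f"Bearer {os.environ['PROTOCOLS_IO_KEY']}"}
```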
Unit tests to verify the external APIs are written using Python's unittest framework. The tests can be run as shown below:
python -m unittest -v tests/test_NIH_NCBI.py
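As a hedged sketch of what such a test looks like, the example below defines a minimal unittest case. The class name and the canned response are invented for illustration and are not the project's actual test code, which calls the real NIH/NCBI wrappers.

```python
import unittest

class TestCitationLookup(unittest.TestCase):
    """Hypothetical example of the style of test in tests/test_NIH_NCBI.py."""

    def test_response_has_expected_keys(self):
        # A real test would exercise the NIH/NCBI API wrapper; a canned
        # response stands in here so the example runs without network access.
        fake_response = {"results": [{"project_num": "OT2OD000000"}]}
        self.assertIn("results", fake_response)
        self.assertEqual(len(fake_response["results"]), 1)

# Run the suite programmatically (equivalent to `python -m unittest`).
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestCitationLookup)
result = unittest.TextTestRunner(verbosity=2).run(suite)
```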
Currently, the central database is implemented as a Firebase real-time database. The database can be updated by running FirebaseImplementation.py. However, this requires a username and a password.
To use your own Firebase instance, set up a Firebase web app as shown here, and update firebaseConfig in FirebaseImplementation.py with the new API keys. Set up a new user, and configure the real-time database. It is recommended to limit database write permission to authenticated users. Run FirebaseImplementation.py and enter the user's email/password when prompted.
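For orientation, here is a hypothetical example of the shape of the firebaseConfig dictionary; every value is a placeholder taken from the Firebase console's web-app settings, not a real key, and the exact fields in FirebaseImplementation.py may differ.

```python
# Placeholder values only -- copy the real ones from your Firebase console.
firebaseConfig = {
    "apiKey": "<your-api-key>",
    "authDomain": "<project-id>.firebaseapp.com",
    "databaseURL": "https://<project-id>-default-rtdb.firebaseio.com",
    "projectId": "<project-id>",
    "storageBucket": "<project-id>.appspot.com",
    "messagingSenderId": "<sender-id>",
    "appId": "<app-id>",
}

def node_url(path):
    """Firebase real-time databases also expose a REST endpoint: reading a
    node is a GET against "<databaseURL>/<path>.json" (plus an auth token
    when the security rules require authentication)."""
    return f"{firebaseConfig['databaseURL']}/{path}.json"
```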
Backend Flow Chart: Shows the methods implemented in the backend to gather citations of datasets, protocols, and SPARC publications.
We have set up a Flask server on PythonAnywhere to handle all our machine learning operations. If you would like to set up a backend for your own fork, please set up a Flask server on any hosting service of your choice and modify the appropriate endpoints in the flask_app.py file. To learn more about the techniques we used, refer to the Further Reading section.
The visualizations created from the real-time database can be viewed directly from our demo page or by running the local version of our frontend. We use Vue.js and Tailwind CSS to render the demo webpage. The interactive force-directed graph is created via d3.js using data requested from our Firebase real-time database. Within the SPARClink demo page we use the HTML canvas element to render the visualization. To run the frontend of your forked repo locally, use the following commands:
cd frontend
npm install
npm run serve
You can now open your browser and visit the URL http://localhost:8080/sparclink to view the webpage.
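To show the kind of data the force-directed graph consumes, the sketch below builds a nodes/links structure in the convention d3.js force layouts typically expect. The field names ("id", "group", "source", "target") follow d3 convention, not necessarily SPARClink's exact schema, and the IDs are placeholders.

```python
import json

def build_graph(citations):
    """Build a d3-style graph from (dataset_id, paper_id) citation pairs.

    Returns {"nodes": [...], "links": [...]}, the shape commonly fed to
    d3.forceSimulation via forceLink (an assumption about the schema).
    """
    nodes, links, seen = [], [], set()
    for dataset, paper in citations:
        for node_id, group in ((dataset, "dataset"), (paper, "publication")):
            if node_id not in seen:
                seen.add(node_id)
                nodes.append({"id": node_id, "group": group})
        links.append({"source": dataset, "target": paper})
    return {"nodes": nodes, "links": links}

# Placeholder IDs: one dataset cited by two publications.
graph = build_graph([("ds-1", "pub-A"), ("ds-1", "pub-B")])
print(json.dumps(graph, indent=2))
```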
Note:
To use the smart word filter, please refer to the frontend available in the smart_filter branch. This feature leads to slower render times on the graph visualization, so we have not included it in the main branch.
If you would like to suggest an idea for this project, please let us know on the issues page and we will take a look at your suggestion. Please use the enhancement tag to label your suggestion.
If you would like to add your own feature, feel free to fork the project and send a pull request our way. This is an open-source project, so we will welcome your contributions with open arms. Refer to our Contributing Guidelines and Code of Conduct for more information. Add a GitHub Star to support active development!
SPARClink is an open source project and distributed under the MIT License. See LICENSE for more details.