Download your RNA data from HuggingFace with rouskinhf!

A wrapper around HuggingFace to load data for eFold. You can:

pull datasets from the Rouskinlab HuggingFace page
create datasets from local files

Installation

To download data

pip install rouskinhf

To push data to HuggingFace (optional)

get an access token from the Rouskinlab HuggingFace page
add this token to your environment

export HUGGINGFACE_TOKEN="hf_yourtokenhere"
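Before pushing, it can help to confirm that the variable is actually visible to Python. The snippet below is a minimal sketch using only the standard library; `get_hf_token` is a hypothetical helper written for this illustration and is not part of rouskinhf.

```python
import os


def get_hf_token(env=os.environ):
    """Return the value of HUGGINGFACE_TOKEN, or None if it is unset or blank.

    Hypothetical helper for illustration only; not part of rouskinhf.
    """
    token = env.get("HUGGINGFACE_TOKEN", "").strip()
    return token or None


if __name__ == "__main__":
    if get_hf_token() is None:
        print("HUGGINGFACE_TOKEN is not set in this environment.")
    else:
        print("HUGGINGFACE_TOKEN found.")
```

If the variable is reported as missing, re-run the `export` line in the same shell session that launches Python.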

To predict structures from rouskinhf (optional)

You'll need to install D. Mathews' RNAstructure Fold (also available on the Rouskinlab GitHub).

Check your RNAstructure Fold installation in a terminal:

Fold --version
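The same check can be done from Python by looking for the executable on your `PATH`. This is a minimal sketch with the standard library; `find_fold` is a hypothetical helper, and it only confirms that an executable named `Fold` can be found, not that it runs correctly.

```python
import shutil


def find_fold(executable="Fold"):
    """Return the full path to the Fold executable, or None if it is not on PATH.

    Hypothetical helper for illustration only; not part of rouskinhf.
    """
    return shutil.which(executable)


if __name__ == "__main__":
    path = find_fold()
    if path is None:
        print("Fold was not found on PATH.")
    else:
        print(f"Fold found at {path}")
```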

How to use

Download a dataset

import rouskinhf

rouskinhf.get_dataset(
    name='bpRNA-1m', # the name of a dataset from huggingface/rouskinlab
    force_download = False # use a local copy of the data if it exists
)

Convert a supported format to the rouskinhf format

import rouskinhf

rouskinhf.convert(
    format='ct', # can be ct, seismic, bpseq, fasta or json (rouskinhf output data structure)
    file_or_folder='path/to/my/ct/folder',
    predict_structure=False, # add structure from RNAstructure
    filter=True, # removes duplicates, non-regular characters and low AUROC
    min_AUROC=0.8,
)

Note: Sequences containing characters other than A, C, G, T, U, N (upper or lower case) are not supported and will be filtered out.
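To see ahead of time which sequences would be affected, you can apply the same character rule yourself. The sketch below only illustrates the rule stated in the note; `is_supported` is a hypothetical helper, and the way rouskinhf implements its own filter may differ.

```python
# Characters listed as supported in the note above.
ALLOWED_BASES = set("ACGTUNacgtun")


def is_supported(sequence):
    """Return True if every character of the sequence is in the allowed alphabet.

    Hypothetical helper for illustration only; not part of rouskinhf.
    """
    return set(sequence) <= ALLOWED_BASES


if __name__ == "__main__":
    for seq in ["CACGCUAUG", "CACGXUAUG"]:
        print(seq, is_supported(seq))
```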

Rouskinhf structure format

# rouskinhf_output_file.json
{
    "reference_name": {
        "sequence": "CACGCUAUG",
        "structure": [(0,8), (1,7)], # base pair representation
        # whatever other info you need
    }
}
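The base pair representation above uses 0-based indices into the sequence, so (0,8) pairs the first C with the last G. As a rough illustration, the sketch below turns such a pair list into dot-bracket notation. `pairs_to_dot_bracket` is a hypothetical helper, not a rouskinhf function, and it assumes a structure without pseudoknots, since a single bracket type cannot represent crossing pairs.

```python
def pairs_to_dot_bracket(sequence, pairs):
    """Convert 0-indexed base pairs into a dot-bracket string.

    Hypothetical helper for illustration only; not part of rouskinhf.
    Assumes no pseudoknots (crossing pairs are not handled).
    """
    symbols = ["."] * len(sequence)
    for i, j in pairs:
        if i > j:
            i, j = j, i  # make sure the opening bracket comes first
        symbols[i] = "("
        symbols[j] = ")"
    return "".join(symbols)


if __name__ == "__main__":
    print(pairs_to_dot_bracket("CACGCUAUG", [(0, 8), (1, 7)]))  # prints ((.....))
```

When the data is read back from a JSON file, each pair will arrive as a two-element list rather than a tuple; the function above accepts both.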
