Python interface for reading and converting EuPathDB flat file dumps

satta/eupathtables

EuPathTables

This package provides a Python interface for reading and converting EuPathDB 'gene information table' files as provided on the EuPathDB download site. These files are structured text in a custom format and contain most of the data available in the corresponding database. Here's an example file.

EuPathTables also recognises UTRs and pseudogenes and provides this information in appropriate fields/types.

Usage

There are two ways of accessing the information in the file: via a Python iterator returning one dict per gene, or via a GenomeTools input stream (which requires the GenomeTools Python bindings). The stream returns GenomeTools feature nodes built directly from the table, so they can be processed without creating a GFF file first.

Generator access:

#!/usr/bin/env python

import eupathtables

for g in eupathtables.FlatFileIterator(open('FungiDB-28_Aniger_ATCC1015Gene.txt')):
    print("%s\t%s:%s-%s" % (g['ID'], g['seqid'], g['start'], g['stop']))
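
Because the iterator yields plain dicts, the per-gene records can be passed to ordinary Python tooling. The following is a minimal sketch, not part of this package: `genes_to_tsv` is a hypothetical helper, and it assumes only the four keys used in the example above ('ID', 'seqid', 'start', 'stop'). The demo records are made-up stand-ins for what `FlatFileIterator` would yield.

```python
import csv
import io


def genes_to_tsv(genes, out, fields=("ID", "seqid", "start", "stop")):
    """Write one tab-separated row per gene dict; return the number of genes.

    `genes` can be any iterable of dicts, for example a FlatFileIterator.
    Keys missing from a dict are written as empty strings.
    """
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(fields)  # header row
    count = 0
    for g in genes:
        writer.writerow([g.get(f, "") for f in fields])
        count += 1
    return count


# Made-up stand-in records (assumption: real gene dicts carry these keys).
demo_genes = [
    {"ID": "geneA", "seqid": "chr1", "start": 100, "stop": 900},
    {"ID": "geneB", "seqid": "chr2", "start": 50, "stop": 400},
]

buf = io.StringIO()
n_written = genes_to_tsv(demo_genes, buf)
print(buf.getvalue(), end="")
```

With a real file, the same helper could be called as `genes_to_tsv(eupathtables.FlatFileIterator(fh), out)` inside `with open(...)` blocks, so that both file handles are closed when done.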

Stream access:

#!/usr/bin/env python

import eupathtables
import gt

infile = "FungiDB-28_Aniger_ATCC1015Gene.txt"
# we also create a GAF file with GO terms and products
gaf_out_file = "out.gaf"
# this is the taxon ID to use in the GAF file
taxon_id = 294381

table_in_stream = eupathtables.TableInStream(open(infile), taxon_id)
gff_out_stream = gt.extended.GFF3OutStream(table_in_stream)

fn = gff_out_stream.next_tree()
while fn:
    fn = gff_out_stream.next_tree()

# write GO terms out to GAF 1.0 file
with open(gaf_out_file, "w+") as gaf_out:
    table_in_stream.go_coll.to_gafv1(gaf_out)
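
The pull loop in the stream example can also be wrapped in a generator, which allows a plain `for` loop over the nodes. This is a sketch only: `iter_trees` is a hypothetical helper, not part of eupathtables or GenomeTools, and it relies solely on the `next_tree()` method used above returning a falsy value once the stream is exhausted. `FakeStream` is a stand-in so the example runs without the GenomeTools bindings.

```python
def iter_trees(stream):
    """Yield nodes from any object with a next_tree() method until it
    returns a falsy value, mirroring the `while fn:` loop above."""
    while True:
        node = stream.next_tree()
        if not node:
            return
        yield node


class FakeStream:
    """Stand-in for a GenomeTools stream (assumption: only next_tree() is needed)."""

    def __init__(self, items):
        self._items = list(items)

    def next_tree(self):
        # Return the next item, or None when nothing is left.
        return self._items.pop(0) if self._items else None


nodes = list(iter_trees(FakeStream(["node1", "node2", "node3"])))
print(len(nodes))
```

In the stream example, `for fn in iter_trees(gff_out_stream): ...` would then take the place of the explicit `while` loop.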

We also provide a script for quick conversion to Companion-compatible GAF1 and GFF3:

  eupathtable_to_gff3 -g gaf.out -t 294381 FungiDB-28_Aniger_ATCC1015Gene.txt > out.gff3

Installation

Download or clone this repository from GitHub, then:

python setup.py install

Contact

[email protected]
