
gtf_descriptors.py

Ken Nakatsu edited this page Jul 25, 2023 · 3 revisions

Purpose

Describe GTF files.

gtf_descriptors.py describes GTF annotation files, for example by counting how many times certain column values or attributes appear.

compare_sequence

Compare the internal sequence with the pulled sequence in a GTF file and write the comparison results to an output file.

Parameters:

  • gtf_file (str): The file name of the GTF file to be analyzed.
  • output_name (str): The desired output file name to store the comparison results.
  • sequence_feature1 (str): The first sequence feature to compare.
  • sequence_feature2 (str): The second sequence feature to compare.

Usage:

compare_sequence(gtf_file, output_name, sequence_feature1, sequence_feature2)

countby_field

Count the number of occurrences of a certain column value in a GTF file.

Parameters:

  • input_gtf (str): The file name of the input GTF file.
  • output_name (str): The desired output file name to store the count results.
  • field_index (int): The index of the column to be counted.

Returns:

  • return_data (my_dictionary): A custom dictionary object containing the count results.

Usage:

return_data = countby_field(input_gtf, output_name, field_index)
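
As a rough illustration of the kind of tally described above, the sketch below counts the values in one column of a GTF file using only the standard library. It is not the implementation in gtf_descriptors.py. The function name count_column_values, the 0-based column index, and the skipping of "#" comment lines are all assumptions, and it returns a plain collections.Counter rather than the my_dictionary object.

```python
from collections import Counter


def count_column_values(lines, field_index):
    """Tally the values in one tab-separated column (0-based index is an assumption)."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and comment/header lines, which start with '#' in GTF.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # Ignore records that are too short to contain the requested column.
        if field_index < len(fields):
            counts[fields[field_index]] += 1
    return counts


# Hypothetical records for illustration; column 2 holds the GTF feature type.
example = [
    "#!genome-build example\n",
    'chr1\tsrc\tgene\t1\t100\t.\t+\t.\tgene_id "g1";\n',
    'chr1\tsrc\texon\t1\t50\t.\t+\t.\tgene_id "g1"; exon_number "1";\n',
    'chr1\tsrc\texon\t60\t100\t.\t+\t.\tgene_id "g1"; exon_number "2";\n',
]
feature_counts = count_column_values(example, 2)
```

Because the function accepts any iterable of lines, an open file handle can be passed in place of the example list.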

countby_attribute

Count the number of occurrences of an attribute in a GTF file.

Parameters:

  • input_gtf (str): The file name of the input GTF file.
  • output_name (str): The desired output file name to store the count results.
  • countby_value (str): The attribute value to be counted.
  • skip (bool, optional): If True, skip num lines of the input GTF file. Defaults to False.
  • num (int, optional): The number of lines to skip when skip is True. Defaults to 1.

Returns:

  • return_data (my_dictionary): A custom dictionary object containing the count results.

Usage:

return_data = countby_attribute(input_gtf, output_name, countby_value, skip=False, num=1)
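
To make the idea of counting by attribute concrete, here is a minimal sketch that tallies the values of one attribute key found in the ninth GTF column. It is a sketch under stated assumptions, not the code in gtf_descriptors.py. It assumes countby_value names an attribute key such as gene_biotype, folds skip and num into a single hypothetical skip_lines argument that skips lines from the start of the input, and returns a collections.Counter instead of my_dictionary.

```python
import re
from collections import Counter

# GTF attributes are written as: key "value"; key "value";
ATTRIBUTE_PATTERN = re.compile(r'([^\s;]+)\s+"([^"]*)"')


def count_attribute_values(lines, attribute_key, skip_lines=0):
    """Tally the values of one attribute key (skip_lines is a hypothetical parameter)."""
    counts = Counter()
    for line_number, line in enumerate(lines):
        # Assumption: skipped lines are the first skip_lines lines of the input.
        if line_number < skip_lines:
            continue
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # A GTF record has nine tab-separated columns; attributes are the last one.
        if len(fields) < 9:
            continue
        for key, value in ATTRIBUTE_PATTERN.findall(fields[8]):
            if key == attribute_key:
                counts[value] += 1
    return counts


# Hypothetical records for illustration.
example = [
    'chr1\tsrc\tgene\t1\t100\t.\t+\t.\tgene_id "g1"; gene_biotype "protein_coding";\n',
    'chr1\tsrc\tgene\t200\t300\t.\t-\t.\tgene_id "g2"; gene_biotype "lncRNA";\n',
    'chr2\tsrc\tgene\t1\t90\t.\t+\t.\tgene_id "g3"; gene_biotype "protein_coding";\n',
]
biotype_counts = count_attribute_values(example, "gene_biotype")
```

Records that lack the requested attribute simply contribute nothing to the tally.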