Explanation of GFA file format #71

Open
nhartwic opened this issue Jul 12, 2019 · 1 comment

@nhartwic

nhartwic commented Jul 12, 2019

I have two main questions:

  1. What information does the 'x' type line contain and where is it documented?
  2. What information does the SD tag, found in 'L' type lines, contain? Example: "SD:i:2198268"
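For reference, the tag itself follows the generic GFA1 optional-field layout `TAG:TYPE:VALUE`, where type `i` is a signed integer, so "SD:i:2198268" is a tag named SD holding the integer 2198268. A minimal Python sketch, assuming tab-separated GFA1 `L` lines (from, orientation, to, orientation, overlap CIGAR, then optional tags), pulls the SD value out. It only extracts the number; it does not say what SD measures, which is the open question. The helper names and the example line are hypothetical.

```python
from __future__ import annotations


def parse_gfa_tag(field: str) -> tuple[str, str, object]:
    """Split a GFA1 optional field TAG:TYPE:VALUE into its three parts."""
    name, typ, value = field.split(":", 2)
    if typ == "i":
        return name, typ, int(value)    # signed integer
    if typ == "f":
        return name, typ, float(value)  # floating point
    return name, typ, value             # A, Z, J, H, B: keep as text


def parse_l_line(line: str) -> dict:
    """Parse a GFA1 'L' (link) line into its mandatory columns plus tags."""
    cols = line.rstrip("\n").split("\t")
    if cols[0] != "L" or len(cols) < 6:
        raise ValueError("not a GFA1 'L' line")
    rec = {
        "from": cols[1], "from_orient": cols[2],
        "to": cols[3], "to_orient": cols[4],
        "overlap": cols[5],  # CIGAR string, e.g. "1000M"
        "tags": {},
    }
    for field in cols[6:]:
        name, _typ, value = parse_gfa_tag(field)
        rec["tags"][name] = value
    return rec
```

Usage: `parse_l_line("L\tutg000001l\t+\tutg000002l\t-\t1000M\tSD:i:2198268")["tags"]["SD"]` gives the integer 2198268.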

This file contains information about your version of GFA, but it doesn't explain the x line in any depth.

As an example of the kind of answer I'm after, here is my current understanding of the x line:

x seg_name seg_len golden_path_count ? ? read_1:rstart-rend read_1_ori read_2:rstart-rend read_2_ori
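The guessed layout above can be written down as a small parser, which makes each column's position explicit. This is a sketch only: it assumes the columns are tab-separated like the rest of the GFA output, it uses the field names from the guess above, and `unknown_a`/`unknown_b` are placeholder names for the two columns whose meaning is unclear. The `read:start-end` regions are split into name and integer coordinates.

```python
import re

# Field names follow the guessed layout; unknown_a/unknown_b are placeholders.
X_FIELDS = ["seg_name", "seg_len", "golden_path_count", "unknown_a", "unknown_b",
            "read_1", "read_1_ori", "read_2", "read_2_ori"]

# Greedy name match, so read names that themselves contain ':' still work.
REGION = re.compile(r"^(?P<name>.+):(?P<start>\d+)-(?P<end>\d+)$")


def parse_region(text):
    """Split 'read:start-end' into (read, start, end)."""
    m = REGION.match(text)
    if m is None:
        raise ValueError(f"not a read:start-end region: {text!r}")
    return m["name"], int(m["start"]), int(m["end"])


def parse_x_line(line):
    """Map a full-length 'x' line onto the guessed field names."""
    cols = line.rstrip("\n").split("\t")
    if cols[0] != "x" or len(cols) != 1 + len(X_FIELDS):
        raise ValueError("expected a full-length 'x' line")
    rec = dict(zip(X_FIELDS, cols[1:]))
    for key in ("seg_len", "golden_path_count", "unknown_a", "unknown_b"):
        rec[key] = int(rec[key])
    for key in ("read_1", "read_2"):
        rec[key] = parse_region(rec[key])
    return rec
```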

My lingering questions are:

  1. What do columns 3 and 4 represent?
  2. Why are these two reads chosen to represent the unitig?
@rchikhi

rchikhi commented Sep 7, 2019

According to the manual:
An 'x' line gives a brief summary of each unitig, which can be inferred from 'S' and 'a' lines.
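If the summary can be inferred from 'S' and 'a' lines, a rough reconstruction looks like the sketch below. It assumes tab-separated lines, an `LN:i:` length tag on 'S' lines (falling back to the sequence length), and an 'a' line layout of unitig name, offset, `read:start-end`, strand, length; that 'a' layout is an assumption here, not something stated in this thread. `summarize_unitigs` is a hypothetical helper that reports each unitig's length, read count, and first and last reads by offset.

```python
from collections import defaultdict


def summarize_unitigs(lines):
    """Rebuild a per-unitig summary from 'S' and 'a' lines (hypothetical helper)."""
    seg_len = {}
    reads = defaultdict(list)
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if cols[0] == "S":
            # Prefer the LN:i tag; otherwise use the sequence, unless it is '*'.
            ln = [c for c in cols[3:] if c.startswith("LN:i:")]
            if ln:
                seg_len[cols[1]] = int(ln[0][len("LN:i:"):])
            else:
                seg_len[cols[1]] = None if cols[2] == "*" else len(cols[2])
        elif cols[0] == "a":
            # ASSUMED layout: unitig, offset, read:start-end, strand, length.
            reads[cols[1]].append((int(cols[2]), cols[3], cols[4]))
    summary = {}
    for name, length in seg_len.items():
        layout = sorted(reads.get(name, []))  # order reads by offset
        summary[name] = {
            "len": length,
            "n_reads": len(layout),
            "first": layout[0][1:] if layout else None,  # (read:start-end, strand)
            "last": layout[-1][1:] if layout else None,
        }
    return summary
```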

Regarding your lingering question 1, I'll attempt an answer based on the source code.
The 'x' line may be shorter if the unitig is circular (p->start == UINT32_MAX).
Otherwise, columns 3 and 4 likely give the number of incoming and outgoing edges of the unitig (asg_arc_n() has no comment, but reading that part of the code helps).
Read 1 and read 2 are chosen in this function, where they are called 'start' and 'end'. They appear to be the reads at the start and end extremities of the unitig.
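Putting this reading into code, a consumer could branch on the field count: a short line is treated as a circular unitig, and a full line yields the two suspected edge counts plus the start and end reads. This is a sketch under the interpretation above, not confirmed behaviour. It assumes tab separation and the 10-column layout from the question, `classify_x_line` is a hypothetical name, and which of the two counts is incoming versus outgoing is left open.

```python
def classify_x_line(line):
    """Split an 'x' line into a circular or linear summary (sketch)."""
    cols = line.rstrip("\n").split("\t")
    if cols[0] != "x":
        raise ValueError("not an 'x' line")
    if len(cols) < 10:
        # Shorter line: assumed to be a circular unitig with no start/end reads.
        return {"name": cols[1], "circular": True, "extra": cols[2:]}
    return {
        "name": cols[1],
        "circular": False,
        # The two '?' columns from the question: probably edge counts (unconfirmed).
        "edge_counts": (int(cols[4]), int(cols[5])),
        "start_read": (cols[6], cols[7]),  # (read:start-end, orientation)
        "end_read": (cols[8], cols[9]),
    }
```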
