Rewrite filter-paf-deletions #54

Merged
merged 10 commits into from
Apr 12, 2022


glennhickey
Collaborator

This tool is supposed to do coarse-grained chaining of the minigraph mapping output with respect to the reference contigs in the graph. However, it was written to assume there is only one reference contig in the input, even though it gets run at genome scale in Cactus.

Anyway, this is a rewrite to make the code simpler and more general:

  • mappings are sorted by query position
  • target sequences are mapped to reference intervals
  • runs of query contigs are segmented into blocks whose target reference intervals are contiguous (modulo the given threshold)
  • disjoint blocks are greedily selected, those with the fewest aligned bases first, and dropped
  • the process is repeated until no disjoint blocks are found
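
The steps above can be sketched roughly as follows. This is an illustration only, not the code in this PR: the `Mapping` record, the function names, and the reading of "disjoint" (adjacent blocks on the same query that land on different reference contigs, or more than `threshold` bases apart on the same one) are all assumptions made for the example. It also assumes each mapping already carries its reference interval, whereas the real tool has to derive that from the graph.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mapping:
    """Hypothetical record: one mapping with its resolved reference interval."""
    query: str
    q_start: int
    q_end: int
    ref_contig: str
    ref_start: int
    ref_end: int
    aligned_bases: int


def ref_gap(a, b):
    """Distance between two reference intervals (zero or negative if they touch or overlap)."""
    return max(a.ref_start, b.ref_start) - min(a.ref_end, b.ref_end)


def segment_blocks(mappings, threshold):
    """Sort by query position, then split into reference-contiguous blocks."""
    blocks = []
    for m in sorted(mappings, key=lambda m: (m.query, m.q_start)):
        prev = blocks[-1][-1] if blocks else None
        if (prev is not None
                and prev.query == m.query
                and prev.ref_contig == m.ref_contig
                and ref_gap(prev, m) <= threshold):
            blocks[-1].append(m)   # extends the current block
        else:
            blocks.append([m])     # new query, new contig, or gap too large
    return blocks


def filter_mappings(mappings, threshold):
    """Repeatedly drop the weakest disjoint block per query until none remain."""
    kept = list(mappings)
    while True:
        by_query = {}
        for block in segment_blocks(kept, threshold):
            by_query.setdefault(block[0].query, []).append(block)
        # A query with more than one block has disjoint blocks (assumed meaning).
        dropped = set()
        for blocks in by_query.values():
            if len(blocks) > 1:
                weakest = min(blocks, key=lambda b: sum(m.aligned_bases for m in b))
                dropped.update(weakest)
        if not dropped:
            return kept
        # Removing a block can let its neighbours merge, so re-segment and repeat.
        kept = [m for m in kept if m not in dropped]
```

For example, given three mappings on one query where the middle one lands 50 Mb away on the reference with few aligned bases, a threshold of 10,000,000 drops the middle mapping and the two flanking mappings then form a single block. With a threshold of 100,000,000 all three stay in one block and nothing is dropped.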

The big question is what threshold to use. 10 Mb seems to work reasonably well. In general, increasing the threshold trades off recall for precision.

@glennhickey glennhickey merged commit 81784b5 into master Apr 12, 2022