Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixing annotation merge in overlapping chunks #15

Open
KatharinaHoff opened this issue Apr 5, 2023 · 0 comments
Open

Fixing annotation merge in overlapping chunks #15

KatharinaHoff opened this issue Apr 5, 2023 · 0 comments
Labels
enhancement New feature or request

Comments

@KatharinaHoff
Copy link
Member

Originally raised by @MarioStanke :

ich hatte das damals so wie unten programmiert mit dem Gedanken, dass es in der Naehe der beiden
Sequenzenden, die den Ueberlapp saeumen, haeufiger Fehler geben kann. Gene am Rand ihres Runs
koennen in dem anderen Run laenger vorhergesagt werden. Die groessere "Verlaengerung" wird dabei
gewaehlt.

In Pygustus ist es unsymmetrisch und ich glaube, es wird in der Reihenfolge der Runs alles genommen,
was nicht ueberlappt:
https://github.com/Gaius-Augustus/pygustus/blob/445ab1691d594984e440d9437822633bb3737d8a/pygustus/gff_methods.py#

Das koennte bei den letzten Genen eines Chunks insbesondere bei den Wierbeltieren ein verkuerzt
vorhergesagtes Gen sein.

Die die kritischen Zeile

  if gene.sequence == last_gene.sequence and int(gene.start) < int(last_gene.end):

durch eine symmetrische Bedingung mit Genlaengen und Vollstaendigkeit zu ersetzen und ggf. das
letzte Gen durch das naechste ersetzen.

------------------------------------- join_aug_pred.pl -------------------------
...

 #   print "rightmost end  of gene in run1 : " . $lastend1 . "\n";
 #   print "leftmost begin of gene in run2 : " . $firstbegin2 . "\n";
    if ($lastend1 < $firstbegin2) {
	return; # no overlap between any genes
    }
    # the two runs really have overlapping genes
    # find a good breakposition and then remove all genes 
    # in run1 ending to the right of the breakposition and all genes
    # in run2 beginning at or to the left of breakposition.
    #                                                     lastgene1
    # run1  ------              -------       ------      ---------
    #          -------                                      ---
    # run2                               -----------      ------------------------        ----------
-         -----
    #                                    firstgene2                                                
---------------
    #                                    |-d2-|                   |-     d1     -|
    #                                                    |breakpoint
    my $breakpoint;
    my $d1=-1; 
    my $d2=-1;
    
    foreach my $gene2 (@{$run2->{"genes"}}) {
	if ($gene2->{"begin"} <= $lastend1 && $gene2->{"end"} - $lastend1 > $d1) {
	    $d1 = $gene2->{"end"} - $lastend1;
	}
    }
#    print "d1=$d1\n";
    foreach my $gene1 (@{$run1->{"genes"}}) {
	if ($gene1->{"end"} >= $firstbegin2 && $firstbegin2 - $gene1->{"begin"} > $d2) {
	    $d2 = $firstbegin2 - $gene1->{"begin"};
	}
    }
#    print "d2=$d2\n";
    if ($d1 >= $d2) {
	# set the breakpoint directly left to the leftmost begin of any gene in run2 that
	# ends at or after lastend1.
	$breakpoint = $lastend1;
	foreach my $gene2 (@{$run2->{"genes"}}) {
	    if ($gene2->{"end"} >= $lastend1 && $gene2->{"begin"}-1 < $breakpoint) {
		$breakpoint = $gene2->{"begin"}-1;
	    }
	}
    } else {
	# set the breakpoint to the rightmost end of any gene in run1 that
	# begins at or before firstbegin2.
	$breakpoint = $firstbegin2;
	foreach my $gene1 (@{$run1->{"genes"}}) {
	    if ($gene1->{"begin"} <= $firstbegin2 && $gene1->{"end"} > $breakpoint) {
		$breakpoint = $gene1->{"end"};
	    }
	}
    }
#    print "breakpoint = $breakpoint\n";
    #now delete the redundant genes from both lists
    my @newgenes1 = ();
    my @newgenes2 = ();
    foreach my $gene1 (@{$run1->{"genes"}}) {
	if ($gene1->{"end"} <= $breakpoint) {
	    push @newgenes1, $gene1;
	}
    }
    $run1->{"genes"} = \@newgenes1;
    foreach my $gene2 (@{$run2->{"genes"}}) {
	if ($gene2->{"begin"} > $breakpoint) {
	    push @newgenes2, $gene2;
	}
    }
    $run2->{"genes"} = \@newgenes2;
@KatharinaHoff KatharinaHoff added the enhancement New feature or request label Apr 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

1 participant