Fixing annotation merge in overlapping chunks #15

KatharinaHoff · 2023-04-05T11:58:59Z

Originally raised by @MarioStanke :

ich hatte das damals so wie unten programmiert mit dem Gedanken, dass es in der Naehe der beiden
Sequenzenden, die den Ueberlapp saeumen, haeufiger Fehler geben kann. Gene am Rand ihres Runs
koennen in dem anderen Run laenger vorhergesagt werden. Die groessere "Verlaengerung" wird dabei
gewaehlt.

In Pygustus ist es unsymmetrisch und ich glaube, es wird in der Reihenfolge der Runs alles genommen,
was nicht ueberlappt:
https://github.com/Gaius-Augustus/pygustus/blob/445ab1691d594984e440d9437822633bb3737d8a/pygustus/gff_methods.py#

Das koennte bei den letzten Genen eines Chunks insbesondere bei den Wierbeltieren ein verkuerzt
vorhergesagtes Gen sein.

Die die kritischen Zeile

  if gene.sequence == last_gene.sequence and int(gene.start) < int(last_gene.end):

durch eine symmetrische Bedingung mit Genlaengen und Vollstaendigkeit zu ersetzen und ggf. das
letzte Gen durch das naechste ersetzen.

------------------------------------- join_aug_pred.pl -------------------------
...

 #   print "rightmost end  of gene in run1 : " . $lastend1 . "\n";
 #   print "leftmost begin of gene in run2 : " . $firstbegin2 . "\n";
    if ($lastend1 < $firstbegin2) {
	return; # no overlap between any genes
    }
    # the two runs really have overlapping genes
    # find a good breakposition and then remove all genes 
    # in run1 ending to the right of the breakposition and all genes
    # in run2 beginning at or to the left of breakposition.
    #                                                     lastgene1
    # run1  ------              -------       ------      ---------
    #          -------                                      ---
    # run2                               -----------      ------------------------        ----------
-         -----
    #                                    firstgene2                                                
---------------
    #                                    |-d2-|                   |-     d1     -|
    #                                                    |breakpoint
    my $breakpoint;
    my $d1=-1; 
    my $d2=-1;
    
    foreach my $gene2 (@{$run2->{"genes"}}) {
	if ($gene2->{"begin"} <= $lastend1 && $gene2->{"end"} - $lastend1 > $d1) {
	    $d1 = $gene2->{"end"} - $lastend1;
	}
    }
#    print "d1=$d1\n";
    foreach my $gene1 (@{$run1->{"genes"}}) {
	if ($gene1->{"end"} >= $firstbegin2 && $firstbegin2 - $gene1->{"begin"} > $d2) {
	    $d2 = $firstbegin2 - $gene1->{"begin"};
	}
    }
#    print "d2=$d2\n";
    if ($d1 >= $d2) {
	# set the breakpoint directly left to the leftmost begin of any gene in run2 that
	# ends at or after lastend1.
	$breakpoint = $lastend1;
	foreach my $gene2 (@{$run2->{"genes"}}) {
	    if ($gene2->{"end"} >= $lastend1 && $gene2->{"begin"}-1 < $breakpoint) {
		$breakpoint = $gene2->{"begin"}-1;
	    }
	}
    } else {
	# set the breakpoint to the rightmost end of any gene in run1 that
	# begins at or before firstbegin2.
	$breakpoint = $firstbegin2;
	foreach my $gene1 (@{$run1->{"genes"}}) {
	    if ($gene1->{"begin"} <= $firstbegin2 && $gene1->{"end"} > $breakpoint) {
		$breakpoint = $gene1->{"end"};
	    }
	}
    }
#    print "breakpoint = $breakpoint\n";
    #now delete the redundant genes from both lists
    my @newgenes1 = ();
    my @newgenes2 = ();
    foreach my $gene1 (@{$run1->{"genes"}}) {
	if ($gene1->{"end"} <= $breakpoint) {
	    push @newgenes1, $gene1;
	}
    }
    $run1->{"genes"} = \@newgenes1;
    foreach my $gene2 (@{$run2->{"genes"}}) {
	if ($gene2->{"begin"} > $breakpoint) {
	    push @newgenes2, $gene2;
	}
    }
    $run2->{"genes"} = \@newgenes2;

The text was updated successfully, but these errors were encountered:

KatharinaHoff added the enhancement New feature or request label Apr 5, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fixing annotation merge in overlapping chunks #15

Fixing annotation merge in overlapping chunks #15

KatharinaHoff commented Apr 5, 2023

Fixing annotation merge in overlapping chunks #15

Fixing annotation merge in overlapping chunks #15

Comments

KatharinaHoff commented Apr 5, 2023