Port of RevertSam tool into Spark #5395
Merged
+1,635
−95
16 commits
6f98aa4 Added RevertSamSpark, a replacement for the RevertSam tool that allow… (jamesemery)
1fd3898 responding to the first round of comments (jamesemery)
e732739 responding to another round of comments with nothing more than rowdy … (jamesemery)
2b696b7 fixing a warning (jamesemery)
2e48600 responding to final round of comments (jamesemery)
a8d8338 fixing the compiler warnings (jamesemery)
bd0c6f5 cleaning up a mistaken change (jamesemery)
2b78ecd silly !, save it for snake (jamesemery)
f55d6b3 solving the worlds problems
de998df getting rid of that grossness (jamesemery)
0c00b23 readding the files i deleted by mistake
7238f2f moving files because directories are hard
01b2bc8 responded to yet another round of comments, when, pray tell, will thi… (jamesemery)
3c64144 Merge branch 'je_portRevertSam' of github.com:broadinstitute/gatk int…
f5b7062 fixed a spurious override (jamesemery)
2730a47 No, that really did want to check for emptieness (jamesemery)
721 changes: 721 additions & 0 deletions
src/main/java/org/broadinstitute/hellbender/tools/spark/RevertSamSpark.java
Large diffs are not rendered by default.
@@ -2,8 +2,10 @@

import htsjdk.tribble.AsciiFeatureCodec;
import htsjdk.tribble.readers.LineIterator;
import org.broadinstitute.hellbender.exceptions.GATKException;
import org.broadinstitute.hellbender.exceptions.UserException;
import org.broadinstitute.hellbender.utils.SimpleInterval;
import org.broadinstitute.hellbender.utils.Utils;

import java.util.ArrayList;
import java.util.Arrays;
@@ -17,6 +19,8 @@
 * <ul>
 * <li>Header: must begin with line HEADER or track (for IGV), followed by any number of column names,
 * separated by whitespace.</li>
 * <li>Header: Custom header delimiters can be provided, with a null header line being interpreted as having a non-delimeted
 * header which consists of one line.</li>
 * <li>Comment lines starting with # are ignored</li>
 * <li>Each non-header and non-comment line is split into parts by whitespace,
 * and these parts are assigned as a map to their corresponding column name in the header.
@@ -28,30 +32,62 @@
 *
 * </p>
 *
 * <h2>File format example</h2>
 * <h2>File format example 1</h2>
 * <pre>
 * HEADER a b c
 * 1:1 1 2 3
 * 1:2 4 5 6
 * 1:3 7 8 9
 * </pre>
 *
 * <h2>File format example 2</h2>
 * <pre>
 * a b c
 * 1:1 1 2 3
 * 1:2 4 5 6
 * 1:3 7 8 9
 * </pre>
 */
public final class TableCodec extends AsciiFeatureCodec<TableFeature> {
    protected static final String HEADER_DELIMITER = "HEADER";
    protected static final String DEFAULT_HEADER_DELIMITER = "HEADER";
    protected static final String IGV_HEADER_DELIMITER = "track";
    protected static final String COMMENT_DELIMITER = "#";
Review comment: Shouldn't this also change?
    private final String headerDelimiter;

    protected String delimiter_regex = "\\s+";

    protected List<String> header = new ArrayList<>();

    public TableCodec() {
    private boolean havePassedHeader = false;

    /**
     * Create a TableCodec with a configured header line delimiter
     *
     * @param headerLineDelimiter the delimeter for comment header lines, or null if the header is a single commented line-
     */
    public TableCodec(final String headerLineDelimiter) {
        super(TableFeature.class);
        if ( "".equals(headerLineDelimiter) ) {
Review comment: Isn't this just …
            // Note, it is valid for headerLineDelimiter to be null, just not empty as the regex breaks in that case.
            throw new GATKException("HeaderLineDelimiter must either be a valid delimiter or null");
        }
        headerDelimiter = headerLineDelimiter;
    }

    /**
     * Create a TableCodec for IGV track data.
     */
    public TableCodec() {
        this(DEFAULT_HEADER_DELIMITER);
    }

    @Override
    public TableFeature decode(final String line) {
        if (line.startsWith(HEADER_DELIMITER) || line.startsWith(COMMENT_DELIMITER) || line.startsWith(IGV_HEADER_DELIMITER)) {
        if ((headerDelimiter != null && line.startsWith(headerDelimiter)) ||
                (headerDelimiter == null && !havePassedHeader) ||
                line.startsWith(COMMENT_DELIMITER) || line.startsWith(IGV_HEADER_DELIMITER)) {
            havePassedHeader = true;
            return null;
        }
        final String[] split = line.split(delimiter_regex);
@@ -66,11 +102,11 @@ public List<String> readActualHeader(final LineIterator reader) {
        boolean isFirst = true;
        while (reader.hasNext()) {
            final String line = reader.peek(); // Peek to avoid reading non-header data
            if ( isFirst && ! line.startsWith(HEADER_DELIMITER) && ! line.startsWith(COMMENT_DELIMITER)) {
            if ( isFirst && ! line.startsWith(COMMENT_DELIMITER) && headerDelimiter != null && ! line.startsWith(headerDelimiter) ) {
                throw new UserException.MalformedFile("TableCodec file does not have a header");
            }
            isFirst &= line.startsWith(COMMENT_DELIMITER);
            if (line.startsWith(HEADER_DELIMITER)) {
            if (headerDelimiter == null || line.startsWith(headerDelimiter)) {
                reader.next(); // "Commit" the peek
                if (!header.isEmpty()) {
                    throw new UserException.MalformedFile("Input table file seems to have two header lines. The second is = " + line);
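To make the new skip condition in `decode` easier to follow, here is a minimal standalone sketch of it. `HeaderSkipSketch` is a hypothetical class written for illustration; it is not the GATK `TableCodec`, it does not extend `AsciiFeatureCodec`, and it returns a plain `String[]` where the real code builds a `TableFeature`. Only the three-part condition and the `havePassedHeader` flag are taken from the diff.

```java
import java.util.Arrays;

/**
 * Hypothetical, simplified stand-in for the header handling in the diff above.
 * It is not the GATK TableCodec and returns String[] instead of a TableFeature.
 */
final class HeaderSkipSketch {
    private static final String COMMENT_DELIMITER = "#";
    private static final String IGV_HEADER_DELIMITER = "track";
    private static final String DELIMITER_REGEX = "\\s+";

    // null means the header is a single undelimited line (the first line seen)
    private final String headerDelimiter;
    private boolean havePassedHeader = false;

    HeaderSkipSketch(final String headerDelimiter) {
        if ("".equals(headerDelimiter)) {
            throw new IllegalArgumentException("headerDelimiter must be a non-empty string or null");
        }
        this.headerDelimiter = headerDelimiter;
    }

    /** Returns the split fields of a data line, or null when the line is skipped. */
    String[] decode(final String line) {
        final boolean delimitedHeader = headerDelimiter != null && line.startsWith(headerDelimiter);
        final boolean undelimitedHeader = headerDelimiter == null && !havePassedHeader;
        final boolean commentOrTrack =
                line.startsWith(COMMENT_DELIMITER) || line.startsWith(IGV_HEADER_DELIMITER);
        if (delimitedHeader || undelimitedHeader || commentOrTrack) {
            havePassedHeader = true; // any skipped line flips the flag, as in the diff
            return null;
        }
        return line.split(DELIMITER_REGEX);
    }

    public static void main(final String[] args) {
        final HeaderSkipSketch delimited = new HeaderSkipSketch("HEADER");
        for (final String line : new String[] {"HEADER a b c", "1:1 1 2 3"}) {
            System.out.println(line + " -> " + Arrays.toString(delimited.decode(line)));
        }
        final HeaderSkipSketch undelimited = new HeaderSkipSketch(null);
        for (final String line : new String[] {"a b c", "1:1 1 2 3"}) {
            System.out.println(line + " -> " + Arrays.toString(undelimited.decode(line)));
        }
    }
}
```

In this sketch, a comment line that comes first in null-delimiter mode also sets `havePassedHeader`, so the column-name line after it is returned as data. Whether the real codec can reach that state depends on how `readActualHeader` and `decode` are called together, which this sketch does not model.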
5 changes: 1 addition & 4 deletions
src/main/java/org/broadinstitute/hellbender/utils/read/GATKRead.java
I think you exposed some potential error states in this class when you allowed arbitrary header and comment lines. An alternative to making these changes would be to use some other table reader like TableReader.
I really wanted to avoid the table reader like the plague. It is absurdly heavy-duty in its scope for handling short TSVs like this. It seems wholly unnecessary to build a feature out of a simple TSV file when the same thing could be accomplished just as easily with a scanner.
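For comparison, the scanner-based alternative mentioned in this reply might look roughly like the sketch below. It is an assumption about what "a scanner" would mean here, not code from this PR: `ScannerTableSketch` and its `read` method are hypothetical names, and the rules it applies (first non-comment line is the header, `#` starts a comment, fields are split on whitespace) are borrowed from the TableCodec Javadoc rather than from any real input file used by RevertSamSpark.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Scanner;

/** Hypothetical sketch: read a small whitespace-separated table with java.util.Scanner. */
final class ScannerTableSketch {

    /** Returns one map per data row, keyed by the column names from the first non-comment line. */
    static List<Map<String, String>> read(final Readable source) {
        final List<Map<String, String>> rows = new ArrayList<>();
        try (Scanner scanner = new Scanner(source)) {
            String[] header = null;
            while (scanner.hasNextLine()) {
                final String line = scanner.nextLine().trim();
                if (line.isEmpty() || line.startsWith("#")) {
                    continue; // skip blank lines and comments
                }
                final String[] fields = line.split("\\s+");
                if (header == null) {
                    header = fields; // first remaining line supplies the column names
                    continue;
                }
                if (fields.length != header.length) {
                    throw new IllegalArgumentException("Row does not match header: " + line);
                }
                final Map<String, String> row = new LinkedHashMap<>();
                for (int i = 0; i < header.length; i++) {
                    row.put(header[i], fields[i]);
                }
                rows.add(row);
            }
        }
        return rows;
    }

    public static void main(final String[] args) {
        final String table = "# example input\na b c\n1 2 3\n4 5 6\n";
        System.out.println(read(new StringReader(table)));
    }
}
```

The trade-off raised in the thread still applies: a reader like this is short, but it leaves validation and error reporting to the caller, which is part of what a dedicated table reader provides.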