-
Notifications
You must be signed in to change notification settings - Fork 594
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding command line exclusion lists, so that users can prune fields from output. #5226
Conversation
…. Added some developer utilities for ease of testing and creating new Funcotations.
Codecov Report
@@ Coverage Diff @@
## master #5226 +/- ##
===============================================
+ Coverage 86.798% 86.804% +0.007%
- Complexity 29711 29763 +52
===============================================
Files 1820 1820
Lines 137406 137644 +238
Branches 15160 15177 +17
===============================================
+ Hits 119265 119481 +216
- Misses 12637 12651 +14
- Partials 5504 5512 +8
|
/** | ||
* List of final rendered fields to exclude from a final rendering | ||
*/ | ||
public static final String EXCLUSION_FIELDS_LONG_NAME = "exclusion-list"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure exclusion-list
is a clear name. Maybe something like "--skip-field" would be clearer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't really bother me, but I think exclude-field
is a little more self-explanatory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like --exclude-field
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some minor requests and questions.
/** | ||
* List of final rendered fields to exclude from a final rendering | ||
*/ | ||
public static final String EXCLUSION_FIELDS_LONG_NAME = "exclusion-list"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't really bother me, but I think exclude-field
is a little more self-explanatory.
* @return string with the VCF representation of a {@link VcfOutputRenderer#FUNCOTATOR_VCF_FIELD_NAME} for the given | ||
* Funcotation. Never {@code null}, but empty string is possible. | ||
*/ | ||
public static String renderSanitizedFuncotationForVcf(final Funcotation funcotation, final List<String> includedFields) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should there be another method here to for MAF? (i.e. renderSanitizedFuncotationForMaf
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this is an excellent question. I originally had renderSanitizedFuncotationForMaf
, but realized that I did not need it.
For VCFs, we render each funcotation separately, then concatenate strings. This approach has the drawback of being less flexible in terms of ordering the fields (and setting up aliases), but that has not mattered yet. Regardless, it means that all string operations (e.g. excluding fields and sanitizing values) must be in the same method (in this case renderSanitizedFuncotationForVcf
) and that method must work on a funcotation.
For MAFs, we flatten out the funcotations and put the fields into a giant map. Then we do the changes to field names and values on that map. But by the time I want to exclude fields and sanitize, the map is already made, so we do not render individual funcotations. Therefore no need for a renderSantiziedFuncotationForMaf
.
If you would like to refactor to make the conventions similar, we can file an issue.
No action.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. If anything, it's more flexible to make the VCF renderer behave more like the MAF renderer.
I'd need to review it to see how feasible that is.
I've added issue #5240 to track this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Glad you filed the issue. I would prefer the VCF renderer behave like the MAF renderer. Then we could probably share a lot more of the code.
@@ -254,7 +259,7 @@ public MafOutputRenderer(final Path outputFilePath, | |||
defaultMap.put(MafOutputRendererConstants.FieldName_BAM_File, MafOutputRendererConstants.UNUSED_STRING); | |||
|
|||
// Cache the manual annotation string so we can pass it easily into any Funcotations: | |||
manualAnnotationSerializedString = (manualAnnotations.size() != 0 ? MafOutputRendererConstants.FIELD_DELIMITER + String.join( MafOutputRendererConstants.FIELD_DELIMITER, manualAnnotations.values() ) + MafOutputRendererConstants.FIELD_DELIMITER : ""); | |||
//manualAnnotationSerializedString = (manualAnnotations.size() != 0 ? MafOutputRendererConstants.FIELD_DELIMITER + String.join( MafOutputRendererConstants.FIELD_DELIMITER, manualAnnotations.values() ) + MafOutputRendererConstants.FIELD_DELIMITER : ""); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove commented out code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -327,12 +334,14 @@ public void write(final VariantContext variant, final FuncotationMap txToFuncota | |||
writeString(entry.getValue()); | |||
writeString(MafOutputRendererConstants.FIELD_DELIMITER); | |||
} | |||
writeLine(manualAnnotationSerializedString); | |||
// writeLine(manualAnnotationSerializedString); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove commented out code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -101,4 +101,11 @@ | |||
doc = "(Advanced / DO NOT USE*) If you select this flag, Funcotator will force a conversion of variant contig names from b37 to hg19. *This option is useful in integration tests (written by devs) only." | |||
) | |||
public boolean forceB37ToHg19ContigNameConversion = false; | |||
|
|||
@Argument( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add to this explanation that the fields will be excluded by exact name matching only.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
public Object[][] provideForRenderSanitizedFuncotationForVcf() { | ||
|
||
return new Object[][]{ | ||
{OutputRenderer.createFuncotationFromLinkedHashMap( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you label the test cases with comments?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea. Done
* Create a dummy gencode funcotation in the most manual way possible. This method does not check for consistency | ||
* or reasonable values. | ||
* | ||
* @param hugoSymbol |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's kind of a pain, but can you fill in some descriptions for these fields?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Back to you, @LeeTL1220 |
@jonn-smith Back to you |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. Feel free to merge once tests complete / pass.
* @return string with the VCF representation of a {@link VcfOutputRenderer#FUNCOTATOR_VCF_FIELD_NAME} for the given | ||
* Funcotation. Never {@code null}, but empty string is possible. | ||
*/ | ||
public static String renderSanitizedFuncotationForVcf(final Funcotation funcotation, final List<String> includedFields) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. If anything, it's more flexible to make the VCF renderer behave more like the MAF renderer.
I'd need to review it to see how feasible that is.
I've added issue #5240 to track this.
--exclusion-list
parameter, which can be specified multiple times. This will remove some rendered fields from the output. The field should appear exactly as written in the output MAF or VCF (e.g. ClinVar_ALLELEID). Closes Funcotator - Need to have a funcotation output list #4359.