-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature/31243/leeds crc refactor #76
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've requested a small performance-related change. Otherwise, this looks OK to me, though I don't really understand enough about the logic to easily follow the refactoring.
@genes_hash = YAML.safe_load(File.open(Rails.root.join(GENES_FILEPATH))) | ||
@status_hash = YAML.safe_load(File.open(Rails.root.join(STATUS_FILEPATH))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think these would be better populated in the #initialize
method (i.e. once per batch), instead of reloading YAML file for every record. Maybe .freeze
the result, if you want to be sure the lookup isn't mutated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @bshand , done in latest commit
genotype = record.raw_fields['genotype'].downcase.strip | ||
mtype = record.raw_fields['moleculartestingtype'].downcase.strip | ||
if genotype.match(/pms2\s-\smlpa\spred\snegative\sc5/i) || | ||
genotype.match(/pms2\s-\sconf\smlpa\spositive\sc5/i) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't see this in the rules? Can you send me the updated ones please?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have mailed you the rules , and following -
<style> </style>Raw:genotype | Test Scope |
---|---|
PMS2 - MLPA pred negative C5 | Targeted |
PMS2 - Conf MLPA positive C5 | Targeted |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! This makes sense now
raw_report = report | ||
exclude_statements = [ | ||
'Screening for mutations in MLH1, MSH2 and MSH6 is now in progress as requested.', | ||
'MLPA and MSH2 analysis was not requested.', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be good to add these to the constants file instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in latest commit .
@@ -336,7 +340,7 @@ def add_variant_impact(impact) | |||
end | |||
|
|||
def add_variant_class(variant) | |||
if variant.is_a?(Integer) && variant >= 1 && variant <= 5 | |||
if variant.is_a?(Integer) && variant >= 1 && variant <= 7 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As this is an update to the assigning variant class in the genotype.rb file, have you checked that it won't affect any of the other importers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its extending the range that's allowed to be in variantpathclass and adheres to Zvariantpathclass. If any other importer is able to extract c4/5 pathogenicity values and map upto 7 in future that should be fine.
end | ||
end | ||
|
||
def allocate_genes(_record) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
record is not being used by the function- can this be removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in latest commit
Can you also add more comments please? |
Added now in latest commit at places and left for methods where name seems self explanatory, thanks @lauramccluskey1 ! |
populate_and_persist_genotype(record) | ||
end | ||
|
||
# populate class variables that can be used over different methods |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think these are instance variables, may be worth changing the comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @ollietulloch , done in latest commit .
end | ||
|
||
# checks if file has colorectal tests and should be processed under this importer | ||
def should_process |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this only returns true of false, could this be a predicate method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @ollietulloch , done in latest commit .
def process_variants_from_record(genocolorectal, record) | ||
genotypes = [] | ||
allocate_genes | ||
find_test_status(record) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A small thing but naming methods consistently is really helpful regarding readability. allocate_genes
and find_test_status
do similar things, i.e. setting an instance variable, so perheps calling it allocate_test_status
might be more consistent?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @ollietulloch , done in latest commit .
@teststatus | ||
end | ||
|
||
def exceptinal_teststatus |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be exceptional_teststatus
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @ollietulloch , done in latest commit .
@teststatus | ||
end | ||
|
||
def exceptinal_teststatus |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you update to "exceptional" please?
else | ||
@logger.debug 'Cannot determine test status for : '\ | ||
"#{record.raw_fields['report']} a priori" | ||
def process_normal_targ(genocolorectal, _record, genotypes) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
record is not being used by this function- can it be removed?
genotypes << genocolorectal | ||
end | ||
|
||
def process_abnormal_targ(genocolorectal, _record, genotypes) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
record is not being used by this function- can it be removed?
end | ||
end | ||
|
||
def process_normal_fs(genocolorectal, _record, genotypes) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
record is not being used by this function- can it be removed?
|
||
EXONIC_REPORT_REGEX = /(?<report>(#{GENES})\sexon(s)?[\w\s\-.>():=,&]+)/ix | ||
|
||
PATHOGENIC_REPORT_REGEX = /(?<report>pathogenic\s(#{GENES})[\w\s\-.>():=,&]+)/ix |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you have "." as one of the options in the square brackets, do you need the rest of the options if "." includes all of the other characters you have listed?
ex(on)?s?\s?(?<exons>[0-9]+(-[0-9]+)?)\s? | ||
(?<variant>del|dup|ins)| | ||
ex(on)?s?\s?(?<exons>[0-9]+\s?(\s?-\s?[0-9]+)?)\s? | ||
(?<variant>del|dup|ins)?| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are lines 60-61 similar to 58-59 but with a few extra optional values- could this be simplified?
cdna_vars = get_cdna_mutations(report_variants) | ||
if cdna_vars.size > 1 | ||
proteins = report_variants.scan(PROTEIN_REGEX).flatten.uniq | ||
cdna_vars.zip(proteins).each do |cdna_mutation, protein| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this using an assumption that the the order of the cdna will match the order of the protein in the sentence? What happens there isn't a protein listed for every cdna? Would that mean they get matched incorrectly?
genotypes << genocolorectal_dup | ||
end | ||
|
||
def extract_targeted_genes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check if this method is needed- as discussed on call
What?
I have refactored Leeds CRC importer with latest rules provided by Fiona , as per Planio ticket 35286
Why?
Whole Leeds CRC had to be re-written again as older code wasn't getting the right denominators and counts.
How?
Have re-written Leeds CRC importer as per rules sheet provided by Fiona.
Testing?
QA signed off by Fiona after sharing the counts. Also added handler tests for each kind of case.
Anything Else?
It still has rubocop errors for existing code in parent classes like genocolorectal.rb, which can't be edited in this scope of work as being used across all importers.