Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/31243/leeds crc refactor #76

Merged
merged 11 commits into from
Jun 6, 2024

Conversation

shilpigoeldev
Copy link
Contributor

@shilpigoeldev shilpigoeldev commented May 15, 2024

What?

I have refactored Leeds CRC importer with latest rules provided by Fiona , as per Planio ticket 35286

Why?

Whole Leeds CRC had to be re-written again as older code wasn't getting the right denominators and counts.

How?

Have re-written Leeds CRC importer as per rules sheet provided by Fiona.

Testing?

QA signed off by Fiona after sharing the counts. Also added handler tests for each kind of case.

Anything Else?

It still has rubocop errors for existing code in parent classes like genocolorectal.rb, which can't be edited in this scope of work as being used across all importers.

@shilpigoeldev shilpigoeldev self-assigned this May 15, 2024
Copy link
Contributor

@bshand bshand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've requested a small performance-related change. Otherwise, this looks OK to me, though I don't really understand enough about the logic to easily follow the refactoring.

Comment on lines 25 to 26
@genes_hash = YAML.safe_load(File.open(Rails.root.join(GENES_FILEPATH)))
@status_hash = YAML.safe_load(File.open(Rails.root.join(STATUS_FILEPATH)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these would be better populated in the #initialize method (i.e. once per batch), instead of reloading YAML file for every record. Maybe .freeze the result, if you want to be sure the lookup isn't mutated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @bshand , done in latest commit

genotype = record.raw_fields['genotype'].downcase.strip
mtype = record.raw_fields['moleculartestingtype'].downcase.strip
if genotype.match(/pms2\s-\smlpa\spred\snegative\sc5/i) ||
genotype.match(/pms2\s-\sconf\smlpa\spositive\sc5/i)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't see this in the rules? Can you send me the updated ones please?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have mailed you the rules , and following -

<style> </style>
Raw:genotype Test Scope
PMS2 - MLPA pred negative C5 Targeted
PMS2 - Conf MLPA positive C5 Targeted

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! This makes sense now

raw_report = report
exclude_statements = [
'Screening for mutations in MLH1, MSH2 and MSH6 is now in progress as requested.',
'MLPA and MSH2 analysis was not requested.',
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be good to add these to the constants file instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in latest commit .

@@ -336,7 +340,7 @@ def add_variant_impact(impact)
end

def add_variant_class(variant)
if variant.is_a?(Integer) && variant >= 1 && variant <= 5
if variant.is_a?(Integer) && variant >= 1 && variant <= 7
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As this is an update to the assigning variant class in the genotype.rb file, have you checked that it won't affect any of the other importers?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its extending the range that's allowed to be in variantpathclass and adheres to Zvariantpathclass. If any other importer is able to extract c4/5 pathogenicity values and map upto 7 in future that should be fine.

end
end

def allocate_genes(_record)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

record is not being used by the function- can this be removed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in latest commit

@lauramccluskey1
Copy link
Collaborator

Can you also add more comments please?

@shilpigoeldev
Copy link
Contributor Author

Can you also add more comments please?

Added now in latest commit at places and left for methods where name seems self explanatory, thanks @lauramccluskey1 !

populate_and_persist_genotype(record)
end

# populate class variables that can be used over different methods
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these are instance variables, may be worth changing the comment

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ollietulloch , done in latest commit .

end

# checks if file has colorectal tests and should be processed under this importer
def should_process
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this only returns true of false, could this be a predicate method?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ollietulloch , done in latest commit .

def process_variants_from_record(genocolorectal, record)
genotypes = []
allocate_genes
find_test_status(record)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A small thing but naming methods consistently is really helpful regarding readability. allocate_genes and find_test_status do similar things, i.e. setting an instance variable, so perheps calling it allocate_test_status might be more consistent?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ollietulloch , done in latest commit .

@teststatus
end

def exceptinal_teststatus
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be exceptional_teststatus?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ollietulloch , done in latest commit .

@shilpigoeldev shilpigoeldev merged commit 8c03df2 into develop Jun 6, 2024
9 checks passed
@shilpigoeldev shilpigoeldev deleted the feature/31243/Leeds_CRC_refactor branch June 6, 2024 08:30
@teststatus
end

def exceptinal_teststatus
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you update to "exceptional" please?

else
@logger.debug 'Cannot determine test status for : '\
"#{record.raw_fields['report']} a priori"
def process_normal_targ(genocolorectal, _record, genotypes)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

record is not being used by this function- can it be removed?

genotypes << genocolorectal
end

def process_abnormal_targ(genocolorectal, _record, genotypes)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

record is not being used by this function- can it be removed?

end
end

def process_normal_fs(genocolorectal, _record, genotypes)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

record is not being used by this function- can it be removed?


EXONIC_REPORT_REGEX = /(?<report>(#{GENES})\sexon(s)?[\w\s\-.>():=,&]+)/ix

PATHOGENIC_REPORT_REGEX = /(?<report>pathogenic\s(#{GENES})[\w\s\-.>():=,&]+)/ix
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you have "." as one of the options in the square brackets, do you need the rest of the options if "." includes all of the other characters you have listed?

ex(on)?s?\s?(?<exons>[0-9]+(-[0-9]+)?)\s?
(?<variant>del|dup|ins)|
ex(on)?s?\s?(?<exons>[0-9]+\s?(\s?-\s?[0-9]+)?)\s?
(?<variant>del|dup|ins)?|
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are lines 60-61 similar to 58-59 but with a few extra optional values- could this be simplified?

cdna_vars = get_cdna_mutations(report_variants)
if cdna_vars.size > 1
proteins = report_variants.scan(PROTEIN_REGEX).flatten.uniq
cdna_vars.zip(proteins).each do |cdna_mutation, protein|
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this using an assumption that the the order of the cdna will match the order of the protein in the sentence? What happens there isn't a protein listed for every cdna? Would that mean they get matched incorrectly?

genotypes << genocolorectal_dup
end

def extract_targeted_genes
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check if this method is needed- as discussed on call

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants