-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Pylint alerts corrections as part of an intervention experiment #207
Comments
Thanks for being interested in improving our code. We aren't in priniciple against this experiment, but there is a pretty big issue. Our tests are not in good shape now and are likely broken. As such I'm not sure if you will be able to do this. I'm happy to do some mentoring to help you get the tests in shape, though. Let me know if you are still interested and we can see how we will go from there. |
Thank you for the suggestion @blakesweeney The plan includes 86 intervention in 76 files, which is indeed quite large. There are 13 sturcture alerts ( 'too-many-return-statements', 'too-many-branches', 'too-many-statements', 'too-many-nested-blocks'). What do you think about going by risk level? |
I think trying the small changes first is a fine idea. I will say we generally format with black which has different line length cutoffs the pylint (I think). But feel free to open some issues with the small cases and we can review them. Thanks! |
As we discussed, I created first a PR with the minor simple alerts: I'd like to consult you regarding a few alerts, not in the PR. On the other hand, in the file rnacentral_pipeline\databases\generic\v1.py the function gene_info is just pass. As for broad-exception-caught, when one catches Exception it might hide unexpected exceptions (e.g., due to future changes). rnacentral_pipeline\databases\europepmc\stream.py rnacentral_pipeline\databases\ensembl\vertebrates\parser.py bin\litscan-retracted-articles.py rnacentral_pipeline\databases\crw\helpers.py rnacentral_pipeline\databases\rfam\parser.py |
Hi,
I will try to review it this week, but please ping me if I don't get back to
you by end of the day Friday. Below are some comments on your questions.
On Mon, Nov 18, 2024 at 11:42:11AM +0000, evidencebp wrote:
I'd like to consult you regarding a few alerts, not in the PR.
Many files (e.g., rnacentral_pipeline\cli\ribocentre.py, rnacentral_pipeline\cli\search_export.py) have unnecessary-pass alerts.
They use the click.group decorator so it is obviously intended.
Out of curiosity, what is done there?
cli/ribocentre.py is the definition of the CLI commands for parsing ribocentre
data. It is imported in cli/__init__.py. Each group of CLI commands live in
their own module and are imported into the cli/__init__.py. The search_export
file is similar, but for commands dealing with producing our search export. I
would guess that is some sort of false positive for the warning.
On the other hand, in the file rnacentral_pipeline\databases\generic\v1.py the function gene_info is just pass.
I never see it called and there are only references to a variable in this name in the project.
Can it be removed?
This is likely code that I meant to implement, eventually. I should probably
add a TODO note or something for it.
As for broad-exception-caught, when one catches Exception it might hide unexpected exceptions (e.g., due to future changes).
Therefore, it is recommended to catch more specific exceptions.
I list below a few cases in which I could not figure out which exception might be raised.
@blakesweeney , can you consult me?
rnacentral_pipeline\databases\europepmc\stream.py
The function fallback calls fetch.lookup (in line 41).
This could be a few exceptions I think:
- UnknownReference (https://github.com/RNAcentral/rnacentral-import-pipeline/blob/f3dce69fc19e7dfab0bee092c0a05ee772793339/rnacentral_pipeline/databases/europepmc/fetch.py#L68)
- TooManyPublications (https://github.com/RNAcentral/rnacentral-import-pipeline/blob/f3dce69fc19e7dfab0bee092c0a05ee772793339/rnacentral_pipeline/databases/europepmc/fetch.py#L89)
And likely any of the looku pones
rnacentral_pipeline\databases\ensembl\vertebrates\parser.py
The function as_entry calls embl.sequence in the try section (line 54).
The exception here would be some biopython issue, most likely.
bin\litscan-retracted-articles.py
The main function catches in line 80(Exception, psycopg2.DatabaseError)
I think ValueError would be the most likely
rnacentral_pipeline\databases\crw\helpers.py
The function as_entry constructs an Entry object (line 79).
rnacentral_pipeline\databases\rfam\parser.py
Only helpers.sequence is called there. It seems that the possible exception is AttributeError in sequence_id.
Is it so?
Can Exception be replaced with it?
Generally these broad exceptions are for two reasons. I'm lazy and don't
specify exactly what could go wrong in python and because I want the behavior
of the section of code to be 'keep going if anything goes wrong'. In the second
case catching Exception is easier.
Hope that helps.
Kind Regards,
Blake Sweeney
…---
Blake Sweeney, PhD
RNA Resources Project Leader
European Bioinformatics Institute (EMBL-EBI)
Phone: +44 (0)1223 494359 Email: ***@***.***
http://rnacentral.org/
http://rfam.org/
|
I modified rnacentral_pipeline\databases\europepmc\stream.py I also modifed bin\litscan-retracted-articles.py I did not understand the guidance regarding the next too. Can you clarify? rnacentral_pipeline\databases\crw\helpers.py rnacentral_pipeline\databases\ensembl\vertebrates\parser.py |
Hi,
On Wed, Nov 20, 2024 at 05:01:03AM +0000, evidencebp wrote:
I modified rnacentral_pipeline\databases\europepmc\stream.py
Please note that I could not find where TooManyPublications is defined (it isreaised in fetch).
It likely bubbles up from some other function that is called.
I also modifed bin\litscan-retracted-articles.py
I did not understand the guidance regarding the next too. Can you clarify?
rnacentral_pipeline\databases\crw\helpers.py
The function as_entry constructs an Entry object (line 79).
rnacentral_pipeline\databases\ensembl\vertebrates\parser.py
The function as_entry calls embl.sequence in the try section (line 54).
I didn't give any specific exceptions that could happen here, just a general
description of my decision making. I don't have time to work though each case
and figure out all exceptions that can occur. In particular the embl.sequence
one can cause something from Biopython when a EMBL file is formatted oddly.
Generally, I leave these as Exception to more easily deal with these generic
cases.
Kind Regards,
Blake Sweeney
…---
Blake Sweeney, PhD
RNA Resources Project Leader
European Bioinformatics Institute (EMBL-EBI)
Phone: +44 (0)1223 494359 Email: ***@***.***
http://rnacentral.org/
http://rfam.org/
|
Sure, so I will not modify them. |
I'd like to conduct a software engineering experiment regarding the benefit of Pylint alerts removal.
The experiment is described here.
In the experiments, Pylint is used with some specific alerts, files are selected for intervention and control.
After the interventions are done, one can wait and examine the results.
I'm asking for your approval for conducting an intervention in your repository.
See examples of interventions in stanford-oval/storm, gabfl/vault, and coreruleset/coreruleset.
You can see the planed interventions
May I do the interventions?
The text was updated successfully, but these errors were encountered: