
Phenopype: a phenotyping pipeline for Python #23

Closed
2 of 9 tasks
mluerig opened this issue Mar 26, 2020 · 18 comments

Comments

@mluerig

mluerig commented Mar 26, 2020

Submitting Author: Name (@mluerig)
Package Name: Phenopype
One-Line Description of Package: a phenotyping pipeline for Python
Repository Link (if existing): https://github.com/mluerig/phenopype


Description

Phenopype is a high throughput phenotyping pipeline for Python that aims to support biologists in extracting high-dimensional phenotypic data from digital images. Phenopype provides high-level functions for image processing that can be stacked and executed sequentially to efficiently process single images or large data sets in a semi- or fully automated fashion. Users can assemble their own function stacks, which can be customized and stored along with the raw data for full reproducibility (check the high throughput workflow). Phenopype can be run from Python or from a Python Integrated Development Environment (IDE) like Spyder. Some Python knowledge is necessary, but most of the heavy lifting is done in the background. Phenopype can be installed from the Python Package Index (PyPI) using pip install phenopype.
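To illustrate what "stacking" functions means, here is a minimal sketch in plain Python. It assumes a stack is simply an ordered list of (function name, settings) pairs applied one after another to an image (here a small nested list of 8-bit grayscale values). The names `run_stack`, `invert`, `threshold`, and `REGISTRY` are hypothetical and are not phenopype's API.

```python
from typing import Any, Callable, Dict, List, Tuple

Image = List[List[int]]  # hypothetical stand-in for an 8-bit grayscale image


def invert(image: Image) -> Image:
    """Invert 8-bit grayscale values."""
    return [[255 - px for px in row] for row in image]


def threshold(image: Image, value: int = 128) -> Image:
    """Binarize: 255 where the pixel is >= value, otherwise 0."""
    return [[255 if px >= value else 0 for px in row] for row in image]


# Lookup table so a stack can refer to steps by name (e.g. from a config file).
REGISTRY: Dict[str, Callable[..., Image]] = {"invert": invert, "threshold": threshold}


def run_stack(image: Image, stack: List[Tuple[str, Dict[str, Any]]]) -> Image:
    """Apply each (function name, settings) pair to the image, in order."""
    for name, settings in stack:
        image = REGISTRY[name](image, **settings)
    return image


stack = [("invert", {}), ("threshold", {"value": 100})]
result = run_stack([[0, 50], [200, 255]], stack)
```

Because each step only sees the previous step's output plus its own settings, the stack itself is plain data that can be stored next to the raw images.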

Scope

  • Please indicate which category or categories this package falls under:

    • Data retrieval
    • Data extraction
    • Data munging
    • Data deposition
    • Data visualization
    • Reproducibility
    • Geospatial
    • Education
    • Unsure/Other (explain below)
  • Explain how and why the package falls under these categories (briefly, 1-2 sentences). Please note any areas you are unsure of:

Phenopype is designed to extract phenotypic data (https://en.wikipedia.org/wiki/Phenotype) of plants, animals, and other organisms from images and videos.

  • Who is the target audience and what are scientific applications of this package?

Phenopype is intended for ecologists and evolutionary biologists who work with phenotypic data. Phenotypic data are an essential component of ecological and evolutionary research (https://www.nature.com/articles/nrg2897).

  • Are there other Python packages that accomplish the same thing? If so, how does yours differ?

The only existing alternatives are low-level computer vision packages like OpenCV or scikit-image, which require a lot of configuration and a relatively deep understanding of computer vision and of Python in general. Phenopype offers high-level functions so that users can focus on the relevant analytical parts of image analysis.

  • Any other questions or issues we should be aware of?:

Documentation (https://mluerig.github.io/phenopype/) is semi-complete. I am working on finishing all docstrings and making the package PEP 8 compliant.

P.S. Have feedback/comments about our review process? Leave a comment here

EDIT: fixed typos

@lwasser
Member

lwasser commented Mar 26, 2020

hi @mluerig ! welcome to pyopensci and thank you for your submission! I will get back to you with comments in the next week or so. Thank you for your patience!!

@lwasser
Member

lwasser commented Apr 7, 2020

hey @mluerig can you provide a bit more explanation about the exact functionality that phenopype provides? i was a bit unclear after looking at the docs! many thanks!

@mluerig
Author

mluerig commented Apr 7, 2020

I guess the documentation is still a bit confusing. In short, phenopype aims to provide a comprehensive and easy-to-use high throughput image analysis workflow using classic computer vision (no machine learning, yet). It is aimed at ecologists and evolutionary biologists who want to quickly analyze images of organisms and extract phenotypic data.

The provided functions span image data set management (through projects), preprocessing (e.g. setting a common size and color reference and correcting images), segmentation (e.g. thresholding and watershed), measurement (e.g. pixel intensity or landmarks), as well as visualizing and exporting the results. The functions are designed to be intuitive and to minimize user interaction and manual work as much as possible. So everything is streamlined towards getting solid results fast, even if you don't have a strong programming background.
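As a rough illustration of the segmentation and measurement steps listed above, here is a dependency-free sketch: a global threshold separates dark objects from a light background, 4-connected component labeling splits the mask into separate blobs, and each blob's area is counted in pixels. This shows the general technique only, not phenopype's implementation; the function names are hypothetical, and watershed segmentation is not shown.

```python
from collections import deque
from typing import Dict, List


def threshold(image: List[List[int]], value: int) -> List[List[bool]]:
    """Foreground mask: True where the pixel is darker than `value`."""
    return [[px < value for px in row] for row in image]


def label_blobs(mask: List[List[bool]]) -> List[List[int]]:
    """4-connected component labeling via breadth-first search (0 = background)."""
    rows, cols = len(mask), len(mask[0])
    labels = [[0] * cols for _ in range(rows)]
    current = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and labels[r][c] == 0:
                current += 1  # start a new blob at this unvisited foreground pixel
                labels[r][c] = current
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and labels[ny][nx] == 0):
                            labels[ny][nx] = current
                            queue.append((ny, nx))
    return labels


def blob_areas(labels: List[List[int]]) -> Dict[int, int]:
    """Measure each blob's area as a pixel count."""
    areas: Dict[int, int] = {}
    for row in labels:
        for lab in row:
            if lab:
                areas[lab] = areas.get(lab, 0) + 1
    return areas


image = [
    [10, 10, 200, 200],
    [10, 200, 200, 30],
    [200, 200, 30, 30],
]
areas = blob_areas(label_blobs(threshold(image, 100)))
```

With a size reference from preprocessing, pixel counts like these could then be converted to real-world units.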

The other idea is that all settings, intermediate image analysis steps (e.g. the contours of detected objects), and raw data are available after the analysis. With human-readable configuration files, scientists can generate "cookie-cutter" methods that can be reused and shared. This will also allow reviewers to reproduce the obtained results with a single line of code, so nobody has to dig through complex scripts. This makes the collected data highly reproducible, which is becoming more and more important.
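The "single line of code" reproduction idea can be sketched as follows, assuming the analysis settings are stored as a human-readable JSON file next to the raw data. JSON is used here only because it is in the standard library; phenopype's actual configuration format may differ. `run_from_config`, `STEPS`, and the file name `pype_config.json` are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def threshold(image, value=128):
    """Hypothetical step: binarize an 8-bit grayscale image (nested lists)."""
    return [[255 if px >= value else 0 for px in row] for row in image]


STEPS = {"threshold": threshold}


def run_from_config(image, config_path):
    """Re-run an analysis exactly as described by a saved config file."""
    config = json.loads(Path(config_path).read_text())
    for step in config["stack"]:
        image = STEPS[step["function"]](image, **step["settings"])
    return image


with tempfile.TemporaryDirectory() as tmp:
    config_path = Path(tmp) / "pype_config.json"
    # Store the settings next to the raw data so others can inspect and reuse them.
    config = {"stack": [{"function": "threshold", "settings": {"value": 100}}]}
    config_path.write_text(json.dumps(config, indent=2))

    raw = [[50, 150], [99, 100]]
    first = run_from_config(raw, config_path)
    # A reviewer reproduces the result with one call on the same raw data.
    second = run_from_config(raw, config_path)
```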

Does that make it a bit clearer? Please let me know where the documentation is unclear. Also, there is a written manuscript that provides more comprehensive information; let me know if you would like to see it.

@lwasser
Member

lwasser commented Apr 7, 2020

thank you @mluerig !! i will get back to you. The challenge I am facing right now is just ensuring that we at pyopensci have reviewers who can effectively review packages that are on the analytics side of things! if you were to submit this, can you think of 2 people who would have the skills required to review? i also can ask around on twitter but to be transparent, we are trying to decide what packages we can support review of now and so that is something i consider when a package comes in to us. thank you so much for the speedy response! and again my apologies for such a slow response time.

@mluerig
Author

mluerig commented Apr 7, 2020

yupp I have a few people in mind. should I contact them and then get back to you here, or how should we do this?

@lwasser
Member

lwasser commented Apr 7, 2020

thank you @mluerig give me a little bit of time.
What i'm trying to suss out is how analytics-focused this package is.
The steps will be

  1. you submit a full submission rather than a presubmission like this one! i will ask you to do this if i can get a bit more info from our team.
  2. I will then ask you about suggesting reviewers and we will ping them here.

we have a meeting coming up on thursday if you have a chance to attend. i plan to bring up this and another presubmission package to ensure they're "in scope". Just so you know, my only concern has been that analytics-focused packages can be difficult to quality check given so few people have the expertise needed. On the other hand, if we can find people with sufficient expertise, and trust our reviewers the way journals do, I think we should consider a broader range of packages. if you can hang tight until thursday, that would be great. you are also WELCOME to join us at 11am mountain time, which i know might be late for you, so i understand if that doesn't work!

@mluerig
Author

mluerig commented Apr 7, 2020

okay no problem, it can wait. I'm also happy to join the meeting if it helps sort things out (my email is here)

@mluerig
Author

mluerig commented Apr 9, 2020

just let me know how I can join, then I'll try to make it (I'll be in the mountains all day)

@lwasser
Member

lwasser commented Apr 9, 2020

https://pyopensci.discourse.group/t/april-9-community-meeting/169 if you login to our discourse (you can use your github login!) you will have the meeting information. sorry this took me a bit of time to get to -- was struggling with how to avoid being "bombed" online in a meeting!!

@lwasser
Member

lwasser commented Apr 9, 2020

great @mluerig please go ahead and submit an actual submission and we will get this in our review queue

@mluerig
Author

mluerig commented Apr 18, 2020

@lwasser Phenopype is almost in ok shape to be reviewed. However, one module (video-analysis, which is an extension of the core image analysis kit) is still not working great and I would like to spend a few more weeks with it. It's an important, but code-wise, peripheral feature. Would you agree to a review of the program as it is now (linux builds passing, 70% coverage, all docstrings are there, tutorials and vignettes as well), and then I add the fully functional video-analysis module to a later re-submission?

@lwasser
Member

lwasser commented Apr 20, 2020

@mluerig we prefer that you get the package in full working order PRIOR to submitting it for a review. So let's leave this presubmission open for the time being. Please ping me when all of the code is in a state that you think is acceptable for review. Thank you for checking in / asking about this!

@mluerig
Author

mluerig commented Apr 20, 2020

ok I'll finish it up before then.

one more thing about CI: I recently started using travis CI with my program, and I discovered i) that travis doesn't support python builds for windows and macOS, and ii) testing some of the functions is tricky because they open up a GUI requiring user interaction.

I can't change i), and I tried to mimic as much user input for ii) as possible, but ultimately I will have to run the tests locally to get full coverage. is this acceptable?
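One common way to handle GUI-dependent tests is to mark them so they are skipped on headless CI runners and only run locally. The sketch below uses the standard library's `unittest.skipIf`, assuming that the CI service sets the `CI` environment variable (Travis CI sets `CI=true`; other services commonly do something similar) and that a Linux machine without `DISPLAY` cannot open windows. The test class and method names are hypothetical.

```python
import os
import sys
import unittest

# Travis CI sets CI=true; lower() also accepts variants such as "True".
RUNNING_ON_CI = os.environ.get("CI", "").lower() == "true"
# On Linux, a missing DISPLAY usually means no window can be opened.
HEADLESS = RUNNING_ON_CI or (
    sys.platform.startswith("linux") and not os.environ.get("DISPLAY")
)


class TestInteractiveMask(unittest.TestCase):
    @unittest.skipIf(HEADLESS, "needs a display and mouse input")
    def test_draw_mask_interactively(self):
        ...  # would open a GUI window and wait for clicks; run locally only

    def test_non_interactive_step(self):
        # Non-GUI logic still runs everywhere, including on CI.
        self.assertEqual(sum([1, 2, 3]), 6)


if __name__ == "__main__":
    unittest.main()
```

The trade-off is that coverage reported by CI will be lower than the local run, so it helps to note which tests are skipped there.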

@lwasser
Member

lwasser commented Apr 20, 2020

hi @mluerig these are both good questions
re travis. there are a few options. I believe in our cookiecutter example we have an example implementation for appveyor which runs windows. circleci also has windows options now (i haven't tried it yet!). But there are several windows options available. we've been using appveyor for our packages.

Re the gui implementation for tests... i need to dig a little bit more into that. let me see what folks say on the discourse forum / twitter and i will get back to you. Ideally tests can be all run via ci. it seems like you could potentially implement some sort of monkey patching to mimic user input but i really don't know enough about this to make any suggestions! more to come on this.

@mluerig
Author

mluerig commented Apr 20, 2020

@lwasser ah I didn't know about appveyor - I'll check it out

yeah I have been monkeypatching keyboard input using the mock package, and that works great. but clicking into an image is something else. I think I could get GUI functions working on CI by supplying some default coordinates and timers, but this would require a special testing interface for some of the functions, which I would implement as a last resort.
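For readers unfamiliar with the approach, monkeypatching keyboard input with the standard library's `unittest.mock` can look like the sketch below. It assumes, for illustration only, an interactive step that reads a key press through `input()`; `confirm_selection` is hypothetical, and the same `patch` pattern applies to whatever function actually reads the user's key presses.

```python
import builtins
from unittest import mock


def confirm_selection():
    """Hypothetical interactive step: ask the user to accept a detected contour."""
    answer = input("Accept contour? [y/n] ")
    return answer.strip().lower() == "y"


# Replace builtins.input only inside the with-block, so no human is needed.
with mock.patch.object(builtins, "input", return_value="y"):
    accepted = confirm_selection()

# side_effect supplies a sequence of answers, one per call.
with mock.patch.object(builtins, "input", side_effect=["n"]):
    rejected = confirm_selection()
```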

@lwasser
Member

lwasser commented Apr 20, 2020

@mluerig yes appveyor works pretty well!

i can totally get how clicking on an image would be difficult to recreate. let me do a bit of digging and i'll get back to you! specifically what is the user input providing? aoi regions of the image to analyze or training data or something to that effect?

@mluerig
Author

mluerig commented Apr 20, 2020

yupp, sometimes a mask needs to be selected to detect blobs within, or a reference card needs to be measured for automatic detection. it's not unimportant, but again, if we don't find anything I'll fix up an appropriate testing interface in the function (there is really only one complex class handling all the user input, so it's actually not so hard to do this)
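Such a testing interface could look roughly like the sketch below, assuming a single input-handling class that accepts preset coordinates in place of mouse clicks and otherwise delegates to a GUI. `PointCollector`, `preset_points`, `gui_backend`, and `rectangle_mask_bounds` are hypothetical names, not phenopype's actual class or parameters.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Point = Tuple[int, int]


class PointCollector:
    """Hypothetical user-input handler with a non-interactive testing hook.

    In normal use it would delegate to a GUI that records mouse clicks;
    when `preset_points` is given, it skips the GUI and returns those points.
    """

    def __init__(self,
                 preset_points: Optional[Sequence[Point]] = None,
                 gui_backend: Optional[Callable[[], List[Point]]] = None):
        self.preset_points = list(preset_points) if preset_points is not None else None
        self.gui_backend = gui_backend

    def collect(self) -> List[Point]:
        if self.preset_points is not None:
            return list(self.preset_points)  # testing path: no window opened
        if self.gui_backend is None:
            raise RuntimeError("no GUI available and no preset points given")
        return self.gui_backend()  # interactive path


def rectangle_mask_bounds(points: Sequence[Point]) -> Tuple[int, int, int, int]:
    """Turn clicked corner points into (x_min, y_min, x_max, y_max)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


# On CI: supply the "clicks" up front instead of opening a window.
collector = PointCollector(preset_points=[(40, 10), (5, 60)])
bounds = rectangle_mask_bounds(collector.collect())
```

Keeping the preset path inside the one class that handles input means the downstream mask and measurement code is exercised unchanged in tests.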

@mluerig
Author

mluerig commented May 4, 2020

closing - actual submission here: #24

@lwasser let me know if you would like to receive my suggestions for reviewers
