-
-
Notifications
You must be signed in to change notification settings - Fork 39
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[REVIEW]: A Framework to Quality Control Oceanographic Data #2063
Comments
Hello human, I'm @whedon, a robot that can help you with some common editorial tasks. @jessicaaustin, @evanleeturner it looks like you're currently assigned to review this paper 🎉. ⭐ Important ⭐ If you haven't already, you should seriously consider unsubscribing from GitHub notifications for this (https://github.com/openjournals/joss-reviews) repository. As a reviewer, you're probably currently watching this repository which means for GitHub's default behaviour you will receive notifications (emails) for all reviews 😿 To fix this do the following two things:
For a list of things I can do to help you, just type:
For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:
|
|
Hi everyone. It's three weeks since the start of the review. Would you have any update for me? Thanks. |
I'll just do a specific ping to reviewers @jessicaaustin and @evanleeturner. When do you think you'll be able to work on your reviews? Thanks. |
I am still working on this review, thank you. |
Thanks for the reminder. I have read all the instructions and made some progress on my review before getting pulled into other tasks. I will pick it up again this week and submit by the end of the week. |
Conflict of interestI do not believe this is a conflict of interest as outlined in the JOSS guidelines, but I do want to state that I am a maintainer of a similar open-source QC library, https://github.com/ioos/ioos_qc . Overall thoughts:I agree 100% with the author's philosophy that QC procedures should be organized and distributed as code packages in order to widely and consistently used. I also agree with the goal of creating a framework that can run tests from multiple different QC test schemes (GTSPP, ARGO, QARTOD, etc) since none of those schemes alone will completely satisfy testing requirements in all situations. I think the integration with oceansdb to run Climatology tests and Location at Sea tests is particularly exciting. I hope publication in JOSS will bring this library to the attention of more groups and encourage further use and collaboration to flesh this out. The library takes in data as a numpy array, so in theory the data could come from any source or format. However you must create a "data object" to wrap your data. Given that there are other mature and well-used libraries with data representations (xarray, pandas, netcdf, etc), why not leverage one of those? Or at least make it easy to use CoTeDe if your data is already in one of those common formats. I created a notebook to try to understand how to run QC tests using the library ( castelao/CoTeDe#42), and I was able to fairly easily load a netcdf file using xarray and run tests, but I got an error when using a pandas dataframe and it was not clear how to debug it. I think a little documentation could go a long way here, to help people get started. ReviewItems marked BLOCKER below should be resolved before acceptance. Everything else is a recommendation. Paper:
Documentation:
Installation
Contributing
Performance
|
@jessicaaustin , thank you for your review. I'll respond to it ASAP. |
Conflict of interestI do not have a conflict of interest in this review. I do maintain internal (non-public) code repository @wdft-coastal that does contain some of the similar functions as presented in the current code but are specific for my dataset and organization. Overall thoughts:I am extremely pleased with the premise of this package for organizing QC procedures and being able to run multiple tests. This is a high priority item for my organization and my current data sets, as our own QAQC methods often can conflict with other agencies that follow different QAQC procedures. In the last few months I have been preparing code that can mimic QARTOD style tests on our own data internally, but I wanted the ability to apply alternative test schemes and to skip tests that are not appropriate for our data. Having a maintained package that accomplishes this task is not only helpful but it also greatly increases your data confidence as now the QC methods are maintained by other groups in a transparent mechanism. I fully expect to be implementing this package into our own website and data products at TWDB. I share reviewer one’s sentiment about the philosophy of leveraging other data libraries such as netCDF or pandas. Although our own internal datasets are SQL we do make use of pandas where possible for data manipulation and in the future we will likely be forced to move to netCDF to process very large datasets for QAQC. This functionality is not a show-stopper for the package, but it is a severe limitation for its acceptance in the community. ReviewItems marked BLOCKER below should be resolved before acceptance. Everything else is a recommendation. Paper
Documentation:
Installation
Contributing
Performance
|
@evanleeturner, thanks for the review. I already started to work on it. Since you mentioned a potential interest in using yourself, you might appreciate how can you customize CoTeDe's configuration. In summary, you can create a setup of tests to apply and save it to use later. I do that myself a lot, so different regions or sensors have their own sequence of procedures to apply. |
@whedon generate pdf |
Hi @evanleeturner , you might be able to edit the top post, and if so, could you click on the review items that you agree with, please? That would help me to write my response. Thanks! |
Response to @jessicaaustin
Thanks! As a side note, the package OceansDB was originally together with CoTeDe, as a single package (since the first generation, ~ 2006), but as CoTeDe grew larger and I realized that some people could use OceansDB outside CoTeDe’s scope, I moved it outside as an independent package (~ 2015).
Thanks for sharing your opinion on using xarray or pandas. It’s now clear for me that I need to clarify this in the documentation. This will be addressed at https://cotede.readthedocs.io/en/dev/data_model.html In summary, I believe that the data model adopted by CoTeDe does not limit the users, but the other way around: it gives more freedom. The package xarray, which was initially based on the netCDF structure, satisfies the data model expected by CoTeDe, thus most of CoTeDe’s functionalities can be used without any change, see the example. Since xarray can read directly netCDF and pandas objects can be converted into xarray objects, all those inputs can be used by CoTeDe, plus any other format that satisfies CoTeDe’s minimalist data model. CoTeDe has been used with SQL databases and MatLab based pipelines, just two examples where less requirements makes a difference.
I see your point and I will consider that. The brief comparison that you suggested above could be a good start, but what should be said about the similarities between the two packages? For instance, what should we say about QC configuration, which seems like ioss_qc followed a quite similar approach as the one proposed by CoTeDe. Would you agree? On a side note, since this is a highly recommended suggestion, I was expecting that ioos_qc would adopt the same policy, but I could not find any reference to CoTeDe in the ioos_qc documentation.
Thanks, that’s a good idea. However, I believe that the choice to be listed belongs to the users. I’ll create a designated space for that on CoTeDe’s website and encourage the users to add themselves to the list.
At some point I removed that from the documentation, but I’ll place it back.
I have expanded the documentation (https://cotede.readthedocs.io/) to include the API reference for the public resources, i.e. everything that a new user would need to know to use CoTeDe. I’m also expanding the notebooks with practical examples to better illustrate how to use the package. I intend to extend the API reference with internal resources that might be useful for developers (https://cotede.readthedocs.io/en/latest/api.html).
Thanks for the suggestion. I understand the benefits of source-forge and I’ll definitely add CoTeDe into that in the future, but it is not the priority at the moment. I noticed that ioos_qc uses pip on its notebooks, which is yet another example of how PIP is convenient and functional.
I removed the recommendation on flake8. I do not want to turn back any potential contribution just due to formatting. I’ve been cleaning, refactoring, and formatting the code and I will continue on doing so.
Thank you, that is a very good idea. I've always assumed that by using Python it was implied that speed was not a priority, but there is nothing to lose on having an idea about the relative cost among the different tests. I’ll implement that on the first chance. Note that one of the consistency tests is to guarantee that the ProfileQC object is serializable, so that it can be transported by multiprocessing. On large scale databases I have been running CoTeDe in parallel, a functionality that I would like to move to the public repository on the first chance. A note, when I read "performance" for a QC procedure the first thing that comes to my mind is not speed but the skill of identifying bad measurements without mistakes. For that, I’ve been adopting consistency tests since the early stages of CoTeDe. |
Response to @evanleeturner
Thanks, I’m glad to hear that we have the same vision on shared public QC procedures and the importance of giving freedom to the operator to decide which tests to apply, including the tunning of each test. Please, don’t hesitate to open issues with requests or discussions that could help you to implement CoTeDe in your project. My vision for the QC configuration was on cases exactly like yours. I’ll improve the documentation on that subject.
Thanks for sharing your opinion on using netCDF or pandas. It’s now clear for me that I need to clarify this in the documentation. This will be addressed at https://cotede.readthedocs.io/en/dev/data_model.html
Issue #43 raises a very good point. We will certainly find derived variables (salinity, density, sound speed, etc.) estimated with different standards (especially on historical data), which may affect inter-comparisons. That is indeed important but is beyond CoTeDe’s goal, which is intended to evaluate the quality of a dataset. Currently, the only place that I use GSW is to estimate sea water density when density is not directly available to evaluate inversions in the water column. For that purpose, I argue that we should use the best and most recent technique. I do see your point that in some situations it might be useful, if not necessary, to employ the old relations for a correct comparison, but I think that this should be done outside CoTeDe, by the operator, and then used as input for CoTeDe.
At some point, I removed that from the documentation, but I’ll put it back.
I have expanded the documentation (https://cotede.readthedocs.io/) to include the API reference for the public resources, i.e. everything that a new user would need to know to use CoTeDe. I’m also expanding the notebooks with practical examples to better illustrate how to use the package. I intend to extend the API reference with internal resources, which might be useful for developers (https://cotede.readthedocs.io/en/dev/api.html).
Thanks for raising that point. I’m adding a section in the documentation about the performance. (https://cotede.readthedocs.io/en/dev/performance.html). |
@kthyng , could you confirm the next step, please? Is it correct that the reviewers need to checkmark each box in the very first post before we move to the next step? Thanks! |
👋 Hey @castelao... Letting you know, |
So I can't seem to edit the first post. When I click on the ... my only options are "copy link" and "quote reply" |
@whedon re-invite @evanleeturner as reviewer |
The reviewer already has a pending invite. @evanleeturner please accept the invite by clicking this link: https://github.com/openjournals/joss-reviews/invitations |
I just did. It said the invitation has expired. |
@whedon add 10.5281/zenodo.583710 as archive |
I'm sorry human, I don't understand that. You can see what commands I support by typing:
|
@whedon set 10.5281/zenodo.583710 as archive |
OK. 10.5281/zenodo.583710 is the archive. |
@castelao Can you edit the metadata at your Zenodo archive so that the title and authors on your paper exactly match there? Also, what is the proper version number for your repository currently? |
@kthyng I just updated the DOI record itself, thanks for noticing that. The current version is 0.21.3 |
@whedon set 10.5281/zenodo.3733959 as archive |
OK. 10.5281/zenodo.3733959 is the archive. |
Looks like the zenodo archive changed numbers too, so I updated that. Let me know if this is incorrect |
@whedon set v0.21.3 as version |
OK. v0.21.3 is the version. |
@whedon accept |
|
|
👋 @openjournals/joss-eics, this paper is ready to be accepted and published. Check final proof 👉 openjournals/joss-papers#1410 If the paper PDF and Crossref deposit XML look good in openjournals/joss-papers#1410, then you can now move forward with accepting the submission by compiling again with the flag
|
@whedon accept deposit=true |
|
🐦🐦🐦 👉 Tweet for this paper 👈 🐦🐦🐦 |
🚨🚨🚨 THIS IS NOT A DRILL, YOU HAVE JUST ACCEPTED A PAPER INTO JOSS! 🚨🚨🚨 Here's what you must now do:
Any issues? notify your editorial technical team... |
Congratulations to @castelao on your new publication! Many thanks to reviewers @jessicaaustin and @evanleeturner — we have relied on your time and expertise! |
🎉🎉🎉 Congratulations on your paper acceptance! 🎉🎉🎉 If you would like to include a link to your paper from your README use the following code snippets:
This is how it will look in your documentation: We need your help! Journal of Open Source Software is a community-run journal and relies upon volunteer effort. If you'd like to support us please consider doing either one (or both) of the the following:
|
Submitting author: @castelao (Guilherme Castelao)
Repository: https://github.com/castelao/CoTeDe
Version: v0.21.3
Editor: @kthyng
Reviewer: @jessicaaustin, @evanleeturner
Archive: 10.5281/zenodo.3733959
Status
Status badge code:
Reviewers and authors:
Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)
Reviewer instructions & questions
@jessicaaustin & @evanleeturner, please carry out your review in this issue by updating the checklist below. If you cannot edit the checklist please:
The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @kthyng know.
✨ Please try and complete your review in the next two weeks ✨
Review checklist for @jessicaaustin
Conflict of interest
Code of Conduct
General checks
Functionality
Documentation
Software paper
Review checklist for @evanleeturner
Conflict of interest
Code of Conduct
General checks
Functionality
Documentation
Software paper
The text was updated successfully, but these errors were encountered: