JOSS review - Software paper #15
Comments
Updated branch: joss-fixes-0.2.4.9000 (and main) I've changed the wording to be more explicit that To address your query about state of the field, I've added a new paragraph into the summary section to describe how I see |
Branch: joss-fixes-0.2.4.9000 The emphasis on dtrackr as a utility helps me better understand its positioning. This and simply signaling awareness of other provenance-related work addresses my main concern. A few comments:
A little wordy, but let me know if you have any questions. |
Thanks again,
I'm waiting for the second reviewer's comments. Once I get them I'll take
another stab at this, in light of your additional suggestions. I'm trying
not to go too far down the rabbit hole as I suspect it's pretty deep.
…On Mon, 10 Oct 2022, 15:30 Craig Willis, ***@***.***> wrote:
The emphasis on dtrackr as a utility helps me better understand its
positioning. This and simply signaling awareness of other
provenance-related work addresses my main concern.
A few comments:
- I noticed a possible typo in the first sentence ("dtrackr if first
and foremost" -> should be "dtrackr is..." ?)
- Pimentel et al. is not related to C2Metadata. As a broad survey, I
guess it's an example of "other provenance research". My main point in
sharing the reference was to highlight that there's related work on
computational provenance tools -- and even some existing tools that work
with R (e.g., RDataTracker, YesWorkflow, recordr) -- that could be used to
help frame dtrackr. In section 2.1, the authors discuss a few
classifications: prospective (program/experiment structure) v.
retrospective (what actually happened) provenance. In 2.1.4 they discuss
execution provenance approaches including passive monitoring, overriding,
and instrumentation. In this light, I'd classify dtrackr as a retrospective
provenance tool that relies on overriding -- although this probably isn't
important for the JOSS paper. I do think the Pimentel reference is useful
if only to signal awareness of related work in computational provenance
research, but not in the context of C2Metadata.
- C2Metadata is developing a language-independent representation of
data transformations via SDTL, but relies on static parsing of scripts
(prospective provenance), including R. It doesn't document a "pipeline" in
the same sense as dtrackr. In hindsight, it's a small/nascent
project and may not be something to highlight in the JOSS paper unless it's
of interest to you.
A little wordy, but let me know if you have any questions.
|
Makes perfect sense. I don't think you need to go down the hole, but maybe just note that it's there. |
I have updated the paper (new version of paper at https://github.com/terminological/dtrackr/actions/runs/3397242636). I think I got the essence of your comments in there without going too far into the details. In this iteration dtrackr is primarily a pragmatic tool, mainly focussed on the goal of producing a flowchart, but there are lots of ways it could evolve in the future. This discussion, the Pimentel paper, and an emerging need to explain how a particular column in a dataset I'm working on has been derived have prompted me to prototype a column-level tracking feature, which I'll look to bring into a new release. |
N.b. please close this issue if you are happy with the updated paper. |
The changes look good to me. |
Review issue: openjournals/joss-reviews#4707
Below is my feedback on the "Software paper" section of the JOSS review checklist:
Summary
I question whether dtrackr can be described as a wrapper around the tidyverse collection of packages and think it may better be described as a wrapper around dplyr. There may be other opportunities to support data provenance in tidyverse that cannot be done by instrumenting dplyr functions (e.g., reading files).
State of the field
dtrackr is positioned as a provenance tool, yet the paper includes no references to existing work in the area of computational provenance, transparency, and reproducibility. While I believe dtrackr functionality to be unique, the tool is not positioned relative to other packages in the R community or beyond. I'd suggest looking at the following articles and considering how dtrackr relates to other efforts.
The absence of comparisons to other provenance tools to some degree raises a question about the scholarly aspects of this work (i.e., is this just a utility or intended to contribute to broader work in computational provenance, transparency, reproducibility). I can imagine how a widely-used provenance-aware dplyr might fit into this bigger picture.