Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add chromPeakSummary function and a fix #772

Open
wants to merge 24 commits into
base: devel
Choose a base branch
from
Open

Add chromPeakSummary function and a fix #772

wants to merge 24 commits into from

Conversation

jorainer
Copy link
Collaborator

This PR adds the chromPeakSummary() method and a first implementation to calculate the peak shape quality from @wkumler on chrom peak results (thanks to @pablovgd for contributing).

In addition, it fixes a bug in the calculation of the beta scores during gap filling.

jorainer and others added 15 commits September 16, 2024 13:49
- Report also a chromatogram's m/z values if `findChromPeaks` is run on a
  `Chromatogram` or `Chromatograms` object (issue #765).
- Peak shape quality (similarity to gaussian shape) calculation is now performed
  using the EIC representation of a chromatographic peak, i.e. with intensities
  of mass peaks for the same retention time (but different m/z) summed.
Add internal function to calculate beta metrics
added chromPeakSummary method
Added peak quality section to xcms vignette.
Copy link
Collaborator

@philouail philouail left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great ! Also thanks for the small vignette ! The code looks good to me, thank @pablovgd, excited to incorporate it in the overall workflow and see how to integrate it with the other preprocessing steps to improve them.

PS: I think the GHA needs to be updated, i they are failing because of warnings.

R/AllGenerics.R Outdated Show resolved Hide resolved
R/AllGenerics.R Outdated Show resolved Hide resolved
R/AllGenerics.R Outdated Show resolved Hide resolved
R/AllGenerics.R Outdated Show resolved Hide resolved
R/AllGenerics.R Outdated Show resolved Hide resolved
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, thanks for implementing this! I haven't checked it for bugs but the logic looks sound.

R/methods-XChromatogram.R Outdated Show resolved Hide resolved
vignettes/xcms.Rmd Outdated Show resolved Hide resolved
@jorainer
Copy link
Collaborator Author

jorainer commented Oct 1, 2024

Thanks @wkumler for your review! I have now addressed all your suggestions.

Copy link
Owner

@sneumann sneumann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, thanks for the PR !
What I couldn't attach to a specific line, would it make sense to add the paper to https://github.com/sneumann/xcms/blob/devel/inst/CITATION ?
Yours, Steffen

#' @param skews A numeric vector of the skews to try, corresponding to the
#' shape1 of dbeta with a shape2 of 5. Values less than 5 will be increasingly
#' right-skewed, while values greater than 5 will be left-skewed.
#' shape1 of dbeta with a shape2 of 5. Values less than 5 will be
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @wkumler , I needed to read that a few times. So a value of 5 means symmetric ? Why is it not zero centered, so that abs() tells you the skewedness (regardless in what direction) and you can <0 or >0 if you only want the direction ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point, thank you for the review. I'm using the language inherent to the dbeta() function where the values of 2-5 correspond to the "positive parameters" alpha and beta (or, in R, shape1 and shape2). I'm absolutely open to rescaling these and I agree that a positive/negative skew would be more intuitive. If we're open to rescaling and we have a parameter to pass now in chromPeakSummary then we also may want to allow a wider variety of numbers - the values of 5 matched my intuition for peak shape corners but other folks may want more/less tail and more/less skew.

@@ -279,6 +279,7 @@ dropGenericProcessHistory <- function(x, fun) {
valsPerSpect = valsPerSpect, rtrange = rtr,
mzrange = mzr)
if (length(mtx)) {
## mtx: time, mz, intensity
if (any(!is.na(mtx[, 3]))) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I usually recommend avoiding hardcoded column numbers, imagine someone came up with (time, ccs, mz, intensity). Might need that mtx is created with named columns somewhere above. Is that the only occurance of a hardcoded column number in the PR ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree - generally. Here, we load the mtx matrix with a function that is supposed to return a 3 column matrix with the columns exactly in the expected order. Thus, IMHO for this case (as long as there is no user input involved) it's save to use access-by-index. Also, because here we're calling this in a loop thousands of times, the way how we subset could affect the performance.

I did some timings on that:

mtx <- cbind(time = 1:5, mz = 1:5, intensity = c(2342.2, 123.1, 231.1, 23.1, 123.23))
int_col <- 3L

library(microbenchmark)
> microbenchmark(mtx[, 3L], mtx[, "intensity"], mtx[, int_col])
Unit: nanoseconds
               expr min    lq   mean median    uq  max neval cld
          mtx[, 3L] 354 363.5 437.30  380.5 477.5  948   100   a
 mtx[, "intensity"] 389 404.0 543.07  412.0 474.0 5036   100   a
     mtx[, int_col] 381 399.5 469.45  407.0 484.0 1135   100   a

((rtr[2] - rtr[1]) /
max(1, (sum(rtim >= rtr[1] & rtim <= rtr[2]) - 1)))
maxi <- which.max(mtx[, 3])
maxi <- which.max(mtx[, 3L])
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, above was not the only hardcoded 3 :-) and more below ...

@jorainer
Copy link
Collaborator Author

jorainer commented Oct 8, 2024

Regarding the CITATIONS - I added the reference to the respective man page. I think adding that as a CITATION for xcms itself might be bit too much, as the paper deals more with peak shape quality, independently of xcms?

@jorainer
Copy link
Collaborator Author

jorainer commented Oct 9, 2024

For now I only bumped the version and fixed the NEWS.md, but I did not address the other points above - can we maybe discuss them again @sneumann ?

@jorainer jorainer requested a review from sneumann October 9, 2024 07:43
@jorainer
Copy link
Collaborator Author

I'll make the PR a draft and fix the conflicts. After that I'll re-open.

@jorainer jorainer marked this pull request as draft December 16, 2024 07:21
- Add `c()` method to combine `XcmsExperiment` objects.
- Add a method to coerce from `XCMSnExp` to `XcmsExperiment` objects.
- Fix references in documentation.
@jorainer jorainer marked this pull request as ready for review December 17, 2024 06:34
@jorainer
Copy link
Collaborator Author

I've now fixed the conflicts and in addition also added the c() method to combine multiple XcmsExperiment objects into one as well as method to coerce from XCMSnExp to XcmsExperiment objects. Would be ready to review @sneumann :)

@jorainer
Copy link
Collaborator Author

@sneumann , just a little reminder that this PR is still open and we need to address it :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants