Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

C-level constructor breaks RMassBank #163

Closed
lgatto opened this issue Oct 7, 2016 · 49 comments
Closed

C-level constructor breaks RMassBank #163

lgatto opened this issue Oct 7, 2016 · 49 comments
Labels

Comments

@lgatto
Copy link
Owner

lgatto commented Oct 7, 2016

Emma reported the following error on the bioc-devel mailing list.

> new("RmbSpectrum2")
Error in initialize(value, ...) : 
  'initialize' method returned an object of classSpectrum2instead of
  the required classRmbSpectrum2

As discussed on the email thread, this seems to be the result of calling the C-level constructors in the Spectrum2 (and Spectrum1) initialize methods. As a consequence, will probably revert back to the old initialize,Spectrum method.

@jorainer
Copy link
Collaborator

jorainer commented Oct 8, 2016

First run of tests went OK:

library(MSnbase)
library(msdata)
mzf <- c(system.file("microtofq/MM14.mzML", package = "msdata"),
         system.file("microtofq/MM8.mzML", package = "msdata"))
f <- msdata::proteomics(full.names = TRUE, pattern = "TMT_Erwinia")

## readMSData2
for (i in 1:7500) {
    if (i %% 200 == 0)
        cat(i, "\n")
    res <- readMSData2(f, verbose = FALSE)
    sp <- spectra(res)
    sp <- res[[12]]
    res <- readMSData2(mzf, verbose = FALSE)
    sp <- spectra(res)
    sp <- res[[12]]
}

Next I'll check the readMSData.

@lgatto
Copy link
Owner Author

lgatto commented Oct 8, 2016

Would you mind using both files in

f <- msdata::proteomics(full.names = TRUE)

The new one has MS1, 2, and 3 spectra.

@jorainer
Copy link
Collaborator

jorainer commented Oct 8, 2016

No, not again!
With

library(MSnbase)
library(msdata)
f <- msdata::proteomics(full.names = TRUE, pattern = "TMT_Erwinia")
## readMSData
for (i in 1:10000) {
    if (i %% 200 == 0)
        cat(i, "\n")
    res <- readMSData(f)
    sp <- spectra(res)
    sp <- res[[12]]
}

I get agani randomly:

Reading 451 MS2 spectra from file TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01.mzML.gz
  |============================================================          |  86%Error in (function (x)  : attempt to apply non-function
  |======================================================================| 100%
Creating 'MSnExp' object

I'll try again readMSData2 on the new test file.

@jorainer
Copy link
Collaborator

jorainer commented Oct 8, 2016

OK, I found a workaround:
I've removed the contains = "Versioned" from Spectrum along with the prototype(new("Versioned")... from Spectrum, Spectrum1 and Spectrum2 as well and it runs without errors - strange indeed.

I also run the readMSData2 with the two test files you suggested and it run without errors.

@lgatto
Copy link
Owner Author

lgatto commented Oct 8, 2016

But we need the versions - see R/methods-updateObjectTo.R

@jorainer
Copy link
Collaborator

jorainer commented Oct 8, 2016

Agree, it's just strange that this solves these random errors

@lgatto
Copy link
Owner Author

lgatto commented Oct 9, 2016

Do you the Spectrum1/Spectrum2 classes in xcms3? If so, do you use them directly or do you create other classes that contain them (as in the example at the origin of this issue)?

I would be compelled to use the C constructor in MSnbase, because we can't really ship code that fails so often, and I doubt we will figure out what happens any time soon. I would then submit a patch to RMassBank that basically copies the relevant code from Spectrum2 into RmbSpectrum2, getting rid of the contains = "Spectrum2". This would be fine because, as far as I see, they don't rely on any methods.

The biggest problem is that the current code stops anyone to re-use these classes :-/.

@jorainer
Copy link
Collaborator

No, I don't use Spectrum1 classes yet in xcms3.

I also prefer the C-constructors, but I would really like to know what's going on there. I'll try some more stuff, e.g. removing the new calls in the prototype as Kasper suggested, but I'm afraid that that won't help.

@lgatto
Copy link
Owner Author

lgatto commented Oct 10, 2016

In an email, you mentioned positive results with moving new("Versioned, ...) out of the prototype into initialize (it would really be fantastic if that was the culprit). Could you push this, so that we can sort out RMassBank without patches.

@jorainer
Copy link
Collaborator

Yes I will, just want to run more tests on that.
A problem that comes along with this is that the C-constructors lack the version information as the prototype does no longer set the version. I'll have to add that to the C-constructor.

@lgatto
Copy link
Owner Author

lgatto commented Oct 10, 2016

I suggest we create a variable in .MSnbaseEnv where we document the class versions (MSnExpVersion, OnDiskMSnExpVersion, Spectrum1Version, ...) and then re-use these in the various initialize/constructors. Do you want me to do that, or is it just easier for you to do it?

@jorainer
Copy link
Collaborator

I would prefer you do that; you've got the better overview over all classes. In the meantime I'll try also a different approach; let's hope that that works as it would be much easier (no hacks in C).

@jorainer
Copy link
Collaborator

The alternative approach didn't work; I thought bypassing the new("Versioned") call in prototype with a direct assignment of new("Versions") to the .__classVersion__ slot would do it... unfortunately not.

So, it will be the C-hack. The tracking of class versions in .MSnbaseEnv would then really help there.

@lgatto
Copy link
Owner Author

lgatto commented Oct 10, 2016

So, it will be the C-hack. The tracking of class versions in .MSnbaseEnv would then really help there.

Versions in .MSnbaseEnv is not that different from versions hard-coded in the class definitions. We must just remember to change them.

It's not only a C-level hack. We also need to set them in the R-level constructors. May be it's worth updating the validity method of all classes to make sure the .__classVersion__ has a proper version (i.e. is not whatever a default/empty Versioned slot it). Do you want me to take care of this?

@jorainer
Copy link
Collaborator

Yes, that would be great. Can you then push the changes as soon as you have the .MSnbaseEnv in place? I'll grab then the version numbers from there and pass them down to C.

@lgatto
Copy link
Owner Author

lgatto commented Oct 10, 2016

Will have to pickup my F1 from school soon, but will get started tonight.

@lgatto
Copy link
Owner Author

lgatto commented Oct 10, 2016

Things do become a bit messy - defining the version in the initialize method when this one is long can be tricky. I have started to do so (see MSnProcess and MIAPE classes and new code at the top of environment.R for details, especially the latter to grep the class version numbers).

@lgatto
Copy link
Owner Author

lgatto commented Oct 11, 2016

I don't think we can check in the validity method for a version number, because the prototype and/or a call to any callNextMethod won't be valid. I might have to drop this check in the validity.

@jorainer
Copy link
Collaborator

It depends when the validity method is called, I guess after all initialize methods are called. In that case it should be OK, because we're setting the classVersion in the initialize methods.

@lgatto
Copy link
Owner Author

lgatto commented Oct 11, 2016

But, from R Programming for Bioinformatics (R Gentleman):

[...] validity is checked when the default initialize method is called; so if there is a user-defined initialize method that does not call callNextMethod, then validity will not be checked.

As in the example of NAnnotatedDataFrame, which depends on AnnotatedDataFrame, I have to callNextMethod, which in turn throws the error that my object-to-be doesn't have a class version.

@lgatto lgatto closed this as completed Oct 11, 2016
@lgatto lgatto reopened this Oct 11, 2016
@lgatto
Copy link
Owner Author

lgatto commented Oct 11, 2016

But, once set in the initialize methods, the class version will be present. I don't think it is a huge issue not to test it in the validity (or at least in some cases, where validity can't be met).

@jorainer
Copy link
Collaborator

Agree.

@lgatto
Copy link
Owner Author

lgatto commented Oct 11, 2016

There are now many unexpected side effects and errors that creep up since I did this last change. I am going to revert back, setting the version in the prototype (I don't want to break everything now), and proceed with setting the version in the initialize where is doesn't lead to a cascade of new issues.

@jorainer
Copy link
Collaborator

That's strange. I'll add it to the initialize of Spectrum1, Spectrum2 and Spectrum. There it worked nicely. I'm now adding some more unit tests to ensure that also the C-constructors return the correct versions and then I'll push the changes.

@lgatto
Copy link
Owner Author

lgatto commented Oct 11, 2016

I'm on it. Yes, spectra classes are the most important, and they don't rely on any other classes.

@lgatto
Copy link
Owner Author

lgatto commented Oct 11, 2016

Just pushed dc5299f

jorainer added a commit that referenced this issue Oct 11, 2016
o Add class versions to Spectrum1 and Spectrum2 objects created with the
  C-level constructors (issue #163).
o Remove new("Versioned") calls from prototypes of Spectrum, Spectrum1
  and Spectrum2.
o Add class version information in initialize methods of Spectrum,
  Spectrum1 and Spectrum2.
o Add unit tests ensuring correct class versions.
@jorainer
Copy link
Collaborator

I've pushed now the fixes for Spectrum1 and Spectrum2 setting class versions in the C-level constructors and in their initialize methods. Still, R CMD check produces lots of errors now, but they seem to be unrelated to my changes.
@lgatto are you on them?

I'm running the torture tests on the new code. I'll update once that's done.

@lgatto
Copy link
Owner Author

lgatto commented Oct 11, 2016

These errors are completely crazy - I get one complaining about [,MSnSet. What I will do is revert back all versions to the prototype except for Spectrum and that it from there.

@lgatto
Copy link
Owner Author

lgatto commented Oct 11, 2016

One error was in .versionToNum

> odmse[[1]]
Error in strsplit(z, split = ".", fixed = TRUE) : non-character argument
Calls: [[ ... tryCatch -> tryCatchList -> tryCatchOne -> <Anonymous>
Execution halted

Changing it to

.versionToNum <- function(z) z@.Data

seems to fix it. Moving on to others...

@lgatto
Copy link
Owner Author

lgatto commented Oct 11, 2016

Quite a few errors because of

> classVersion(aa[[1]])
Spectrum2  Spectrum 
  "0.3.0"   "0.4.0" 
> classVersion(msx[[1]])
 Spectrum Spectrum2 
  "0.4.0"   "0.2.0" 

@jorainer
Copy link
Collaborator

Ah, sorry. The .versionToNum was my fault. Yes, the ordering of the version; I thought calling classVersion(.Object) in the initialize before callNectMethod results always in the ordering Spectrum1, Spectrum or Spectrum2,Spectrum.

@lgatto
Copy link
Owner Author

lgatto commented Oct 11, 2016

I thought calling classVersion(.Object) in the initialize before callNextMethod results always in the ordering Spectrum1, Spectrum or Spectrum2, Spectrum.

I would have thought so too, but after some testing, it doesn't look like it. I now set both Spectrum and Spectrum[1|2] in initialize,Spectrum[1|2] in the right order.

@lgatto
Copy link
Owner Author

lgatto commented Oct 11, 2016

I think we not opened a whole new can of worms:

> classVersion("Spectrum1")
[1] "Versioned; no version string"

which breaks updateObject.

@jorainer
Copy link
Collaborator

I didn't knew that classVersion can be called on a class name, I always thought it is called on class instances:

> classVersion(new("Spectrum1"))
Spectrum1  Spectrum 
  "0.3.0"   "0.4.0" 

It would make more sense to me, although in the classVersion man page it is described differently - but we know that that man page also described using new("Versioned") in the prototype which apparently is not a good thing to do...

@lgatto
Copy link
Owner Author

lgatto commented Oct 11, 2016

The only way out I see now it to replace expect_identical by expect_equivalent in out using tests, to ignore attributes (i.e. class versions, but I don't know what else is ignored!)

Update - apparently, that's every slot!

@lgatto
Copy link
Owner Author

lgatto commented Oct 11, 2016

Actually, easiest is no to bump class version number - pathetic, but that's what I am going to do now :-(

@lgatto
Copy link
Owner Author

lgatto commented Oct 11, 2016

@jotsetung - there's a problem with C-level constructed class versions

> f <- msdata::proteomics(full.names = TRUE, pattern = "TMT_Erwinia_1")
> inmem2 <- readMSData(f, centroided = NA, verbose = FALSE)  ## That's the MS 2 data.
> ondisk2 <- readMSData2(f, msLevel = 2, verbose = FALSE)
> all.equal(inmem2[[1]], ondisk2[[1]])
[1] "Attributes: < Component “.__classVersion__”: Names: 2 string mismatches >"                     
[2] "Attributes: < Component “.__classVersion__”: Component 1: Modes: numeric, list >"              
[3] "Attributes: < Component “.__classVersion__”: Component 1: Lengths: 3, 1 >"                     
[4] "Attributes: < Component “.__classVersion__”: Component 1: target is numeric, current is list >"
[5] "Attributes: < Component “.__classVersion__”: Component 2: Modes: numeric, list >"              
[6] "Attributes: < Component “.__classVersion__”: Component 2: Lengths: 3, 1 >"                     
[7] "Attributes: < Component “.__classVersion__”: Component 2: target is numeric, current is list >"
> classVersion(inmem2[[1]])
 Spectrum Spectrum2 
  "0.4.0"   "0.2.0" 
> classVersion(ondisk2[[1]])
   Spectrum2     Spectrum 
"c(0, 2, 0)" "c(0, 4, 0)" 

Could you have a look?

@jorainer
Copy link
Collaborator

Fixed in cc9ee4e.

@lgatto
Copy link
Owner Author

lgatto commented Oct 11, 2016

Checking package now - hoping to be able to close soon. Where all your torture tests positive?

@jorainer
Copy link
Collaborator

Torture tests look good. they are at iteration 3000 for readMSData and readMSData2 and no problem thus far.

@lgatto
Copy link
Owner Author

lgatto commented Oct 11, 2016

I have added a inst/scripts/torture. R script based on your code above 79372e8

@jorainer
Copy link
Collaborator

well, I still get one error in the test_updateObjects.R; this must be related to the one you reported on the Bioc-devel I guess.

@lgatto
Copy link
Owner Author

lgatto commented Oct 11, 2016

Yes - I will comment that unit test for now and document this in a new issue, so that we can tackle it if/when we hear back for the Bioc experts.

@jorainer
Copy link
Collaborator

Just an update on the torture test:

  • No error in 10,000 iterations for the readMSData2.
  • No error in 10,000 iterations for the readMSData.

@lgatto
Copy link
Owner Author

lgatto commented Oct 12, 2016

And yet...
2016-10-12-212639_744x795_scrot
Today's (2016-10-12) check report

@jorainer
Copy link
Collaborator

Well, that calls for more torture tests... I'll run some more.

@lgatto
Copy link
Owner Author

lgatto commented Oct 13, 2016

At least, it seems that the errors are becoming much rarer. It might also be dependent on the environment - I can imaging the Bioc servers being heavily loaded during these checks.

@lgatto
Copy link
Owner Author

lgatto commented Oct 13, 2016

Have you seen this error report in unit tests:

I get an error that is coming from either xcms::xcmsSet or xcms::group, see
the bottom of the message for full error.

-------------------------------------------------------------------------------------------------------------
Error in signalCondition(e) :
  no function to return from, jumping to top level
Calls: <Anonymous> -> .handleSimpleError -> h -> signalCondition
-------------------------------------------------------------------------------------------------------------

Looks familiar! The package doesn't depend on MSnbase though, but on mzR.

@jorainer
Copy link
Collaborator

well, yes. mzR seems to be quite problematic; also because we had to call gc after closing the filehandles in mzR, otherwise we got these strange random errors. I spent a week in August trying to tackle the problem in mzR(most likely somewhere in the C-code) but couldn't find anything. Eventually we'll find the problem someday...

lgatto pushed a commit that referenced this issue Oct 27, 2016
* master: (24 commits)
  temporarily disable updateObject test - see issue #166
  torture test script
  Fix class version problem for Spectrum1 and Spectrum2
  new lines cleanup
  set version in initialise for Spectum only
  Use getClassVersion function to get class versions.
  Add class version to C-constructors
  set Spectrum class in initialize
  revert to previous NAnnotatedDataFrame
  fix new error quantifying msnexp with empty phenodata
  set class version in NAnnotatedDataFrame initialize
  fix iPQF unit test to take into account new class versions
  fix MSnProcess and MIAPE initialize to set class version earlier
  set class version in MIAPE initialize
  new rcpp attribs
  class version helper functions
  MSnProcess has class set in initialize and validity checks for version
  Initiating removal or new() class version construction inside prototype.
  Tix removeReporters,OnDiskMSnExp
  Reverting to old initialize,Spectrum (see issue #163)
  ...

From: Laurent <[email protected]>

git-svn-id: https://hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/MSnbase@122190 bc3139a8-67e5-0310-9ffc-ced21a209358
lgatto pushed a commit that referenced this issue Apr 9, 2017
* master: (24 commits)
  temporarily disable updateObject test - see issue #166
  torture test script
  Fix class version problem for Spectrum1 and Spectrum2
  new lines cleanup
  set version in initialise for Spectum only
  Use getClassVersion function to get class versions.
  Add class version to C-constructors
  set Spectrum class in initialize
  revert to previous NAnnotatedDataFrame
  fix new error quantifying msnexp with empty phenodata
  set class version in NAnnotatedDataFrame initialize
  fix iPQF unit test to take into account new class versions
  fix MSnProcess and MIAPE initialize to set class version earlier
  set class version in MIAPE initialize
  new rcpp attribs
  class version helper functions
  MSnProcess has class set in initialize and validity checks for version
  Initiating removal or new() class version construction inside prototype.
  Tix removeReporters,OnDiskMSnExp
  Reverting to old initialize,Spectrum (see issue #163)
  ...

From: Laurent <[email protected]>

git-svn-id: https://hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/MSnbase@122190 bc3139a8-67e5-0310-9ffc-ced21a209358
lgatto pushed a commit that referenced this issue Sep 7, 2017
* master: (24 commits)
  temporarily disable updateObject test - see issue #166
  torture test script
  Fix class version problem for Spectrum1 and Spectrum2
  new lines cleanup
  set version in initialise for Spectum only
  Use getClassVersion function to get class versions.
  Add class version to C-constructors
  set Spectrum class in initialize
  revert to previous NAnnotatedDataFrame
  fix new error quantifying msnexp with empty phenodata
  set class version in NAnnotatedDataFrame initialize
  fix iPQF unit test to take into account new class versions
  fix MSnProcess and MIAPE initialize to set class version earlier
  set class version in MIAPE initialize
  new rcpp attribs
  class version helper functions
  MSnProcess has class set in initialize and validity checks for version
  Initiating removal or new() class version construction inside prototype.
  Tix removeReporters,OnDiskMSnExp
  Reverting to old initialize,Spectrum (see issue #163)
  ...

From: Laurent <[email protected]>

git-svn-id: file:///home/git/hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/MSnbase@122190 bc3139a8-67e5-0310-9ffc-ced21a209358
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants