random test errors (SSL) ... squeak-4.5 #303

Closed
dalehenrich opened this issue Oct 18, 2014 · 42 comments

Comments

@dalehenrich
Member

MetacelloScriptingIssuesTestCase>>testIssue234a has failed again:

testIssue234a: Error: SSL error, code: -5

So the randomness is coming from a random ssl error when trying to connect to https ... I will have to try to understand when this error is not being treated like any other error ... which is supposed to lead to a retry?

Odd that only MetacelloScriptingIssuesTestCase>>testIssue234a seems susceptible.

@dalehenrich
Member Author

The root cause of this problem possibly lies in the Gofer implementation on Squeak (i.e., not catching and propagating the error as a GoferRepositoryError) or something else ... of course the response to the error should be: RETRY...

@dalehenrich
Member Author

Well, mystery somewhat resolved as testIssue234a isn't the only test susceptible to the problem:

*** ERRORS *******************
    MetacelloScriptingRegisterTestCase debug: #testBaselineRegister.
    MetacelloScriptingIssuesTestCase debug: #testIssue234a.
    MetacelloScriptingIssuesTestCase debug: #testIssue234b.
**************************************************************************************

@dalehenrich dalehenrich changed the title random test errors ... squeak random test errors (SSL) ... squeak Oct 18, 2014
@dalehenrich dalehenrich changed the title random test errors (SSL) ... squeak random test errors (SSL) ... squeak-4.5 Oct 18, 2014
@dalehenrich
Member Author

Now it seems that these Squeak-4.5 test failures are not random

@dalehenrich
Member Author

@timfel, @frankshearar, @krono, I'm getting consistent SSL errors running these tests against Squeak-4.5 that are not seen in Squeak-4.4 or Squeak-Trunk, so it looks like some sort of SSL-related problem that is only affecting Squeak-4.5 and only for those tests ... there are other tests that make use of https: calls so it's a bit of a mystery to me ... For this one, I'll make these expected failures for Squeak-4.5 (if I can) as they appear to fail pretty reliably, but I'm concerned that folks actually using Squeak-4.5 may start running into these issues as well

@dalehenrich
Member Author

@timfel, @krono, another note in case you missed the previous comment (spelling errors on my part)...anyway it looks like the webclient ssl interface is consistently failing for certain github references (and not for others) and the tests are only failing for Squeak-4.5 ... passing for Squeak-4.4 and Squeak-Trunk ... anyway I've put expectedFailures in so that I can at least make sure other tests are passing for Squeak-4.5 ... so green for 4.5 does not mean that things are in good shape:)

dalehenrich added a commit that referenced this issue Oct 18, 2014
…cumentationIssue196TestCase>>testLockCommandReference2)
@dalehenrich
Member Author

Another case popped up:

*** ERRORS *******************
    MetacelloScriptingDocumentationIssue196TestCase debug: #testLockCommandReference2.
**************************************************************************************

So the SSL failures are random ...

@dalehenrich
Member Author

And the same tests are randomly erroring on Squeak-4.4 as well ... and of course sometimes these guys pass and the test fails with an unexpected pass ...

I will have to wire out these tests completely for the Squeak platform until this SSL flakiness is fixed

dalehenrich added a commit that referenced this issue Oct 18, 2014
…omly ... just ignore `SSL error, -5` errors
@dalehenrich
Member Author

clean sheet ... with SSL test failure ignored

dalehenrich added a commit that referenced this issue Oct 18, 2014
@krono
Collaborator

krono commented Oct 19, 2014

Oh dear. Thanks for noticing. I noticed inconsistencies in the SSL code lately.

Btw: not helping at all, but error code -5 is a general error. We used to constantly get those on Windows.
The “correct” fix is to fix SqueakSSL. I have no clue yet what the practical solution is.

@dalehenrich
Member Author

If SSL isn't stable, perhaps we should consider reverting to curl?

@krono
Collaborator

krono commented Oct 19, 2014

What is used on Pharo?
(because the SqueakSSL plugin (which throws -5) is the same there.)

@dalehenrich
Member Author

Hmmm, Pharo's using zinc/zodiac for https and presumably it uses the squeakSSL plugin, but perhaps the internal error handling is different? ... I recently renamed MetacelloSqueakPlatform>>downloadZipArchive:to:, but other than that haven't made any other changes.

In retrospect, the random failures reported in Issue #286 are probably due to random SSL failures and that bug report dates back to the time when we switched to using WebClient ...

@krono
Collaborator

krono commented Oct 19, 2014

@dalehenrich while trying to reproduce the problems on OS X, I remembered that SqueakSSL on OS X is not yet able to verify certificates, resulting in errors like this:

Loading 1.0.0-beta.32.15 of ConfigurationOfMetacelloPreview...
...RETRY->BaselineOfMetacello
...RETRY->BaselineOfMetacello
gofer repository error: 'GoferRepositoryError: No certificate was provided(code: -1)'...ignoring
...FAILED->BaselineOfMetacello
failed ensureMetacello using 'http://smalltalkhub.com/mc/dkh/metacello/main' : 'Could not resolve: BaselineOfMetacello [BaselineOfMetacello] in /Users/tobias/dev/metacello/testing/dalehenrich-builderCI-c334a7a/builds/travisCI/package-cache github://dalehenrich/metacello-work:c1da7f8098b8759f806aaa952817709b767ed590/repository ERROR: ''GoferRepositoryError: No certificate was provided(code: -1)'''...retrying

Probably curl on Linux/OS X is the feasible option. I don't know why things work on Pharo, though. Probably the error handling in Gofer is different there…

@krono
Collaborator

krono commented Oct 19, 2014

PS: I just ran the test on OS X and they don't fail. Back to random :(

@dalehenrich
Member Author

The problem with SSL and certificates on the mac is a "known problem with mavericks" ... to get the ssl certificates registered you have to hit the site using safari first ... safari and the mavericks ssl client will do the right thing ... I recall having to do something like this to get curl to work on the mac ...

The SSL error I am seeing now occurs randomly ...

Gofer has nothing to do with the error handling ... I'm assuming that when WebClient makes its https calls it is not handling the ssl error the same way that Zinc/Zodiac on the web does ... perhaps the Pharo https code does an internal retry if it hits an error like this ...

I don't quite understand why Metacello is not retrying this error, but the fact is I'm not seeing this kind of behavior on any other platform and I'm consistently (randomly) seeing this happen on all Squeak versions including Squeak-4.4 and the only big difference is WebClient ... and I didn't see this issue prior to introducing WebClient ... so if there is a bugfix I would think the fix needs to go into WebClient ... and if no fix is possible there, then I guess we'll have to revert back to curl?

@krono
Collaborator

krono commented Oct 19, 2014

On 19.10.2014, at 21:02, Dale Henrichs [email protected] wrote:

The problem with SSL and certificates on the mac is a "known problem with mavericks" ... to get the ssl certificates registered you have to hit the site using safari first ... safari and the mavericks ssl client will do the right thing ... I recall having to do something like this to get curl to work on the mac ...

Apart from that, SSL cert verification is just not implemented.

See SqueakSSLTest>>testFaceBookAPI or SqueakSSLTest>>testYahooOpenID

The SSL error I am seeing now occurs randomly ...

:/

Gofer has nothing to do with the error handling ... I'm assuming that when WebClient makes its https calls it is not handling the ssl error the same way that Zinc/Zodiac on the web does ... perhaps the Pharo https code does an internal retry if it hits an error like this ...

Probably.

I don't quite understand why Metacello is not retrying this error, but the fact is I'm not seeing this kind of behavior on any other platform and I'm consistently (randomly) seeing this happen on all Squeak versions including Squeak-4.4 and the only big difference is WebClient ... and I didn't see this issue prior to introducing WebClient ... so if there is a bugfix I would think the fix needs to go into WebClient ... and if no fix is possible there, then I guess we'll have to revert back to curl?

I think reverting back to curl for the time being is the viable option.
I also think it is more a problem in SqueakSSL (the image part, not the Plugin part) than
in WebClient, but this needs more investigation. Probably @frankshearar has an idea how to
proceed?

@dalehenrich
Member Author

Well I was hoping to put my Metacello work to bed today/tomorrow and move on to other things so if you guys could figure out which direction you wanted to go and give me a pull request ... I'm not even sure how much usage the github stuff gets in Squeak ... the "flakey" SSL code was released on the master branch at the end of August so perhaps it is better to stick with the flakey ssl rather than revert to curl which is known to be problematic ...

@timfel
Contributor

timfel commented Oct 20, 2014

The thing with curl is that that won't work for Windows users. Using WebClient and the fix in dalehenrich/filetree#130 allows me to have Babelsberg/S load on Windows without issues.

@dalehenrich
Member Author

I guess I'm inclined to have WebClient work (then I can release Metacello) ... agree that curl is a step backward ... but if random github failures are a problem then we're stuck between a rock and a hard place:(

@dalehenrich
Member Author

I think that builderCI is using a fairly old squeak vm ... for pharo, the vm is downloaded and installed along with the images ... squeak is just downloading the image, changes and sources files ...

That may explain the SSL issues (that only travis sees occasionally) ...

builderCI supports Squeak-4.3 onward, so perhaps builderCI should download an officially supported vm?

@krono
Collaborator

krono commented Oct 20, 2014

Interestingly, builderCI uses the same VM (and SqueakSSL plugin) the pharo image uses. There should not be an issue; there is no blessed vm for Squeak atm. For the all-in-one images, we use whichever official Cog VM is current at the time, which is naturally different from the Pharo VM.

So, http://mirandabanda.org/files/Cog/VM/ is the right source for the VM.

@dalehenrich
Member Author

Well, Pharo2.0 and beyond are using the vm that is downloaded from the site[1], so they are using different vms...

Pharo provides some fairly convenient download scripts for picking up the stable vm ... I'm a little gun-shy about downloading experimental vms and being the one to find problems with them ... I get enough of that with the trunk:) and I really don't want to have to go all the way back to builderCI and propagate changes back out every time something goes haywire...

Dale

[1]
https://github.com/dalehenrich/builderCI/blob/master/build_client_image.sh#L81

@krono
Collaborator

krono commented Oct 20, 2014

I understand.

My preference would be to (as it currently is) use the Pharo VM which would be used for pharo images

@dalehenrich
Member Author

I don't quite understand ... there is code (the reference that I linked) where the pharo vm is mapped to the downloaded pharo vm ... squeak downloads do not download a vm, so they are using the pretty ancient vm that was built into builderCI a long time ago ...

I'd prefer a pull request from you where you verify that things are working than for me to change the system to download pharo vms for squeak only to find out that they don't work correctly ...

@krono
Collaborator

krono commented Oct 20, 2014

Ah, now I get it. Sorry, I am slow today. You are perfectly right. Let's see what I can craft…

@dalehenrich
Member Author

I'm still fighting problems raised when I did the simple step of merging tim's filetree 3 liner this morning ... fixing bugs in builderCI that now appear to propagate out to a Metacello bug? (in an old version of Metacello?) ... on Pharo-1.3? ... and all I did was "change the vm being used by Pharo1.3":)

@dalehenrich
Member Author

Course I had a flat tire in the middle of the day as well:) So I'm a bit cranky:)

@krono
Collaborator

krono commented Oct 20, 2014

Understandable.

@dalehenrich
Member Author

Just out of curiosity have you seen the SSL -5 errors coming from github
download errors? or have they only been showing up in travis runs?

@krono
Collaborator

krono commented Oct 20, 2014

It's not comparable, as I currently only use OS X and travis runs linux.
I should dig out a linux box and try.

I have not encountered a -5 on my local machine…

@dalehenrich
Member Author

I just don't know where the bug lies: vm or webClient ssl support... both
of these are variables when comparing to pharo ...

@dalehenrich
Member Author

turns out that Pharo-1.3 does not work properly with the newer pharo vms
... so it's not a metacello error:)

@dalehenrich
Member Author

oddly enough Pharo1.2 works with newer pharo vms (at least it doesn't hit
the same bug as Pharo1.3 does:)

dalehenrich added a commit that referenced this issue Oct 20, 2014
Merge last of bugfixes for 1.0.0-beta.32.16.

I'm going to go ahead and merge this pull request despite the existence of Issues with SSL on Squeak platforms: Issue #305 and Issue #303 ... the final resolution for these particular issues is still up in the air and I don't want to delay the release of 1.0.0-beta.32.16 any longer ... will push bugfixes for those issues once we get things characterized correctly
@krono
Collaborator

krono commented Oct 21, 2014

Just found out Zinc/Zodiac does not verify certs at all…
So much for SqueakSSL/OSX failing…

@dalehenrich
Member Author

They must be handling those errors internally ... along with possibly Error
-5?

@krono
Collaborator

krono commented Oct 22, 2014

This error is just not checked.
SqueakSSL does:

  1. connect
  2. verify

Zodiac does:

  1. connect


I'll now investigate on Linux.
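
As a rough illustration of the difference described above (connect and verify versus connect only), here is a minimal sketch using Python's standard `ssl` module rather than SqueakSSL or Zodiac. The function names `verifying_context` and `non_verifying_context` are made up for this example, and the mapping onto the two Smalltalk libraries is an assumption, not a description of their actual code.

```python
import ssl


def verifying_context() -> ssl.SSLContext:
    """Connect *and* verify: the default context checks the
    certificate chain and the hostname (assumed analogue of the
    two-step SqueakSSL behaviour)."""
    return ssl.create_default_context()


def non_verifying_context() -> ssl.SSLContext:
    """Connect only: certificate verification is switched off
    (assumed analogue of the one-step Zodiac behaviour)."""
    ctx = ssl.create_default_context()
    # check_hostname has to be disabled before verify_mode can be
    # set to CERT_NONE, otherwise the ssl module raises ValueError.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    return ctx
```

A client built on the second context accepts any certificate, which would be consistent with a certificate-related failure never surfacing on that code path.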

@dalehenrich
Member Author

Ah, now we are cooking with gas:)

@krono
Collaborator

krono commented Oct 28, 2014

I had the -5 once on linux but every time I enabled logging, it disappeared.
Apparently, a retry can be the practical solution here.
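
A retry of that kind might look roughly like the following Python sketch. It is not the WebClient or Metacello code: `TransientSSLError` and `with_retry` are hypothetical names, and the choice of three attempts and of treating only code -5 as retryable is an assumption made for illustration.

```python
import time


class TransientSSLError(Exception):
    """Hypothetical stand-in for an error such as 'SSL error, code: -5'."""

    def __init__(self, code):
        super().__init__(f"SSL error, code: {code}")
        self.code = code


def with_retry(operation, attempts=3, delay=0.0, retry_codes=(-5,)):
    """Call operation() up to `attempts` times.

    Only errors whose code is in `retry_codes` are retried; any other
    error is re-raised immediately. If every attempt fails, the last
    error is re-raised.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except TransientSSLError as err:
            if err.code not in retry_codes:
                raise
            last_error = err
            if attempt < attempts:
                time.sleep(delay)  # optional pause before the next try
    raise last_error
```

Restricting the retry to a known-flaky error code keeps real failures, such as a missing certificate (code -1 in the log earlier in this thread), visible instead of hiding them behind repeated attempts.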

@krono
Collaborator

krono commented Nov 3, 2014

I tweaked WebClient. Can you retry?

dalehenrich added a commit that referenced this issue Nov 3, 2014
@dalehenrich
Member Author

testing as we speak

@dalehenrich
Member Author

clean sheet on travis ... no problems on this end ...

@dalehenrich dalehenrich removed this from the 1.0.0-beta.32.17 milestone Nov 10, 2014
@krono
Collaborator

krono commented Nov 10, 2014

🎈
