[builds][Slow] openshift pipeline build Pipeline with maven slave [It] should build and complete successfully #13984
Dupe of #13983 |
/cc @gabemontero |
OK, grepping the logs for the build config:
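Something along these lines (the log path and exact pattern here are placeholders, not the literal command from my run):

```sh
# Look in the master log for POSTs hitting the buildconfig's instantiatebinary
# endpoint; the log path and pattern below are illustrative only.
grep -i 'instantiatebinary' /tmp/openshift/origin-master.log | grep -i 'POST'
```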
In other words, there are none of the POSTs that would have occurred if the build config instantiatebinary call had bubbled up to the master's API. So, things to pursue:
I'll start a PR for 2), and start asking around about 1). |
Another failure in the extended tests today. An excerpt from the build logs shows some networking issues.
The Jenkins logs also show repeated connection exceptions being thrown.
|
In chatting with @oatmealraisin and digesting the data he noted, of particular interest to me was an error that is exactly like the error seen previously. I agree the Java-side exceptions @oatmealraisin noted seem similar. We'll have to consider, when diving into this more, whether this issue is somehow Jenkins/JVM specific, or whether the Jenkins/JVM scenario is surfacing a new instability in openshift/origin. |
Possible correlation ... in debugging some new unit tests, if an API object was invalid, rather than getting a validation error, I got a TLS handshake error:
I wonder if at least for the failed |
I am seeing another flavor of this test failing; specifically, it looks like the ... I've also reproduced this locally/manually in Jenkins deployed in OpenShift. It does not happen every time, but it has happened more than once. Also, on the point of "what might have changed" ... we did update the oc client in the Jenkins image late last week (see openshift/jenkins@d48c23c). I wonder if that is at least part of the instability we've seen, at least in the cases where oc invocations are involved. @bparees wdyt ^^ UPDATE: switching ...
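As a quick sanity check of which oc client the running Jenkins pod actually carries, something like this would do (the pod name is a placeholder):

```sh
# Print the version of the oc binary bundled in the Jenkins image;
# "jenkins-1-abcde" is an illustrative pod name, not from an actual run.
oc exec jenkins-1-abcde -- oc version
```
|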
Also note, |
@ncdc could probably weigh in on |
We don't use spdy for retrieving logs. Only exec, attach, and portforward. There is a kube PR to fix an issue where --follow wouldn't stop when the container exited, but as best I can tell, it only applies to the json-file docker log driver: kubernetes/kubernetes#44406. If you start a pod that does something like print out the date every second for five or ten seconds before terminating, and then in another terminal run `oc logs mypod -f`, does it stop following, or does it hang waiting for more logs? |
The frequency of logging was similar to what you described, and it hangs waiting for more logs. In other words, the pod was done generating output (for an OpenShift build in our case), and all of the log output had been shown where --follow was employed. The build pod / container had exited as well.
|
Would you have time to try to reproduce with the smallest example possible,
like what I described above with just a simple pod?
|
Separate from the --follow item discussed here, on the debug front ... For the TLS issues I'm planning to ...
|
Sure. I've got some diagnostic ideas I just posted for the more prevalent
TLS comm errors, but when I get done with those I'll circle back and try a
simple pod.
|
@ncdc - here is what I believe is a simple pod example; details on my locally built origin follow.
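Roughly along these lines (illustrative only; the exact spec from my run isn't reproduced here, and the image and timings are assumptions):

```sh
# Create a short-lived pod that writes one line of output per second for ~10
# seconds and then exits; its logs can then be followed from another terminal.
cat <<'EOF' | oc create -f -
apiVersion: v1
kind: Pod
metadata:
  name: logtest
spec:
  restartPolicy: Never
  containers:
  - name: logtest
    image: centos:centos7
    command: ["/bin/sh", "-c", "for i in $(seq 1 10); do date; sleep 1; done"]
EOF
```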
This is at a level of master I built yesterday (with a debug commit on top). This particular pod ends up erroring out after running for a bit, but since I've seen this occur repeatedly both with pods that complete after a while and pods that error out after a while, I figured it wasn't worth debugging this sample further. I'm going to rebase / rebuild my origin in a sec from latest master to see if the error still occurs. I'll report the findings when I have them. Thanks. |
@ncdc - hey, still happens for me after rebasing this morning. Version particulars:
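(The exact output isn't reproduced here; the particulars came from the usual version commands.)

```sh
# Client/server and toolchain versions for the rebuilt origin binaries.
oc version
openshift version
go version
```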
And it is probably obvious, but to be clear, I ran:
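i.e. something along these lines, against the pod sketched above (shown here only as an illustration):

```sh
# Follow the pod's logs from a second terminal; the hang manifests as this
# command never returning after the container exits.
oc logs -f logtest
```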
thanks |
@gabemontero I haven't tested with a build pod, but I have tested with a simple centos:centos7, and the `oc logs -f` command terminates as soon as the container exits. |
@ncdc yep, there is an intermittent nature to this. I've seen it consistently on my Fedora 25 laptop but only occasionally in our extended tests on ci.openshift. Not sure how to pin down the environmental differences.
|
Perhaps I should see if I can induce a golang thread dump on the `oc logs` process on my system ... would that be of any use?
|
Definitely. `kill -3`. You may need to redirect the output of `oc logs` to
a file in case the thread dump overflows your terminal scrollback.
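A sketch of how that capture could look (pod and file names are placeholders):

```sh
# Run the log follow in the background, redirecting everything to a file so
# the goroutine dump isn't lost to terminal scrollback.
oc logs -f logtest > oc-logs-follow.txt 2>&1 &

# Once the follow appears hung, send SIGQUIT to the oc process; the Go
# runtime prints a stack dump of all goroutines before exiting.
kill -3 $!
```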
|
@ncdc here is the thread dump: Other than goroutine 72, everything else is blocked on I/O or semaphores down in golang code. The comment before the channel select in the apimachinery code was ominous at first glance:
Though that logic has not changed in a while, if I'm reading the history correctly. Here is my go version: go version go1.7.3 linux/amd64. At first blush this smells like a golang issue; in my (admittedly novice) opinion, I would contend some defensive programming is in order, and that the call to ... UPDATE (May 11) - upgraded to go1.7.5, no effect on my local runs of ... |
Older PR test and overnight failures have been pruned, but between the debug runs in #14013 and a recent overnight run that resulted in #14093, I'm not ready to close this issue out in favor of #14093 until a) we get a good week or so of runs where a failure occurs and this apparent mount issue is not in play ... though admittedly, it is entirely conceivable that these were present in the older runs and I just missed them / wasn't cognizant enough earlier on to look for them, and b) the ... |
FYI - broke the log-following discussion here out into separate issue #14148 |
The latest findings in #14013 are pointing toward a solution for this issue. If the trend in the PR continues for another day or so, I'll rebrand it as a fix. |
[Fail] [builds][Slow] openshift pipeline build Pipeline with maven slave [It] should build and complete successfully
/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/test/extended/builds/pipeline.go:149
https://ci.openshift.redhat.com/jenkins/job/test_pull_request_origin_extended/285/consoleFull#-86936742456bf4006e4b05b79524e5923
I was unable to reproduce locally.