This repository has been archived by the owner on Jan 30, 2020. It is now read-only.
TestScheduleMachineOf fails when enable_grpc is set to true. The failure is intermittent, and the logs show the following:
fleetd[94]: transport: http2Client. notifyError got notified that the client transport was broken EOF.
fleetd[94]: DEBUG agent.go:87: HeartbeatJobs tick
fleetd[94]: DEBUG unit_state.go:222: Pushing UnitState(ping.0.service) to Registry: &unit.UnitState{LoadState:"loaded", ActiveState:"active", SubState:"running", MachineID:"10c94346e797450d94d3e9b0f657cae4", UnitHash:"b4f04719a20575e450899ac57097fc6d8ac12cc4", UnitName:"ping.0.service"}
fleetd[94]: DEBUG unit_state.go:222: Pushing UnitState(pong.0.service) to Registry: &unit.UnitState{LoadState:"loaded", ActiveState:"active", SubState:"running", MachineID:"10c94346e797450d94d3e9b0f657cae4", UnitHash:"a38421afebf0062e438f25c03c5f24e8202a3a70", UnitName:"pong.0.service"}
fleetd[94]: DEBUG unit_state.go:222: Pushing UnitState(ping.0.service) to Registry: &unit.UnitState{LoadState:"loaded", ActiveState:"active", SubState:"running", MachineID:"10c94346e797450d94d3e9b0f657cae4", UnitHash:"b4f04719a20575e450899ac57097fc6d8ac12cc4", UnitName:""}
fleetd[94]: DEBUG unit_state.go:222: Pushing UnitState(pong.0.service) to Registry: &unit.UnitState{LoadState:"loaded", ActiveState:"active", SubState:"running", MachineID:"10c94346e797450d94d3e9b0f657cae4", UnitHash:"a38421afebf0062e438f25c03c5f24e8202a3a70", UnitName:""}
fleetd[94]: ERROR registrymux.go:166: Retry to connect to new engine: dial tcp 172.18.1.1:50059: getsockopt: connection refused
fleetd[94]: DEBUG mux.go:56: HTTP GET /fleet/v1/machines?alt=json
fleetd[94]: ERROR registrymux.go:166: Retry to connect to new engine: dial tcp 172.18.1.1:50059: getsockopt: connection refused
fleetd[94]: ERROR registrymux.go:166: Retry to connect to new engine: dial tcp 172.18.1.1:50059: getsockopt: connection refused
And finally:
--- FAIL: TestScheduleMachineOf (fleet.conf=[enable_grpc=true]) (31.66s)
scheduling_test.go:132: failed to find 6 active units within 15s (last found: 2)
To reproduce this issue, several conditions must be met.
First, the functional tests must run on a host with little free memory. With plenty of free memory, the failure is very hard to reproduce.
Even in such an environment, the functional tests must usually run at least 3 or 4 times before the error appears. A single run is not sufficient.
Running only one particular test is normally not sufficient either. Run the whole suite without the '--run' option to make the failure easier to reproduce.
Sometimes TestScheduleReplace fails instead of TestScheduleMachineOf, though I'm not sure the two failures have the same cause.
After a long investigation, I think this issue is a regression from ecb121a ("registry/rpc: use simpleBalancer instead of ClientConn.State()"), but I'm still not completely sure. Let me prepare a fix.
When gRPC is turned on, TestScheduleMachineOf sometimes fails because the
engine becomes unreachable, with the following error messages:
====
transport: http2Client. notifyError got notified that the client transport was broken EOF.
ERROR registrymux.go:166: Retry to connect to new engine: dial tcp 172.18.1.1:50059: getsockopt: connection refused
ERROR registrymux.go:166: Retry to connect to new engine: dial tcp 172.18.1.1:50059: getsockopt: connection refused
ERROR registrymux.go:166: Retry to connect to new engine: dial tcp 172.18.1.1:50059: getsockopt: connection refused
====
This must have been a regression from commit ecb121a ("registry/rpc: use
simpleBalancer instead of ClientConn.State()"). Remove the additional
check with IsRegistryReady, to avoid the occasional case where the
engine becomes unreachable.
Fixes coreos#1712
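One way an extra readiness check can turn a transient transport error into a lasting outage is sketched below. This is a deliberately simplified toy model, not fleet's code: `conn`, `client`, `doGated` and `doUngated` are hypothetical, and whether IsRegistryReady misbehaved in exactly this way is an assumption. The connection fails once (like the "client transport was broken EOF" message) and then recovers. The gated client clears a cached ready flag on error and, in this model, never sets it again, so it refuses every later call. The ungated client simply retries and succeeds.

```go
package main

import (
	"errors"
	"fmt"
)

// conn is a hypothetical connection that fails a fixed number of
// times, as after a broken transport, and then recovers.
type conn struct{ failuresLeft int }

func (c *conn) call() error {
	if c.failuresLeft > 0 {
		c.failuresLeft--
		return errors.New("transport is closing")
	}
	return nil
}

// client caches a readiness flag that is cleared on error but never set
// again: a stand-in for a readiness check that can go stale.
type client struct {
	c     *conn
	ready bool
}

// doGated refuses to call unless the cached flag says the connection is ready.
func (cl *client) doGated() error {
	if !cl.ready {
		return errors.New("registry not ready")
	}
	if err := cl.c.call(); err != nil {
		cl.ready = false
		return err
	}
	return nil
}

// doUngated always attempts the call and lets the connection recover.
func (cl *client) doUngated() error {
	return cl.c.call()
}

func main() {
	gated := &client{c: &conn{failuresLeft: 1}, ready: true}
	ungated := &client{c: &conn{failuresLeft: 1}, ready: true}
	for i := 0; i < 3; i++ {
		fmt.Println("gated:", gated.doGated(), "ungated:", ungated.doUngated())
	}
}
```

In this model, removing the gate is enough for the client to recover once the underlying connection is usable again, which matches the intent of dropping the IsRegistryReady check.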