Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

build(gql_websocket_link): upgrade web_socket_channel to v3.0.1, rxdart to >=0.26.0 <= 0.28.0 #475

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

agufagit
Copy link
Collaborator

@agufagit agufagit commented Nov 6, 2024

upgrade web_socket_channel to v3.0.1, rxdart to >=0.26.0 <= 0.28.0

not sure what this code does, just copied it from pr #462, it makes sink errors go away though, I can't handle StreamSink is closed error even with try...catch block

channel.stream.asBroadcastStream().listen(
  ...
)

@knaeckeKami
Copy link
Collaborator

I believe there is a regression (or breaking change) in web_socket_channel which breaks the auto reconnect. Or, at least, the tests.
The implementation in graphql has similar issues:
zino-hofmann/graphql-flutter#1463

@agufagit
Copy link
Collaborator Author

agufagit commented Nov 7, 2024

looks like completeError is causing it, how do you handle error thrown by completeError?

00:01 +1 -1: test/gql_websocket_link_test.dart: WebSocketLink transport ws sub-protocol Auto reconnect [E]
  LikeCloseEvent(code: 1001, reason: , wasClean: null)
  dart:async                                                        _Completer.completeError
  package:gql_websocket_link/src/graphql_transport_ws.dart 663:13   _ConnectionState.errorOrClosed.<fn>
  package:gql_websocket_link/src/graphql_transport_ws.dart 82:32    TransportWsEvent.execute
  package:gql_websocket_link/src/graphql_transport_ws.dart 1143:13  _Emitter.emit
  package:gql_websocket_link/src/graphql_transport_ws.dart 717:66   _ConnectionState._startConnecting.<fn>.<fn>
  package:gql_websocket_link/src/graphql_transport_ws.dart 873:32   _ConnectionState._startConnecting.<fn>.<fn>
  package:stream_channel                                            _GuaranteeSink.close
  package:web_socket_channel/adapter_web_socket_channel.dart 91:36  new AdapterWebSocketChannel.<fn>.<fn>
  ===== asynchronous gap ===========================
  dart:async                                                        _StreamImpl.listen
  package:gql_websocket_link/src/graphql_transport_ws.dart 779:36   _ConnectionState._startConnecting.<fn>
  ===== asynchronous gap ===========================
  dart:async                                                        _CustomZone.registerUnaryCallback
  package:gql_websocket_link/src/graphql_transport_ws.dart 698:18   _ConnectionState._startConnecting.<fn>
  package:gql_websocket_link/src/graphql_transport_ws.dart 886:7    _ConnectionState._startConnecting
  package:gql_websocket_link/src/graphql_transport_ws.dart 900:20   _ConnectionState.connect
  package:gql_websocket_link/src/graphql_transport_ws.dart 986:34   _Client.subscribe.<fn>
  package:gql_websocket_link/src/graphql_transport_ws.dart 1054:7   _Client.subscribe
  package:gql_websocket_link/src/graphql_transport_ws.dart 1246:26  TransportWebSocketLink.request.<fn>
  dart:async                                                        _ForwardingStream.listen
  test/gql_websocket_link_test.dart 1529:29                         _testLinks.<fn>
  ===== asynchronous gap ===========================
  dart:async                                                        _CustomZone.registerUnaryCallback
  test/gql_websocket_link_test.dart 1482:16                         _testLinks.<fn>

@knaeckeKami
Copy link
Collaborator

completeError does not throw an error, it just makes the future of the completer complete with an error. so it depends on where the future of the completer is used.

The logs look like the come from toString() of the LikeCloseEvent class. probably logged by the log() function passed to TransportWsClientOptions. might be useful for debugging.
I don't have resources to dig deeper into this for now though.

Also, see #435, If someone wants to take ownership of gql_websocket_link and is committed to keeping it open and community-driven, I am willing to add them as maintainer.

@agufagit
Copy link
Collaborator Author

agufagit commented Nov 8, 2024

it looks like a bug from "dart or web_socket_channel or web_socket" linux implementation, somehow completeError throws same error twice, and one of them is not caughtable by try/catch block, while the other is caughtable by try/catch block

I made a workaround, but I'm not sure what the exact cause of this is, I don't really have time to dig into it

yes, I can take ownership of gql_websocket_link, but I do get quite busy sometimes

@knaeckeKami
Copy link
Collaborator

Hint:

It might be easier to check what's going on if you add

   TransportWsClientOptions(
              log: print,

to _testLinks when debugging.

Yes, it definitely is a change in behavior in we_socket_channel ^3.0.0.

However, it suspect this is a real bug in our implementation which is just now uncovered.
It might be related to #471.

I can reproduce it locally as well on macos, it is not only happening on linux.

I think there's a race condition somewhere. If I change the GraphQLSocketMessageDecoder and GraphQLSocketMessageEncoder to be synchronous (without FutureOr), and remove the await before encoding and decoding everywhere, then the test completes successfully.

I am not sure if this workaround reliably fixes this issue or just makes the test complete successfully, can you elaborate a bit on that?

Might also be worth checking if this goes away with the changes in #471.

I also think that we could drastically reduce the complexity of the state logic by making the encoding and decoding synchronous, but that would be a breaking change.
Then again, I doubt a lot of users use custom encoders/decoders which are asynchronous anyway.

@agufagit
Copy link
Collaborator Author

agufagit commented Nov 9, 2024

I just tried #471, it doesn't resolve this.

Here is how I reached to the conclusion `completeError' throws the same error twice

in void errorOrClosed(void Function(Object errOrEvent) cb) function, add a log

closed: (Object event) {
  listening.forEach((unlisten) => unlisten());
  print("errorOrClosed closed");
  cb(event);
},

During "Auto Reconnect" test, this log is printed twice, both calling codes are in Future<_Connected> _startConnecting() function

errorOrClosed((errOrEvent) {
        options.log?.call("errorOrClosed $errOrEvent");
        connecting = null;
        isOpen = false;
        connectionAckTimeout?.cancel();
        queuedPing?.cancel();
        print(_comp.isCompleted); // this returns true during "Auto Reconnect" test
        if (!_comp.isCompleted) {
          denied(errOrEvent);
        }

        if (errOrEvent is LikeCloseEvent && errOrEvent.code == 4499) {
          // close event is artificial and emitted manually, see `Client.terminate()` below
          socket.sink.close(4499, "Terminated");
          onError = null;
          onClose = null;
        }
      });
bool acknowledged = false;
late final StreamSubscription _messageSubs;
_messageSubs = socket.stream.listen(
   (Object? msg) async {
     ...
     errorOrClosed(_completer.completeError);
   }
);

The first calling code doesn't do anything. The 2nd calling code will throw uncaughtable LikeCloseEvent(code: 1001, reason: , wasClean: null) error and then continue on to finish the test process successfully (you can put print statements in subscribe(...) function to see execution flow, this time LikeCloseEvent error is caught by try/catch block).

My workaround is to complete normally, and catch the first uncaughtable LikeCloseEvent(code: 1001, reason: , wasClean: null) error in _completer.future.catchError(...) , then handle that error.

I don't mind if we change encoder/decoder to be synchronous because I'm using websocket connection in a separate isolate, but I think it should stay asynchronous if it has even the slightest chance to avoid screen flickering. My first rule of business, "don't sacrifice user experience for a better developer experience".

I think the trend in the future, will be to move away from json and grpc data format to cbor format. So encoder/decoder probably will be used in a couple of years

@knaeckeKami
Copy link
Collaborator

I'm using websocket connection in a separate isolate

That's a much better solution anyway IMO! Receiving the bytes on the main isolate, decoding them to a String, then copying that string to a new isolate, decoding the string to JSON on the new isolate and then copying that JSON back to the main isolate is a lot of overhead compared to running the whole websocket in a long-lived isolate that just copies the parsed responses.

My first rule of business, "don't sacrifice user experience for a better developer experience".

In principle, I agree. It makes sense to bite the bullet in low-level packages to move complexity away from applications. If if we don't have the resources to ensure correctness however, it might be better to keep the code simpler.

Here is how I reached to the conclusion `completeError' throws the same error twice

Seems plausible.

@agufagit
Copy link
Collaborator Author

you want me to merge this in, or wait until I have time to find the cause? I can dig into it in mid december

@knaeckeKami
Copy link
Collaborator

If you're confident that there are no regressions induced by the update, go ahead ;)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants