Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

source_id should default to having a non-empty value #93

Closed
cboulay opened this issue Dec 10, 2024 · 4 comments · Fixed by #94
Closed

source_id should default to having a non-empty value #93

cboulay opened this issue Dec 10, 2024 · 4 comments · Fixed by #94

Comments

@cboulay
Copy link
Contributor

cboulay commented Dec 10, 2024

Having an empty source_id is undesirable. We want to improve the naive experience by enabling auto-reconnect and also not causing the inlet to crash when the outlet is closed -- which seems to happen when source_id="".

I propose that we change to source_id: typing.Optional[str] = None and, when it is None we generate a new source_id that is built from a hash of name, type, channel_count, nominal_srate, and channel_format.


I would like to include the hostname in the hash to prevent collisions. However, while a manually-created StreamInfo object is most likely going to be used to initialize a StreamOutlet where grabbing the hostname is easy, it could also be used to initialize a StreamInlet when the hostname is unknown.

Note: A valid StreamInlet can be created from a manually-created StreamInfo, rather than one returned by a stream-resolution function, if the name, type, channel_count, and channel_format are all known.

Instead, when creating the outlet, we can examine the source_id and if it matches perfectly with the expected hash then we further add the hostname to the hash.

@cboulay cboulay changed the title source_id should default to having a legitimate value source_id should default to having a non-empty value Dec 10, 2024
@chkothe
Copy link
Member

chkothe commented Dec 10, 2024

I'd suggest giving this proposal some more time for refinement. There's a foundational assumption baked into LSL that, if there's a source ID set (eg in an inlet) and it matches some stream on the network then the streams are certified (by whoever set the id) to be one and the same stream. If we always set it to a value, then use cases where two instances of some data source are running that were resolved with whatever query the user chose to identify them (e.g., most recently created, some other desc/ property) then the moment one of them gets lost, the inlet can silently flip over to the other source without the user knowing, and so they can't be sure they're not silently getting the wrong data from that point onward.

Obviously it's not great that one can get a LostError, but then that's meant to give the application an opportunity to deal with the case that the source went away and the source didn't provide an (actually) unique identifier to ensure its recovery (if one wants to have a program that will survive even that, at the risk of getting the wrong data, the application could always manually re-resolve the stream and eg print a warning etc), or in many cases it may be necessary for the user to step in to handle it.

One thing one could potentially do is have a config value that always assigns a (pseudo-)unique id from a set of fields, so that power users who have data sources that they know are unique in their setup but which don't set id's (and are thus not recoverable) can still have a robust setup (I suppose in a future incarnation one could also make that more fine-grained than a true/false, e.g., it could be a dictionary that allows the power users to do surgery on the ids of their streams, like say forcibly overriding the id for a stream named BioSemi to '123' even though that one may currently set a different id that may have other uniqueness characteristics, say if the app developer used a fresh random number each time).

edit: though actually I misread this as a git issue on liblsl rather than pylsl: with the latter one could potentially make it explicit enough to the user in documentation or logging etc that they don't fall into that trap, like always logging that a putative default of source_id=True caused a pseudo-unique id being generated and scolding them for not setting it to either None (if they really don't want one because they can't guarantee uniqueness) or a string of their own choosing...

@cboulay
Copy link
Contributor Author

cboulay commented Dec 10, 2024

make it explicit enough to the user in documentation or logging etc

Gah, I forgot to update the docstring in the PR!

@cboulay
Copy link
Contributor Author

cboulay commented Dec 10, 2024

actually I misread this as a git issue on liblsl rather than pylsl

I don't think this is appropriate on liblsl. Developers using the C / C++ API are somewhat forced to reckon with this concept.

In Python-land however, there are many more users blindly using source_id="" which IMO is a worse experience than having a non-empty default that is mostly unique.

Power users with multiple streams with the same name-type-channelcount-srate-format can use source_id="" to get the original behaviour back or, preferably, set the source_id to something truly uniquely identifying in their setup.

@chkothe
Copy link
Member

chkothe commented Dec 12, 2024

Agreed that in Python one can have a developer UX that strikes maybe a better middle ground between convenience and strictness, and I saw in your PR that you're printing in that case. I guess that's enough of a nag to get users to choose one by themselves (or nulling it) rather than enduring that showing up in their console...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants