[WIP] Async/Await Prototype #70
base: master
Conversation
> multiple threading / possible concurrency errors (Future isn't threadsafe)

For which purpose do you want to use threading?

Yes, you are right that `Future` is not threadsafe, but in my opinion this is the customer's problem, not the library's, because a customer can easily shoot themselves in the foot by using one event loop across a few threads and forgetting to call `run_coroutine_threadsafe`.
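To illustrate the point about thread safety, here is a minimal sketch (not gmqtt's API; names are illustrative) of how work from a second thread has to be handed to the event loop:

```python
import asyncio
import threading

# Work scheduled onto the client's event loop from another thread must
# go through asyncio.run_coroutine_threadsafe; calling loop.create_task
# or Future.set_result from a foreign thread is not thread-safe.
async def deliver(queue: asyncio.Queue, msg: str) -> None:
    await queue.put(msg)

def worker(loop: asyncio.AbstractEventLoop, queue: asyncio.Queue) -> None:
    # runs in a different thread than the event loop
    fut = asyncio.run_coroutine_threadsafe(deliver(queue, "hello"), loop)
    fut.result(timeout=5)  # block this thread until the coroutine completes
```

This is exactly the call a multi-threaded user would have to remember, which is the foot-gun being discussed.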
gmqtt/aioclient.py
Outdated

```python
self.loop = loop
self.cbclient = inner_client
self.message_queue = asyncio.Queue()
```
This is dangerous: if your "reading" code is blocked for a long time, the queue will keep growing, producing high memory usage and eventually a memory overflow. So it would be nice to limit the queue via an option in the `__init__` method.
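A minimal sketch of the suggestion, assuming a hypothetical `maxsize` constructor option (not gmqtt's actual API):

```python
import asyncio

# Bound the message queue so a slow reader cannot grow memory without
# limit. maxsize=0 means unbounded; any positive value caps the queue.
class BoundedReceiver:
    def __init__(self, maxsize: int = 0):
        self.message_queue = asyncio.Queue(maxsize=maxsize)

    def try_enqueue(self, message) -> bool:
        try:
            self.message_queue.put_nowait(message)
            return True
        except asyncio.QueueFull:
            return False  # caller decides: drop the message or apply backpressure
```

The interesting design question is what to do when `try_enqueue` returns `False`, which is where the flow-control discussion below comes in.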
Good point. I just looked at the flow control section of the MQTT spec. The broker won't send more messages than the client's send quota, initially set to `receive_maximum`. I imagine this is so the client's buffer doesn't overflow. The quota is incremented by a PUBACK; should we send this only once the message has been delivered to the user (popped off the queue)?
There might already be something like this in the codebase. I'll have a look around.
Yes, `receive_maximum` may be a good solution, but it only works for `qos1` and `qos2`, not for `qos0`.

> The quota is incremented by a PUBACK, should we send this only once the message has been delivered to the user?

In the current implementation you may choose when the `PUBACK` is sent (by setting `optimistic_acknowledgement` to True or False): either right after we receive the message (but before the user processes it in the `on_message` callback), or after the `on_message` callback has run, using its result for the `PUBACK`.

So in my opinion it's good to keep this behaviour and let the user choose when the `PUBACK` will be sent.
Sweet, I chased down the `optimistic_acknowledgement` setting and found what you described: the PUBACK is sent before or after the `on_message` callback has run. I think I understand the use case for optimistic acknowledgements now.

Without optimistic acknowledgements, `sub1` will continue to receive messages but will never respond with a PUBACK:

```python
async with gmqtt.connect(...) as client:
    sub1 = client.subscribe(...)
    async for message in client.subscribe(...):
        print(message)
```

I used to think the lack of PUBACKs would prompt a retransmission, but that doesn't seem to be the case. Section 4.4 of the spec says:

> When a Client reconnects with Clean Start set to 0 and a session is present, both the Client and Server MUST resend any unacknowledged PUBLISH packets (where QoS > 0) and PUBREL packets using their original Packet Identifiers. This is the only circumstance where a Client or Server is REQUIRED to resend messages. Clients and Servers MUST NOT resend messages at any other time.

This means the broker won't retransmit packets like I thought. PUBACK is a notification of the client "taking ownership" of the message: the point where we are happy to let the broker forget it. To me, that means once the message is delivered to the user. Optimistic acknowledgements just disable the flow control feature of MQTT (designed to keep our buffer from overflowing) and save the broker from storing messages while they sit in our buffer; an extremely light burden in most cases.

I'm happy to add the option for optimistic acknowledgements; it should just be a matter of passing the parameter through to the standard client. I don't think I'd recommend the feature to most people though. Perhaps you can convince me otherwise? 😛
Currently, `__handle_publish_callback` is run after `on_message` has returned:

Lines 329 to 330 in 4fb92a9

```python
run_coroutine_or_function(self.on_message, self, print_topic, packet, 2, properties,
                          callback=partial(self.__handle_publish_callback, qos=2, mid=mid))
```

For non-optimistic acknowledgements, the library should send the PUBACK as soon as the message is delivered to the user. My wrapper will need a special case where `__handle_publish_callback` is passed to the `on_message` handler rather than run after it completes. Subclassing `Client` is one option.
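The acknowledge-on-delivery scheme described above could be sketched like this (hypothetical names, not this PR's actual code):

```python
import asyncio

# A queue that invokes an acknowledgement callback only when the
# consumer actually takes a message; this is the point where a
# non-optimistic PUBACK would be sent.
class AckingQueue:
    def __init__(self, maxsize: int = 0):
        self._queue = asyncio.Queue(maxsize=maxsize)

    def put_nowait(self, message, ack) -> None:
        # pair the message with the callback that sends its PUBACK
        self._queue.put_nowait((message, ack))

    async def get(self):
        message, ack = await self._queue.get()
        ack()  # acknowledge on delivery to the user
        return message
```

Passing `__handle_publish_callback` in as the `ack` argument would be the wrapper's "special case" mentioned above.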
@Mixser Thanks for the review, I agree with all of your points and corrected for most of them.

Threading

I was unsure if threading was going to be an issue or not (I have trust issues with callback functions). It sounds like threads aren't something we need to worry about, which is great.

Flow Control

You raised a good point with the infinitely-sized queue causing a possible memory overflow. MQTT has a flow-control method to prevent client buffer overflow, but it depends on sending the PUBACK response once the message is removed from the queue and delivered to the user. Here are the solutions I can think of:

Option 1 sounds preferable as this is a wrapper class, but I'm unsure if it can be implemented with the callback API you have. If not, I imagine it will be easier to implement Option 2. In my opinion, Option 3 should not be considered as it breaks the spec and could lead to qos>0 packets being lost 😬. We will also have to throw out a qos=0 packet on an overfull queue, as MQTT's flow control doesn't account for them. This might be implemented elsewhere in this project.

Subscriptions

I rethought the subscription part of the API and implemented an object-oriented alternative that better reflects the long-lived nature of subscriptions and removes the ambiguity you demonstrated. Here it's obvious which topic each subscription belongs to:

```python
async with gmqtt.connect('iot.eclipse.org') as client:
    sub = await client.subscribe('test1/#')
    sub2 = await client.subscribe('test2/#')
    message = await sub2.receive()
    await sub2.unsubscribe()  # optional
```

I also implemented the subscribe method as a context manager that unsubscribes on exit:

```python
async with gmqtt.connect('iot.eclipse.org') as client:
    async with client.subscribe('test1/#') as sub:
        print(await sub.receive())
```

And made subscription a generator, as I imagined in my initial issue #69 🎉

```python
import gmqtt.aioclient as gmqtt

async with gmqtt.connect('iot.eclipse.org') as client:
    async with client.subscribe('test1/#') as sub:
        async for message in sub:
            print(message.payload)
```

Next Steps

Additional functions

I'd like to wrap up everything connection-related into the

Is there any other functionality I'm missing out on?

Flow control

Testing

I imagine I can mock the existing client and test that my wrapper calls the right functions.

Documentation
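The mocking approach mentioned under Testing could look like this self-contained sketch (the wrapper and names here are illustrative, not this PR's actual classes):

```python
from unittest import mock

# A trivial wrapper around a callback-style client, plus a test that
# the wrapped client's method receives the forwarded arguments.
class Wrapper:
    def __init__(self, inner):
        self._inner = inner

    def subscribe(self, topic: str, qos: int = 0) -> None:
        # forward to the underlying callback client
        self._inner.subscribe(topic, qos)

inner = mock.Mock()
Wrapper(inner).subscribe("test/#", qos=1)
inner.subscribe.assert_called_once_with("test/#", 1)
```

`mock.Mock` records every call, so each wrapper method can be checked against the inner client without a real broker connection.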
```python
if "+" in level:
    raise ValueError("Single-level wildcard (+) only allowed by itself")

def match(self, topic: str) -> bool:
```
What do you think about using a regexp? For example, if a user passes a topic filter ("root/sub1/+/sub3") we can transform it into a regexp ("root/sub1/([^/]+)/sub3") and the code will be clearer.

Replacements: `+` -> `([^/]+)`, `#` -> `(.+)`
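A sketch of this suggestion, using the replacements above (note: per the MQTT spec, "sport/#" also matches "sport" itself, an edge case this simple translation does not cover):

```python
import re

# Compile an MQTT topic filter into a regular expression:
# '+' matches exactly one topic level, '#' matches the remainder.
def filter_to_regex(topic_filter: str) -> "re.Pattern":
    parts = []
    for level in topic_filter.split("/"):
        if level == "+":
            parts.append(r"([^/]+)")  # single-level wildcard
        elif level == "#":
            parts.append(r"(.+)")     # multi-level wildcard
        else:
            parts.append(re.escape(level))  # literal level
    return re.compile("^" + "/".join(parts) + "$")
```

Anchoring with `^`/`$` and escaping literal levels keeps topics like "root.sub1" from matching by accident.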
Yes, I am considering switching this to a regexp - I wonder how much faster it'll be.
I'm also considering using a tree structure to store the subscriptions, which I think will find all matching subscriptions with less work.
A tree structure will only give us a speed boost if you have hundreds of subscriptions. For fewer than ten (in my opinion) it will just add overhead for iterating over and managing the structure in memory.
But it's interesting, and you could implement it and compare it against the for-loop and regexp-matching implementations.
```python
# if over, attempt to drop qos=0 packet using `drop_policy`

# if under, add to appropriate queues:
for subscription_topic in self.subs:
```
It's all good, but we can reduce the number of indent levels and make the code more readable:

```python
for topic in filter(lambda x: x.match(message.topic), self.subs):
    for sub in self.subs[topic]:
        sub.put_nowait(message)
```

or

```python
for subscription_topic in self.subs:
    if not subscription_topic.match(message.topic):
        continue
    for subscription in self.subs[subscription_topic]:
        subscription.put_nowait(message)
```
```python
receive_maximum = receive_maximum or 65535  # FIXME: sane default? (65535 is the two-byte protocol maximum)

self.subscription_manager = SubscriptionManager(receive_maximum)
# self.message_queue = asyncio.Queue(maxsize=receive_maximum or 0)
```
Let's remove all old comments.
```python
class TopicFilter:
    def __init__(self, topic_filter: str):
        self.levels = topic_filter.split("/")
        TopicFilter.validate_topic(self.levels)
```
Do you have some reason to call `validate_topic` through the class name `TopicFilter`?
It will produce unexpected behaviour if you inherit from it: if you subclass and want to override `validate_topic`, you will also need to override `__init__`, call `super`, and call `validate_topic` manually:

```python
class B(TopicFilter):
    @staticmethod
    def validate_topic(*args):
        print("Never happens")
        return TopicFilter.validate_topic(*args)

B("my-awesome-topic/bla")  # B.validate_topic is never called
```
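A sketch of the alternative being suggested: dispatch through the instance so the subclass's override takes effect. Python resolves `self.validate_topic` via `type(self)`, which works even for staticmethods (the `TopicFilter` shown here is a simplified stand-in, not the PR's exact code):

```python
class TopicFilter:
    def __init__(self, topic_filter: str):
        self.levels = topic_filter.split("/")
        self.validate_topic(self.levels)  # late-bound: subclasses can override

    @staticmethod
    def validate_topic(levels):
        for level in levels:
            if "+" in level and level != "+":
                raise ValueError("Single-level wildcard (+) only allowed by itself")

class LoggingFilter(TopicFilter):
    calls = 0

    @staticmethod
    def validate_topic(levels):
        LoggingFilter.calls += 1  # now actually runs on construction
        TopicFilter.validate_topic(levels)
```

With `self.validate_topic(...)` in `__init__`, constructing a `LoggingFilter` invokes the subclass's version without any extra boilerplate.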
Hi! Is there still any interest in getting this PR ready to be merged? I'm currently implementing my own asyncio wrapper around this library. My main motivations are to be able to:

I use this wrapper to implement the command-response pattern using the MQTT5 response topic/correlation data headers. For 1) I'm simply tracking the

I think the two goals are common enough that this library could support its users by providing an appropriate (async) interface. I may be able to contribute some of the necessary changes either to this PR or separately if you are open to alternative approaches. I'd just like to ask your general opinion on the overall style (i.e. full asyncio wrapper as in this PR or additional
Hi, thanks for your interest in GMQTT 🎉

Yes, this is the right way to implement it, but you need to be careful about how you manage reconnection and other stuff, because there is the possibility of leaving leftovers.

It's quite tricky because even if the server implements v5, it might not support subscription identifiers at all (see

About style, @Lenka42 could you support 😉
Yeah, that's definitely something to consider. I'm using

Yes, that's another thing that could be tested when connecting. I understand that there is a potential conflict between flexibility and ease of use. Users should still be able to decide which headers/properties are sent to the broker. I also can't tell how many v5-capable brokers do or do not support subscription IDs; at least mosquitto does 😁
Work in progress - Don't merge yet!

This PR presents an alternative API that makes full use of Python's async/await feature, like other well-known Python async libraries such as websockets and aiohttp. The client connection is handled by an async context manager, and event handling is achieved with the `await` keyword instead of callback functions.

Converting callbacks to `await`-ables

Callbacks can be converted to awaitable objects, which simplifies the user's code significantly:

Before

After

I also spotted a few areas in the codebase that make heavy use of callbacks and could be simplified using this method.

How

The various callback functions are made awaitable with an `asyncio.Future` object. This is how the `on_connect` callback can be adapted.

Progress
- multiple threading / possible concurrency errors (`Future` isn't threadsafe)
- `publish` and `subscribe`
- `gmqtt.connect` arguments: Encryption, Authentication, Reconnection
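A minimal sketch of the Future-based adaptation described under "How". The `on_connect` signature here mimics gmqtt's `(client, flags, rc, properties)` callback, but the wrapper class and names are illustrative, not the PR's actual code:

```python
import asyncio

class AsyncConnect:
    def __init__(self, client, loop=None):
        self._loop = loop or asyncio.get_event_loop()
        self._connected = self._loop.create_future()
        client.on_connect = self._on_connect  # register the callback

    def _on_connect(self, client, flags, rc, properties):
        if not self._connected.done():
            self._connected.set_result(rc)  # resolve the awaitable

    async def wait_connected(self):
        # user code awaits this instead of defining a callback
        return await self._connected
```

The same pattern (create a `Future`, resolve it inside the callback, `await` it in user code) generalizes to the other callbacks listed in Progress.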