Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

generic topic validation and parsing #36

Open
wants to merge 3 commits into
base: master
Choose a base branch
from
Open

Conversation

Tieske
Copy link
Contributor

@Tieske Tieske commented Nov 3, 2021

see tests for parsing examples.

@Tieske
Copy link
Contributor Author

Tieske commented Nov 3, 2021

not sure where this should go, it's now in init.lua. Should it go into client.lua maybe?

possibly every subscription could be parsed by the client, and the parsed results passed along in the event handler.

thoughts?

@Tieske Tieske force-pushed the topics branch 4 times, most recently from 5ba0510 to 525351a Compare November 4, 2021 07:54
@xHasKx
Copy link
Owner

xHasKx commented Nov 4, 2021

not sure where this should go, it's now in init.lua. Should it go into client.lua maybe?

init.lua is ok as you've added a whole-module functions.

possibly every subscription could be parsed by the client, and the parsed results passed along in the event handler.

thoughts?

Actually I'm not sure these validation functions is required at all.
I mean, for subscribe and publish packets the topic should be just a string, some broker validates it, some not, and some can even provide custom wildcards/filters in that topics. So there is no need for the client-side validation because good broker will do this for us and return appropriate SUBACK/PUBACK code.
And for custom brokers the client-side topic validation can do an obstacle for the library user.

IMO, only the last method to check topic and mask match can be frequently used in the client code - in the incoming message handler. There might be added a new client method like this:

client:add_msg_handler(topic_mask, callback) similar to the client:subscribe() method, adding not a SUBACK callback, but a PUBLISH handler called only for messages with matching topic.

@Tieske
Copy link
Contributor Author

Tieske commented Nov 4, 2021

Actually I'm not sure these validation functions is required at all.

agreed they are not, and I would not implement them in the client itself. They are just convenience functions for user-code that wants to validate user-input.

Btw; the parser function needs to compile, and the compile function needs to validate, so 3 out of the 4 are needed. So I guess the last one (to validate non-wildcarded topics) is then just for completeness sake.

@Tieske
Copy link
Contributor Author

Tieske commented Nov 4, 2021

client:add_msg_handler(topic_mask, callback) similar to the client:subscribe() method, adding not a SUBACK callback, but a PUBLISH handler called only for messages with matching topic.

yeah, I was thinking along the same lines. But in that case, who will be responsible to call the ACK on the received message?

@xHasKx
Copy link
Owner

xHasKx commented Nov 4, 2021

client:add_msg_handler(topic_mask, callback) similar to the client:subscribe() method, adding not a SUBACK callback, but a PUBLISH handler called only for messages with matching topic.

yeah, I was thinking along the same lines. But in that case, who will be responsible to call the ACK on the received message?

This is not related. I think, like in usual message handler, ack should be done explicitly like this: https://github.com/xHasKx/luamqtt/blob/master/examples/simple.lua#L38

@Tieske
Copy link
Contributor Author

Tieske commented Nov 4, 2021

exactly, but then you'd always need 2 handlers, a generic one that does the ACK, and another one matching a specific topic mask.

Still learning on the MQTT stuff; why doesn't the client send the ACK by default? why relying on user-code to do that?

@xHasKx
Copy link
Owner

xHasKx commented Nov 4, 2021

exactly, but then you'd always need 2 handlers, a generic one that does the ACK, and another one matching a specific topic mask.

I think user can use subscribe handlers in 3 ways:

  • set handlers only topic-specific (new hypothetical method client:add_msg_handler(topic_mask, callback))
  • set only one generic handler (current method client:on("message", callback))
  • set both handlers

For now I think it will be better to leave user always responsible for sending ack. And if he will use last case with handlers set in both ways and send ack twice accidentally - this will not be a MQTT protocol violation (as I remember). Maybe some brokers will close connection in this case but I think it's uncommon.

Still learning on the MQTT stuff; why doesn't the client send the ACK by default? why relying on user-code to do that?

For example, MQTT can be used to distribute some tasks between workers, and if worker can't handle some task - it may not send ack to indicate failure (in the other words, worker will not send ack untill task is successfully done). Thus the task will stay in a queue and thus can be handled by some different worker. This scenario requires a persistent mqtt session (clean=false)

I suppose there might be a client option for __init() method enabling auto-ack for all messages, but I think it should be disabled by default.

@xHasKx
Copy link
Owner

xHasKx commented Nov 4, 2021

For example, user can use topic-specific handler to execute it's logic and do ack, and at the same time also use generic client:on() message handler for logging without ack.

@Tieske
Copy link
Contributor Author

Tieske commented Nov 4, 2021

For example, MQTT can be used to distribute some tasks between workers, and if worker can't handle some task - it may not send ack to indicate failure (in the other words, worker will not send ack untill task is successfully done). Thus the task will stay in a queue and thus can be handled by some different worker. This scenario requires a persistent mqtt session (clean=false)

That makes sense, thx for the explanation

@Tieske
Copy link
Contributor Author

Tieske commented Aug 6, 2022

@xHasKx any chance we can move this forward? anything required from me?

I rebased the PR on master to it get up-to-date

.editorconfig Show resolved Hide resolved
-- @param t (string) wildcard topic to validate
-- @return topic, or false+error
function mqtt.validate_subscribe_topic(t)
if type(t) ~= "string" then
Copy link
Owner

@xHasKx xHasKx Aug 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please follow such errors checking behavior:

* passing invalid arguments (like number instead of string) to function in this library will raise exception

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since it is validation, the output of a validation function should be usable as input for an assertion. Hence this function does not throw errors itself, and returns the input value if valid.

This is exactly what I implemented for the other functions, as they should throw errors, as documented.

mqtt/init.lua Outdated Show resolved Hide resolved
mqtt/init.lua Show resolved Hide resolved
-- @param t (string) topic to validate
-- @return topic, or false+error
function mqtt.validate_publish_topic(t)
if type(t) ~= "string" then
Copy link
Owner

@xHasKx xHasKx Aug 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please follow such errors checking behavior:

* passing invalid arguments (like number instead of string) to function in this library will raise exception

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same; result is input for assertions

mqtt/init.lua Outdated
local pattern = opts.pattern
if not pattern then
local ptopic = opts.topic
if not ptopic then
Copy link
Owner

@xHasKx xHasKx Aug 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please follow such errors checking behavior:

* passing invalid arguments (like number instead of string) to function in this library will raise exception

example:
assert(value_type == "string", "expecting uri to be a string")

mqtt/init.lua Outdated
end
end
end
if not pattern:find("%(%.[%-%+]%)%$$") then -- pattern for "#" as last char
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please move that pattern string (and its related strings from compile_topic_pattern function) into the local variables in the init.lua file to look like constants as they are related very implicitly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. moved them into narrow scoped module locals.

@xHasKx
Copy link
Owner

xHasKx commented Aug 8, 2022

@xHasKx any chance we can move this forward? anything required from me?

I rebased the PR on master to it get up-to-date

@Tieske, I posted some comments (change requests) on the commits in the PR, please take a look

@xHasKx
Copy link
Owner

xHasKx commented Aug 8, 2022

And I'm still concerned about mqtt.topic_match function. It's arguments are complicated and it's doing two tasks at the same time:

  1. check that the topic matches the subscribe-topic-filter string
  2. parse part of the topic at wildcards places

IMO, we can break that into two functions, first one is just compile_topic_pattern+string.match call, and the second one can be just a helper to split a particular topic string with / char into an array (table)

What do you think?

@xHasKx
Copy link
Owner

xHasKx commented Aug 8, 2022

IMO, we can break that into two functions, first one is just compile_topic_pattern+string.match call, and the second one can be just a helper to split a particular topic string with / char into an array (table)

Actually string.match will return the words at the wildcards of the topic, so your topic_match function is just adding keys to them. Does it actually required?

@Tieske
Copy link
Contributor Author

Tieske commented Aug 8, 2022

Pushed a commit with the review comments (probably easiest to review that commit without whitespace changes).

  • the validators don't ever throw errors, see comments above. Let me know if you want me to change that behaviour anyway.
  • throw errors on bad input when matching or compiling patterns
  • updated ldoc comments
  • moved patterns into narrow scoped module locals
  • made the error messages more descriptive

@Tieske
Copy link
Contributor Author

Tieske commented Aug 8, 2022

Actually string.match will return the words at the wildcards of the topic, so your topic_match function is just adding keys to them. Does it actually required?

No, it is not required. It is just convenience to get named arguments/parameters, so user code can be more readable. I have a number of 'id' type fields, which I hate to have in code as array indices, using names is so much more convenient for that. That's the whole purpose I guess (and caching the compiled pattern).

IMO, we can break that into two functions, first one is just compile_topic_pattern+string.match call, and the second one can be just a helper to split a particular topic string with / char into an array (table)

compile + string_match would not work, since there would be no structure where the compiled pattern can be stored (cached) for future use. So that would have to be implemented in user code, and then this function would be reduced to only a string.match call, so a function solely for that would not make sense to me.

That said, I do have doubts about the return values, or the entire interface. Currently I do not like the 2 return values, makes it cumbersome to handle.
Also the function call with the topic and options, since the options table stores the cached pattern. Maybe the options table should become a class, from which a "topic" can be instantiated, that would hold all the named values. That would make the usercode more natural to retain the "class" instead of an options table.

something like;

local topic_matcher = mqtt.create_topic_matcher {
    topic = "homes/+/+/#",
    "homeid",
    "roomid",
    "varargs",
}

local fields, err = topic_matcher("homes/myhome/living/mainlights/brightness")

print(fields.homeid)       --> "myhome"
print(fields.roomid)       --> "living"
print(fields.varargs[1])   --> "mainlights"
print(fields.varargs[2])   --> "brightness"

This would enable adding a new incoming message callback per topic-pattern, where the msg parameter would already contain the parsed fields from the topic.

@xHasKx
Copy link
Owner

xHasKx commented Aug 9, 2022

@Tieske, I wrote some comments and continue review tomorrow

@xHasKx
Copy link
Owner

xHasKx commented Aug 10, 2022

@Tieske, what do you think about that form:

local topic_matcher = mqtt.topic_matcher { -- without "create_" it looks like a class constructor calling
    topic = "homes/+/+/#",
    "homeid",
    "roomid",
}

local fields, err = topic_matcher("homes/myhome/living/mainlights/brightness")

print(fields.homeid)       --> "myhome"
print(fields.roomid)       --> "living"
print(fields[1])           --> "myhome"
print(fields[2])           --> "living"
print(fields[3])           --> "mainlights"
print(fields[4])           --> "brightness"

It will simplify mqtt.topic_matcher constructor args validation so you don't have to check that amount of number-indexed keys are equals to the /+/ topic wildcards.

Another variant is to explicitly provide a key to store array of all unnamed topic words like that:

local topic_matcher = mqtt.topic_matcher { -- without "create_" it looks like a class constructor calling
    topic = "homes/+/+/#",
    "homeid",
    "roomid",
    rest = "varargs",
}

local fields, err = topic_matcher("homes/myhome/living/mainlights/brightness")

print(fields.homeid)       --> "myhome"
print(fields.roomid)       --> "living"
print(fields[1])           --> "myhome"
print(fields[2])           --> "living"
print(fields.varargs[1])   --> "mainlights"
print(fields.varargs[2])   --> "brightness"

@Tieske
Copy link
Contributor Author

Tieske commented Aug 10, 2022

It will simplify mqtt.topic_matcher constructor args validation so you don't have to check that amount of number-indexed keys are equals to the /+/ topic wildcards.

if we adjust compile_topic_pattern to return as second and third return values the number of + and # found, that check could actually be very simple.

This option:

print(fields.homeid)       --> "myhome"
print(fields.roomid)       --> "living"
print(fields[1])           --> "myhome"
print(fields[2])           --> "living"
print(fields[3])           --> "mainlights"
print(fields[4])           --> "brightness"

has as a drawback that you can't immediately tell where the varargs start in the array part. Hence I would prefer your second option;

print(fields.homeid)       --> "myhome"
print(fields.roomid)       --> "living"
print(fields[1])           --> "myhome"
print(fields[2])           --> "living"
print(fields.varargs[1])   --> "mainlights"
print(fields.varargs[2])   --> "brightness"

But here the user needs an extra field name for the varargs, to prevent namecollisions between the field names and the varargs field.

Leading me to:

print(fields.homeid)       --> "myhome"
print(fields.roomid)       --> "living"
print(fields[1])           --> "mainlights"
print(fields[2])           --> "brightness"

as the simplest version, clearly separating the known fields from the varargs, whilst making varargs size and traversal easy as the array part. And no extra user specified name required.

Comment on lines +161 to +170
function mqtt.compile_topic_pattern(t)
t = assert(mqtt.validate_subscribe_topic(t))
if t == "#" then
t = MATCH_ALL
else
t = t:gsub("#", MATCH_HASH)
t = t:gsub("%+", MATCH_PLUS)
end
return "^"..t.."$"
end
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to return the number of fields and presence of varargs;

Suggested change
function mqtt.compile_topic_pattern(t)
t = assert(mqtt.validate_subscribe_topic(t))
if t == "#" then
t = MATCH_ALL
else
t = t:gsub("#", MATCH_HASH)
t = t:gsub("%+", MATCH_PLUS)
end
return "^"..t.."$"
end
function mqtt.compile_topic_pattern(t)
t = assert(mqtt.validate_subscribe_topic(t))
if t == "#" then
return MATCH_ALL, 0, true
end
local fields, vararg
t, vararg = t:gsub("#", MATCH_HASH)
t, fields = t:gsub("%+", MATCH_PLUS)
return "^"..t.."$", fields, vararg ~= 0
end

@xHasKx
Copy link
Owner

xHasKx commented Aug 11, 2022

Leading me to:

I suggested rest = "varargs" because of that possible case:

local topic_matcher = mqtt.topic_matcher {
    topic = "homes/+/+/#",
    "homeid",
    -- "roomid", -- note only one name for +
    ...
}

Where the second + wildcard are without given name.
In your case its value from the topic will be lost.

And we have to find a proper form for the case when user don't have names for +-es at all.

Leading me to:

I suggested rest = "varargs" because of that possible case:

local topic_matcher = mqtt.topic_matcher {
    topic = "homes/+/+/#",
    "homeid",
    -- "roomid", -- note only one name for +
    ...
}

Where the second + wildcard are without given name.
In your case its value from the topic will be lost.

And we have to find a proper form for the case when user don't have names for +-es at all.

We can define some default name for the rest = "varargs" to store all +-wildcards in the root returned object and all #-wildcards into it's subfield with that default name.

Or accept that it's ok to return two values from the matcher function - first table with +-es with all values as indexes and some of them with names, and the second table with #-s as an array.
What do you think?
We can define some default name for the rest = "varargs" to store all +-wildcards in the root returned object and all #-wildcards into it's subfield with that default name.

Or accept that it's ok to return two values from the matcher function - first table with +-es with all values as indexes and some of them with names, and the second table with #-s as an array.
What do you think?

@Tieske
Copy link
Contributor Author

Tieske commented Aug 11, 2022

Where the second + wildcard are without given name.
In your case its value from the topic will be lost.

is that expected to be a common case? I’d expect "all fields named" to be the common case. We can just throw an error if there aren’t enough field names?

Or we could inject numbered names field 1 etc

@xHasKx
Copy link
Owner

xHasKx commented Aug 15, 2022

I would say, we can't predict which case will be more common. The only thing I'm sure is that the best interface is the one that hard to use in the wrong way unintentionally.

We can do one step back and ask how that topic_matcher can be used?
I mean, users start their code from the client instance - creating it, then connecting and subscribing it to some topics.
Then somewhere has to be a callback function with arrived PUBLISH message in its argument. And that message's topic can be parsed using the topic_matcher instance.
But who should call the topic_matcher instance - the library before calling callback or the user itself inside the callback code?

I mean, user can call some new method client:add_msg_handler(topic_filter, matcher_options, callback) where matcher_options is an args for the topic_matcher instance creation. Then for the each new PUBLISH messages callback will be called in that way:

callback(publish_message, topic_fields, topic_varargs_fields)

In this case for the the topic_matcher instance will be better to return 2 values - +-wildcards with optional names and #-wildcards as two separate tables (by the way, the second table also can contain user-defined names).

In the case of two tables in resulf of the topic_matcher, it's better to always set array-keys into the table and optionally add user-defined names.

(and of course users can create and use the topic_matcher instances separately, in the only one publish messages callback)

What do you think?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants