This repository has been archived by the owner on Apr 26, 2024. It is now read-only.
Require body for read receipts with user-agent exceptions #11157
#11156 forgot to mention it, but this should be restricted to user agents that contain `Android` (could use `"Android" in user_agent`).

I would also be tempted to collapse `Element` and `SchildiChat` together: `re.search("(Element|SchildiChat)/1.[012].*", user_agent)`

Note that `re.search` will match anywhere in the string (this is different to `re.match`, which matches only at the beginning). `re.search(".*Riot.*", user_agent)` can then be `re.search("Riot", user_agent)` instead.

But at that point, you may as well just write `"Riot" in user_agent` since we don't need any regex powers to do that :).

Also note that in `re.search("(Element|SchildiChat)/1.[012].*", user_agent)`, `.` matches any character, so this would also match `Element/110`, which may not be what we were after.

Out of interest, this would be what a combined pattern looks like: `r"(Riot|(Element|SchildiChat)/1\.[012]\.).*Android"`

(This also checks that it's an Android client: here's an example string from my device: `Element/1.2.2 (Linux; U; Android 9; ...; MatrixAndroidSDK_X 0.0.1)`)

(If you're not aware of this trick: I use `r"` so that I don't have to escape backslashes by writing `\\` instead of `\`.)
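A minimal sketch of the points in the comment above, using only the standard `re` module. The Element user agent is the one quoted there (including its elided `...`); `Element/110` is the hypothetical version string from the same comment.

```python
import re

# Example string quoted in the comment above; the "..." is elided there too.
ELEMENT_UA = "Element/1.2.2 (Linux; U; Android 9; ...; MatrixAndroidSDK_X 0.0.1)"

# re.search scans the whole string; re.match only tries position 0.
assert re.search("Android", ELEMENT_UA) is not None
assert re.match("Android", ELEMENT_UA) is None

# For a fixed substring, plain `in` does the same job without a regex.
assert "Android" in ELEMENT_UA

# An unescaped "." matches any character, so the loose pattern accepts
# a version string such as "Element/110".
loose = "(Element|SchildiChat)/1.[012].*"
assert re.search(loose, "Element/110") is not None

# Escaping the dots rejects it, and the combined pattern also requires
# "Android" to appear later in the string.
combined = r"(Riot|(Element|SchildiChat)/1\.[012]\.).*Android"
assert re.search(combined, "Element/110") is None
assert re.search(combined, ELEMENT_UA) is not None
```

The raw-string prefix on `combined` is what lets `\.` be written with a single backslash.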
Ooh! Good catch on the `Android` part. So maybe we should wrap that whole thing and do something like:

Does that look right? Should we make the pattern a module-level constant to prevent excess calls to compile?

This will exclude a tiny number of weird builds, like `Element dbg/1.1.8-dev` or `SchildiChat[f]/1.2.2.sc44`, but they're sufficiently uncommon.
That seems right to me. Perhaps we ought to have a few test cases to ensure we don't mess this up big time.
I can't remember when I read this, but iirc you're better off using `re.compile` once or not at all: `re.match` and friends have an internal cache for compiled patterns, which `re.compile` bypasses (or did at the time).

My personal preference is indeed to make it a module-level constant and guarantee that it's compiled only once.
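One way the module-level constant and "a few test cases" might look, as a sketch only. The names `OLD_ANDROID_CLIENT` and `is_old_android_client` are hypothetical and not taken from the PR; the pattern is the combined one suggested earlier in this thread, and every user-agent string except the first is invented for illustration.

```python
import re

# Compiled once, at import time, and reused by every call below.
OLD_ANDROID_CLIENT = re.compile(
    r"(Riot|(Element|SchildiChat)/1\.[012]\.).*Android"
)


def is_old_android_client(user_agent: str) -> bool:
    """Return True if the user agent matches the exception pattern."""
    return OLD_ANDROID_CLIENT.search(user_agent) is not None


# The first string is the example quoted earlier in the thread.
assert is_old_android_client(
    "Element/1.2.2 (Linux; U; Android 9; ...; MatrixAndroidSDK_X 0.0.1)"
)
# Newer minor version, non-Android platform, and the "Element/110" pitfall.
assert not is_old_android_client("Element/1.3.0 (Linux; U; Android 9)")
assert not is_old_android_client("Element/1.2.2 (iPhone; iOS 15.0)")
assert not is_old_android_client("Element/110 (Linux; U; Android 9)")
```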
This one makes me a little suspicious: does the `[f]` perhaps stand for F-Droid?

Maybe we ought to be nice and allow some wiggle room:
pattern = re.compile(r"(?:Element|SchildiChat)[^/]*/1\.[012]\.")
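As a quick check of what that wiggle room buys, here is the same pattern run against the unusual build names quoted elsewhere in this thread. `Element/1.3.0` is a hypothetical newer version added for contrast.

```python
import re

# Pattern copied from the comment above.
pattern = re.compile(r"(?:Element|SchildiChat)[^/]*/1\.[012]\.")

# "[^/]*" absorbs suffixes such as " dbg", "[f]" or ".Beta[f]" before the slash.
assert pattern.search("Element dbg/1.1.8-dev")
assert pattern.search("SchildiChat[f]/1.2.2.sc44")
assert pattern.search("SchildiChat.Beta[f]/1.2.0.sc42-test1")

# Plain names still match, and newer minor versions still do not.
assert pattern.search("Element/1.2.2")
assert not pattern.search("Element/1.3.0")
```

Note that this pattern on its own does not check for `Android`.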
I checked again, and all the `[f]` and `[g]` SchildiChats in the logs are version 1.2.2 or later, so they can be excluded from this pattern. There was one `SchildiChat.Beta[f]/1.2.0.sc42-test1` which would presumably break, but again, not worth the complexity for that one weirdo with an old test build :)
Right, I was wondering about the periods and whether those (and other various parts of the string) had to be exact too.
What's the rationale behind using `re.search` vs. `re.compile`? It looks like `re.search` searches for substrings by default while `re.compile` can match exactly as written, right?

Perhaps the weird builds could just be specifically white-listed if push comes to shove and if there aren't too many of them.
Thanks for pointing that out!
I think you might mean `re.match` where you say `re.compile`.

To use a regular expression like `a.*b`, the `re` module has to first build a "finite state machine". That machine does all the work (working out if there's a match; where it is; what the capture groups captured, ...). Creating the machine has a cost to it. After it's built, the machine can be reset and used again, without paying that build cost.

Python offers `re.compile` so you can build the state machine once and re-use it, and that's what @callahad proposes above.
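To make the distinction concrete, a small sketch with the `a.*b` example: `re.compile` only builds the pattern object, while the method called on it (`search`, `match` or `fullmatch`) decides where a match is allowed. The test strings are made up.

```python
import re

# Building the pattern object does not match anything yet.
pattern = re.compile(r"a.*b")

# The same compiled object is reused for every call below.
assert pattern.search("xxaYYbxx") is not None   # anywhere in the string
assert pattern.match("xxaYYbxx") is None        # must start at position 0
assert pattern.match("aYYbxx") is not None
assert pattern.fullmatch("aYYbxx") is None      # must cover the whole string
assert pattern.fullmatch("aYYb") is not None

# The module-level functions take the pattern string and find the same match.
module_level = re.search(r"a.*b", "xxaYYbxx")
assert module_level.group() == pattern.search("xxaYYbxx").group()
```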
Thanks @DMRobertson! Makes a lot of sense. I do see now that `re.search` was the incorrect choice of regex function. I had worked out the difference between `re.search` and `re.match`, but you explained the difference between `re.search`/`re.match` and `re.compile`, which is what I was looking for.

To summarize (to the extent of my knowledge): with `re.compile`, the `re` module builds the finite state machine once and it doesn't have to be built again (but its state can be changed when called), while each call of `re.match` includes the building of a finite state machine (like `re.compile`, then `pattern.match`), taking up runtime/memory.

"at the time?" Like during recent testing? (Apologies in advance if I'm asking too many questions about this.)
I think @reivilibre meant "at the time I read the article".
FWIW there's https://docs.python.org/3/howto/regex.html#module-level-functions which writes:
https://stackoverflow.com/a/61603344/5252017 has more to say. Apparently the cache size is currently 512 patterns.
I see. That's helpful!