Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MM-34926] Short-circuit link processing if the regex is non-terminal #166

Merged
merged 2 commits into from
Aug 2, 2021

Conversation

mickmister
Copy link
Contributor

@mickmister mickmister commented Jul 21, 2021

Summary

When the autolink's pattern contains .*, the process indefinitely hangs due to continuously processing an empty non-nil match. #166 (comment)

Ticket Link

Fixes https://mattermost.atlassian.net/browse/MM-34926

@codecov-commenter
Copy link

codecov-commenter commented Jul 21, 2021

Codecov Report

Merging #166 (e4d873a) into master (9f8f884) will increase coverage by 0.17%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #166      +/-   ##
==========================================
+ Coverage   41.40%   41.57%   +0.17%     
==========================================
  Files           6        6              
  Lines         669      671       +2     
==========================================
+ Hits          277      279       +2     
  Misses        375      375              
  Partials       17       17              
Impacted Files Coverage Δ
server/autolink/autolink.go 52.08% <100.00%> (+1.01%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9f8f884...e4d873a. Read the comment docs.

@mickmister
Copy link
Contributor Author

@srkgupta Do you have any other regexes that will reproduce this issue that I can add to the unit test?

@@ -108,6 +108,12 @@ func (l Autolink) Replace(message string) string {
if submatch == nil {
break
}

// The beginning of the submatch is equal to the end of the submatch here. The regex pattern is non-terminal.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this true? Can you explain what is happening here? I don't quite understand.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can also use an explanation of the if below

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@crspeller @levb

Before the loop begins, the post's message is converted to a []byte and stored in the local variable in. Another []byte is created to store the output of the replace process.

In the loop, the regexp package's FindSubmatchIndex function is called with the input array, and the first submatch found is processed. The FindSubmatchIndex function returns a []int that contains indexes of submatches. Each pair of two entries in the array represents the start and end index of each submatch (exclusive, meaning the end index will be equal to the length of the match if the start index is 0). The exit condition of the loop is when FindSubmatchIndex returns nil, which only happens when no match is found.

For the autolink's algorithm, only the first submatch matters at any given time, because the algorithm processes the submatch, and then moves its cursor forward in the in array and sees if there are more submatches to process in the next iteration of the loop. Because of this, only the first two elements in FindSubmatchIndex's return value are ever important to the autolink replace algorithm at any given time.

When the autolink algorithm is provided a regex pattern such as .*:

The first run through the loop causes the entire input string to match. The first element in FindSubmatchIndex's return value is 0 and second is the length of the message. Line 119 results in the in array being empty now, since the whole string was matched and submatch[1] == len(in) in this case. Note that there is no condition checked at the end of the loop to see if we should leave the loop for any reason. The only condition to leave is "did FindSubmatchIndex return nil?" at the beginning of the loop.

When we run through the loop again, the argument given to FindSubmatchIndex is an empty array, and it returns the value [0, 0], meaning "the match starts at index zero, and ends at index zero", which means it matched on an empty string. Since submatch is not nil here, the loop moves on, though results in a no-op for everything else below in the loop since in is empty. The loop continues and does the same calculation over and over, finding the same empty submatch from the .* pattern indefinitely.

The solution I've made here is "if a submatch is found but has no length, we have nothing to process, so we should exit the loop." A better solution may be to check if len(in) == 0 at the end of the loop.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation! I think you are right that checking the length would be better. Maybe more intuitive at the beginning of the loop?

@srkgupta
Copy link
Contributor

@srkgupta Do you have any other regexes that will reproduce this issue that I can add to the unit test?

Hi @mickmister
I am checking other regexes. I will test and let you know if I find anything else breaking the logic.

Copy link
Contributor

@srkgupta srkgupta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested the changes and the reported issue is working fine now. Tested with multiple other regexes and no other issues were found. Thanks @mickmister. LGTM

@mickmister mickmister requested review from crspeller and levb and removed request for levb July 27, 2021 14:42
Copy link
Contributor

@levb levb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉

@hanzei hanzei added the 4: Reviews Complete All reviewers have approved the pull request label Aug 2, 2021
@mickmister mickmister merged commit 4506199 into master Aug 2, 2021
@mickmister mickmister deleted the wildcard-regex-timeout branch August 2, 2021 16:01
@hanzei hanzei added this to the v1.3.0 milestone Aug 2, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
4: Reviews Complete All reviewers have approved the pull request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants