Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix:Polynomial-time regular expressions #2814

Merged
merged 2 commits into from
Jul 25, 2023
Merged

Fix:Polynomial-time regular expressions #2814

merged 2 commits into from
Jul 25, 2023

Conversation

marcotc
Copy link
Member

@marcotc marcotc commented Apr 25, 2023

This PR fixes two cases where there was a possibility of polynomial-time being required to process regular expressions in ddtrace.

These regular expressions have been modified but still match their desired input.

The main changes were:

  • Use Possessive Quantifiers (*+ in this case): this means that * will not allow backtracing after it has matched a character, even if this means that the complete regex match will fail.
  • Reduce the character matching set from "anything" (.) to "not ]" ([^\]]) when matching IPv6 addresses. This is import because the next character match in the regular expression is ], which means . might try to match it but not [^\]].

@marcotc marcotc requested a review from a team April 25, 2023 18:28
@github-actions github-actions bot added core Involves Datadog core libraries tracing labels Apr 25, 2023
Copy link
Contributor

@delner delner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fine. Where are the unit tests for these functions? Just want to make sure we're already covering their expected inputs/outputs (such that we're not breaking behavior.)

Secondarily, does it make sense to put any benchmarks/perf tests around these as well? To ensure additional alterations in the future would make regressions in performance visible.

@marcotc marcotc changed the title Fix:Polinomial-time regular expressions Fix:Polynomial-time regular expressions Jun 26, 2023
@marcotc
Copy link
Member Author

marcotc commented Jun 26, 2023

Looks fine. Where are the unit tests for these functions? Just want to make sure we're already covering their expected inputs/outputs (such that we're not breaking behavior.)

I double checked, and yes, these 2 cases are appropriately covered by unit testing.

Secondarily, does it make sense to put any benchmarks/perf tests around these as well? To ensure additional alterations in the future would make regressions in performance visible.

Any regular expression can be pathological if it includes a polynomial-time pattern.
So, technically, all regexs would have to undergo performance testing.
Because it's possible know if a regular expression takes polynomial-time to process by analyzing the pattern itself, we use CodeQL to protect us.
CodeQL recently introduced polynomial-time regular expression protection for Ruby, thus the reason these regular expressions have existed in our code base, despite CodeQL being enabled for a while.

From now on, it should not be possible to commit polynomial-time regular expressions without CodeQL catching it.

@marcotc marcotc requested review from delner and a team June 26, 2023 23:13
@marcotc marcotc added this to the 1.13.0 milestone Jul 20, 2023
@marcotc marcotc merged commit defd6bc into master Jul 25, 2023
@marcotc marcotc deleted the regex-fix branch July 25, 2023 21:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Involves Datadog core libraries tracing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants