Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Puppet highlighter: Fix over-greedy regexp detection #978

Merged
merged 1 commit into from
Jun 16, 2016

Conversation

nfagerlund
Copy link
Contributor

The regex token was allowing /regexps/ to start after the word node or after
any non-word character. This ends up being too greedy! For example, take this
common code, using variable interpolation in a string that represents a file
path:

file { "${master_config_dir}/.ssh":
  ensure => file,
  owner  => 'jenkins',
  group  => 'jenkins',
  mode   => '0600',
}

The highlighter will start a regexp at /.ssh, in the middle of the string, and
continue for however many lines it takes to reach another slash.

So instead, let's be more conservative about where we might find a regexp. I
suggest:

  • After node.
  • After one of the following characters: ~=([{,
    • This catches usage in variable assignment, the =~/!~ operators,
      function calls, case statement and selector blocks, arrays, and data type
      objects that accept parameters (Pattern[/.../], etc.).
  • After => (and the lesser-used +>) for hashes and resource attributes.
  • At the start of a line, for, e.g., the LHS of the in operator. (I'm not 100%
    sure we should cover this case for a simple highlighter like Prism, and it's
    the one I'd most expect to cause problems later, but... I think it's ok...)

This commit appears to fix the worst of the mid-string blowouts.

The `regex` token was allowing /regexps/ to start after the word `node` or after
any non-word character. This ends up being too greedy! For example, take this
common code, using variable interpolation in a string that represents a file
path:

	file { "${master_config_dir}/.ssh":
	  ensure => file,
	  owner  => 'jenkins',
	  group  => 'jenkins',
	  mode   => '0600',
	}

The highlighter will start a regexp at `/.ssh`, in the middle of the string, and
continue for however many lines it takes to reach another slash.

So instead, let's be more conservative about where we might find a regexp. I
suggest:

- After `node`.
- After one of the following characters: `~=([{,`
	- This catches usage in variable assignment, the `=~`/`!~` operators,
	  function calls, case statement and selector blocks, arrays, and data type
	  objects that accept parameters (`Pattern[/.../]`, etc.).
- After `=>` (and the lesser-used `+>`) for hashes and resource attributes.
- At the start of a line, for, e.g., the LHS of the `in` operator. (I'm not 100%
  sure we should cover this case for a simple highlighter like Prism, and it's
  the one I'd most expect to cause problems later, but... I think it's ok...)

This commit appears to fix the worst of the mid-string blowouts.
@nfagerlund
Copy link
Contributor Author

Pinging @Golmote b/c you submitted the original highlighter component.

@Golmote
Copy link
Contributor

Golmote commented Jun 16, 2016

Hi @nfagerlund!
Your PR looks ok to me, and it does not break any existing tests. Merging now. ;)

@Golmote Golmote merged commit 105be25 into PrismJS:gh-pages Jun 16, 2016
@nfagerlund
Copy link
Contributor Author

yaaaay, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants