Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: internal/FormatWriter simplify two regexen; fix pipe char bug #1924

Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -368,7 +368,9 @@ class FormatWriter(formatOps: FormatOps) {
tupleOpt.fold(text) {
case (pipe, indent) =>
val spaces = getIndentation(indent)
getStripMarginPattern(pipe).matcher(text).replaceAll(spaces)
getStripMarginPattern(pipe)
.matcher(text)
.replaceAll(s"\n${spaces}$$1")
}
}

Expand Down Expand Up @@ -729,13 +731,30 @@ object FormatWriter {
}
}

private val trailingSpace = Pattern.compile("\\h+$", Pattern.MULTILINE)
/**
* No Unicode is necessary in the three (3) Pattern.compile() below.
* scalafmt parser will have already have objected.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are you saying that nothing other than space and tab are allowed by scalac or scalameta parser that precedes formatting?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It shouldn't be allowed according to specs.

A cursory examination of the scalameta sources led to this. So, it matches.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After another long day of creating test cases and trying to be perverse, it turns out
that trailing whitespace with Unicode whitespace is allowed in comments. As cited,
it is rejected in non-comment code.

I am about to do a commit which handles, tests, & succinctly (I hope) documents.

The discussion here helps.

* https://github.com/scala/scala/blob/2.11.x/spec/01-lexical-syntax.md
* gives the recognized scala whitespace characters.
* "Whitespace characters. \U0020 | \U0009 | \U000D | \U000A"
* (original text has lowercase u. U substituted here to allow scalafmt
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this comment relevant to the code? :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had thought so, but that section is superseded now. After I commit, let me know if
I can improve the new text. I try to get enough across so that I can remember six months
down the road and so that a colleague can read it without complaining about "War & Peace".

* to format this file.)
*
* Of the characters which the parser will pass, only space (\u0020)
* and tab (\u0009) are horizontal whitespace.
* Save execution time by not checking what the character could not be.
*
* */
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be */ or **/ instead of * */?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Being a visitor and it not being my usual coding style, for consistency I did a
"monkey see, monkey do" of several other comments in the same file.

Yes, the trailing "* */" is not a variant I have seen in recent memory,

That comment is re-worked in the commit I am about to submit, using my
usual "//" style. If the concept of the comment is good, I can adapt the
comment style to anything you suggest. Small dragons.


private val trailingSpace =
Pattern.compile("[\\x20\\t]+$", Pattern.MULTILINE)

private def removeTrailingWhiteSpace(str: String): String = {
trailingSpace.matcher(str).replaceAll("")
}

private val leadingAsteriskSpace =
Pattern.compile("(?<=\\n)\\h*(?=[*][^*])")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain why allowing match on multiple asterisks is equivalent? I tried yesterday, there was a test which failed but looks like for you it didn't.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took the requirement that " *X" gets replaced and " **" (double asterisk) does not as
"Do not even think about breaking this" lore. The match in the capture group
should match anything but what would be a second asterisk [^*].

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, was looking at the code on my phone, didn't see the [^*]. never mind.

\n vs \\n, any difference?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

\n vs \\n, any difference?

thanks for adding tests. this question above is the last one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re: \n vs \\n, any difference?

Pardon me for being concrete, no disrespect intended. The concrete answer
is more of my inability to organize my thoughts above that. I have spent
hours, days, or, more probably weeks, being hosed by exactly this issue.
The recurrence interval between episodes of my being hosed is just slightly
longer than my memory of the solution. Mea culpa!

It is my understanding that for 'normal' strings \n is always used and gets translated
to one standard Java two-octet Character.

In s-interpolated strings (and possibly/probably other interpolations) things get
a bit more confusing, at least for me. I believe that a \n gets translated as for
a simple string. In \\n the first backslash quotes the second backslash, which
then becomes \n which Pattern.compile interprets as though the original
string had been \n.

So, both work in interpolated strings, unless you have evidence or experience to
the contrary. I tend to prefer the single \n because it has fewer characters and transformations to parse.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re: \n vs \\n, any difference?

So, both work in interpolated strings, unless you have evidence or experience to
the contrary. I tend to prefer the single \n because it has fewer characters and transformations to parse.

like \\s, \\h or other special escape sequences, i was curious whether \\n is interpreted by Pattern as anything different than simple 0xA. clearly, \n is directly substituted as 0xA so pattern only sees that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re: tests

I found StripMargin.stat and added tests for only the align aspect. I did not address
the no align case, which is a weasel, driven by limited time.

I had expected to find tests for leading and trailing spaces. I could not find anything relevant
after a number of trials: using find for filenames and/or grepping for "whitespace" and
combinations.

I did find a Unicode.stat and could slip in a case for leading and trailing horizontal
whitespace containing Unicode there. It would seem more integrated to have
it in a leadingWhitespace.spec if such a thing exists. Would, IMHO, make it easier
for future visiting devos, such as myself, to find.

Off the top of your head, do you have any clues or suggestions.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Off the top of your head, do you have any clues or suggestions.

I added one tab in the default/Comment test, see at the bottom:

https://github.com/scalameta/scalafmt/pull/1911/files

you can create a dedicated whitespace test but i considered it part of handling comments and strings.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kitbellew The info you gave helped me to get to the next round. Thank you.

All tests, new & previous, pass on my system. I will check CI later.

The tests are still sensitive to some future devo removing the now invisible whitespace
especially the tabs.

I think the three regex expressions are as simple & fast as the requirements and reasonable
end use allows. I left a comment in FormatWriter.scala explaining why. Figured that might
endure long after the present conversation is forgotten.

I suspect that there are a couple of bugs remaining in scalafmt relating to tabs. I will see in
my mainline daily work if I can trick them from coming out of the dim shadows.

Unit tests prove their worth, much as they are a pain to write. The hard time I had
getting unit tests to accept Ogham marks and unicode \uMumble convinced me
that a simple "space or tab" test is all that is needed. There is nothing wrong with
"\h", just that it will spend a lot of time testing against characters which can not be
in the input. Sorry if I lead you down a loosing path.

If you like the current work, I can update & buff the commit message and change the
title.

private val leadingAsteriskSpace = Pattern.compile("\n[\\x20\\t]*\\*([^*])")

private def formatComment(
comment: T.Comment,
indent: Int
Expand All @@ -746,7 +765,9 @@ object FormatWriter {
val isDocstring = comment.syntax.startsWith("/**") && style.scalaDocs
val spaces: String =
getIndentation(if (isDocstring) (indent + 2) else (indent + 1))
leadingAsteriskSpace.matcher(comment.syntax).replaceAll(spaces)
leadingAsteriskSpace
.matcher(comment.syntax)
.replaceAll(s"\n${spaces}*$$1")
} else {
comment.syntax
}
Expand All @@ -756,9 +777,11 @@ object FormatWriter {
@inline
private def getStripMarginPattern(pipe: Char) =
if (pipe == '|') leadingPipeSpace else compileStripMarginPattern(pipe)

@inline
private def compileStripMarginPattern(pipe: Char) =
Pattern.compile(s"(?<=\\n)\\h*(?=[$pipe])")
Pattern.compile(s"\n[\\x20\\t]*(\\${pipe})")

private val leadingPipeSpace = compileStripMarginPattern('|')

}
14 changes: 13 additions & 1 deletion scalafmt-tests/src/test/resources/default/Comment.stat
Original file line number Diff line number Diff line change
Expand Up @@ -125,7 +125,7 @@ object a {
/*

/**
* comment
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did you remove this because you create a separate test case for that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly. The case seemed to be multiplexed in a non-obvious (to me way).
I de-multiplexed the concerns.

Creating a second test with a title which described its primary (and sole) concerned
seemed to be a gift to future people trying to suss this stuff.
The added complexity of a second test seemed worth the benefit.

* comment
*/

def undefined = ???
Expand All @@ -142,3 +142,15 @@ object a {
def undefined = ???
*/
}
<<< remove spaces and tabs in leading horizontal whitespace
object A {
/*
* remove spaces & 2 tabs (\t) in leading horizontal whitespace
*/
}
>>>
object A {
/*
* remove spaces & 2 tabs (\t) in leading horizontal whitespace
*/
}
25 changes: 25 additions & 0 deletions scalafmt-tests/src/test/resources/spaces/TrailingWhiteSpace.stat
Original file line number Diff line number Diff line change
@@ -0,0 +1,25 @@
maxColumn = 80
# Devos: be careful this file contains essential but invisible spaces.
<<< Remove trailing spaces from variables
object A {
val a = 1
val b = 2
}
>>>
object A {
val a = 1
val b = 2
}
<<< Remove trailing tab-blank-tab combinations from variables
object B {
val a = 1
val b = 2
}
>>>
object B {
val a = 1
val b = 2
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would you mind remove the 3 blank lines at the end?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thank you. So you have something against trailing whitespace ;-)
Guess I had expected scalafmt to remove those for me. Must not be set in
the project .scalafmt.conf or I bungled it.



66 changes: 66 additions & 0 deletions scalafmt-tests/src/test/resources/test/StripMargin.stat
Original file line number Diff line number Diff line change
Expand Up @@ -208,3 +208,69 @@ object a {
| updated_at = now()
| where id in (${audienceIds.mkString(",")})""".stripMargin
}
<<< align, pipe character: RIGHT PARENTHESIS
align.stripMargin = true
===
object a {
s"""
) update targeting_segments
) set status = '$status',
)${lastBuiltOpt.fold("")(dt => s" last_built = '${formatDate(dt)}',")}
)${sizeCondition.fold("")(cond => s" customers = $cond,")}
) updated_at = now()
) where id in (${audienceIds.mkString(",")})""".stripMargin(')')
}
>>>
object a {
s"""
) update targeting_segments
) set status = '$status',
)${lastBuiltOpt.fold("")(dt => s" last_built = '${formatDate(dt)}',")}
)${sizeCondition.fold("")(cond => s" customers = $cond,")}
) updated_at = now()
) where id in (${audienceIds.mkString(",")})""".stripMargin(')')
}
<<< align, pipe character: DOLLAR SIGN
align.stripMargin = true
===
object a {
s"""
$$ update targeting_segments
$$ set status = '$status',
$$${lastBuiltOpt.fold("")(dt => s" last_built = '${formatDate(dt)}',")}
$$${sizeCondition.fold("")(cond => s" customers = $cond,")}
$$ updated_at = now()
$$ where id in (${audienceIds.mkString(",")})""".stripMargin('$')
}
>>>
object a {
s"""
$$ update targeting_segments
$$ set status = '$status',
$$${lastBuiltOpt.fold("")(dt => s" last_built = '${formatDate(dt)}',")}
$$${sizeCondition.fold("")(cond => s" customers = $cond,")}
$$ updated_at = now()
$$ where id in (${audienceIds.mkString(",")})""".stripMargin('$')
}
<<< align, pipe character: REVERSE SOLIDUS a.k.a. backslash
align.stripMargin = true
===
object a {
s"""
\ update targeting_segments
\ set status = '$status',
\${lastBuiltOpt.fold("")(dt => s" last_built = '${formatDate(dt)}',")}
\${sizeCondition.fold("")(cond => s" customers = $cond,")}
\ updated_at = now()
\ where id in (${audienceIds.mkString(",")})""".stripMargin('\\')
}
>>>
object a {
s"""
\ update targeting_segments
\ set status = '$status',
\${lastBuiltOpt.fold("")(dt => s" last_built = '${formatDate(dt)}',")}
\${sizeCondition.fold("")(cond => s" customers = $cond,")}
\ updated_at = now()
\ where id in (${audienceIds.mkString(",")})""".stripMargin('\\')
}