Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Line order of comments and code is not preserved #348

Closed
rileyschack opened this issue Jan 6, 2023 · 9 comments · Fixed by #375
Closed

Line order of comments and code is not preserved #348

rileyschack opened this issue Jan 6, 2023 · 9 comments · Fixed by #375
Labels
enhancement New feature or request style for issues that change the sqlfmt style
Milestone

Comments

@rileyschack
Copy link

Describe the bug

The line order of comments and code is not preserved after formatting.

  • When using leading commas, comments can be "shifted" upwards to the preceding CTE/column.
  • For short select statements, all comments are moved above the select keyword instead

To Reproduce

with table_a as (
    select
        /* Notice that this select statement can fit on a single line without comments */
        col1,
        col2, -- col2
        /* Special column */
        special_column,
    from {{ ref("table_a" )}}
)

/* Some interesting comments above a CTE with a leading comma */
, table_b as (
    select *
    from {{ ref("table_b") }}
)

select *
from table_a, table_b

Expected behavior

The columns in the table_a CTE should continue to stay on different lines to preserve the meaning of the comments. The comment above the table_b CTE should also stay above it, despite using a leading comma.

with
    table_a as (
        select 
            /* Notice that this select statement can fit on a single line without comments */
            col1,
            col2, -- col2 
            /* Special column */
            special_column
        from {{ ref("table_a") }}
    ),
    /* Some interesting comments above a CTE with a leading comma */
    table_b as (select * from {{ ref("table_b") }})

select *
from table_a, table_b

Actual behavior

Because the select statement in the table_a CTE can fit on a single line, all comments were pushed above the CTE (including in-line comments).

The comment above the table_b CTE was also pushed above the table_a CTE (presumably because of the leading comma).

with
    /* Notice that this select statement can fit on a single line without comments */
    -- col2
    /* Special column */
    /* Some interesting comments above a CTE with a leading comma */
    table_a as (select col1, col2, special_column, from {{ ref("table_a") }}),
    table_b as (select * from {{ ref("table_b") }})

select *
from table_a, table_b

Additional context
What is the output of sqlfmt --version?

❯ sqlfmt --version
sqlfmt, version 0.14.3
@tconbeer
Copy link
Owner

tconbeer commented Jan 6, 2023

Thanks for the report. I actually think this is two separate issues. I'll address leading commas in my next comment

For the column comments, I agree this is confusing, and not very desirable. I'm not sure I like the alternatives more, though? Alternatives being:

  1. Don't merge lines if they contain comments.
  2. Don't merge lines if they contain an inline comment (i.e., ok to merge across standalone comments, but not comments that trail code on a line)
  3. Don't merge lines if they contain a standalone comment (i.e., ok to merge across inline comments, like black)
  4. Don't merge lines if they contain more than one comment
  5. Something more complicated that takes the length of comments into account?

Of these options, (1) seems best to me, but that can lead to some undesirable outcomes, also.

FWIW, Black's behavior here is weird, and hard to predict, but for short code and inline comments it'll one-line everything. Generally, it's very conservative around comments:

$ black -
a = (
1 # one
+ 2 # two
)
a = 1 + 2  # one  # two
reformatted -

All done! ✨ 🍰 ✨
1 file reformatted.

$ black -
a = ( # a
1 # one
+ 2 # two
)
a = 1 + 2  # a  # one  # two
reformatted -

All done! ✨ 🍰 ✨
1 file reformatted.

$ black -
a = (
# a
1 # one
+ 2 # two
)
a = (
    # a
    1  # one
    + 2  # two
)
reformatted -

All done! ✨ 🍰 ✨
1 file reformatted.

$ black -
a = ( # a can have a longer comment that might take up a whole line
1 # one short comment
+ 2 # another comment
)
a = (  # a can have a longer comment that might take up a whole line
    1 + 2  # one short comment  # another comment
)
reformatted -

All done! ✨ 🍰 ✨
1 file reformatted.

$ black -
a = (
1 # one longer comment that will prevent this whole thing from being one-lined
+ 2 # two
)
a = (
    1  # one longer comment that will prevent this whole thing from being one-lined
    + 2  # two
)
reformatted -

All done! ✨ 🍰 ✨
1 file reformatted.

$ black -
a = (
    # a longer comment that is on its own line, inside the parens
1 # one
+ 2 # two
)
a = (
    # a longer comment that is on its own line, inside the parens
    1  # one
    + 2  # two
)
reformatted -

All done! ✨ 🍰 ✨
1 file reformatted.

@tconbeer
Copy link
Owner

tconbeer commented Jan 6, 2023

Re: leading commas, I'm wondering how special this case is? If it's just commas, we could "attach" the comment to the next token, but I'm wondering if there are other situations where we get this wrong?

@tconbeer tconbeer added style for issues that change the sqlfmt style bug Something isn't working labels Jan 9, 2023
@IanEdington
Copy link

IanEdington commented Jan 16, 2023

Came here with the same issue. For anyone dealing with this issue right now, the very bad, no good, temporary workaround that I found was to use fmt: off :(

with table_a as (
    select
        -- fmt: off
        /* Notice that this select statement can fit on a single line without comments */
        col1,
        col2, -- col2
        /* Special column */
        special_column
        -- fmt: on
    from {{ ref("table_a" )}}
)

/* Some interesting comments above a CTE with a leading comma */
, table_b as (
    select *
    from {{ ref("table_b") }}
)

select *
from table_a, table_b

becomes

with
    table_a as (
        select
        -- fmt: off
            /* Notice that this select statement can fit on a single line without comments */
        col1,
        col2,  -- col2
            /* Special column */
        special_column
        -- fmt: on
        from {{ ref("table_a") }}
    /* Some interesting comments above a CTE with a leading comma */
    ),
    table_b as (select * from {{ ref("table_b") }})

select *
from table_a, table_b

@tconbeer
Copy link
Owner

@IanEdington of the options I listed above, do you have a preference for how this should work?

@IanEdington
Copy link

I personally would prefer "Don't merge lines if they contain comments." Both the existence of the new line and the comment are extra work that indicate the writer wants to convey more context on those lines.

Writing this required me to write out 3 additional and 27 additional characters. This would indicate that these lines are a lot more complex than the code communicates.

a = ( # num apples
1 # eating time
+ 2 # bugs consume
)

Neither of these communicate the same thing as the above code block

a = 1 + 2  # num apples # eating time # bugs consume

# num apples
# eating time
# bugs consume
a = 1 + 2 

I could see the argument for doing this (if it was sql) but I still prefer the multiple lines.

a /* num apples */ in ( 1, /* eating time */  2 /* bugs consume */ )

This would be nice

a = ( # num apples
	1 # eating time
	+ 2 # bugs consume
)

@tconbeer
Copy link
Owner

Ok, I'm agreeing with you. We're definitely not doing this:

a /* num apples */ in ( 1, /* eating time */  2 /* bugs consume */ )

Will need to think through edge cases, like this:

myfunc(a, b, # comment on myfunc or b?
c,
d
)

Because both of these could make sense to me:

myfunc( # comment on myfunc or b?
    a, b, c, d
)
myfunc (
    a,
    b, # comment on myfunc or b?
    c,
    d
)

And this will also require special-casing leading commas.

TBD if this can be done with the current Line-based implementation of Comment (I hope so), or if we have to refactor so Comments are attached to Nodes.

@rileyschack
Copy link
Author

I think you need to isolate if the comment is in-line or not. I imagine that's where the complexity is added. FWIW, I'm a proponent of avoiding in-line comments unless absolutely necessary (though I can understand not everyone shares that same philosophy).

If a comment is on its own line, then it needs to stay on its own line and preserve the original line order. Then you can apply the formatting for in-line comments afterwards--which may include combining comments on multiple lines.

@tconbeer
Copy link
Owner

@rileyschack

I think you need to isolate if the comment is in-line or not. I imagine that's where the complexity is added.

No, we already do this, actually.

If a comment is on its own line, then it needs to stay on its own line and preserve the original line order. Then you can apply the formatting for in-line comments afterwards--which may include combining comments on multiple lines.

This is kind of the opposite of the direction I would have gone in, which is telling me that we probably shouldn't be messing with comments at all.

@IanEdington
Copy link

If a comment is on its own line, then it needs to stay on its own line and preserve the original line order. Then you can apply the formatting for in-line comments afterwards--which may include combining comments on multiple lines.

I 💯 agree with this. If I took the time to put a comment on a new line. I want it on a new line and I'm going to make sure it's on a new line even if I have to do something ridiculous like this

where --
    -- if the anchovy becomes deactivate the data will stop flowing
    a.is_active --
    -- this allows the a.active state to be updated
    or a.last_deactivated_at >= getdate() - interval '1 months' --
    -- when an anchovy is newly created keep updating their data for 6 months
    or b.created_at >= getdate() - interval '6 months'
;

Actual code (except we don't actually work with anchovies) in our codebase in order to get around our current autoformatter that puts all comments as inline comments regardless of how long they are. The trailing -- keeps it from doing this.

@tconbeer tconbeer added enhancement New feature or request and removed bug Something isn't working labels Jan 25, 2023
tconbeer added a commit that referenced this issue Jan 25, 2023
@tconbeer tconbeer added this to the v0.16.0 milestone Jan 26, 2023
tconbeer added a commit that referenced this issue Jan 27, 2023
* feat: revisit comments, close #348

* chore: improve unit testing and docs

* fix: allow merging if lines end in trailing inline comment

* fix: allow merging ahead of inline comments'

* fix: allow merging of blank lines after inline comments

* fix: keep inline comments inline

* fix: strip whitespace when splitting comments onto multiple lines

* fix: merge through comments if directly after standalone operator

* chore: bump primer refs

* fix: rename split param to no_tail

* fix: improve comment placement with trailing operators; self-review

* chore: bump primer ref for http
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request style for issues that change the sqlfmt style
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants