Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Codeblock Syntax Highlighting using Pygments #1012

Closed
wants to merge 7 commits into from

Conversation

Rohitth007
Copy link
Collaborator

@Rohitth007 Rohitth007 commented Apr 27, 2021

This PR implements syntax highlighting for markdown code blocks by creating a padded "box" which would
visually separate as code. Currently, only looks good in dark theme.

Commit 1: themes: codehilite: Adding Pygments style to the palette.

  • This commit uses Pygments internal dictionaries and Pygments
    Material Style and adds it into the palette after the Screen is
    initialized. This is done inside add_codehilite_style and called
    in Controller.__init__.

Commit 2: boxes: codehilite: Implement syntax highlighting functionality.

  • Basic syntax highlighting functionality was implemented in
    soup2markup by extracting the css_style of each span tag in the
    soup and using it as urwid style attributes. The remaining
    NavigableStrings are given a style attribute of w which is the
    whitespace css class in Pygments.

Commit 3: boxes: codehilite: Create a visual "box" by padding whitespaces.

  • This commit moves the previous code into parse_codeblock, a static
    method which creates a visual "box" by:
    • Finding the longest line using code_soup.text.
    • Appending span text into markup
    • Adding code_indent part of the NavigableString to markup.
    • Adding right_padding to each line.
    • Making sure multiple empty lines are padded using padded_line.
    • Adding the remaining NavigableStrings not contributing to line
      breaks into markup.
  • The NavigableString is added in parts and not all at once because
    though it would produce the same output, it would later help while
    implementing hl_lines preventing the case where half the previous
    line is highlighted along with the line expected.

Commit 4: boxes: codehilite: Implement padding for Plain Codeblocks.

  • Plain Codeblock have to be treated differently as they don't come in
    the usual multiple span's in a pre tag package. The whole code
    is a NavigableString which does not start with \n.
  • Hence, if this String has multiple lines it is split and padded
    accordingly.

Commit 5: boxes: codehilite: Restrict padding to avoid wrapping.

  • This commit sets a limit to how much padding is to be added. Many a
    times most of the code has smaller lines and there is one line which
    is very long which causes all the lines to wrap and this makes the
    "box" look very messy. To avoid this a MAX_NO_WRAP_WIDTH constant
    of 80 is used which is a reasonable limit for in a full screen
    no autohide mode.

@zulipbot zulipbot added the size: L [Automatic label added by zulipbot] label Apr 27, 2021
@neiljp
Copy link
Collaborator

neiljp commented Apr 27, 2021

@Rohitth007 This doesn't run for me - you don't seem to have added the dependency information in a commit? I could guess, but please add it as a separate commit for now based on what you have installed locally.

@Rohitth007
Copy link
Collaborator Author

@neiljp Strange, can I know what the error was because I haven't installed anything separately. Pygments installs as a dependency to pudb

@Rohitth007 Rohitth007 force-pushed the codehilite branch 2 times, most recently from 2997db6 to cdd0a40 Compare April 28, 2021 14:03
@Rohitth007
Copy link
Collaborator Author

Currently I have added the colors into the palette after the screen is initialized. I think it's better to do it inside run.py like how monochrome is generated.

@Rohitth007
Copy link
Collaborator Author

I have added Pygments as a dependency into setup.py and also fixed the edge case for old messages without code tag.

@Rohitth007
Copy link
Collaborator Author

@zulipbot add "feedback wanted"

@zulipbot zulipbot added size: XL [Automatic label added by zulipbot] and removed size: L [Automatic label added by zulipbot] labels Jun 5, 2021
@zulipbot zulipbot added size: L [Automatic label added by zulipbot] and removed size: XL [Automatic label added by zulipbot] labels Jun 5, 2021
Copy link
Collaborator

@neiljp neiljp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Rohitth007 This works much better for me now 👍

I can still observe some bugs, which might be clearer where they come from once you add some tests and if you can perhaps simplify the padding algorithm: (these are more/only visible somewhere after the latter 3 commits)

  • See first line of last message by 'Mehdi Hamidi' in #test here>codeblock (user list is disrupted)
  • See message by 'Vishwesh JainKuniya' in #test here>code on July 31 2017 (bottom line content is absent)

I think the first three commits are close to merging as they are, pending at least the following:

  • given we're modifying the behavior for code, I'd like to see the soup2markup test updated for this
  • using the pygments: namespace to avoid accidental style name conflicts
  • improving the commit title/text to match the styles we have (eg. see the requirements lines we use for dependencies)
  • clarifying what the 'old' vs 'new' messages categories are
  • I'm not sure what we can do about inline code styles, but we likely want them to match in some way at least

zulipterminal/ui_tools/boxes.py Outdated Show resolved Hide resolved
zulipterminal/ui_tools/boxes.py Outdated Show resolved Hide resolved
zulipterminal/config/themes.py Outdated Show resolved Hide resolved
max_line_len = len(max(code_text.split("\n"), key=lambda x: len(x)))

line_len = 0
for code_element in code_soup.contents:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic here looks very complex, and I can't tell if it's doing the correct thing. Can you loop over lines instead?

Copy link
Collaborator Author

@Rohitth007 Rohitth007 Jun 20, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@neiljp No that is not possible, we have to parse element wise. Sometimes that element can contain only one character only. I think if we split lines using code_text we would loose out on tag information(css class, element). I can write a docstring explaining the codeblock format?

@neiljp neiljp added PR awaiting update PR has been reviewed & is awaiting update or response to reviewer feedback and removed feedback wanted labels Jun 6, 2021
@zulipbot zulipbot added size: XL [Automatic label added by zulipbot] and removed size: L [Automatic label added by zulipbot] labels Jun 22, 2021
@Rohitth007 Rohitth007 changed the title [WIP] Add Codeblock Syntax Highlighting using Pygments Add Codeblock Syntax Highlighting using Pygments Jun 22, 2021
@Rohitth007 Rohitth007 requested a review from neiljp June 22, 2021 15:57
@Rohitth007
Copy link
Collaborator Author

Rohitth007 commented Jun 22, 2021

@neiljp
PR updated, tests added and ready for another review.
I have simplified the code a bit and added comments and doc-strings to help understand the HTML structure.

Minor addition: Used Pygments style for inline code

@Rohitth007 Rohitth007 added PR needs review PR requires feedback to proceed and removed PR awaiting update PR has been reviewed & is awaiting update or response to reviewer feedback labels Jun 22, 2021
Copy link
Collaborator

@neiljp neiljp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Rohitth007 I'm focusing on the first three first, since they should be able to be merged together first, possibly with the new last commit if we reorder them?

The tests are a good improvement, and also document what tags are doing (see minor inline comment).

My biggest concern for the first 3 (and maybe last one?) is the style behavior, including at different depths - both ensuring they're consistent with what we have now (monochrome; see inline comment) and have good contrast with the surrounding text (like now). Things like python, js or bash tagged code look colored great, but with no language tag it just looks like regular text, not a code block - and so it doesn't stand out from the surrounding text that well. I understand we expect to have theme-specific pygments styles, but we want it to keep working well close to how it is for now.

I'd also like to have a reference for that change in syntax (<code>) in a comment - various parts of the community may know the origin of that if it's not obvious in discussions/logs.

In the next iteration, improving the commit title/text to match the styles we have will make it easier and quicker to merge commits (I think I mentioned that above)

zulipterminal/config/themes.py Outdated Show resolved Hide resolved
Comment on lines 1402 to 1374
"<code><span>def</span> <span>func</span><span>():</span>\n"
' <span class="pg">print</span><span>()</span>\n'
"\n"
"<span>class</span> <span>New</span><span>:</span>\n"
' <span>name</span> <span>=</span> <span>"name"</span>\n'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only one class here looks odd - there are many others in code from the server?

For the second test you could have a smaller code example, if you're concerned about space.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to test multiple empty lines and spaces, hence the longer tests. I don't think it matters what "text/syntax" is used for testing here as pygments is not directly involved.

[
("pygments:w", "This is a\n Plain\n\n Codeblock\n"),
],
id="codehilite-plain-codeblock",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be useful to clarify "plain" - no language?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have used plain-text-codeblock instead

@Rohitth007 Rohitth007 force-pushed the codehilite branch 4 times, most recently from 01ea552 to b1203ea Compare July 15, 2021 16:03
@Rohitth007 Rohitth007 added PR needs review PR requires feedback to proceed and removed PR awaiting update PR has been reviewed & is awaiting update or response to reviewer feedback labels Jul 16, 2021
@Rohitth007 Rohitth007 force-pushed the codehilite branch 2 times, most recently from 6575c89 to e65b748 Compare July 16, 2021 06:25
This commit adds Pygments dependency (>=2.81) to `setup.py`.
NOTE: Pygments is also a dependency for Pudb.
This commit uses Pygments internal dictionaries and PygmentsStyles
to add it into the palette right after theme generation. For this a
META dict is defined in each theme file which contains syntax
highlighting styles and overrides.

The conversion to urwid compatible theme format is done inside
`add_pygments_style` and called in `generate_theme`. The function
does the following:
* Checks if any overrides are necessary.
* Checks if any parent pygment style can be used i it is not defined
* Adds new_style to urwid_theme.

Tests added.
* REQUIRED_META added to test completeness.
This commit creates a custom 16 color syntax theme for use in all
16 color depth ZT themes.

Tests updated.
This commit extracts the `code_soup` to be further parsed for css
styles. Each `codehilite` associated tag has a `pre` tag wrapped
around a `code` tag which contains `span` tags and NavigableStrings.
In some old messages, the `code` tag doesn't exist which is also
dealt with.

Basic syntax highlighting functionality was implemented in
soup2markup by extracting the `css_style` of each `span` tag in the
soup and using it as urwid style attributes. The remaining
NavigableStrings are given a style attribute of `w` which is the
whitespace css class in Pygments.

Tests amended.
@Rohitth007 Rohitth007 force-pushed the codehilite branch 2 times, most recently from d9d6855 to 6ae3ac3 Compare August 9, 2021 16:18
This commit moves the previous code into `parse_codeblock`, a static
method which creates a visual "box" by:
* Finding the longest line using `code_soup.text`.
* Appending `span` text into markup
* Adding the NavigableStrings not contributing to line
breaks into markup.
* Adding `right_padding` to the end of each line.
* Making sure multiple empty lines are padded using `padded_line`.
* Adding `code_indent` part of the NavigableString to markup at the
start of a new line.

The NavigableString is added in parts and not all at once because
though it would produce the same output, it would later help while
implementing `hl_lines` preventing the case where half the previous
line is highlighted along with the line expected.

Tests amended.
This commit sets a limit to how much padding is to be added. Many a
times most of the code has smaller lines and there is one line which
is very long which causes all the lines to wrap and this makes the
"box" look very messy. To avoid this a `MAX_NO_WRAP_WIDTH` constant
of 70 is used which is a reasonable limit for in a full screen
no autohide mode.
This comment introduces the same style used for plain codeblocks
for inline code.

Tests amended.
@zulipbot
Copy link
Member

Heads up @Rohitth007, we just merged some commits that conflict with the changes your made in this pull request! You can review this repository's recent commits to see where the conflicts occur. Please rebase your feature branch against the upstream/main branch and resolve your pull request's merge conflicts accordingly.

@zulipbot
Copy link
Member

Heads up @Rohitth007, we just merged some commits that conflict with the changes your made in this pull request! You can review this repository's recent commits to see where the conflicts occur. Please rebase your feature branch against the upstream/main branch and resolve your pull request's merge conflicts accordingly.

@neiljp
Copy link
Collaborator

neiljp commented Aug 23, 2022

@Rohitth007 I'm thinking of close this, since some of these commits were merged already. I'd suggest you open a new PR with the remaining work after fetching and rebasing, so we can focus on those.

That said, I know it wasn't clear what to do regarding long lines and we ended up with various options based on spaces, which wraps badly. Do you think a widget-based approach might work better?

neiljp pushed a commit to neiljp/zulip-terminal that referenced this pull request Mar 21, 2023
This comment introduces the same style used for plain codeblocks
for inline code.

Test amended.

Adapted from the last commit of zulip#1012 by neiljp.
neiljp pushed a commit to neiljp/zulip-terminal that referenced this pull request Mar 21, 2023
This ensures similar rendering of the two code formats, by using the style
used for plain codeblocks for inline code: "pygments:w".

Test amended.

Adapted from the last commit of zulip#1012 by neiljp.
neiljp pushed a commit that referenced this pull request Mar 22, 2023
This ensures similar rendering of the two code formats, by using the style
used for plain codeblocks for inline code: "pygments:w".

Test amended.

Adapted from the last commit of #1012 by neiljp.
@neiljp
Copy link
Collaborator

neiljp commented Mar 22, 2023

@Rohitth007 Thanks for all the work on the syntax highlighting! 👍

I've just merged #1346 with a version of the last commit from here, which sets the inline style to match codeblocks, though I adjusted the existing theme style name rather than removing it, since we still use it elsewhere.

We discussed flattening the edges of the code blocks in a separate PR, but I don't believe anything came of that, so I've opened #1347 for that remaining element. That points back here for reference, so I'm going to close this now, since I recall that was the remaining element we hadn't merged.

@neiljp neiljp closed this Mar 22, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: colors/styles/themes area: message rendering has conflicts PR needs review PR requires feedback to proceed size: XL [Automatic label added by zulipbot]
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants