Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for Sixel images in conhost #17421

Merged
merged 21 commits into from
Jul 1, 2024

Conversation

j4james
Copy link
Collaborator

@j4james j4james commented Jun 11, 2024

Summary of the Pull Request

This PR introduces basic support for the Sixel graphics protocol in
conhost, limited to the GDI renderer.

References and Relevant Issues

This is a first step towards supporting Sixel graphics in Windows
Terminal (#448), but that will first require us to have some form of
ConPTY passthrough (#1173).

Detailed Description of the Pull Request / Additional comments

There are three main parts to the architecture:

  • The SixelParser class takes care of parsing the incoming Sixel DCS
    sequence.
  • The resulting image content is stored in the text buffer in a series
    of ImageSlice objects, which represent per-row image content.
  • The renderer then takes care of painting those image slices for each
    affected row.

The parser is designed to support multiple conformance levels so we can
one day provide strict compatibility with the original DEC hardware. But
for now the default behavior is intended to work with more modern Sixel
applications. This is essentially the equivalent of a VT340 with 256
colors, so it should still work reasonably well as a VT340 emulator too.

Validation Steps Performed

Thanks to the work of @hackerb9, who has done extensive testing on a
real VT340, we now have a fairly good understanding of how the original
Sixel hardware terminals worked, and I've tried to make sure that our
implementation matches that behavior as closely as possible.

I've also done some testing with modern Sixel libraries like notcurses
and jexer, but those typically rely on the terminal implementing certain
proprietary Xterm query sequences which I haven't included in this PR.

@DHowett
Copy link
Member

DHowett commented Jun 11, 2024

holy shit

@DHowett
Copy link
Member

DHowett commented Jun 11, 2024

James, I just derailed a team meeting to direct everyone's attention over here :D

bool _eraseCells(const til::CoordType columnBegin, const til::CoordType columnEnd);

til::size _cellSize;
std::vector<RGBQUAD> _pixelBuffer;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that I'm using an RGBQUAD here just because it was convenient for the GDI renderer, and I figured the Atlas renderer might not care what format it's given. But if this turns out to be a problem, it shouldn't be that big a deal to change to something more generic.

@DHowett
Copy link
Member

DHowett commented Jun 11, 2024

image

@zadjii-msft
Copy link
Member

okay the bar for the best PR of the year is now very, very high

Comment on lines 226 to 231
auto eraseIterator = _pixelBuffer.begin() + eraseOffset;
for (auto y = 0; y < _cellSize.height; y++)
{
std::fill_n(eraseIterator, eraseLength, RGBQUAD{});
std::advance(eraseIterator, _pixelWidth);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The for loop advances the iterator by _cellSize.height * _pixelWidth in total which is equal to the _pixelBuffer size. If the eraseOffset is greater than 0, the final loop iteration will leave the iterator past the end of the _pixelBuffer. Since the MSVC STL uses checked iterators this results in a debug assertion.

One way to fix this:

auto eraseIterator = _pixelBuffer.begin();
for (auto y = 0; y < _cellSize.height; y++)
{
    std::fill_n(eraseIterator + eraseOffset, eraseLength, RGBQUAD{});
    std::advance(eraseIterator, _pixelWidth);
}

FYI fill_n isn't super duper optimal for clearing bytes in this loop and you may be better off using memset directly. Mostly because it skips an unnecessary emptiness check. Something like this:

auto eraseIterator = _pixelBuffer.data() + eraseOffset;
for (auto y = 0; y < _cellSize.height; y++)
{
    memset(eraseIterator, 0, eraseLength * sizeof(RGBQUAD));
    eraseIterator += _pixelWidth;
}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for bringing this up. It never occurred to me that it would be a problem using std::advance passed the end of the buffer, but it makes perfect sense in retrospect. I think I only starting using it because I was getting complaints from the auditor about pointer arithmetic in some places, so I'll be happy to go back to +=. If there are still pointer arithmetic warnings I'll find another way to deal with that.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Turns out += isn't any better than std::advance since it still has the debug asserts. But I found that switching to raw pointers - and continuing to use the std::advance for incrementing - was enough to keep everyone happy. Debug build doesn't appear to assert anymore, and audit still passes.

src/terminal/adapter/SixelParser.cpp Show resolved Hide resolved
@DHowett
Copy link
Member

DHowett commented Jun 11, 2024

image

I went and dug up "xserver-SIXEL", which needed some help to build (it was last touched ~10 years ago), but... it uh works

@lhecker
Copy link
Member

lhecker commented Jun 12, 2024

Same day delivery: d7b002f

After working with this branch for a bit, I think the most important feedback I can give is that I think it'd be preferable if we shared a single shared_ptr<ImageSlice> across multiple rows:

  • GPUs have a minimum allocation granularity of usually 64KiB (hence the usage of sprite atlasses). The D2D for that is somewhat awkard. Popular texture allocation algorithms also usually work better for textures with an aspect ratio closer to 1. Not a big deal, but maybe an annoyance we can avoid?
  • More importantly, if a slice with a height of 20px gets stretched to a row height of 25px, then the first and last pixel will be aligned with the top and bottom of the row. Same for the neighboring rows and slices atop and below. This means that the "pixel density" at the edges of rows will be higher than within them. This leads to an "unevenness" in the image quality.
  • My understanding of the kitty image protocol is that it submits complete images (raw RGB, PNG, etc.). If we don't share a single buffer between rows, I imagine that the PNG support in particular would be awkward to implement.

I'm happy to help with any parts of this implementation as needed. 🙂


As an aside, I continue to think that the usage of std::function for DCS paired with the character-wise parsing is not quite ideal. To be clear, I don't think it's urgent. It's more like a hunch. I'm not basing this on any benchmarks, and what's not measured is not real (or whatever the actual saying was). 😅

Reason being, we never parse the same type of DCS simultaneously per VT parser instance, right? So, couldn't we call the parse method of each corresponding implementation directly and keep the state as a struct around forever? This would turn the dynamic into static dispatch. Additionally, it would be excellent for performance if we could pass entire strings of input to each parser (tighter = faster parsing loops). The parser could for instance indicate to the caller whether it's done parsing and at which character offset.

@j4james
Copy link
Collaborator Author

j4james commented Jun 13, 2024

Same day delivery: d7b002f

Works like a dream! It's hard to tell, because I think most of the processing time is spent in the parser, but this does seem faster to me than the GDI renderer.

I think the most important feedback I can give is that I think it'd be preferable if we shared a single shared_ptr<ImageSlice> across multiple rows:

I expected you'd probably want to rewrite a lot of the buffer management code, and you're welcome to do so, but I'm not sure this particular change makes sense, and I fear you may not understand exactly how sixel graphics are intended to work. Most importantly, the kitty image protocol is fundamentally incompatible with sixel. If you intend to implement the kitty protocol, you'll need to manage its image content separately. It works in a totally different way.

The way the hardware graphics terminals work, the screen is essentially one large bitmap, and everything writes to that bitmap. You output some text, that's drawn onto the bitmap. You output some image content, that's drawn on top of it. Because you don't want a garbled mix of text when you write over existing text, it always clears the cell that it's writing to. But that also means that text output over an area of image content, will first punch a hole in that image.

This behavior makes it easy for us to emulate the giant bitmap concept with a combination of text that is overlaid with an image only where needed. But it's important to understand that the text and image are representing one and the same thing. When you scroll the text, you're scrolling the image too. If you insert a line in the middle of the screen, the image content below that line is going to move down along with the text. This behavior just works automatically when you have image slices attached to rows.

The kitty protocol is completely different, though. Each graphics sequence generates a new image object (potentially at least), and these images are completely separate from the text (and from each other). If you insert a line in the middle of the screen, it'll shift the text down, without necessarily having any affect on images that cover that text.

And kitty images can appear both above and below the text, and can also move independently of each other. That means you need to keep track of every kitty sequence that is output, and have some sort of protocol to manage when they're deleted if you're running low on memory. With sixel you don't have this problem, because multiple sixel sequences will just keep writing to the same image buffer.

@j4james
Copy link
Collaborator Author

j4james commented Jun 13, 2024

As an aside, I continue to think that the usage of std::function for DCS paired with the character-wise parsing is not quite ideal.

I forgot to reply to this. And I definitely agree that the current implementation isn't ideal, but the performance also isn't terrible, so it's not a huge priority for me. I personally care more about the missing VT functionality than I do about speed. For me the competition is 20 year old hardware running at 19200 baud. 😀 But I certainly wouldn't object to anyone else rewriting the DCS handler. I actually thought it might come up as part of #17336.

@lhecker
Copy link
Member

lhecker commented Jun 13, 2024

[...] and I fear you may not understand exactly how sixel graphics are intended to work.

This is definitely true. In particular I'm not sure I really understand yet how the "cell size" for sixels work. After my initial, cursory reading of the PR my understanding of _imageOriginCell and _imageCursor was that the SixelParser does effectively process a rectangular image that may span multiple (potentially many) ROWs. Is that incorrect? Based on that I had assumed that it would similarly be possible to store the sixel image as a single RGBA blob shared across multiple rows.

When you scroll the text, you're scrolling the image too. If you insert a line in the middle of the screen, the image content below that line is going to move down along with the text.

I wasn't aware it worked like that! I guess we could slice up the image whenever we encounter a VT/Console scroll sequence/command - we do the same for ROWs after all (via GetScratchpadRow(), etc.). But... hmm. I think I somewhat understand the problem now! It would be great if there was something we could do, but it's non-trivial. BTW if the renderer had access to all ImageSlices at once, I suppose it could stitch them together? That would also work for me... But it's overall a minor issue, I think.

(BTW thanks for explaining the differences with the kitty protocol!)


More importantly, it'd be great if we could still add an "ID" to the ImageSlice. This isn't very important for local rendering, but for Remote Desktop for instance: If we don't properly cache sixels there, it will stream the entire RGBA contents over the internet on every frame.

One option would be to just have a static std::atomic<uint64_t> counter. Then we bump the counter whenever we call GetMutableImageSlice or construct a new instance? (Or maybe some other place - I haven't read the PR in complete detail yet.)

With this ID we can then key the ID2D1Bitmap cache. 🙂

@j4james
Copy link
Collaborator Author

j4james commented Jun 14, 2024

I'm not sure I really understand yet how the "cell size" for sixels work.

The cell size is necessary to emulate the behavior of a real sixel terminal on which the character cell is a specific size, typically 10x20 pixels. Software intended to run on those terminals would often have predefined sixel images which would be expected to occupy a fixed area of the screen. For example, an image that is 100x200 pixels would be expected to occupy exactly 10x10 character cells.

Linux terminals can't handle that, and as a result are mostly useless as terminal emulators. And to cope with terminals like that, modern sixel software is forced to query the terminal cell size and regenerate images on the fly to match whatever font the user happens to have selected at the time. No other software works this way, but for some reason people seem to think this is a good idea for terminals.

So my expectation was that at some point, someone would request we do the same thing in Windows Terminal, and we might need an option to match the behavior of Linux terminals (i.e. have the cell size exactly match the active font). Although once initialized at a given size, it should remain at that size for the duration of the session if you expect it to behave sensibly (it could potentially readjust after an RIS, though, or possibly even when clearing the screen).

the SixelParser does effectively process a rectangular image that may span multiple (potentially many) ROWs. Is that incorrect?

The sixel protcol itself doesn't necessarily produce rectangular output, but in most cases it probably would be a fixed width, and the way I've currently implemented it we typically reserve an area of the buffer that is as wide as the widest row for a given sequence (there are some edge cases where that isn't necessarily true though).

But the problem with thinking of sixel as a simple image format is that it's often not used that way. Imagine for example that you're creating a painting application that allows the user to scribble all over the screen with their mouse. The way you'd implement that with sixel is that almost every pixel plot would be a new sixel sequence.

For a standard 80x24 screen, that's going to use up 24 image slices at most. But if you're storing each sixel sequence as a separate image object, that's potentially 384000 images! And that's assuming your scribbling only touched each pixel once. In practice it'll likely be much worse than that. Any terminal with that architecture is either going to rapidly die, or drop content. It's just not practical.

BTW if the renderer had access to all ImageSlices at once, I suppose it could stitch them together?

That's definitely an option. I actually tried that at one point in the GDI renderer, because I thought fewer GDI calls might be faster, but it didn't seem to make any difference for me. If anything I think the overhead of combining the slices might have made it slightly slower, so in the end I just stuck with the simpler solution. But you may still find it's worthwhile in the Atlas renderer.

More importantly, it'd be great if we could still add an "ID" to the ImageSlice.

Once this is merged (assuming it does get merged), you're welcome to make whatever changes you think are necessary. And if you still think you can make it work with larger images spanning rows, that's fine too. I just wanted to make sure you were aware of the potential challenges of that approach.

@hackerb9
Copy link

No other software works this way, but for some reason people seem to think this is a good idea for terminals.

@j4james is, as usual, completely correct, but I believe he may be speaking a little tongue-in-cheek.

Click here for some boring history

Back in the 1980s, full-screen software programs had to know each sixel hardware terminal's specific character cell size and overall dimensions. I think even back then people knew it was a bad idea to do it that way; both the ReGIS and Tektronix vector graphics protocols, which were contemporaneous, use virtual grids relative to the screen size so images can appear the same on any terminal. Sixel was never given that ability and so software that wanted to align graphics and text nicely had to be aware of the terminal type.

Modern terminal software allows apps to work the same way but instead of a look-up table based on $TERM, the app can request the size directly from the terminal. If ever the font or terminal window size changes, the app receives a SIGWINCH interrupt and knows to redraw itself. Ugly, yes, but easy enough and it works.

Although once initialized at a given size, it should remain at that size for the duration of the session if you expect it to behave sensibly (it could potentially readjust after an RIS, though, or possibly even when clearing the screen).

Probably a dumb question, but couldn't you just clear everything when the size changes and send SIGWINCH?

@j4james
Copy link
Collaborator Author

j4james commented Jun 18, 2024

couldn't you just clear everything when the size changes and send SIGWINCH?

  1. SIGWINCH doesn't work everywhere.
  2. A lot of software (arguably most software) won't respond to SIGWINCH anyway, so clearing everything just means you end up with no images at all. A common example would be when you cat an image. No shell is going to magically redraw that output when the terminal sends a SIGWINCH.

@j4james
Copy link
Collaborator Author

j4james commented Jun 18, 2024

These are some examples I've been using for testing which demonstrate a few things you can do with sixel besides just blitting an image onto the screen.

Paging animation


If you've got an animation with only a few frames, you can load each frame into a separate page, and then cycle through those pages to play back the animation. The VT340 only supported two pages of sixel, which rather limited what you could achieve with this technique, but we support up to six pages, which is slightly more useful.

This example includes music to control the timing of the frames, so you can just cat the file: horse.vt

Horse.mp4
Color cycling (plasma)


By changing the palette colors of a sixel image after it has been output, you can give the impression that it is animating, when it is essentially just a static image. This example is intended to produce a plasma effect, reminiscent of the 90s demo scene.

Run with python: plasma.py

Sixel.Plasma.mp4
Color cycling (waterfall)


This is using the same palette cycling technique as the plasma effect above, but with an image that is specially designed to produce the effect of water flowing over a waterfall. For more examples of this kind of thing by the same artist (Mark J. Ferrari), see http://www.effectgames.com/demos/canvascycle/

Waterfall.mp4
Scrolling margins


Although this wouldn't have been possible on the original VT340, a terminal which supports sixel as well as horizontal margins can use the margin scrolling area to manipulate the image content. This test case also demonstrates the use of rectangular area operations to copy parts of an image between pages.

Run with python: shuffle.py

Sixel.Shuffle.mp4

The observant among you may have noticed that some of these examples are running in Windows Terminals. This is because I was testing a merge of Leonard's passthrough branch and Atlas renderer. It's still a bit crashy at times, but it works!

@hackerb9
Copy link

hackerb9 commented Jun 18, 2024

@j4james Okay, that was fantastic, particularly the Sixel Shuffle. Has any other terminal implemented rectangular copy capabilities yet?

I love that you are interpreting the sixel colors immediately to allow for palette shifts. Are you going to allow for a shared palette between images, similar to xterm*privateColorRegisters: False? That functionality actually was used by Sixel programs back in the day (see, WordPerfect).

@DHowett
Copy link
Member

DHowett commented Jun 18, 2024

It would be IMPOSSIBLE for me to overstate how much I love this.

@j4james
Copy link
Collaborator Author

j4james commented Jun 19, 2024

Has any other terminal implemented rectangular copy capabilities yet?

@hackerb9 In regards to the shuffle test specifically, MLTerm and RLogin are the only terminals I'm aware of that have all the required functionality, because that needs paging, rectangular copy, and horizontal margins. I'm not sure there are any other terminals that can do all of those things as well as sixel. But if we're just considering rectangular copy, then I believe Contour supports that too, and there may well be others.

Are you going to allow for a shared palette between images

Yes, the palette is always shared between images, at least in the sense that it's inherited. Palette changes in the second image won't affect the first though, so it's not a global palette like you have on a hardware terminal. But that's still something I'm hoping we might support one day.

Copy link
Member

@DHowett DHowett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 34/34 - I would say I understand roughly 85% of the Sixel parser, and I think the image slice design is fine--readable and straightforward.

I'm provisionally signing off given that this is a draft (I'll review incremental changes) despite the capture question.

src/buffer/out/textBuffer.cpp Outdated Show resolved Hide resolved
src/terminal/adapter/SixelParser.cpp Show resolved Hide resolved
Copy link
Member

@lhecker lhecker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Still need to review the sixel parser.)

src/buffer/out/ImageSlice.cpp Show resolved Hide resolved
src/buffer/out/ImageSlice.cpp Show resolved Hide resolved
src/buffer/out/ImageSlice.cpp Show resolved Hide resolved
src/host/_output.cpp Show resolved Hide resolved
src/buffer/out/ImageSlice.cpp Show resolved Hide resolved
src/renderer/gdi/paint.cpp Show resolved Hide resolved
src/terminal/adapter/SixelParser.hpp Outdated Show resolved Hide resolved
Comment on lines +82 to +91
bool _textCursorWasVisible;
til::CoordType _availablePixelWidth;
til::CoordType _availablePixelHeight;
til::CoordType _maxPixelAspectRatio;
til::CoordType _pixelAspectRatio;
til::CoordType _sixelHeight;
til::CoordType _segmentHeight;
til::CoordType _pendingTextScrollCount;
til::size _backgroundSize;
bool _backgroundFillRequired;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that these members all have undefined values after construction. Should we initialize them here? (Some more below).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking through my comments again, I think this is the only one where I'd prefer waiting for your response. All the other ones were just my thoughts while I was reading the code.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I wasn't really sure what to do here. All of the values that aren't initialized here are ones that are required to be reinitialized on every DefineImage call (the initialization is handled in the various _initXXX methods). So anything we initialize here would be meaningless - it's never going to be used.

At one point I just had everything set to 0, but that seemed confusing, because you start wondering what it means to have a zero aspect ratio, or a zero sixel height. And the answer is that it doesn't mean anything because those fields aren't supposed to be initialized there!

So in the end I figured it might make more sense to just leave them unset. But if you think it's better to clear all the fields, I'm happy to do that. I don't feel strongly about it either way.

Copy link
Member

@lhecker lhecker Jun 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes sense to initialize them. Assuming we do have a bug somewhere, perhaps only in the future if we modify this code, this would allow us to consistently reproduce the bug whereas without the initialization the member values would all be random and so they may result in random behavior which would make finding the root cause more difficult (or at least less consistent).

src/terminal/adapter/SixelParser.hpp Outdated Show resolved Hide resolved
Copy link
Member

@lhecker lhecker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done reviewing it now. I don't have anything major to say, mostly nits.

One thing that'd be nice is if you had any sixel test files that contain potential edge cases that you'd be willing to contribute. For instance, for semi-transparency, etc., just so that we can test any changes we make to this code or the renderers in the future.

src/terminal/adapter/SixelParser.cpp Show resolved Hide resolved
src/terminal/adapter/SixelParser.cpp Outdated Show resolved Hide resolved
src/terminal/adapter/SixelParser.cpp Show resolved Hide resolved
@PhMajerus
Copy link

One thing that'd be nice is if you had any sixel test files that contain potential edge cases that you'd be willing to contribute. For instance, for semi-transparency, etc., just so that we can test any changes we make to this code or the renderers in the future.

Here is a test file with transparency:
SixelTestFileWithTransparency.txt
(Not drawing all pixels of its canvas, successfully tested in mlterm and background properly shows through. xterm -ti vt340 on the other hand doesn't handle transparency)

And the same but with a hyperlink on the sixel image, which I think should be a supported scenario as well:
SixelTestFileWithTransparencyAndHyperlink.txt
(I couldn't find an existing terminal that supports both sixels and hyperlinks to test this one)

@j4james
Copy link
Collaborator Author

j4james commented Jun 29, 2024

Here is a test file with transparency:
SixelTestFileWithTransparency.txt

That file doesn't actually have transparency! That's why it doesn't work in XTerm.

And the same but with a hyperlink on the sixel image, which I think should be a supported scenario as well:
SixelTestFileWithTransparencyAndHyperlink.txt

I'm glad you brought this up, because it is worth considering, but it's probably best discussed in a followup issue if/when we have sixel in Windows Terminal itself, because conhost doesn't support hyperlinks.

Although I'll say now that I personally don't think hyperlinks should be associated with the sixel output. If you want to attach a link to the image, you can just apply it to the text area behind (easily doable with a DECFRA call). And that gives you the flexibility to apply different links to different parts of an image, similar to an HTML image map. There are other complications as well, but we can discuss all that later.

@j4james
Copy link
Collaborator Author

j4james commented Jun 30, 2024

One thing that'd be nice is if you had any sixel test files that contain potential edge cases that you'd be willing to contribute.

Most of the important compatibility tests are in hackerb9's vt340test repo. My contributions are in the j4james directory.

In an ideal world I would have liked to convert those into units tests of some kind, but I have no idea how that could be made to work.

@PhMajerus
Copy link

PhMajerus commented Jun 30, 2024

@j4james My understanding is sixels work pretty much like a dot-matrix printer with 6 needles and several colors of ink ribbons.
So it's possible to have several colors by changing the color at any time, and if you need pixels of different colors in the same 6 pixels vertical band, you need to return to the beginning of the line and overstrike using each color.
This means it may even be possible to mix colors by overstriking the same pixel on some printers depending on the ink they use, and also means if a pixel isn't output in any of the color, the page's paper is left as it.

The pixels shown in magenta in this picture are not set in any color in the sixel:
image

In mlterm, it properly shows as leaving those pixels as background color:
image

This technique of not outputting a pixel in any of the colors to keep the original background color works fine in mlterm, and is what I used in the test image. Are you saying the specs and real hardware do not handle that as transparency? and then, how do they achieve transparency if at all possible?

@j4james
Copy link
Collaborator Author

j4james commented Jun 30, 2024

if a pixel isn't output in any of the color, the page's paper is left as it.

@PhMajerus Your understanding is correct as far as it goes. What you're missing is that sixel terminals had a feature called "background select", which by default filled the background area before plotting any pixels. In that case, when you don't output a pixel, it's just going to end up with the background fill color, so won't be transparent.

On a real terminal, the background fill color comes from entry 0 in the color table, and by default that should be black unless it's redefined. But it look to me like Mlterm is possibly filling with the active text background color, which gives you the impression that it's transparent, but it's really not, and it's technically incorrect.

Xterm, as far as I can recall, will fill the background with a shade of gray, because your image defines color number 0 as 78;78;78. However, that's also not correct. Color number 0 in the sixel content is not necessarily the same thing as color table entry 0, but Xterm doesn't handle color definitions correctly.

how do they achieve transparency if at all possible?

Parameter 2 of the DCS sequence specifies whether "background select" should be applied or not. If it's 0 or 2, the background is filled (the default behavior). If it's 1, the background won't be filled, and anywhere that you haven't plotted a pixel should be transparent (as you originally expected). So if you want your image to be transparent, it should start with something like \eP0;1q.

And the best way to test transparency is to output your image over some actual text content, and make sure you can see the underlying text showing through. Most Linux terminals are unlikely to support it though. Last I tested I think Xterm was the only one that did. But that was a while back, so it's possible some of the others have improved since then.

@zadjii-msft zadjii-msft added this pull request to the merge queue Jul 1, 2024
Merged via the queue into microsoft:main with commit 236c003 Jul 1, 2024
15 checks passed
@PhMajerus
Copy link

PhMajerus commented Jul 2, 2024

@j4james

On a real terminal, the background fill color comes from entry 0 in the color table, and by default that should be black unless it's redefined. But it look to me like Mlterm is possibly filling with the active text background color, which gives you the impression that it's transparent, but it's really not, and it's technically incorrect.

Xterm, as far as I can recall, will fill the background with a shade of gray, because your image defines color number 0 as 78;78;78. However, that's also not correct. Color number 0 in the sixel content is not necessarily the same thing as color table entry 0, but Xterm doesn't handle color definitions correctly.

I think you're right:
mlterm unset pixels of the sixel are white, but the terminal palette color#0 is black and the sixel color#0 is gray, so apparently it fills the sixel background with the current text background color, not the sixel's color#0, nor the terminal current palette's color#0.
xterm fills with gray, which is the sixel color#0.

Parameter 2 of the DCS sequence specifies whether "background select" should be applied or not. (...)

You are absolutely right, I overlooked the P2 parameter in the VT330/340 reference manual. Changing P2 to 1 properly makes the background transparent in all terminals, so it works in mlterm, xterm, and conhost canary.
Thanks for taking the time to look into it and explain it to me.

So here is the corrected file:
SixelTestFileWithTransparency.txt

I still noticed a difference between conhost canary and mlterm and xterm. In your implementation, the next line of text overwrites the last line of the sixel image, while both mlterm and xterm are doing a newline before printing the next echo text.

conhost:
image

mlterm:
image

xterm:
image

I suspect that is part of the problem you mentioned with sixels overlapping text.

github-merge-queue bot pushed a commit that referenced this pull request Jul 2, 2024
This PR add supports for two query sequences that are used to determine
the pixel size of a character cell:

* `CSI 16 t` reports the pixel size of a character cell directly.
* `CSI 14 t` reports the pixel size of the text area, and when divided
  by the character size of the text area, you can get the character cell
  size indirectly (this method predates the introduction of `CSI 16 t`).

These queries are used by Sixel applications that want to fit an image
within specific text boundaries, so need to know how many cells would be
covered by a particular pixel size, or vice versa. Our implementation of
Sixel uses a virtual cell size that is always 10x20 (in order to emulate
the VT340 more accurately), so these queries shouldn't really be needed,
but some applications will fail to work without them.

## References and Relevant Issues

Sixel support was added to conhost in PR #17421.

## Validation Steps Performed

I've added some unit tests to verify that these queries are producing
the expected responses, and I've manually tested on [XtermDOOM] (which
uses `CSI 16 t`), and the [Notcurses] library (which uses `CSI 14 t`).

[XtermDOOM]: https://gitlab.com/AutumnMeowMeow/xtermdoom
[Notcurses]: https://github.com/dankamongmen/notcurses

## PR Checklist
- [x] Tests added/passed
@j4james
Copy link
Collaborator Author

j4james commented Jul 2, 2024

In your implementation, the next line of text overwrites the last line of the sixel image, while both mlterm and xterm are doing a newline before printing the next echo text.

@PhMajerus The final text cursor position is meant to end up on top of the image, otherwise you wouldn't be able to output images on the bottom row of the display. If you want it below the image you need to add a linefeed yourself.

Xterm and mlterm both use non-standard algorithms for positioning the cursor, but they don't actually match each other. The fact that they both ended up with the text below the image in your example is just a coincidence. In some cases the text will overlap the image and in other cases it won't. It depend on image size and font size.

@PhMajerus
Copy link

PhMajerus commented Jul 2, 2024

Thanks again @j4james, both for the time explaining how it is supposed to work and for implementing it. I can't wait for it to work in wt as well!
So here is a test file with both transparency and text alignment around a sixel image:
Pepelogoo.txt
It uses 9 colors and transparent background.

image
(Yeah, I know, this is conhost for now, but I'm sure we'll get it in Terminal soon as well)

@j4james j4james deleted the feature-sixel branch August 3, 2024 15:46
DHowett pushed a commit that referenced this pull request Aug 15, 2024
When we have a series of image slices of differing widths, which also
don't align with the cell boundaries, we can get rounding errors in the
scaling which makes the different slices appear misaligned.

This PR fixes the issue by removing the 4 pixel width alignment that was
enforced in the `ImageSlice` class, since that's not actually necessary
when the pixels themselves are already 4 bytes in size. And without
that, the widths should be correctly aligned with the cell boundaries.

## References and Relevant Issues

The initial Sixel implementation was added in PR #17421.

## Validation Steps Performed

I've confirmed that this fixes the rendering glitches reported in
#17711, and all my existing Sixel tests still work as expected.

Closes #17711
@Giessen
Copy link

Giessen commented Aug 20, 2024

Any plan to make a new release with the great feature? Thank you very much.

@DHowett
Copy link
Member

DHowett commented Aug 20, 2024

🔜

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants