Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rendering glitches on path filling #23

Closed
oleid opened this issue Jun 15, 2020 · 7 comments
Closed

Rendering glitches on path filling #23

oleid opened this issue Jun 15, 2020 · 7 comments

Comments

@oleid
Copy link

oleid commented Jun 15, 2020

With great interest, I read your blog post on the new code in piet-gpu and gave it a try on my Radeon RX 470 GPU:

08:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480/570/570X/580/580X/590] (rev cf)

I got the example images from:
[1] http://w3.impa.br/~diego/projects/GanEtAl14/

It would seem there are a few rendering glitches, most notably:

  1. Some images are mirrored
  2. A few font glitches

Please note, I used the default radav driver on ArchLinux. There is an experimental new shader backend, which is faster, but has the same output (or at least I didn't notice any difference). It can be activated by exporting export RADV_PERFTEST=aco.

Also, there is another vulkan driver for AMD, namely amdvlk. That seems to be a little slower for most operations, but the output seems to be the same as well.

How where the benchmarks from the blog post performed? Merely run the CLI tool without any additional options? Given the tiger svg from this repo, my old RX 470 seems to be faster than a 1060, which seems odd.

paper-1

cargo run --release --bin=cli --  inputs/svg/paper-1.svg
    Finished release [optimized] target(s) in 0.06s
     Running `target/release/cli inputs/svg/paper-1.svg`
parsing time: 44.684545ms
flattening and encoding time: 5.586778ms
scene: 104322 elements
Element kernel time: 0.258ms
Tile allocation kernel time: 0.016ms
Coarse path kernel time: 0.199ms
Backdrop kernel time: 0.074ms
Binning kernel time: 0.106ms
Coarse raster kernel time: 0.070ms
Render kernel time: 0.220ms

paper-1

At scale=2

cargo run --release --bin=cli -- --scale 2  inputs/svg/paper-1.svg
    Finished release [optimized] target(s) in 0.06s
     Running `target/release/cli --scale 2 inputs/svg/paper-1.svg`
parsing time: 43.184138ms
flattening and encoding time: 3.919508ms
scene: 104322 elements
Element kernel time: 0.259ms
Tile allocation kernel time: 0.014ms
Coarse path kernel time: 0.157ms
Backdrop kernel time: 0.055ms
Binning kernel time: 0.024ms
Coarse raster kernel time: 0.111ms
Render kernel time: 0.693ms

paper-1-scale2

tiger

cargo run --release --bin=cli
    Finished release [optimized] target(s) in 0.07s
     Running `target/release/cli`
parsing time: 1.110947ms
flattening and encoding time: 100.067µs
scene: 2655 elements
Element kernel time: 0.032ms
Tile allocation kernel time: 0.091ms
Coarse path kernel time: 0.162ms
Backdrop kernel time: 0.242ms
Binning kernel time: 0.021ms
Coarse raster kernel time: 0.126ms
Render kernel time: 0.290ms

tiger

The tiger from [1], however, is upside-down as well.

 cargo run --release --bin=cli -- --scale 2  inputs/svg/tiger.svg 
    Finished release [optimized] target(s) in 0.05s
     Running `target/release/cli --scale 2 inputs/svg/tiger.svg`
parsing time: 1.601394ms
flattening and encoding time: 53.149µs
scene: 2835 elements
Element kernel time: 0.031ms
Tile allocation kernel time: 0.035ms
Coarse path kernel time: 0.118ms
Backdrop kernel time: 0.091ms
Binning kernel time: 0.012ms
Coarse raster kernel time: 0.115ms
Render kernel time: 0.251ms

tiger-2

paris-30k

 cargo run --release --bin=cli --  inputs/svg/paris-30k.svg 
    Finished release [optimized] target(s) in 0.05s
     Running `target/release/cli inputs/svg/paris-30k.svg`
parsing time: 234.354676ms
flattening and encoding time: 40.583878ms
scene: 732418 elements
Element kernel time: 1.533ms
Tile allocation kernel time: 0.142ms
Coarse path kernel time: 0.423ms
Backdrop kernel time: 0.282ms
Binning kernel time: 0.161ms
Coarse raster kernel time: 0.355ms
Render kernel time: 0.637ms

paris-30k

At scale=2

    Finished release [optimized] target(s) in 0.07s
     Running `target/release/cli --scale 2 inputs/svg/paris-30k.svg`
parsing time: 234.821258ms
flattening and encoding time: 39.913438ms
scene: 732418 elements
Element kernel time: 1.524ms
Tile allocation kernel time: 0.204ms
Coarse path kernel time: 0.501ms
Backdrop kernel time: 0.329ms
Binning kernel time: 0.147ms
Coarse raster kernel time: 0.897ms
Render kernel time: 2.888ms

paris-30k-scale2

@raphlinus
Copy link
Contributor

Heh, I'm not parsing the transform properly; the --flip flag is present on the command line to work around this. If nothing else, the documentation should be improved.

The font glitches are a known issue, some numerical robustness issues I haven't gotten around to fixing yet.

Thanks for your interest!

@oleid
Copy link
Author

oleid commented Jun 15, 2020

Just discovered flip myself, now I feel stupid :D

How where the benchmarks from the blog post performed? Merely run the CLI tool without any additional options? Given the tiger svg from this repo, my old RX 470 seems to be faster than a 1060, which seems odd.

@raphlinus
Copy link
Contributor

Yes, by running the cli tool. This tends not to trigger a "boost" mode, I've notice it run about 20% faster running with continuous presentation. Regarding relative performance, the code uses shared memory (LDS) extensively, which is considerably faster (relative to other metrics) on AMD than other cards. I think between those two things is probably explains it, though you always learn things when you explore deeply into performance measurement.

@oleid
Copy link
Author

oleid commented Jun 26, 2020

Would you like to keep this issue open as source of information for others or rather close it?

@eliasnaur
Copy link
Collaborator

The font glitches are a known issue, some numerical robustness issues I haven't gotten around to fixing yet.

I'd like to work on this. If you have pointers to where the issues are, I'd appreciate it.

@raphlinus raphlinus changed the title Rendering glitches on AMD hardware on Linux Rendering glitches on path filling Aug 29, 2020
@raphlinus
Copy link
Contributor

This is a good starter issue - it probably won't require a lot of code to fix (and that will be likely localized to path_coarse.comp), but it does require some insight and understanding.

The source of the glitch is a numerical inconsistency between two "orientation problems". One is to detect where the path segment crosses the top edge of a tile (that's "xray" on line 215 of current master), and the other is where it crosses the left edge ("y_edge" on line 234). The answer is reworking the slope calculation so that these two test are always consistent with each other. Obviously one thing that makes it a little tricky is that the segment can be vertical or horizontal, in which case the slope is infinite.

I suggest taking a look at the Pathfinder codebase, as Patrick has changed it to use a similar algorithm (also inspired by RAVG) and has addressed the numerical robustness issues. Patrick might be able to point you to the relevant commit in the code if you don't easily find it yourself.

Best of luck!

@eliasnaur
Copy link
Collaborator

I have some progress on this. The diff below empirically fixes all visible glitches from a setup where I carefully watch a slowly rotating tiger. The approach is sound in that the changes steer the intersection decisions away from inconsistencies. Unfortunately, the code is complicated enough that deficiencies are no longer obvious; I'm going to attempt a rewrite of the algorithm to tackle that.

diff --git gpu/shaders/path_coarse.comp gpu/shaders/path_coarse.comp
index 658af0eb..67f79871 100644
--- gpu/shaders/path_coarse.comp
+++ gpu/shaders/path_coarse.comp
@@ -174,10 +174,20 @@ void main() {
                 b = invslope; // Note: assumes square tiles, otherwise scale.
                 a = (p0.x - (p0.y - 0.5 * float(TILE_HEIGHT_PX)) * b) * SX;
 
-                int x0 = int(floor((xmin) * SX));
-                int x1 = int(ceil((xmax) * SX));
-                int y0 = int(floor((ymin) * SY));
-                int y1 = int(ceil((ymax) * SY));
+                float sxmax = xmax * SX;
+                float symax = ymax * SY;
+
+                int x0 = int(floor(xmin * SX));
+                int x1 = int(ceil(sxmax));
+                int y0 = int(floor(ymin * SY));
+                int y1 = int(ceil(symax));
+
+                if (sxmax == ceil(sxmax)) {
+                    x1 += 1;
+                }
+                if (symax == ceil(symax)) {
+                    y1 += 1;
+                }
 
                 x0 = clamp(x0, bbox.x, bbox.z);
                 y0 = clamp(y0, bbox.y, bbox.w);
@@ -191,19 +201,38 @@ void main() {
                 // Consider using subgroups to aggregate atomic add.
                 uint tile_offset = atomicAdd(alloc, n_tile_alloc * TileSeg_size);
                 TileSeg tile_seg;
+
+                int p0_xray = int(floor(p0.x*SX));
+                int p1_xray = int(floor(p1.x*SX));
+                int xray, last_xray;
+                if (p0.y < p1.y) {
+                    xray = p0_xray;
+                    last_xray = p1_xray;
+                } else {
+                    xray = p1_xray;
+                    last_xray = p0_xray;
+                }
                 for (int y = y0; y < y1; y++) {
                     float tile_y0 = float(y * TILE_HEIGHT_PX);
-                    if (tag == PathSeg_FillCubic && min(p0.y, p1.y) <= tile_y0) {
-                        int xray = max(int(ceil(xc - 0.5 * b)), bbox.x);
-                        if (xray < bbox.z) {
-                            int backdrop = p1.y < p0.y ? 1 : -1;
-                            TileRef tile_ref = Tile_index(path.tiles, uint(base + xray));
-                            uint tile_el = tile_ref.offset >> 2;
-                            atomicAdd(tile[tile_el + 1], backdrop);
-                        }
+                    int xbackdrop = max(xray + 1, bbox.x);
+                    if (tag == PathSeg_FillCubic && y > y0 && xbackdrop < bbox.z) {
+                        int backdrop = p1.y < p0.y ? 1 : -1;
+                        TileRef tile_ref = Tile_index(path.tiles, uint(base + xbackdrop));
+                        uint tile_el = tile_ref.offset >> 2;
+                        atomicAdd(tile[tile_el + 1], backdrop);
+                    }
+
+                    int xx0 = int(floor(xc - c));
+                    int xx1 = int(ceil(xc + c));
+                    xx1 = max(xx1, xray + 1);
+                    xx0 = clamp(xx0, x0, x1);
+                    xx1 = clamp(xx1, x0, x1);
+
+                    int next_xray = xray;
+                    if (y == y1 - 1) {
+                        next_xray = last_xray;
                     }
-                    int xx0 = clamp(int(floor(xc - c)), x0, x1);
-                    int xx1 = clamp(int(ceil(xc + c)), x0, x1);
+
                     for (int x = xx0; x < xx1; x++) {
                         float tile_x0 = float(x * TILE_WIDTH_PX);
                         TileRef tile_ref = Tile_index(path.tiles, uint(base + x));
@@ -214,12 +243,36 @@ void main() {
                         float y_edge = 0.0;
                         if (tag == PathSeg_FillCubic) {
                             y_edge = mix(p0.y, p1.y, (tile_x0 - p0.x) / dx);
-                            if (min(p0.x, p1.x) < tile_x0 && y_edge >= tile_y0 && y_edge < tile_y0 + TILE_HEIGHT_PX) {
-                                if (p0.x > p1.x) {
+                            bool intersects = min(p0.x, p1.x) < tile_x0 && y_edge >= tile_y0 && y_edge < tile_y0 + TILE_HEIGHT_PX;
+
+                            if (y < y1 - 1) {
+                                if (intersects && x > next_xray && next_xray >= xray) {
+                                    next_xray = x;
+                                } else if (intersects && x <= next_xray && next_xray <= xray) {
+                                    next_xray = x - 1;
+                                }
+                            }
+                            if (next_xray < xray) {
+                                intersects = next_xray < x && x <= xray;
+                            } else if (next_xray > xray) {
+                                intersects = xray < x && x <= next_xray;
+                            } else {
+                                intersects = false;
+                            }
+
+                            float s = sign(p0.x - p1.x);
+                            if (min(p0.x, p1.x) < tile_x0 && max(p0.x, p1.x) >= tile_x0) {
+                                if (s > 0) {
                                     tile_seg.end = vec2(tile_x0, y_edge);
                                 } else {
                                     tile_seg.start = vec2(tile_x0, y_edge);
                                 }
+                            }
+
+                            if (intersects) {
+                                if (tile_seg.end.x == tile_seg.start.x) {
+                                    tile_seg.start.x += s*1e-4;
+                                }
                             } else {
                                 y_edge = 1e9;
                             }
@@ -231,6 +284,7 @@ void main() {
                     }
                     xc += b;
                     base += stride;
+                    xray = next_xray;
                 }
 
                 n_out += 1;

eliasnaur added a commit to eliasnaur/piet-gpu that referenced this issue Dec 1, 2020
Eliminates the precision loss of the subtraction in the sign(end.x - start.x)
expression in kernel4. That's important for the next change that avoids
inconsistent line intersections in path_coarse.

Updates linebender#23

Signed-off-by: Elias Naur <[email protected]>
eliasnaur added a commit to eliasnaur/piet-gpu that referenced this issue Dec 12, 2020
The previous attempt to fix inconsistent intersections because of floating
point inaccuracy[0], missed two cases.

The first case is that for top intersections with the very first row would fail
the test

tag == PathSeg_FillCubic && y > y0 && xbackdrop < bbox.z,

specifically y is not larger than y0 when y0 has been clipped to 0.

Fix that by re-introducing the min(p0.y, p1.y) < tile_y0 check that does work
and is just as consistent.

The second case is that the tracking left intersections in the [xray, next_xray]
range of tiles may fail when next_xray is forced to last_xray, the final xray value.

Fix that case by computing next_xray explicitly, before looping over the
x tiles. The code is now much simpler.

Updates linebender#23

[0] linebender@29cfb8b

Signed-off-by: Elias Naur <[email protected]>
eliasnaur added a commit to eliasnaur/piet-gpu that referenced this issue Dec 12, 2020
The previous attempt to fix inconsistent intersections because of floating
point inaccuracy[0] missed two cases.

The first case is that for top intersections with the very first row would fail
the test

tag == PathSeg_FillCubic && y > y0 && xbackdrop < bbox.z,

specifically y is not larger than y0 when y0 has been clipped to 0.

Fix that by re-introducing the min(p0.y, p1.y) < tile_y0 check that does work
and is just as consistent.

The second case is that the tracking left intersections in the [xray, next_xray]
range of tiles may fail when next_xray is forced to last_xray, the final xray value.

Fix that case by computing next_xray explicitly, before looping over the
x tiles. The code is now much simpler.

Updates linebender#23

[0] linebender@29cfb8b

Signed-off-by: Elias Naur <[email protected]>
eliasnaur added a commit to eliasnaur/piet-gpu that referenced this issue Dec 12, 2020
The previous attempt to fix inconsistent intersections because of floating
point inaccuracy[0] missed two cases.

The first case is that for top intersections with the very first row would fail
the test

tag == PathSeg_FillCubic && y > y0 && xbackdrop < bbox.z

In particular, y is not larger than y0 when y0 has been clipped to 0.

Fix that by re-introducing the min(p0.y, p1.y) < tile_y0 check that does work
and is just as consistent.

The second case is that the tracking left intersections in the [xray, next_xray]
range of tiles may fail when next_xray is forced to last_xray, the final xray value.

Fix that case by computing next_xray explicitly, before looping over the
x tiles. The code is now much simpler.

Updates linebender#23

[0] linebender@29cfb8b

Signed-off-by: Elias Naur <[email protected]>
eliasnaur added a commit to eliasnaur/piet-gpu that referenced this issue Dec 12, 2020
The previous attempt to fix inconsistent intersections because of floating
point inaccuracy[0] missed two cases.

The first case is that for top intersections with the very first row would fail
the test

tag == PathSeg_FillCubic && y > y0 && xbackdrop < bbox.z

In particular, y is not larger than y0 when y0 has been clipped to 0.

Fix that by re-introducing the min(p0.y, p1.y) < tile_y0 check that does work
and is just as consistent.

The second case is that the tracking left intersections in the [xray, next_xray]
range of tiles may fail when next_xray is forced to last_xray, the final xray value.

Fix that case by computing next_xray explicitly, before looping over the
x tiles. The code is now much simpler.

Updates linebender#23

[0] linebender@29cfb8b

Signed-off-by: Elias Naur <[email protected]>
eliasnaur added a commit to eliasnaur/piet-gpu that referenced this issue Dec 17, 2020
The previous attempt to fix inconsistent intersections because of floating
point inaccuracy[0] missed two cases.

The first case is that for top intersections with the very first row would fail
the test

tag == PathSeg_FillCubic && y > y0 && xbackdrop < bbox.z

In particular, y is not larger than y0 when y0 has been clipped to 0.

Fix that by re-introducing the min(p0.y, p1.y) < tile_y0 check that does work
and is just as consistent.

The second case is that the tracking left intersections in the [xray, next_xray]
range of tiles may fail when next_xray is forced to last_xray, the final xray value.

Fix that case by computing next_xray explicitly, before looping over the
x tiles. The code is now much simpler.

Finally, ensure that xx0 and xx1 doesn't overflow the allocated number of tiles
by clamping them *after* setting them. Adjust xx0 to include xray, just as xx1
is adjusted; I haven't seen corruption without it, but it's not obvious xx0
always includes xray.

While here, replace a "+=" on a guaranteed zero value to just "=".

Updates linebender#23

[0] linebender@29cfb8b

Signed-off-by: Elias Naur <[email protected]>
eliasnaur added a commit to eliasnaur/piet-gpu that referenced this issue Dec 19, 2020
The previous attempt to fix inconsistent intersections because of floating
point inaccuracy[0] missed two cases.

The first case is that for top intersections with the very first row would fail
the test

tag == PathSeg_FillCubic && y > y0 && xbackdrop < bbox.z

In particular, y is not larger than y0 when y0 has been clipped to 0.

Fix that by re-introducing the min(p0.y, p1.y) < tile_y0 check that does work
and is just as consistent. Add similar check, min(p0.x, p1.x) < tile_x0, for
deciding when to clip the segment to the left edge (but keep consistent xray check
for deciding left edge *intersections*).

The second case is that the tracking left intersections in the [xray, next_xray]
range of tiles may fail when next_xray is forced to last_xray, the final xray value.

Fix that case by computing next_xray explicitly, before looping over the
x tiles. The code is now much simpler.

Finally, ensure that xx0 and xx1 doesn't overflow the allocated number of tiles
by clamping them *after* setting them. Adjust xx0 to include xray, just as xx1
is adjusted; I haven't seen corruption without it, but it's not obvious xx0
always includes xray.

While here, replace a "+=" on a guaranteed zero value to just "=".

Updates linebender#23

[0] linebender@29cfb8b

Signed-off-by: Elias Naur <[email protected]>
eliasnaur added a commit to eliasnaur/piet-gpu that referenced this issue Dec 24, 2020
The previous attempt to fix inconsistent intersections because of floating
point inaccuracy[0] missed two cases.

The first case is that for top intersections with the very first row would fail
the test

tag == PathSeg_FillCubic && y > y0 && xbackdrop < bbox.z

In particular, y is not larger than y0 when y0 has been clipped to 0.

Fix that by re-introducing the min(p0.y, p1.y) < tile_y0 check that does work
and is just as consistent. Add similar check, min(p0.x, p1.x) < tile_x0, for
deciding when to clip the segment to the left edge (but keep consistent xray check
for deciding left edge *intersections*).

The second case is that the tracking left intersections in the [xray, next_xray]
range of tiles may fail when next_xray is forced to last_xray, the final xray value.

Fix that case by computing next_xray explicitly, before looping over the
x tiles. The code is now much simpler.

Finally, ensure that xx0 and xx1 doesn't overflow the allocated number of tiles
by clamping them *after* setting them. Adjust xx0 to include xray, just as xx1
is adjusted; I haven't seen corruption without it, but it's not obvious xx0
always includes xray.

While here, replace a "+=" on a guaranteed zero value to just "=".

Updates linebender#23

[0] linebender@29cfb8b

Signed-off-by: Elias Naur <[email protected]>
eliasnaur added a commit to eliasnaur/piet-gpu that referenced this issue Dec 27, 2020
The previous attempt to fix inconsistent intersections because of floating
point inaccuracy[0] missed two cases.

The first case is that for top intersections with the very first row would fail
the test

tag == PathSeg_FillCubic && y > y0 && xbackdrop < bbox.z

In particular, y is not larger than y0 when y0 has been clipped to 0.

Fix that by re-introducing the min(p0.y, p1.y) < tile_y0 check that does work
and is just as consistent. Add similar check, min(p0.x, p1.x) < tile_x0, for
deciding when to clip the segment to the left edge (but keep consistent xray check
for deciding left edge *intersections*).

The second case is that the tracking left intersections in the [xray, next_xray]
range of tiles may fail when next_xray is forced to last_xray, the final xray value.

Fix that case by computing next_xray explicitly, before looping over the
x tiles. The code is now much simpler.

Finally, ensure that xx0 and xx1 doesn't overflow the allocated number of tiles
by clamping them *after* setting them. Adjust xx0 to include xray, just as xx1
is adjusted; I haven't seen corruption without it, but it's not obvious xx0
always includes xray.

While here, replace a "+=" on a guaranteed zero value to just "=".

Updates linebender#23

[0] linebender@29cfb8b

Signed-off-by: Elias Naur <[email protected]>
eliasnaur added a commit to eliasnaur/piet-gpu that referenced this issue Dec 27, 2020
The previous attempt to fix inconsistent intersections because of floating
point inaccuracy[0] missed two cases.

The first case is that for top intersections with the very first row would fail
the test

tag == PathSeg_FillCubic && y > y0 && xbackdrop < bbox.z

In particular, y is not larger than y0 when y0 has been clipped to 0.

Fix that by re-introducing the min(p0.y, p1.y) < tile_y0 check that does work
and is just as consistent. Add similar check, min(p0.x, p1.x) < tile_x0, for
deciding when to clip the segment to the left edge (but keep consistent xray check
for deciding left edge *intersections*).

The second case is that the tracking left intersections in the [xray, next_xray]
range of tiles may fail when next_xray is forced to last_xray, the final xray value.

Fix that case by computing next_xray explicitly, before looping over the
x tiles. The code is now much simpler.

Finally, ensure that xx0 and xx1 doesn't overflow the allocated number of tiles
by clamping them *after* setting them. Adjust xx0 to include xray, just as xx1
is adjusted; I haven't seen corruption without it, but it's not obvious xx0
always includes xray.

While here, replace a "+=" on a guaranteed zero value to just "=".

Updates linebender#23

[0] linebender@29cfb8b

Signed-off-by: Elias Naur <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants