
Rendering Repeatability

Ben Heasly edited this page Oct 18, 2016 · 9 revisions

Rendering algorithms used by PBRT and Mitsuba are stochastic: they sample from an enormous space of possible light rays. This means the output renderings will always contain some noise. Usually we don't notice the noise: if a rendering looks "good" then a repeat rendering of the same scene should also look "good", and the two renderings should look the same.

To check the repeatability of renderings, we rendered all of the RenderToolbox Example Scenes twice and compared all the pairs of outputs.

This used the utilities rtbTestAllExampleScenes() and rtbCompareAllExampleScenes().

Each set of renderings contained 205 distinct renderings. Each distinct rendering represents a particular variant of a particular parent scene, as rendered by a particular renderer.

Generally, each rendering and its repeated counterpart looked the same by visual inspection. Using two simple comparison metrics, we found that pairs of repeated renderings generally contained small overall differences and small per-pixel differences.

Summary

rtbCompareAllExampleScenes() produced a summary of comparisons between repeated renderings.

The summary plot sorts these renderings using the Relative Difference metric described below. Only the 26 renderings with the largest Relative Differences are shown. These were the "least repeatable" renderings.

Summary of the "least repeatable" scenes.

Correlation Coefficient

The plot on the left shows the overall correlations between repeated pairs as blue circles. This metric treats the renderings as large arrays of pixel components, and ignores the 3-dimensional structure of the renderings (height x width x wavelengths). The comparison statistic is the correlation coefficient between the large array from the first rendering, and the large array from the second rendering.
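The idea behind this metric can be sketched in a few lines. This is our own illustrative NumPy version, not the RenderToolbox MATLAB code; the function and variable names are ours:

```python
import numpy as np

def rendering_correlation(first, second):
    """Correlation coefficient between two multi-spectral renderings.

    Both inputs are (height x width x wavelengths) arrays; flattening
    them into 1-D arrays discards the 3-D structure, as described above.
    """
    a = np.asarray(first, dtype=float).ravel()
    b = np.asarray(second, dtype=float).ravel()
    return np.corrcoef(a, b)[0, 1]

# Simulate two noisy renderings of the same underlying scene.
rng = np.random.default_rng(0)
truth = rng.uniform(0, 1, size=(4, 4, 3))
first = truth + rng.normal(0, 0.01, size=truth.shape)
second = truth + rng.normal(0, 0.01, size=truth.shape)
print(rendering_correlation(first, second))  # close to 1
```

Because the flattening ignores image structure, this statistic summarizes overall agreement but cannot say where in the image two renderings differ.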

Generally, the correlation coefficients are close to 1. The least correlation was about 0.95, which occurred for the glass variant of the Dragon scene, as rendered by Mitsuba.

Relative Difference

The plot on the right compares individual pixel components between repeated renderings. This metric is based on the difference image formed by subtracting pixel components in the second rendering from the corresponding components in the first. For each pixel component, we express the absolute difference as a fraction of the component value from the first rendering.

When component values in the first rendering (the denominator) are very small, the relative difference can appear unfairly large. So we ignore differences where the components of the first image fall below an arbitrary threshold (1/5 of the max value).

Black circles show the mean relative difference between corresponding pixel components in the first and second renderings. These values are generally close to 0, indicating that corresponding pixel components generally agree across renderings.

Red crosses show the max relative difference between components. This indicates the single spectral component of the single pixel that showed the greatest difference between renderings. Often this component has a relative difference close to 50%. This difference may be tolerable for single pixel components, as long as components in the rest of the rendering have smaller differences -- it might be just a speck.
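The statistic described above, including the threshold on small denominators, can be sketched as follows. This is our own NumPy illustration, not the RenderToolbox MATLAB code, and the function name and `threshold_fraction` parameter are ours:

```python
import numpy as np

def relative_difference(first, second, threshold_fraction=0.2):
    """Mean and max per-component relative difference between renderings.

    Computes |first - second| / first for each pixel component, but only
    where the first rendering's components exceed a threshold (1/5 of the
    max value, as above), so tiny denominators don't inflate the statistic.
    """
    first = np.asarray(first, dtype=float)
    second = np.asarray(second, dtype=float)
    mask = first > threshold_fraction * first.max()
    rel = np.abs(first - second)[mask] / first[mask]
    return rel.mean(), rel.max()

# Simulate two noisy renderings of the same underlying scene.
rng = np.random.default_rng(1)
truth = rng.uniform(0, 1, size=(4, 4, 3))
first = truth + rng.normal(0, 0.01, size=truth.shape)
second = truth + rng.normal(0, 0.01, size=truth.shape)
mean_rel, max_rel = relative_difference(first, second)
```

The mean corresponds to the black circles in the summary plot and the max to the red crosses: the max reflects the single worst pixel component, while the mean summarizes overall agreement.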

The worst relative error occurred for a pixel component in the glass variant of the Dragon scene, as rendered by Mitsuba. This agrees with the worst correlation coefficient.

Examples of Rendering Pairs

rtbCompareAllExampleScenes() also produced a detailed comparison figure for each pair of repeated renderings. These show the first and second renderings as converted to sRGB. They also show absolute difference images formed by subtracting the first multi-spectral rendering from the second, and the second from the first. The difference images are also converted to sRGB. All four sRGB images are scaled by maximum luminance.
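One way such difference images can be formed is sketched below in NumPy (our own illustration, not the RenderToolbox MATLAB code; the multi-spectral-to-sRGB conversion is omitted). We assume each subtraction keeps only its positive part, so the two difference images are complementary:

```python
import numpy as np

# Simulate two noisy renderings of the same underlying scene.
rng = np.random.default_rng(2)
truth = rng.uniform(0, 1, size=(4, 4, 3))
first = truth + rng.normal(0, 0.01, size=truth.shape)
second = truth + rng.normal(0, 0.01, size=truth.shape)

# Subtracting in both orders isolates where each rendering is brighter;
# clipping negatives to zero makes the two images complementary.
first_minus_second = np.clip(first - second, 0, None)
second_minus_first = np.clip(second - first, 0, None)

# Scale all four images by a shared maximum before display.
scale = max(first.max(), second.max())
images = [im / scale for im in
          (first, second, first_minus_second, second_minus_first)]
```

Scaling every image by the same maximum keeps the difference images comparable to the renderings themselves, which is also why noisy bright specks can dominate the displayed contrast (see the Glass Dragon example below).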

Visual inspection of these comparison figures provides some intuition about rendering repeatability. Here are some of the "least repeatable" renderings, selected from the summary above.

Generally, the first and second renderings look the same (top right and bottom left), and the difference images contain a few noisy specks or strange artifacts (top left and bottom right).

Generally, the Mitsuba renderings are "less repeatable" than the PBRT renderings, according to our correlation and relative difference statistics. This is curious, because the repeated renderings have the same visual appearance.

Glass Dragon

The glass variant of the Dragon scene as rendered by Mitsuba gave the worst overall correlation and the worst relative difference between first and second renderings.

Because these images are scaled by max value, and they contain bright noise artifacts, the useful contrast in these images is low. Better tone mapping would improve their appearance.

Above, Mitsuba rendered the glass variant of the Dragon scene, twice.

Matte Sphere

The matte variant of the MaterialSphere bumps scene as rendered by Mitsuba gave the second-worst relative error for a single pixel component, between first and second renderings.

Above, Mitsuba rendered the matte variant of the MaterialSphere bumps scene, twice.

Metal Dragon

The metal variant of the Dragon scene as rendered by Mitsuba gave the third-worst relative error for a single pixel component, between first and second renderings.

Above, Mitsuba rendered the metal variant of the Dragon scene, twice.

Other Renderings

This test included 180 additional renderings that don't appear in the summary figure above. These renderings were all "at least as repeatable" as the examples shown: they had higher rendering correlations and lower relative errors between pixel components.

Variability of This Test

We found that rendering repeatability has some variability across test runs.

To check the repeatability of this test, we generated three sets of renderings on our test machine like the sets used above. We ran rtbCompareAllExampleScenes() on each of the three pairs of rendering sets.

For all three pairs, the "least repeatable" example was the glass variant of the Dragon scene, as rendered by Mitsuba. This had a correlation score of 0.95 at best, as shown above. In the other two pairs, the correlation score was less than 0.85. This scene consistently had a maximum relative diff score of about 1 -- a 100% change for at least one component of one pixel.

For all other examples, the correlation scores were consistently close to unity, and relative diffs consistently less than 1.

The glass Dragon is probably our noisiest example, which might benefit from additional ray samples per pixel. For noisy examples like this, the rendering repeatability is poor and variable, as measured by our correlation and relative diff statistics.

For less noisy examples, rendering repeatability seems good and consistent.

Rendering Setup

For this test, we rendered all of the RenderToolbox example scenes twice, using the rtbTestAllExampleScenes() utility function.

We ran both rendering sets concurrently using Amazon Web Services, on 14 October 2016.
