Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DCP: Initial support for overlay blending #304

Open
wants to merge 218 commits into
base: bits/200-dcp
Choose a base branch
from

Conversation

chadmed
Copy link

@chadmed chadmed commented Jun 22, 2024

DCP can blend two surfaces, so we should wire this up.

We currently don't really have a valid use case for this functionality, however this will be required to make full use of zero-copy video decode once AVD is merged. Work is ongoing in userspace to properly make use of hardware overlay planes. Given DCP's provenance, the best example of this would be directly scanning out hardware-decoded video to the primary plane, then overlaying subtitles/media controls on a different plane.

Given that DCP dies if we try to blend more than two surfaces, we cannot expose an overlay plane and a hardware cursor plane, as userspace has been designed to expect for the last 30 years. Even if we could, again owing to its provenance DCP cannot deal with the unique requirements of a cursor plane - it crashes if a framebuffer partially clips outside of screenspace. This can be hacked around and it is possible to expose a mostly-working cursor plane, however Kwin seems to be the only compositor that is architecturally capable of gracefully handling this in the same way macOS does (fall back to software when required and go back to hardware once the driver says it's safe). As a part of bringing up support for overlay planes, Kwin will also be able to use a DRM_PLANE_TYPE_OVERLAY for a hardware cursor, negating the need for any hacks. Kwin will safely use a hardware cursor on our overlay plane, and compositors which can't/won't implement a similar feature will just use it as a traditional DRM_PLANE_TYPE_OVERLAY.

This series only implements support for ARGB/ABGR on the overlay plane. This is really only useful for cursors, media player overlays/graphics, etc. and does not magically make video work. The final piece in the zero-copy video playback puzzle will be supporting the direct scanout of planar YUV video formats. This may not be trivial.

drivers/gpu/drm/apple/apple_drv.c Show resolved Hide resolved
drivers/gpu/drm/apple/apple_drv.c Outdated Show resolved Hide resolved
drivers/gpu/drm/apple/apple_drv.c Outdated Show resolved Hide resolved
@@ -318,16 +382,29 @@ static int apple_probe_per_dcp(struct device *dev,
struct apple_crtc *crtc;
struct apple_connector *connector;
struct apple_encoder *enc;
struct drm_plane *primary;
int ret;
struct drm_plane *planes[DCP_MAX_PLANES];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we only need a single plane pointer (since we do not support a cursor plane)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need the pointers to set the immutable zpos property in a sane way.

if (IS_ERR(primary))
return PTR_ERR(primary);
/* Set up our other planes */
for (i = 1; i < DCP_MAX_PLANES; i++) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there any advantage of offering more than overlay plane if we can blend only 2?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only reason I did it like this instead of hard-coding two planes was to make it easier to extend if/when we figure out how to get all four surfaces to play nice somehow. I can revert back to hard coded planes if you don't think there's value in doing it like this

* Plane order is nondeterministic for this iterator. DCP will
* almost always crash at some point if the z order of planes
* flip-flops around. Make sure we are always blending them
* in the correct order.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems strange and I wouldn't expect that changing the order is the cause of the crash but something else. We of course need a deterministic order for proper display.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be a race leaving a dangling pointer somewhere from all the surfaces swapping order, but I definitely did isolate the underlying cause to the fact that they were swapping order. If you leave everything as-is and comment out normalize_zpos it will happen.

drivers/gpu/drm/apple/iomfb_template.c Outdated Show resolved Hide resolved
* having 4 surfaces, we can only blend two. Surface 0 is also
* unusable on some machines, so ignore it.
*/
l = MAX_BLEND_SURFACES - new_state->normalized_zpos;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A better approach with the current state of supported planes and formats would be to hard-code l based on plane type

@chadmed chadmed force-pushed the dcp-overlay-no-cursor-hack branch 2 times, most recently from 25cb4ec to db6d1c7 Compare July 1, 2024 08:00
@chadmed
Copy link
Author

chadmed commented Jul 1, 2024

Rebased and addressed feedback

@chadmed
Copy link
Author

chadmed commented Aug 30, 2024

Rebased on top of bits/200-dcp and base for PR changed

alyssarosenzweig and others added 11 commits September 16, 2024 13:45
Add a DRM/KMS driver for Apple system on chips using the DCP
coprocessor, namely the Apple M1. The DCP was added in Apple A14; this
driver does not apply to older iDevices.

This driver targets the DCP firmware API shipped by macOS 12.1.
Currently no incompatibilities with macOS 12.0.1 or 12.2.1 are known.

Signed-off-by: Alyssa Rosenzweig <[email protected]>
Co-developed-by: Janne Grunau <[email protected]>
Signed-off-by: Janne Grunau <[email protected]>
Xorg startup with modesetting driver triggers this. Move vblank
signalling to dcp to avoid a circular dependency between apple_drv
and dcp.

Signed-off-by: Janne Grunau <[email protected]>
Frequency differs between M1 and M1 Pro/Max/Ultra.

Signed-off-by: Janne Grunau <[email protected]>
The framebuffer can be unreferenced by GEM while the display controller
is still using it for scanout resulting in IOVA faults and crashed dcp.
dcp has to hold a reference until the swap is complete.

Signed-off-by: Janne Grunau <[email protected]>
Use the firmware setPowerState callback.

Signed-off-by: Alyssa Rosenzweig <[email protected]>
fixup! WIP: drm/apple: Add DCP display driver

Signed-off-by: Janne Grunau <[email protected]>
jannau and others added 19 commits October 6, 2024 19:52
Two callbacks for IOMFB::MemDescRelay seems to be dropped between 12.3
and 13.5 DCP firmware. This results in the renumbering of
MemDescRelay::release_descriptor from D456 to D454.
Noticed while when switching the display refresh rate to 50 Hz with a
14.5 system firmware on a M1 Max Macbook Pro.

Signed-off-by: Janne Grunau <[email protected]>
The most likely cause for this is an unexpected callback form which the
current driver doesn't recover. Warn only once about it.

Signed-off-by: Janne Grunau <[email protected]>
Reduce log spam while errors are still likely due missing state checks.
Can be still enabled by adding `apple_dcp.hdmi_audio` the kernel command
line.

Signed-off-by: Janne Grunau <[email protected]>
Since we don't init/uses drm's vblank support our page flip timestamps
are CLOCK_MONOTONIC timestamps during the event generation. Since
compositors use the timestamp to schedule their next kms commit this is
timing sensitive sop move it under the drivers control.
Take the timestamp directly in the swap_complete callback.
Framebuffer swaps are unfortunately not fast with DCP. Measured time
from swap_submit to swap_complete is ~1.5 ms for dcp and ~2.3 ms for
dcpext. This warrants further investigation. Presentation timestamps
might help if delay on dcp firmware side occurs after the actual swap.
In the meantime doctor the time stamps and move the page flip completion
up to 1 ms earler. This fixes half rate refresh on external displays
displays using dcpext.

Signed-off-by: Janne Grunau <[email protected]>
This works even before the DMA device probes. Might help deal with
runtime-pm ordering, though it doesn't solve the deferred ordering
problem (since we're creating the link while already probing)...

Signed-off-by: Asahi Lina <[email protected]>
Allow the DMA device driver to probe late, and still create the sound
device upfront. Instead try to request the DMA channel on first PCM
open. This should be safe as long as we bail early and don't allow the
process to continue to configuring buffers (since that requires the DMA
to be configured).

Signed-off-by: Asahi Lina <[email protected]>
This appears to be lane count as "2" is observed for USB-C DP alt mode
in shared DP/USB3 mode.

Signed-off-by: Janne Grunau <[email protected]>
This unfortunately doesn't work relieably with typec-altmode-displayport
since the oob hotplug notification arrives before atc-phy is configured
to the appropiate DP mode.

Signed-off-by: Janne Grunau <[email protected]>
Fixes failure to unmap buffers in dcpep_cb_unmap_piodma() due to the
unaligned size. Further along this causes kernel log splat when DCP
tries to map the buffers again since thye IOVA is still in use.
This causes no apparent issue although map_piodma callback signals an
errror and returns 0 (unmapped as DVA).

It's not clear why this presents only randomly. Possibly some build or
uninitialized memory triggers this unmap/free and immediate allocate/map
cycle in the DCP firmware. I never notices this with a clang-built kernel
on j314c. It showed with gcc build with the Fedora config at least on
6.8.8 based kernels. This did not reproduce on j375d.

Signed-off-by: Janne Grunau <[email protected]>
The dpavserv endpoint uses various EPICProviderClass depending on the
connected display.

Observed values:
- "AppleDCPAgileCDIDPDisplay" (j134c, dcp, panel)
- "AppleDCPMCDP29XX" (j274, dcp, hdmi)
- "AppleDCPPS190" (j474s, dcpext0, hdmi)
- "DCPDPService" (j474s, dcpext1, typec)

So match against against EPICName which is consistent in all cases.
This also allows the distinction between 'dcpav-service-epic' and
'dcpdp-service-epic'. Not sure what the second EPIC service is used for.

Signed-off-by: Janne Grunau <[email protected]>
Known uses EDID retrieval and raw I2C access.

Signed-off-by: Janne Grunau <[email protected]>
External display only since the EDID provided by integrated panels
holds no useful / correct information.

Signed-off-by: Janne Grunau <[email protected]>
The for_each_oldnew_plane_in_state iterator is nondeterministic
in terms of the order of planes. DCP expects surfaces to be
fed to it in the correct order. Relying on the iterator to
lazily increment the index into surf[] means we cannot meet
this expectation. The constant reordering of planes in
the surf[] array seems to cause DCP to crash under certain
circumstances. Cursors will also often be drawn under the main
plane, which is less than ideal.

Populate surf[] in the order everyone expects us to. This fixes
a whole host of odd behaviour when wiring up multiple DRM universal
planes.

Signed-off-by: James Calligeros <[email protected]>
Despite having 4 surfaces, DCP can only blend two of them at once.

Constrain swaps to two surfaces, and warn if userspace somehow
tries to give us more to swap.

Signed-off-by: James Calligeros <[email protected]>
Owing to its origin in mobile devices and the Apple TV, DCP seems to
have been designed under the assumption that no one could possibly want
a rectangle to clip the screen. If a rectangle's bottom-right edge
clips the screen, DCP will instead try to scale the destination rectangle
to the best of its ability... until it can't anymore.

DCP is not tolerant to faults and will crash if the onscreen portion of
the framebuffer ends up smaller than 32x32, or if any dimension ends up
entirely offscreen.

Use apple_plane_atomic_check() to reject requested plane states that
could crash DCP. This is the final piece of the puzzle required to
enable preliminary support for overlay planes on Apple Silicon devices.

Signed-off-by: James Calligeros <[email protected]>
DCP is capable of compositing two surfaces in hardware. This is
important for zero-copy video playback, etc.

Set up an overlay plane so that userspace can do cool things with
it.

Signed-off-by: James Calligeros <[email protected]>
Fix the call to drm_atomic_helper_check_plane_state to use
the correct scaling factors.

Signed-off-by: James Calligeros <[email protected]>
Some userspace may not handle invalid plane checks gracefully
when falling back to a software cursor. This will manifest
as the screen freezing, recoverable by moving the cursor away
from a screen edge. Throw a warning once to let the user know
why this has happened.

Signed-off-by: James Calligeros <[email protected]>
Userspace cannot be trusted to give us a sane zpos value, but given DCP's
requirement that the primary plane always be the bottommost surface, we
can't rely on drm_atomic_normalize_zpos() to do the job for us either.

Make the zpos property immutable, and keep the primary plane at zpos 0.

Signed-off-by: James Calligeros <[email protected]>
@chadmed
Copy link
Author

chadmed commented Oct 14, 2024

kwin now supports using overlay planes for cursors (queued for 6.3, backported to 6.2 in fedora) so we now have a use case!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.