Skip to content

Commit

Permalink
swiotlb: optimize get_max_slots()
Browse files Browse the repository at this point in the history
Use a simple logical shift and increment to calculate the number of slots
taken by the DMA segment boundary.

At least GCC-13 is not able to optimize the expression, producing this
horrible assembly code on x86:

	cmpq	$-1, %rcx
	je	.L364
	addq	$2048, %rcx
	shrq	$11, %rcx
	movq	%rcx, %r13
.L331:
	// rest of the function here...

	// after function epilogue and return:
.L364:
	movabsq $9007199254740992, %r13
	jmp	.L331

After the optimization, the code looks more reasonable:

	shrq	$11, %r11
	leaq	1(%r11), %rbx

Signed-off-by: Petr Tesarik <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
  • Loading branch information
Petr Tesarik authored and Christoph Hellwig committed Aug 8, 2023
1 parent f94cb36 commit d069ed2
Showing 1 changed file with 1 addition and 3 deletions.
4 changes: 1 addition & 3 deletions kernel/dma/swiotlb.c
Original file line number Diff line number Diff line change
Expand Up @@ -903,9 +903,7 @@ static inline phys_addr_t slot_addr(phys_addr_t start, phys_addr_t idx)
*/
static inline unsigned long get_max_slots(unsigned long boundary_mask)
{
if (boundary_mask == ~0UL)
return 1UL << (BITS_PER_LONG - IO_TLB_SHIFT);
return nr_slots(boundary_mask + 1);
return (boundary_mask >> IO_TLB_SHIFT) + 1;
}

static unsigned int wrap_area_index(struct io_tlb_pool *mem, unsigned int index)
Expand Down

0 comments on commit d069ed2

Please sign in to comment.