mm: count zeromap read and set for swapout and swapin
When the proportion of folios from the zeromap is small, missing their
accounting may not significantly impact profiling.  However, it's easy to
construct a scenario where this becomes an issue—for example, allocating
1 GB of memory, writing zeros from userspace, followed by MADV_PAGEOUT,
and then swapping it back in.  In this case, the swap-out and swap-in
counts seem to vanish into a black hole, potentially causing semantic
ambiguity.
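
The scenario above can be sketched in userspace roughly as follows. This is a minimal illustration, not part of the patch: the helper name `zero_pageout_roundtrip` is hypothetical, and whether the pages actually reach the zeromap depends on swap being configured and on the kernel supporting MADV_PAGEOUT.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MADV_PAGEOUT
#define MADV_PAGEOUT 21	/* Linux UAPI value; older libc headers may lack it */
#endif

/*
 * Hypothetical helper (not from the patch): map len bytes of anonymous
 * memory, write zeros from userspace, ask the kernel to page the range
 * out, then touch every page so it is faulted back in.
 *
 * Returns 0 if the range still reads back as zeros, -1 on mmap failure
 * or unexpected content.  *pageout_ok reports whether madvise() accepted
 * MADV_PAGEOUT; it does not prove that anything was swapped.
 */
int zero_pageout_roundtrip(size_t len, int *pageout_ok)
{
	long page_size = sysconf(_SC_PAGESIZE);
	volatile unsigned char *p;
	unsigned long sum = 0;
	unsigned char *buf;
	size_t off;

	assert(len > 0 && pageout_ok != NULL && page_size > 0);

	buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return -1;

	/* Writing zeros makes the pages present and dirty. */
	memset(buf, 0, len);
	*pageout_ok = (madvise(buf, len, MADV_PAGEOUT) == 0);

	/* Read one byte per page to bring the range back in. */
	p = buf;
	for (off = 0; off < len; off += (size_t)page_size)
		sum += p[off];

	munmap(buf, len);
	return sum == 0 ? 0 : -1;
}
```

Calling it with a length of 1 GB (`(size_t)1 << 30`) and comparing the swap counters in /proc/vmstat before and after should show whether the swap-outs and swap-ins are visible on a given kernel.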

Moreover, Usama reported that zero-filled pages can exceed 10% in
workloads utilizing zswap, while Hailong noted that some Android apps
have more than 6% zero-filled pages.  Before commit 0ca0c24 ("mm:
store zero pages to be swapped out in a bitmap"), both zswap and zRAM
implemented similar optimizations, leading to these optimized-out pages
being counted in either zswap or zRAM counters (with pswpin/pswpout also
increasing for zRAM).  Because the zeromap now takes effect before both
zswap and zRAM, userspace can no longer observe these swap-outs and
swap-ins.

We have three ways to address this:

1. Introduce a dedicated counter specifically for the zeromap.

2. Use pswpin/pswpout accounting, treating the zeromap as a standard
   backend.  This approach aligns with zRAM's current handling of
   same-page fills at the device level.  However, it would mean losing the
   optimized-out page counters previously available in zRAM and would not
   align with systems using zswap.  Additionally, as noted by Nhat Pham,
   pswpin/pswpout counters apply only to I/O done directly to the backend
   device.

3. Count zeromap pages under zswap, aligning with system behavior when
   zswap is enabled.  However, this would not be consistent with zRAM, nor
   would it align with systems lacking both zswap and zRAM.

Given the complications with options 2 and 3, this patch selects
option 1.
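
As a rough illustration of option 1, the following userspace model keeps a bitmap of zero-filled swap slots and bumps dedicated counters by the number of pages in each folio. It is a sketch under simplifying assumptions, not kernel code: `struct zeromap_model`, `model_swapout()`, `model_swapin()` and the fixed sizes are all hypothetical names chosen for the example, and a folio is treated as all-or-nothing.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_PAGE_SIZE	4096	/* assumed page size for the model */
#define MODEL_SLOTS	1024	/* assumed number of swap slots */
#define MODEL_WORD_BITS	(sizeof(unsigned long) * CHAR_BIT)

struct zeromap_model {
	unsigned long bits[MODEL_SLOTS / MODEL_WORD_BITS];
	unsigned long swpout_zero;	/* pages whose write I/O was skipped */
	unsigned long swpin_zero;	/* pages refilled with zeros, no read I/O */
};

static bool model_page_is_zero(const unsigned char *page)
{
	size_t i;

	for (i = 0; i < MODEL_PAGE_SIZE; i++)
		if (page[i])
			return false;
	return true;
}

/* Returns true when the whole folio is zero-filled and I/O can be skipped. */
bool model_swapout(struct zeromap_model *m, const unsigned char *data,
		   size_t slot, size_t nr_pages)
{
	size_t i;

	assert(slot + nr_pages <= MODEL_SLOTS);
	for (i = 0; i < nr_pages; i++)
		if (!model_page_is_zero(data + i * MODEL_PAGE_SIZE))
			return false;

	for (i = 0; i < nr_pages; i++)
		m->bits[(slot + i) / MODEL_WORD_BITS] |=
			1UL << ((slot + i) % MODEL_WORD_BITS);

	/* Count pages, not folios, so large folios are accounted fully. */
	m->swpout_zero += nr_pages;
	return true;
}

/* Returns true when every slot is in the zeromap and the data was zeroed. */
bool model_swapin(struct zeromap_model *m, unsigned char *data,
		  size_t slot, size_t nr_pages)
{
	size_t i;

	assert(slot + nr_pages <= MODEL_SLOTS);
	for (i = 0; i < nr_pages; i++)
		if (!(m->bits[(slot + i) / MODEL_WORD_BITS] &
		      (1UL << ((slot + i) % MODEL_WORD_BITS))))
			return false;

	memset(data, 0, nr_pages * MODEL_PAGE_SIZE);
	m->swpin_zero += nr_pages;
	return true;
}
```

The point of the model is only that the counters move independently of any backend: a four-page folio that is entirely zero adds 4 to the swap-out counter without touching zswap, zRAM, or pswpout-style accounting.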

These counters are exposed in /proc/vmstat (system-wide) and in a
memcg's memory.stat (for the memcg of interest).

For example:

$ grep -E 'swpin_zero|swpout_zero' /proc/vmstat
swpin_zero 1648
swpout_zero 33536

$ grep -E 'swpin_zero|swpout_zero' /sys/fs/cgroup/system.slice/memory.stat
swpin_zero 3905
swpout_zero 3985

This patch does not address any specific zeromap bug, but the missing
swpout and swpin counts for zero-filled pages can be highly confusing and
may mislead user-space agents that rely on changes in these counters as
indicators.  Therefore, we add a Fixes tag to encourage the inclusion of
this counter in any kernel versions with zeromap.

Many thanks to Kanchana for the contribution of changing
count_objcg_event() to count_objcg_events() to support large folios[1],
which has now been incorporated into this patch.

[1] https://lkml.kernel.org/r/[email protected]

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 0ca0c24 ("mm: store zero pages to be swapped out in a bitmap")
Co-developed-by: Kanchana P Sridhar <[email protected]>
Signed-off-by: Barry Song <[email protected]>
Reviewed-by: Nhat Pham <[email protected]>
Reviewed-by: Chengming Zhou <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Cc: Usama Arif <[email protected]>
Cc: Yosry Ahmed <[email protected]>
Cc: Hailong Liu <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Shakeel Butt <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Baolin Wang <[email protected]>
Cc: Chris Li <[email protected]>
Cc: "Huang, Ying" <[email protected]>
Cc: Kairui Song <[email protected]>
Cc: Ryan Roberts <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Barry Song authored and akpm00 committed Nov 11, 2024
1 parent c289f4d commit e7ac4da
Showing 7 changed files with 43 additions and 8 deletions.
9 changes: 9 additions & 0 deletions Documentation/admin-guide/cgroup-v2.rst
@@ -1599,6 +1599,15 @@ The following nested keys are defined.
 	  pglazyfreed (npn)
 		Amount of reclaimed lazyfree pages

+	  swpin_zero
+		Number of pages swapped into memory and filled with zero, where I/O
+		was optimized out because the page content was detected to be zero
+		during swapout.
+
+	  swpout_zero
+		Number of zero-filled pages swapped out with I/O skipped due to the
+		content being detected as zero.
+
 	  zswpin
 		Number of pages moved in to memory from zswap.

12 changes: 7 additions & 5 deletions include/linux/memcontrol.h
@@ -1760,8 +1760,9 @@ static inline int memcg_kmem_id(struct mem_cgroup *memcg)

 struct mem_cgroup *mem_cgroup_from_slab_obj(void *p);

-static inline void count_objcg_event(struct obj_cgroup *objcg,
-				     enum vm_event_item idx)
+static inline void count_objcg_events(struct obj_cgroup *objcg,
+				      enum vm_event_item idx,
+				      unsigned long count)
 {
 	struct mem_cgroup *memcg;

@@ -1770,7 +1771,7 @@ static inline void count_objcg_event(struct obj_cgroup *objcg,

 	rcu_read_lock();
 	memcg = obj_cgroup_memcg(objcg);
-	count_memcg_events(memcg, idx, 1);
+	count_memcg_events(memcg, idx, count);
 	rcu_read_unlock();
 }

@@ -1825,8 +1826,9 @@ static inline struct mem_cgroup *mem_cgroup_from_slab_obj(void *p)
 	return NULL;
 }

-static inline void count_objcg_event(struct obj_cgroup *objcg,
-				     enum vm_event_item idx)
+static inline void count_objcg_events(struct obj_cgroup *objcg,
+				      enum vm_event_item idx,
+				      unsigned long count)
 {
 }

2 changes: 2 additions & 0 deletions include/linux/vm_event_item.h
@@ -134,6 +134,8 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
 #ifdef CONFIG_SWAP
 		SWAP_RA,
 		SWAP_RA_HIT,
+		SWPIN_ZERO,
+		SWPOUT_ZERO,
 #ifdef CONFIG_KSM
 		KSM_SWPIN_COPY,
 #endif
4 changes: 4 additions & 0 deletions mm/memcontrol.c
@@ -431,6 +431,10 @@ static const unsigned int memcg_vm_event_stat[] = {
 	PGDEACTIVATE,
 	PGLAZYFREE,
 	PGLAZYFREED,
+#ifdef CONFIG_SWAP
+	SWPIN_ZERO,
+	SWPOUT_ZERO,
+#endif
 #ifdef CONFIG_ZSWAP
 	ZSWPIN,
 	ZSWPOUT,
16 changes: 16 additions & 0 deletions mm/page_io.c
@@ -204,14 +204,22 @@ static bool is_folio_zero_filled(struct folio *folio)

 static void swap_zeromap_folio_set(struct folio *folio)
 {
+	struct obj_cgroup *objcg = get_obj_cgroup_from_folio(folio);
 	struct swap_info_struct *sis = swp_swap_info(folio->swap);
+	int nr_pages = folio_nr_pages(folio);
 	swp_entry_t entry;
 	unsigned int i;

 	for (i = 0; i < folio_nr_pages(folio); i++) {
 		entry = page_swap_entry(folio_page(folio, i));
 		set_bit(swp_offset(entry), sis->zeromap);
 	}
+
+	count_vm_events(SWPOUT_ZERO, nr_pages);
+	if (objcg) {
+		count_objcg_events(objcg, SWPOUT_ZERO, nr_pages);
+		obj_cgroup_put(objcg);
+	}
 }

 static void swap_zeromap_folio_clear(struct folio *folio)
@@ -503,6 +511,7 @@ static void sio_read_complete(struct kiocb *iocb, long ret)
 static bool swap_read_folio_zeromap(struct folio *folio)
 {
 	int nr_pages = folio_nr_pages(folio);
+	struct obj_cgroup *objcg;
 	bool is_zeromap;

 	/*
@@ -517,6 +526,13 @@ static bool swap_read_folio_zeromap(struct folio *folio)
 	if (!is_zeromap)
 		return false;

+	objcg = get_obj_cgroup_from_folio(folio);
+	count_vm_events(SWPIN_ZERO, nr_pages);
+	if (objcg) {
+		count_objcg_events(objcg, SWPIN_ZERO, nr_pages);
+		obj_cgroup_put(objcg);
+	}
+
 	folio_zero_range(folio, 0, folio_size(folio));
 	folio_mark_uptodate(folio);
 	return true;
2 changes: 2 additions & 0 deletions mm/vmstat.c
@@ -1415,6 +1415,8 @@ const char * const vmstat_text[] = {
 #ifdef CONFIG_SWAP
 	"swap_ra",
 	"swap_ra_hit",
+	"swpin_zero",
+	"swpout_zero",
 #ifdef CONFIG_KSM
 	"ksm_swpin_copy",
 #endif
6 changes: 3 additions & 3 deletions mm/zswap.c
@@ -1053,7 +1053,7 @@ static int zswap_writeback_entry(struct zswap_entry *entry,

 	count_vm_event(ZSWPWB);
 	if (entry->objcg)
-		count_objcg_event(entry->objcg, ZSWPWB);
+		count_objcg_events(entry->objcg, ZSWPWB, 1);

 	zswap_entry_free(entry);

@@ -1483,7 +1483,7 @@ bool zswap_store(struct folio *folio)

 	if (objcg) {
 		obj_cgroup_charge_zswap(objcg, entry->length);
-		count_objcg_event(objcg, ZSWPOUT);
+		count_objcg_events(objcg, ZSWPOUT, 1);
 	}

 	/*
@@ -1577,7 +1577,7 @@ bool zswap_load(struct folio *folio)

 	count_vm_event(ZSWPIN);
 	if (entry->objcg)
-		count_objcg_event(entry->objcg, ZSWPIN);
+		count_objcg_events(entry->objcg, ZSWPIN, 1);

 	if (swapcache) {
 		zswap_entry_free(entry);
