mm: thrash detection-based file cache sizing
The VM maintains cached filesystem pages on two types of lists.  One list
holds the pages recently faulted into the cache, the other list holds
pages that have been referenced repeatedly on that first list.  The idea
is to prefer reclaiming young pages over those that have been shown to benefit
from caching in the past.  We call the recently used list "inactive list" and
the frequently used list "active list".

Currently, the VM aims for a 1:1 ratio between the lists, which is the
"perfect" trade-off between the ability to *protect* frequently used pages and
the ability to *detect* frequently used pages.  This means that working set
changes bigger than half of cache memory go undetected and thrash
indefinitely, whereas working sets bigger than half of cache memory are
unprotected against used-once streams that don't even need caching.

Historically, every reclaim scan of the inactive list also took a smaller
number of pages from the tail of the active list and moved them to the head of
the inactive list.  This model gave established working sets more grace time
in the face of temporary use-once streams, but ultimately was not
significantly better than a FIFO policy and still thrashed cache based on
eviction speed, rather than actual demand for cache.
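
To make the policy concrete, here is a minimal userspace sketch of the
promotion and reclaim decisions in such a two-list scheme; the toy_* names and
types are made up for illustration and are not the kernel's data structures:

#include <stdbool.h>

/* Which LRU list a cached page currently sits on. */
enum toy_lru { TOY_LRU_INACTIVE, TOY_LRU_ACTIVE };

struct toy_page {
        enum toy_lru lru;
        bool referenced;        /* accessed since it was last scanned? */
};

/* Called whenever a cached page is accessed again. */
static void toy_mark_accessed(struct toy_page *page)
{
        if (page->lru == TOY_LRU_INACTIVE && page->referenced)
                page->lru = TOY_LRU_ACTIVE;     /* repeat reference: promote */
        else
                page->referenced = true;
}

/* Called by reclaim on a page at the tail of the inactive list:
 * young pages that were never referenced again are evicted first. */
static bool toy_should_reclaim(const struct toy_page *page)
{
        return page->lru == TOY_LRU_INACTIVE && !page->referenced;
}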

This patch solves one half of the problem by decoupling the ability to
detect working set changes from the inactive list size.  By maintaining a
history of recently evicted file pages it can detect frequently used pages
with an arbitrarily small inactive list size, and subsequently apply
pressure on the active list based on actual demand for cache, not just
overall eviction speed.
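
One way to picture this eviction history is as a small tagged value left
behind in the page's now-empty page cache slot, so that a later fault at the
same offset can tell "never cached" apart from "cached recently but evicted".
The encoding below is only a sketch of that idea, assuming real page pointers
are word-aligned; the in-kernel representation of these shadow entries differs
in detail:

#include <stdbool.h>
#include <stdint.h>

/* Page pointers are word-aligned, so the lowest bit can tag a shadow entry. */
#define TOY_SHADOW_TAG  0x1UL

static void *toy_pack_shadow(unsigned long eviction_stamp)
{
        return (void *)((eviction_stamp << 1) | TOY_SHADOW_TAG);
}

static bool toy_entry_is_shadow(void *entry)
{
        return ((uintptr_t)entry & TOY_SHADOW_TAG) != 0;
}

static unsigned long toy_unpack_shadow(void *entry)
{
        return (uintptr_t)entry >> 1;
}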

Every zone maintains a counter that tracks inactive list aging speed.
When a page is evicted, a snapshot of this counter is stored in the
now-empty page cache radix tree slot.  On refault, the minimum access
distance of the page can be assessed, to evaluate whether the page should
be part of the active list or not.
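
The following is a compact sketch of that bookkeeping, assuming a single zone
for simplicity; the toy_* names are illustrative and not the patch's actual
functions:

#include <stdbool.h>

/* Bumped for every eviction and activation on the inactive file list. */
static unsigned long inactive_age;

/* At eviction time: take a snapshot of the aging counter; this value is
 * what ends up packed into the shadow entry. */
static unsigned long toy_note_eviction(void)
{
        return ++inactive_age;
}

/* At refault time: a page that comes back within roughly one active-list
 * worth of aging events would have stayed resident had the inactive list
 * been larger, so treat it as part of the working set and activate it. */
static bool toy_refault_should_activate(unsigned long eviction_snapshot,
                                        unsigned long nr_active_file)
{
        unsigned long refault_distance = inactive_age - eviction_snapshot;

        return refault_distance <= nr_active_file;
}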

This fixes the VM's blindness towards working set changes in excess of the
inactive list.  It is also the foundation for further improving the protection
ability and for reducing the minimum inactive list size below the current 50%.

Signed-off-by: Johannes Weiner <[email protected]>
Reviewed-by: Rik van Riel <[email protected]>
Reviewed-by: Minchan Kim <[email protected]>
Reviewed-by: Bob Liu <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Dave Chinner <[email protected]>
Cc: Greg Thelen <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: KOSAKI Motohiro <[email protected]>
Cc: Luigi Semenzato <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Metin Doslu <[email protected]>
Cc: Michel Lespinasse <[email protected]>
Cc: Ozgun Erdogan <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Roman Gushchin <[email protected]>
Cc: Ryan Mallon <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
hnaz authored and sfrothwell committed Mar 6, 2014
1 parent c52f57c commit f0ee4b4
Showing 8 changed files with 331 additions and 23 deletions.
5 changes: 5 additions & 0 deletions include/linux/mmzone.h
@@ -142,6 +142,8 @@ enum zone_stat_item {
NUMA_LOCAL, /* allocation from local node */
NUMA_OTHER, /* allocation from other node */
#endif
WORKINGSET_REFAULT,
WORKINGSET_ACTIVATE,
NR_ANON_TRANSPARENT_HUGEPAGES,
NR_FREE_CMA_PAGES,
NR_VM_ZONE_STAT_ITEMS };
@@ -392,6 +394,9 @@ struct zone {
spinlock_t lru_lock;
struct lruvec lruvec;

/* Evictions & activations on the inactive file list */
atomic_long_t inactive_age;

unsigned long pages_scanned; /* since last reclaim */
unsigned long flags; /* zone flags, see below */

5 changes: 5 additions & 0 deletions include/linux/swap.h
@@ -260,6 +260,11 @@ struct swap_list_t {
int next; /* swapfile to be used next */
};

/* linux/mm/workingset.c */
void *workingset_eviction(struct address_space *mapping, struct page *page);
bool workingset_refault(void *shadow);
void workingset_activation(struct page *page);

/* linux/mm/page_alloc.c */
extern unsigned long totalram_pages;
extern unsigned long totalreserve_pages;
2 changes: 1 addition & 1 deletion mm/Makefile
@@ -17,7 +17,7 @@ obj-y := filemap.o mempool.o oom_kill.o fadvise.o \
util.o mmzone.o vmstat.o backing-dev.o \
mm_init.o mmu_context.o percpu.o slab_common.o \
compaction.o balloon_compaction.o \
interval_tree.o list_lru.o $(mmu-y)
interval_tree.o list_lru.o workingset.o $(mmu-y)

obj-y += init-mm.o

61 changes: 44 additions & 17 deletions mm/filemap.c
@@ -469,7 +469,7 @@ int replace_page_cache_page(struct page *old, struct page *new, gfp_t gfp_mask)
EXPORT_SYMBOL_GPL(replace_page_cache_page);

static int page_cache_tree_insert(struct address_space *mapping,
struct page *page)
struct page *page, void **shadowp)
{
void **slot;
int error;
@@ -484,6 +484,8 @@ static int page_cache_tree_insert(struct address_space *mapping,
radix_tree_replace_slot(slot, page);
mapping->nrshadows--;
mapping->nrpages++;
if (shadowp)
*shadowp = p;
return 0;
}
error = radix_tree_insert(&mapping->page_tree, page->index, page);
@@ -492,18 +494,10 @@ static int page_cache_tree_insert(struct address_space *mapping,
return error;
}

/**
* add_to_page_cache_locked - add a locked page to the pagecache
* @page: page to add
* @mapping: the page's address_space
* @offset: page index
* @gfp_mask: page allocation mode
*
* This function is used to add a page to the pagecache. It must be locked.
* This function does not add the page to the LRU. The caller must do that.
*/
int add_to_page_cache_locked(struct page *page, struct address_space *mapping,
pgoff_t offset, gfp_t gfp_mask)
static int __add_to_page_cache_locked(struct page *page,
struct address_space *mapping,
pgoff_t offset, gfp_t gfp_mask,
void **shadowp)
{
int error;

@@ -526,7 +520,7 @@ int add_to_page_cache_locked(struct page *page, struct address_space *mapping,
page->index = offset;

spin_lock_irq(&mapping->tree_lock);
error = page_cache_tree_insert(mapping, page);
error = page_cache_tree_insert(mapping, page, shadowp);
radix_tree_preload_end();
if (unlikely(error))
goto err_insert;
@@ -542,16 +536,49 @@ int add_to_page_cache_locked(struct page *page, struct address_space *mapping,
page_cache_release(page);
return error;
}

/**
* add_to_page_cache_locked - add a locked page to the pagecache
* @page: page to add
* @mapping: the page's address_space
* @offset: page index
* @gfp_mask: page allocation mode
*
* This function is used to add a page to the pagecache. It must be locked.
* This function does not add the page to the LRU. The caller must do that.
*/
int add_to_page_cache_locked(struct page *page, struct address_space *mapping,
pgoff_t offset, gfp_t gfp_mask)
{
return __add_to_page_cache_locked(page, mapping, offset,
gfp_mask, NULL);
}
EXPORT_SYMBOL(add_to_page_cache_locked);

int add_to_page_cache_lru(struct page *page, struct address_space *mapping,
pgoff_t offset, gfp_t gfp_mask)
{
void *shadow = NULL;
int ret;

ret = add_to_page_cache(page, mapping, offset, gfp_mask);
if (ret == 0)
lru_cache_add_file(page);
__set_page_locked(page);
ret = __add_to_page_cache_locked(page, mapping, offset,
gfp_mask, &shadow);
if (unlikely(ret))
__clear_page_locked(page);
else {
/*
* The page might have been evicted from cache only
* recently, in which case it should be activated like
* any other repeatedly accessed page.
*/
if (shadow && workingset_refault(shadow)) {
SetPageActive(page);
workingset_activation(page);
} else
ClearPageActive(page);
lru_cache_add(page);
}
return ret;
}
EXPORT_SYMBOL_GPL(add_to_page_cache_lru);
2 changes: 2 additions & 0 deletions mm/swap.c
@@ -574,6 +574,8 @@ void mark_page_accessed(struct page *page)
else
__lru_cache_activate_page(page);
ClearPageReferenced(page);
if (page_is_file_cache(page))
workingset_activation(page);
} else if (!PageReferenced(page)) {
SetPageReferenced(page);
}
24 changes: 19 additions & 5 deletions mm/vmscan.c
@@ -523,7 +523,8 @@ static pageout_t pageout(struct page *page, struct address_space *mapping,
* Same as remove_mapping, but if the page is removed from the mapping, it
* gets returned with a refcount of 0.
*/
static int __remove_mapping(struct address_space *mapping, struct page *page)
static int __remove_mapping(struct address_space *mapping, struct page *page,
bool reclaimed)
{
BUG_ON(!PageLocked(page));
BUG_ON(mapping != page_mapping(page));
@@ -569,10 +570,23 @@ static int __remove_mapping(struct address_space *mapping, struct page *page)
swapcache_free(swap, page);
} else {
void (*freepage)(struct page *);
void *shadow = NULL;

freepage = mapping->a_ops->freepage;

__delete_from_page_cache(page, NULL);
/*
* Remember a shadow entry for reclaimed file cache in
* order to detect refaults, thus thrashing, later on.
*
* But don't store shadows in an address space that is
* already exiting.  This is not just an optimization,
* inode reclaim needs to empty out the radix tree or
* the nodes are lost. Don't plant shadows behind its
* back.
*/
if (reclaimed && page_is_file_cache(page) &&
!mapping_exiting(mapping))
shadow = workingset_eviction(mapping, page);
__delete_from_page_cache(page, shadow);
spin_unlock_irq(&mapping->tree_lock);
mem_cgroup_uncharge_cache_page(page);

@@ -595,7 +609,7 @@ static int __remove_mapping(struct address_space *mapping, struct page *page)
*/
int remove_mapping(struct address_space *mapping, struct page *page)
{
if (__remove_mapping(mapping, page)) {
if (__remove_mapping(mapping, page, false)) {
/*
* Unfreezing the refcount with 1 rather than 2 effectively
* drops the pagecache ref for us without requiring another
@@ -1065,7 +1079,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
}
}

if (!mapping || !__remove_mapping(mapping, page))
if (!mapping || !__remove_mapping(mapping, page, true))
goto keep_locked;

/*
2 changes: 2 additions & 0 deletions mm/vmstat.c
@@ -770,6 +770,8 @@ const char * const vmstat_text[] = {
"numa_local",
"numa_other",
#endif
"workingset_refault",
"workingset_activate",
"nr_anon_transparent_hugepages",
"nr_free_cma",
"nr_dirty_threshold",