author     Hugh Dickins <hughd@google.com>                   2015-12-11 13:40:55 -0800
committer  Linus Torvalds <torvalds@linux-foundation.org>    2015-12-12 10:15:34 -0800
commit     267a4c76bbdb950688d3aeb020976c2918064584 (patch)
tree       8794c27b8a8b2e10687cf19729e03f0fffad597c /mm/shmem.c
parent     dbe409e4f5e5075bd9ff7f8dd5c627abf3ee38c1 (diff)
tmpfs: fix shmem_evict_inode() warnings on i_blocks
Dmitry Vyukov provides a little program, autogenerated by syzkaller,
which races a fault on a mapping of a sparse memfd object, against
truncation of that object below the fault address: run repeatedly for a
few minutes, it reliably generates shmem_evict_inode()'s
WARN_ON(inode->i_blocks).
(But there's nothing specific to memfd here, nor to the fstat which it
happened to use to generate the fault: though that looked suspicious,
since a shmem_recalc_inode() had been added there recently. The same
problem can be reproduced with open+unlink in place of memfd_create, and
with fstatfs in place of fstat.)
v3.7 commit 0f3c42f522dc ("tmpfs: change final i_blocks BUG to WARNING")
explains one cause of such a warning (a race with shmem_writepage to
swap), and possible solutions; but we never took it further, and this
syzkaller incident turns out to have a different cause.
shmem_getpage_gfp()'s error recovery, when a freshly allocated page is
then found to be beyond eof, looks plausible - decrementing the alloced
count that was just before incremented - but in fact can go wrong, if a
racing thread (the truncator, for example) gets its shmem_recalc_inode()
in just after our delete_from_page_cache(). delete_from_page_cache()
decrements nrpages; that shmem_recalc_inode() then balances the books by
decrementing alloced itself; and our own decrement of alloced takes it one
too low, leading to the WARNING when the object is finally evicted.
Once the new page has been exposed in the page cache,
shmem_getpage_gfp() must leave it to shmem_recalc_inode() itself to get
the accounting right in all cases (and not fall through from "trunc:" to
"decused:"). Adjust that error recovery block; and the reinitialization
of info and sbinfo can be removed too.
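To make the arithmetic concrete, here is a toy single-threaded model of that
interleaving (a sketch only, not kernel code: plain longs stand in for
info->alloced, info->swapped, inode->i_mapping->nrpages and inode->i_blocks,
and recalc() stands in for shmem_recalc_inode()):

#include <stdio.h>

static long nrpages, swapped, alloced, i_blocks;

/* Roughly what shmem_recalc_inode() does: if alloced has drifted above
 * what the page cache and swap actually hold, balance the books. */
static void recalc(void)
{
	long freed = alloced - swapped - nrpages;
	if (freed > 0) {
		alloced -= freed;
		i_blocks -= freed;
	}
}

static void reset(void)
{
	nrpages = swapped = alloced = i_blocks = 0;
}

int main(void)
{
	/* Old recovery: page allocated and accounted, then found beyond eof;
	 * the truncator's recalc() sneaks in after delete_from_page_cache(),
	 * and the "trunc:" block decrements alloced once more. */
	reset();
	nrpages++; alloced++; i_blocks++;	/* page inserted and accounted */
	nrpages--;				/* delete_from_page_cache() */
	recalc();				/* racing shmem_recalc_inode() */
	alloced--; i_blocks--;			/* old "trunc:" decrement */
	printf("old: alloced=%ld i_blocks=%ld\n", alloced, i_blocks);

	/* New recovery: no manual decrement; recalc() alone settles it, and
	 * running it twice (truncator and ourselves) changes nothing. */
	reset();
	nrpages++; alloced++; i_blocks++;
	nrpages--;
	recalc();				/* racing truncator, perhaps */
	recalc();				/* our own recalc */
	printf("new: alloced=%ld i_blocks=%ld\n", alloced, i_blocks);
	return 0;
}

In this model the old sequence ends with alloced and i_blocks at -1 (the
nonzero i_blocks that later trips WARN_ON(inode->i_blocks)); the new sequence
ends balanced at 0, however many recalcs race in.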
While we're here, fix shmem_writepage() to avoid the original issue: it
will be safe against a racing shmem_recalc_inode(), if it merely
increments swapped before the shmem_delete_from_page_cache() which
decrements nrpages (but it must then do its own shmem_recalc_inode()
before that, while still in balance, instead of after). (Aside: why do
we shmem_recalc_inode() here in the swap path? Because its raison d'etre
is to cope with clean sparse shmem pages being reclaimed behind our
back: so here when swapping is a good place to look for that case.) But
I've not now managed to reproduce this bug, even without the patch.
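The same toy counters (again a sketch, not kernel code) show why that
ordering is enough: with the old order there is a window, between the delete
and the swapped increment, where alloced - swapped - nrpages is +1 and a
racing shmem_recalc_inode() would wrongly treat the in-transit page as freed;
with the new order that difference never goes positive, so a racing recalc
has nothing to do:

#include <stdio.h>

/* A positive alloced - swapped - nrpages is what a racing
 * shmem_recalc_inode() would treat as "freed". */
static long nrpages, swapped, alloced;

static long imbalance(void)
{
	return alloced - swapped - nrpages;
}

int main(void)
{
	/* Old shmem_writepage() order: the page leaves the page cache
	 * before swapped is bumped, opening a racy window. */
	nrpages = 1; swapped = 0; alloced = 1;	/* one accounted page */
	nrpages--;				/* shmem_delete_from_page_cache() */
	printf("old order, window: %ld\n", imbalance());	/* +1: racy */
	swapped++;				/* info->swapped++, too late */

	/* New order: swapped++ first (after our own recalc, done while
	 * still in balance), only then the delete. */
	nrpages = 1; swapped = 0; alloced = 1;
	swapped++;				/* info->swapped++ first */
	printf("new order, window: %ld\n", imbalance());	/* -1: safe */
	nrpages--;				/* delete after the increment */
	printf("new order, after:  %ld\n", imbalance());	/* 0 */
	return 0;
}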
I don't see why I didn't do that earlier: perhaps inhibited by the
preference to eliminate shmem_recalc_inode() altogether. Driven by this
incident, I do now have a patch to do so at last; but I still want to sit
on it for a bit, as there are a couple of questions yet to be resolved.
Signed-off-by: Hugh Dickins <hughd@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Diffstat (limited to 'mm/shmem.c')
-rw-r--r--   mm/shmem.c   34
1 file changed, 14 insertions, 20 deletions
diff --git a/mm/shmem.c b/mm/shmem.c
index 9187eee4128b..2afcdbbdb685 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -843,14 +843,14 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
 		list_add_tail(&info->swaplist, &shmem_swaplist);
 
 	if (add_to_swap_cache(page, swap, GFP_ATOMIC) == 0) {
-		swap_shmem_alloc(swap);
-		shmem_delete_from_page_cache(page, swp_to_radix_entry(swap));
-
 		spin_lock(&info->lock);
-		info->swapped++;
 		shmem_recalc_inode(inode);
+		info->swapped++;
 		spin_unlock(&info->lock);
 
+		swap_shmem_alloc(swap);
+		shmem_delete_from_page_cache(page, swp_to_radix_entry(swap));
+
 		mutex_unlock(&shmem_swaplist_mutex);
 		BUG_ON(page_mapped(page));
 		swap_writepage(page, wbc);
@@ -1078,7 +1078,7 @@ repeat:
 	if (sgp != SGP_WRITE && sgp != SGP_FALLOC &&
 	    ((loff_t)index << PAGE_CACHE_SHIFT) >= i_size_read(inode)) {
 		error = -EINVAL;
-		goto failed;
+		goto unlock;
 	}
 
 	if (page && sgp == SGP_WRITE)
@@ -1246,11 +1246,15 @@ clear:
 	/* Perhaps the file has been truncated since we checked */
 	if (sgp != SGP_WRITE && sgp != SGP_FALLOC &&
 	    ((loff_t)index << PAGE_CACHE_SHIFT) >= i_size_read(inode)) {
+		if (alloced) {
+			ClearPageDirty(page);
+			delete_from_page_cache(page);
+			spin_lock(&info->lock);
+			shmem_recalc_inode(inode);
+			spin_unlock(&info->lock);
+		}
 		error = -EINVAL;
-		if (alloced)
-			goto trunc;
-		else
-			goto failed;
+		goto unlock;
 	}
 	*pagep = page;
 	return 0;
@@ -1258,23 +1262,13 @@ clear:
 	/*
 	 * Error recovery.
 	 */
-trunc:
-	info = SHMEM_I(inode);
-	ClearPageDirty(page);
-	delete_from_page_cache(page);
-	spin_lock(&info->lock);
-	info->alloced--;
-	inode->i_blocks -= BLOCKS_PER_PAGE;
-	spin_unlock(&info->lock);
 decused:
-	sbinfo = SHMEM_SB(inode->i_sb);
 	if (sbinfo->max_blocks)
 		percpu_counter_add(&sbinfo->used_blocks, -1);
 unacct:
 	shmem_unacct_blocks(info->flags, 1);
 failed:
-	if (swap.val && error != -EINVAL &&
-	    !shmem_confirm_swap(mapping, index, swap))
+	if (swap.val && !shmem_confirm_swap(mapping, index, swap))
 		error = -EEXIST;
 unlock:
 	if (page) {