linux-toradex.git/fs/notify, branch v4.4.73

fanotify: fix list corruption in fanotify_get_response()

2016-09-30T08:18:37+00:00

commit 96d41019e3ac55f6f0115b0ce97e4f24a3d636d2 upstream.

fanotify_get_response() calls fsnotify_remove_event() when it finds that
group is being released from fanotify_release() (bypass_perm is set).

However the event it removes need not be only in the group's notification
queue but it can have already moved to access_list (userspace read the
event before closing the fanotify instance fd) which is protected by a
different lock.  Thus when fsnotify_remove_event() races with
fanotify_release() operating on access_list, the list can get corrupted.

Fix the problem by moving all the logic removing permission events from
the lists to one place - fanotify_release().

Fixes: 5838d4442bd5 ("fanotify: fix double free of pending permission events")
Link: http://lkml.kernel.org/r/1473797711-14111-3-git-send-email-jack@suse.cz
Signed-off-by: Jan Kara 
Reported-by: Miklos Szeredi 
Tested-by: Miklos Szeredi 
Reviewed-by: Miklos Szeredi 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

fsnotify: add a way to stop queueing events on group shutdown

2016-09-30T08:18:37+00:00

commit 12703dbfeb15402260e7554d32a34ac40c233990 upstream.

Implement a function that can be called when a group is being shutdown
to stop queueing new events to the group.  Fanotify will use this.

Fixes: 5838d4442bd5 ("fanotify: fix double free of pending permission events")
Link: http://lkml.kernel.org/r/1473797711-14111-2-git-send-email-jack@suse.cz
Signed-off-by: Jan Kara 
Reviewed-by: Miklos Szeredi 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

inotify: actually check for invalid bits in sys_inotify_add_watch()

2015-11-06T03:34:48+00:00

The comment here says that it is checking for invalid bits.  But, the mask
is *actually* checking to ensure that _any_ valid bit is set, which is
quite different.

Without this check, an unexpected bit could get set on an inotify object.
Since these bits are also interpreted by the fsnotify/dnotify code, there
is the potential for an object to be mishandled inside the kernel.  For
instance, can we be sure that setting the dnotify flag FS_DN_RENAME on an
inotify watch is harmless?

Add the actual check which was intended.  Retain the existing inotify bits
are being added to the watch.  Plus, this is existing behavior which would
be nice to preserve.

I did a quick sniff test that inotify functions and that my
'inotify-tools' package passes 'make check'.

Signed-off-by: Dave Hansen 
Cc: John McCutchan 
Cc: Robert Love 
Cc: Eric Paris 
Cc: Josh Boyer 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

inotify: hide internal kernel bits from fdinfo

2015-11-06T03:34:48+00:00

There was a report that my patch:

    inotify: actually check for invalid bits in sys_inotify_add_watch()

broke CRIU.

The reason is that CRIU looks up raw flags in /proc/$pid/fdinfo/* to
figure out how to rebuild inotify watches and then passes those flags
directly back in to the inotify API.  One of those flags
(FS_EVENT_ON_CHILD) is set in mark->mask, but is not part of the inotify
API.  It is used inside the kernel to _implement_ inotify but it is not
and has never been part of the API.

My patch above ensured that we only allow bits which are part of the API
(IN_ALL_EVENTS).  This broke CRIU.

FS_EVENT_ON_CHILD is really internal to the kernel.  It is set _anyway_ on
all inotify marks.  So, CRIU was really just trying to set a bit that was
already set.

This patch hides that bit from fdinfo.  CRIU will not see the bit, not try
to set it, and should work as before.  We should not have been exposing
this bit in the first place, so this is a good patch independent of the
CRIU problem.

Signed-off-by: Dave Hansen 
Reported-by: Andrey Wagin 
Acked-by: Andrey Vagin 
Acked-by: Cyrill Gorcunov 
Acked-by: Eric Paris 
Cc: Pavel Emelyanov 
Cc: John McCutchan 
Cc: Robert Love 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

2015-09-06T03:34:28+00:00

Pull vfs updates from Al Viro:
 "In this one:

   - d_move fixes (Eric Biederman)

   - UFS fixes (me; locking is mostly sane now, a bunch of bugs in error
     handling ought to be fixed)

   - switch of sb_writers to percpu rwsem (Oleg Nesterov)

   - superblock scalability (Josef Bacik and Dave Chinner)

   - swapon(2) race fix (Hugh Dickins)"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (65 commits)
  vfs: Test for and handle paths that are unreachable from their mnt_root
  dcache: Reduce the scope of i_lock in d_splice_alias
  dcache: Handle escaped paths in prepend_path
  mm: fix potential data race in SyS_swapon
  inode: don't softlockup when evicting inodes
  inode: rename i_wb_list to i_io_list
  sync: serialise per-superblock sync operations
  inode: convert inode_sb_list_lock to per-sb
  inode: add hlist_fake to avoid the inode hash lock in evict
  writeback: plug writeback at a high level
  change sb_writers to use percpu_rw_semaphore
  shift percpu_counter_destroy() into destroy_super_work()
  percpu-rwsem: kill CONFIG_PERCPU_RWSEM
  percpu-rwsem: introduce percpu_rwsem_release() and percpu_rwsem_acquire()
  percpu-rwsem: introduce percpu_down_read_trylock()
  document rwsem_release() in sb_wait_write()
  fix the broken lockdep logic in __sb_start_write()
  introduce __sb_writers_{acquired,release}() helpers
  ufs_inode_get{frag,block}(): get rid of 'phys' argument
  ufs_getfrag_block(): tidy up a bit
  ...

fsnotify: get rid of fsnotify_destroy_mark_locked()

2015-09-04T23:54:41+00:00

fsnotify_destroy_mark_locked() is subtle to use because it temporarily
releases group->mark_mutex.  To avoid future problems with this
function, split it into two.

fsnotify_detach_mark() is the part that needs group->mark_mutex and
fsnotify_free_mark() is the part that must be called outside of
group->mark_mutex.  This way it's much clearer what's going on and we
also avoid some pointless acquisitions of group->mark_mutex.

Signed-off-by: Jan Kara 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

fsnotify: remove mark->free_list

2015-09-04T23:54:41+00:00

Free list is used when all marks on given inode / mount should be
destroyed when inode / mount is going away.  However we can free all of
the marks without using a special list with some care.

Signed-off-by: Jan Kara 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

fsnotify: fix check in inotify fdinfo printing

2015-09-04T23:54:41+00:00

A check in inotify_fdinfo() checking whether mark is valid was always
true due to a bug.  Luckily we can never get to invalidated marks since
we hold mark_mutex and invalidated marks get removed from the group list
when they are invalidated under that mutex.

Anyway fix the check to make code more future proof.

Signed-off-by: Jan Kara 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

fs/notify: optimize inotify/fsnotify code for unwatched files

2015-09-04T23:54:41+00:00

I have a _tiny_ microbenchmark that sits in a loop and writes single
bytes to a file.  Writing one byte to a tmpfs file is around 2x slower
than reading one byte from a file, which is a _bit_ more than I expecte.
This is a dumb benchmark, but I think it's hard to deny that write() is
a hot path and we should avoid unnecessary overhead there.

I did a 'perf record' of 30-second samples of read and write.  The top
item in a diffprofile is srcu_read_lock() from fsnotify().  There are
active inotify fd's from systemd, but nothing is actually listening to
the file or its part of the filesystem.

I *think* we can avoid taking the srcu_read_lock() for the common case
where there are no actual marks on the file.  This means that there will
both be nothing to notify for *and* implies that there is no need for
clearing the ignore mask.

This patch gave a 13.1% speedup in writes/second on my test, which is an
improvement from the 10.8% that I saw with the last version.

Signed-off-by: Dave Hansen 
Reviewed-by: Jan Kara 
Cc: Al Viro 
Cc: Eric Paris 
Cc: John McCutchan 
Cc: Robert Love 
Cc: Andi Kleen 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

inode: convert inode_sb_list_lock to per-sb

2015-08-17T22:39:46+00:00

The process of reducing contention on per-superblock inode lists
starts with moving the locking to match the per-superblock inode
list. This takes the global lock out of the picture and reduces the
contention problems to within a single filesystem. This doesn't get
rid of contention as the locks still have global CPU scope, but it
does isolate operations on different superblocks form each other.

Signed-off-by: Dave Chinner 
Signed-off-by: Josef Bacik 
Reviewed-by: Jan Kara 
Reviewed-by: Christoph Hellwig 
Tested-by: Dave Chinner