summaryrefslogtreecommitdiff
path: root/fs/notify
AgeCommit message (Collapse)Author
2022-06-09fsnotify: fix wrong lockdep annotationsAmir Goldstein
[ Upstream commit 623af4f538b5df9b416e1b82f720af7371b4c771 ] Commit 6960b0d909cd ("fsnotify: change locking order") changed some of the mark_mutex locks in direct reclaim path to use: mutex_lock_nested(&group->mark_mutex, SINGLE_DEPTH_NESTING); This change is explained: "...It uses nested locking to avoid deadlock in case we do the final iput() on an inode which still holds marks and thus would take the mutex again when calling fsnotify_inode_delete() in destroy_inode()." The problem is that the mutex_lock_nested() is not a nested lock at all. In fact, it has the opposite effect of preventing lockdep from warning about a very possible deadlock. Due to these wrong annotations, a deadlock that was introduced with nfsd filecache in kernel v5.4 went unnoticed in v5.4.y for over two years until it was reported recently by Khazhismel Kumykov, only to find out that the deadlock was already fixed in kernel v5.5. Fix the wrong lockdep annotations. Cc: Khazhismel Kumykov <khazhy@google.com> Fixes: 6960b0d909cd ("fsnotify: change locking order") Link: https://lore.kernel.org/r/20220321112310.vpr7oxro2xkz5llh@quack3.lan/ Link: https://lore.kernel.org/r/20220422120327.3459282-4-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-09inotify: show inotify mask flags in proc fdinfoAmir Goldstein
[ Upstream commit a32e697cda27679a0327ae2cafdad8c7170f548f ] The inotify mask flags IN_ONESHOT and IN_EXCL_UNLINK are not "internal to kernel" and should be exposed in procfs fdinfo so CRIU can restore them. Fixes: 6933599697c9 ("inotify: hide internal kernel bits from fdinfo") Link: https://lore.kernel.org/r/20220422120327.3459282-2-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-02-05fanotify: Fix stale file descriptor in copy_event_to_user()Dan Carpenter
commit ee12595147ac1fbfb5bcb23837e26dd58d94b15d upstream. This code calls fd_install() which gives the userspace access to the fd. Then if copy_info_records_to_user() fails it calls put_unused_fd(fd) but that will not release it and leads to a stale entry in the file descriptor table. Generally you can't trust the fd after a call to fd_install(). The fix is to delay the fd_install() until everything else has succeeded. Fortunately it requires CAP_SYS_ADMIN to reach this code so the security impact is less. Fixes: f644bc449b37 ("fanotify: fix copy_event_to_user() fid error clean up") Link: https://lore.kernel.org/r/20220128195656.GA26981@kili Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Mathias Krause <minipli@grsecurity.net> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-09-18fanotify: limit number of event merge attemptsAmir Goldstein
commit b8cd0ee8cda68a888a317991c1e918a8cba1a568 upstream. Event merges are expensive when event queue size is large, so limit the linear search to 128 merge tests. [Stable backport notes] The following statement from upstream commit is irrelevant for backport: - -In combination with 128 size hash table, there is a potential to merge -with up to 16K events in the hashed queue. - [Stable backport notes] The problem is as old as fanotify and described in the linked cover letter "Performance improvement for fanotify merge". This backported patch fixes the performance issue at the cost of merging fewer potential events. Fixing the performance issue is more important than preserving the "event merge" behavior, which was not predictable in any way that applications could rely on. Link: https://lore.kernel.org/r/20210304104826.3993892-6-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/linux-fsdevel/20210202162010.305971-1-amir73il@gmail.com/ Link: https://lore.kernel.org/linux-fsdevel/20210915163334.GD6166@quack2.suse.cz/ Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-23fanotify: fix copy_event_to_user() fid error clean upMatthew Bobrowski
[ Upstream commit f644bc449b37cc32d3ce7b36a88073873aa21bd5 ] Ensure that clean up is performed on the allocated file descriptor and struct file object in the event that an error is encountered while copying fid info objects. Currently, we return directly to the caller when an error is experienced in the fid info copying helper, which isn't ideal given that the listener process could be left with a dangling file descriptor in their fdtable. Fixes: 5e469c830fdb ("fanotify: copy event fid info to user") Fixes: 44d705b0370b ("fanotify: report name info for FAN_DIR_MODIFY event") Link: https://lore.kernel.org/linux-fsdevel/YMKv1U7tNPK955ho@google.com/T/#m15361cd6399dad4396aad650de25dbf6b312288e Link: https://lore.kernel.org/r/1ef8ae9100101eb1a91763c516c2e9a3a3b112bd.1623376346.git.repnop@google.com Signed-off-by: Matthew Bobrowski <repnop@google.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-01-17fanotify: Fix sys_fanotify_mark() on native x86-32Brian Gerst
commit 2ca408d9c749c32288bc28725f9f12ba30299e8f upstream. Commit 121b32a58a3a ("x86/entry/32: Use IA32-specific wrappers for syscalls taking 64-bit arguments") converted native x86-32 which take 64-bit arguments to use the compat handlers to allow conversion to passing args via pt_regs. sys_fanotify_mark() was however missed, as it has a general compat handler. Add a config option that will use the syscall wrapper that takes the split args for native 32-bit. [ bp: Fix typo in Kconfig help text. ] Fixes: 121b32a58a3a ("x86/entry/32: Use IA32-specific wrappers for syscalls taking 64-bit arguments") Reported-by: Paweł Jasiak <pawel@jasiak.xyz> Signed-off-by: Brian Gerst <brgerst@gmail.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Jan Kara <jack@suse.cz> Acked-by: Andy Lutomirski <luto@kernel.org> Link: https://lkml.kernel.org/r/20201130223059.101286-1-brgerst@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-12-30fsnotify: fix events reported to watching parent and childAmir Goldstein
commit fecc4559780d52d174ea05e3bf543669165389c3 upstream. fsnotify_parent() used to send two separate events to backends when a parent inode is watching children and the child inode is also watching. In an attempt to avoid duplicate events in fanotify, we unified the two backend callbacks to a single callback and handled the reporting of the two separate events for the relevant backends (inotify and dnotify). However the handling is buggy and can result in inotify and dnotify listeners receiving events of the type they never asked for or spurious events. The problem is the unified event callback with two inode marks (parent and child) is called when any of the parent and child inodes are watched and interested in the event, but the parent inode's mark that is interested in the event on the child is not necessarily the one we are currently reporting to (it could belong to a different group). So before reporting the parent or child event flavor to backend we need to check that the mark is really interested in that event flavor. The semantics of INODE and CHILD marks were hard to follow and made the logic more complicated than it should have been. Replace it with INODE and PARENT marks semantics to hopefully make the logic more clear. Thanks to Hugh Dickins for spotting a bug in the earlier version of this patch. Fixes: 497b0c5a7c06 ("fsnotify: send event to parent and child with single callback") CC: stable@vger.kernel.org Link: https://lore.kernel.org/r/20201202120713.702387-4-amir73il@gmail.com Reported-by: Hugh Dickins <hughd@google.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-12-30inotify: convert to handle_inode_event() interfaceAmir Goldstein
commit 1a2620a99803ad660edc5d22fd9c66cce91ceb1c upstream. Convert inotify to use the simple handle_inode_event() interface to get rid of the code duplication between the generic helper fsnotify_handle_event() and the inotify_handle_event() callback, which also happen to be buggy code. The bug will be fixed in the generic helper. Link: https://lore.kernel.org/r/20201202120713.702387-3-amir73il@gmail.com CC: stable@vger.kernel.org Fixes: b9a1b9772509 ("fsnotify: create method handle_inode_event() in fsnotify_operations") Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-12-30fsnotify: generalize handle_inode_event()Amir Goldstein
commit 950cc0d2bef078e1f6459900ca4d4b2a2e0e3c37 upstream. The handle_inode_event() interface was added as (quoting comment): "a simple variant of handle_event() for groups that only have inode marks and don't have ignore mask". In other words, all backends except fanotify. The inotify backend also falls under this category, but because it required extra arguments it was left out of the initial pass of backends conversion to the simple interface. This results in code duplication between the generic helper fsnotify_handle_event() and the inotify_handle_event() callback which also happen to be buggy code. Generalize the handle_inode_event() arguments and add the check for FS_EXCL_UNLINK flag to the generic helper, so inotify backend could be converted to use the simple interface. Link: https://lore.kernel.org/r/20201202120713.702387-2-amir73il@gmail.com CC: stable@vger.kernel.org Fixes: b9a1b9772509 ("fsnotify: create method handle_inode_event() in fsnotify_operations") Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-11-09fanotify: fix logic of reporting name info with watched parentAmir Goldstein
The victim inode's parent and name info is required when an event needs to be delivered to a group interested in filename info OR when the inode's parent is interested in an event on its children. Let us call the first condition 'parent_needed' and the second condition 'parent_interested'. In fsnotify_parent(), the condition where the inode's parent is interested in some events on its children, but not necessarily interested the specific event is called 'parent_watched'. fsnotify_parent() tests the condition (!parent_watched && !parent_needed) for sending the event without parent and name info, which is correct. It then wrongly assumes that parent_watched implies !parent_needed and tests the condition (parent_watched && !parent_interested) for sending the event without parent and name info, which is wrong, because parent may still be needed by some group. For example, after initializing a group with FAN_REPORT_DFID_NAME and adding a FAN_MARK_MOUNT with FAN_OPEN mask, open events on non-directory children of "testdir" are delivered with file name info. After adding another mark to the same group on the parent "testdir" with FAN_CLOSE|FAN_EVENT_ON_CHILD mask, open events on non-directory children of "testdir" are no longer delivered with file name info. Fix the logic and use auxiliary variables to clarify the conditions. Fixes: 9b93f33105f5 ("fsnotify: send event with parent/name info to sb/mount/non-dir marks") Cc: stable@vger.kernel.org#v5.9 Link: https://lore.kernel.org/r/20201108105906.8493-1-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2020-10-18mm, memcg: rework remote charging API to support nestingRoman Gushchin
Currently the remote memcg charging API consists of two functions: memalloc_use_memcg() and memalloc_unuse_memcg(), which set and clear the memcg value, which overwrites the memcg of the current task. memalloc_use_memcg(target_memcg); <...> memalloc_unuse_memcg(); It works perfectly for allocations performed from a normal context, however an attempt to call it from an interrupt context or just nest two remote charging blocks will lead to an incorrect accounting. On exit from the inner block the active memcg will be cleared instead of being restored. memalloc_use_memcg(target_memcg); memalloc_use_memcg(target_memcg_2); <...> memalloc_unuse_memcg(); Error: allocation here are charged to the memcg of the current process instead of target_memcg. memalloc_unuse_memcg(); This patch extends the remote charging API by switching to a single function: struct mem_cgroup *set_active_memcg(struct mem_cgroup *memcg), which sets the new value and returns the old one. So a remote charging block will look like: old_memcg = set_active_memcg(target_memcg); <...> set_active_memcg(old_memcg); This patch is heavily based on the patch by Johannes Weiner, which can be found here: https://lkml.org/lkml/2020/5/28/806 . Signed-off-by: Roman Gushchin <guro@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Shakeel Butt <shakeelb@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Dan Schatzberg <dschatzberg@fb.com> Link: https://lkml.kernel.org/r/20200821212056.3769116-1-guro@fb.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-23treewide: Use fallthrough pseudo-keywordGustavo A. R. Silva
Replace the existing /* fall through */ comments and its variants with the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary fall-through markings when it is the case. [1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
2020-07-28fanotify: compare fsid when merging name eventJan Kara
When merging name events, fsids of the two involved events have to match. Otherwise we could merge events from two different filesystems and thus effectively loose the second event. Backporting note: Although the commit cacfb956d46e introducing this bug was merged for 5.7, the relevant code didn't get used in the end until 7e8283af6ede ("fanotify: report parent fid + name + child fid") which will be merged with this patch. So there's no need for backporting this. Fixes: cacfb956d46e ("fanotify: record name info for FAN_DIR_MODIFY event") Reported-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2020-07-27fsnotify: create method handle_inode_event() in fsnotify_operationsAmir Goldstein
The method handle_event() grew a lot of complexity due to the design of fanotify and merging of ignore masks. Most backends do not care about this complex functionality, so we can hide this complexity from them. Introduce a method handle_inode_event() that serves those backends and passes a single inode mark and less arguments. This change converts all backends except fanotify and inotify to use the simplified handle_inode_event() method. In pricipal, inotify could have also used the new method, but that would require passing more arguments on the simple helper (data, data_type, cookie), so we leave it with the handle_event() method. Link: https://lore.kernel.org/r/20200722125849.17418-9-amir73il@gmail.com Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2020-07-27fanotify: report parent fid + child fidAmir Goldstein
Add support for FAN_REPORT_FID | FAN_REPORT_DIR_FID. Internally, it is implemented as a private case of reporting both parent and child fids and name, the parent and child fids are recorded in a variable length fanotify_name_event, but there is no name. It should be noted that directory modification events are recorded in fixed size fanotify_fid_event when not reporting name, just like with group flags FAN_REPORT_FID. Link: https://lore.kernel.org/r/20200716084230.30611-23-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2020-07-27fanotify: report parent fid + name + child fidAmir Goldstein
For a group with fanotify_init() flag FAN_REPORT_DFID_NAME, the parent fid and name are reported for events on non-directory objects with an info record of type FAN_EVENT_INFO_TYPE_DFID_NAME. If the group also has the init flag FAN_REPORT_FID, the child fid is also reported with another info record that follows the first info record. The second info record is the same info record that would have been reported to a group with only FAN_REPORT_FID flag. When the child fid needs to be recorded, the variable size struct fanotify_name_event is preallocated with enough space to store the child fh between the dir fh and the name. Link: https://lore.kernel.org/r/20200716084230.30611-22-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2020-07-27fanotify: add support for FAN_REPORT_NAMEAmir Goldstein
Introduce a new fanotify_init() flag FAN_REPORT_NAME. It requires the flag FAN_REPORT_DIR_FID and there is a constant for setting both flags named FAN_REPORT_DFID_NAME. For a group with flag FAN_REPORT_NAME, the parent fid and name are reported for directory entry modification events (create/detete/move) and for events on non-directory objects. Events on directories themselves are reported with their own fid and "." as the name. The parent fid and name are reported with an info record of type FAN_EVENT_INFO_TYPE_DFID_NAME, similar to the way that parent fid is reported with into type FAN_EVENT_INFO_TYPE_DFID, but with an appended null terminated name string. Link: https://lore.kernel.org/r/20200716084230.30611-21-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2020-07-27fanotify: report events with parent dir fid to sb/mount/non-dir marksAmir Goldstein
In a group with flag FAN_REPORT_DIR_FID, when adding an inode mark with FAN_EVENT_ON_CHILD, events on non-directory children are reported with the fid of the parent. When adding a filesystem or mount mark or mark on a non-dir inode, we want to report events that are "possible on child" (e.g. open/close) also with fid of the parent, as if the victim inode's parent is interested in events "on child". Some events, currently only FAN_MOVE_SELF, should be reported to a sb/mount/non-dir mark with parent fid even though they are not reported to a watching parent. To get the desired behavior we set the flag FAN_EVENT_ON_CHILD on all the sb/mount/non-dir mark masks in a group with FAN_REPORT_DIR_FID. Link: https://lore.kernel.org/r/20200716084230.30611-20-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2020-07-27fanotify: add basic support for FAN_REPORT_DIR_FIDAmir Goldstein
For now, the flag is mutually exclusive with FAN_REPORT_FID. Events include a single info record of type FAN_EVENT_INFO_TYPE_DFID with a directory file handle. For now, events are only reported for: - Directory modification events - Events on children of a watching directory - Events on directory objects Soon, we will add support for reporting the parent directory fid for events on non-directories with filesystem/mount mark and support for reporting both parent directory fid and child fid. Link: https://lore.kernel.org/r/20200716084230.30611-19-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2020-07-27fsnotify: send event with parent/name info to sb/mount/non-dir marksAmir Goldstein
Similar to events "on child" to watching directory, send event with parent/name info if sb/mount/non-dir marks are interested in parent/name info. The FS_EVENT_ON_CHILD flag can be set on sb/mount/non-dir marks to specify interest in parent/name info for events on non-directory inodes. Events on "orphan" children (disconnected dentries) are sent without parent/name info. Events on directories are sent with parent/name info only if the parent directory is watching. After this change, even groups that do not subscribe to events on children could get an event with mark iterator type TYPE_CHILD and without mark iterator type TYPE_INODE if fanotify has marks on the same objects. dnotify and inotify event handlers can already cope with that situation. audit does not subscribe to events that are possible on child, so won't get to this situation. nfsd does not access the marks iterator from its event handler at the moment, so it is not affected. This is a bit too fragile, so we should prepare all groups to cope with mark type TYPE_CHILD preferably using a generic helper. Link: https://lore.kernel.org/r/20200716084230.30611-16-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2020-07-27inotify: do not set FS_EVENT_ON_CHILD in non-dir mark maskAmir Goldstein
FS_EVENT_ON_CHILD has currently no meaning for non-dir inode marks. In the following patches we want to use that bit to mean that mark's notification group cares about parent and name information. So stop setting FS_EVENT_ON_CHILD for non-dir marks. Link: https://lore.kernel.org/r/20200722125849.17418-3-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2020-07-27fsnotify: pass dir and inode arguments to fsnotify()Amir Goldstein
The arguments of fsnotify() are overloaded and mean different things for different event types. Replace the to_tell argument with separate arguments @dir and @inode, because we may be sending to both dir and child. Using the @data argument to pass the child is not enough, because dirent events pass this argument (for audit), but we do not report to child. Document the new fsnotify() function argumenets. Link: https://lore.kernel.org/r/20200722125849.17418-7-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2020-07-27fsnotify: create helper fsnotify_inode()Amir Goldstein
Simple helper to consolidate biolerplate code. Link: https://lore.kernel.org/r/20200722125849.17418-5-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2020-07-27fsnotify: send event to parent and child with single callbackAmir Goldstein
Instead of calling fsnotify() twice, once with parent inode and once with child inode, if event should be sent to parent inode, send it with both parent and child inodes marks in object type iterator and call the backend handle_event() callback only once. The parent inode is assigned to the standard "inode" iterator type and the child inode is assigned to the special "child" iterator type. In that case, the bit FS_EVENT_ON_CHILD will be set in the event mask, the dir argument to handle_event will be the parent inode, the file_name argument to handle_event is non NULL and refers to the name of the child and the child inode can be accessed with fsnotify_data_inode(). This will allow fanotify to make decisions based on child or parent's ignored mask. For example, when a parent is interested in a specific event on its children, but a specific child wishes to ignore this event, the event will not be reported. This is not what happens with current code, but according to man page, it is the expected behavior. Link: https://lore.kernel.org/r/20200716084230.30611-15-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2020-07-27inotify: report both events on parent and child with single callbackAmir Goldstein
fsnotify usually calls inotify_handle_event() once for watching parent to report event with child's name and once for watching child to report event without child's name. Do the same thing with a single callback instead of two callbacks when marks iterator contains both inode and child entries. Link: https://lore.kernel.org/r/20200716084230.30611-13-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2020-07-27dnotify: report both events on parent and child with single callbackAmir Goldstein
For some events (e.g. DN_ATTRIB on sub-directory) fsnotify may call dnotify_handle_event() once for watching parent and once again for the watching sub-directory. Do the same thing with a single callback instead of two callbacks when marks iterator contains both inode and child entries. Link: https://lore.kernel.org/r/20200716084230.30611-12-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2020-07-27fanotify: no external fh buffer in fanotify_name_eventAmir Goldstein
The fanotify_fh struct has an inline buffer of size 12 which is enough to store the most common local filesystem file handles (e.g. ext4, xfs). For file handles that do not fit in the inline buffer (e.g. btrfs), an external buffer is allocated to store the file handle. When allocating a variable size fanotify_name_event, there is no point in allocating also an external fh buffer when file handle does not fit in the inline buffer. Check required size for encoding fh, preallocate an event buffer sufficient to contain both file handle and name and store the name after the file handle. At this time, when not reporting name in event, we still allocate the fixed size fanotify_fid_event and an external buffer for large file handles, but fanotify_alloc_name_event() has already been prepared to accept a NULL file_name. Link: https://lore.kernel.org/r/20200716084230.30611-11-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2020-07-27fanotify: use struct fanotify_info to parcel the variable size bufferAmir Goldstein
An fanotify event name is always recorded relative to a dir fh. Encapsulate the name_len member of fanotify_name_event in a new struct fanotify_info, which describes the parceling of the variable size buffer of an fanotify_name_event. The dir_fh member of fanotify_name_event is renamed to _dir_fh and is not accessed directly, but via the fanotify_info_dir_fh() accessor. Although the dir_fh len information is already available in struct fanotify_fh, we store it also in dif_fh_totlen member of fanotify_info, including the size of fanotify_fh header, so we know the offset of the name in the buffer without looking inside the dir_fh. We also add a file_fh_totlen member to allow packing another file handle in the variable size buffer after the dir_fh and before the name. We are going to use that space to store the child fid. Link: https://lore.kernel.org/r/20200716084230.30611-10-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2020-07-27fanotify: use FAN_EVENT_ON_CHILD as implicit flag on sb/mount/non-dir marksAmir Goldstein
Up to now, fanotify allowed to set the FAN_EVENT_ON_CHILD flag on sb/mount marks and non-directory inode mask, but the flag was ignored. Mask out the flag if it is provided by user on sb/mount/non-dir marks and define it as an implicit flag that cannot be removed by user. This flag is going to be used internally to request for events with parent and name info. Link: https://lore.kernel.org/r/20200716084230.30611-8-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2020-07-27fanotify: prepare for implicit event flags in mark maskAmir Goldstein
So far, all flags that can be set in an fanotify mark mask can be set explicitly by a call to fanotify_mark(2). Prepare for defining implicit event flags that cannot be set by user with fanotify_mark(2), similar to how inotify/dnotify implicitly set the FS_EVENT_ON_CHILD flag. Implicit event flags cannot be removed by user and mark gets destroyed when only implicit event flags remain in the mask. Link: https://lore.kernel.org/r/20200716084230.30611-7-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2020-07-27fanotify: mask out special event flags from ignored maskAmir Goldstein
The special event flags (FAN_ONDIR, FAN_EVENT_ON_CHILD) never had any meaning in ignored mask. Mask them out explicitly. Link: https://lore.kernel.org/r/20200716084230.30611-6-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2020-07-27fanotify: generalize test for FAN_REPORT_FIDAmir Goldstein
As preparation for new flags that report fids, define a bit set of flags for a group reporting fids, currently containing the only bit FAN_REPORT_FID. Link: https://lore.kernel.org/r/20200716084230.30611-5-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2020-07-27fanotify: distinguish between fid encode error and null fidAmir Goldstein
In fanotify_encode_fh(), both cases of NULL inode and failure to encode ended up with fh type FILEID_INVALID. Distiguish the case of NULL inode, by setting fh type to FILEID_ROOT. This is just a semantic difference at this point. Remove stale comment and unneeded check from fid event compare helpers. Link: https://lore.kernel.org/r/20200716084230.30611-4-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2020-07-27fanotify: generalize merge logic of events on dirAmir Goldstein
An event on directory should never be merged with an event on non-directory regardless of the event struct type. This change has no visible effect, because currently, with struct fanotify_path_event, the relevant events will not be merged because event path of dir will be different than event path of non-dir. Link: https://lore.kernel.org/r/20200716084230.30611-3-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2020-07-27fanotify: generalize the handling of extra event flagsAmir Goldstein
In fanotify_group_event_mask() there is logic in place to make sure we are not going to handle an event with no type and just FAN_ONDIR flag. Generalize this logic to any FANOTIFY_EVENT_FLAGS. There is only one more flag in this group at the moment - FAN_EVENT_ON_CHILD. We never report it to user, but we do pass it in to fanotify_alloc_event() when group is reporting fid as indication that event happened on child. We will have use for this indication later on. Link: https://lore.kernel.org/r/20200716084230.30611-2-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2020-07-27fanotify: remove event FAN_DIR_MODIFYAmir Goldstein
It was never enabled in uapi and its functionality is about to be superseded by events FAN_CREATE, FAN_DELETE, FAN_MOVE with group flag FAN_REPORT_NAME. Keep a place holder variable name_event instead of removing the name recording code since it will be used by the new events. Link: https://lore.kernel.org/r/20200708111156.24659-17-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2020-07-27fsnotify: pass dir argument to handle_event() callbackAmir Goldstein
The 'inode' argument to handle_event(), sometimes referred to as 'to_tell' is somewhat obsolete. It is a remnant from the times when a group could only have an inode mark associated with an event. We now pass an iter_info array to the callback, with all marks associated with an event. Most backends ignore this argument, with two exceptions: 1. dnotify uses it for sanity check that event is on directory 2. fanotify uses it to report fid of directory on directory entry modification events Remove the 'inode' argument and add a 'dir' argument. The callback function signature is deliberately changed, because the meaning of the argument has changed and the arguments have been documented. The 'dir' argument is set to when 'file_name' is specified and it is referring to the directory that the 'file_name' entry belongs to. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2020-07-15fanotify: break up fanotify_alloc_event()Amir Goldstein
Break up fanotify_alloc_event() into helpers by event struct type. Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2020-07-15fanotify: create overflow event typeAmir Goldstein
The special overflow event is allocated as struct fanotify_path_event, but with a null path. Use a special event type to identify the overflow event, so the helper fanotify_has_event_path() will always indicate a non null path. Allocating the overflow event doesn't need any of the fancy stuff in fanotify_alloc_event(), so create a simplified helper for allocating the overflow event. There is also no need to store and report the pid with an overflow event. Link: https://lore.kernel.org/r/20200708111156.24659-7-amir73il@gmail.com Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2020-07-15inotify: do not use objectid when comparing eventsAmir Goldstein
inotify's event->wd is the object identifier. Compare that instead of the common fsnotidy event objectid, so we can get rid of the objectid field later. Link: https://lore.kernel.org/r/20200708111156.24659-6-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2020-07-15fsnotify: return non const from fsnotify_data_inode()Amir Goldstein
Return non const inode pointer from fsnotify_data_inode(). None of the fsnotify hooks pass const inode pointer as data and callers often need to cast to a non const pointer. Link: https://lore.kernel.org/r/20200708111156.24659-3-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2020-07-15fsnotify: fold fsnotify() call into fsnotify_parent()Amir Goldstein
All (two) callers of fsnotify_parent() also call fsnotify() to notify the child inode. Move the second fsnotify() call into fsnotify_parent(). This will allow more flexibility in making decisions about which of the two event falvors should be sent. Using 'goto notify_child' in the inline helper seems a bit strange, but it mimics the code in __fsnotify_parent() for clarity and the goto pattern will become less strage after following patches are applied. Link: https://lore.kernel.org/r/20200708111156.24659-2-amir73il@gmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2020-07-15fsnotify: Rearrange fast path to minimise overhead when there is no watcherMel Gorman
The fsnotify paths are trivial to hit even when there are no watchers and they are surprisingly expensive. For example, every successful vfs_write() hits fsnotify_modify which calls both fsnotify_parent and fsnotify unless FMODE_NONOTIFY is set which is an internal flag invisible to userspace. As it stands, fsnotify_parent is a guaranteed functional call even if there are no watchers and fsnotify() does a substantial amount of unnecessary work before it checks if there are any watchers. A perf profile showed that applying mnt->mnt_fsnotify_mask in fnotify() was almost half of the total samples taken in that function during a test. This patch rearranges the fast paths to reduce the amount of work done when there are no watchers. The test motivating this was "perf bench sched messaging --pipe". Despite the fact the pipes are anonymous, fsnotify is still called a lot and the overhead is noticeable even though it's completely pointless. It's likely the overhead is negligible for real IO so this is an extreme example. This is a comparison of hackbench using processes and pipes on a 1-socket machine with 8 CPU threads without fanotify watchers. 5.7.0 5.7.0 vanilla fastfsnotify-v1r1 Amean 1 0.4837 ( 0.00%) 0.4630 * 4.27%* Amean 3 1.5447 ( 0.00%) 1.4557 ( 5.76%) Amean 5 2.6037 ( 0.00%) 2.4363 ( 6.43%) Amean 7 3.5987 ( 0.00%) 3.4757 ( 3.42%) Amean 12 5.8267 ( 0.00%) 5.6983 ( 2.20%) Amean 18 8.4400 ( 0.00%) 8.1327 ( 3.64%) Amean 24 11.0187 ( 0.00%) 10.0290 * 8.98%* Amean 30 13.1013 ( 0.00%) 12.8510 ( 1.91%) Amean 32 13.9190 ( 0.00%) 13.2410 ( 4.87%) 5.7.0 5.7.0 vanilla fastfsnotify-v1r1 Duration User 157.05 152.79 Duration System 1279.98 1219.32 Duration Elapsed 182.81 174.52 This is showing that the latencies are improved by roughly 2-9%. The variability is not shown but some of these results are within the noise as this workload heavily overloads the machine. That said, the system CPU usage is reduced by quite a bit so it makes sense to avoid the overhead even if it is a bit tricky to detect at times. A perf profile of just 1 group of tasks showed that 5.14% of samples taken were in either fsnotify() or fsnotify_parent(). With the patch, 2.8% of samples were in fsnotify, mostly function entry and the initial check for watchers. The check for watchers is complicated enough that inlining it may be controversial. [Amir] Slightly simplify with mnt_or_sb_mask => marks_mask Link: https://lore.kernel.org/r/20200708111156.24659-1-amir73il@gmail.com Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2020-07-15fanotify: Avoid softlockups when reading many eventsJan Kara
When user provides large buffer for events and there are lots of events available, we can try to copy them all to userspace without scheduling which can softlockup the kernel (furthermore exacerbated by the contention on notification_lock). Add a scheduling point after copying each event. Note that usually the real underlying problem is the cost of fanotify event merging and the resulting contention on notification_lock but this is a cheap way to somewhat reduce the problem until we can properly address that. Reported-by: Francesco Ruggeri <fruggeri@arista.com> Link: https://lore.kernel.org/lkml/20200714025417.A25EB95C0339@us180.sjc.aristanetworks.com Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2020-06-14treewide: replace '---help---' in Kconfig files with 'help'Masahiro Yamada
Since commit 84af7a6194e4 ("checkpatch: kconfig: prefer 'help' over '---help---'"), the number of '---help---' has been gradually decreasing, but there are still more than 2400 instances. This commit finishes the conversion. While I touched the lines, I also fixed the indentation. There are a variety of indentation styles found. a) 4 spaces + '---help---' b) 7 spaces + '---help---' c) 8 spaces + '---help---' d) 1 space + 1 tab + '---help---' e) 1 tab + '---help---' (correct indentation) f) 1 tab + 1 space + '---help---' g) 1 tab + 2 spaces + '---help---' In order to convert all of them to 1 tab + 'help', I ran the following commend: $ find . -name 'Kconfig*' | xargs sed -i 's/^[[:space:]]*---help---/\thelp/' Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2020-06-04Merge tag 'fsnotify_for_v5.8-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull fsnotify updates from Jan Kara: "Several smaller fixes and cleanups for fsnotify subsystem" * tag 'fsnotify_for_v5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: fanotify: fix ignore mask logic for events on child and on dir fanotify: don't write with size under sizeof(response) fsnotify: Remove proc_fs.h include fanotify: remove reference to fill_event_metadata() fsnotify: add mutex destroy fanotify: prefix should_merge() fanotify: Replace zero-length array with flexible-array inotify: Fix error return code assignment flow. fsnotify: Add missing annotation for fsnotify_finish_user_wait() and for fsnotify_prepare_user_wait()
2020-06-01Merge tag 'docs-5.8' of git://git.lwn.net/linuxLinus Torvalds
Pull documentation updates from Jonathan Corbet: "A fair amount of stuff this time around, dominated by yet another massive set from Mauro toward the completion of the RST conversion. I *really* hope we are getting close to the end of this. Meanwhile, those patches reach pretty far afield to update document references around the tree; there should be no actual code changes there. There will be, alas, more of the usual trivial merge conflicts. Beyond that we have more translations, improvements to the sphinx scripting, a number of additions to the sysctl documentation, and lots of fixes" * tag 'docs-5.8' of git://git.lwn.net/linux: (130 commits) Documentation: fixes to the maintainer-entry-profile template zswap: docs/vm: Fix typo accept_threshold_percent in zswap.rst tracing: Fix events.rst section numbering docs: acpi: fix old http link and improve document format docs: filesystems: add info about efivars content Documentation: LSM: Correct the basic LSM description mailmap: change email for Ricardo Ribalda docs: sysctl/kernel: document unaligned controls Documentation: admin-guide: update bug-hunting.rst docs: sysctl/kernel: document ngroups_max nvdimm: fixes to maintainter-entry-profile Documentation/features: Correct RISC-V kprobes support entry Documentation/features: Refresh the arch support status files Revert "docs: sysctl/kernel: document ngroups_max" docs: move locking-specific documents to locking/ docs: move digsig docs to the security book docs: move the kref doc into the core-api book docs: add IRQ documentation at the core-api book docs: debugging-via-ohci1394.txt: add it to the core-api book docs: fix references for ipmi.rst file ...
2020-05-27fanotify: turn off support for FAN_DIR_MODIFYAmir Goldstein
FAN_DIR_MODIFY has been enabled by commit 44d705b0370b ("fanotify: report name info for FAN_DIR_MODIFY event") in 5.7-rc1. Now we are planning further extensions to the fanotify API and during that we realized that FAN_DIR_MODIFY may behave slightly differently to be more consistent with extensions we plan. So until we finalize these extensions, let's not bind our hands with exposing FAN_DIR_MODIFY to userland. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2020-05-25fanotify: fix ignore mask logic for events on child and on dirAmir Goldstein
The comments in fanotify_group_event_mask() say: "If the event is on dir/child and this mark doesn't care about events on dir/child, don't send it!" Specifically, mount and filesystem marks do not care about events on child, but they can still specify an ignore mask for those events. For example, a group that has: - A mount mark with mask 0 and ignore_mask FAN_OPEN - An inode mark on a directory with mask FAN_OPEN | FAN_OPEN_EXEC with flag FAN_EVENT_ON_CHILD A child file open for exec would be reported to group with the FAN_OPEN event despite the fact that FAN_OPEN is in ignore mask of mount mark, because the mark iteration loop skips over non-inode marks for events on child when calculating the ignore mask. Move ignore mask calculation to the top of the iteration loop block before excluding marks for events on dir/child. Link: https://lore.kernel.org/r/20200524072441.18258-1-amir73il@gmail.com Reported-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/linux-fsdevel/20200521162443.GA26052@quack2.suse.cz/ Fixes: 55bf882c7f13 "fanotify: fix merging marks masks with FAN_ONDIR" Fixes: b469e7e47c8a "fanotify: fix handling of events on child..." Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2020-05-13fanotify: don't write with size under sizeof(response)Fabian Frederick
fanotify_write() only aligned copy_from_user size to sizeof(response) for higher values. This patch avoids all values below as suggested by Amir Goldstein and set to response size unconditionally. Link: https://lore.kernel.org/r/20200512181921.405973-1-fabf@skynet.be Signed-off-by: Fabian Frederick <fabf@skynet.be> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>