linux-toradex.git/fs/nfs, branch master

Merge tag 'nfsd-7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

2026-06-18T16:14:15+00:00

Pull nfsd updates from Chuck Lever:
 "Jeff Layton wired up netlink upcalls for the auth.unix.ip and
  auth.unix.gid caches in SunRPC and the svc_export and nfsd.fh caches
  in NFSD. The new kernel-user API is more extensible and lays the
  groundwork for retiring the old pipe interface.

  The default NFS r/w block size rises to 4MB on hosts with at least
  16GB of RAM, reducing per-RPC overhead on fast networks. Smaller
  machines keep their previously computed default, and the value remains
  tunable through /proc/fs/nfsd/max_block_size.

  Chuck Lever converted the server's RPCSEC GSS Kerberos code to the
  kernel's shared crypto/krb5 library. The conversion retires and
  removes SunRPC's bespoke implementation of Kerberos v5, but keeps
  RPCSEC GSS-API.

  Continuing the xdrgen migration that converted the NLMv4 server XDR
  layer in v7.1, Chuck Lever converted the NLM version 3 server-side XDR
  layer from hand-written C to xdrgen-generated code. As with the NLMv4
  conversion in v7.1, the goals are improved memory safety, lower
  maintenance burden, and groundwork for generation of Rust code for
  this layer instead of C.

  Chuck Lever fixed an issue where lingering NFSv4 state pins a mounted
  file system after it is unexported. A new netlink-based mechanism can
  now release NLM locks and NFSv4 state by client address, by
  filesystem, and by export. Now an administrator can quiesce an export
  cleanly before unmounting it.

  The remaining patches are bug fixes, clean-ups, and minor
  optimizations, including a batch of memory-leak and use-after-free
  fixes in the ACL, lockd, and TLS handshake paths, many of them
  reported by Chris Mason. Sincere thanks to all contributors,
  reviewers, testers, and bug reporters who participated in the v7.2
  NFSD development cycle"

* tag 'nfsd-7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (106 commits)
  svcrdma: wake sq waiters when the transport closes
  nfsd: reset write verifier on deferred writeback errors
  nfsd: avoid leaking pre-allocated openowner on unconfirmed retry race
  sunrpc: wait for in-flight TLS handshake callback when cancel loses race
  sunrpc: pin svc_xprt across the asynchronous TLS handshake callback
  nfsd: fix posix_acl leak on SETACL decode failure
  nfsd: fix posix_acl leak and ignored error in nfsd4_create_file
  nfsd: check get_user() return when reading princhashlen
  nfsd: fix inverted cp_ttl check in async copy reaper
  nfsd: fix dead ACL conflict guard in nfsd4_create
  NFSD: Fix SECINFO_NO_NAME decode error cleanup
  sunrpc: harden rq_procinfo lifecycle to prevent double-free
  SUNRPC: Return an error from xdr_buf_to_bvec() on overflow
  SUNRPC: Bound-check xdr_buf_to_bvec() stores before writing
  nfsd: release layout stid on setlease failure
  lockd: Avoid hashing uninitialized bytes in nlm4svc_lookup_file()
  lockd: Plug nlm_file refcount leak on cached nlm_do_fopen() failure
  lockd: Plug nlm_file leak when nlm_do_fopen() fails
  Revert "NFSD: Defer sub-object cleanup in export put callbacks"
  Revert "svcrdma: Use contiguous pages for RDMA Read sink buffers"
  ...

Merge tag 'lsm-pr-20260615' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm

2026-06-17T11:34:16+00:00

Pull lsm update from Paul Moore:
 "A single LSM update the security_inode_listsecurity() hook to be able
  to leverage the xattr_list_one() helper function.

  We wanted to do this for a while, but we needed to fixup the callers
  in the NFS code first. With the NFS code changes shipping in Linux
  v7.0 and no one complaining, it seemed a good time to complete the
  shift"

* tag 'lsm-pr-20260615' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm:
  security,fs,nfs,net: update security_inode_listsecurity() interface

Merge tag 'pull-dcache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

2026-06-14T22:45:31+00:00

Pull dcache updates from Al Viro:

 - d_alloc_parallel() API change (Neil's with my changes)

 - NORCU fixes

 - Reorganization and simplification of dentry eviction logic

 - Simplifying rcu_read_lock() scopes in fs/dcache.c

 - Secondary roots work - getting rid of NFS fake root dentries and
   dealing with remaining shrink_dcache_for_umount() and
   shrink_dentry_list() races

 - making cursors NORCU (surprisingly easy)

* tag 'pull-dcache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (22 commits)
  make cursors NORCU
  nfs: get rid of fake root dentries
  wind ->s_roots via ->d_sib instead of ->d_hash
  shrink_dentry_tree(): unify the calls of shrink_dentry_list()
  shrinking rcu_read_lock() scope in d_alloc_parallel()
  d_walk(): shrink rcu_read_lock() scope
  document dentry_kill()
  adjust calling conventions of lock_for_kill(), fold __dentry_kill() into dentry_kill()
  Document rcu_read_lock() use in select_collect2()
  Shift rcu_read_{,un}lock() inside fast_dput()
  simplify safety for lock_for_kill() slowpath
  fold lock_for_kill() and __dentry_kill() into common helper
  fold lock_for_kill() into shrink_kill()
  shrink_dentry_list(): start with removing from shrink list
  d_prune_aliases(): make sure to skip NORCU aliases
  kill d_dispose_if_unused()
  make to_shrink_list() return whether it has moved dentry to list
  select_collect(): ignore dentries on shrink lists if they have positive refcounts
  find_acceptable_alias(): skip NORCU aliases with zero refcount
  fix a race between d_find_any_alias() and final dput() of NORCU dentries
  ...

Merge tag 'vfs-7.2-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

2026-06-14T22:29:45+00:00

Pull misc vfs updates from Christian Brauner:
 "Features:

   - Reduce pipe->mutex contention by pre-allocating pages outside the
     lock in anon_pipe_write().

     anon_pipe_write() called alloc_page() once per page while holding
     pipe->mutex. The allocation can sleep doing direct reclaim and runs
     memcg charging, which extends the critical section and stalls any
     concurrent reader on the same mutex. Now up to 8 pages are
     pre-allocated before the mutex is taken, leftovers are recycled
     into the per-pipe tmp_page[] cache before unlock, and any remainder
     is released after unlock, keeping the allocator out of the critical
     section on both sides. On a writers x readers sweep with 64KB
     writes against a 1 MB pipe throughput improves 6-28% and average
     write latency drops 5-22%; under memory pressure - when the cost of
     holding the mutex across reclaim is highest - throughput improves
     21-48% and latency drops 17-33%. The microbenchmark is added to
     selftests.

   - uaccess/sockptr: fix the ignored_trailing logic in
     copy_struct_to_user() to behave as documented and the usize check
     in copy_struct_from_sockptr() for user pointers, and add
     copy_struct_{from,to}_bounce_buffer() and copy_struct_to_sockptr()
     helpers for upcoming users (IPPROTO_SMBDIRECT, IPPROTO_QUIC).

   - bpf: add a sleepable bpf_real_inode() kfunc that resolves the real
     inode backing a dentry via d_real_inode(). On overlayfs the inode
     attached to the dentry doesn't carry the underlying device
     information; this is used by the filesystem restriction BPF program
     that was merged into systemd.

   - docs: add guidelines for submitting new filesystems, motivated by
     the maintenance burden abandoned and untestable filesystems impose
     on VFS developers, blocking infrastructure work like folio
     conversions and iomap migration.

  Fixes:

   - libfs: set SB_I_NOEXEC and SB_I_NODEV by default in init_pseudo()
     and drop the now-redundant assignments in callers. This began as a
     one-line dma-buf fix for a path_noexec() warning; a pseudo
     filesystem has no reason not to set SB_I_NOEXEC. All init_pseudo()
     callers were audited: the only visible effect is on dma-buf where
     SB_I_NOEXEC silences the warning.

   - Handle set_blocksize() failures in legacy filesystems (bfs, hpfs,
     qnx4, jfs, befs, affs, isofs, minix, ntfs3, omfs). Mounting a
     device with a sector size > PAGE_SIZE crashed roughly half of them;
     the rest had the same missing error handling pattern. Plus a
     follow-up releasing the superblock buffer_head when setting the
     minix v3 block size fails.

   - mount: honour SB_NOUSER in the new mount API.

   - fs/fcntl: fix a SOFTIRQ-unsafe lock order in fasync signaling by
     switching the process-group paths of send_sigio() and send_sigurg()
     from read_lock(&tasklist_lock) to RCU, matching the single-PID
     path.

   - vfs: add an FS_USERNS_DELEGATABLE flag and set it for NFS, fixing
     delegated NFS mounts (fsopen() in a container with the mount
     performed by a privileged daemon) that broke when non-init
     s_user_ns was tied to FS_USERNS_MOUNT.

   - selftests/namespaces: fix a hang in nsid_test where an unreaped
     grandchild kept the TAP pipe write-end open, a waitpid(-1) race in
     listns_efault_test, and a false FAIL on kernels without listns()
     where the tests should SKIP.

   - filelock: fix the break_lease() stub signature for
     CONFIG_FILE_LOCKING=n.

   - init/initramfs_test: wait for the async initramfs unpacking before
     running; the test and do_populate_rootfs() share the parser state.

   - fs/coredump: reduce redundant log noise in
     validate_coredump_safety().

   - iomap: pass the correct length to fserror_report_io() in
     __iomap_write_begin().

   - backing-file: fix the backing_file_open() kerneldoc.

  Cleanups:

   - initramfs: refactor the cpio hex header parsing to use hex2bin()
     instead of the hand-rolled simple_strntoul() which is reverted, and
     extend the initramfs KUnit tests to cover header fields with 0x
     prefixes.

   - Replace __get_free_pages() and friends with kmalloc()/kzalloc()
     across quota, proc, ocfs2/dlm, nilfs2, nfs, nfsd, libfs, jfs, jbd2,
     isofs, fuse, select, namespace, configfs, binfmt_misc, bfs, and the
     do_mounts init code - part of the larger work of replacing page
     allocator calls with kmalloc().

   - Use clear_and_wake_up_bit() in unlock_buffer() and
     journal_end_buffer_io_sync() instead of open-coding the sequence.

   - Drop unused VFS exports: unexport drop_super_exclusive(), remove
     start_removing_user_path_at(), and fold __start_removing_path()
     into start_removing_path().

   - fs/read_write: narrow the __kernel_write() export with
     EXPORT_SYMBOL_FOR_MODULES().

   - vfs: uapi: retire octal and hex constants in favor of (1 << n) for
     the O_ flags. Finding a free bit for a new flag across the
     architectures was needlessly hard with the mixed bases.

   - dcache: add extra sanity checks of dead dentries in dentry_free()
     via a new DENTRY_WARN_ONCE() that also prints d_flags.

   - iov_iter: use kmemdup_array() in dup_iter() to harden the
     allocation against multiplication overflow.

   - fs/pipe: write to ->poll_usage only once.

   - vfs: remove an always-taken if-branch in find_next_fd().

   - dcache: use kmalloc_flex() for struct external_name in __d_alloc().

   - namei: use QSTR() instead of QSTR_INIT() in path_pts().

   - sync_file_range: delete dead S_ISLNK code.

   - Comment fixes: retire a stale comment in fget_task_next() and fix
     assorted spelling mistakes"

* tag 'vfs-7.2-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (73 commits)
  backing-file: fix backing_file_open() kerneldoc parameter
  iomap: pass the correct len to fserror_report_io in __iomap_write_begin
  vfs: add FS_USERNS_DELEGATABLE flag and set it for NFS
  filelock: fix break_lease() stub signature for CONFIG_FILE_LOCKING=n
  vfs: uapi: retire octal and hex numbers in favor of (1 << n) for O_ flags
  bpf: add bpf_real_inode() kfunc
  fs/read_write: Do not export __kernel_write() to the entire world
  libfs: drop redundant SB_I_NOEXEC/SB_I_NODEV in init_pseudo() callers
  libfs: set SB_I_NOEXEC and SB_I_NODEV by default in init_pseudo()
  mount: honour SB_NOUSER in the new mount API
  fs/fcntl: fix SOFTIRQ-unsafe lock order in fasync signaling
  selftests/pipe: add pipe_bench microbenchmark
  fs/pipe: pre-allocate pages outside pipe->mutex in anon_pipe_write
  fs: retire stale comment in fget_task_next()
  fs: fix spelling mistakes in comment
  bfs: replace get_zeroed_page() with kzalloc()
  binfmt_misc: replace __get_free_page() with kmalloc()
  configfs: replace __get_free_pages() with kzalloc()
  fs/namespace: use __getname() to allocate mntpath buffer
  fs/select: replace __get_free_page() with kmalloc()
  ...

Merge tag 'vfs-7.2-rc1.openat2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

2026-06-14T21:41:05+00:00

Pull openat2 updates from Christian Brauner:
 "Features:

   - Add O_EMPTYPATH to openat(2)/openat2(2). To get an operable file
     descriptor from an O_PATH file descriptor it is possible to use
     openat(fd, ".", O_DIRECTORY) for directories, but other file types
     require going through open("/proc//fd/") and thus depend
     on a functioning procfs.

     With O_EMPTYPATH an empty path string is accepted and LOOKUP_EMPTY
     is set at path resolution time, allowing to reopen the file behind
     the file descriptor directly. Selftests are included.

   - Add an OPENAT2_REGULAR flag for openat2(2) which refuses to open
     anything but regular files with the new EFTYPE error code.

     This implements the "ability to only open regular files" feature
     requested by userspace via uapi-group.org and protects services
     from being redirected to fifos, device nodes, and friends.

     All atomic_open implementations were audited for OPENAT2_REGULAR
     handling. Explicit checks were added to ceph, gfs2, nfs (v4), and
     cifs/smb - these are the filesystems whose atomic_open can
     encounter an existing non-regular file and would otherwise call
     finish_open() on it or return a misleading error code.

     The remaining implementations (9p, fuse, vboxsf, nfs v2/v3) only
     call finish_open() on freshly created files and use
     finish_no_open() for lookup hits, letting the VFS catch non-regular
     files via the do_open() safety net.

  Cleanups:

   - Migrate the openat2 selftests to the kselftest harness and move
     them under selftests/filesystems/. The tests were written in the
     early days of selftests' TAP support and the modern kselftest
     harness is much easier to follow and maintain. The contents of the
     tests are unchanged and the new emptypath tests are ported on top.

   - Make the LAST_XXX last-type constants private to fs/namei.c. The
     only user outside of fs/namei.c was ksmbd which only needs to know
     whether the last component is a regular one, so
     vfs_path_parent_lookup() now performs the LAST_NORM check
     internally. The ints are replaced with a dedicated enum last_type"

* tag 'vfs-7.2-rc1.openat2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  vfs: replace ints with enum last_type for LAST_XXX
  vfs: make LAST_XXX private to fs/namei.c
  selftests: openat2: port emptypath_test to kselftest harness
  kselftest/openat2: test for OPENAT2_REGULAR flag
  openat2: new OPENAT2_REGULAR flag support
  openat2: introduce EFTYPE error code
  selftest: add tests for O_EMPTYPATH
  vfs: add O_EMPTYPATH to openat(2)/openat2(2)
  selftests: openat2: migrate to kselftest harness
  selftests: openat2: switch from custom ARRAY_LEN to ARRAY_SIZE
  selftests: openat2: move helpers to header
  selftests: move openat2 tests to selftests/filesystems/

Merge tag 'vfs-7.2-rc1.casefold' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

2026-06-14T21:25:34+00:00

Pull vfs casefolding updates from Christian Brauner:
 "This exposes the case folding behavior of local filesystems so that
  file servers - nfsd, ksmbd, and user space file servers - can report
  the actual behavior to clients instead of guessing.

  Filesystems report case-insensitive and case-nonpreserving behavior
  via new file_kattr flags in their fileattr_get implementations. fat,
  exfat, ntfs3, hfs, hfsplus, xfs, cifs, nfs, vboxsf, and isofs are
  wired up. Local filesystems that are not explicitly handled default to
  the usual POSIX behavior of case-sensitive and case-preserving.

  nfsd uses this to report case folding via NFSv3 PATHCONF and to
  implement the NFSv4 FATTR4_CASE_INSENSITIVE and FATTR4_CASE_PRESERVING
  attributes - both have been part of the NFS protocols for decades to
  support clients on non-POSIX systems - and ksmbd reports it via
  FS_ATTRIBUTE_INFORMATION. Exposing the information through the
  fileattr uapi covers user space file servers.

  The immediate motivation is interoperability: Windows NFS clients
  hard-require servers to report case-insensitivity for Win32
  applications to work correctly, and a client that knows the server is
  case-insensitive can avoid issuing multiple LOOKUP/READDIR requests
  searching for case variants.

  The Linux NFS client already grew support for case-insensitive shares
  years ago in support of the Hammerspace NFS server - negative dentry
  caching must be disabled (a lookup for "FILE.TXT" failing must not
  cache a negative entry when "file.txt" exists) and directory change
  invalidation must drop cached case-folded name variants. Such servers
  often operate in multi-protocol environments where a single file
  service instance caters to both NFS and SMB clients, and nfsd needs to
  report case folding properly to participate as a first-class citizen
  there.

  A follow-up series brings fixes for the initial work: the nfsd
  case-info probe now uses kernel credentials, maps -ESTALE to
  NFS3ERR_STALE, and has its cost capped across READDIR entries; the nfs
  client avoids transiently zeroed case capability bits during the probe
  and skips the pathconf probe when neither field is consumed; the
  FS_CASEFOLD_FL semantics are clarified in the UAPI header; and the
  tools UAPI headers are synced"

* tag 'vfs-7.2-rc1.casefold' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (22 commits)
  nfsd: Cap case-folding probe cost across READDIR entries
  nfsd: Map -ESTALE from case probe to NFS3ERR_STALE
  nfsd: Use kernel credentials for case-info probe
  fs: Clarify FS_CASEFOLD_FL semantics in UAPI header
  nfs: Skip pathconf probe when neither field is consumed
  nfs: Avoid transient zeroed case capability bits during probe
  tools headers UAPI: Sync case-sensitivity flags from linux/fs.h
  ksmbd: Report filesystem case sensitivity via FS_ATTRIBUTE_INFORMATION
  nfsd: Implement NFSv4 FATTR4_CASE_INSENSITIVE and FATTR4_CASE_PRESERVING
  nfsd: Report export case-folding via NFSv3 PATHCONF
  isofs: Implement fileattr_get for case sensitivity
  vboxsf: Implement fileattr_get for case sensitivity
  nfs: Implement fileattr_get for case sensitivity
  cifs: Implement fileattr_get for case sensitivity
  xfs: Report case sensitivity in fileattr_get
  hfsplus: Report case sensitivity in fileattr_get
  hfs: Implement fileattr_get for case sensitivity
  ntfs3: Implement fileattr_get for case sensitivity
  exfat: Implement fileattr_get for case sensitivity
  fat: Implement fileattr_get for case sensitivity
  ...

Merge tag 'vfs-7.2-rc1.inode' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

2026-06-14T21:14:23+00:00

Pull vfs inode updates from Christian Brauner:
 "This extends the lockless ->i_count handling.

  iput() could already decrement any value greater than one locklessly
  but acquiring a reference always required taking inode->i_lock. Now
  acquiring a reference is lockless as long as the count was already at
  least 1, i.e., only the 0->1 and 1->0 transitions take the lock.

  This avoids the lock for the common cases of nfs calling into the
  inode hash and btrfs using igrab(). Cleanup-wise icount_read_once() is
  added to line up with inode_state_read_once() and the open-coded
  ->i_count loads across the tree are converted, and ihold() is
  relocated and tidied up.

  On top of that some stale lock ordering annotations are retired from
  the inode hash code: iunique() no longer takes the hash lock since the
  inode hash became RCU-searchable and s_inode_list_lock is no longer
  taken under the hash lock either"

* tag 'vfs-7.2-rc1.inode' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  fs: retire stale lock ordering annotations from inode hash
  fs: allow lockless ->i_count bumps as long as it does not transition 0->1
  fs: relocate and tidy up ihold()
  fs: add icount_read_once() and stop open-coding ->i_count loads

vfs: add FS_USERNS_DELEGATABLE flag and set it for NFS

2026-06-09T15:17:29+00:00

Commit e1c5ae59c0f2 ("fs: don't allow non-init s_user_ns for filesystems
without FS_USERNS_MOUNT") prevents the mount of any filesystem inside a
container that doesn't have FS_USERNS_MOUNT set.

This broke NFS mounts in our containerized environment. We have a daemon
somewhat like systemd-mountfsd running in the init_ns. A process does a
fsopen() inside the container and passes it to the daemon via unix
socket.

The daemon then vets that the request is for an allowed NFS server and
performs the mount. This now fails because the fc->user_ns is set to the
value in the container and NFS doesn't set FS_USERNS_MOUNT.  We don't
want to add FS_USERNS_MOUNT to NFS since that would allow the container
to mount any NFS server (even malicious ones).

Add a new FS_USERNS_DELEGATABLE flag, and enable it on NFS.

Fixes: e1c5ae59c0f2 ("fs: don't allow non-init s_user_ns for filesystems without FS_USERNS_MOUNT")
Signed-off-by: Jeff Layton 
Link: https://patch.msgid.link/20260129-twmount-v1-1-4874ed2a15c4@kernel.org
Acked-by: Anna Schumaker 
Reviewed-by: Alexander Mikhalitsyn 
Reviewed-by: Jeff Layton 
Signed-off-by: Christian Brauner (Amutable)

nfs: get rid of fake root dentries

2026-06-05T04:34:56+00:00

... just grab the reference to the (real) root we are about to return
for the first mount of this superblock and be done with that.

Once upon a time dentry tree eviction at fs shutdown used to break
if ->s_root had been spliced on top of something; that hadn't been
the case for years now, and these fake root dentries violate a bunch
of invariants.  Let's get rid of them...

Signed-off-by: Al Viro

VFS: use wait_var_event for waiting in d_alloc_parallel()

2026-06-05T04:34:54+00:00

Parallel lookup starts with a call of d_alloc_parallel().  That primitive
either returns a matching hashed dentry or allocates a new one in the
in-lookup state and returns it to the caller.  Once the caller is done
with lookup, it indicates so either by call of d_{splice_alias,add}()
or by call of d_done_lookup(); at that point dentry leaves the in-lookup
state.

If d_alloc_parallel() finds a matching in-lookup dentry, it must wait for
that dentry to leave the in-lookup state, one way or another.  Currently
by supplying wait_queue_head to d_alloc_parallel().  If d_alloc_parallel()
creates a new in-lookup dentry, the address of that wait_queue_head is stored
in ->d_wait of new dentry and stays there while it's in the in-lookup;
subsequent d_alloc_parallel() will wait on the queue found in the matching
in-lookup dentry.  Transition out of in-lookup state wakes waiters on that
queue (if any).

That works, but the calling conventions are inconvenient - the caller must
supply wait_queue_head and make sure that it survives at least until the new
in-lookup dentry leaves the in-lookup state.  That amounts to boilerplate
in the d_alloc_parallel() callers that are followed by a call of d_lookup_done()
in the same function; in cases like nfs asynchronous unlink it gets worse than
that.

This patch changes d_alloc_parallel() to use wake_up_var_locked() to
wake up waiters, and wait_var_event_spinlock() to wait.  dentry->d_lock
is used for synchronisation as it is already held and the relevant
times.

That eliminates the need of caller-supplied wait_queue_head, simplifying
the calling conventions.  Better yet, we only need one bit of information
stored in dentry itself: whether there are any waiters to be woken up,
and that can be easily stored in ->d_flags; ->d_wait goes away.

The reason we need that bit (DCACHE_LOOKUP_WAITERS) is that with wait_var
machinery the queues are shared with all kinds of stuff and there's
no way tell if any of the waiters have anything to do with our dentry;
most of the time none of them will be relevant, so we need to avoid the
pointless wakeups.

Another benefit of the new scheme comes from the fact that wakeups
have to be done outside of write-side critical areas of ->i_dir_seq;
with the old scheme we need to carry the value picked from ->d_wait from
__d_lookup_unhash() to the place where we actually wake the waiters up.
Now we can just leave DCACHE_LOOKUP_WAITERS in ->d_flags until we get
to doing wakeups - that's done within the same ->d_lock scope, so we
are fine; new bit is accessed only under ->d_lock and it's seen only
on dentries with DCACHE_PAR_LOOKUP in ->d_flags.

__d_lookup_unhash() no longer needs to re-init ->d_lru.  That was
previously shared (in a union) with ->d_wait but ->d_wait is now gone
so it no longer corrupts ->d_lru.

Co-developed-by: Al Viro  # saner handling of flags
Signed-off-by: NeilBrown 
Signed-off-by: Al Viro