summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
2025-11-16Merge tag 'mm-hotfixes-stable-2025-11-16-10-40' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "7 hotfixes. 5 are cc:stable, 4 are against mm/ All are singletons - please see the respective changelogs for details" * tag 'mm-hotfixes-stable-2025-11-16-10-40' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: mm, swap: fix potential UAF issue for VMA readahead selftests/user_events: fix type cast for write_index packed member in perf_test lib/test_kho: check if KHO is enabled mm/huge_memory: fix folio split check for anon folios in swapcache MAINTAINERS: update David Hildenbrand's email address crash: fix crashkernel resource shrink mm: fix MAX_FOLIO_ORDER on powerpc configs with hugetlb
2025-11-16procfs: make /self and /thread_self dentries persistentAl Viro
... and there's no need to remember those pointers anywhere - ->kill_sb() no longer needs to bother since kill_anon_super() will take care of them anyway and proc_pid_readdir() only wants the inumbers, which we had in a couple of static variables all along. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-11-16primitives for maintaining persisitencyAl Viro
* d_make_persistent(dentry, inode) - bump refcount, mark persistent and make hashed positive. Return value is a borrowed reference to dentry; it can be used until something removes persistency (at the very least, until the parent gets unlocked, but some filesystems may have stronger exclusion). * d_make_discardable() - remove persistency mark and drop reference. d_make_persistent() is similar to combination of d_instantiate(), dget() and setting flag. The only difference is that unlike d_instantiate() it accepts hashed and unhashed negatives alike. It is always called in strong locking environment (parent held exclusive, or, in some cases, dentry coming from d_alloc_name()); if we ever start using it with parent held only shared and dentry coming from d_alloc_parallel(), we'll need to copy the in-lookup logics from __d_add(). d_make_discardable() is eqiuvalent to combination of removing flag and dput(); since flag removal requires ->d_lock, there's no point trying to avoid taking that for refcount decrement as fast_dput() does. The slow path of dput() has been taken into a helper and reused in d_make_discardable() instead. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-11-16introduce a flag for explicitly marking persistently pinned dentriesAl Viro
Some filesystems use a kinda-sorta controlled dentry refcount leak to pin dentries of created objects in dcache (and undo it when removing those). Reference is grabbed and not released, but it's not actually _stored_ anywhere. That works, but it's hard to follow and verify; among other things, we have no way to tell _which_ of the increments is intended to be an unpaired one. Worse, on removal we need to decide whether the reference had already been dropped, which can be non-trivial if that removal is on umount and we need to figure out if this dentry is pinned due to e.g. unlink() not done. Usually that is handled by using kill_litter_super() as ->kill_sb(), but there are open-coded special cases of the same (consider e.g. /proc/self). Things get simpler if we introduce a new dentry flag (DCACHE_PERSISTENT) marking those "leaked" dentries. Having it set claims responsibility for +1 in refcount. The end result this series is aiming for: * get these unbalanced dget() and dput() replaced with new primitives that would, in addition to adjusting refcount, set and clear persistency flag. * instead of having kill_litter_super() mess with removing the remaining "leaked" references (e.g. for all tmpfs files that hadn't been removed prior to umount), have the regular shrink_dcache_for_umount() strip DCACHE_PERSISTENT of all dentries, dropping the corresponding reference if it had been set. After that kill_litter_super() becomes an equivalent of kill_anon_super(). Doing that in a single step is not feasible - it would affect too many places in too many filesystems. It has to be split into a series. Here we * introduce the new flag * teach shrink_dcache_for_umount() to handle it (i.e. remove and drop refcount on anything that survives to umount with that flag still set) * teach kill_litter_super() that anything with that flag does *not* need to be unpinned. Next commits will add primitives for maintaing that flag and convert the common helpers to those. After that - a long series of per-filesystem patches converting to those primitives. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-11-16new helper: simple_done_creating()Al Viro
should be paired with simple_start_creating() - unlocks parent and drops dentry reference. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-11-16new helper: simple_remove_by_name()Al Viro
simple_recursive_removal(), but instead of victim dentry it takes parent + name. Used to be open-coded in fs/fuse/control.c, but there's no need to expose the guts of that thing there and there are other potential users, so let's lift it into libfs... Acked-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-11-15mm: fix MAX_FOLIO_ORDER on powerpc configs with hugetlbDavid Hildenbrand (Red Hat)
In the past, CONFIG_ARCH_HAS_GIGANTIC_PAGE indicated that we support runtime allocation of gigantic hugetlb folios. In the meantime it evolved into a generic way for the architecture to state that it supports gigantic hugetlb folios. In commit fae7d834c43c ("mm: add __dump_folio()") we started using CONFIG_ARCH_HAS_GIGANTIC_PAGE to decide MAX_FOLIO_ORDER: whether we could have folios larger than what the buddy can handle. In the context of that commit, we started using MAX_FOLIO_ORDER to detect page corruptions when dumping tail pages of folios. Before that commit, we assumed that we cannot have folios larger than the highest buddy order, which was obviously wrong. In commit 7b4f21f5e038 ("mm/hugetlb: check for unreasonable folio sizes when registering hstate"), we used MAX_FOLIO_ORDER to detect inconsistencies, and in fact, we found some now. Powerpc allows for configs that can allocate gigantic folio during boot (not at runtime), that do not set CONFIG_ARCH_HAS_GIGANTIC_PAGE and can exceed PUD_ORDER. To fix it, let's make powerpc select CONFIG_ARCH_HAS_GIGANTIC_PAGE with hugetlb on powerpc, and increase the maximum folio size with hugetlb to 16 GiB on 64bit (possible on arm64 and powerpc) and 1 GiB on 32 bit (powerpc). Note that on some powerpc configurations, whether we actually have gigantic pages depends on the setting of CONFIG_ARCH_FORCE_MAX_ORDER, but there is nothing really problematic about setting it unconditionally: we just try to keep the value small so we can better detect problems in __dump_folio() and inconsistencies around the expected largest folio in the system. Ideally, we'd have a better way to obtain the maximum hugetlb folio size and detect ourselves whether we really end up with gigantic folios. Let's defer bigger changes and fix the warnings first. While at it, handle gigantic DAX folios more clearly: DAX can only end up creating gigantic folios with HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD. Add a new Kconfig option HAVE_GIGANTIC_FOLIOS to make both cases clearer. In particular, worry about ARCH_HAS_GIGANTIC_PAGE only with HUGETLB_PAGE. Note: with enabling CONFIG_ARCH_HAS_GIGANTIC_PAGE on powerpc, we will now also allow for runtime allocations of folios in some more powerpc configs. I don't think this is a problem, but if it is we could handle it through __HAVE_ARCH_GIGANTIC_PAGE_RUNTIME_SUPPORTED. While __dump_page()/__dump_folio was also problematic (not handling dumping of tail pages of such gigantic folios correctly), it doesn't seem critical enough to mark it as a fix. Link: https://lkml.kernel.org/r/20251114214920.2550676-1-david@kernel.org Fixes: 7b4f21f5e038 ("mm/hugetlb: check for unreasonable folio sizes when registering hstate") Reported-by: Christophe Leroy <christophe.leroy@csgroup.eu> Closes: https://lore.kernel.org/r/3e043453-3f27-48ad-b987-cc39f523060a@csgroup.eu/ Reported-by: Sourabh Jain <sourabhjain@linux.ibm.com> Closes: https://lore.kernel.org/r/94377f5c-d4f0-4c0f-b0f6-5bf1cd7305b1@linux.ibm.com/ Signed-off-by: David Hildenbrand (Red Hat) <david@kernel.org> Cc: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Donet Tom <donettom@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Mike Rapoport <rppt@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-15Merge tag 'core-urgent-2025-11-15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull core fix from Ingo Molnar: "Fix a broken #ifndef in the <linux/entry-virt.h> header. It hasn't caused problems upstream yet because no arch overrides arch_xfer_to_guest_mode_handle_work() at this moment" * tag 'core-urgent-2025-11-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: entry: Fix ifndef around arch_xfer_to_guest_mode_handle_work() stub
2025-11-15static_call: allow using STATIC_CALL_TRAMP_STR() from assemblyNaman Jain
STATIC_CALL_TRAMP_STR() could not be used from .S files because static_call_types.h was not safe to include in assembly as it pulled in C types/constructs that are unavailable under __ASSEMBLY__. Make the header assembly-friendly by adding __ASSEMBLY__ checks and providing only the minimal definitions needed for assembly, so that it can be safely included by .S code. This enables emitting the static call trampoline symbol name via STATIC_CALL_TRAMP_STR() directly in assembly sources, to be used with 'call' instruction. Also, move a certain definitions out of __ASSEMBLY__ checks in compiler_types.h to meet the dependencies. No functional change for C compilation. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Naman Jain <namjain@linux.microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2025-11-15Drivers: hv: VMBus protocol version 6.0Roman Kisel
The confidential VMBus is supported starting from the protocol version 6.0 onwards. Provide the required definitions. No functional changes. Signed-off-by: Roman Kisel <romank@linux.microsoft.com> Reviewed-by: Alok Tiwari <alok.a.tiwari@oracle.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2025-11-14Merge branch 'mlx5-next' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux Tariq Toukan says: ==================== mlx5-next updates 2025-11-13 The following pull-request contains common mlx5 updates * 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux: net/mlx5: Expose definition for 1600Gbps link mode net/mlx5: fs, set non default device per namespace net/mlx5: fs, Add other_eswitch support for steering tables net/mlx5: Add OTHER_ESWITCH HW capabilities net/mlx5: Add direct ST mode support for RDMA PCI/TPH: Expose pcie_tph_get_st_table_loc() {rdma,net}/mlx5: Query vports mac address from device ==================== Link: https://patch.msgid.link/1763027252-1168760-1-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-14Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf after 6.18-rc5+Alexei Starovoitov
Cross-merge BPF and other fixes after downstream PR. Minor conflict in kernel/bpf/helpers.c Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-14Merge tag 'pci-v6.18-fixes-5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci Pull pci fixes from Bjorn Helgaas: - Cache the ASPM L0s/L1 Supported bits early so quirks can override them if necessary (Bjorn Helgaas) - Add quirks for PA Semi and Freescale Root Ports and a HiSilicon Wi-Fi device that are reported to have broken L0s and L1 (Shawn Lin, Bjorn Helgaas) * tag 'pci-v6.18-fixes-5' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: PCI/ASPM: Avoid L0s and L1 on Hi1105 [19e5:1105] Wi-Fi PCI/ASPM: Avoid L0s and L1 on PA Semi [1959:a002] Root Ports PCI/ASPM: Avoid L0s and L1 on Freescale [1957:0451] Root Ports PCI/ASPM: Convert quirks to override advertised link states PCI/ASPM: Add pcie_aspm_remove_cap() to override advertised link states PCI/ASPM: Cache L0s/L1 Supported so advertised link states can be overridden
2025-11-14Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfLinus Torvalds
Pull bpf fixes from Alexei Starovoitov: - Fix interaction between livepatch and BPF fexit programs (Song Liu) With Steven and Masami acks. - Fix stack ORC unwind from BPF kprobe_multi (Jiri Olsa) With Steven and Masami acks. - Fix out of bounds access in widen_imprecise_scalars() in the verifier (Eduard Zingerman) - Fix conflicts between MPTCP and BPF sockmap (Jiayuan Chen) - Fix net_sched storage collision with BPF data_meta/data_end (Eric Dumazet) - Add _impl suffix to BPF kfuncs with implicit args to avoid breaking them in bpf-next when KF_IMPLICIT_ARGS is added (Mykyta Yatsenko) * tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: selftests/bpf: Test widen_imprecise_scalars() with different stack depth bpf: account for current allocated stack depth in widen_imprecise_scalars() bpf: Add bpf_prog_run_data_pointers() selftests/bpf: Add mptcp test with sockmap mptcp: Fix proto fallback detection with BPF mptcp: Disallow MPTCP subflows from sockmap selftests/bpf: Add stacktrace ips test for raw_tp selftests/bpf: Add stacktrace ips test for kprobe_multi/kretprobe_multi x86/fgraph,bpf: Fix stack ORC unwind from kprobe_multi return probe Revert "perf/x86: Always store regs->ip in perf_callchain_kernel()" bpf: add _impl suffix for bpf_stream_vprintk() kfunc bpf:add _impl suffix for bpf_task_work_schedule* kfuncs selftests/bpf: Add tests for livepatch + bpf trampoline ftrace: bpf: Fix IPMODIFY + DIRECT in modify_ftrace_direct() ftrace: Fix BPF fexit with livepatch
2025-11-14PCI/TSM: Add pci_tsm_guest_req() for managing TDIsDan Williams
A PCIe device function interface assigned to a TVM is a TEE Device Interface (TDI). A TDI instantiated by pci_tsm_bind() needs additional steps taken by the TVM to be accepted into the TVM's Trusted Compute Boundary (TCB) and transitioned to the RUN state. pci_tsm_guest_req() is a channel for the guest to request TDISP collateral, like Device Interface Reports, and effect TDISP state changes, like LOCKED->RUN transititions. Similar to IDE establishment and pci_tsm_bind(), these are long running operations involving SPDM message passing via the DOE mailbox. The path for a TVM to invoke pci_tsm_guest_req() is: * TSM triggers exit via guest-to-host-interface ABI (implementation specific) * VMM invokes handler (KVM handle_exit() -> userspace io) * handler issues request (userspace io handler -> ioctl() -> pci_tsm_guest_req()) * handler supplies response * VMM posts response, notifies/re-enters TVM This path is purely a transport for messages from TVM to platform TSM. By design the host kernel does not and must not care about the content of these messages. I.e. the host kernel is not in the TCB of the TVM. As this is an opaque passthrough interface, similar to fwctl, the kernel requires that implementations stay within the bounds defined by 'enum pci_tsm_req_scope'. Violation of those expectations likely has market and regulatory consequences. Out of scope requests are blocked by default. Co-developed-by: Xu Yilun <yilun.xu@linux.intel.com> Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Link: https://patch.msgid.link/20251113021446.436830-8-dan.j.williams@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2025-11-14PCI/TSM: Add pci_tsm_bind() helper for instantiating TDIsDan Williams
After a PCIe device has established a secure link and session between a TEE Security Manager (TSM) and its local Device Security Manager (DSM), the device or its subfunctions are candidates to be bound to a private memory context, a TVM. A PCIe device function interface assigned to a TVM is a TEE Device Interface (TDI). The pci_tsm_bind() requests the low-level TSM driver to associate the device with private MMIO and private IOMMU context resources of a given TVM represented by a @kvm argument. A device in the bound state corresponds to the TDISP protocol LOCKED state and awaits validation by the TVM. It is a 'struct pci_tsm_link_ops' operation because, similar to IDE establishment, it involves host side resource establishment and context setup on behalf of the guest. It is also expected to be performed lazily to allow for operation of the device in non-confidential "shared" context for pre-lock configuration. Co-developed-by: Xu Yilun <yilun.xu@linux.intel.com> Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Link: https://patch.msgid.link/20251113021446.436830-7-dan.j.williams@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2025-11-14PCI/IDE: Initialize an ID for all IDE streamsDan Williams
The PCIe spec defines two types of streams - selective and link. Each stream has an ID from the same bucket so a stream ID does not tell the type. The spec defines an "enable" bit for every stream and required stream IDs to be unique among all enabled stream but there is no such requirement for disabled streams. However, when IDE_KM is programming keys, an IDE-capable device needs to know the type of stream being programmed to write it directly to the hardware as keys are relatively large, possibly many of them and devices often struggle with keeping around rather big data not being used. Walk through all streams on a device and initialise the IDs to some unique number, both link and selective. The weakest part of this proposal is the host bridge ide_stream_ids_ida. Technically, a Stream ID only needs to be unique within a given partner pair. However, with "anonymous" / unassigned streams there is no convenient place to track the available ids. Proceed with an ida in the host bridge for now, but consider moving this tracking to be an ide_stream_ids_ida per device. Co-developed-by: Alexey Kardashevskiy <aik@amd.com> Signed-off-by: Alexey Kardashevskiy <aik@amd.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Link: https://patch.msgid.link/20251113021446.436830-6-dan.j.williams@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2025-11-14PCI/IDE: Add Address Association Register setup for downstream MMIOXu Yilun
The address ranges for downstream Address Association Registers need to cover memory addresses for all functions (PFs/VFs/downstream devices) managed by a Device Security Manager (DSM). The proposed solution is get the memory (32-bit only) range and prefetchable-memory (64-bit capable) range from the immediate ancestor downstream port (either the direct-attach RP or deepest switch port when switch attached). Similar to RID association, address associations will be set by default if hardware sets 'Number of Address Association Register Blocks' in the 'Selective IDE Stream Capability Register' to a non-zero value. TSM drivers can opt-out of the settings by zero'ing out unwanted / unsupported address ranges. E.g. TDX Connect only supports prefetachable (64-bit capable) memory ranges for the Address Association setting. If the immediate downstream port provides both a memory range and prefetchable-memory range, but the IDE partner port only provides 1 Address Association Register block then the TSM driver can pick which range to associate, or let the PCI core prioritize memory. Note, the Address Association Register setup for upstream requests is still uncertain so is not included. Co-developed-by: Aneesh Kumar K.V <aneesh.kumar@kernel.org> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@kernel.org> Co-developed-by: Arto Merilainen <amerilainen@nvidia.com> Signed-off-by: Arto Merilainen <amerilainen@nvidia.com> Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com> Co-developed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Link: https://patch.msgid.link/20251114010227.567693-1-dan.j.williams@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2025-11-14sched_ext: Pass locked CPU parameter to scx_hardlockup() and add docsTejun Heo
With the buddy lockup detector, smp_processor_id() returns the detecting CPU, not the locked CPU, making scx_hardlockup()'s printouts confusing. Pass the locked CPU number from watchdog_hardlockup_check() as a parameter instead. Also add kerneldoc comments to handle_lockup(), scx_hardlockup(), and scx_rcu_cpu_stall() documenting their return value semantics. Suggested-by: Doug Anderson <dianders@chromium.org> Reviewed-by: Douglas Anderson <dianders@chromium.org> Acked-by: Andrea Righi <arighi@nvidia.com> Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2025-11-14PM: runtime: Wrapper macros for ACQUIRE()/ACQUIRE_ERR()Rafael J. Wysocki
Add wrapper macros for ACQUIRE()/ACQUIRE_ERR() and runtime PM usage counter guards introduced recently: pm_runtime_active_try, pm_runtime_active_auto_try, pm_runtime_active_try_enabled, and pm_runtime_active_auto_try_enabled. The new macros should be more straightforward to use. For example, they can be used for rewriting a piece of code like below: ACQUIRE(pm_runtime_active_try, pm)(dev); if ((ret = ACQUIRE_ERR(pm_runtime_active_try, &pm))) return ret; in the following way: PM_RUNTIME_ACQUIRE(dev, pm); if ((ret = PM_RUNTIME_ACQUIRE_ERR(&pm))) return ret; If the original code does not care about the specific error code returned when attepmting to resume the device: ACQUIRE(pm_runtime_active_try, pm)(dev); if (ACQUIRE_ERR(pm_runtime_active_try, &pm)) return -ENXIO; it may be changed like this: PM_RUNTIME_ACQUIRE(dev, pm); if (PM_RUNTIME_ACQUIRE_ERR(&pm)) return -ENXIO; Link: https://lore.kernel.org/linux-pm/5068916.31r3eYUQgx@rafael.j.wysocki/ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Dhruva Gole <d-gole@ti.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Frank Li <Frank.Li@nxp.com> Link: https://patch.msgid.link/3400866.aeNJFYEL58@rafael.j.wysocki
2025-11-14time: Fix a few typos in time[r] related code commentsJianyun Gao
Signed-off-by: Jianyun Gao <jianyungao89@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://patch.msgid.link/20250927093411.1509275-1-jianyungao89@gmail.com
2025-11-14PCI: Convert BAR sizes bitmasks to u64Ilpo Järvinen
PCIe r7.0, sec 7.8.6, defines resizable BAR sizes beyond the currently supported maximum of 128TB, which will require more than u32 to store the entire bitmask. Convert Resizable BAR related functions to use u64 bitmask for BAR sizes to make the typing more future-proof. The support for the larger BAR sizes themselves is not added at this point. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://patch.msgid.link/20251113180053.27944-12-ilpo.jarvinen@linux.intel.com
2025-11-14PCI: Add pci_rebar_get_max_size()Ilpo Järvinen
Add pci_rebar_get_max_size() to allow simplifying code that wants to know the maximum possible size for a Resizable BAR. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://patch.msgid.link/20251113180053.27944-9-ilpo.jarvinen@linux.intel.com
2025-11-14PCI: Add pci_rebar_size_supported() helperIlpo Järvinen
Many callers of pci_rebar_get_possible_sizes() are interested in finding out if a particular encoded BAR Size (PCIe r7.0, sec 7.8.6.3) is supported by the particular BAR. Add pci_rebar_size_supported() into PCI core to make it easy for the drivers to determine if the BAR size is supported or not. Use the new function in pci_resize_resource() and in pci_iov_vf_bar_set_size(). Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://patch.msgid.link/20251113180053.27944-6-ilpo.jarvinen@linux.intel.com
2025-11-14PCI: Move pci_rebar_size_to_bytes() and export itIlpo Järvinen
pci_rebar_size_to_bytes() is in drivers/pci/pci.h but would be useful for endpoint drivers as well. Move the function to rebar.c and export it. In addition, convert the literal to where the number comes from (PCI_REBAR_MIN_SIZE). Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://patch.msgid.link/20251113180053.27944-4-ilpo.jarvinen@linux.intel.com
2025-11-14PCI: Move pci_rebar_bytes_to_size() and clean it upIlpo Järvinen
Move pci_rebar_bytes_to_size() from include/linux/pci.h to rebar.c as it does not look very trivial and is not expected to be performance critical. Convert literals to use a newly added PCI_REBAR_MIN_SIZE define. Also add kernel doc for the function as the function is exported. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Michael J. Ruhl <mjruhl@habana.ai> Link: https://patch.msgid.link/20251113180053.27944-3-ilpo.jarvinen@linux.intel.com
2025-11-14PCI: Fix restoring BARs on BAR resize rollback pathIlpo Järvinen
BAR resize operation is implemented in the pci_resize_resource() and pbus_reassign_bridge_resources() functions. pci_resize_resource() can be called either from __resource_resize_store() from sysfs or directly by the driver for the Endpoint Device. The pci_resize_resource() requires that caller has released the device resources that share the bridge window with the BAR to be resized as otherwise the bridge window is pinned in place and cannot be changed. pbus_reassign_bridge_resources() rolls back resources if the resize operation fails, but rollback is performed only for the bridge windows. Because releasing the device resources are done by the caller of the BAR resize interface, these functions performing the BAR resize do not have access to the device resources as they were before the resize. pbus_reassign_bridge_resources() could try __pci_bridge_assign_resources() after rolling back the bridge windows as they were, however, it will not guarantee the resource are assigned due to differences in how FW and the kernel assign the resources (alignment of the start address and tail). To perform rollback robustly, the BAR resize interface has to be altered to also release the device resources that share the bridge window with the BAR to be resized. Also, remove restoring from the entries failed list as saved list should now contain both the bridge windows and device resources so the extra restore is duplicated work. Some drivers (currently only amdgpu) want to prevent releasing some resources. Add exclude_bars param to pci_resize_resource() and make amdgpu pass its register BAR (BAR 2 or 5), which should never be released during resize operation. Normally 64-bit prefetchable resources do not share a bridge window with the 32-bit only register BAR, but there are various fallbacks in the resource assignment logic which may make the resources share the bridge window in rare cases. This change (together with the driver side changes) is to counter the resource releases that had to be done to prevent resource tree corruption in the ("PCI: Release assigned resource before restoring them") change. As such, it likely restores functionality in cases where device resources were released to avoid resource tree conflicts which appeared to be "working" when such conflicts were not correctly detected by the kernel. Reported-by: Simon Richter <Simon.Richter@hogyros.de> Link: https://lore.kernel.org/linux-pci/f9a8c975-f5d3-4dd2-988e-4371a1433a60@hogyros.de/ Reported-by: Alex Bennée <alex.bennee@linaro.org> Link: https://lore.kernel.org/linux-pci/874irqop6b.fsf@draig.linaro.org/ Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> [bhelgaas: squash amdgpu BAR selection from https://lore.kernel.org/r/20251114103053.13778-1-ilpo.jarvinen@linux.intel.com] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Alex Bennée <alex.bennee@linaro.org> # AVA, AMD GPU Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://patch.msgid.link/20251113162628.5946-7-ilpo.jarvinen@linux.intel.com
2025-11-14crypto: ccp - Add an API to return the supported SEV-SNP policy bitsTom Lendacky
Supported policy bits are dependent on the level of SEV firmware that is currently running. Create an API to return the supported policy bits for the current level of firmware. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Link: https://patch.msgid.link/e3f711366ddc22e3dd215c987fd2e28dc1c07f54.1761593632.git.thomas.lendacky@amd.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-14KVM: SEV: Consolidate the SEV policy bits in a single header fileTom Lendacky
Consolidate SEV policy bit definitions into a single file. Use include/linux/psp-sev.h to hold the definitions and remove the current definitions from the arch/x86/kvm/svm/sev.c and arch/x86/include/svm.h files. No functional change intended. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Link: https://patch.msgid.link/d9639f88a0b521a1a67aeac77cc609fdea1f90bd.1761593632.git.thomas.lendacky@amd.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-11-14bpf: Add bpf_prog_run_data_pointers()Eric Dumazet
syzbot found that cls_bpf_classify() is able to change tc_skb_cb(skb)->drop_reason triggering a warning in sk_skb_reason_drop(). WARNING: CPU: 0 PID: 5965 at net/core/skbuff.c:1192 __sk_skb_reason_drop net/core/skbuff.c:1189 [inline] WARNING: CPU: 0 PID: 5965 at net/core/skbuff.c:1192 sk_skb_reason_drop+0x76/0x170 net/core/skbuff.c:1214 struct tc_skb_cb has been added in commit ec624fe740b4 ("net/sched: Extend qdisc control block with tc control block"), which added a wrong interaction with db58ba459202 ("bpf: wire in data and data_end for cls_act_bpf"). drop_reason was added later. Add bpf_prog_run_data_pointers() helper to save/restore the net_sched storage colliding with BPF data_meta/data_end. Fixes: ec624fe740b4 ("net/sched: Extend qdisc control block with tc control block") Reported-by: syzbot <syzkaller@googlegroups.com> Closes: https://lore.kernel.org/netdev/6913437c.a70a0220.22f260.013b.GAE@google.com/ Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Reviewed-by: Victor Nogueira <victor@mojatatu.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Link: https://patch.msgid.link/20251112125516.1563021-1-edumazet@google.com
2025-11-14ASoC: codecs: lpass-macro: complete sm6115 supportMark Brown
Merge series from Srinivas Kandagatla <srinivas.kandagatla@oss.qualcomm.com>: This patch series fixes SM6115 lpass codec macro support and adding missing dt-bindings to complete support for SM6115. SM6115 lpass codec macro support is added partially and broken to some extent, Fix this broken support and add complete lpass macro support for this SoC.
2025-11-14PM: Introduce new PMSG_POWEROFF eventMario Limonciello (AMD)
PMSG_POWEROFF will be used for the PM core to allow differentiating between a hibernation or shutdown sequence when re-using callbacks for common code. Hibernation is started by writing a hibernation method (such as 'platform' 'shutdown', or 'reboot') to use into /sys/power/disk and writing 'disk' to /sys/power/state. Shutdown is initiated with the reboot() syscall with arguments on whether to halt the system or power it off. Tested-by: Eric Naim <dnaim@cachyos.org> Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org> Link: https://patch.msgid.link/20251112224025.2051702-2-superm1@kernel.org Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-11-14hrtimer: Store time as ktime_t in restart blockThomas Weißschuh
The hrtimer core uses ktime_t to represent times, use that also for the restart block. CPU timers internally use nanoseconds instead of ktime_t but use the same restart block, so use the correct accessors for those. Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://patch.msgid.link/20251110-restart-block-expiration-v1-3-5d39cc93df4f@linutronix.de
2025-11-14futex: Store time as ktime_t in restart blockThomas Weißschuh
The futex core uses ktime_t to represent times, use that also for the restart block. This allows the simplification of the accessors. Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20251110-restart-block-expiration-v1-2-5d39cc93df4f@linutronix.de
2025-11-14spi-cadence: support transmission withMark Brown
Merge series from Jun Guo <jun.guo@cixtech.com>: The Cadence SPI IP supports configurable FIFO data widths during integration. On some SoCs, the FIFO data width is designed to be 16 or 32 bits at the chip design stage. However, the current driver only supports communication with an 8-bit FIFO data width. Therefore, these patches are added to enable the driver to support communication with 16-bit and 32-bit FIFO data widths.
2025-11-14VFS: introduce end_creating_keep()NeilBrown
Occasionally the caller of end_creating() wants to keep using the dentry. Rather then requiring them to dget() the dentry (when not an error) before calling end_creating(), provide end_creating_keep() which does this. cachefiles and overlayfs make use of this. Reviewed-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: NeilBrown <neil@brown.name> Link: https://patch.msgid.link/20251113002050.676694-16-neilb@ownmail.net Tested-by: syzbot@syzkaller.appspotmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-14VFS: change vfs_mkdir() to unlock on failure.NeilBrown
vfs_mkdir() already drops the reference to the dentry on failure but it leaves the parent locked. This complicates end_creating() which needs to unlock the parent even though the dentry is no longer available. If we change vfs_mkdir() to unlock on failure as well as releasing the dentry, we can remove the "parent" arg from end_creating() and simplify the rules for calling it. Note that cachefiles_get_directory() can choose to substitute an error instead of actually calling vfs_mkdir(), for fault injection. In that case it needs to call end_creating(), just as vfs_mkdir() now does on error. ovl_create_real() will now unlock on error. So the conditional end_creating() after the call is removed, and end_creating() is called internally on error. Reviewed-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Tested-by: syzbot@syzkaller.appspotmail.com Signed-off-by: NeilBrown <neil@brown.name> Link: https://patch.msgid.link/20251113002050.676694-15-neilb@ownmail.net Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-14ecryptfs: use new start_creating/start_removing APIsNeilBrown
This requires the addition of start_creating_dentry() which is given the dentry which has already been found, and asks for it to be locked and its parent validated. Reviewed-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: NeilBrown <neil@brown.name> Link: https://patch.msgid.link/20251113002050.676694-14-neilb@ownmail.net Tested-by: syzbot@syzkaller.appspotmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-14Add start_renaming_two_dentries()NeilBrown
A few callers want to lock for a rename and already have both dentries. Also debugfs does want to perform a lookup but doesn't want permission checking, so start_renaming_dentry() cannot be used. This patch introduces start_renaming_two_dentries() which is given both dentries. debugfs performs one lookup itself. As it will only continue with a negative dentry and as those cannot be renamed or unlinked, it is safe to do the lookup before getting the rename locks. overlayfs uses start_renaming_two_dentries() in three places and selinux uses it twice in sel_make_policy_nodes(). In sel_make_policy_nodes() we now lock for rename twice instead of just once so the combined operation is no longer atomic w.r.t the parent directory locks. As selinux_state.policy_mutex is held across the whole operation this does not open up any interesting races. Reviewed-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: NeilBrown <neil@brown.name> Link: https://patch.msgid.link/20251113002050.676694-13-neilb@ownmail.net Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-14VFS/ovl/smb: introduce start_renaming_dentry()NeilBrown
Several callers perform a rename on a dentry they already have, and only require lookup for the target name. This includes smb/server and a few different places in overlayfs. start_renaming_dentry() performs the required lookup and takes the required lock using lock_rename_child() It is used in three places in overlayfs and in ksmbd_vfs_rename(). In the ksmbd case, the parent of the source is not important - the source must be renamed from wherever it is. So start_renaming_dentry() allows rd->old_parent to be NULL and only checks it if it is non-NULL. On success rd->old_parent will be the parent of old_dentry with an extra reference taken. Other start_renaming function also now take the extra reference and end_renaming() now drops this reference as well. ovl_lookup_temp(), ovl_parent_lock(), and ovl_parent_unlock() are all removed as they are no longer needed. OVL_TEMPNAME_SIZE and ovl_tempname() are now declared in overlayfs.h so that ovl_check_rename_whiteout() can access them. ovl_copy_up_workdir() now always cleans up on error. Reviewed-by: Namjae Jeon <linkinjeon@kernel.org> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: NeilBrown <neil@brown.name> Link: https://patch.msgid.link/20251113002050.676694-12-neilb@ownmail.net Tested-by: syzbot@syzkaller.appspotmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-14VFS/nfsd/ovl: introduce start_renaming() and end_renaming()NeilBrown
start_renaming() combines name lookup and locking to prepare for rename. It is used when two names need to be looked up as in nfsd and overlayfs - cases where one or both dentries are already available will be handled separately. __start_renaming() avoids the inode_permission check and hash calculation and is suitable after filename_parentat() in do_renameat2(). It subsumes quite a bit of code from that function. start_renaming() does calculate the hash and check X permission and is suitable elsewhere: - nfsd_rename() - ovl_rename() In ovl, ovl_do_rename_rd() is factored out of ovl_do_rename(), which itself will be gone by the end of the series. Acked-by: Chuck Lever <chuck.lever@oracle.com> (for nfsd parts) Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: NeilBrown <neil@brown.name> -- Changes since v3: - added missig dput() in ovl_rename when "whiteout" is not-NULL. Changes since v2: - in __start_renaming() some label have been renamed, and err is always set before a "goto out_foo" rather than passing the error in a dentry*. - ovl_do_rename() changed to call the new ovl_do_rename_rd() rather than keeping duplicate code - code around ovl_cleanup() call in ovl_rename() restructured. Link: https://patch.msgid.link/20251113002050.676694-11-neilb@ownmail.net Tested-by: syzbot@syzkaller.appspotmail.com Acked-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-14VFS: add start_creating_killable() and start_removing_killable()NeilBrown
These are similar to start_creating() and start_removing(), but allow a fatal signal to abort waiting for the lock. They are used in btrfs for subvol creation and removal. btrfs_may_create() no longer needs IS_DEADDIR() and start_creating_killable() includes that check. Reviewed-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: NeilBrown <neil@brown.name> Link: https://patch.msgid.link/20251113002050.676694-10-neilb@ownmail.net Tested-by: syzbot@syzkaller.appspotmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-14VFS: introduce start_removing_dentry()NeilBrown
start_removing_dentry() is similar to start_removing() but instead of providing a name for lookup, the target dentry is given. start_removing_dentry() checks that the dentry is still hashed and in the parent, and if so it locks and increases the refcount so that end_removing() can be used to finish the operation. This is used in cachefiles, overlayfs, smb/server, and apparmor. There will be other users including ecryptfs. As start_removing_dentry() takes an extra reference to the dentry (to be put by end_removing()), there is no need to explicitly take an extra reference to stop d_delete() from using dentry_unlink_inode() to negate the dentry - as in cachefiles_delete_object(), and ksmbd_vfs_unlink(). cachefiles_bury_object() now gets an extra ref to the victim, which is drops. As it includes the needed end_removing() calls, the caller doesn't need them. Reviewed-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Namjae Jeon <linkinjeon@kernel.org> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: NeilBrown <neil@brown.name> Link: https://patch.msgid.link/20251113002050.676694-9-neilb@ownmail.net Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-14VFS: introduce start_creating_noperm() and start_removing_noperm()NeilBrown
xfs, fuse, ipc/mqueue need variants of start_creating or start_removing which do not check permissions. This patch adds _noperm versions of these functions. Note that do_mq_open() was only calling mntget() so it could call path_put() - it didn't really need an extra reference on the mnt. Now it doesn't call mntget() and uses end_creating() which does the dput() half of path_put(). Also mq_unlink() previously passed d_inode(dentry->d_parent) as the dir inode to vfs_unlink(). This is after locking d_inode(mnt->mnt_root) These two inodes are the same, but normally calls use the textual parent. So I've changes the vfs_unlink() call to be given d_inode(mnt->mnt_root). Reviewed-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: NeilBrown <neil@brown.name> -- changes since v2: - dir arg passed to vfs_unlink() in mq_unlink() changed to match the dir passed to lookup_noperm() - restore assignment to path->mnt even though the mntget() is removed. Link: https://patch.msgid.link/20251113002050.676694-7-neilb@ownmail.net Tested-by: syzbot@syzkaller.appspotmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-14VFS/nfsd/cachefiles/ovl: introduce start_removing() and end_removing()NeilBrown
start_removing() is similar to start_creating() but will only return a positive dentry with the expectation that it will be removed. This is used by nfsd, cachefiles, and overlayfs. They are changed to also use end_removing() to terminate the action begun by start_removing(). This is a simple alias for end_dirop(). Apart from changes to the error paths, as we no longer need to unlock on a lookup error, an effect on callers is that they don't need to test if the found dentry is positive or negative - they can be sure it is positive. Reviewed-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: NeilBrown <neil@brown.name> Link: https://patch.msgid.link/20251113002050.676694-6-neilb@ownmail.net Tested-by: syzbot@syzkaller.appspotmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-14VFS/nfsd/cachefiles/ovl: add start_creating() and end_creating()NeilBrown
start_creating() is similar to simple_start_creating() but is not so simple. It takes a qstr for the name, includes permission checking, and does NOT report an error if the name already exists, returning a positive dentry instead. This is currently used by nfsd, cachefiles, and overlayfs. end_creating() is called after the dentry has been used. end_creating() drops the reference to the dentry as it is generally no longer needed. This is exactly the first section of end_creating_path() so that function is changed to call the new end_creating() These calls help encapsulate locking rules so that directory locking can be changed. Occasionally this change means that the parent lock is held for a shorter period of time, for example in cachefiles_commit_tmpfile(). As this function now unlocks after an unlink and before the following lookup, it is possible that the lookup could again find a positive dentry, so a while loop is introduced there. In overlayfs the ovl_lookup_temp() function has ovl_tempname() split out to be used in ovl_start_creating_temp(). The other use of ovl_lookup_temp() is preparing for a rename. When rename handling is updated, ovl_lookup_temp() will be removed. Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: NeilBrown <neil@brown.name> Link: https://patch.msgid.link/20251113002050.676694-5-neilb@ownmail.net Tested-by: syzbot@syzkaller.appspotmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-14VFS: introduce start_dirop() and end_dirop()NeilBrown
The fact that directory operations (create,remove,rename) are protected by a lock on the parent is known widely throughout the kernel. In order to change this - to instead lock the target dentry - it is best to centralise this knowledge so it can be changed in one place. This patch introduces start_dirop() which is local to VFS code. It performs the required locking for create and remove. Rename will be handled separately. Various functions with names like start_creating() or start_removing_path(), some of which already exist, will export this functionality beyond the VFS. end_dirop() is the partner of start_dirop(). It drops the lock and releases the reference on the dentry. It *is* exported so that various end_creating etc functions can be inline. As vfs_mkdir() drops the dentry on error we cannot use end_dirop() as that won't unlock when the dentry IS_ERR(). For now we need an explicit unlock when dentry IS_ERR(). I hope to change vfs_mkdir() to unlock when it drops a dentry so that explicit unlock can go away. end_dirop() can always be called on the result of start_dirop(), but not after vfs_mkdir(). After a vfs_mkdir() we still may need the explicit unlock as seen in end_creating_path(). As well as adding start_dirop() and end_dirop() this patch uses them in: - simple_start_creating (which requires sharing lookup_noperm_common() with libfs.c) - start_removing_path / start_removing_user_path_at - filename_create / end_creating_path() - do_rmdir(), do_unlinkat() Reviewed-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: NeilBrown <neil@brown.name> Link: https://patch.msgid.link/20251113002050.676694-3-neilb@ownmail.net Tested-by: syzbot@syzkaller.appspotmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-14nsproxy: fix free_nsproxy() and simplify create_new_namespaces()Christian Brauner
Make it possible to handle NULL being passed to the reference count helpers instead of forcing the caller to handle this. Afterwards we can nicely allow a cleanup guard to handle nsproxy freeing. Active reference count handling is not done in nsproxy_free() but rather in free_nsproxy() as nsproxy_free() is also called from setns() failure paths where a new nsproxy has been prepared but has not been marked as active via switch_task_namespaces(). Link: https://lore.kernel.org/690bfb9e.050a0220.2e3c35.0013.GAE@google.com Link: https://patch.msgid.link/20251111-sakralbau-guthaben-7dcc277d337f@brauner Fixes: 3c9820d5c64a ("ns: add active reference count") Reported-by: syzbot+0b2e79f91ff6579bfa5b@syzkaller.appspotmail.com Reported-by: syzbot+0a8655a80e189278487e@syzkaller.appspotmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-14block-dma: properly take MMIO pathLeon Romanovsky
In commit eadaa8b255f3 ("dma-mapping: introduce new DMA attribute to indicate MMIO memory"), DMA_ATTR_MMIO attribute was added to describe MMIO addresses, which require to avoid any memory cache flushing, as an outcome of the discussion pointed in Link tag below. In case of PCI_P2PDMA_MAP_THRU_HOST_BRIDGE transfer, blk-mq-dm logic treated this as regular page and relied on "struct page" DMA flow. That flow performs CPU cache flushing, which shouldn't be done here, and doesn't set IOMMU_MMIO flag in DMA-IOMMU case. As a solution, let's encode peer-to-peer transaction type in NVMe IOD flags variable and provide it to blk-mq-dma API. Link: https://lore.kernel.org/all/f912c446-1ae9-4390-9c11-00dce7bf0fd3@arm.com/ Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-11-14Merge branch 'pwm/th1520' into pwm/for-nextUwe Kleine-König