summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
2025-11-19net/mlx5: Move SF dev table notifier registration outside the PF devlink lockCosmin Ratiu
This completes the previous patches by moving notifier registration for SF dev tables outside the devlink locked critical section in mlx5_init_one() / mlx5_uninit_one() and into the mlx5_mdev_init() / mlx5_mdev_uninit() functions. This is only done for non-SFs, since SFs do not have a SF HW table themselves. After this patch, notifiers can grab the PF devlink lock (soon to be necessary) without creating a locking cycle. Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1763325940-1231508-7-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-19net/mlx5: Move the SF table notifiers outside the devlink lockCosmin Ratiu
Move the SF table notifiers registration/unregistration outside of mlx5_init_one() / mlx5_uninit_one() and into the mlx5_mdev_init() / mlx5_mdev_uninit() functions. This is only done for non-SFs, since SFs do not have a SF table themselves and thus don't need notifiers. Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1763325940-1231508-6-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-19net/mlx5: Move the SF HW table notifier outside the devlink lockCosmin Ratiu
Move the SF HW table notifier registration/unregistration outside of mlx5_init_one() / mlx5_uninit_one() and into the mlx5_mdev_init() / mlx5_mdev_uninit() functions. This is only done for non-SFs, since SFs do not have a SF HW table themselves. Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1763325940-1231508-5-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-19net/mlx5: Move the vhca event notifier outside of the devlink lockCosmin Ratiu
The vhca event notifier consists of an atomic notifier for vhca state changes (used for SF events), multiple workqueues and a blocking notifier chain for delivering the vhca state change events for further processing. This patch moves the vhca notifier head outside of mlx5_init_one() / mlx5_uninit_one() and into the mlx5_mdev_init() / mlx5_mdev_uninit() functions. This allows called notifiers to grab the PF devlink lock which was previously impossible because it would create a circular lock dependency. mlx5_vhca_event_stop() is now called earlier in the cleanup phase and flushes the workqueues to ensure that after the call, there are no pending events. This simplifies the cleanup flow for vhca event consumers. Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1763325940-1231508-4-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-19net/mlx5: Move the esw mode notifier chain outside the devlink lockCosmin Ratiu
The esw mode change notifier chain is initialized/cleaned up in mlx5_init_one() / mlx5_uninit_one() with the devlink lock held. Move the notifier head from the eswitch struct into mlx5_priv directly, and initialize it outside the critical section. This will allow notifier registration to happen earlier in the init procedure in subsequent patches. Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1763325940-1231508-3-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-19arm_mpam: Probe hardware to find the supported partid/pmg valuesJames Morse
CPUs can generate traffic with a range of PARTID and PMG values, but each MSC may also have its own maximum size for these fields. Before MPAM can be used, the driver needs to probe each RIS on each MSC, to find the system-wide smallest value that can be used. The limits from requestors (e.g. CPUs) also need taking into account. While doing this, RIS entries that firmware didn't describe are created under MPAM_CLASS_UNKNOWN. This adds the low level MSC write accessors. While we're here, implement the mpam_register_requestor() call for the arch code to register the CPU limits. Future callers of this will tell us about the SMMU and ITS. Signed-off-by: James Morse <james.morse@arm.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Ben Horgan <ben.horgan@arm.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Carl Worth <carl@os.amperecomputing.com> Tested-by: Gavin Shan <gshan@redhat.com> Tested-by: Zeng Heng <zengheng4@huawei.com> Tested-by: Hanjun Guo <guohanjun@huawei.com> Signed-off-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-19arm_mpam: Add the class and component structures for firmware described risJames Morse
An MSC is a container of resources, each identified by their RIS index. Some RIS are described by firmware to provide their position in the system. Others are discovered when the driver probes the hardware. To configure a resource it needs to be found by its class, e.g. 'L2'. There are two kinds of grouping, a class is a set of components, which are visible to user-space as there are likely to be multiple instances of the L2 cache. (e.g. one per cluster or package) Add support for creating and destroying structures to allow a hierarchy of resources to be created. Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Carl Worth <carl@os.amperecomputing.com> Tested-by: Gavin Shan <gshan@redhat.com> Tested-by: Zeng Heng <zengheng4@huawei.com> Tested-by: Hanjun Guo <guohanjun@huawei.com> Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-19ACPI / MPAM: Parse the MPAM tableJames Morse
Add code to parse the arm64 specific MPAM table, looking up the cache level from the PPTT and feeding the end result into the MPAM driver. This happens in two stages. Platform devices are created first for the MSC devices. Once the driver probes it calls acpi_mpam_parse_resources() to discover the RIS entries the MSC contains. For now the MPAM hook mpam_ris_create() is stubbed out, but will update the MPAM driver with optional discovered data about the RIS entries. CC: Carl Worth <carl@os.amperecomputing.com> Link: https://developer.arm.com/documentation/den0065/3-0bet/?lang=en Reviewed-by: Lorenzo Pieralisi <lpieralisi@kernel.org> Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Carl Worth <carl@os.amperecomputing.com> Tested-by: Gavin Shan <gshan@redhat.com> Tested-by: Zeng Heng <zengheng4@huawei.com> Tested-by: Hanjun Guo <guohanjun@huawei.com> Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-19ACPI: Define acpi_put_table cleanup handler and acpi_get_table_pointer() helperBen Horgan
Define a cleanup helper for use with __free to release the acpi table when the pointer goes out of scope. Also, introduce the helper acpi_get_table_pointer() to simplify a commonly used pattern involving acpi_get_table(). These are first used in a subsequent commit. Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Carl Worth <carl@os.amperecomputing.com> Tested-by: Gavin Shan <gshan@redhat.com> Tested-by: Zeng Heng <zengheng4@huawei.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Hanjun Guo <guohanjun@huawei.com> Signed-off-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-19platform: Define platform_device_put cleanup handlerBen Horgan
Define a cleanup helper for use with __free to destroy platform devices automatically when the pointer goes out of scope. This is only intended to be used in error cases and so should be used with return_ptr() or no_free_ptr() directly to avoid the automatic destruction on success. A first use of this is introduced in a subsequent commit. Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Carl Worth <carl@os.amperecomputing.com> Tested-by: Gavin Shan <gshan@redhat.com> Tested-by: Zeng Heng <zengheng4@huawei.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Hanjun Guo <guohanjun@huawei.com> Signed-off-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-19ACPI / PPTT: Add a helper to fill a cpumask from a cache_idJames Morse
MPAM identifies CPUs by the cache_id in the PPTT cache structure. The driver needs to know which CPUs are associated with the cache. The CPUs may not all be online, so cacheinfo does not have the information. Add a helper to pull this information out of the PPTT. CC: Rohit Mathew <Rohit.Mathew@arm.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Jeremy Linton <jeremy.linton@arm.com> Tested-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Carl Worth <carl@os.amperecomputing.com> Tested-by: Gavin Shan <gshan@redhat.com> Tested-by: Zeng Heng <zengheng4@huawei.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Hanjun Guo <guohanjun@huawei.com> Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-19ACPI / PPTT: Find cache level by cache-idJames Morse
The MPAM table identifies caches by id. The MPAM driver also wants to know the cache level to determine if the platform is of the shape that can be managed via resctrl. Cacheinfo has this information, but only for CPUs that are online. Waiting for all CPUs to come online is a problem for platforms where CPUs are brought online late by user-space. Add a helper that walks every possible cache, until it finds the one identified by cache-id, then return the level. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Ben Horgan <ben.horgan@arm.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Jeremy Linton <jeremy.linton@arm.com> Tested-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Carl Worth <carl@os.amperecomputing.com> Tested-by: Gavin Shan <gshan@redhat.com> Tested-by: Zeng Heng <zengheng4@huawei.com> Tested-by: Hanjun Guo <guohanjun@huawei.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-19ACPI / PPTT: Add a helper to fill a cpumask from a processor containerJames Morse
The ACPI MPAM table uses the UID of a processor container specified in the PPTT to indicate the subset of CPUs and cache topology that can access each MPAM System Component (MSC). This information is not directly useful to the kernel. The equivalent cpumask is needed instead. Add a helper to find the processor container by its id, then walk the possible CPUs to fill a cpumask with the CPUs that have this processor container as a parent. CC: Dave Martin <dave.martin@arm.com> Reviewed-by: Sudeep Holla <sudeep.holla@arm.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Reviewed-by: Hanjun Guo <guohanjun@huawei.com> Reviewed-by: Jeremy Linton <jeremy.linton@arm.com> Tested-by: Fenghua Yu <fenghuay@nvidia.com> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com> Tested-by: Peter Newman <peternewman@google.com> Tested-by: Carl Worth <carl@os.amperecomputing.com> Tested-by: Gavin Shan <gshan@redhat.com> Tested-by: Zeng Heng <zengheng4@huawei.com> Tested-by: Hanjun Guo <guohanjun@huawei.com> Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-19smp: Introduce a helper function to check for pending IPIsUlf Hansson
When governors used during cpuidle try to find the most optimal idle state for a CPU or a group of CPUs, they are known to quite often fail. One reason for this is, that they are not taking into account whether there has been an IPI scheduled for any of the CPUs that are affected by the selected idle state. To enable pending IPIs to be taken into account for cpuidle decisions, introduce a new helper function, cpus_peek_for_pending_ipi(). Suggested-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2025-11-19usb: chipidea: ci_hdrc_imx: Set out of band wakeup for i.MX95Peng Fan
i.MX95 USB2 inside HSIOMIX could still wakeup Linux, even if HSIOMIX power domain(Digital logic) is off. There is still always on logic have the wakeup capability which is out band wakeup capbility. So use device_set_out_band_wakeup for i.MX95 to make sure usb2 could wakeup system even if HSIOMIX power domain is in off state. Tested-by: Xu Yang <xu.yang_2@nxp.com> Reviewed-by: Xu Yang <xu.yang_2@nxp.com> Signed-off-by: Peng Fan <peng.fan@nxp.com> Acked-by: Peter Chen <peter.chen@kernel.org> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2025-11-19PM: wakeup: Add out-of-band system wakeup support for devicesPeng Fan
Some devices can wake up the system from suspend even when their power domains are turned off. This is possible because their system-wakeup logic resides in an always-on power domain - indicating that they support out-of-band system wakeup. Currently, PM domain core doesn't power off such devices if they are marked as system wakeup sources. To better represent devices with out-of-band wakeup capability, this patch introduces a new flag out_band_wakeup in 'struct dev_pm_info'. Two helper APIs are added: - device_set_out_band_wakeup() - to mark a device as having out-of-band wakeup capability. - device_out_band_wakeup() - to query the flag. Allow the PM core and drivers to distinguish between regular and out-of-band wakeup sources, enable more accurate power management decision. Signed-off-by: Peng Fan <peng.fan@nxp.com> Reviewed-by: Dhruva Gole <d-gole@ti.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2025-11-19mm: add spurious fault fixing support for huge pmdHuang Ying
The page faults may be spurious because of the racy access to the page table. For example, a non-populated virtual page is accessed on 2 CPUs simultaneously, thus the page faults are triggered on both CPUs. However, it's possible that one CPU (say CPU A) cannot find the reason for the page fault if the other CPU (say CPU B) has changed the page table before the PTE is checked on CPU A. Most of the time, the spurious page faults can be ignored safely. However, if the page fault is for the write access, it's possible that a stale read-only TLB entry exists in the local CPU and needs to be flushed on some architectures. This is called the spurious page fault fixing. In the current kernel, there is spurious fault fixing support for pte, but not for huge pmd because no architectures need it. But in the next patch in the series, we will change the write protection fault handling logic on arm64, so that some stale huge pmd entries may remain in the TLB. These entries need to be flushed via the huge pmd spurious fault fixing mechanism. Signed-off-by: Huang Ying <ying.huang@linux.alibaba.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Zi Yan <ziy@nvidia.com> Cc: Will Deacon <will@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Yang Shi <yang@os.amperecomputing.com> Cc: Christoph Lameter (Ampere) <cl@gentwo.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Barry Song <baohua@kernel.org> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Kevin Brodsky <kevin.brodsky@arm.com> Cc: Yin Fengwei <fengwei_yin@linux.alibaba.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Cc: linux-mm@kvack.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-19ima: Access decompressed kernel module to verify appended signatureCoiby Xu
Currently, when in-kernel module decompression (CONFIG_MODULE_DECOMPRESS) is enabled, IMA has no way to verify the appended module signature as it can't decompress the module. Define a new kernel_read_file_id enumerate READING_MODULE_COMPRESSED so IMA can calculate the compressed kernel module data hash on READING_MODULE_COMPRESSED and defer appraising/measuring it until on READING_MODULE when the module has been decompressed. Before enabling in-kernel module decompression, a kernel module in initramfs can still be loaded with ima_policy=secure_boot. So adjust the kernel module rule in secure_boot policy to allow either an IMA signature OR an appended signature i.e. to use "appraise func=MODULE_CHECK appraise_type=imasig|modsig". Reported-by: Karel Srot <ksrot@redhat.com> Suggested-by: Mimi Zohar <zohar@linux.ibm.com> Suggested-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Coiby Xu <coxu@redhat.com> Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
2025-11-19platform/x86/intel: Introduce Intel Elkhart Lake PSE I/ORaag Jadav
Intel Elkhart Lake Programmable Service Engine (PSE) includes two PCI devices that expose two different capabilities of GPIO and Timed I/O as a single PCI function through shared MMIO with below layout. GPIO: 0x0000 - 0x1000 TIO: 0x1000 - 0x2000 This driver enumerates the PCI parent device and creates auxiliary child devices for these capabilities. The actual functionalities are provided by their respective auxiliary drivers. Signed-off-by: Raag Jadav <raag.jadav@intel.com> Acked-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Link: https://lore.kernel.org/r/20251112034040.457801-2-raag.jadav@intel.com Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
2025-11-20devres: Move devm_alloc_percpu() and related to devres.hAndy Shevchenko
Move devm_alloc_percpu() and related to devres.h where it belongs. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://patch.msgid.link/20251111145046.997309-3-andriy.shevchenko@linux.intel.com [ Fix minor typo in commit message. - Danilo ] Signed-off-by: Danilo Krummrich <dakr@kernel.org>
2025-11-19autofs: dont trigger mount if it cant succeedIan Kent
If a mount namespace contains autofs mounts, and they are propagation private, and there is no namespace specific automount daemon to handle possible automounting then attempted path resolution will loop until MAXSYMLINKS is reached before failing causing quite a bit of noise in the log. Add a check for this in autofs ->d_automount() so that the VFS can immediately return an error in this case. Since the mount is propagation private an EPERM return seems most appropriate. Suggested by: Christian Brauner <brauner@kernel.org> Signed-off-by: Ian Kent <raven@themaw.net> Link: https://patch.msgid.link/20251118024631.10854-2-raven@themaw.net Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-18net/mlx5: Abort new commands if all command slots are stalledSaeed Mahameed
In case of a FW issue, FW might be not responding to FW commands, causing kernel lockout for a long period of time, e.g. rtnl_lock held while ethtool is trying to collect stats waiting for FW to respond to multiple commands, when all of them will timeout. While there's no immediate indication of the FW lockout, we can safely assume that something is wrong when all command slots are busy and in a timeout state and no FW completion was received on any of them. In such case, start immediately failing new commands. Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1763415729-1238421-5-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-18bpf: Replace bpf memory allocator with kmalloc_nolock() in local storageAmery Hung
Replace bpf memory allocator with kmalloc_nolock() to reduce memory wastage due to preallocation. In bpf_selem_free(), an selem now needs to wait for a RCU grace period before being freed when reuse_now == true. Therefore, rcu_barrier() should be always be called in bpf_local_storage_map_free(). In bpf_local_storage_free(), since smap->storage_ma is no longer needed to return the memory, the function is now independent from smap. Remove the outdated comment in bpf_local_storage_alloc(). We already free selem after an RCU grace period in bpf_local_storage_update() when bpf_local_storage_alloc() failed the cmpxchg since commit c0d63f309186 ("bpf: Add bpf_selem_free()"). Signed-off-by: Amery Hung <ameryhung@gmail.com> Reviewed-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20251114201329.3275875-5-ameryhung@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-18bpf: Save memory alloction info in bpf_local_storageAmery Hung
Save the memory allocation method used for bpf_local_storage in the struct explicitly so that we don't need to go through the hassle to find out the info. When a later patch replaces BPF memory allocator with kmalloc_noloc(), bpf_local_storage_free() will no longer need smap->storage_ma to return the memory and completely remove the dependency on smap in bpf_local_storage_free(). Signed-off-by: Amery Hung <ameryhung@gmail.com> Reviewed-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20251114201329.3275875-4-ameryhung@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-18bpf: Remove smap argument from bpf_selem_free()Amery Hung
Since selem already saves a pointer to smap, use it instead of an additional argument in bpf_selem_free(). This requires moving the SDATA(selem)->smap assignment from bpf_selem_link_map() to bpf_selem_alloc() since bpf_selem_free() may be called without the selem being linked to smap in bpf_local_storage_update(). Signed-off-by: Amery Hung <ameryhung@gmail.com> Reviewed-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20251114201329.3275875-3-ameryhung@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-18bpf: Always charge/uncharge memory when allocating/unlinking storage elementsAmery Hung
Since commit a96a44aba556 ("bpf: bpf_sk_storage: Fix invalid wait context lockdep report"), {charge,uncharge}_mem are always true when allocating a bpf_local_storage_elem or unlinking a bpf_local_storage_elem from local storage, so drop these arguments. No functional change. Signed-off-by: Amery Hung <ameryhung@gmail.com> Reviewed-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20251114201329.3275875-2-ameryhung@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-18block: Remove queue freezing from several sysfs store callbacksBart Van Assche
Freezing the request queue from inside sysfs store callbacks may cause a deadlock in combination with the dm-multipath driver and the queue_if_no_path option. Additionally, freezing the request queue slows down system boot on systems where sysfs attributes are set synchronously. Fix this by removing the blk_mq_freeze_queue() / blk_mq_unfreeze_queue() calls from the store callbacks that do not strictly need these callbacks. Add the __data_racy annotation to request_queue.rq_timeout to suppress KCSAN data race reports about the rq_timeout reads. This patch may cause a small delay in applying the new settings. For all the attributes affected by this patch, I/O will complete correctly whether the old or the new value of the attribute is used. This patch affects the following sysfs attributes: * io_poll_delay * io_timeout * nomerges * read_ahead_kb * rq_affinity Here is an example of a deadlock triggered by running test srp/002 if this patch is not applied: task:multipathd Call Trace: <TASK> __schedule+0x8c1/0x1bf0 schedule+0xdd/0x270 schedule_preempt_disabled+0x1c/0x30 __mutex_lock+0xb89/0x1650 mutex_lock_nested+0x1f/0x30 dm_table_set_restrictions+0x823/0xdf0 __bind+0x166/0x590 dm_swap_table+0x2a7/0x490 do_resume+0x1b1/0x610 dev_suspend+0x55/0x1a0 ctl_ioctl+0x3a5/0x7e0 dm_ctl_ioctl+0x12/0x20 __x64_sys_ioctl+0x127/0x1a0 x64_sys_call+0xe2b/0x17d0 do_syscall_64+0x96/0x3a0 entry_SYSCALL_64_after_hwframe+0x4b/0x53 </TASK> task:(udev-worker) Call Trace: <TASK> __schedule+0x8c1/0x1bf0 schedule+0xdd/0x270 blk_mq_freeze_queue_wait+0xf2/0x140 blk_mq_freeze_queue_nomemsave+0x23/0x30 queue_ra_store+0x14e/0x290 queue_attr_store+0x23e/0x2c0 sysfs_kf_write+0xde/0x140 kernfs_fop_write_iter+0x3b2/0x630 vfs_write+0x4fd/0x1390 ksys_write+0xfd/0x230 __x64_sys_write+0x76/0xc0 x64_sys_call+0x276/0x17d0 do_syscall_64+0x96/0x3a0 entry_SYSCALL_64_after_hwframe+0x4b/0x53 </TASK> Cc: Christoph Hellwig <hch@lst.de> Cc: Ming Lei <ming.lei@redhat.com> Cc: Nilay Shroff <nilay@linux.ibm.com> Cc: Martin Wilck <mwilck@suse.com> Cc: Benjamin Marzinski <bmarzins@redhat.com> Cc: stable@vger.kernel.org Fixes: af2814149883 ("block: freeze the queue in queue_attr_store") Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Nilay Shroff <nilay@linux.ibm.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-11-18fs: Add the __data_racy annotation to backing_dev_info.ra_pagesBart Van Assche
Some but not all .ra_pages changes happen while block layer I/O is paused with blk_mq_freeze_queue(). Filesystems may read .ra_pages even while block layer I/O is paused, e.g. from inside their .fadvise callback. Annotating all .ra_pages reads with READ_ONCE() would be cumbersome. Hence, add the __data_racy annotatation to the .ra_pages member variable. Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Nilay Shroff <nilay@linux.ibm.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Nilay Shroff <nilay@linux.ibm.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-11-19devres: Remove unused devm_free_percpu()Andy Shevchenko
Remove unused devm_free_percpu(). By the way, it was never used in the drivers/ from day 1. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://patch.msgid.link/20251111145046.997309-2-andriy.shevchenko@linux.intel.com Signed-off-by: Danilo Krummrich <dakr@kernel.org>
2025-11-18efi/libstub: gop: Add support for reading EDIDThomas Zimmermann
Add support for EFI_EDID_DISCOVERED_PROTOCOL and EFI_EDID_ACTIVE_PROTOCOL as defined in UEFI 2.8, sec 12.9. Define GUIDs and data structures in the rsp header files. In the GOP setup function, read the EDID of the primary GOP device. First try EFI_EDID_ACTIVE_PROTOCOL, which supports user-specified EDID data. Or else try EFI_EDID_DISCOVERED_PROTOCOL, which returns the display device's native EDID. If no EDID could be retrieved, clear the storage. Rename efi_setup_gop() to efi_setup_graphics() to reflect the changes Let callers pass an optional instance of struct edid_data, if they are interested. While screen_info and edid_info come from the same device handle, they should be considered indendent data. The former refers to the graphics mode, the latter refers to the display device. GOP devices might not provide both. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Javier Martinez Canillas <javierm@redhat.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2025-11-18efi: Fix trailing whitespace in header fileThomas Zimmermann
Resolve an issue with the coding style. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Javier Martinez Canillas <javierm@redhat.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2025-11-18regulator: pca9450: Add support for setting debounce settingsMartijn de Gouw
Make the different debounce timers configurable from the devicetree. Depending on the board design, these have to be set different than the default register values. Signed-off-by: Martijn de Gouw <martijn.de.gouw@prodrive-technologies.com> Link: https://patch.msgid.link/20251117202215.1936139-2-martijn.de.gouw@prodrive-technologies.com Signed-off-by: Mark Brown <broonie@kernel.org>
2025-11-18PCI: Add .assert_perst() to control PCIe PERST#Krishna Chaitanya Chundru
Controller driver probes first, enables link training and scans the bus. When the PCI bridge is found, its child DT nodes will be scanned and pwrctrl devices will be created if needed. By the time pwrctrl driver probe gets called, link training is already enabled by controller driver. Certain devices like TC9563, which uses the PCI pwrctl framework, need to configure the device before the PCIe link is up. As the controller driver already enables link training as part of its probe, the moment device is powered on, controller and device participate in link training and link can come up immediately and may not have time to configure the device. So we need to stop the link training by using assert_perst() by asserting PERST# and de-asserting PERST# after device is configured. Signed-off-by: Krishna Chaitanya Chundru <krishna.chundru@oss.qualcomm.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Manivannan Sadhasivam <mani@kernel.org> Link: https://patch.msgid.link/20251101-tc9563-v9-2-de3429f7787a@oss.qualcomm.com
2025-11-18phy: add new phy_notify_state() apiPeter Griffin
Add a new phy_notify_state() api that notifies and configures a phy for a given state transition. This is intended to be used by phy drivers which need to do some runtime configuration of parameters that can't be handled by phy_calibrate() or phy_power_{on|off}(). The first usage of this API is in the Samsung UFS phy that needs to issue some register writes when entering and exiting the hibernate link state. Signed-off-by: Peter Griffin <peter.griffin@linaro.org> Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org> Link: https://patch.msgid.link/20251112-phy-notify-pmstate-v5-1-39df622d8fcb@linaro.org Signed-off-by: Vinod Koul <vkoul@kernel.org>
2025-11-18reset: fix BIT macro referenceEncrow Thorne
RESET_CONTROL_FLAGS_BIT_* macros use BIT(), but reset.h does not include bits.h. This causes compilation errors when including reset.h standalone. Include bits.h to make reset.h self-contained. Suggested-by: Troy Mitchell <troy.mitchell@linux.dev> Reviewed-by: Troy Mitchell <troy.mitchell@linux.dev> Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de> Signed-off-by: Encrow Thorne <jyc0019@gmail.com> Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
2025-11-18reset: remove legacy reset lookup codeBartosz Golaszewski
There are no more users of this code. Let's remove the exported symbols and the implementation from reset core. Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de> Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> [p.zabel@pengutronix.de: folded in 8e6ec20e-8965-4b42-99fc-0462269ff2f1@paulmck-laptop] Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
2025-11-18mm/huge_memory: Fix initialization of huge zero folioLinus Torvalds
The recent fix to properly initialize the tags of the huge zero folio had an unfortunate not-so-subtle side effect: it caused the actual *contents* of the huge zero folio to not be initialized at all when the hardware didn't support the memory tagging. The reason was the unfortunate semantics of tag_clear_highpage(): on hardware that didn't do the tagging, it would silently just not do anything at all. And since this is done only on arm64 with MTE support, that basically meant most hardware. It wasn't necessarily immediately obvious since the huge zero page isn't necessarily very heavily used - or because it might already be zero because all-zeroes is the most common pattern. But it ends up causing random odd user space failures when you do hit it. The unfortunate semantics have been around for a while, but became a real bug only when we started actively using __GFP_ZEROTAGS in the generic get_huge_zero_folio() function - before that, it had only ever been used in code that checked that the hardware supported it. Fix this by simply changing the semantics of tag_clear_highpage() to return whether it actually successfully did something or not. While at it, also make it initialize multiple pages in one go, since that's actually what the only caller wants it to do and it simplifies the whole logic. Fixes: adfb6609c680 ("mm/huge_memory: initialise the tags of the huge zero folio") Link: https://lore.kernel.org/all/20251117082023.90176-1-00107082@163.com/ Reviewed-by: David Hildenbrand (Red Hat) <david@kernel.org> Reported-and-tested-by: David Wang <00107082@163.com> Reported-and-tested-by: Carlos Llamas <cmllamas@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-11-18rseq: Delete duplicate if statement in rseq_virt_userspace_exit()Dan Carpenter
This if statement is indented weirdly. It's a duplicate and doesn't affect runtime (beyond wasting a little time). Delete it. Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://patch.msgid.link/aRxP3YcwscrP1BU_@stanley.mountain
2025-11-18ASoC: Intel: avs: Allow for NHLT configurationMark Brown
Merge series from Cezary Rojewski <cezary.rojewski@intel.com> From AudioDSP perspective, only gateway-related modules e.g.: Copier: Small set of changes providing new feature which the driver is already utilizing on the market - for its Long-Term-Support (LTS) devices. The goal is to cover systems which shipped with invalid Non HDAudio Link Table (NHLT) - not just the descriptors (headers), but cases where the hardware configuration is invalid too. The table is part of the ACPI tree and forcing BIOS updates is not a feasible solution. With the override, the topology file can carry the hardware configuration instead. From AudioDSP perspective, only gateway-related modules e.g.: Copier care about the procedure. To ensure correct order of operations when initializing such modules, the overrides take precedence over what's currently there in the NHLT.
2025-11-18platform/x86: wmi: Remove extern keyword from prototypesArmin Wolf
The external function definitions do not need the "extern" keyword. Remove it to silence the associated checkpatch warnings. Signed-off-by: Armin Wolf <W_Armin@gmx.de> Link: https://patch.msgid.link/20251111131125.3379-4-W_Armin@gmx.de Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2025-11-18platform/x86: asus-armoury: add ppt_* and nv_* tuning knobsLuke D. Jones
Adds the ppt_* and nv_* tuning knobs that are available via WMI methods and adds proper min/max levels plus defaults. The min/max are defined by ASUS and typically gained by looking at what they allow in the ASUS Armoury Crate application - ASUS does not share the values outside of this. It could also be possible to gain the AMD values by use of ryzenadj and testing for the minimum stable value. The general rule of thumb for adding to the match table is that if the model range has a single CPU used throughout, then the DMI match can omit the last letter of the model number as this is the GPU model. If a min or max value is not provided it is assumed that the particular setting is not supported. for example ppt_pl2_sppt_min/max is not set. If a <ppt_setting>_def is not set then the default is assumed to be <ppt_setting>_max It is assumed that at least AC settings are available so that the firmware attributes will be created - if no DC table is available and power is on DC, then reading the attributes is -ENODEV. Co-developed-by: Denis Benato <denis.benato@linux.dev> Signed-off-by: Denis Benato <denis.benato@linux.dev> Signed-off-by: Luke D. Jones <luke@ljones.dev> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Tested-by: Mateusz Schyboll <dragonn@op.pl> Tested-by: Porfet Lillian <porfet828@gmail.com> Link: https://patch.msgid.link/20251102215319.3126879-10-denis.benato@linux.dev Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2025-11-18platform/x86: asus-wmi: rename ASUS_WMI_DEVID_PPT_FPPTDenis Benato
Maintain power-related WMI macros naming consistency: rename ASUS_WMI_DEVID_PPT_FPPT to ASUS_WMI_DEVID_PPT_PL3_FPPT. Link: https://lore.kernel.org/all/cad7b458-5a7a-4975-94a1-d0c74f6f3de5@oracle.com/ Suggested-by: ALOK TIWARI <alok.a.tiwari@oracle.com> Signed-off-by: Denis Benato <denis.benato@linux.dev> Link: https://.../ Link: https://patch.msgid.link/20251102215319.3126879-9-denis.benato@linux.dev Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2025-11-18platform/x86: asus-armoury: add screen auto-brightness toggleLuke D. Jones
Add screen_auto_brightness toggle supported on some laptops. Signed-off-by: Denis Benato <denis.benato@linux.dev> Signed-off-by: Luke D. Jones <luke@ljones.dev> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Link: https://patch.msgid.link/20251102215319.3126879-7-denis.benato@linux.dev Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2025-11-18platform/x86: asus-armoury: add apu-mem control supportLuke D. Jones
Implement the APU memory size control under the asus-armoury module using the fw_attributes class. This allows the APU allocated memory size to be adjusted depending on the users priority. A reboot is required after change. Co-developed-by: Denis Benato <denis.benato@linux.dev> Signed-off-by: Denis Benato <denis.benato@linux.dev> Signed-off-by: Luke D. Jones <luke@ljones.dev> Link: https://patch.msgid.link/20251102215319.3126879-5-denis.benato@linux.dev Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2025-11-17bpf: Fix invalid prog->stats access when update_effective_progs failsPu Lehui
Syzkaller triggers an invalid memory access issue following fault injection in update_effective_progs. The issue can be described as follows: __cgroup_bpf_detach update_effective_progs compute_effective_progs bpf_prog_array_alloc <-- fault inject purge_effective_progs /* change to dummy_bpf_prog */ array->items[index] = &dummy_bpf_prog.prog ---softirq start--- __do_softirq ... __cgroup_bpf_run_filter_skb __bpf_prog_run_save_cb bpf_prog_run stats = this_cpu_ptr(prog->stats) /* invalid memory access */ flags = u64_stats_update_begin_irqsave(&stats->syncp) ---softirq end--- static_branch_dec(&cgroup_bpf_enabled_key[atype]) The reason is that fault injection caused update_effective_progs to fail and then changed the original prog into dummy_bpf_prog.prog in purge_effective_progs. Then a softirq came, and accessing the members of dummy_bpf_prog.prog in the softirq triggers invalid mem access. To fix it, skip updating stats when stats is NULL. Fixes: 492ecee892c2 ("bpf: enable program stats") Signed-off-by: Pu Lehui <pulehui@huawei.com> Link: https://lore.kernel.org/r/20251115102343.2200727-1-pulehui@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-17d_make_discardable(): warn if given a non-persistent dentryAl Viro
At this point there are very few call chains that might lead to d_make_discardable() on a dentry that hadn't been made persistent: calls of simple_unlink() and simple_rmdir() in configfs and apparmorfs. Both filesystems do pin (part of) their contents in dcache, but they are currently playing very unusual games with that. Converting them to more usual patterns might be possible, but it's definitely going to be a long series of changes in both cases. For now the easiest solution is to have both stop using simple_unlink() and simple_rmdir() - that allows to make d_make_discardable() warn when given a non-persistent dentry. Rather than giving them full-blown private copies (with calls of d_make_discardable() replaced with dput()), let's pull the parts of simple_unlink() and simple_rmdir() that deal with timestamps and link counts into separate helpers (__simple_unlink() and __simple_rmdir() resp.) and have those used by configfs and apparmorfs. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-11-17kill securityfs_recursive_remove()Al Viro
it's an unused alias for securityfs_remove() Acked-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-11-17get rid of kill_litter_super()Al Viro
Not used anymore. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-11-17net: phy: Add helper for fixing RGMII PHY mode based on internal mac delayInochi Amaoto
The "phy-mode" property of devicetree indicates whether the PCB has delay now, which means the mac needs to modify the PHY mode based on whether there is an internal delay in the mac. This modification is similar for many ethernet drivers. To simplify code, define the helper phy_fix_phy_mode_for_mac_delays(speed, mac_txid, mac_rxid) to fix PHY mode based on whether mac adds internal delay. Suggested-by: Russell King (Oracle) <linux@armlinux.org.uk> Signed-off-by: Inochi Amaoto <inochiama@gmail.com> Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20251114003805.494387-3-inochiama@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-17memregion: Support fine grained invalidate by cpu_cache_invalidate_memregion()Yicong Yang
Extend cpu_cache_invalidate_memregion() to support invalidating a particular range of memory by introducing start and length parameters. Control of types of invalidation is left for when use cases turn up. For now everything is Clean and Invalidate. Where the range is unknown, use the provided cpu_cache_invalidate_all() helper to act as documentation of intent in a fashion that is clearer than passing (0, -1) to cpu_cache_invalidate_memregion(). Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Acked-by: Davidlohr Bueso <dave@stgolabs.net> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Conor Dooley <conor.dooley@microchip.com>