summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
9 dayscgroup/cpuset: Skip security check for hotplug induced v1 task migrationWaiman Long
When a CPU hot removal causes a v1 cpuset to lose all its CPUs, the cpuset hotplug handler will schedule a work function to migrate tasks in that cpuset with no CPU to its ancestor to enable those tasks to continue running. If a strict security policy is in place, however, the task migration may fail when security_task_setscheduler() call in cpuset_can_attach() returns a -EACCES error. That will mean that those tasks will have no CPU to run on. The system administrators will have to explicitly intervene to either add CPUs to that cpuset or move the tasks elsewhere if they are aware of it. This problem was found by a reported test failure in the LTP's cpuset_hotplug_test.sh. Fix this problem by treating this special case as an exception to skip the setsched security check in cpuset_can_attach() when a v1 cpuset with tasks have no CPU left. With that patch applied, the cpuset_hotplug_test.sh test can be run successfully without failure. Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>
9 dayscgroup/cpuset: Simplify setsched decision check in task iteration loop of ↵Waiman Long
cpuset_can_attach() Centralize the check required to run security_task_setscheduler() in the task iteration loop of cpuset_can_attach() outside of the loop as it has no dependency on the characteristics of the tasks themselves. There is no functional change. Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>
9 daysMerge tag 'fs_for_v7.0-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull udf fix from Jan Kara: "Fix for a race in UDF that can lead to memory corruption" * tag 'fs_for_v7.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: udf: Fix race between file type conversion and writeback mpage: Provide variant of mpage_writepages() with own optional folio handler
9 daysALSA: hda/realtek: add quirk for Acer Swift SFG14-73Zhang Heng
fix mute/micmute LEDs and headset microphone for Acer Swift SFG14-73. Link: https://bugzilla.kernel.org/show_bug.cgi?id=220279 Cc: stable@vger.kernel.org Signed-off-by: Zhang Heng <zhangheng@kylinos.cn> Link: https://patch.msgid.link/20260331094614.186063-1-zhangheng@kylinos.cn Signed-off-by: Takashi Iwai <tiwai@suse.de>
9 daysALSA: hda/realtek: Add quirk for Lenovo Yoga Pro 7 14IMH9Alexander Savenko
The Lenovo Yoga Pro 7 14IMH9 (DMI: 83E2) shares PCI SSID 17aa:3847 with the Legion 7 16ACHG6, but has a different codec subsystem ID (17aa:38cf). The existing SND_PCI_QUIRK for 17aa:3847 applies ALC287_FIXUP_LEGION_16ACHG6, which attempts to initialize an external I2C amplifier (CLSA0100) that is not present on the Yoga Pro 7 14IMH9. As a result, pin 0x17 (bass speakers) is connected to DAC 0x06 which has no volume control, making hardware volume adjustment completely non-functional. Audio is either silent or at maximum volume regardless of the slider position. Add a HDA_CODEC_QUIRK entry using the codec subsystem ID (17aa:38cf) to correctly identify the Yoga Pro 7 14IMH9 and apply ALC287_FIXUP_YOGA9_14IMH9_BASS_SPK_PIN, which redirects pin 0x17 to DAC 0x02 and restores proper volume control. The existing Legion entry is preserved unchanged. This follows the same pattern used for 17aa:386e, where Legion Y9000X and Yoga Pro 7 14ARP8 share a PCI SSID but are distinguished via HDA_CODEC_QUIRK. Link: https://github.com/nomad4tech/lenovo-yoga-pro-7-linux Tested-by: Alexander Savenko <alex.sav4387@gmail.com> Signed-off-by: Alexander Savenko <alex.sav4387@gmail.com> Link: https://patch.msgid.link/20260331082929.44890-1-alex.sav4387@gmail.com Signed-off-by: Takashi Iwai <tiwai@suse.de>
9 daysbridge: mrp: reject zero test interval to avoid OOM panicXiang Mei
br_mrp_start_test() and br_mrp_start_in_test() accept the user-supplied interval value from netlink without validation. When interval is 0, usecs_to_jiffies(0) yields 0, causing the delayed work (br_mrp_test_work_expired / br_mrp_in_test_work_expired) to reschedule itself with zero delay. This creates a tight loop on system_percpu_wq that allocates and transmits MRP test frames at maximum rate, exhausting all system memory and causing a kernel panic via OOM deadlock. The same zero-interval issue applies to br_mrp_start_in_test_parse() for interconnect test frames. Use NLA_POLICY_MIN(NLA_U32, 1) in the nla_policy tables for both IFLA_BRIDGE_MRP_START_TEST_INTERVAL and IFLA_BRIDGE_MRP_START_IN_TEST_INTERVAL, so zero is rejected at the netlink attribute parsing layer before the value ever reaches the workqueue scheduling code. This is consistent with how other bridge subsystems (br_fdb, br_mst) enforce range constraints on netlink attributes. Fixes: 20f6a05ef635 ("bridge: mrp: Rework the MRP netlink interface") Fixes: 7ab1748e4ce6 ("bridge: mrp: Extend MRP netlink interface for configuring MRP interconnect") Reported-by: Weiming Shi <bestswngs@gmail.com> Signed-off-by: Xiang Mei <xmei5@asu.edu> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20260328063000.1845376-1-xmei5@asu.edu Signed-off-by: Paolo Abeni <pabeni@redhat.com>
9 daysASoC: Intel: boards: fix unmet dependency on PINCTRLJulian Braha
This reverts commit c073f0757663 ("ASoC: Intel: sof_sdw: select PINCTRL_CS42L43 and SPI_CS42L43") Currently, SND_SOC_INTEL_SOUNDWIRE_SOF_MACH selects PINCTRL_CS42L43 without also selecting or depending on PINCTRL, despite PINCTRL_CS42L43 depending on PINCTRL. See the following Kbuild warning: WARNING: unmet direct dependencies detected for PINCTRL_CS42L43 Depends on [n]: PINCTRL [=n] && MFD_CS42L43 [=m] Selected by [m]: - SND_SOC_INTEL_SOUNDWIRE_SOF_MACH [=m] && SOUND [=y] && SND [=m] && SND_SOC [=m] && SND_SOC_INTEL_MACH [=y] && (SND_SOC_SOF_INTEL_COMMON [=m] || !SND_SOC_SOF_INTEL_COMMON [=m]) && SND_SOC_SOF_INTEL_SOUNDWIRE [=m] && I2C [=y] && SPI_MASTER [=y] && ACPI [=y] && (MFD_INTEL_LPSS [=n] || COMPILE_TEST [=y]) && (SND_SOC_INTEL_USER_FRIENDLY_LONG_NAMES [=n] || COMPILE_TEST [=y]) && SOUNDWIRE [=m] In response to v1 of this patch [1], Arnd pointed out that there is no compile-time dependency sof_sdw and the PINCTRL_CS42L43 driver. After testing, I can confirm that the kernel compiled with SND_SOC_INTEL_SOUNDWIRE_SOF_MACH enabled and PINCTRL_CS42L43 disabled. This unmet dependency was detected by kconfirm, a static analysis tool for Kconfig. Link: https://lore.kernel.org/all/b8aecc71-1fed-4f52-9f6c-263fbe56d493@app.fastmail.com/ [1] Fixes: c073f0757663 ("ASoC: Intel: sof_sdw: select PINCTRL_CS42L43 and SPI_CS42L43") Signed-off-by: Julian Braha <julianbraha@gmail.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Link: https://patch.msgid.link/20260325001522.1727678-1-julianbraha@gmail.com Signed-off-by: Mark Brown <broonie@kernel.org>
9 daysmisc/mei: INTEL_MEI should depend on X86 or DRM_XEGeert Uytterhoeven
The Intel Management Engine Interface is only present on x86 platforms and Intel Xe graphics cards. Hence add a dependency on X86 or DRM_XE, to prevent asking the user about this driver when configuring a kernel for a non-x86 architecture and without Xe graphics support. Fixes: 25f9b0d35155 ("misc/mei: Allow building Intel ME interface on non-x86") Cc: stable <stable@kernel.org> Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Link: https://patch.msgid.link/8e2646fb71b148b3d38beb13f19b14e3634a1e1a.1769541024.git.geert+renesas@glider.be Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9 daysmei: me: reduce the scope on unexpected resetAlexander Usyskin
After commit 2cedb296988c ("mei: me: trigger link reset if hw ready is unexpected") some devices started to show long resume times (5-7 seconds). This happens as mei falsely detects unready hardware, starts parallel link reset flow and triggers link reset timeouts in the resume callback. Address it by performing detection of unready hardware only when driver is in the MEI_DEV_ENABLED state instead of blacklisting states as done in the original patch. This eliminates active waitqueue check as in MEI_DEV_ENABLED state there will be no active waitqueue. Reviewed-by: Rafael J. Wysocki (Intel) <rafael@kernel.org> Reported-by: Todd Brandt <todd.e.brandt@linux.intel.com> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=221023 Tested-by: Todd Brandt <todd.e.brandt@linux.intel.com> Fixes: 2cedb296988c ("mei: me: trigger link reset if hw ready is unexpected") Cc: stable <stable@kernel.org> Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com> Link: https://patch.msgid.link/20260330083830.536056-1-alexander.usyskin@intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9 daysrust_binder: use AssertSync for BINDER_VM_OPSAlice Ryhl
When declaring an immutable global variable in Rust, the compiler checks that it looks thread safe, because it is generally safe to access said global variable. When using C bindings types for these globals, we don't really want this check, because it is conservative and assumes pointers are not thread safe. In the case of BINDER_VM_OPS, this is a challenge when combined with the patch 'userfaultfd: introduce vm_uffd_ops' [1], which introduces a pointer field to vm_operations_struct. It previously only held function pointers, which are considered thread safe. Rust Binder should not be assuming that vm_operations_struct contains no pointer fields, so to fix this, use AssertSync (which Rust Binder has already declared for another similar global of type struct file_operations with the same problem). This ensures that even if another commit adds a pointer field to vm_operations_struct, this does not cause problems. Fixes: 8ef2c15aeae0 ("rust_binder: check ownership before using vma") Cc: stable <stable@kernel.org> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202603121235.tpnRxFKO-lkp@intel.com/ Link: https://lore.kernel.org/r/20260306171815.3160826-8-rppt@kernel.org [1] Signed-off-by: Alice Ryhl <aliceryhl@google.com> Reviewed-by: Gary Guo <gary@garyguo.net> Link: https://patch.msgid.link/20260314111951.4139029-1-aliceryhl@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9 daysthermal: core: Address thermal zone removal races with resumeRafael J. Wysocki
Since thermal_zone_pm_complete() and thermal_zone_device_resume() re-initialize the poll_queue delayed work for the given thermal zone, the cancel_delayed_work_sync() in thermal_zone_device_unregister() may miss some already running work items and the thermal zone may be freed prematurely [1]. There are two failing scenarios that both start with running thermal_pm_notify_complete() right before invoking thermal_zone_device_unregister() for one of the thermal zones. In the first scenario, there is a work item already running for the given thermal zone when thermal_pm_notify_complete() calls thermal_zone_pm_complete() for that thermal zone and it continues to run when thermal_zone_device_unregister() starts. Since the poll_queue delayed work has been re-initialized by thermal_pm_notify_complete(), the running work item will be missed by the cancel_delayed_work_sync() in thermal_zone_device_unregister() and if it continues to run past the freeing of the thermal zone object, a use-after-free will occur. In the second scenario, thermal_zone_device_resume() queued up by thermal_pm_notify_complete() runs right after the thermal_zone_exit() called by thermal_zone_device_unregister() has returned. The poll_queue delayed work is re-initialized by it before cancel_delayed_work_sync() is called by thermal_zone_device_unregister(), so it may continue to run after the freeing of the thermal zone object, which also leads to a use-after-free. Address the first failing scenario by ensuring that no thermal work items will be running when thermal_pm_notify_complete() is called. For this purpose, first move the cancel_delayed_work() call from thermal_zone_pm_complete() to thermal_zone_pm_prepare() to prevent new work from entering the workqueue going forward. Next, switch over to using a dedicated workqueue for thermal events and update the code in thermal_pm_notify() to flush that workqueue after thermal_pm_notify_prepare() has returned which will take care of all leftover thermal work already on the workqueue (that leftover work would do nothing useful anyway because all of the thermal zones have been flagged as suspended). The second failing scenario is addressed by adding a tz->state check to thermal_zone_device_resume() to prevent it from re-initializing the poll_queue delayed work if the thermal zone is going away. Note that the above changes will also facilitate relocating the suspend and resume of thermal zones closer to the suspend and resume of devices, respectively. Fixes: 5a5efdaffda5 ("thermal: core: Resume thermal zones asynchronously") Reported-by: syzbot+3b3852c6031d0f30dfaf@syzkaller.appspotmail.com Closes: https://syzbot.org/bug?extid=3b3852c6031d0f30dfaf Reported-by: Mauricio Faria de Oliveira <mfo@igalia.com> Closes: https://lore.kernel.org/linux-pm/20260324-thermal-core-uaf-init_delayed_work-v1-1-6611ae76a8a1@igalia.com/ [1] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Mauricio Faria de Oliveira <mfo@igalia.com> Tested-by: Mauricio Faria de Oliveira <mfo@igalia.com> Reviewed-by: Lukasz Luba <lukasz.luba@arm.com> Cc: All applicable <stable@vger.kernel.org> Link: https://patch.msgid.link/6267615.lOV4Wx5bFT@rafael.j.wysocki
9 daysASoC: Intel: ehl_rt5660: Use the correct rtd->dev device in hw_paramsSachin Mokashi
In rt5660_hw_params(), the error path for snd_soc_dai_set_sysclk() correctly uses rtd->dev as the logging device, but the error path for snd_soc_dai_set_pll() uses codec_dai->dev instead. These two devices are distinct: - rtd->dev is the platform device of the PCM runtime (the Intel HDA/SSP controller, e.g. 0000:00:1f.3), which owns the machine driver callback. - codec_dai->dev is the I2C device of the rt5660 codec itself (i2c-10EC5660:00). Since hw_params is a machine driver operation and both calls are made within the same function from the machine driver's context, all error messages should be attributed to rtd->dev. Using codec_dai->dev for one of them would suggest the error originates inside the codec driver, which is misleading. Align the PLL error log with the sysclk one to use rtd->dev, matching the convention used by all other Intel board drivers in this directory. Signed-off-by: Sachin Mokashi <sachin.mokashi@intel.com> Link: https://patch.msgid.link/20260327131439.1330373-1-sachin.mokashi@intel.com Signed-off-by: Mark Brown <broonie@kernel.org>
9 daysdrm/sysfb: Fix efidrm error handling and memory type mismatchChen Ni
Fix incorrect error checking and memory type confusion in efidrm_device_create(). devm_memremap() returns error pointers, not NULL, and returns system memory while devm_ioremap() returns I/O memory. The code incorrectly passes system memory to iosys_map_set_vaddr_iomem(). Restructure to handle each memory type separately. Use devm_ioremap*() with ERR_PTR(-ENXIO) for WC/UC, and devm_memremap() with ERR_CAST() for WT/WB. Fixes: 32ae90c66fb6 ("drm/sysfb: Add efidrm for EFI displays") Signed-off-by: Chen Ni <nichen@iscas.ac.cn> Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patch.msgid.link/20260311064652.2903449-1-nichen@iscas.ac.cn
9 daysMerge branch ↵Paolo Abeni
'correct-bd-length-masks-and-bql-accounting-for-multi-bd-tx-packets' Suraj Gupta says: ==================== Correct BD length masks and BQL accounting for multi-BD TX packets This patch series fixes two issues in the Xilinx AXI Ethernet driver: 1. Corrects the BD length masks to match the AXIDMA IP spec. 2. Fixes BQL accounting for multi-BD TX packets. ==================== Link: https://patch.msgid.link/20260327073238.134948-1-suraj.gupta2@amd.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
9 daysnet: xilinx: axienet: Fix BQL accounting for multi-BD TX packetsSuraj Gupta
When a TX packet spans multiple buffer descriptors (scatter-gather), axienet_free_tx_chain sums the per-BD actual length from descriptor status into a caller-provided accumulator. That sum is reset on each NAPI poll. If the BDs for a single packet complete across different polls, the earlier bytes are lost and never credited to BQL. This causes BQL to think bytes are permanently in-flight, eventually stalling the TX queue. The SKB pointer is stored only on the last BD of a packet. When that BD completes, use skb->len for the byte count instead of summing per-BD status lengths. This matches netdev_sent_queue(), which debits skb->len, and naturally survives across polls because no partial packet contributes to the accumulator. Fixes: c900e49d58eb ("net: xilinx: axienet: Implement BQL") Signed-off-by: Suraj Gupta <suraj.gupta2@amd.com> Reviewed-by: Sean Anderson <sean.anderson@linux.dev> Link: https://patch.msgid.link/20260327073238.134948-3-suraj.gupta2@amd.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
9 daysnet: xilinx: axienet: Correct BD length masks to match AXIDMA IP specSuraj Gupta
The XAXIDMA_BD_CTRL_LENGTH_MASK and XAXIDMA_BD_STS_ACTUAL_LEN_MASK macros were defined as 0x007FFFFF (23 bits), but the AXI DMA IP product guide (PG021) specifies the buffer length field as bits 25:0 (26 bits). Update both masks to match the IP documentation. In practice this had no functional impact, since Ethernet frames are far smaller than 2^23 bytes and the extra bits were always zero, but the masks should still reflect the hardware specification. Fixes: 8a3b7a252dca ("drivers/net/ethernet/xilinx: added Xilinx AXI Ethernet driver") Signed-off-by: Suraj Gupta <suraj.gupta2@amd.com> Reviewed-by: Sean Anderson <sean.anderson@linux.dev> Link: https://patch.msgid.link/20260327073238.134948-2-suraj.gupta2@amd.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
9 daysNFC: pn533: bound the UART receive bufferPengpeng Hou
pn532_receive_buf() appends every incoming byte to dev->recv_skb and only resets the buffer after pn532_uart_rx_is_frame() recognizes a complete frame. A continuous stream of bytes without a valid PN532 frame header therefore keeps growing the skb until skb_put_u8() hits the tail limit. Drop the accumulated partial frame once the fixed receive buffer is full so malformed UART traffic cannot grow the skb past PN532_UART_SKB_BUFF_LEN. Fixes: c656aa4c27b1 ("nfc: pn533: add UART phy driver") Signed-off-by: Pengpeng Hou <pengpeng@iscas.ac.cn> Link: https://patch.msgid.link/20260326142033.82297-1-pengpeng@iscas.ac.cn Signed-off-by: Paolo Abeni <pabeni@redhat.com>
9 daysnet: bonding: fix use-after-free in bond_xmit_broadcast()Xiang Mei
bond_xmit_broadcast() reuses the original skb for the last slave (determined by bond_is_last_slave()) and clones it for others. Concurrent slave enslave/release can mutate the slave list during RCU-protected iteration, changing which slave is "last" mid-loop. This causes the original skb to be double-consumed (double-freed). Replace the racy bond_is_last_slave() check with a simple index comparison (i + 1 == slaves_count) against the pre-snapshot slave count taken via READ_ONCE() before the loop. This preserves the zero-copy optimization for the last slave while making the "last" determination stable against concurrent list mutations. The UAF can trigger the following crash: ================================================================== BUG: KASAN: slab-use-after-free in skb_clone Read of size 8 at addr ffff888100ef8d40 by task exploit/147 CPU: 1 UID: 0 PID: 147 Comm: exploit Not tainted 7.0.0-rc3+ #4 PREEMPTLAZY Call Trace: <TASK> dump_stack_lvl (lib/dump_stack.c:123) print_report (mm/kasan/report.c:379 mm/kasan/report.c:482) kasan_report (mm/kasan/report.c:597) skb_clone (include/linux/skbuff.h:1724 include/linux/skbuff.h:1792 include/linux/skbuff.h:3396 net/core/skbuff.c:2108) bond_xmit_broadcast (drivers/net/bonding/bond_main.c:5334) bond_start_xmit (drivers/net/bonding/bond_main.c:5567 drivers/net/bonding/bond_main.c:5593) dev_hard_start_xmit (include/linux/netdevice.h:5325 include/linux/netdevice.h:5334 net/core/dev.c:3871 net/core/dev.c:3887) __dev_queue_xmit (include/linux/netdevice.h:3601 net/core/dev.c:4838) ip6_finish_output2 (include/net/neighbour.h:540 include/net/neighbour.h:554 net/ipv6/ip6_output.c:136) ip6_finish_output (net/ipv6/ip6_output.c:208 net/ipv6/ip6_output.c:219) ip6_output (net/ipv6/ip6_output.c:250) ip6_send_skb (net/ipv6/ip6_output.c:1985) udp_v6_send_skb (net/ipv6/udp.c:1442) udpv6_sendmsg (net/ipv6/udp.c:1733) __sys_sendto (net/socket.c:730 net/socket.c:742 net/socket.c:2206) __x64_sys_sendto (net/socket.c:2209) do_syscall_64 (arch/x86/entry/syscall_64.c:63 arch/x86/entry/syscall_64.c:94) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130) </TASK> Allocated by task 147: Freed by task 147: The buggy address belongs to the object at ffff888100ef8c80 which belongs to the cache skbuff_head_cache of size 224 The buggy address is located 192 bytes inside of freed 224-byte region [ffff888100ef8c80, ffff888100ef8d60) Memory state around the buggy address: ffff888100ef8c00: fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc ffff888100ef8c80: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb >ffff888100ef8d00: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc ^ ffff888100ef8d80: fc fc fc fc fc fc fc fc fa fb fb fb fb fb fb fb ffff888100ef8e00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ================================================================== Fixes: 4e5bd03ae346 ("net: bonding: fix bond_xmit_broadcast return value error bug") Reported-by: Weiming Shi <bestswngs@gmail.com> Signed-off-by: Xiang Mei <xmei5@asu.edu> Link: https://patch.msgid.link/20260326075553.3960562-1-xmei5@asu.edu Signed-off-by: Paolo Abeni <pabeni@redhat.com>
9 daysALSA: ctxfi: Don't enumerate SPDIF1 at DAIO initializationTakashi Iwai
The recent refactoring of xfi driver changed the assignment of atc->daios[] at atc_get_resources(); now it loops over all enum DAIOTYP entries while it looped formerly only a part of them. The problem is that the last entry, SPDIF1, is a special type that is used only for hw20k1 CTSB073X model (as a replacement of SPDIFIO), and there is no corresponding definition for hw20k2. Due to the lack of the info, it caused a kernel crash on hw20k2, which was already worked around by the commit b045ab3dff97 ("ALSA: ctxfi: Fix missing SPDIFI1 index handling"). This patch addresses the root cause of the regression above properly, simply by skipping the incorrect SPDIF1 type in the parser loop. For making the change clearer, the code is slightly arranged, too. Fixes: a2dbaeb5c61e ("ALSA: ctxfi: Refactor resource alloc for sparse mappings") Cc: <stable@vger.kernel.org> Link: https://bugzilla.suse.com/show_bug.cgi?id=1259925 Link: https://patch.msgid.link/20260331081227.216134-1-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>
9 dayscrypto: authencesn - Do not place hiseq at end of dst for out-of-place ↵Herbert Xu
decryption When decrypting data that is not in-place (src != dst), there is no need to save the high-order sequence bits in dst as it could simply be re-copied from the source. However, the data to be hashed need to be rearranged accordingly. Reported-by: Taeyang Lee <0wn@theori.io> Fixes: 104880a6b470 ("crypto: authencesn - Convert to new AEAD interface") Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Thanks, Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
9 dayscrypto: algif_aead - Revert to operating out-of-placeHerbert Xu
This mostly reverts commit 72548b093ee3 except for the copying of the associated data. There is no benefit in operating in-place in algif_aead since the source and destination come from different mappings. Get rid of all the complexity added for in-place operation and just copy the AD directly. Fixes: 72548b093ee3 ("crypto: algif_aead - copy AAD from src to dst") Reported-by: Taeyang Lee <0wn@theori.io> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
9 daysirqchip/riscv-aplic: Restrict genpd notifier to device tree onlyJessica Liu
On ACPI systems, the aplic's pm_domain is set to acpi_general_pm_domain, which provides its own power management callbacks (e.g., runtime_suspend via acpi_subsys_runtime_suspend). aplic_pm_add() unconditionally calls dev_pm_genpd_add_notifier() when dev->pm_domain is non‑NULL, leading to a comparison between runtime_suspend and genpd_runtime_suspend. This results in the following errors when ACPI is enabled: riscv-aplic RSCV0002:00: failed to create APLIC context riscv-aplic RSCV0002:00: error -ENODEV: failed to setup APLIC in MSI mode Fix this by checking for dev->of_node before adding or removing the genpd notifier, ensuring it is only used for device tree based systems. Fixes: 95a8ddde3660 ("irqchip/riscv-aplic: Preserve APLIC states across suspend/resume") Signed-off-by: Jessica Liu <liu.xuemei1@zte.com.cn> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Link: https://patch.msgid.link/20260331093029749vRpdH-0qoEqjS0Wnn9M4x@zte.com.cn
9 daysx86/platform/geode: Fix on-stack property data use-after-return bugDmitry Torokhov
The PROPERTY_ENTRY_GPIO macro (and by extension PROPERTY_ENTRY_REF) creates a temporary software_node_ref_args structure on the stack when used in a runtime assignment. This results in the property pointing to data that is invalid once the function returns. Fix this by ensuring the GPIO reference data is not stored on stack and using PROPERTY_ENTRY_REF_ARRAY_LEN() to point directly to the persistent reference data. Fixes: 298c9babadb8 ("x86/platform/geode: switch GPIO buttons and LEDs to software properties") Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Rafael J. Wysocki <rafael@kernel.org> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Daniel Scally <djrscally@gmail.com> Cc: Danilo Krummrich <dakr@kernel.org> Cc: Hans de Goede <hansg@kernel.org> Cc: Heikki Krogerus <heikki.krogerus@linux.intel.com> Cc: Sakari Ailus <sakari.ailus@linux.intel.com> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20260329-property-gpio-fix-v2-1-3cca5ba136d8@gmail.com
9 daysALSA: hda/realtek: Add quirk for Lenovo Yoga Slim 7 14AKP10songxiebing
The Pin Complex 0x17 (bass/woofer speakers) is incorrectly reported as unconnected in the BIOS (pin default 0x411111f0 = N/A). This causes the kernel to configure speaker_outs=0, meaning only the tweeters (pin 0x14) are used. The result is very low, tinny audio with no bass. The existing quirk ALC287_FIXUP_YOGA9_14IAP7_BASS_SPK_PIN (already present in patch_realtek.c for SSID 0x17aa3801) fixes the issue completely. Reported-by: Garcicasti <andresgarciacastilla@gmail.com> Link: https://bugzilla.kernel.org/show_bug.cgi?id=221298 Signed-off-by: songxiebing <songxiebing@kylinos.cn> Link: https://patch.msgid.link/20260331033650.285601-1-songxiebing@kylinos.cn Signed-off-by: Takashi Iwai <tiwai@suse.de>
9 daysALSA: hda/realtek: add quirk for HP Laptop 15-fc0xxxZhang Heng
For the HP Laptop 15-fc0xxx with ALC236, the built-in mic 0x12 was not set up, making it unusable; after adding it, it now works properly. Link: https://bugzilla.kernel.org/show_bug.cgi?id=221233 Signed-off-by: Zhang Heng <zhangheng@kylinos.cn> Link: https://patch.msgid.link/20260331013536.13778-1-zhangheng@kylinos.cn Signed-off-by: Takashi Iwai <tiwai@suse.de>
9 daysdrm/i915/dp: Use crtc_state->enhanced_framing properly on ivb/hsw CPU eDPVille Syrjälä
Looks like I missed the drm_dp_enhanced_frame_cap() in the ivb/hsw CPU eDP code when I introduced crtc_state->enhanced_framing. Fix it up so that the state we program to the hardware is guaranteed to match what we computed earlier. Cc: stable@vger.kernel.org Fixes: 3072a24c778a ("drm/i915: Introduce crtc_state->enhanced_framing") Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patch.msgid.link/20260325135849.12603-3-ville.syrjala@linux.intel.com Reviewed-by: Michał Grzelak <michal.grzelak@intel.com> (cherry picked from commit 799fe8dc2af52f35c78c4ac97f8e34994dfd8760) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
9 daysdrm/i915/cdclk: Do the full CDCLK dance for min_voltage_level changesVille Syrjälä
Apparently I forgot about the pipe min_voltage_level when I decoupled the CDCLK calculations from modesets. Even if the CDCLK frequency doesn't need changing we may still need to bump the voltage level to accommodate an increase in the port clock frequency. Currently, even if there is a full modeset, we won't notice the need to go through the full CDCLK calculations/programming, unless the set of enabled/active pipes changes, or the pipe/dbuf min CDCLK changes. Duplicate the same logic we use the pipe's min CDCLK frequency to also deal with its min voltage level. Note that the 'allow_voltage_level_decrease' stuff isn't really useful here since the min voltage level can only change during a full modeset. But I think sticking to the same approach in the three similar parts (pipe min cdclk, pipe min voltage level, dbuf min cdclk) is a good idea. Cc: stable@vger.kernel.org Tested-by: Mikhail Rudenko <mike.rudenko@gmail.com> Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/15826 Fixes: ba91b9eecb47 ("drm/i915/cdclk: Decouple cdclk from state->modeset") Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patch.msgid.link/20260325135849.12603-2-ville.syrjala@linux.intel.com Reviewed-by: Michał Grzelak <michal.grzelak@intel.com> (cherry picked from commit 0f21a14987ebae3c05ad1184ea872e7b7a7b8695) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
9 daysbtrfs: fix incorrect return value after changing leaf in ↵robbieko
lookup_extent_data_ref() After commit 1618aa3c2e01 ("btrfs: simplify return variables in lookup_extent_data_ref()"), the err and ret variables were merged into a single ret variable. However, when btrfs_next_leaf() returns 0 (success), ret is overwritten from -ENOENT to 0. If the first key in the next leaf does not match (different objectid or type), the function returns 0 instead of -ENOENT, making the caller believe the lookup succeeded when it did not. This can lead to operations on the wrong extent tree item, potentially causing extent tree corruption. Fix this by returning -ENOENT directly when the key does not match, instead of relying on the ret variable. Fixes: 1618aa3c2e01 ("btrfs: simplify return variables in lookup_extent_data_ref()") CC: stable@vger.kernel.org # 6.12+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: robbieko <robbieko@synology.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
9 daysbnxt_en: set backing store type from query typePengpeng Hou
bnxt_hwrm_func_backing_store_qcaps_v2() stores resp->type from the firmware response in ctxm->type and later uses that value to index fixed backing-store metadata arrays such as ctx_arr[] and bnxt_bstore_to_trace[]. ctxm->type is fixed by the current backing-store query type and matches the array index of ctx->ctx_arr. Set ctxm->type from the current loop variable instead of depending on resp->type. Also update the loop to advance type from next_valid_type in the for statement, which keeps the control flow simpler for non-valid and unchanged entries. Fixes: 6a4d0774f02d ("bnxt_en: Add support for new backing store query firmware API") Signed-off-by: Pengpeng Hou <pengpeng@iscas.ac.cn> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Tested-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20260328234357.43669-1-pengpeng@iscas.ac.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 daysnet: airoha: Delay offloading until all net_devices are fully registeredLorenzo Bianconi
Netfilter flowtable can theoretically try to offload flower rules as soon as a net_device is registered while all the other ones are not registered or initialized, triggering a possible NULL pointer dereferencing of qdma pointer in airoha_ppe_set_cpu_port routine. Moreover, if register_netdev() fails for a particular net_device, there is a small race if Netfilter tries to offload flowtable rules before all the net_devices are properly unregistered in airoha_probe() error patch, triggering a NULL pointer dereferencing in airoha_ppe_set_cpu_port routine. In order to avoid any possible race, delay offloading until all net_devices are registered in the networking subsystem. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://patch.msgid.link/20260329-airoha-regiser-race-fix-v2-1-f4ebb139277b@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 daysnet: sched: cls_api: fix tc_chain_fill_node to initialize tcm_info to zero ↵Yochai Eisenrich
to prevent an info-leak When building netlink messages, tc_chain_fill_node() never initializes the tcm_info field of struct tcmsg. Since the allocation is not zeroed, kernel heap memory is leaked to userspace through this 4-byte field. The fix simply zeroes tcm_info alongside the other fields that are already initialized. Fixes: 32a4f5ecd738 ("net: sched: introduce chain object to uapi") Signed-off-by: Yochai Eisenrich <echelonh@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Link: https://patch.msgid.link/20260328211436.1010152-1-echelonh@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 daysnet: use skb_header_pointer() for TCPv4 GSO frag_off checkGuoyu Su
Syzbot reported a KMSAN uninit-value warning in gso_features_check() called from netif_skb_features() [1]. gso_features_check() reads iph->frag_off to decide whether to clear mangleid_features. Accessing the IPv4 header via ip_hdr()/inner_ip_hdr() can rely on skb header offsets that are not always safe for direct dereference on packets injected from PF_PACKET paths. Use skb_header_pointer() for the TCPv4 frag_off check so the header read is robust whether data is already linear or needs copying. [1] https://syzkaller.appspot.com/bug?extid=1543a7d954d9c6d00407 Link: https://lore.kernel.org/netdev/willemdebruijn.kernel.1a9f35039caab@gmail.com/ Fixes: cbc53e08a793 ("GSO: Add GSO type for fixed IPv4 ID") Reported-by: syzbot+1543a7d954d9c6d00407@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=1543a7d954d9c6d00407 Tested-by: syzbot+1543a7d954d9c6d00407@syzkaller.appspotmail.com Signed-off-by: Guoyu Su <yss2813483011xxl@gmail.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20260327153507.39742-1-yss2813483011xxl@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 daysnet: airoha: Add missing cleanup bits in airoha_qdma_cleanup_rx_queue()Lorenzo Bianconi
In order to properly cleanup hw rx QDMA queues and bring the device to the initial state, reset rx DMA queue head/tail index. Moreover, reset queued DMA descriptor fields. Fixes: 23020f049327 ("net: airoha: Introduce ethernet support for EN7581 SoC") Tested-by: Madhur Agrawal <Madhur.Agrawal@airoha.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://patch.msgid.link/20260327-airoha_qdma_cleanup_rx_queue-fix-v1-1-369d6ab1511a@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 daysipv6: prevent possible UaF in addrconf_permanent_addr()Paolo Abeni
The mentioned helper try to warn the user about an exceptional condition, but the message is delivered too late, accessing the ipv6 after its possible deletion. Reorder the statement to avoid the possible UaF; while at it, place the warning outside the idev->lock as it needs no protection. Reported-by: Jakub Kicinski <kuba@kernel.org> Closes: https://sashiko.dev/#/patchset/8c8bfe2e1a324e501f0e15fef404a77443fd8caf.1774365668.git.pabeni%40redhat.com Fixes: f1705ec197e7 ("net: ipv6: Make address flushing on ifdown optional") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Link: https://patch.msgid.link/ef973c3a8cb4f8f1787ed469f3e5391b9fe95aa0.1774601542.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 daysMerge tag 'libcrypto-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux Pull crypto library fix from Eric Biggers: "Fix missing zeroization of the ChaCha state" * tag 'libcrypto-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux: lib/crypto: chacha: Zeroize permuted_state before it leaves scope
10 daysspi: cadence-qspi: Fix exec_mem_op error handlingEmanuele Ghidoli
cqspi_exec_mem_op() increments the runtime PM usage counter before all refcount checks are performed. If one of these checks fails, the function returns without dropping the PM reference. Move the pm_runtime_resume_and_get() call after the refcount checks so that runtime PM is only acquired when the operation can proceed and drop the inflight_ops refcount if the PM resume fails. Cc: stable@vger.kernel.org Fixes: 7446284023e8 ("spi: cadence-quadspi: Implement refcount to handle unbind during busy") Signed-off-by: Emanuele Ghidoli <emanuele.ghidoli@toradex.com> Link: https://patch.msgid.link/20260313135236.46642-1-ghidoliemanuele@gmail.com Signed-off-by: Mark Brown <broonie@kernel.org>
10 daysdrm/amdkfd: Fix queue preemption/eviction failures by aligning control stack ↵Donet Tom
size to GPU page size The control stack size is calculated based on the number of CUs and waves, and is then aligned to PAGE_SIZE. When the resulting control stack size is aligned to 64 KB, GPU hangs and queue preemption failures are observed while running RCCL unit tests on systems with more than two GPUs. amdgpu 0048:0f:00.0: amdgpu: Queue preemption failed for queue with doorbell_id: 80030008 amdgpu 0048:0f:00.0: amdgpu: Failed to evict process queues amdgpu 0048:0f:00.0: amdgpu: GPU reset begin!. Source: 4 amdgpu 0048:0f:00.0: amdgpu: Queue preemption failed for queue with doorbell_id: 80030008 amdgpu 0048:0f:00.0: amdgpu: Failed to evict process queues amdgpu 0048:0f:00.0: amdgpu: Failed to restore process queues This issue is observed on both 4 KB and 64 KB system page-size configurations. This patch fixes the issue by aligning the control stack size to AMDGPU_GPU_PAGE_SIZE instead of PAGE_SIZE, so the control stack size will not be 64 KB on systems with a 64 KB page size and queue preemption works correctly. Additionally, In the current code, wg_data_size is aligned to PAGE_SIZE, which can waste memory if the system page size is large. In this patch, wg_data_size is aligned to AMDGPU_GPU_PAGE_SIZE. The cwsr_size, calculated from wg_data_size and the control stack size, is aligned to PAGE_SIZE. Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Donet Tom <donettom@linux.ibm.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit a3e14436304392fbada359edd0f1d1659850c9b7)
10 dayshwmon: (occ) Fix missing newline in occ_show_extended()Sanman Pradhan
In occ_show_extended() case 0, when the EXTN_FLAG_SENSOR_ID flag is set, the sysfs_emit format string "%u" is missing the trailing newline that the sysfs ABI expects. The else branch correctly uses "%4phN\n", and all other show functions in this file include the trailing newline. Add the missing "\n" for consistency and correct sysfs output. Fixes: c10e753d43eb ("hwmon (occ): Add sensor types and versions") Signed-off-by: Sanman Pradhan <psanman@juniper.net> Link: https://lore.kernel.org/r/20260326224510.294619-3-sanman.pradhan@hpe.com Signed-off-by: Guenter Roeck <linux@roeck-us.net>
10 daysdrm/amdgpu: Fix wait after reset sequence in S4Lijo Lazar
For a mode-1 reset done at the end of S4 on PSPv11 dGPUs, only check if TOS is unloaded. Fixes: 32f73741d6ee ("drm/amdgpu: Wait for bootloader after PSPv11 reset") Closes: https://gitlab.freedesktop.org/drm/amd/-/work_items/4853 Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 2fb4883b884a437d760bd7bdf7695a7e5a60bba3) Cc: stable@vger.kernel.org
10 dayshwmon: (occ) Fix division by zero in occ_show_power_1()Sanman Pradhan
In occ_show_power_1() case 1, the accumulator is divided by update_tag without checking for zero. If no samples have been collected yet (e.g. during early boot when the sensor block is included but hasn't been updated), update_tag is zero, causing a kernel divide-by-zero crash. The 2019 fix in commit 211186cae14d ("hwmon: (occ) Fix division by zero issue") only addressed occ_get_powr_avg() used by occ_show_power_2() and occ_show_power_a0(). This separate code path in occ_show_power_1() was missed. Fix this by reusing the existing occ_get_powr_avg() helper, which already handles the zero-sample case and uses mul_u64_u32_div() to multiply before dividing for better precision. Move the helper above occ_show_power_1() so it is visible at the call site. Fixes: c10e753d43eb ("hwmon (occ): Add sensor types and versions") Cc: stable@vger.kernel.org Signed-off-by: Sanman Pradhan <psanman@juniper.net> Link: https://lore.kernel.org/r/20260326224510.294619-2-sanman.pradhan@hpe.com [groeck: Fix alignment problems reported by checkpatch] Signed-off-by: Guenter Roeck <linux@roeck-us.net>
10 daysdrm/amd/display: Fix NULL pointer dereference in dcn401_init_hw()Srinivasan Shanmugam
dcn401_init_hw() assumes that update_bw_bounding_box() is valid when entering the update path. However, the existing condition: ((!fams2_enable && update_bw_bounding_box) || freq_changed) does not guarantee this, as the freq_changed branch can evaluate to true independently of the callback pointer. This can result in calling update_bw_bounding_box() when it is NULL. Fix this by separating the update condition from the pointer checks and ensuring the callback, dc->clk_mgr, and bw_params are validated before use. Fixes the below: ../dc/hwss/dcn401/dcn401_hwseq.c:367 dcn401_init_hw() error: we previously assumed 'dc->res_pool->funcs->update_bw_bounding_box' could be null (see line 362) Fixes: ca0fb243c3bb ("drm/amd/display: Underflow Seen on DCN401 eGPU") Cc: Daniel Sa <Daniel.Sa@amd.com> Cc: Alvin Lee <alvin.lee2@amd.com> Cc: Roman Li <roman.li@amd.com> Cc: Alex Hung <alex.hung@amd.com> Cc: Tom Chung <chiahsuan.chung@amd.com> Cc: Dan Carpenter <dan.carpenter@linaro.org> Cc: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 86117c5ab42f21562fedb0a64bffea3ee5fcd477) Cc: stable@vger.kernel.org
10 daysdrm/amdgpu: Change AMDGPU_VA_RESERVED_TRAP_SIZE to 64KBDonet Tom
Currently, AMDGPU_VA_RESERVED_TRAP_SIZE is hardcoded to 8KB, while KFD_CWSR_TBA_TMA_SIZE is defined as 2 * PAGE_SIZE. On systems with 4K pages, both values match (8KB), so allocation and reserved space are consistent. However, on 64K page-size systems, KFD_CWSR_TBA_TMA_SIZE becomes 128KB, while the reserved trap area remains 8KB. This mismatch causes the kernel to crash when running rocminfo or rccl unit tests. Kernel attempted to read user page (2) - exploit attempt? (uid: 1001) BUG: Kernel NULL pointer dereference on read at 0x00000002 Faulting instruction address: 0xc0000000002c8a64 Oops: Kernel access of bad area, sig: 11 [#1] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries CPU: 34 UID: 1001 PID: 9379 Comm: rocminfo Tainted: G E 6.19.0-rc4-amdgpu-00320-gf23176405700 #56 VOLUNTARY Tainted: [E]=UNSIGNED_MODULE Hardware name: IBM,9105-42A POWER10 (architected) 0x800200 0xf000006 of:IBM,FW1060.30 (ML1060_896) hv:phyp pSeries NIP: c0000000002c8a64 LR: c00000000125dbc8 CTR: c00000000125e730 REGS: c0000001e0957580 TRAP: 0300 Tainted: G E MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 24008268 XER: 00000036 CFAR: c00000000125dbc4 DAR: 0000000000000002 DSISR: 40000000 IRQMASK: 1 GPR00: c00000000125d908 c0000001e0957820 c0000000016e8100 c00000013d814540 GPR04: 0000000000000002 c00000013d814550 0000000000000045 0000000000000000 GPR08: c00000013444d000 c00000013d814538 c00000013d814538 0000000084002268 GPR12: c00000000125e730 c000007e2ffd5f00 ffffffffffffffff 0000000000020000 GPR16: 0000000000000000 0000000000000002 c00000015f653000 0000000000000000 GPR20: c000000138662400 c00000013d814540 0000000000000000 c00000013d814500 GPR24: 0000000000000000 0000000000000002 c0000001e0957888 c0000001e0957878 GPR28: c00000013d814548 0000000000000000 c00000013d814540 c0000001e0957888 NIP [c0000000002c8a64] __mutex_add_waiter+0x24/0xc0 LR [c00000000125dbc8] __mutex_lock.constprop.0+0x318/0xd00 Call Trace: 0xc0000001e0957890 (unreliable) __mutex_lock.constprop.0+0x58/0xd00 amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu+0x6fc/0xb60 [amdgpu] kfd_process_alloc_gpuvm+0x54/0x1f0 [amdgpu] kfd_process_device_init_cwsr_dgpu+0xa4/0x1a0 [amdgpu] kfd_process_device_init_vm+0xd8/0x2e0 [amdgpu] kfd_ioctl_acquire_vm+0xd0/0x130 [amdgpu] kfd_ioctl+0x514/0x670 [amdgpu] sys_ioctl+0x134/0x180 system_call_exception+0x114/0x300 system_call_vectored_common+0x15c/0x2ec This patch changes AMDGPU_VA_RESERVED_TRAP_SIZE to 64 KB and KFD_CWSR_TBA_TMA_SIZE to the AMD GPU page size. This means we reserve 64 KB for the trap in the address space, but only allocate 8 KB within it. With this approach, the allocation size never exceeds the reserved area. Fixes: 34a1de0f7935 ("drm/amdkfd: Relocate TBA/TMA to opposite side of VM hole") Reviewed-by: Christian König <christian.koenig@amd.com> Suggested-by: Felix Kuehling <felix.kuehling@amd.com> Suggested-by: Christian König <christian.koenig@amd.com> Signed-off-by: Donet Tom <donettom@linux.ibm.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 31b8de5e55666f26ea7ece5f412b83eab3f56dbb) Cc: stable@vger.kernel.org
10 daysdrm/amdgpu/userq: fix memory leak in MQD creation error pathsJunrui Luo
In mes_userq_mqd_create(), the memdup_user() allocations for IP-specific MQD structs are not freed when subsequent VA validation fails. The goto free_mqd label only cleans up the MQD BO object and userq_props. Fix by adding kfree() before each goto free_mqd on VA validation failure in the COMPUTE, GFX, and SDMA branches. Fixes: 9e46b8bb0539 ("drm/amdgpu: validate userq buffer virtual address and size") Reported-by: Yuhao Jiang <danisjiang@gmail.com> Signed-off-by: Junrui Luo <moonafterrain@outlook.com> Reviewed-by: Prike Liang <Prike.Liang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 27f5ff9e4a4150d7cf8b4085aedd3b77ddcc5d08) Cc: stable@vger.kernel.org
10 daysdrm/amd: Fix MQD and control stack alignment for non-4KDonet Tom
For gfxV9, due to a hardware bug ("based on the comments in the code here [1]"), the control stack of a user-mode compute queue must be allocated immediately after the page boundary of its regular MQD buffer. To handle this, we allocate an enlarged MQD buffer where the first page is used as the MQD and the remaining pages store the control stack. Although these regions share the same BO, they require different memory types: the MQD must be UC (uncached), while the control stack must be NC (non-coherent), matching the behavior when the control stack is allocated in user space. This logic works correctly on systems where the CPU page size matches the GPU page size (4K). However, the current implementation aligns both the MQD and the control stack to the CPU PAGE_SIZE. On systems with a larger CPU page size, the entire first CPU page is marked UC—even though that page may contain multiple GPU pages. The GPU treats the second 4K GPU page inside that CPU page as part of the control stack, but it is incorrectly mapped as UC. This patch fixes the issue by aligning both the MQD and control stack sizes to the GPU page size (4K). The first 4K page is correctly marked as UC for the MQD, and the remaining GPU pages are marked NC for the control stack. This ensures proper memory type assignment on systems with larger CPU page sizes. [1]: https://elixir.bootlin.com/linux/v6.18/source/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c#L118 Acked-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Donet Tom <donettom@linux.ibm.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 998d6781410de1c4b787fdbf6c56e851ea7fa553)
10 daysMerge tag 'trace-rtla-v7.0-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull rtla build fix from Steven Rostedt: - Fix build failure when libbpf does not exist RTLA supports building without BPF libraries, but a recent change added a libbpf.h include outside of the BPF protection which caused build failures when libbpf was not installed. * tag 'trace-rtla-v7.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: rtla: Fix build without libbpf header
10 daysdrm/amdkfd: Align expected_queue_size to PAGE_SIZEDonet Tom
The AQL queue size can be 4K, but the minimum buffer object (BO) allocation size is PAGE_SIZE. On systems with a page size larger than 4K, the expected queue size does not match the allocated BO size, causing queue creation to fail. Align the expected queue size to PAGE_SIZE so that it matches the allocated BO size and allows queue creation to succeed. Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Donet Tom <donettom@linux.ibm.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit b01cd158a2f5230b137396c5f8cda3fc780abbc2)
10 daysdrm/amdgpu: fix the idr allocation flagsPrike Liang
Fix the IDR allocation flags by using atomic GFP flags in non‑sleepable contexts to avoid the __might_sleep() complaint. 268.290239] [drm] Initialized amdgpu 3.64.0 for 0000:03:00.0 on minor 0 [ 268.294900] BUG: sleeping function called from invalid context at ./include/linux/sched/mm.h:323 [ 268.295355] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1744, name: modprobe [ 268.295705] preempt_count: 1, expected: 0 [ 268.295886] RCU nest depth: 0, expected: 0 [ 268.296072] 2 locks held by modprobe/1744: [ 268.296077] #0: ffff8c3a44abd1b8 (&dev->mutex){....}-{4:4}, at: __driver_attach+0xe4/0x210 [ 268.296100] #1: ffffffffc1a6ea78 (amdgpu_pasid_idr_lock){+.+.}-{3:3}, at: amdgpu_pasid_alloc+0x26/0xe0 [amdgpu] [ 268.296494] CPU: 12 UID: 0 PID: 1744 Comm: modprobe Tainted: G U OE 6.19.0-custom #16 PREEMPT(voluntary) [ 268.296498] Tainted: [U]=USER, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE [ 268.296499] Hardware name: AMD Majolica-RN/Majolica-RN, BIOS RMJ1009A 06/13/2021 [ 268.296501] Call Trace: Fixes: 8f1de51f49be ("drm/amdgpu: prevent immediate PASID reuse case") Tested-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit ea56aa2625708eaf96f310032391ff37746310ef) Cc: stable@vger.kernel.org
10 daysdrm/amdgpu: validate doorbell_offset in user queue creationJunrui Luo
amdgpu_userq_get_doorbell_index() passes the user-provided doorbell_offset to amdgpu_doorbell_index_on_bar() without bounds checking. An arbitrarily large doorbell_offset can cause the calculated doorbell index to fall outside the allocated doorbell BO, potentially corrupting kernel doorbell space. Validate that doorbell_offset falls within the doorbell BO before computing the BAR index, using u64 arithmetic to prevent overflow. Fixes: f09c1e6077ab ("drm/amdgpu: generate doorbell index for userqueue") Reported-by: Yuhao Jiang <danisjiang@gmail.com> Signed-off-by: Junrui Luo <moonafterrain@outlook.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit de1ef4ffd70e1d15f0bf584fd22b1f28cbd5e2ec) Cc: stable@vger.kernel.org
10 daysdrm/amdgpu/pm: drop SMU driver if version not matched messagesAlex Deucher
It just leads to user confusion. Cc: Yang Wang <kevinyang.wang@amd.com> Cc: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit e471627d56272a791972f25e467348b611c31713) Cc: stable@vger.kernel.org
10 daysPM: EM: Fix NULL pointer dereference when perf domain ID is not foundChangwoo Min
dev_energymodel_nl_get_perf_domains_doit() calls em_perf_domain_get_by_id() but does not check the return value before passing it to __em_nl_get_pd_size(). When a caller supplies a non-existent perf domain ID, em_perf_domain_get_by_id() returns NULL, and __em_nl_get_pd_size() immediately dereferences pd->cpus (struct offset 0x30), causing a NULL pointer dereference. The sister handler dev_energymodel_nl_get_perf_table_doit() already handles this correctly via __em_nl_get_pd_table_id(), which returns NULL and causes the caller to return -EINVAL. Add the same NULL check in the get-perf-domains do handler. Fixes: 380ff27af25e ("PM: EM: Add dump to get-perf-domains in the EM YNL spec") Reported-by: Yi Lai <yi1.lai@linux.intel.com> Closes: https://lore.kernel.org/lkml/aXiySM79UYfk+ytd@ly-workstation/ Signed-off-by: Changwoo Min <changwoo@igalia.com> Cc: 6.19+ <stable@vger.kernel.org> # 6.19+ [ rjw: Subject and changelog edits ] Link: https://patch.msgid.link/20260329073615.649976-1-changwoo@igalia.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>