| Age | Commit message (Collapse) | Author |
|
Move phylink_replay_link_end() as the last locked operation under
sja1105_static_config_reload(). The purpose is to be able to goto
this step from the error path of intermediate steps (we must call
phylink_replay_link_end()).
sja1105_reload_cbs() notably does not depend on port states or link
speeds. See commit 954ad9bf13c4 ("net: dsa: sja1105: fix bandwidth
discrepancy between tc-cbs software and offload") which has discussed
this issue specifically.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20260304220900.3865120-2-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
XDP multi-buf programs can modify the layout of the XDP buffer when the
program calls bpf_xdp_pull_data() or bpf_xdp_adjust_tail(). The
referenced commit in the fixes tag corrected the assumption in the mlx5
driver that the XDP buffer layout doesn't change during a program
execution. However, this fix introduced another issue: the dropped
fragments still need to be counted on the driver side to avoid page
fragment reference counting issues.
Such issue can be observed with the
test_xdp_native_adjst_tail_shrnk_data selftest when using a payload of
3600 and shrinking by 256 bytes (an upcoming selftest patch): the last
fragment gets released by the XDP code but doesn't get tracked by the
driver. This results in a negative pp_ref_count during page release and
the following splat:
WARNING: include/net/page_pool/helpers.h:297 at mlx5e_page_release_fragmented.isra.0+0x4a/0x50 [mlx5_core], CPU#12: ip/3137
Modules linked in: [...]
CPU: 12 UID: 0 PID: 3137 Comm: ip Not tainted 6.19.0-rc3+ #12 NONE
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
RIP: 0010:mlx5e_page_release_fragmented.isra.0+0x4a/0x50 [mlx5_core]
[...]
Call Trace:
<TASK>
mlx5e_dealloc_rx_wqe+0xcb/0x1a0 [mlx5_core]
mlx5e_free_rx_descs+0x7f/0x110 [mlx5_core]
mlx5e_close_rq+0x50/0x60 [mlx5_core]
mlx5e_close_queues+0x36/0x2c0 [mlx5_core]
mlx5e_close_channel+0x1c/0x50 [mlx5_core]
mlx5e_close_channels+0x45/0x80 [mlx5_core]
mlx5e_safe_switch_params+0x1a5/0x230 [mlx5_core]
mlx5e_change_mtu+0xf3/0x2f0 [mlx5_core]
netif_set_mtu_ext+0xf1/0x230
do_setlink.isra.0+0x219/0x1180
rtnl_newlink+0x79f/0xb60
rtnetlink_rcv_msg+0x213/0x3a0
netlink_rcv_skb+0x48/0xf0
netlink_unicast+0x24a/0x350
netlink_sendmsg+0x1ee/0x410
__sock_sendmsg+0x38/0x60
____sys_sendmsg+0x232/0x280
___sys_sendmsg+0x78/0xb0
__sys_sendmsg+0x5f/0xb0
[...]
do_syscall_64+0x57/0xc50
This patch fixes the issue by doing page frag counting on all the
original XDP buffer fragments for all relevant XDP actions (XDP_TX ,
XDP_REDIRECT and XDP_PASS). This is basically reverting to the original
counting before the commit in the fixes tag.
As frag_page is still pointing to the original tail, the nr_frags
parameter to xdp_update_skb_frags_info() needs to be calculated
in a different way to reflect the new nr_frags.
Fixes: afd5ba577c10 ("net/mlx5e: RX, Fix generating skb from non-linear xdp_buff for legacy RQ")
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Amery Hung <ameryhung@gmail.com>
Link: https://patch.msgid.link/20260305142634.1813208-6-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
XDP multi-buf programs can modify the layout of the XDP buffer when the
program calls bpf_xdp_pull_data() or bpf_xdp_adjust_tail(). The
referenced commit in the fixes tag corrected the assumption in the mlx5
driver that the XDP buffer layout doesn't change during a program
execution. However, this fix introduced another issue: the dropped
fragments still need to be counted on the driver side to avoid page
fragment reference counting issues.
The issue was discovered by the drivers/net/xdp.py selftest,
more specifically the test_xdp_native_tx_mb:
- The mlx5 driver allocates a page_pool page and initializes it with
a frag counter of 64 (pp_ref_count=64) and the internal frag counter
to 0.
- The test sends one packet with no payload.
- On RX (mlx5e_skb_from_cqe_mpwrq_nonlinear()), mlx5 configures the XDP
buffer with the packet data starting in the first fragment which is the
page mentioned above.
- The XDP program runs and calls bpf_xdp_pull_data() which moves the
header into the linear part of the XDP buffer. As the packet doesn't
contain more data, the program drops the tail fragment since it no
longer contains any payload (pp_ref_count=63).
- mlx5 device skips counting this fragment. Internal frag counter
remains 0.
- mlx5 releases all 64 fragments of the page but page pp_ref_count is
63 => negative reference counting error.
Resulting splat during the test:
WARNING: CPU: 0 PID: 188225 at ./include/net/page_pool/helpers.h:297 mlx5e_page_release_fragmented.isra.0+0xbd/0xe0 [mlx5_core]
Modules linked in: [...]
CPU: 0 UID: 0 PID: 188225 Comm: ip Not tainted 6.18.0-rc7_for_upstream_min_debug_2025_12_08_11_44 #1 NONE
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
RIP: 0010:mlx5e_page_release_fragmented.isra.0+0xbd/0xe0 [mlx5_core]
[...]
Call Trace:
<TASK>
mlx5e_free_rx_mpwqe+0x20a/0x250 [mlx5_core]
mlx5e_dealloc_rx_mpwqe+0x37/0xb0 [mlx5_core]
mlx5e_free_rx_descs+0x11a/0x170 [mlx5_core]
mlx5e_close_rq+0x78/0xa0 [mlx5_core]
mlx5e_close_queues+0x46/0x2a0 [mlx5_core]
mlx5e_close_channel+0x24/0x90 [mlx5_core]
mlx5e_close_channels+0x5d/0xf0 [mlx5_core]
mlx5e_safe_switch_params+0x2ec/0x380 [mlx5_core]
mlx5e_change_mtu+0x11d/0x490 [mlx5_core]
mlx5e_change_nic_mtu+0x19/0x30 [mlx5_core]
netif_set_mtu_ext+0xfc/0x240
do_setlink.isra.0+0x226/0x1100
rtnl_newlink+0x7a9/0xba0
rtnetlink_rcv_msg+0x220/0x3c0
netlink_rcv_skb+0x4b/0xf0
netlink_unicast+0x255/0x380
netlink_sendmsg+0x1f3/0x420
__sock_sendmsg+0x38/0x60
____sys_sendmsg+0x1e8/0x240
___sys_sendmsg+0x7c/0xb0
[...]
__sys_sendmsg+0x5f/0xb0
do_syscall_64+0x55/0xc70
The problem applies for XDP_PASS as well which is handled in a different
code path in the driver.
This patch fixes the issue by doing page frag counting on all the
original XDP buffer fragments for all relevant XDP actions (XDP_TX ,
XDP_REDIRECT and XDP_PASS). This is basically reverting to the original
counting before the commit in the fixes tag.
As frag_page is still pointing to the original tail, the nr_frags
parameter to xdp_update_skb_frags_info() needs to be calculated
in a different way to reflect the new nr_frags.
Fixes: 87bcef158ac1 ("net/mlx5e: RX, Fix generating skb from non-linear xdp_buff for striding RQ")
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Cc: Amery Hung <ameryhung@gmail.com>
Reviewed-by: Nimrod Oren <noren@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260305142634.1813208-5-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In case of a TX error CQE, a recovery flow is triggered,
mlx5e_reset_txqsq_cc_pc() resets dma_fifo_cc to 0 but not dma_fifo_pc,
desyncing the DMA FIFO producer and consumer.
After recovery, the producer pushes new DMA entries at the old
dma_fifo_pc, while the consumer reads from position 0.
This causes us to unmap stale DMA addresses from before the recovery.
The DMA FIFO is a purely software construct with no HW counterpart.
At the point of reset, all WQEs have been flushed so dma_fifo_cc is
already equal to dma_fifo_pc. There is no need to reset either counter,
similar to how skb_fifo pc/cc are untouched.
Remove the 'dma_fifo_cc = 0' reset.
This fixes the following WARNING:
WARNING: CPU: 0 PID: 0 at drivers/iommu/dma-iommu.c:1240 iommu_dma_unmap_page+0x79/0x90
Modules linked in: mlx5_vdpa vringh vdpa bonding mlx5_ib mlx5_vfio_pci ipip mlx5_fwctl tunnel4 mlx5_core ib_ipoib geneve ip6_gre ip_gre gre nf_tables ip6_tunnel rdma_ucm ib_uverbs ib_umad vfio_pci vfio_pci_core act_mirred act_skbedit act_vlan vhost_net vhost tap ip6table_mangle ip6table_nat ip6table_filter ip6_tables iptable_mangle cls_matchall nfnetlink_cttimeout act_gact cls_flower sch_ingress vhost_iotlb iptable_raw tunnel6 vfio_iommu_type1 vfio openvswitch nsh rpcsec_gss_krb5 auth_rpcgss oid_registry xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink iptable_nat nf_nat xt_addrtype br_netfilter overlay zram zsmalloc rpcrdma ib_iser libiscsi scsi_transport_iscsi rdma_cm iw_cm ib_cm ib_core fuse [last unloaded: nf_tables]
CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.13.0-rc5_for_upstream_min_debug_2024_12_30_21_33 #1
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
RIP: 0010:iommu_dma_unmap_page+0x79/0x90
Code: 2b 4d 3b 21 72 26 4d 3b 61 08 73 20 49 89 d8 44 89 f9 5b 4c 89 f2 4c 89 e6 48 89 ef 5d 41 5c 41 5d 41 5e 41 5f e9 c7 ae 9e ff <0f> 0b 5b 5d 41 5c 41 5d 41 5e 41 5f c3 66 2e 0f 1f 84 00 00 00 00
Call Trace:
<IRQ>
? __warn+0x7d/0x110
? iommu_dma_unmap_page+0x79/0x90
? report_bug+0x16d/0x180
? handle_bug+0x4f/0x90
? exc_invalid_op+0x14/0x70
? asm_exc_invalid_op+0x16/0x20
? iommu_dma_unmap_page+0x79/0x90
? iommu_dma_unmap_page+0x2e/0x90
dma_unmap_page_attrs+0x10d/0x1b0
mlx5e_tx_wi_dma_unmap+0xbe/0x120 [mlx5_core]
mlx5e_poll_tx_cq+0x16d/0x690 [mlx5_core]
mlx5e_napi_poll+0x8b/0xac0 [mlx5_core]
__napi_poll+0x24/0x190
net_rx_action+0x32a/0x3b0
? mlx5_eq_comp_int+0x7e/0x270 [mlx5_core]
? notifier_call_chain+0x35/0xa0
handle_softirqs+0xc9/0x270
irq_exit_rcu+0x71/0xd0
common_interrupt+0x7f/0xa0
</IRQ>
<TASK>
asm_common_interrupt+0x22/0x40
Fixes: db75373c91b0 ("net/mlx5e: Recover Send Queue (SQ) from error state")
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260305142634.1813208-4-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The check on mlx5_esw_host_functions_enabled(esw->dev) for adding VF
peer miss rules is incorrect. These rules match traffic from peer's VFs,
so the local device's host function status is irrelevant. Remove this
check to ensure peer VF traffic is properly handled regardless of local
host configuration.
Also fix the PF peer miss rule deletion to be symmetric with the add
path, so only attempt to delete the rule if it was actually created.
Fixes: 520369ef43a8 ("net/mlx5: Support disabling host PFs")
Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260305142634.1813208-3-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When moving to switchdev mode when the device doesn't support IPsec,
we try to clean up the IPsec resources anyway which causes the crash
below, fix that by correctly checking for IPsec support before trying
to clean up its resources.
[27642.515799] WARNING: arch/x86/mm/fault.c:1276 at
do_user_addr_fault+0x18a/0x680, CPU#4: devlink/6490
[27642.517159] Modules linked in: xt_conntrack xt_MASQUERADE
ip6table_nat ip6table_filter ip6_tables iptable_nat nf_nat xt_addrtype
rpcsec_gss_krb5 auth_rpcgss oid_registry overlay mlx5_fwctl nfnetlink
zram zsmalloc mlx5_ib fuse rpcrdma rdma_ucm ib_uverbs ib_iser libiscsi
scsi_transport_iscsi ib_umad rdma_cm ib_ipoib iw_cm ib_cm mlx5_core
ib_core
[27642.521358] CPU: 4 UID: 0 PID: 6490 Comm: devlink Not tainted
6.19.0-rc5_for_upstream_min_debug_2026_01_14_16_47 #1 NONE
[27642.522923] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[27642.524528] RIP: 0010:do_user_addr_fault+0x18a/0x680
[27642.525362] Code: ff 0f 84 75 03 00 00 48 89 ee 4c 89 e7 e8 5e b9 22
00 49 89 c0 48 85 c0 0f 84 a8 02 00 00 f7 c3 60 80 00 00 74 22 31 c9 eb
ae <0f> 0b 48 83 c4 10 48 89 ea 48 89 de 4c 89 f7 5b 5d 41 5c 41 5d
41
[27642.528166] RSP: 0018:ffff88810770f6b8 EFLAGS: 00010046
[27642.529038] RAX: 0000000000000000 RBX: 0000000000000002 RCX:
ffff88810b980f00
[27642.530158] RDX: 00000000000000a0 RSI: 0000000000000002 RDI:
ffff88810770f728
[27642.531270] RBP: 00000000000000a0 R08: 0000000000000000 R09:
0000000000000000
[27642.532383] R10: 0000000000000000 R11: 0000000000000000 R12:
ffff888103f3c4c0
[27642.533499] R13: 0000000000000000 R14: ffff88810770f728 R15:
0000000000000000
[27642.534614] FS: 00007f197c741740(0000) GS:ffff88856a94c000(0000)
knlGS:0000000000000000
[27642.535915] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[27642.536858] CR2: 00000000000000a0 CR3: 000000011334c003 CR4:
0000000000172eb0
[27642.537982] Call Trace:
[27642.538466] <TASK>
[27642.538907] exc_page_fault+0x76/0x140
[27642.539583] asm_exc_page_fault+0x22/0x30
[27642.540282] RIP: 0010:_raw_spin_lock_irqsave+0x10/0x30
[27642.541134] Code: 07 85 c0 75 11 ba ff 00 00 00 f0 0f b1 17 75 06 b8
01 00 00 00 c3 31 c0 c3 90 0f 1f 44 00 00 53 9c 5b fa 31 c0 ba 01 00 00
00 <f0> 0f b1 17 75 05 48 89 d8 5b c3 89 c6 e8 7e 02 00 00 48 89 d8
5b
[27642.543936] RSP: 0018:ffff88810770f7d8 EFLAGS: 00010046
[27642.544803] RAX: 0000000000000000 RBX: 0000000000000202 RCX:
ffff888113ad96d8
[27642.545916] RDX: 0000000000000001 RSI: ffff88810770f818 RDI:
00000000000000a0
[27642.547027] RBP: 0000000000000098 R08: 0000000000000400 R09:
ffff88810b980f00
[27642.548140] R10: 0000000000000001 R11: ffff888101845a80 R12:
00000000000000a8
[27642.549263] R13: ffffffffa02a9060 R14: 00000000000000a0 R15:
ffff8881130d8a40
[27642.550379] complete_all+0x20/0x90
[27642.551010] mlx5e_ipsec_disable_events+0xb6/0xf0 [mlx5_core]
[27642.552022] mlx5e_nic_disable+0x12d/0x220 [mlx5_core]
[27642.552929] mlx5e_detach_netdev+0x66/0xf0 [mlx5_core]
[27642.553822] mlx5e_netdev_change_profile+0x5b/0x120 [mlx5_core]
[27642.554821] mlx5e_vport_rep_load+0x419/0x590 [mlx5_core]
[27642.555757] ? xa_load+0x53/0x90
[27642.556361] __esw_offloads_load_rep+0x54/0x70 [mlx5_core]
[27642.557328] mlx5_esw_offloads_rep_load+0x45/0xd0 [mlx5_core]
[27642.558320] esw_offloads_enable+0xb4b/0xc90 [mlx5_core]
[27642.559247] mlx5_eswitch_enable_locked+0x34e/0x4f0 [mlx5_core]
[27642.560257] ? mlx5_rescan_drivers_locked+0x222/0x2d0 [mlx5_core]
[27642.561284] mlx5_devlink_eswitch_mode_set+0x5ac/0x9c0 [mlx5_core]
[27642.562334] ? devlink_rate_set_ops_supported+0x21/0x3a0
[27642.563220] devlink_nl_eswitch_set_doit+0x67/0xe0
[27642.564026] genl_family_rcv_msg_doit+0xe0/0x130
[27642.564816] genl_rcv_msg+0x183/0x290
[27642.565466] ? __devlink_nl_pre_doit.isra.0+0x160/0x160
[27642.566329] ? devlink_nl_eswitch_get_doit+0x290/0x290
[27642.567181] ? devlink_nl_pre_doit_parent_dev_optional+0x20/0x20
[27642.568147] ? genl_family_rcv_msg_dumpit+0xf0/0xf0
[27642.568966] netlink_rcv_skb+0x4b/0xf0
[27642.569629] genl_rcv+0x24/0x40
[27642.570215] netlink_unicast+0x255/0x380
[27642.570901] ? __alloc_skb+0xfa/0x1e0
[27642.571560] netlink_sendmsg+0x1f3/0x420
[27642.572249] __sock_sendmsg+0x38/0x60
[27642.572911] __sys_sendto+0x119/0x180
[27642.573561] ? __sys_recvmsg+0x5c/0xb0
[27642.574227] __x64_sys_sendto+0x20/0x30
[27642.574904] do_syscall_64+0x55/0xc10
[27642.575554] entry_SYSCALL_64_after_hwframe+0x4b/0x53
[27642.576391] RIP: 0033:0x7f197c85e807
[27642.577050] Code: c7 c0 ff ff ff ff eb be 66 2e 0f 1f 84 00 00 00 00
00 90 f3 0f 1e fa 80 3d 45 08 0d 00 00 41 89 ca 74 10 b8 2c 00 00 00 0f
05 <48> 3d 00 f0 ff ff 77 69 c3 55 48 89 e5 53 48 83 ec 38 44 89 4d
d0
[27642.579846] RSP: 002b:00007ffebd4e2248 EFLAGS: 00000202 ORIG_RAX:
000000000000002c
[27642.581082] RAX: ffffffffffffffda RBX: 000055cfcd9cd2a0 RCX:
00007f197c85e807
[27642.582200] RDX: 0000000000000038 RSI: 000055cfcd9cd490 RDI:
0000000000000003
[27642.583320] RBP: 00007ffebd4e2290 R08: 00007f197c942200 R09:
000000000000000c
[27642.584437] R10: 0000000000000000 R11: 0000000000000202 R12:
0000000000000000
[27642.585555] R13: 000055cfcd9cd490 R14: 00007ffebd4e45d1 R15:
000055cfcd9cd2a0
[27642.586671] </TASK>
[27642.587121] ---[ end trace 0000000000000000 ]---
[27642.587910] BUG: kernel NULL pointer dereference, address:
00000000000000a0
Fixes: 664f76be38a1 ("net/mlx5: Fix IPsec cleanup over MPV device")
Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260305142634.1813208-2-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
esw->work_queue executes esw_functions_changed_event_handler ->
esw_vfs_changed_event_handler and acquires the devlink lock.
.eswitch_mode_set (acquires devlink lock in devlink_nl_pre_doit) ->
mlx5_devlink_eswitch_mode_set -> mlx5_eswitch_disable_locked ->
mlx5_eswitch_event_handler_unregister -> flush_workqueue deadlocks
when esw_vfs_changed_event_handler executes.
Fix that by no longer flushing the work to avoid the deadlock, and using
a generation counter to keep track of work relevance. This avoids an old
handler manipulating an esw that has undergone one or more mode changes:
- the counter is incremented in mlx5_eswitch_event_handler_unregister.
- the counter is read and passed to the ephemeral mlx5_host_work struct.
- the work handler takes the devlink lock and bails out if the current
generation is different than the one it was scheduled to operate on.
- mlx5_eswitch_cleanup does the final draining before destroying the wq.
No longer flushing the workqueue has the side effect of maybe no longer
cancelling pending vport_change_handler work items, but that's ok since
those are disabled elsewhere:
- mlx5_eswitch_disable_locked disables the vport eq notifier.
- mlx5_esw_vport_disable disarms the HW EQ notification and marks
vport->enabled under state_lock to false to prevent pending vport
handler from doing anything.
- mlx5_eswitch_cleanup destroys the workqueue and makes sure all events
are disabled/finished.
Fixes: f1bc646c9a06 ("net/mlx5: Use devl_ API in mlx5_esw_offloads_devlink_port_register")
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260305081019.1811100-1-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Commit c7159e960f14 ("usbnet: limit max_mtu based on device's hard_mtu")
capped net->max_mtu to the device's hard_mtu in usbnet_probe(). While
this correctly prevents oversized packets on standard USB network
devices, it breaks the qmi_wwan driver.
qmi_wwan relies on userspace (e.g. ModemManager) setting a large MTU on
the wwan0 interface to configure rx_urb_size via usbnet_change_mtu().
QMI modems negotiate USB transfer sizes of 16,383 or 32,767 bytes, and
the USB receive buffers must be sized accordingly. With max_mtu capped
to hard_mtu (~1500 bytes), userspace can no longer raise the MTU, the
receive buffers remain small, and download speeds drop from >300 Mbps
to ~0.8 Mbps.
Introduce a FLAG_NOMAXMTU driver flag that allows individual usbnet
drivers to opt out of the max_mtu cap. Set this flag in qmi_wwan's
driver_info structures to restore the previous behavior for QMI devices,
while keeping the safety fix in place for all other usbnet drivers.
Fixes: c7159e960f14 ("usbnet: limit max_mtu based on device's hard_mtu")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/lkml/CAPh3n803k8JcBPV5qEzUB-oKzWkAs-D5CU7z=Vd_nLRCr5ZqQg@mail.gmail.com/
Reported-by: Koen Vandeputte <koen.vandeputte@citymesh.com>
Tested-by: Daniele Palmas <dnlplm@gmail.com>
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Link: https://patch.msgid.link/20260304134338.1785002-1-lvivier@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Before the fixed commit, we check slave->new_link during commit
state, which values are only BOND_LINK_{NOCHANGE, UP, DOWN}. After
the commit, we start using slave->link_new_state, which state also could
be BOND_LINK_{FAIL, BACK}.
For example, when we set updelay/downdelay, after a failover,
the slave->link_new_state could be set to BOND_LINK_{FAIL, BACK} in
bond_miimon_inspect(). And later in bond_miimon_commit(), it will treat
it as invalid and print an error, which would cause confusion for users.
[ 106.440254] bond0: (slave veth2): link status down for interface, disabling it in 200 ms
[ 106.440265] bond0: (slave veth2): invalid new link 1 on slave
[ 106.648276] bond0: (slave veth2): link status definitely down, disabling slave
[ 107.480271] bond0: (slave veth2): link status up, enabling it in 200 ms
[ 107.480288] bond0: (slave veth2): invalid new link 3 on slave
[ 107.688302] bond0: (slave veth2): link status definitely up, 10000 Mbps full duplex
Let's handle BOND_LINK_{FAIL, BACK} as valid link states.
Fixes: 1899bb325149 ("bonding: fix state transition issue in link monitoring")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20260304-b4-bond_updelay-v1-2-f72eb2e454d0@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
After commit e0caeb24f538 ("net: bonding: update the slave array for broadcast mode"),
broadcast mode will also set all_slaves and usable_slaves during
bond_enslave(). But if we also set updelay, during enslave, the
slave init state will be BOND_LINK_BACK. And later
bond_update_slave_arr() will alloc usable_slaves but add nothing.
This will cause bond_miimon_inspect() to have ignore_updelay
always true. So the updelay will be always ignored. e.g.
[ 6.498368] bond0: (slave veth2): link status definitely down, disabling slave
[ 7.536371] bond0: (slave veth2): link status up, enabling it in 0 ms
[ 7.536402] bond0: (slave veth2): link status definitely up, 10000 Mbps full duplex
To fix it, we can either always call bond_update_slave_arr() on every
place when link changes. Or, let's just not set usable_slaves for
broadcast mode.
Fixes: e0caeb24f538 ("net: bonding: update the slave array for broadcast mode")
Reported-by: Liang Li <liali@redhat.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20260304-b4-bond_updelay-v1-1-f72eb2e454d0@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Driver core holds a reference to the USB interface and its parent USB
device while the interface is bound to a driver and there is no need to
take additional references unless the structures are needed after
disconnect.
This driver takes a reference to the USB device during probe but does
not to release it on probe failures.
Drop the redundant device reference to fix the leak, reduce cargo
culting, make it easier to spot drivers where an extra reference is
needed, and reduce the risk of further memory leaks.
Fixes: 0791c0327a6e ("net: mctp: Add MCTP USB transport driver")
Cc: stable@vger.kernel.org # 6.15
Signed-off-by: Johan Hovold <johan@kernel.org>
Acked-by: Jeremy Kerr <jk@codeconstruct.com.au>
Link: https://patch.msgid.link/20260305104549.16110-1-johan@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
I found a few more paths that cleanup fails due to a NULL version pointer
on unsupported hardware.
Add NULL checks as applicable.
Fixes: 39fc2bc4da00 ("drm/amdgpu: Protect GPU register accesses in powergated state in some paths")
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit f5a05f8414fc10f307eb965f303580c7778f8dd2)
Cc: stable@vger.kernel.org
|
|
Older versions of the MES firmware may cause abnormal GPU power consumption.
When performing inference tasks on the GPU (e.g., with Ollama using ROCm),
the GPU may show abnormal power consumption in idle state and incorrect GPU load information.
This issue has been fixed in firmware version 0x8b and newer.
Closes: https://github.com/ROCm/ROCm/issues/5706
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 4e22a5fe6ea6e0b057e7f246df4ac3ff8bfbc46a)
|
|
The following members of struct amdgpu_mode_info do not have valid
references in the related kernel-doc sections:
- plane_shaper_lut_property
- plane_shaper_lut_size_property,
- plane_lut3d_size_property
Correct all affected comment blocks.
Fixes: f545d82479b4 ("drm/amd/display: add plane shaper LUT and TF driver-specific properties")
Fixes: 671994e3bf33 ("drm/amd/display: add plane 3D LUT driver-specific properties")
Reviewed-by: Melissa Wen <mwen@igalia.com>
Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit ec5708d6e547f7efe2f009073bfa98dbc4c5c2ac)
|
|
When GPU initialization fails due to an unsupported HW block
IP blocks may have a NULL version pointer. During cleanup in
amdgpu_device_fini_hw, the code calls amdgpu_device_set_pg_state and
amdgpu_device_set_cg_state which iterate over all IP blocks and access
adev->ip_blocks[i].version without NULL checks, leading to a kernel
NULL pointer dereference.
Add NULL checks for adev->ip_blocks[i].version in both
amdgpu_device_set_cg_state and amdgpu_device_set_pg_state to prevent
dereferencing NULL pointers during GPU teardown when initialization has
failed.
Fixes: 39fc2bc4da00 ("drm/amdgpu: Protect GPU register accesses in powergated state in some paths")
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit b7ac77468cda92eecae560b05f62f997a12fe2f2)
Cc: stable@vger.kernel.org
|
|
add missing od setting PP_OD_FEATURE_ZERO_FAN_BIT for smu v14.0.2/14.0.3
Fixes: 9710b84e2a6a ("drm/amd/pm: add overdrive support on smu v14.0.2/3")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/5018
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 1b5cf07d80bb16d1593579ccdb23f08ea4262c14)
|
|
add missing od setting PP_OD_FEATURE_ZERO_FAN_BIT for smu v13.0.0/13.0.7
Fixes: cfffd980bf21 ("drm/amd/pm: add zero RPM OD setting support for SMU13")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/5018
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 576a10797b607ee9e4068218daf367b481564120)
|
|
Pull drm fixes from Dave Airlie:
"Weekly fixes pull.
There is one mm fix in here for a HMM livelock triggered by the xe
driver tests. Otherwise it's a pretty wide range of fixes across the
board, ttm UAF regression fix, amdgpu fixes, nouveau doesn't crash my
laptop anymore fix, and a fair bit of misc.
Seems about right for rc3.
mm:
- mm: Fix a hmm_range_fault() livelock / starvation problem
pagemap:
- Revert "drm/pagemap: Disable device-to-device migration"
ttm:
- fix function return breaking reclaim
- fix build failure on PREEMPT_RT
- fix bo->resource UAF
dma-buf:
- include ioctl.h in uapi header
sched:
- fix kernel doc warning
amdgpu:
- LUT fixes
- VCN5 fix
- Dispclk fix
- SMU 13.x fix
- Fix race in VM acquire
- PSP 15.x fix
- UserQ fix
amdxdna:
- fix invalid payload for failed command
- fix NULL ptr dereference
- fix major fw version check
- avoid inconsistent fw state on error
i915/display:
- Fix for Lenovo T14 G7 display not refreshing
xe:
- Do not preempt fence signaling CS instructions
- Some leak and finalization fixes
- Workaround fix
nouveau:
- avoid runtime suspend oops when using dp aux
panthor:
- fix gem_sync argument ordering
solomon:
- fix incorrect display output
renesas:
- fix DSI divider programming
ethosu:
- fix job submit error clean-up refcount
- fix NPU_OP_ELEMENTWISE validation
- handle possible underflows in IFM size calcs"
* tag 'drm-fixes-2026-03-07' of https://gitlab.freedesktop.org/drm/kernel: (38 commits)
accel: ethosu: Handle possible underflow in IFM size calculations
accel: ethosu: Fix NPU_OP_ELEMENTWISE validation with scalar
accel: ethosu: Fix job submit error clean-up refcount underflows
accel/amdxdna: Split mailbox channel create function
drm/panthor: Correct the order of arguments passed to gem_sync
Revert "drm/syncobj: Fix handle <-> fd ioctls with dirty stack"
drm/ttm: Fix bo resource use-after-free
nouveau/dpcd: return EBUSY for aux xfer if the device is asleep
accel/amdxdna: Fix major version check on NPU1 platform
drm/amdgpu/userq: refcount userqueues to avoid any race conditions
drm/amdgpu/userq: Consolidate wait ioctl exit path
drm/amdgpu/psp: Use Indirect access address for GFX to PSP mailbox
drm/amdgpu: Fix use-after-free race in VM acquire
drm/amd/pm: remove invalid gpu_metrics.energy_accumulator on smu v13.0.x
drm/xe: Fix memory leak in xe_vm_madvise_ioctl
drm/xe/reg_sr: Fix leak on xa_store failure
drm/xe/xe2_hpg: Correct implementation of Wa_16025250150
drm/xe/gsc: Fix GSC proxy cleanup on early initialization failure
Revert "drm/pagemap: Disable device-to-device migration"
drm/i915/psr: Fix for Panel Replay X granularity DPCD register handling
...
|
|
Recently, we round up new_hdisplay once at most, for bonded dsi, we
may need twice, since they are independent links, we should round up
each half separately. This also aligns with the hdisplay we program
later in dsi_timing_setup()
Example:
full_hdisplay = 1904, dsc_bpp = 8, bpc = 8
new_full_hdisplay = DIV_ROUND_UP(1904 * 8, 8 * 3) = 635
if we use half display
new_half_hdisplay = DIV_ROUND_UP(952 * 8, 8 * 3) = 318
new_full_display = 636
Fixes: 7c9e4a554d4a ("drm/msm/dsi: Reduce pclk rate for compression")
Signed-off-by: Pengyu Luo <mitltlatltl@gmail.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Patchwork: https://patchwork.freedesktop.org/patch/709716/
Link: https://lore.kernel.org/r/20260306163255.215456-1-mitltlatltl@gmail.com
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi fix from Mark Brown:
"One device specific fix here, it was possible we might end up trying
to dereference an invalid pointer while reporting a transfer timeout
on DesignWare controllers"
* tag 'spi-fix-v7.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: spi-dw-dma: fix print error log when wait finish transaction
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
Pull regulator fixes from Mark Brown:
"A couple of small, driver specific fixes which might not even have
much impact if you have the affected devices depending on your setup"
* tag 'regulator-fix-v7.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: pf9453: Respect IRQ trigger settings from firmware
regulator: mt6363: Fix incorrect and redundant IRQ disposal in probe
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid
Pull HID fixes from Benjamin Tissoires:
- fix a few memory leaks (Günther Noack)
- fix potential kernel crashes in cmedia, creative-sb0540 and zydacron
(Greg Kroah-Hartman)
- fix NULL pointer dereference in pidff (Tomasz Pakuła)
- fix battery reporting for Apple Magic Trackpad 2 (Julius Lehmann)
- mcp2221 proper handling of failed read operation (Romain Sioen)
- various device quirks / device ID additions
* tag 'hid-for-linus-2026030601' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
HID: mcp2221: cancel last I2C command on read error
HID: asus: add xg mobile 2023 external hardware support
HID: multitouch: Keep latency normal on deactivate for reactivation gesture
HID: apple: Add EPOMAKER TH87 to the non-apple keyboards list
HID: intel-ish-hid: ipc: Add Nova Lake-H/S PCI device IDs
selftests: hid: tests: test_wacom_generic: add tests for display devices and opaque devices
HID: multitouch: new class MT_CLS_EGALAX_P80H84
HID: magicmouse: fix battery reporting for Apple Magic Trackpad 2
HID: pidff: Fix condition effect bit clearing
HID: Add HID_CLAIMED_INPUT guards in raw_event callbacks missing them
HID: asus: avoid memory leak in asus_report_fixup()
HID: magicmouse: avoid memory leak in magicmouse_report_fixup()
HID: apple: avoid memory leak in apple_report_fixup()
HID: Document memory allocation properties of report_fixup()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform driver fixes from Ilpo Järvinen:
- alienware-wmi-wmax: Add G-Mode support to m18 laptops
- asus-armoury: Add support for FA401UM, G733QS, GX650RX
- dell-wmi-sysman: Don't hex dump plaintext password data
- hp-bioscfg: Support large number of enumeration attributes
- hp-wmi: Add support for Omen 14-fb1xxx, 16-xd0xxx, 16-wf0xxx, and
Victus-d0xxx
- int3472: Handle GPIO type 0x10 (DOVDD)
- intel-hid:
- Add Dell 14 & 16 Plus 2-in-1 to dmi_vgbs_allow_list
- Enable 5-button array on ThinkPad X1 Fold 16 Gen 1
- mellanox: mlxreg: Fix kernel-doc warnings
- oxpec: Add support for OneXPlayer X1 Air, X1z, APEX, and Aokzoe A2
Pro
- redmi-wmi: Add more Fn hotkey mappings
- thinkpad_acpi: Fix errors reading battery thresholds
- touchscreen_dmi: Add quirk for y-inverted Goodix touchscreen on SUPI
S10
- uniwill-laptop:
- FN lock/super key lock attributes rename
- Fix crash on unexpected battery event
- A special key combination can alter FN lock status so mark it
volatile
- Handle FN lock event
* tag 'platform-drivers-x86-v7.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: (27 commits)
platform/x86: dell-wmi-sysman: Don't hex dump plaintext password data
platform_data/mlxreg: mlxreg.h: fix all kernel-doc warnings
platform/x86: asus-armoury: add support for FA401UM
platform/x86: asus-armoury: add support for GX650RX
platform/x86: hp-bioscfg: Support allocations of larger data
platform/x86: oxpec: Add support for Aokzoe A2 Pro
platform/x86: oxpec: Add support for OneXPlayer X1 Air
platform/x86: oxpec: Add support for OneXPlayer X1z
platform/x86: oxpec: Add support for OneXPlayer APEX
platform/x86: uniwill-laptop: Handle FN lock event
platform/x86: uniwill-laptop: Mark FN lock status as being volatile
platform/x86: uniwill-laptop: Fix crash on unexpected battery event
platform/x86: uniwill-laptop: Rename FN lock and super key lock attrs
platform/x86: redmi-wmi: Add more hotkey mappings
platform/x86: alienware-wmi-wmax: Add G-Mode support to m18 laptops
platform/x86: hp-wmi: add Omen 14-fb1xxx (board 8E41) support
platform/x86: dell-wmi: Add audio/mic mute key codes
platform/x86: hp-wmi: Add Victus 16-d0xxx support
platform/x86: intel-hid: Enable 5-button array on ThinkPad X1 Fold 16 Gen 1
platform/x86: int3472: Handle GPIO type 0x10 (DOVDD)
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm
Pull pmdomain fixes from Ulf Hansson:
- rockchip: Fix PD_VCODEC for RK3588
- bcm: Fix broken reset status read for bcm2835
* tag 'pmdomain-v7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm:
pmdomain: rockchip: Fix PD_VCODEC for RK3588
pmdomain: bcm: bcm2835-power: Fix broken reset status read
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fixes from Herbert Xu:
- Fix use-after-free in ccp
- Fix bug when SEV is disabled in ccp
- Fix tfm_count leak in atmel-sha204a
* tag 'v7.0-p2' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: atmel-sha204a - Fix OOM ->tfm_count leak
crypto: ccp - Fix use-after-free on error path
crypto: ccp - allow callers to use HV-Fixed page API when SEV is disabled
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux
Pull ata fixes from Niklas Cassel:
- Fix a problem where the deferred non-NCQ command would incorrectly
get completed as a failed command, if there was another command that
timed out. Found by Gemini (Guenter)
- The deferred non-NCQ command work is only supposed to run after the
last NCQ command finishes. However, because the work was never
canceled on error (e.g. a timeout), the work could incorrectly run
when commands were still in flight. Found by syzbot (me)
- Add a quirk to make sure that QEMU harddrives can potentially use up
to 32 MiB I/Os (Pedro)
- Add a quirk to disable LPM on Seagate ST1000DM010-2EP102 (Maximilian)
* tag 'ata-7.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux:
ata: libata-eh: Fix detection of deferred qc timeouts
ata: libata-core: Add BRIDGE_OK quirk for QEMU drives
ata: libata: cancel pending work after clearing deferred_qc
ata: libata-core: Disable LPM on ST1000DM010-2EP102
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux
Pull block fixes from Jens Axboe:
- NVMe pull request via Keith:
- Improve quirk visibility and configurability (Maurizio)
- Fix runtime user modification to queue setup (Keith)
- Fix multipath leak on try_module_get failure (Keith)
- Ignore ambiguous spec definitions for better atomics support
(John)
- Fix admin queue leak on controller reset (Ming)
- Fix large allocation in persistent reservation read keys
(Sungwoo Kim)
- Fix fcloop callback handling (Justin)
- Securely free DHCHAP secrets (Daniel)
- Various cleanups and typo fixes (John, Wilfred)
- Avoid a circular lock dependency issue in the sysfs nr_requests or
scheduler store handling
- Fix a circular lock dependency with the pcpu mutex and the queue
freeze lock
- Cleanup for bio_copy_kern(), using __bio_add_page() rather than the
bio_add_page(), as adding a page here cannot fail. The exiting code
had broken cleanup for the error condition, so make it clear that the
error condition cannot happen
- Fix for a __this_cpu_read() in preemptible context splat
* tag 'block-7.0-20260305' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
block: use trylock to avoid lockdep circular dependency in sysfs
nvme: fix memory allocation in nvme_pr_read_keys()
block: use __bio_add_page in bio_copy_kern
block: break pcpu_alloc_mutex dependency on freeze_lock
blktrace: fix __this_cpu_read/write in preemptible context
nvme-multipath: fix leak on try_module_get failure
nvmet-fcloop: Check remoteport port_state before calling done callback
nvme-pci: do not try to add queue maps at runtime
nvme-pci: cap queue creation to used queues
nvme-pci: ensure we're polling a polled queue
nvme: fix memory leak in quirks_param_set()
nvme: correct comment about nvme_ns_remove()
nvme: stop setting namespace gendisk device driver data
nvme: add support for dynamic quirk configuration via module parameter
nvme: fix admin queue leak on controller reset
nvme-fabrics: use kfree_sensitive() for DHCHAP secrets
nvme: stop using AWUPF
nvme: expose active quirks in sysfs
nvme/host: fixup some typos
|
|
According to the FF-A specification (DEN0077, v1.1, §13.7), when
FFA_RXTX_UNMAP is invoked from any instance other than non-secure
physical, the w1 register must be zero (MBZ). If a non-zero value is
supplied in this context, the SPMC must return FFA_INVALID_PARAMETER.
The Arm FF-A driver operates exclusively as a guest or non-secure
physical instance where the partition ID is always zero and is not
invoked from a hypervisor context where w1 carries a VM ID. In this
execution model, the partition ID observed by the driver is always zero,
and passing a VM ID is unnecessary and potentially invalid.
Remove the vm_id parameter from ffa_rxtx_unmap() and ensure that the
SMC call is issued with w1 implicitly zeroed, as required by the
specification. This prevents invalid parameter errors and aligns the
implementation with the defined FF-A ABI behavior.
Fixes: 3bbfe9871005 ("firmware: arm_ffa: Add initial Arm FFA driver support")
Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com>
Message-Id: <20260304120953.847671-1-yeoreum.yun@arm.com>
Signed-off-by: Sudeep Holla <sudeep.holla@kernel.org>
|
|
Commit e7e222ad73d9 ("cxl: Move devm_cxl_add_nvdimm_bridge() to
cxl_pmem.ko") moves devm_cxl_add_nvdimm_bridge() into the cxl_pmem file,
which has independent config compile options for built-in or module. The
call from cxl_acpi_probe() is guarded by IS_ENABLED(CONFIG_CXL_PMEM),
which evaluates to true for both =y and =m.
When CONFIG_CXL_PMEM=m, a built-in cxl_acpi attempts to reference a
symbol exported by a module, which fails to link. CXL_PMEM cannot simply
be promoted to =y in this configuration because it depends on LIBNVDIMM,
which may itself be =m.
Add a Kconfig dependency to prevent CXL_ACPI from being built-in when
CXL_PMEM is a module. This contrains CXL_ACPI to =m when CXL_PMEM=m,
while still allowing CXL_ACPI to be freely configured when CXL_PMEM is
either built-in or disabled.
[ dj: Fix up commit reference formatting. ]
Fixes: e7e222ad73d9 ("cxl: Move devm_cxl_add_nvdimm_bridge() to cxl_pmem.ko")
Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Link: https://patch.msgid.link/20260305204057.1516948-1-kbusch@meta.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
|
|
The rzt2h_gpio_get_direction() function is called from
gpiod_get_direction(), which ends up being used within the __setup_irq()
call stack when requesting an interrupt.
__setup_irq() holds a raw_spinlock_t with IRQs disabled, which creates
an atomic context. spinlock_t cannot be used within atomic context
when PREEMPT_RT is enabled, since it may become a sleeping lock.
An "[ BUG: Invalid wait context ]" splat is observed when running with
CONFIG_PROVE_LOCKING enabled, describing exactly the aforementioned call
stack.
__setup_irq() needs to hold a raw_spinlock_t with IRQs disabled to
serialize access against a concurrent hard interrupt.
Switch to raw_spinlock_t to fix this.
Fixes: 829dde3369a9 ("pinctrl: renesas: rzt2h: Add GPIO IRQ chip to handle interrupts")
Signed-off-by: Cosmin Tanislav <cosmin-gabriel.tanislav.xa@renesas.com>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/20260205103930.666051-1-cosmin-gabriel.tanislav.xa@renesas.com
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
|
|
When calling of_parse_phandle_with_fixed_args(), the caller is
responsible for calling of_node_put() to release the device node
reference.
In rzt2h_gpio_register(), the driver fails to call of_node_put() to
release the reference in of_args.np, which causes a memory leak.
Add the missing of_node_put() call to fix the leak.
Fixes: 34d4d093077a ("pinctrl: renesas: Add support for RZ/T2H")
Signed-off-by: Felix Gu <ustc.gu@gmail.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/20260127-rzt2h-v1-1-86472e7421b8@gmail.com
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
|
|
ublk_ctrl_set_size() unconditionally dereferences ub->ub_disk via
set_capacity_and_notify() without checking if it is NULL.
ub->ub_disk is NULL before UBLK_CMD_START_DEV completes (it is only
assigned in ublk_ctrl_start_dev()) and after UBLK_CMD_STOP_DEV runs
(ublk_detach_disk() sets it to NULL). Since the UBLK_CMD_UPDATE_SIZE
handler performs no state validation, a user can trigger a NULL pointer
dereference by sending UPDATE_SIZE to a device that has been added but
not yet started, or one that has been stopped.
Fix this by checking ub->ub_disk under ub->mutex before dereferencing
it, and returning -ENODEV if the disk is not available.
Fixes: 98b995660bff ("ublk: Add UBLK_U_CMD_UPDATE_SIZE")
Cc: stable@vger.kernel.org
Signed-off-by: Mehul Rao <mehulrao@gmail.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Since the recent additions to PMSR capabilities, it's no longer
sufficient to call parse_pmsr_capa() here since the capabilities
that were added aren't represented/filled by it. Always init the
data to zero to avoid using uninitialized memory.
Fixes: 86c6b6e4d187 ("wifi: nl80211/cfg80211: add new FTM capabilities")
Reported-by: syzbot+c686c6b197d10ff3a749@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/69a67aa3.a70a0220.b118c.000a.GAE@google.com/
Link: https://patch.msgid.link/20260303113739.176403-2-johannes@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes
Cross-subsystem Changes:
- mm: Fix a hmm_range_fault() livelock / starvation problem (Thomas)
Core Changes:
- Revert "drm/pagemap: Disable device-to-device migration" (Thomas)
Driver Changes:
- Do not preempt fence signaling CS instructions (Brost)
- Some leak and finalization fixes (Shuicheng, Tomasz, Varun, Zhanjun)
- Workaround fix (Roper)
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patch.msgid.link/aamGvvGRBRtX8-6u@intel.com
|
|
https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes
Another early drm-misc-fixes PR to revert the previous uapi fix sent in
drm-misc-fixes-2026-03-05, together with a UAF fix in TTM, an argument
order fix for panthor, a fix for the firmware getting stuck on
resource allocation error handling for amdxdna, and a few fixes for
ethosu (size calculation and reference underflows, and a validation
fix).
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maxime Ripard <mripard@redhat.com>
Link: https://patch.msgid.link/20260306-grumpy-pegasus-of-witchcraft-6bd2db@houat
|
|
https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes
A return type fix for ttm, a display fix for solomon, several misc fixes
for amdxdna, a DSI clock rate fix for rz-du, a uapi fix for syncobj, a
possible build failure fix for dma-buf, a doc warning fix for sched, a
build failure fix for ttm tests, and a crash fix when suspended for
nouveau.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maxime Ripard <mripard@redhat.com>
Link: https://patch.msgid.link/20260305-ludicrous-quirky-raven-7cdafd@houat
|
|
If the ata_qc_for_each_raw() loop finishes without finding a matching SCSI
command for any QC, the variable qc will hold a pointer to the last element
examined, which has the tag i == ATA_MAX_QUEUE - 1. This qc can match the
port deferred QC (ap->deferred_qc).
If that happens, the condition qc == ap->deferred_qc evaluates to true
despite the loop not breaking with a match on the SCSI command for this QC.
In that case, the error handler mistakenly intercepts a command that has
not been issued yet and that has not timed out, and thus erroneously
returning a timeout error.
Fix the problem by checking for i < ATA_MAX_QUEUE in addition to
qc == ap->deferred_qc.
The problem was found by an experimental code review agent based on
gemini-3.1-pro while reviewing backports into v6.18.y.
Assisted-by: Gemini:gemini-3.1-pro
Fixes: eddb98ad9364 ("ata: libata-eh: correctly handle deferred qc timeouts")
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
[cassel: modified commit log as suggested by Damien]
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Niklas Cassel <cassel@kernel.org>
|
|
https://gitlab.freedesktop.org/drm/i915/kernel into drm-fixes
- Fix for #7284: Lenovo T14 G7 display not refreshing
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patch.msgid.link/aakz17Jx3Ye9Vqci@jlahtine-mobl
|
|
rx_packets should report the number of frames successfully received:
unicast + multicast + broadcast. Subtracting ifOutDiscards (a TX
counter) is incorrect and can undercount RX packets. RX drops are
already reported via rx_dropped (e.g. etherStatsDropEvents), so
there is no need to adjust rx_packets.
This patch removes the subtraction of ifOutDiscards from rx_packets
in rtl8365mb_stats_update().
Link: https://lore.kernel.org/netdev/878777925.105015.1763423928520@mail.yahoo.com/
Fixes: 4af2950c50c8 ("net: dsa: realtek-smi: add rtl8365mb subdriver for RTL8365MB-VC")
Signed-off-by: Mieczyslaw Nalewaj <namiltd@yahoo.com>
Signed-off-by: Luiz Angelo Daros de Luca <luizluca@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Acked-by: Linus Walleij <linusw@kernel.org>
Link: https://patch.msgid.link/20260303-realtek_namiltd_fix2-v1-1-bfa433d3401e@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The hardware revision for TSMC 3nm-based Qualcomm SOCs should be 7.2,
this can be confirmed from REG_DSI_7nm_PHY_CMN_REVISION_ID0, the value
is 0x27, which means hardware revision is 7.2
No functional change.
Fixes: 1337d7ebfb6d ("drm/msm/dsi/phy: Add support for SM8750")
Suggested-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Signed-off-by: Pengyu Luo <mitltlatltl@gmail.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Patchwork: https://patchwork.freedesktop.org/patch/707414/
Link: https://lore.kernel.org/r/20260226122958.22555-2-mitltlatltl@gmail.com
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
|
|
The intr_underrun and intr_vsync indices have been swapped, just simply
corrects them.
Cc: stable@vger.kernel.org
Fixes: b139c80d181c ("drm/msm/dpu: Add SA8775P support")
Signed-off-by: Abhinav Kumar <quic_abhinavk@quicinc.com>
Signed-off-by: Yongxing Mou <yongxing.mou@oss.qualcomm.com>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Patchwork: https://patchwork.freedesktop.org/patch/709209/
Link: https://lore.kernel.org/r/20260305-mdss_catalog-v5-2-06678ac39ac7@oss.qualcomm.com
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
|
|
To disable l2/l3 swizzling in A8x, set the respective bits in both
GRAS_NC_MODE_CNTL and RB_CCU_NC_MODE_CNTL registers. This is required
for Glymur where it is recommended to keep l2/l3 swizzling disabled.
Fixes: 288a93200892 ("drm/msm/adreno: Introduce A8x GPU Support")
Signed-off-by: Akhil P Oommen <akhilpo@oss.qualcomm.com>
Message-ID: <20260305-a8xx-ubwc-fix-v1-1-d99b6da4c5a9@oss.qualcomm.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
|
|
If the command stream has larger padding sizes than the IFM and OFM
diminsions, then the calculations will underflow to a negative value.
The result is a very large region bounds which is caught on submit, but
it's better to catch it earlier.
Current mesa ethosu driver has a signedness bug which resulted in
padding of 127 (the max) and triggers this issue.
Reviewed-and-Tested-by: Anders Roxell <anders.roxell@linaro.org>
Link: https://patch.msgid.link/20260218-ethos-fixes-v1-3-be3fa3ea9a30@kernel.org
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
|
|
The NPU_OP_ELEMENTWISE instruction uses a scalar value for IFM2 if the
IFM2_BROADCAST "scalar" mode is set. It is a bit (7) on the u65 and
part of a field (bits 3:0) on the u85. The driver was hardcoded to the
u85.
Fixes: 5a5e9c0228e6 ("accel: Add Arm Ethos-U NPU driver")
Reviewed-and-Tested-by: Anders Roxell <anders.roxell@linaro.org>
Link: https://patch.msgid.link/20260218-ethos-fixes-v1-2-be3fa3ea9a30@kernel.org
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
|
|
If the job submit fails before adding the job to the scheduler queue
such as when the GEM buffer bounds checks fail, then doing a
ethosu_job_put() results in a pm_runtime_put_autosuspend() without the
corresponding pm_runtime_resume_and_get(). The dma_fence_put()'s are
also unnecessary, but seem to be harmless.
Split the ethosu_job_cleanup() function into 2 parts for the before
and after the job is queued.
Fixes: 5a5e9c0228e6 ("accel: Add Arm Ethos-U NPU driver")
Reviewed-and-Tested-by: Anders Roxell <anders.roxell@linaro.org>
Link: https://patch.msgid.link/20260218-ethos-fixes-v1-1-be3fa3ea9a30@kernel.org
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI support fixes from Rafael Wysocki:
- Revert a commit related to ACPI device power management that was
not supposed to make any functional difference, but it did so and
introduced a regression (Rafael Wysocki)
- Update the _CPC object definition in ACPICA to match ACPI 6.6 and
prevent the kernel from printing a false-positive warning regarding
_CPC output package format on platforms shipping with firmware based
on ACPI 6.6 (Saket Dumbre)
* tag 'acpi-7.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
Revert "ACPI: PM: Let acpi_dev_pm_attach() skip devices without ACPI PM"
ACPICA: Update the _CPC definition to match ACPI 6.6
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from CAN, netfilter and wireless.
Current release - new code bugs:
- sched: cake: fixup cake_mq rate adjustment for diffserv config
- wifi: fix missing ieee80211_eml_params member initialization
Previous releases - regressions:
- tcp: give up on stronger sk_rcvbuf checks (for now)
Previous releases - always broken:
- net: fix rcu_tasks stall in threaded busypoll
- sched:
- fq: clear q->band_pkt_count[] in fq_reset()
- only allow act_ct to bind to clsact/ingress qdiscs and shared
blocks
- bridge: check relevant per-VLAN options in VLAN range grouping
- xsk: fix fragment node deletion to prevent buffer leak
Misc:
- spring cleanup of inactive maintainers"
* tag 'net-7.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (138 commits)
xdp: produce a warning when calculated tailroom is negative
net: enetc: use truesize as XDP RxQ info frag_size
libeth, idpf: use truesize as XDP RxQ info frag_size
i40e: use xdp.frame_sz as XDP RxQ info frag_size
i40e: fix registering XDP RxQ info
ice: change XDP RxQ frag_size from DMA write length to xdp.frame_sz
ice: fix rxq info registering in mbuf packets
xsk: introduce helper to determine rxq->frag_size
xdp: use modulo operation to calculate XDP frag tailroom
selftests/tc-testing: Add tests exercising act_ife metalist replace behaviour
net/sched: act_ife: Fix metalist update behavior
selftests: net: add test for IPv4 route with loopback IPv6 nexthop
net: ipv6: fix panic when IPv4 route references loopback IPv6 nexthop
net: vxlan: fix nd_tbl NULL dereference when IPv6 is disabled
net: bridge: fix nd_tbl NULL dereference when IPv6 is disabled
MAINTAINERS: remove Thomas Falcon from IBM ibmvnic
MAINTAINERS: remove Claudiu Manoil and Alexandre Belloni from Ocelot switch
MAINTAINERS: replace Taras Chornyi with Elad Nachman for Marvell Prestera
MAINTAINERS: remove Jonathan Lemon from OpenCompute PTP
MAINTAINERS: replace Clark Wang with Frank Li for Freescale FEC
...
|
|
Merge a fix updating the _CPC object definition in ACPICA to avoid
printing a false-positive output package format warning on new
platforms (Saket Dumbre)
* acpica:
ACPICA: Update the _CPC definition to match ACPI 6.6
|
|
The management channel used for firmware control command submission is
currently created after the firmware is started. If channel creation
fails (for example, due to memory allocation failure or workqueue
creation interruption), the firmware remains in a pending state and is
unable to receive any control commands.
To avoid leaving the firmware in this inconsistent state, split
xdna_mailbox_create_channel() into two separate functions so that
resource allocation can be completed before interacting with the
hardware.
xdna_mailbox_alloc_channel()
Allocates memory and initializes the workqueue. This can be called
earlier, before interacting with the hardware.
xdna_mailbox_start_channel()
Performs the hardware interaction required to start the channel.
Rename xdna_mailbox_destroy_channel() to xdna_mailbox_free_channel().
Ensure that xdna_mailbox_stop_channel() and xdna_mailbox_free_channel()
properly unwind the corresponding start and allocation steps, respectively.
Fixes: b87f920b9344 ("accel/amdxdna: Support hardware mailbox")
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
Link: https://patch.msgid.link/20260305062041.3954024-1-lizhi.hou@amd.com
|
|
Smatch reports unreachable code in imx_rproc_prepare(), where an early
return inside the reserved-memory parsing loop prevents platform
prepare_ops from being executed.
When of_reserved_mem_region_to_resource() fails, imx_rproc_prepare()
returns immediately, so the platform-specific prepare callback is never
called. As a result, prepare_ops such as imx_rproc_sm_lmm_prepare() on
i.MX95 have no chance to run.
This is problematic when Linux controls the M7 Logical Machine and is
responsible for preparing resources such as TCM. Without running the
platform prepare callback, loading the M7 ELF into TCM may fail if the
bootloader did not power up and initialize TCM.
Fix this by breaking out of the reserved-memory loop instead of
returning, allowing the platform prepare_ops to be executed as intended.
Fixes: edd2a9956055 ("remoteproc: imx_rproc: Introduce prepare ops for imx_rproc_dcfg")
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/linux-remoteproc/aYYXAa2Fj36XG4yQ@p14s/T/#t
Signed-off-by: Peng Fan <peng.fan@nxp.com>
Reviewed-by: Daniel Baluta <daniel.baluta@nxp.com>
Link: https://lore.kernel.org/r/20260208-imx-rproc-fix-v1-1-ad74555eb9a4@nxp.com
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
|