summaryrefslogtreecommitdiff
path: root/drivers/net
AgeCommit message (Collapse)Author
4 daysMerge tag 'net-7.0-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from CAN and netfilter. Current release - regressions: - eth: mana: Null service_wq on setup error to prevent double destroy Previous releases - regressions: - nexthop: fix percpu use-after-free in remove_nh_grp_entry - sched: teql: fix NULL pointer dereference in iptunnel_xmit on TEQL slave xmit - bpf: fix nd_tbl NULL dereference when IPv6 is disabled - neighbour: restore protocol != 0 check in pneigh update - tipc: fix divide-by-zero in tipc_sk_filter_connect() - eth: - mlx5: - fix crash when moving to switchdev mode - fix DMA FIFO desync on error CQE SQ recovery - iavf: fix PTP use-after-free during reset - bonding: fix type confusion in bond_setup_by_slave() - lan78xx: fix WARN in __netif_napi_del_locked on disconnect Previous releases - always broken: - core: add xmit recursion limit to tunnel xmit functions - net-shapers: don't free reply skb after genlmsg_reply() - netfilter: - fix stack out-of-bounds read in pipapo_drop() - fix OOB read in nfnl_cthelper_dump_table() - mctp: - fix device leak on probe failure - i2c: fix skb memory leak in receive path - can: keep the max bitrate error at 5% - eth: - bonding: fix nd_tbl NULL dereference when IPv6 is disabled - bnxt_en: fix RSS table size check when changing ethtool channels - amd-xgbe: prevent CRC errors during RX adaptation with AN disabled - octeontx2-af: devlink: fix NIX RAS reporter recovery condition" * tag 'net-7.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (71 commits) net: prevent NULL deref in ip[6]tunnel_xmit() octeontx2-af: devlink: fix NIX RAS reporter to use RAS interrupt status octeontx2-af: devlink: fix NIX RAS reporter recovery condition net: ethernet: ti: am65-cpsw-nuss: Fix rx_filter value for PTP support net/mana: Null service_wq on setup error to prevent double destroy selftests: rtnetlink: add neighbour update test neighbour: restore protocol != 0 check in pneigh update net: dsa: realtek: Fix LED group port bit for non-zero LED group tipc: fix divide-by-zero in tipc_sk_filter_connect() net: dsa: microchip: Fix error path in PTP IRQ setup bpf: bpf_out_neigh_v6: Fix nd_tbl NULL dereference when IPv6 is disabled bpf: bpf_out_neigh_v4: Fix nd_tbl NULL dereference when IPv6 is disabled net: bonding: Fix nd_tbl NULL dereference when IPv6 is disabled ipv6: move the disable_ipv6_mod knob to core code net: bcmgenet: fix broken EEE by converting to phylib-managed state net-shapers: don't free reply skb after genlmsg_reply() net: dsa: mxl862xx: don't set user_mii_bus net: ethernet: arc: emac: quiesce interrupts before requesting IRQ page_pool: store detach_time as ktime_t to avoid false-negatives net: macb: Shuffle the tx ring before enabling tx ...
4 daysocteontx2-af: devlink: fix NIX RAS reporter to use RAS interrupt statusAlok Tiwari
The NIX RAS health report path uses nix_af_rvu_err when handling the NIX_AF_RVU_RAS case, so the report prints the ERR interrupt status rather than the RAS interrupt status. Use nix_af_rvu_ras for the NIX_AF_RVU_RAS report. Fixes: 5ed66306eab6 ("octeontx2-af: Add devlink health reporters for NIX") Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com> Link: https://patch.msgid.link/20260310184824.1183651-2-alok.a.tiwari@oracle.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 daysocteontx2-af: devlink: fix NIX RAS reporter recovery conditionAlok Tiwari
The NIX RAS health reporter recovery routine checks nix_af_rvu_int to decide whether to re-enable NIX_AF_RAS interrupts. This is the RVU interrupt status field and is unrelated to RAS events, so the recovery flow may incorrectly skip re-enabling NIX_AF_RAS interrupts. Check nix_af_rvu_ras instead before writing NIX_AF_RAS_ENA_W1S. Fixes: 5ed66306eab6 ("octeontx2-af: Add devlink health reporters for NIX") Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com> Link: https://patch.msgid.link/20260310184824.1183651-1-alok.a.tiwari@oracle.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 daysnet: ethernet: ti: am65-cpsw-nuss: Fix rx_filter value for PTP supportChintan Vankar
The "rx_filter" member of "hwtstamp_config" structure is an enum field and does not support bitwise OR combination of multiple filter values. It causes error while linuxptp application tries to match rx filter version. Fix this by storing the requested filter type in a new port field. Fixes: 97248adb5a3b ("net: ti: am65-cpsw: Update hw timestamping filter for PTPv1 RX packets") Signed-off-by: Chintan Vankar <c-vankar@ti.com> Link: https://patch.msgid.link/20260310160940.109822-1-c-vankar@ti.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 daysnet/mana: Null service_wq on setup error to prevent double destroyShiraz Saleem
In mana_gd_setup() error path, set gc->service_wq to NULL after destroy_workqueue() to match the cleanup in mana_gd_cleanup(). This prevents a use-after-free if the workqueue pointer is checked after a failed setup. Fixes: f975a0955276 ("net: mana: Fix double destroy_workqueue on service rescan PCI path") Signed-off-by: Shiraz Saleem <shirazsaleem@microsoft.com> Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260309172443.688392-1-kotaranov@linux.microsoft.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 daysMerge branch '100GbE' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2026-03-10 (ice, iavf, i40e, e1000e, e1000) Nikolay Aleksandrov changes return code of RDMA related ice devlink get parameters when irdma is not enabled to -EOPNOTSUPP as current return of -ENODEV causes issues with devlink output. Petr Oros resolves a couple of issues in iavf; freeing PTP resources before reset and disable. Fixing contention issues with the netdev lock between reset and some ethtool operations. Alok Tiwari corrects an incorrect comparison of cloud filter values and adjust some passed arguments to sizeof() for consistency on i40e. Matt Vollrath removes an incorrect decrement for DMA error on e1000 and e1000e drivers. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: e1000/e1000e: Fix leak in DMA error cleanup i40e: fix src IP mask checks and memcpy argument names in cloud filter iavf: fix incorrect reset handling in callbacks iavf: fix PTP use-after-free during reset drivers: net: ice: fix devlink parameters get without irdma ==================== Link: https://patch.msgid.link/20260310205654.4109072-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 daysnet: dsa: realtek: Fix LED group port bit for non-zero LED groupMarek Behún
The rtl8366rb_led_group_port_mask() function always returns LED port bit in LED group 0; the switch statement returns the same thing in all non-default cases. This means that the driver does not currently support configuring LEDs in non-zero LED groups. Fix this. Fixes: 32d617005475a71e ("net: dsa: realtek: add LED drivers for rtl8366rb") Signed-off-by: Marek Behún <kabel@kernel.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20260311111237.29002-1-kabel@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 daysnet: dsa: microchip: Fix error path in PTP IRQ setupBastien Curutchet (Schneider Electric)
If request_threaded_irq() fails during the PTP message IRQ setup, the newly created IRQ mapping is never disposed. Indeed, the ksz_ptp_irq_setup()'s error path only frees the mappings that were successfully set up. Dispose the newly created mapping if the associated request_threaded_irq() fails at setup. Cc: stable@vger.kernel.org Fixes: d0b8fec8ae505 ("net: dsa: microchip: Fix symetry in ksz_ptp_msg_irq_{setup/free}()") Signed-off-by: Bastien Curutchet (Schneider Electric) <bastien.curutchet@bootlin.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Link: https://patch.msgid.link/20260309-ksz-ptp-irq-fix-v1-1-757b3b985955@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 daysnet: bonding: Fix nd_tbl NULL dereference when IPv6 is disabledRicardo B. Marlière
When booting with the 'ipv6.disable=1' parameter, the nd_tbl is never initialized because inet6_init() exits before ndisc_init() is called which initializes it. If bonding ARP/NS validation is enabled, an IPv6 NS/NA packet received on a slave can reach bond_validate_na(), which calls bond_has_this_ip6(). That path calls ipv6_chk_addr() and can crash in __ipv6_chk_addr_and_flags(). BUG: kernel NULL pointer dereference, address: 00000000000005d8 Oops: Oops: 0000 [#1] SMP NOPTI RIP: 0010:__ipv6_chk_addr_and_flags+0x69/0x170 Call Trace: <IRQ> ipv6_chk_addr+0x1f/0x30 bond_validate_na+0x12e/0x1d0 [bonding] ? __pfx_bond_handle_frame+0x10/0x10 [bonding] bond_rcv_validate+0x1a0/0x450 [bonding] bond_handle_frame+0x5e/0x290 [bonding] ? srso_alias_return_thunk+0x5/0xfbef5 __netif_receive_skb_core.constprop.0+0x3e8/0xe50 ? srso_alias_return_thunk+0x5/0xfbef5 ? update_cfs_rq_load_avg+0x1a/0x240 ? srso_alias_return_thunk+0x5/0xfbef5 ? __enqueue_entity+0x5e/0x240 __netif_receive_skb_one_core+0x39/0xa0 process_backlog+0x9c/0x150 __napi_poll+0x30/0x200 ? srso_alias_return_thunk+0x5/0xfbef5 net_rx_action+0x338/0x3b0 handle_softirqs+0xc9/0x2a0 do_softirq+0x42/0x60 </IRQ> <TASK> __local_bh_enable_ip+0x62/0x70 __dev_queue_xmit+0x2d3/0x1000 ? srso_alias_return_thunk+0x5/0xfbef5 ? srso_alias_return_thunk+0x5/0xfbef5 ? packet_parse_headers+0x10a/0x1a0 packet_sendmsg+0x10da/0x1700 ? kick_pool+0x5f/0x140 ? srso_alias_return_thunk+0x5/0xfbef5 ? __queue_work+0x12d/0x4f0 __sys_sendto+0x1f3/0x220 __x64_sys_sendto+0x24/0x30 do_syscall_64+0x101/0xf80 ? exc_page_fault+0x6e/0x170 ? srso_alias_return_thunk+0x5/0xfbef5 entry_SYSCALL_64_after_hwframe+0x77/0x7f </TASK> Fix this by checking ipv6_mod_enabled() before dispatching IPv6 packets to bond_na_rcv(). If IPv6 is disabled, return early from bond_rcv_validate() and avoid the path to ipv6_chk_addr(). Suggested-by: Fernando Fernandez Mancera <fmancera@suse.de> Fixes: 4e24be018eb9 ("bonding: add new parameter ns_targets") Signed-off-by: Ricardo B. Marlière <rbm@suse.com> Reviewed-by: Hangbin Liu <liuhangbin@gmail.com> Link: https://patch.msgid.link/20260307-net-nd_tbl_fixes-v4-2-e2677e85628c@suse.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 daysnet: bcmgenet: fix broken EEE by converting to phylib-managed stateNicolai Buchwitz
The bcmgenet EEE implementation is broken in several ways. phy_support_eee() is never called, so the PHY never advertises EEE and phylib never sets phydev->enable_tx_lpi. bcmgenet_mac_config() checks priv->eee.eee_enabled to decide whether to enable the MAC LPI logic, but that field is never initialised to true, so the MAC never enters Low Power Idle even when EEE is negotiated - wasting the power savings EEE is designed to provide. The only way to get EEE working at all is a manual 'ethtool --set-eee eth0 eee on' after every link-up, and even then bcmgenet_get_eee() immediately clobbers the reported state because phy_ethtool_get_eee() overwrites eee_enabled and tx_lpi_enabled with the uninitialised PHY eee_cfg values. Finally, bcmgenet_mac_config() is only called on link-up, so EEE is never disabled in hardware on link-down. Fix all of this by removing the MAC-side EEE state tracking (priv->eee) and aligning with the pattern used by other non-phylink MAC drivers such as FEC. Call phy_support_eee() in bcmgenet_mii_probe() so the PHY advertises EEE link modes and phylib tracks negotiation state. Move the EEE hardware control to bcmgenet_mii_setup(), which is called on every link event, and drive it directly from phydev->enable_tx_lpi - the flag phylib sets when EEE is negotiated and the user has not disabled it. This enables EEE automatically once the link partner agrees and disables it cleanly on link-down. Make bcmgenet_get_eee() and bcmgenet_set_eee() pure passthroughs to phy_ethtool_get_eee() and phy_ethtool_set_eee(), with the MAC hardware register read/written for tx_lpi_timer. Drop struct ethtool_keee eee from struct bcmgenet_priv. Fixes: fe0d4fd9285e ("net: phy: Keep track of EEE configuration") Link: https://lore.kernel.org/netdev/d352039f-4cbb-41e6-9aeb-0b4f3941b54c@lunn.ch/ Suggested-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Nicolai Buchwitz <nb@tipi-net.de> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Tested-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://patch.msgid.link/20260310054935.1238594-1-nb@tipi-net.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 daysnet: dsa: mxl862xx: don't set user_mii_busDaniel Golle
The PHY addresses in the MII bus are not equal to the port addresses, so the bus cannot be assigned as user_mii_bus. Falling back on the user_mii_bus in case a PHY isn't declared in device tree will result in using the wrong (in this case: off-by-+1) PHY. Remove the wrong assignment. Fixes: 23794bec1cb60 ("net: dsa: add basic initial driver for MxL862xx switches") Suggested-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Daniel Golle <daniel@makrotopia.org> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Link: https://patch.msgid.link/0f0df310fd8cab57e0e5e3d0831dd057fd05bcd5.1773103271.git.daniel@makrotopia.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 daysnet: ethernet: arc: emac: quiesce interrupts before requesting IRQFan Wu
Normal RX/TX interrupts are enabled later, in arc_emac_open(), so probe should not see interrupt delivery in the usual case. However, hardware may still present stale or latched interrupt status left by firmware or the bootloader. If probe later unwinds after devm_request_irq() has installed the handler, such a stale interrupt can still reach arc_emac_intr() during teardown and race with release of the associated net_device. Avoid that window by putting the device into a known quiescent state before requesting the IRQ: disable all EMAC interrupt sources and clear any pending EMAC interrupt status bits. This keeps the change hardware-focused and minimal, while preventing spurious IRQ delivery from leftover state. Fixes: e4f2379db6c6 ("ethernet/arc/arc_emac - Add new driver") Cc: stable@vger.kernel.org Signed-off-by: Fan Wu <fanwu01@zju.edu.cn> Link: https://patch.msgid.link/20260309132409.584966-1-fanwu01@zju.edu.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 daysnet: macb: Shuffle the tx ring before enabling txKevin Hao
Quanyang observed that when using an NFS rootfs on an AMD ZynqMp board, the rootfs may take an extended time to recover after a suspend. Upon investigation, it was determined that the issue originates from a problem in the macb driver. According to the Zynq UltraScale TRM [1], when transmit is disabled, the transmit buffer queue pointer resets to point to the address specified by the transmit buffer queue base address register. In the current implementation, the code merely resets `queue->tx_head` and `queue->tx_tail` to '0'. This approach presents several issues: - Packets already queued in the tx ring are silently lost, leading to memory leaks since the associated skbs cannot be released. - Concurrent write access to `queue->tx_head` and `queue->tx_tail` may occur from `macb_tx_poll()` or `macb_start_xmit()` when these values are reset to '0'. - The transmission may become stuck on a packet that has already been sent out, with its 'TX_USED' bit set, but has not yet been processed. However, due to the manipulation of 'queue->tx_head' and 'queue->tx_tail', `macb_tx_poll()` incorrectly assumes there are no packets to handle because `queue->tx_head == queue->tx_tail`. This issue is only resolved when a new packet is placed at this position. This is the root cause of the prolonged recovery time observed for the NFS root filesystem. To resolve this issue, shuffle the tx ring and tx skb array so that the first unsent packet is positioned at the start of the tx ring. Additionally, ensure that updates to `queue->tx_head` and `queue->tx_tail` are properly protected with the appropriate lock. [1] https://docs.amd.com/v/u/en-US/ug1085-zynq-ultrascale-trm Fixes: bf9cf80cab81 ("net: macb: Fix tx/rx malfunction after phy link down and up") Reported-by: Quanyang Wang <quanyang.wang@windriver.com> Signed-off-by: Kevin Hao <haokexin@gmail.com> Cc: stable@vger.kernel.org Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260307-zynqmp-v2-1-6ef98a70e1d0@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 dayse1000/e1000e: Fix leak in DMA error cleanupMatt Vollrath
If an error is encountered while mapping TX buffers, the driver should unmap any buffers already mapped for that skb. Because count is incremented after a successful mapping, it will always match the correct number of unmappings needed when dma_error is reached. Decrementing count before the while loop in dma_error causes an off-by-one error. If any mapping was successful before an unsuccessful mapping, exactly one DMA mapping would leak. In these commits, a faulty while condition caused an infinite loop in dma_error: Commit 03b1320dfcee ("e1000e: remove use of skb_dma_map from e1000e driver") Commit 602c0554d7b0 ("e1000: remove use of skb_dma_map from e1000 driver") Commit c1fa347f20f1 ("e1000/e1000e/igb/igbvf/ixgb/ixgbe: Fix tests of unsigned in *_tx_map()") fixed the infinite loop, but introduced the off-by-one error. This issue may still exist in the igbvf driver, but I did not address it in this patch. Fixes: c1fa347f20f1 ("e1000/e1000e/igb/igbvf/ixgb/ixgbe: Fix tests of unsigned in *_tx_map()") Assisted-by: Claude:claude-4.6-opus Signed-off-by: Matt Vollrath <tactii@gmail.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
6 daysi40e: fix src IP mask checks and memcpy argument names in cloud filterAlok Tiwari
Fix following issues in the IPv4 and IPv6 cloud filter handling logic in both the add and delete paths: - The source-IP mask check incorrectly compares mask.src_ip[0] against tcf.dst_ip[0]. Update it to compare against tcf.src_ip[0]. This likely goes unnoticed because the check is in an "else if" path that only executes when dst_ip is not set, most cloud filter use cases focus on destination-IP matching, and the buggy condition can accidentally evaluate true in some cases. - memcpy() for the IPv4 source address incorrectly uses ARRAY_SIZE(tcf.dst_ip) instead of ARRAY_SIZE(tcf.src_ip), although both arrays are the same size. - The IPv4 memcpy operations used ARRAY_SIZE(tcf.dst_ip) and ARRAY_SIZE (tcf.src_ip), Update these to use sizeof(cfilter->ip.v4.dst_ip) and sizeof(cfilter->ip.v4.src_ip) to ensure correct and explicit copy size. - In the IPv6 delete path, memcmp() uses sizeof(src_ip6) when comparing dst_ip6 fields. Replace this with sizeof(dst_ip6) to make the intent explicit, even though both fields are struct in6_addr. Fixes: e284fc280473 ("i40e: Add and delete cloud filter") Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
6 daysiavf: fix incorrect reset handling in callbacksPetr Oros
Three driver callbacks schedule a reset and wait for its completion: ndo_change_mtu(), ethtool set_ringparam(), and ethtool set_channels(). Waiting for reset in ndo_change_mtu() and set_ringparam() was added by commit c2ed2403f12c ("iavf: Wait for reset in callbacks which trigger it") to fix a race condition where adding an interface to bonding immediately after MTU or ring parameter change failed because the interface was still in __RESETTING state. The same commit also added waiting in iavf_set_priv_flags(), which was later removed by commit 53844673d555 ("iavf: kill "legacy-rx" for good"). Waiting in set_channels() was introduced earlier by commit 4e5e6b5d9d13 ("iavf: Fix return of set the new channel count") to ensure the PF has enough time to complete the VF reset when changing channel count, and to return correct error codes to userspace. Commit ef490bbb2267 ("iavf: Add net_shaper_ops support") added net_shaper_ops to iavf, which required reset_task to use _locked NAPI variants (napi_enable_locked, napi_disable_locked) that need the netdev instance lock. Later, commit 7e4d784f5810 ("net: hold netdev instance lock during rtnetlink operations") and commit 2bcf4772e45a ("net: ethtool: try to protect all callback with netdev instance lock") started holding the netdev instance lock during ndo and ethtool callbacks for drivers with net_shaper_ops. Finally, commit 120f28a6f314 ("iavf: get rid of the crit lock") replaced the driver's crit_lock with netdev_lock in reset_task, causing incorrect behavior: the callback holds netdev_lock and waits for reset_task, but reset_task needs the same lock: Thread 1 (callback) Thread 2 (reset_task) ------------------- --------------------- netdev_lock() [blocked on workqueue] ndo_change_mtu() or ethtool op iavf_schedule_reset() iavf_wait_for_reset() iavf_reset_task() waiting... netdev_lock() <- blocked This does not strictly deadlock because iavf_wait_for_reset() uses wait_event_interruptible_timeout() with a 5-second timeout. The wait eventually times out, the callback returns an error to userspace, and after the lock is released reset_task completes the reset. This leads to incorrect behavior: userspace sees an error even though the configuration change silently takes effect after the timeout. Fix this by extracting the reset logic from iavf_reset_task() into a new iavf_reset_step() function that expects netdev_lock to be already held. The three callbacks now call iavf_reset_step() directly instead of scheduling the work and waiting, performing the reset synchronously in the caller's context which already holds netdev_lock. This eliminates both the incorrect error reporting and the need for iavf_wait_for_reset(), which is removed along with the now-unused reset_waitqueue. The workqueue-based iavf_reset_task() becomes a thin wrapper that acquires netdev_lock and calls iavf_reset_step(), preserving its use for PF-initiated resets. The callbacks may block for several seconds while iavf_reset_step() polls hardware registers, but this is acceptable since netdev_lock is a per-device mutex and only serializes operations on the same interface. v3: - Remove netif_running() guard from iavf_set_channels(). Unlike set_ringparam where descriptor counts are picked up by iavf_open() directly, num_req_queues is only consumed during iavf_reinit_interrupt_scheme() in the reset path. Skipping the reset on a down device would silently discard the channel count change. - Remove dead reset_waitqueue code (struct field, init, and all wake_up calls) since iavf_wait_for_reset() was the only consumer. Fixes: 120f28a6f314 ("iavf: get rid of the crit lock") Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Petr Oros <poros@redhat.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
6 daysiavf: fix PTP use-after-free during resetPetr Oros
Commit 7c01dbfc8a1c5f ("iavf: periodically cache PHC time") introduced a worker to cache PHC time, but failed to stop it during reset or disable. This creates a race condition where `iavf_reset_task()` or `iavf_disable_vf()` free adapter resources (AQ) while the worker is still running. If the worker triggers `iavf_queue_ptp_cmd()` during teardown, it accesses freed memory/locks, leading to a crash. Fix this by calling `iavf_ptp_release()` before tearing down the adapter. This ensures `ptp_clock_unregister()` synchronously cancels the worker and cleans up the chardev before the backing resources are destroyed. Fixes: 7c01dbfc8a1c5f ("iavf: periodically cache PHC time") Signed-off-by: Petr Oros <poros@redhat.com> Reviewed-by: Ivan Vecera <ivecera@redhat.com> Acked-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
6 daysdrivers: net: ice: fix devlink parameters get without irdmaNikolay Aleksandrov
If CONFIG_IRDMA isn't enabled but there are ice NICs in the system, the driver will prevent full devlink dev param show dump because its rdma get callbacks return ENODEV and stop the dump. For example: $ devlink dev param show pci/0000:82:00.0: name msix_vec_per_pf_max type generic values: cmode driverinit value 2 name msix_vec_per_pf_min type generic values: cmode driverinit value 2 kernel answers: No such device Returning EOPNOTSUPP allows the dump to continue so we can see all devices' devlink parameters. Fixes: c24a65b6a27c ("iidc/ice/irdma: Update IDC to support multiple consumers") Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
6 daysMerge tag 'linux-can-fixes-for-7.0-20260310' of ↵Paolo Abeni
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can Marc Kleine-Budde says: ==================== pull-request: can 2026-03-10 this is a pull request of 2 patches for net/main. Haibo Chen's patch fixes the maximum allowed bit rate error, which was broken in v6.19. Wenyuan Li contributes a patch for the hi311x driver that adds missing error checking in the caller of the hi3110_power_enable() function, hi3110_open(). linux-can-fixes-for-7.0-20260310 * tag 'linux-can-fixes-for-7.0-20260310' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can: can: hi311x: hi3110_open(): add check for hi3110_power_enable() return value can: dev: keep the max bitrate error at 5% ==================== Link: https://patch.msgid.link/20260310103547.2299403-1-mkl@pengutronix.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>
6 daysamd-xgbe: reset PHY settings before starting PHYRaju Rangoju
commit f93505f35745 ("amd-xgbe: let the MAC manage PHY PM") moved xgbe_phy_reset() from xgbe_open() to xgbe_start(), placing it after phy_start(). As a result, the PHY settings were being reset after the PHY had already started. Reorder the calls so that the PHY settings are reset before phy_start() is invoked. Fixes: f93505f35745 ("amd-xgbe: let the MAC manage PHY PM") Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: Raju Rangoju <Raju.Rangoju@amd.com> Link: https://patch.msgid.link/20260306111629.1515676-4-Raju.Rangoju@amd.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
6 daysamd-xgbe: prevent CRC errors during RX adaptation with AN disabledRaju Rangoju
When operating in 10GBASE-KR mode with auto-negotiation disabled and RX adaptation enabled, CRC errors can occur during the RX adaptation process. This happens because the driver continues transmitting and receiving packets while adaptation is in progress. Fix this by stopping TX/RX immediately when the link goes down and RX adaptation needs to be re-triggered, and only re-enabling TX/RX after adaptation completes and the link is confirmed up. Introduce a flag to track whether TX/RX was disabled for adaptation so it can be restored correctly. This prevents packets from being transmitted or received during the RX adaptation window and avoids CRC errors from corrupted frames. The flag tracking the data path state is synchronized with hardware state in xgbe_start() to prevent stale state after device restarts. This ensures that after a restart cycle (where xgbe_stop disables TX/RX and xgbe_start re-enables them), the flag correctly reflects that the data path is active. Fixes: 4f3b20bfbb75 ("amd-xgbe: add support for rx-adaptation") Signed-off-by: Raju Rangoju <Raju.Rangoju@amd.com> Link: https://patch.msgid.link/20260306111629.1515676-3-Raju.Rangoju@amd.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
6 daysamd-xgbe: fix link status handling in xgbe_rx_adaptationRaju Rangoju
The link status bit is latched low to allow detection of momentary link drops. If the status indicates that the link is already down, read it again to obtain the current state. Fixes: 4f3b20bfbb75 ("amd-xgbe: add support for rx-adaptation") Signed-off-by: Raju Rangoju <Raju.Rangoju@amd.com> Link: https://patch.msgid.link/20260306111629.1515676-2-Raju.Rangoju@amd.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
6 daysbonding: fix type confusion in bond_setup_by_slave()Jiayuan Chen
kernel BUG at net/core/skbuff.c:2306! Oops: invalid opcode: 0000 [#1] SMP KASAN NOPTI RIP: 0010:pskb_expand_head+0xa08/0xfe0 net/core/skbuff.c:2306 RSP: 0018:ffffc90004aff760 EFLAGS: 00010293 RAX: 0000000000000000 RBX: ffff88807e3c8780 RCX: ffffffff89593e0e RDX: ffff88807b7c4900 RSI: ffffffff89594747 RDI: ffff88807b7c4900 RBP: 0000000000000820 R08: 0000000000000005 R09: 0000000000000000 R10: 00000000961a63e0 R11: 0000000000000000 R12: ffff88807e3c8780 R13: 00000000961a6560 R14: dffffc0000000000 R15: 00000000961a63e0 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fe1a0ed8df0 CR3: 000000002d816000 CR4: 00000000003526f0 Call Trace: <TASK> ipgre_header+0xdd/0x540 net/ipv4/ip_gre.c:900 dev_hard_header include/linux/netdevice.h:3439 [inline] packet_snd net/packet/af_packet.c:3028 [inline] packet_sendmsg+0x3ae5/0x53c0 net/packet/af_packet.c:3108 sock_sendmsg_nosec net/socket.c:727 [inline] __sock_sendmsg net/socket.c:742 [inline] ____sys_sendmsg+0xa54/0xc30 net/socket.c:2592 ___sys_sendmsg+0x190/0x1e0 net/socket.c:2646 __sys_sendmsg+0x170/0x220 net/socket.c:2678 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0x106/0xf80 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7fe1a0e6c1a9 When a non-Ethernet device (e.g. GRE tunnel) is enslaved to a bond, bond_setup_by_slave() directly copies the slave's header_ops to the bond device: bond_dev->header_ops = slave_dev->header_ops; This causes a type confusion when dev_hard_header() is later called on the bond device. Functions like ipgre_header(), ip6gre_header(),all use netdev_priv(dev) to access their device-specific private data. When called with the bond device, netdev_priv() returns the bond's private data (struct bonding) instead of the expected type (e.g. struct ip_tunnel), leading to garbage values being read and kernel crashes. Fix this by introducing bond_header_ops with wrapper functions that delegate to the active slave's header_ops using the slave's own device. This ensures netdev_priv() in the slave's header functions always receives the correct device. The fix is placed in the bonding driver rather than individual device drivers, as the root cause is bond blindly inheriting header_ops from the slave without considering that these callbacks expect a specific netdev_priv() layout. The type confusion can be observed by adding a printk in ipgre_header() and running the following commands: ip link add dummy0 type dummy ip addr add 10.0.0.1/24 dev dummy0 ip link set dummy0 up ip link add gre1 type gre local 10.0.0.1 ip link add bond1 type bond mode active-backup ip link set gre1 master bond1 ip link set gre1 up ip link set bond1 up ip addr add fe80::1/64 dev bond1 Fixes: 1284cd3a2b74 ("bonding: two small fixes for IPoIB support") Suggested-by: Jay Vosburgh <jv@jvosburgh.net> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jiayuan Chen <jiayuan.chen@shopee.com> Link: https://patch.msgid.link/20260306021508.222062-1-jiayuan.chen@linux.dev Signed-off-by: Paolo Abeni <pabeni@redhat.com>
6 dayscan: hi311x: hi3110_open(): add check for hi3110_power_enable() return valueWenyuan Li
In hi3110_open(), the return value of hi3110_power_enable() is not checked. If power enable fails, the device may not function correctly, while the driver still returns success. Add a check for the return value and propagate the error accordingly. Signed-off-by: Wenyuan Li <2063309626@qq.com> Link: https://patch.msgid.link/tencent_B5E2E7528BB28AA8A2A56E16C49BD58B8B07@qq.com Fixes: 57e83fb9b746 ("can: hi311x: Add Holt HI-311x CAN driver") [mkl: adjust subject, commit message and jump label] Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
6 dayscan: dev: keep the max bitrate error at 5%Haibo Chen
Commit b360a13d44db ("can: dev: print bitrate error with two decimal digits") changed calculation of the bit rate error from on-tenth of a percent to on-hundredth of a percent, but forgot to adjust the scale of the CAN_CALC_MAX_ERROR constant. Keeping the existing logic unchanged: Only when the bitrate error exceeds 5% should an error be returned. Otherwise, simply output a warning log. Fixes: b360a13d44db ("can: dev: print bitrate error with two decimal digits") Signed-off-by: Haibo Chen <haibo.chen@nxp.com> Link: https://patch.msgid.link/20260306-can-fix-v1-1-ac526cec6777@nxp.com Cc: stable@kernel.org [mkl: improve commit message] Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
6 daysmctp: i2c: fix skb memory leak in receive pathHaiyue Wang
When 'midev->allow_rx' is false, the newly allocated skb isn't consumed by netif_rx(), it needs to free the skb directly. Fixes: f5b8abf9fc3d ("mctp i2c: MCTP I2C binding driver") Signed-off-by: Haiyue Wang <haiyuewa@163.com> Link: https://patch.msgid.link/20260305143240.97592-1-haiyuewa@163.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
6 daysnet: enetc: do not skip setting LaBCR[MDIO_PHYAD_PRTAD] for addr 0Wei Fang
Given that some platforms may use PHY address 0 (I suppose the PHY may not treat address 0 as a broadcast address or default response address). It is possible for some boards to connect multiple PHYs to the same ENETC MAC, for example: - a PHY with a non-zero address connects to ENETC MAC through SGMII interface (selected via DTS_A) - a PHY with address 0 connects to ENETC MAC through RGMII interface (selected via DTS_B) For the case where the ENETC port MDIO is used to manage the PHY, when switching from DTS_A to DTS_B via soft reboot, LaBCR[MDIO_PHYAD_PRTAD] must be updated to 0 because the NETCMIX block is not reset during soft reboot. However, the current driver explicitly skips configuring address 0, causing LaBCR[MDIO_PHYAD_PRTAD] to retain its old value. Therefore, remove the special-case skip of PHY address 0 so that valid configurations using address 0 are properly supported. Fixes: 6633df05f3ad ("net: enetc: set the external PHY address in IERB for port MDIO usage") Fixes: 50bfd9c06f0f ("net: enetc: set external PHY address in IERB for i.MX94 ENETC") Reviewed-by: Clark Wang <xiaoning.wang@nxp.com> Signed-off-by: Wei Fang <wei.fang@nxp.com> Link: https://patch.msgid.link/20260305031211.904812-3-wei.fang@nxp.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
6 daysnet: enetc: fix incorrect fallback PHY address handlingWei Fang
The current netc_get_phy_addr() implementation falls back to PHY address 0 when the "mdio" node or the PHY child node is missing. On i.MX95, this causes failures when a real PHY is actually assigned address 0 and is managed through the EMDIO interface. Because the bit 0 of phy_mask will be set, leading imx95_enetc_mdio_phyaddr_config() to return an error, and the netc_blk_ctrl driver probe subsequently fails. Fix this by returning -ENODEV when neither an "mdio" node nor any PHY node is present, it means that ENETC port MDIO is not used to manage the PHY, so there is no need to configure LaBCR[MDIO_PHYAD_PRTAD]. Reported-by: Alexander Stein <alexander.stein@ew.tq-group.com> Closes: https://lore.kernel.org/all/7825188.GXAFRqVoOG@steina-w Fixes: 6633df05f3ad ("net: enetc: set the external PHY address in IERB for port MDIO usage") Reviewed-by: Clark Wang <xiaoning.wang@nxp.com> Tested-by: Alexander Stein <alexander.stein@ew.tq-group.com> Signed-off-by: Wei Fang <wei.fang@nxp.com> Link: https://patch.msgid.link/20260305031211.904812-2-wei.fang@nxp.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
6 daysbnxt_en: Fix RSS table size check when changing ethtool channelsPavan Chebbi
When changing channels, the current check in bnxt_set_channels() is not checking for non-default RSS contexts when the RSS table size changes. The current check for IFF_RXFH_CONFIGURED is only sufficient for the default RSS context. Expand the check to include the presence of any non-default RSS contexts. Allowing such change will result in incorrect configuration of the context's RSS table when the table size changes. Fixes: b3d0083caf9a ("bnxt_en: Support RSS contexts in ethtool .{get|set}_rxfh()") Reported-by: Björn Töpel <bjorn@kernel.org> Link: https://lore.kernel.org/netdev/20260303181535.2671734-1-bjorn@kernel.org/ Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20260306225854.3575672-1-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 daysnet: usb: lan78xx: fix WARN in __netif_napi_del_locked on disconnectOleksij Rempel
Remove redundant netif_napi_del() call from disconnect path. A WARN may be triggered in __netif_napi_del_locked() during USB device disconnect: WARNING: CPU: 0 PID: 11 at net/core/dev.c:7417 __netif_napi_del_locked+0x2b4/0x350 This happens because netif_napi_del() is called in the disconnect path while NAPI is still enabled. However, it is not necessary to call netif_napi_del() explicitly, since unregister_netdev() will handle NAPI teardown automatically and safely. Removing the redundant call avoids triggering the warning. Full trace: lan78xx 1-1:1.0 enu1: Failed to read register index 0x000000c4. ret = -ENODEV lan78xx 1-1:1.0 enu1: Failed to set MAC down with error -ENODEV lan78xx 1-1:1.0 enu1: Link is Down lan78xx 1-1:1.0 enu1: Failed to read register index 0x00000120. ret = -ENODEV ------------[ cut here ]------------ WARNING: CPU: 0 PID: 11 at net/core/dev.c:7417 __netif_napi_del_locked+0x2b4/0x350 Modules linked in: flexcan can_dev fuse CPU: 0 UID: 0 PID: 11 Comm: kworker/0:1 Not tainted 6.16.0-rc2-00624-ge926949dab03 #9 PREEMPT Hardware name: SKOV IMX8MP CPU revC - bd500 (DT) Workqueue: usb_hub_wq hub_event pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : __netif_napi_del_locked+0x2b4/0x350 lr : __netif_napi_del_locked+0x7c/0x350 sp : ffffffc085b673c0 x29: ffffffc085b673c0 x28: ffffff800b7f2000 x27: ffffff800b7f20d8 x26: ffffff80110bcf58 x25: ffffff80110bd978 x24: 1ffffff0022179eb x23: ffffff80110bc000 x22: ffffff800b7f5000 x21: ffffff80110bc000 x20: ffffff80110bcf38 x19: ffffff80110bcf28 x18: dfffffc000000000 x17: ffffffc081578940 x16: ffffffc08284cee0 x15: 0000000000000028 x14: 0000000000000006 x13: 0000000000040000 x12: ffffffb0022179e8 x11: 1ffffff0022179e7 x10: ffffffb0022179e7 x9 : dfffffc000000000 x8 : 0000004ffdde8619 x7 : ffffff80110bcf3f x6 : 0000000000000001 x5 : ffffff80110bcf38 x4 : ffffff80110bcf38 x3 : 0000000000000000 x2 : 0000000000000000 x1 : 1ffffff0022179e7 x0 : 0000000000000000 Call trace: __netif_napi_del_locked+0x2b4/0x350 (P) lan78xx_disconnect+0xf4/0x360 usb_unbind_interface+0x158/0x718 device_remove+0x100/0x150 device_release_driver_internal+0x308/0x478 device_release_driver+0x1c/0x30 bus_remove_device+0x1a8/0x368 device_del+0x2e0/0x7b0 usb_disable_device+0x244/0x540 usb_disconnect+0x220/0x758 hub_event+0x105c/0x35e0 process_one_work+0x760/0x17b0 worker_thread+0x768/0xce8 kthread+0x3bc/0x690 ret_from_fork+0x10/0x20 irq event stamp: 211604 hardirqs last enabled at (211603): [<ffffffc0828cc9ec>] _raw_spin_unlock_irqrestore+0x84/0x98 hardirqs last disabled at (211604): [<ffffffc0828a9a84>] el1_dbg+0x24/0x80 softirqs last enabled at (211296): [<ffffffc080095f10>] handle_softirqs+0x820/0xbc8 softirqs last disabled at (210993): [<ffffffc080010288>] __do_softirq+0x18/0x20 ---[ end trace 0000000000000000 ]--- lan78xx 1-1:1.0 enu1: failed to kill vid 0081/0 Fixes: e110bc825897 ("net: usb: lan78xx: Convert to PHYLINK for improved PHY and MAC management") Cc: stable@vger.kernel.org Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://patch.msgid.link/20260305143429.530909-5-o.rempel@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 daysnet: usb: lan78xx: skip LTM configuration for LAN7850Oleksij Rempel
Do not configure Latency Tolerance Messaging (LTM) on USB 2.0 hardware. The LAN7850 is a High-Speed (USB 2.0) only device and does not support SuperSpeed features like LTM. Currently, the driver unconditionally attempts to configure LTM registers during initialization. On the LAN7850, these registers do not exist, resulting in writes to invalid or undocumented memory space. This issue was identified during a port to the regmap API with strict register validation enabled. While no functional issues or crashes have been observed from these invalid writes, bypassing LTM initialization on the LAN7850 ensures the driver strictly adheres to the hardware's valid register map. Fixes: 55d7de9de6c3 ("Microchip's LAN7800 family USB 2/3 to 10/100/1000 Ethernet device driver") Cc: stable@vger.kernel.org Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://patch.msgid.link/20260305143429.530909-4-o.rempel@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 daysnet: usb: lan78xx: fix TX byte statistics for small packetsOleksij Rempel
Account for hardware auto-padding in TX byte counters to reflect actual wire traffic. The LAN7850 hardware automatically pads undersized frames to the minimum Ethernet frame length (ETH_ZLEN, 60 bytes). However, the driver tracks the network statistics based on the unpadded socket buffer length. This results in the tx_bytes counter under-reporting the actual physical bytes placed on the Ethernet wire for small packets (like short ARP or ICMP requests). Use max_t() to ensure the transmission statistics accurately account for the hardware-generated padding. Fixes: d383216a7efe ("lan78xx: Introduce Tx URB processing improvements") Cc: stable@vger.kernel.org Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://patch.msgid.link/20260305143429.530909-3-o.rempel@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 daysnet: usb: lan78xx: fix silent drop of packets with checksum errorsOleksij Rempel
Do not drop packets with checksum errors at the USB driver level; pass them to the network stack. Previously, the driver dropped all packets where the 'Receive Error Detected' (RED) bit was set, regardless of the specific error type. This caused packets with only IP or TCP/UDP checksum errors to be dropped before reaching the kernel, preventing the network stack from accounting for them or performing software fallback. Add a mask for hard hardware errors to safely drop genuinely corrupt frames, while allowing checksum-errored frames to pass with their ip_summed field explicitly set to CHECKSUM_NONE. Fixes: 55d7de9de6c3 ("Microchip's LAN7800 family USB 2/3 to 10/100/1000 Ethernet device driver") Cc: stable@vger.kernel.org Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://patch.msgid.link/20260305143429.530909-2-o.rempel@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 daysserial: caif: hold tty->link reference in ldisc_open and ser_releaseShuangpeng Bai
A reproducer triggers a KASAN slab-use-after-free in pty_write_room() when caif_serial's TX path calls tty_write_room(). The faulting access is on tty->link->port. Hold an extra kref on tty->link for the lifetime of the caif_serial line discipline: get it in ldisc_open() and drop it in ser_release(), and also drop it on the ldisc_open() error path. With this change applied, the reproducer no longer triggers the UAF in my testing. Link: https://gist.github.com/shuangpengbai/c898debad6bdf170a84be7e6b3d8707f Link: https://lore.kernel.org/netdev/20260301220525.1546355-1-shuangpeng.kernel@gmail.com Fixes: e31d5a05948e ("caif: tty's are kref objects so take a reference") Signed-off-by: Shuangpeng Bai <shuangpeng.kernel@gmail.com> Reviewed-by: Jiayuan Chen <jiayuan.chen@linux.dev> Link: https://patch.msgid.link/20260306034006.3395740-1-shuangpeng.kernel@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 daysnet: sfp: improve Huawei MA5671a fixupÁlvaro Fernández Rojas
With the current sfp_fixup_ignore_tx_fault() fixup we ignore the TX_FAULT signal, but we also need to apply sfp_fixup_ignore_los() in order to be able to communicate with the module even if the fiber isn't connected for configuration purposes. This is needed for all the MA5671a firmwares, excluding the FS modded firmware. Fixes: 2069624dac19 ("net: sfp: Add tx-fault workaround for Huawei MA5671A SFP ONT") Signed-off-by: Álvaro Fernández Rojas <noltari@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20260306125139.213637-1-noltari@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 daysMerge tag 'for-linus-7.0-rc3-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fixes from Juergen Gross: - a cleanup of arch/x86/kernel/head_64.S removing the pre-built page tables for Xen guests - a small comment update - another cleanup for Xen PVH guests mode - fix an issue with Xen PV-devices backed by driver domains * tag 'for-linus-7.0-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen/xenbus: better handle backend crash xenbus: add xenbus_device parameter to xenbus_read_driver_state() x86/PVH: Use boot params to pass RSDP address in start_info page x86/xen: update outdated comment xen/acpi-processor: fix _CST detection using undersized evaluation buffer x86/xen: Build identity mapping page tables dynamically for XENPV
9 daysnet: spacemit: Fix error handling in emac_tx_mem_map()Vivian Wang
The DMA mappings were leaked on mapping error. Free them with the existing emac_free_tx_buf() function. Fixes: bfec6d7f2001 ("net: spacemit: Add K1 Ethernet MAC") Signed-off-by: Vivian Wang <wangruikang@iscas.ac.cn> Link: https://patch.msgid.link/20260305-k1-ethernet-more-fixes-v2-2-e4e434d65055@iscas.ac.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 daysnet: spacemit: Fix error handling in emac_alloc_rx_desc_buffers()Vivian Wang
Even if we get a dma_mapping_error() while mapping an RX buffer, we should still update rx_ring->head to ensure that the buffers we were able to allocate and map are used. Fix this by breaking out to the existing code after the loop, analogous to the existing handling for skb allocation failure. Fixes: bfec6d7f2001 ("net: spacemit: Add K1 Ethernet MAC") Signed-off-by: Vivian Wang <wangruikang@iscas.ac.cn> Link: https://patch.msgid.link/20260305-k1-ethernet-more-fixes-v2-1-e4e434d65055@iscas.ac.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 daysnet: dsa: sja1105: ensure phylink_replay_link_end() will not be missedVladimir Oltean
Most errors that can occur in sja1105_static_config_reload() are fatal (example: fail to communicate with hardware), but not all are. For example, sja1105_static_config_upload() -> kcalloc() may fail, and if that happens, we have called phylink_replay_link_begin() but never phylink_replay_link_end(). Under that circumstance, all port phylink instances are left in a state where the resolver is stopped with the PHYLINK_DISABLE_REPLAY bit set. We have effectively disabled link management with no way to recover from this condition. Avoid that situation by ensuring phylink_replay_link_begin() is always paired with phylink_replay_link_end(), regardless of whether we faced any errors during switch reset, configuration reload and general state reload. Fixes: 0b2edc531e0b ("net: dsa: sja1105: let phylink help with the replay of link callbacks") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20260304220900.3865120-3-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 daysnet: dsa: sja1105: reorder sja1105_reload_cbs() and phylink_replay_link_end()Vladimir Oltean
Move phylink_replay_link_end() as the last locked operation under sja1105_static_config_reload(). The purpose is to be able to goto this step from the error path of intermediate steps (we must call phylink_replay_link_end()). sja1105_reload_cbs() notably does not depend on port states or link speeds. See commit 954ad9bf13c4 ("net: dsa: sja1105: fix bandwidth discrepancy between tc-cbs software and offload") which has discussed this issue specifically. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20260304220900.3865120-2-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 daysnet/mlx5e: RX, Fix XDP multi-buf frag counting for legacy RQDragos Tatulea
XDP multi-buf programs can modify the layout of the XDP buffer when the program calls bpf_xdp_pull_data() or bpf_xdp_adjust_tail(). The referenced commit in the fixes tag corrected the assumption in the mlx5 driver that the XDP buffer layout doesn't change during a program execution. However, this fix introduced another issue: the dropped fragments still need to be counted on the driver side to avoid page fragment reference counting issues. Such issue can be observed with the test_xdp_native_adjst_tail_shrnk_data selftest when using a payload of 3600 and shrinking by 256 bytes (an upcoming selftest patch): the last fragment gets released by the XDP code but doesn't get tracked by the driver. This results in a negative pp_ref_count during page release and the following splat: WARNING: include/net/page_pool/helpers.h:297 at mlx5e_page_release_fragmented.isra.0+0x4a/0x50 [mlx5_core], CPU#12: ip/3137 Modules linked in: [...] CPU: 12 UID: 0 PID: 3137 Comm: ip Not tainted 6.19.0-rc3+ #12 NONE Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 RIP: 0010:mlx5e_page_release_fragmented.isra.0+0x4a/0x50 [mlx5_core] [...] Call Trace: <TASK> mlx5e_dealloc_rx_wqe+0xcb/0x1a0 [mlx5_core] mlx5e_free_rx_descs+0x7f/0x110 [mlx5_core] mlx5e_close_rq+0x50/0x60 [mlx5_core] mlx5e_close_queues+0x36/0x2c0 [mlx5_core] mlx5e_close_channel+0x1c/0x50 [mlx5_core] mlx5e_close_channels+0x45/0x80 [mlx5_core] mlx5e_safe_switch_params+0x1a5/0x230 [mlx5_core] mlx5e_change_mtu+0xf3/0x2f0 [mlx5_core] netif_set_mtu_ext+0xf1/0x230 do_setlink.isra.0+0x219/0x1180 rtnl_newlink+0x79f/0xb60 rtnetlink_rcv_msg+0x213/0x3a0 netlink_rcv_skb+0x48/0xf0 netlink_unicast+0x24a/0x350 netlink_sendmsg+0x1ee/0x410 __sock_sendmsg+0x38/0x60 ____sys_sendmsg+0x232/0x280 ___sys_sendmsg+0x78/0xb0 __sys_sendmsg+0x5f/0xb0 [...] do_syscall_64+0x57/0xc50 This patch fixes the issue by doing page frag counting on all the original XDP buffer fragments for all relevant XDP actions (XDP_TX , XDP_REDIRECT and XDP_PASS). This is basically reverting to the original counting before the commit in the fixes tag. As frag_page is still pointing to the original tail, the nr_frags parameter to xdp_update_skb_frags_info() needs to be calculated in a different way to reflect the new nr_frags. Fixes: afd5ba577c10 ("net/mlx5e: RX, Fix generating skb from non-linear xdp_buff for legacy RQ") Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Amery Hung <ameryhung@gmail.com> Link: https://patch.msgid.link/20260305142634.1813208-6-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 daysnet/mlx5e: RX, Fix XDP multi-buf frag counting for striding RQDragos Tatulea
XDP multi-buf programs can modify the layout of the XDP buffer when the program calls bpf_xdp_pull_data() or bpf_xdp_adjust_tail(). The referenced commit in the fixes tag corrected the assumption in the mlx5 driver that the XDP buffer layout doesn't change during a program execution. However, this fix introduced another issue: the dropped fragments still need to be counted on the driver side to avoid page fragment reference counting issues. The issue was discovered by the drivers/net/xdp.py selftest, more specifically the test_xdp_native_tx_mb: - The mlx5 driver allocates a page_pool page and initializes it with a frag counter of 64 (pp_ref_count=64) and the internal frag counter to 0. - The test sends one packet with no payload. - On RX (mlx5e_skb_from_cqe_mpwrq_nonlinear()), mlx5 configures the XDP buffer with the packet data starting in the first fragment which is the page mentioned above. - The XDP program runs and calls bpf_xdp_pull_data() which moves the header into the linear part of the XDP buffer. As the packet doesn't contain more data, the program drops the tail fragment since it no longer contains any payload (pp_ref_count=63). - mlx5 device skips counting this fragment. Internal frag counter remains 0. - mlx5 releases all 64 fragments of the page but page pp_ref_count is 63 => negative reference counting error. Resulting splat during the test: WARNING: CPU: 0 PID: 188225 at ./include/net/page_pool/helpers.h:297 mlx5e_page_release_fragmented.isra.0+0xbd/0xe0 [mlx5_core] Modules linked in: [...] CPU: 0 UID: 0 PID: 188225 Comm: ip Not tainted 6.18.0-rc7_for_upstream_min_debug_2025_12_08_11_44 #1 NONE Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:mlx5e_page_release_fragmented.isra.0+0xbd/0xe0 [mlx5_core] [...] Call Trace: <TASK> mlx5e_free_rx_mpwqe+0x20a/0x250 [mlx5_core] mlx5e_dealloc_rx_mpwqe+0x37/0xb0 [mlx5_core] mlx5e_free_rx_descs+0x11a/0x170 [mlx5_core] mlx5e_close_rq+0x78/0xa0 [mlx5_core] mlx5e_close_queues+0x46/0x2a0 [mlx5_core] mlx5e_close_channel+0x24/0x90 [mlx5_core] mlx5e_close_channels+0x5d/0xf0 [mlx5_core] mlx5e_safe_switch_params+0x2ec/0x380 [mlx5_core] mlx5e_change_mtu+0x11d/0x490 [mlx5_core] mlx5e_change_nic_mtu+0x19/0x30 [mlx5_core] netif_set_mtu_ext+0xfc/0x240 do_setlink.isra.0+0x226/0x1100 rtnl_newlink+0x7a9/0xba0 rtnetlink_rcv_msg+0x220/0x3c0 netlink_rcv_skb+0x4b/0xf0 netlink_unicast+0x255/0x380 netlink_sendmsg+0x1f3/0x420 __sock_sendmsg+0x38/0x60 ____sys_sendmsg+0x1e8/0x240 ___sys_sendmsg+0x7c/0xb0 [...] __sys_sendmsg+0x5f/0xb0 do_syscall_64+0x55/0xc70 The problem applies for XDP_PASS as well which is handled in a different code path in the driver. This patch fixes the issue by doing page frag counting on all the original XDP buffer fragments for all relevant XDP actions (XDP_TX , XDP_REDIRECT and XDP_PASS). This is basically reverting to the original counting before the commit in the fixes tag. As frag_page is still pointing to the original tail, the nr_frags parameter to xdp_update_skb_frags_info() needs to be calculated in a different way to reflect the new nr_frags. Fixes: 87bcef158ac1 ("net/mlx5e: RX, Fix generating skb from non-linear xdp_buff for striding RQ") Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Cc: Amery Hung <ameryhung@gmail.com> Reviewed-by: Nimrod Oren <noren@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20260305142634.1813208-5-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 daysnet/mlx5e: Fix DMA FIFO desync on error CQE SQ recoveryGal Pressman
In case of a TX error CQE, a recovery flow is triggered, mlx5e_reset_txqsq_cc_pc() resets dma_fifo_cc to 0 but not dma_fifo_pc, desyncing the DMA FIFO producer and consumer. After recovery, the producer pushes new DMA entries at the old dma_fifo_pc, while the consumer reads from position 0. This causes us to unmap stale DMA addresses from before the recovery. The DMA FIFO is a purely software construct with no HW counterpart. At the point of reset, all WQEs have been flushed so dma_fifo_cc is already equal to dma_fifo_pc. There is no need to reset either counter, similar to how skb_fifo pc/cc are untouched. Remove the 'dma_fifo_cc = 0' reset. This fixes the following WARNING: WARNING: CPU: 0 PID: 0 at drivers/iommu/dma-iommu.c:1240 iommu_dma_unmap_page+0x79/0x90 Modules linked in: mlx5_vdpa vringh vdpa bonding mlx5_ib mlx5_vfio_pci ipip mlx5_fwctl tunnel4 mlx5_core ib_ipoib geneve ip6_gre ip_gre gre nf_tables ip6_tunnel rdma_ucm ib_uverbs ib_umad vfio_pci vfio_pci_core act_mirred act_skbedit act_vlan vhost_net vhost tap ip6table_mangle ip6table_nat ip6table_filter ip6_tables iptable_mangle cls_matchall nfnetlink_cttimeout act_gact cls_flower sch_ingress vhost_iotlb iptable_raw tunnel6 vfio_iommu_type1 vfio openvswitch nsh rpcsec_gss_krb5 auth_rpcgss oid_registry xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink iptable_nat nf_nat xt_addrtype br_netfilter overlay zram zsmalloc rpcrdma ib_iser libiscsi scsi_transport_iscsi rdma_cm iw_cm ib_cm ib_core fuse [last unloaded: nf_tables] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.13.0-rc5_for_upstream_min_debug_2024_12_30_21_33 #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:iommu_dma_unmap_page+0x79/0x90 Code: 2b 4d 3b 21 72 26 4d 3b 61 08 73 20 49 89 d8 44 89 f9 5b 4c 89 f2 4c 89 e6 48 89 ef 5d 41 5c 41 5d 41 5e 41 5f e9 c7 ae 9e ff <0f> 0b 5b 5d 41 5c 41 5d 41 5e 41 5f c3 66 2e 0f 1f 84 00 00 00 00 Call Trace: <IRQ> ? __warn+0x7d/0x110 ? iommu_dma_unmap_page+0x79/0x90 ? report_bug+0x16d/0x180 ? handle_bug+0x4f/0x90 ? exc_invalid_op+0x14/0x70 ? asm_exc_invalid_op+0x16/0x20 ? iommu_dma_unmap_page+0x79/0x90 ? iommu_dma_unmap_page+0x2e/0x90 dma_unmap_page_attrs+0x10d/0x1b0 mlx5e_tx_wi_dma_unmap+0xbe/0x120 [mlx5_core] mlx5e_poll_tx_cq+0x16d/0x690 [mlx5_core] mlx5e_napi_poll+0x8b/0xac0 [mlx5_core] __napi_poll+0x24/0x190 net_rx_action+0x32a/0x3b0 ? mlx5_eq_comp_int+0x7e/0x270 [mlx5_core] ? notifier_call_chain+0x35/0xa0 handle_softirqs+0xc9/0x270 irq_exit_rcu+0x71/0xd0 common_interrupt+0x7f/0xa0 </IRQ> <TASK> asm_common_interrupt+0x22/0x40 Fixes: db75373c91b0 ("net/mlx5e: Recover Send Queue (SQ) from error state") Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20260305142634.1813208-4-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 daysnet/mlx5: Fix peer miss rules host disabled checksCarolina Jubran
The check on mlx5_esw_host_functions_enabled(esw->dev) for adding VF peer miss rules is incorrect. These rules match traffic from peer's VFs, so the local device's host function status is irrelevant. Remove this check to ensure peer VF traffic is properly handled regardless of local host configuration. Also fix the PF peer miss rule deletion to be symmetric with the add path, so only attempt to delete the rule if it was actually created. Fixes: 520369ef43a8 ("net/mlx5: Support disabling host PFs") Signed-off-by: Carolina Jubran <cjubran@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20260305142634.1813208-3-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 daysnet/mlx5: Fix crash when moving to switchdev modePatrisious Haddad
When moving to switchdev mode when the device doesn't support IPsec, we try to clean up the IPsec resources anyway which causes the crash below, fix that by correctly checking for IPsec support before trying to clean up its resources. [27642.515799] WARNING: arch/x86/mm/fault.c:1276 at do_user_addr_fault+0x18a/0x680, CPU#4: devlink/6490 [27642.517159] Modules linked in: xt_conntrack xt_MASQUERADE ip6table_nat ip6table_filter ip6_tables iptable_nat nf_nat xt_addrtype rpcsec_gss_krb5 auth_rpcgss oid_registry overlay mlx5_fwctl nfnetlink zram zsmalloc mlx5_ib fuse rpcrdma rdma_ucm ib_uverbs ib_iser libiscsi scsi_transport_iscsi ib_umad rdma_cm ib_ipoib iw_cm ib_cm mlx5_core ib_core [27642.521358] CPU: 4 UID: 0 PID: 6490 Comm: devlink Not tainted 6.19.0-rc5_for_upstream_min_debug_2026_01_14_16_47 #1 NONE [27642.522923] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 [27642.524528] RIP: 0010:do_user_addr_fault+0x18a/0x680 [27642.525362] Code: ff 0f 84 75 03 00 00 48 89 ee 4c 89 e7 e8 5e b9 22 00 49 89 c0 48 85 c0 0f 84 a8 02 00 00 f7 c3 60 80 00 00 74 22 31 c9 eb ae <0f> 0b 48 83 c4 10 48 89 ea 48 89 de 4c 89 f7 5b 5d 41 5c 41 5d 41 [27642.528166] RSP: 0018:ffff88810770f6b8 EFLAGS: 00010046 [27642.529038] RAX: 0000000000000000 RBX: 0000000000000002 RCX: ffff88810b980f00 [27642.530158] RDX: 00000000000000a0 RSI: 0000000000000002 RDI: ffff88810770f728 [27642.531270] RBP: 00000000000000a0 R08: 0000000000000000 R09: 0000000000000000 [27642.532383] R10: 0000000000000000 R11: 0000000000000000 R12: ffff888103f3c4c0 [27642.533499] R13: 0000000000000000 R14: ffff88810770f728 R15: 0000000000000000 [27642.534614] FS: 00007f197c741740(0000) GS:ffff88856a94c000(0000) knlGS:0000000000000000 [27642.535915] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [27642.536858] CR2: 00000000000000a0 CR3: 000000011334c003 CR4: 0000000000172eb0 [27642.537982] Call Trace: [27642.538466] <TASK> [27642.538907] exc_page_fault+0x76/0x140 [27642.539583] asm_exc_page_fault+0x22/0x30 [27642.540282] RIP: 0010:_raw_spin_lock_irqsave+0x10/0x30 [27642.541134] Code: 07 85 c0 75 11 ba ff 00 00 00 f0 0f b1 17 75 06 b8 01 00 00 00 c3 31 c0 c3 90 0f 1f 44 00 00 53 9c 5b fa 31 c0 ba 01 00 00 00 <f0> 0f b1 17 75 05 48 89 d8 5b c3 89 c6 e8 7e 02 00 00 48 89 d8 5b [27642.543936] RSP: 0018:ffff88810770f7d8 EFLAGS: 00010046 [27642.544803] RAX: 0000000000000000 RBX: 0000000000000202 RCX: ffff888113ad96d8 [27642.545916] RDX: 0000000000000001 RSI: ffff88810770f818 RDI: 00000000000000a0 [27642.547027] RBP: 0000000000000098 R08: 0000000000000400 R09: ffff88810b980f00 [27642.548140] R10: 0000000000000001 R11: ffff888101845a80 R12: 00000000000000a8 [27642.549263] R13: ffffffffa02a9060 R14: 00000000000000a0 R15: ffff8881130d8a40 [27642.550379] complete_all+0x20/0x90 [27642.551010] mlx5e_ipsec_disable_events+0xb6/0xf0 [mlx5_core] [27642.552022] mlx5e_nic_disable+0x12d/0x220 [mlx5_core] [27642.552929] mlx5e_detach_netdev+0x66/0xf0 [mlx5_core] [27642.553822] mlx5e_netdev_change_profile+0x5b/0x120 [mlx5_core] [27642.554821] mlx5e_vport_rep_load+0x419/0x590 [mlx5_core] [27642.555757] ? xa_load+0x53/0x90 [27642.556361] __esw_offloads_load_rep+0x54/0x70 [mlx5_core] [27642.557328] mlx5_esw_offloads_rep_load+0x45/0xd0 [mlx5_core] [27642.558320] esw_offloads_enable+0xb4b/0xc90 [mlx5_core] [27642.559247] mlx5_eswitch_enable_locked+0x34e/0x4f0 [mlx5_core] [27642.560257] ? mlx5_rescan_drivers_locked+0x222/0x2d0 [mlx5_core] [27642.561284] mlx5_devlink_eswitch_mode_set+0x5ac/0x9c0 [mlx5_core] [27642.562334] ? devlink_rate_set_ops_supported+0x21/0x3a0 [27642.563220] devlink_nl_eswitch_set_doit+0x67/0xe0 [27642.564026] genl_family_rcv_msg_doit+0xe0/0x130 [27642.564816] genl_rcv_msg+0x183/0x290 [27642.565466] ? __devlink_nl_pre_doit.isra.0+0x160/0x160 [27642.566329] ? devlink_nl_eswitch_get_doit+0x290/0x290 [27642.567181] ? devlink_nl_pre_doit_parent_dev_optional+0x20/0x20 [27642.568147] ? genl_family_rcv_msg_dumpit+0xf0/0xf0 [27642.568966] netlink_rcv_skb+0x4b/0xf0 [27642.569629] genl_rcv+0x24/0x40 [27642.570215] netlink_unicast+0x255/0x380 [27642.570901] ? __alloc_skb+0xfa/0x1e0 [27642.571560] netlink_sendmsg+0x1f3/0x420 [27642.572249] __sock_sendmsg+0x38/0x60 [27642.572911] __sys_sendto+0x119/0x180 [27642.573561] ? __sys_recvmsg+0x5c/0xb0 [27642.574227] __x64_sys_sendto+0x20/0x30 [27642.574904] do_syscall_64+0x55/0xc10 [27642.575554] entry_SYSCALL_64_after_hwframe+0x4b/0x53 [27642.576391] RIP: 0033:0x7f197c85e807 [27642.577050] Code: c7 c0 ff ff ff ff eb be 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 80 3d 45 08 0d 00 00 41 89 ca 74 10 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 69 c3 55 48 89 e5 53 48 83 ec 38 44 89 4d d0 [27642.579846] RSP: 002b:00007ffebd4e2248 EFLAGS: 00000202 ORIG_RAX: 000000000000002c [27642.581082] RAX: ffffffffffffffda RBX: 000055cfcd9cd2a0 RCX: 00007f197c85e807 [27642.582200] RDX: 0000000000000038 RSI: 000055cfcd9cd490 RDI: 0000000000000003 [27642.583320] RBP: 00007ffebd4e2290 R08: 00007f197c942200 R09: 000000000000000c [27642.584437] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000 [27642.585555] R13: 000055cfcd9cd490 R14: 00007ffebd4e45d1 R15: 000055cfcd9cd2a0 [27642.586671] </TASK> [27642.587121] ---[ end trace 0000000000000000 ]--- [27642.587910] BUG: kernel NULL pointer dereference, address: 00000000000000a0 Fixes: 664f76be38a1 ("net/mlx5: Fix IPsec cleanup over MPV device") Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20260305142634.1813208-2-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 daysnet/mlx5: Fix deadlock between devlink lock and esw->wqCosmin Ratiu
esw->work_queue executes esw_functions_changed_event_handler -> esw_vfs_changed_event_handler and acquires the devlink lock. .eswitch_mode_set (acquires devlink lock in devlink_nl_pre_doit) -> mlx5_devlink_eswitch_mode_set -> mlx5_eswitch_disable_locked -> mlx5_eswitch_event_handler_unregister -> flush_workqueue deadlocks when esw_vfs_changed_event_handler executes. Fix that by no longer flushing the work to avoid the deadlock, and using a generation counter to keep track of work relevance. This avoids an old handler manipulating an esw that has undergone one or more mode changes: - the counter is incremented in mlx5_eswitch_event_handler_unregister. - the counter is read and passed to the ephemeral mlx5_host_work struct. - the work handler takes the devlink lock and bails out if the current generation is different than the one it was scheduled to operate on. - mlx5_eswitch_cleanup does the final draining before destroying the wq. No longer flushing the workqueue has the side effect of maybe no longer cancelling pending vport_change_handler work items, but that's ok since those are disabled elsewhere: - mlx5_eswitch_disable_locked disables the vport eq notifier. - mlx5_esw_vport_disable disarms the HW EQ notification and marks vport->enabled under state_lock to false to prevent pending vport handler from doing anything. - mlx5_eswitch_cleanup destroys the workqueue and makes sure all events are disabled/finished. Fixes: f1bc646c9a06 ("net/mlx5: Use devl_ API in mlx5_esw_offloads_devlink_port_register") Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20260305081019.1811100-1-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 daysqmi_wwan: allow max_mtu above hard_mtu to control rx_urb_sizeLaurent Vivier
Commit c7159e960f14 ("usbnet: limit max_mtu based on device's hard_mtu") capped net->max_mtu to the device's hard_mtu in usbnet_probe(). While this correctly prevents oversized packets on standard USB network devices, it breaks the qmi_wwan driver. qmi_wwan relies on userspace (e.g. ModemManager) setting a large MTU on the wwan0 interface to configure rx_urb_size via usbnet_change_mtu(). QMI modems negotiate USB transfer sizes of 16,383 or 32,767 bytes, and the USB receive buffers must be sized accordingly. With max_mtu capped to hard_mtu (~1500 bytes), userspace can no longer raise the MTU, the receive buffers remain small, and download speeds drop from >300 Mbps to ~0.8 Mbps. Introduce a FLAG_NOMAXMTU driver flag that allows individual usbnet drivers to opt out of the max_mtu cap. Set this flag in qmi_wwan's driver_info structures to restore the previous behavior for QMI devices, while keeping the safety fix in place for all other usbnet drivers. Fixes: c7159e960f14 ("usbnet: limit max_mtu based on device's hard_mtu") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/lkml/CAPh3n803k8JcBPV5qEzUB-oKzWkAs-D5CU7z=Vd_nLRCr5ZqQg@mail.gmail.com/ Reported-by: Koen Vandeputte <koen.vandeputte@citymesh.com> Tested-by: Daniele Palmas <dnlplm@gmail.com> Signed-off-by: Laurent Vivier <lvivier@redhat.com> Link: https://patch.msgid.link/20260304134338.1785002-1-lvivier@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 daysbonding: handle BOND_LINK_FAIL, BOND_LINK_BACK as valid link statesHangbin Liu
Before the fixed commit, we check slave->new_link during commit state, which values are only BOND_LINK_{NOCHANGE, UP, DOWN}. After the commit, we start using slave->link_new_state, which state also could be BOND_LINK_{FAIL, BACK}. For example, when we set updelay/downdelay, after a failover, the slave->link_new_state could be set to BOND_LINK_{FAIL, BACK} in bond_miimon_inspect(). And later in bond_miimon_commit(), it will treat it as invalid and print an error, which would cause confusion for users. [ 106.440254] bond0: (slave veth2): link status down for interface, disabling it in 200 ms [ 106.440265] bond0: (slave veth2): invalid new link 1 on slave [ 106.648276] bond0: (slave veth2): link status definitely down, disabling slave [ 107.480271] bond0: (slave veth2): link status up, enabling it in 200 ms [ 107.480288] bond0: (slave veth2): invalid new link 3 on slave [ 107.688302] bond0: (slave veth2): link status definitely up, 10000 Mbps full duplex Let's handle BOND_LINK_{FAIL, BACK} as valid link states. Fixes: 1899bb325149 ("bonding: fix state transition issue in link monitoring") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Link: https://patch.msgid.link/20260304-b4-bond_updelay-v1-2-f72eb2e454d0@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 daysbonding: do not set usable_slaves for broadcast modeHangbin Liu
After commit e0caeb24f538 ("net: bonding: update the slave array for broadcast mode"), broadcast mode will also set all_slaves and usable_slaves during bond_enslave(). But if we also set updelay, during enslave, the slave init state will be BOND_LINK_BACK. And later bond_update_slave_arr() will alloc usable_slaves but add nothing. This will cause bond_miimon_inspect() to have ignore_updelay always true. So the updelay will be always ignored. e.g. [ 6.498368] bond0: (slave veth2): link status definitely down, disabling slave [ 7.536371] bond0: (slave veth2): link status up, enabling it in 0 ms [ 7.536402] bond0: (slave veth2): link status definitely up, 10000 Mbps full duplex To fix it, we can either always call bond_update_slave_arr() on every place when link changes. Or, let's just not set usable_slaves for broadcast mode. Fixes: e0caeb24f538 ("net: bonding: update the slave array for broadcast mode") Reported-by: Liang Li <liali@redhat.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Link: https://patch.msgid.link/20260304-b4-bond_updelay-v1-1-f72eb2e454d0@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
9 daysnet: mctp: fix device leak on probe failureJohan Hovold
Driver core holds a reference to the USB interface and its parent USB device while the interface is bound to a driver and there is no need to take additional references unless the structures are needed after disconnect. This driver takes a reference to the USB device during probe but does not to release it on probe failures. Drop the redundant device reference to fix the leak, reduce cargo culting, make it easier to spot drivers where an extra reference is needed, and reduce the risk of further memory leaks. Fixes: 0791c0327a6e ("net: mctp: Add MCTP USB transport driver") Cc: stable@vger.kernel.org # 6.15 Signed-off-by: Johan Hovold <johan@kernel.org> Acked-by: Jeremy Kerr <jk@codeconstruct.com.au> Link: https://patch.msgid.link/20260305104549.16110-1-johan@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>